C Board  

08-09-2006, 08:18 AM   #1
How to convert raw text with accented characters to UTF-8?

Hello,

I am trying to read a string containing French accented characters from a text file and
convert it to a UTF-8 wide-character string, but the conversion fails.

mbstowcs() returns -1 (its error value) when it reaches an accented "e"
(such as "é").
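mbstowcs() returns (size_t)-1 as soon as it meets a byte sequence that is not valid in the current locale's encoding. With a UTF-8 locale and a file typed on French Windows (assumed here to be Windows-1252), "é" is stored as the single byte 0xE9, which UTF-8 reads as the start of a three-byte sequence that never completes. The helper below is a locale-independent sketch, and `utf8_first_invalid()` is a hypothetical name: it reports the offset of the first byte a UTF-8 decoder would reject, which is one way to see where the conversion stops.

```c
#include <stddef.h>

/* Hypothetical helper: returns the byte offset of the first position where a
   UTF-8 decoder would fail, or (size_t)-1 if the NUL-terminated string is valid. */
size_t utf8_first_invalid(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t i = 0;

    while (s[i] != '\0') {
        unsigned char c = s[i];
        size_t len;
        unsigned long cp;

        if (c < 0x80) {                      /* plain ASCII byte */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) { /* lead byte of a 2-byte sequence */
            len = 2; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) { /* lead byte of a 3-byte sequence */
            len = 3; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) { /* lead byte of a 4-byte sequence */
            len = 4; cp = c & 0x07;
        } else {
            return i;  /* stray continuation byte or never-valid lead (C0, C1, F5..FF) */
        }

        for (size_t k = 1; k < len; k++) {
            /* each following byte must be 10xxxxxx; the NUL terminator fails this test */
            if ((s[i + k] & 0xC0) != 0x80)
                return i;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        /* reject overlong forms, UTF-16 surrogates and values beyond U+10FFFF */
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return i;

        i += len;
    }
    return (size_t)-1;
}
```

For a line read from the file in the question, a result pointing at the first accented letter would mean the bytes are not UTF-8, so either the file or the locale has to change.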
The text is plain (only \r and \n as line endings) and was typed on a French-language
MS Windows system. The typeface used is Courier New.
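If the file really is in the Windows code page used for French, assumed here to be Windows-1252, one option is to convert its bytes to UTF-8 with POSIX iconv() before calling mbstowcs() in a UTF-8 locale. A minimal sketch: the helper name `cp1252_to_utf8()` is hypothetical, and "WINDOWS-1252" is the charset name glibc accepts (other iconv implementations may spell it differently).

```c
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: converts a NUL-terminated Windows-1252 string into
   NUL-terminated UTF-8 in out (outsize bytes). Returns 0 on success, -1 on error. */
int cp1252_to_utf8(const char *in, char *out, size_t outsize)
{
    if (outsize == 0)
        return -1;

    iconv_t cd = iconv_open("UTF-8", "WINDOWS-1252"); /* (to, from) */
    if (cd == (iconv_t)-1)
        return -1;                    /* charset not supported by this iconv */

    char *inp = (char *)in;           /* iconv() takes char **, not const char ** */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;     /* reserve one byte for the terminator */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;                    /* output buffer too small or invalid input */

    *outp = '\0';
    return 0;
}
```

Using the Windows-1252 table rather than ISO-8859-1 matters for French: a byte such as 0x9C is "œ" in Windows-1252 but a control character in ISO-8859-1.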

I am thinking of saving my source text file in UTF-8 format, but I do not know how to deal
with the 0xFF 0xFE header of the UTF-8 file. Is there a function like getline() that works with
a UTF-8 wide-character file?
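On the getline() question: a UTF-8 file is still a byte stream in which no character except NUL contains a zero byte, so getline() reads it unchanged. The extra step is skipping the UTF-8 byte-order mark, which is EF BB BF (FF FE is the UTF-16 little-endian mark). A sketch under those assumptions; `read_utf8_line()` and its `first_line` flag are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L   /* getline() is POSIX.1-2008, not ISO C */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>            /* ssize_t */

/* Hypothetical helper: reads one line of a UTF-8 file into *line (allocated
   by getline()), drops a leading UTF-8 BOM on the first line and the trailing
   CR/LF. Returns the remaining length in bytes, or -1 on EOF or error. */
ssize_t read_utf8_line(FILE *fp, char **line, size_t *cap, int first_line)
{
    ssize_t len = getline(line, cap, fp);
    if (len == -1)
        return -1;

    char *s = *line;
    /* The UTF-8 BOM is the three bytes EF BB BF */
    if (first_line && len >= 3 && memcmp(s, "\xEF\xBB\xBF", 3) == 0) {
        memmove(s, s + 3, (size_t)len - 2);  /* also moves the terminating NUL */
        len -= 3;
    }
    /* Strip "\r\n" (Windows) or "\n" (Unix) line endings */
    while (len > 0 && (s[len - 1] == '\n' || s[len - 1] == '\r'))
        s[--len] = '\0';
    return len;
}
```

The resulting line can then be passed to mbstowcs() under a UTF-8 locale. For reading wide characters directly, <wchar.h> also provides fgetws(), which decodes according to the current locale.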

Thank you.
Code:
#define _GNU_SOURCE  //getline() is POSIX.1-2008, not ISO C
#include <stdio.h>   //fopen(), getline()
#include <stdlib.h>  //free()
#include <string.h>  //strlen()
#include <wchar.h>   //mbstowcs(), fputws()
#include <locale.h>  //setlocale()

FILE	*file_in;
FILE	*file_out_wide;
char	*ascii_in = NULL;   //allocated by getline()
size_t	char_count = 0;     //getline() takes a size_t *, not an int *
wchar_t wide_string_A[100];
int n;

int main(void)
{ 
 if(!setlocale(LC_ALL, "en_US.utf8")) return(1);
 file_in=fopen("./ascii_in.txt", "r");
 file_out_wide=fopen("./wide_out.txt", "w"); 
 if(!file_in || !file_out_wide) return(1);
 //********** Get ascii string ******************** 
 if(getline(&ascii_in, &char_count, file_in) == -1) return(1);
 //******* Remove new line and carriage return*****
 size_t len = strlen(ascii_in);
 while( len > 0 && (ascii_in[len-1]=='\r' || ascii_in[len-1]=='\n') )
      ascii_in[--len] = '\0';   //stops safely on an empty line
 //********* Convert ascii string to wide string****
 //The last argument is the destination capacity in wide chars; leaving one
 //slot free keeps the (zero-initialised) terminator of wide_string_A intact.
 n= mbstowcs(wide_string_A, ascii_in, sizeof wide_string_A / sizeof wide_string_A[0] - 1);
 printf("\n%ls\n", wide_string_A);
 printf("%d\n", n);
 //********* Write wide string to disk**************
 fputws(wide_string_A, file_out_wide); 
 
 fflush(file_in); fflush(file_out_wide);
 fclose(file_in); fclose(file_out_wide);
 free(ascii_in); 
 return(0);
}
Posted by intmail
08-09-2006, 10:47 AM   #2
> I think to save my source text file under UTF8 format but I do not how to deal
> with the 0xFF 0xFE header of utf8 file.
This isn't an ASCII file, nor is it UTF-8 encoded: 0xFF 0xFE is the byte-order mark of UTF-16 little-endian.

I'm not sure what you're trying to achieve, but every single char in that file is encoded in TWO bytes. Converting from wide chars to UTF-8 seems a plausible thing to do.
If you just have normal ASCII text in there, then you'll see that every other byte is actually zero (this is not good news for your strlen() calls).
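To follow up on the two-bytes-per-character point: a file starting with FF FE (what Windows Notepad writes when saving as "Unicode") is UTF-16 little-endian, and it can be decoded by hand into UTF-8 without depending on the locale. A sketch, with `utf16le_to_utf8()` as a hypothetical helper name, that skips the BOM, combines surrogate pairs and writes NUL-terminated UTF-8:

```c
#include <stddef.h>

/* Hypothetical helper: decodes nbytes of UTF-16LE (optionally starting with the
   BOM FF FE) into NUL-terminated UTF-8 in out. Returns the number of UTF-8
   bytes written (excluding the NUL), or (size_t)-1 on malformed input or if
   out is too small. */
size_t utf16le_to_utf8(const unsigned char *in, size_t nbytes,
                       char *out, size_t outsize)
{
    size_t i = 0, o = 0;
    if (nbytes >= 2 && in[0] == 0xFF && in[1] == 0xFE)
        i = 2;                                        /* skip the byte-order mark */

    while (i + 1 < nbytes) {
        unsigned long cp = in[i] | (in[i + 1] << 8);  /* low byte comes first */
        i += 2;
        if (cp >= 0xD800 && cp <= 0xDBFF) {           /* high surrogate: needs a low one */
            if (i + 1 >= nbytes)
                return (size_t)-1;
            unsigned long lo = in[i] | (in[i + 1] << 8);
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            i += 2;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;                        /* unpaired low surrogate */
        }

        unsigned char buf[4];                         /* encode cp as 1 to 4 bytes */
        size_t n;
        if (cp < 0x80) {
            buf[0] = (unsigned char)cp; n = 1;
        } else if (cp < 0x800) {
            buf[0] = 0xC0 | (cp >> 6); buf[1] = 0x80 | (cp & 0x3F); n = 2;
        } else if (cp < 0x10000) {
            buf[0] = 0xE0 | (cp >> 12); buf[1] = 0x80 | ((cp >> 6) & 0x3F);
            buf[2] = 0x80 | (cp & 0x3F); n = 3;
        } else {
            buf[0] = 0xF0 | (cp >> 18); buf[1] = 0x80 | ((cp >> 12) & 0x3F);
            buf[2] = 0x80 | ((cp >> 6) & 0x3F); buf[3] = 0x80 | (cp & 0x3F); n = 4;
        }
        if (o + n >= outsize)
            return (size_t)-1;                        /* keep room for the NUL */
        for (size_t k = 0; k < n; k++)
            out[o++] = (char)buf[k];
    }

    if (i != nbytes || outsize == 0)
        return (size_t)-1;                            /* odd trailing byte */
    out[o] = '\0';
    return o;
}
```

The input would have to be read with fread() into a byte buffer first; getline() and strlen() are unsuitable here because of the embedded zero bytes.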


> fflush(file_in);
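fflush() isn't defined for input streams: the C standard only defines it for output streams (or update streams whose last operation was not input), so calling it on file_in is undefined behaviour.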

Posted by Salem