#1
Registered User
Join Date: Nov 2005
Posts: 20
How to convert raw text with accents to UTF-8?

I am trying to read a string containing French accented characters from a text file and convert it to a wide-character string, but the conversion fails: mbstowcs() returns -1 as an error value as soon as it reaches an "e" with an accent. The text is plain raw text (only \r and \n as line endings), typed on a French-language MS Windows system; the typeface used is Courier New.

I thought about saving my source text file in UTF-8 format instead, but I do not know how to deal with the 0xFF 0xFE header of a UTF-8 file. Is there a function like getline() that works with UTF-8 / wide-character files?

Thank you.

Code:
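#define _POSIX_C_SOURCE 200809L // getline() is POSIX, not standard C
#include <stdio.h>  //fopen(), printf(), getline()
#include <stdlib.h> //mbstowcs(), free()
#include <string.h> //strlen()
#include <wchar.h>  //fputws()
#include <locale.h> //setlocale()

#define WIDE_MAX 100 // capacity of wide_string_A, in wide characters

int main(void)
{
    FILE *file_in;
    FILE *file_out_wide;
    char *ascii_in = NULL;   // must start as NULL so getline() allocates it
    size_t char_count = 0;   // getline() takes a size_t *, not an int *
    wchar_t wide_string_A[WIDE_MAX];
    size_t n, len;

    // mbstowcs() decodes using LC_CTYPE: with this locale the input must be UTF-8
    if(!setlocale(LC_ALL, "en_US.utf8")) return(1);
    file_in=fopen("./ascii_in.txt", "r");
    file_out_wide=fopen("./wide_out.txt", "w");
    if(file_in == NULL || file_out_wide == NULL) return(1);
    //********** Get ascii string ********************
    if(getline(&ascii_in, &char_count, file_in) == -1) return(1);
    //******* Remove new line and carriage return*****
    len = strlen(ascii_in);
    while( len > 0 && (ascii_in[len-1]=='\r' || ascii_in[len-1]=='\n') )
    {
        ascii_in[--len] = '\0';
    }
    //********* Convert multibyte string to wide string****
    // The last argument is the room in the destination (in wide chars),
    // not the length of the source; one slot is kept for the terminator.
    n= mbstowcs(wide_string_A, ascii_in, WIDE_MAX - 1);
    if(n == (size_t)-1)
    {
        fprintf(stderr, "Invalid multibyte sequence for the current locale\n");
        return(1);
    }
    wide_string_A[n] = L'\0'; // mbstowcs() does not terminate if the limit is hit
    printf("\n%ls\n", wide_string_A);
    printf("%zu\n", n);
    //********* Write wide string to disk**************
    fputws(wide_string_A, file_out_wide);
    // no fflush(file_in): fflush() is only defined for output streams
    fflush(file_out_wide);
    fclose(file_in); fclose(file_out_wide);
    free(ascii_in);
    return(0);
}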
intmail
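A likely cause of the -1 from mbstowcs() is an encoding mismatch: the program selects a UTF-8 locale, but a text file typed on a French Windows system is usually saved in the Windows "ANSI" code page (Windows-1252), where é is the single byte 0xE9, which is not a valid UTF-8 sequence. Assuming the file really is Windows-1252 or ISO-8859-1, one fix is to re-encode each line to UTF-8 before calling mbstowcs(). The sketch below does this by hand; latin1_to_utf8() is a hypothetical helper, not a library function, and it treats bytes 0x80-0x9F as Latin-1, so Windows-1252 extras such as the euro sign are not handled.

```c
#include <stddef.h>

/* Hypothetical helper: convert an ISO-8859-1 (Latin-1) string to UTF-8.
 * Assumption: the input is Latin-1 or Windows-1252; both agree on the
 * accented French letters (e.g. 0xE9 = U+00E9 'é'), but bytes 0x80-0x9F
 * are treated here as Latin-1 control characters, not Windows-1252 symbols.
 * `out` must have room for 2 * strlen(in) + 1 bytes.
 * Returns the number of bytes written, excluding the terminating '\0'. */
size_t latin1_to_utf8(const char *in, char *out)
{
    const unsigned char *src = (const unsigned char *)in;
    unsigned char *dst = (unsigned char *)out;

    for (; *src != '\0'; ++src) {
        if (*src < 0x80) {
            /* ASCII range: one byte, copied unchanged */
            *dst++ = *src;
        } else {
            /* U+0080..U+00FF always needs exactly two UTF-8 bytes */
            *dst++ = (unsigned char)(0xC0 | (*src >> 6));   /* 110000xx */
            *dst++ = (unsigned char)(0x80 | (*src & 0x3F)); /* 10xxxxxx */
        }
    }
    *dst = '\0';
    return (size_t)(dst - (unsigned char *)out);
}
```

In the question's program the call would sit between the newline trimming and mbstowcs(), with the UTF-8 result passed to mbstowcs() instead of ascii_in. On POSIX systems the iconv_open()/iconv() functions can perform the same conversion, including the Windows-1252-specific bytes. As for a getline()-like function for wide text: standard C provides fgetws(), which reads a line from a wide-oriented stream and decodes it with the current LC_CTYPE locale, so it has the same requirement that the file's encoding matches the locale.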
#2
and the hat of Jobseeking
Join Date: Aug 2001
Location: The edge of the known universe
Posts: 21,710
> I think to save my source text file under UTF8 format but I do not how to deal
> with the 0xFF 0xFE header of utf8 file.

This isn't an ASCII file, nor is it UTF-8 encoded. I'm not sure what you're trying to achieve, but every single char in that file is encoded in TWO bytes (0xFF 0xFE is the byte order mark of a UTF-16 little-endian file, which is what Windows Notepad writes when you save as "Unicode"). Converting from wide chars to UTF-8 seems a plausible thing to do.

If you just have normal ASCII text in there, then you'll see that every other byte is actually zero (this is not good news for your strlen() calls).

> fflush(file_in);

fflush() isn't defined for input files.
Salem
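Salem's suggestion of going from wide chars to UTF-8 can be sketched for the case where the file was saved as UTF-16 little-endian, which is what the 0xFF 0xFE header indicates. Because such a file contains zero bytes, it would have to be opened in binary mode ("rb") and read with fread() rather than getline() or strlen(). The helper utf16le_to_utf8() below is a hypothetical name for illustration; it skips the byte order mark, combines surrogate pairs, replaces unpaired surrogates with U+FFFD, and ignores a trailing odd byte.

```c
#include <stddef.h>

/* Hypothetical helper: convert a UTF-16LE byte buffer to a NUL-terminated
 * UTF-8 string.
 * - A leading byte order mark 0xFF 0xFE is skipped.
 * - Surrogate pairs are combined; unpaired surrogates become U+FFFD.
 * - A trailing odd byte is ignored.
 * `out` must have room for 3 * (len / 2) + 1 bytes: one 16-bit unit yields
 * at most 3 UTF-8 bytes, and a 4-byte surrogate pair yields 4.
 * Returns the number of UTF-8 bytes written, excluding the '\0'. */
size_t utf16le_to_utf8(const unsigned char *in, size_t len, char *out)
{
    unsigned char *dst = (unsigned char *)out;
    size_t i = 0;

    if (len >= 2 && in[0] == 0xFF && in[1] == 0xFE)
        i = 2; /* skip the BOM */

    while (i + 1 < len) {
        unsigned long cp = in[i] | (in[i + 1] << 8); /* little-endian unit */
        i += 2;

        if (cp >= 0xD800 && cp <= 0xDBFF) {          /* high surrogate */
            unsigned long lo = (i + 1 < len) ? (unsigned long)(in[i] | (in[i + 1] << 8)) : 0;
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {   /* stray low surrogate */
            cp = 0xFFFD;
        }

        /* Encode the code point as 1 to 4 UTF-8 bytes */
        if (cp < 0x80) {
            *dst++ = (unsigned char)cp;
        } else if (cp < 0x800) {
            *dst++ = (unsigned char)(0xC0 | (cp >> 6));
            *dst++ = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            *dst++ = (unsigned char)(0xE0 | (cp >> 12));
            *dst++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            *dst++ = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            *dst++ = (unsigned char)(0xF0 | (cp >> 18));
            *dst++ = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            *dst++ = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            *dst++ = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    *dst = '\0';
    return (size_t)(dst - (unsigned char *)out);
}
```

For example, the bytes FF FE 63 00 61 00 66 00 E9 00 ("café") come out as the five UTF-8 bytes 63 61 66 C3 A9. The output bound is deliberately generous so that the function needs no allocation of its own.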