How to convert raw text with accent to UTF8.

  1. #1
    Registered User
    Join Date
    Nov 2005
    Posts
    20

    How to convert raw text with accent to UTF8.

    Hello,

    I am trying to read a string containing French accents from a text file and
    convert it to a UTF8 wide character string, but the conversion fails.

    mbstowcs() returns -1 (its error value) when it reaches an "e" character
    with an accent.
    The text is raw (only \r and \n as line endings) and was typed on a French-language
    MS Windows system. The typeface used is Courier New.

    I am thinking of saving my source text file in UTF8 format, but I do not know how to deal
    with the 0xFF 0xFE header of the utf8 file. Is there a function like getline() that works with
    a UTF8 wide character file?

    Thank you.
    Code:
    #define _POSIX_C_SOURCE 200809L //getline()
    #include <stdio.h>  //fopen(), getline()
    #include <stdlib.h> //mbstowcs(), free()
    #include <string.h> //strlen()
    #include <wchar.h>  //fputws()
    #include <locale.h> //setlocale()
    
    FILE    *file_in;
    FILE    *file_out_wide;
    char    *ascii_in=NULL;   //getline() allocates the buffer
    wchar_t wide_string_A[100];
    size_t char_count=0;
    size_t len;
    int n;
    
    int main(void)
    { 
     if(!setlocale(LC_ALL, "en_US.utf8")) return(1);
     file_in=fopen("./ascii_in.txt", "r");
     file_out_wide=fopen("./wide_out.txt", "w"); 
     if(!file_in || !file_out_wide) return(1);
     //********** Get ascii string ******************** 
     if(getline(&ascii_in, &char_count, file_in)==-1) return(1);
     //******* Remove new line and carriage return*****
     len=strlen(ascii_in);
     while( len>0 && ((ascii_in[len-1]=='\r') || (ascii_in[len-1]=='\n')) )
          {
           ascii_in[--len] = '\0';
          } 
     //********* Convert ascii string to wide string****
     //The third argument is the capacity of the destination, in wide characters
     n= (int)mbstowcs(&wide_string_A[0], &ascii_in[0], 100);
     wide_string_A[99] = L'\0'; //a truncated result is not terminated by mbstowcs()
     printf("\n%ls\n", wide_string_A);
     printf("%d\n", n);         //-1 means an invalid multibyte sequence
     //********* Write wide string to disk**************
     fputws(wide_string_A, file_out_wide); 
     
     fflush(file_in); fflush(file_out_wide);
     fclose(file_in); fclose(file_out_wide);
     free(ascii_in); 
     return(0);
    }
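
    A possible explanation for the -1, assuming the file was saved in a single-byte
    encoding such as ISO-8859-1: there, "é" is the single byte 0xE9, which is not a
    valid sequence on its own in a UTF-8 locale, so mbstowcs() rejects it. Below is a
    minimal sketch of re-encoding such bytes as UTF-8 before the wide conversion.
    latin1_to_utf8() is a hypothetical helper written for this illustration, not a
    library function.

```c
#include <stddef.h>

/* Convert a NUL-terminated ISO-8859-1 string to NUL-terminated UTF-8.
   Returns the number of bytes written (without the NUL),
   or (size_t)-1 if the output buffer is too small. */
size_t latin1_to_utf8(const char *in, char *out, size_t out_cap)
{
    size_t o = 0;

    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        size_t need = (c < 0x80) ? 1 : 2;

        /* Always keep one byte in reserve for the terminator. */
        if (o + need + 1 > out_cap)
            return (size_t)-1;

        if (c < 0x80) {
            out[o++] = (char)c;                    /* plain ASCII */
        } else {
            out[o++] = (char)(0xC0 | (c >> 6));    /* lead byte */
            out[o++] = (char)(0x80 | (c & 0x3F));  /* continuation byte */
        }
    }
    if (o + 1 > out_cap)
        return (size_t)-1;
    out[o] = '\0';
    return o;
}
```

    The result could then be passed to mbstowcs() under a UTF-8 locale. This sketch
    only covers ISO-8859-1; if the file is really Windows-1252, characters in the
    0x80-0x9F range (such as "œ" or "€") need a lookup table or a library such as iconv.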

  2. #2
    Salem (and the hat of wrongness)
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,422
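    > I am thinking of saving my source text file in UTF8 format, but I do not know how to deal
    > with the 0xFF 0xFE header of the utf8 file.
    0xFF 0xFE is the byte-order mark of little-endian UTF-16 (a UTF-8 byte-order mark would be 0xEF 0xBB 0xBF). This isn't an ASCII file, nor is it UTF-8 encoded.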
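
    As a rough illustration of telling these files apart, here is a sketch that
    inspects the first bytes of a buffer. bom_encoding() is a hypothetical name, and
    the check is only a heuristic: files without a byte-order mark report "unknown",
    and UTF-32 is not considered.

```c
#include <stddef.h>

/* Guess a text encoding from a leading byte-order mark.
   Returns a pointer to a static string; UTF-32 is deliberately ignored. */
const char *bom_encoding(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return "UTF-8";
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return "UTF-16LE";
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return "UTF-16BE";
    return "unknown";   /* no BOM: could be ASCII, Latin-1, BOM-less UTF-8, ... */
}
```

    Reading the first three bytes of the file with fread() and passing them to this
    function would be enough to decide how to treat the rest of the file.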

    I'm not sure what you're trying to achieve, but every single char in that file is encoded in TWO bytes. Wide char to UTF-8 seems a plausible thing to do.
    If you just have normal ASCII text in there, then you'll see that every other byte is actually zero (this is not good news for your strlen() calls, which stop at the first zero byte).
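
    A minimal sketch of that conversion, assuming the file contents have already been
    read into memory as raw bytes (fread(), not getline() or strlen(), because of the
    embedded zero bytes). utf16le_to_utf8() and utf8_encode() are hypothetical helpers
    written for this illustration; a library such as iconv would be the more usual choice.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8; returns the byte count (1 to 4). */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Convert UTF-16LE bytes (with an optional FF FE byte-order mark) to
   NUL-terminated UTF-8. Returns the number of bytes written (without the
   NUL), or (size_t)-1 on malformed input or a too-small output buffer. */
size_t utf16le_to_utf8(const unsigned char *in, size_t in_len,
                       char *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    if (out_cap == 0)
        return (size_t)-1;
    if (in_len >= 2 && in[0] == 0xFF && in[1] == 0xFE)
        i = 2;                                   /* skip the byte-order mark */

    while (i + 1 < in_len) {
        uint32_t cp = in[i] | ((uint32_t)in[i + 1] << 8);
        unsigned char tmp[4];
        size_t n, k;

        i += 2;
        if (cp >= 0xD800 && cp <= 0xDBFF) {      /* high surrogate: need a pair */
            uint32_t lo;
            if (i + 1 >= in_len)
                return (size_t)-1;
            lo = in[i] | ((uint32_t)in[i + 1] << 8);
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            i += 2;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;                   /* stray low surrogate */
        }

        n = utf8_encode(cp, tmp);
        if (o + n + 1 > out_cap)                 /* keep room for the NUL */
            return (size_t)-1;
        for (k = 0; k < n; k++)
            out[o++] = (char)tmp[k];
    }
    if (i != in_len)
        return (size_t)-1;                       /* odd trailing byte */
    out[o] = '\0';
    return o;
}
```

    The length is passed explicitly instead of relying on a terminator, which is what
    makes the zero bytes harmless here.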


    > fflush(file_in);
    fflush() isn't defined for input files.
