Thread: How to change a string encoding from utf-8 to iso-8859-7

  1. #1
    Registered User
    Join Date: Sep 2004
    Posts: 1

    Hi,

    I am using Ingres 6.4 (db) and C.
    At the moment, the program I have developed retrieves string data from an Ingres 6.4 database and inserts it into another Ingres 6.4 database.

    The data is stored in the database in UTF-8 encoding.

    What I want to do is retrieve the data and, before storing it in the other database, convert it from UTF-8 to ISO-8859-7.

    Could anyone help?

    Thank you.

  2. #2
    Registered User
    Join Date: Jun 2004
    Posts: 84
    Well... I don't know about ISO-8859-7, but wchar.h does have a wctob() function. Might be worth a try.
    http://msdn.microsoft.com/library/de...char_wctob.asp

  3. #3
    Registered User
    Join Date: Sep 2004
    Location: California
    Posts: 3,268
    wctob() is for wide character strings. ISO-8859-7 is a regular 8-bit encoding for Greek letters (at least I think it's Greek).

    I did a quick search and found something that does what you are asking in Perl. I don't know Perl, so I can't convert it to C, but it's pretty short, and I'll bet you can find someone to do the conversion.

    Code:
    while(<>) {
    	s/\316([\200-\277])/pack("C",unpack("C",$1)+48)/eg;
    	s/\317([\200-\217])/pack("C",unpack("C",$1)+112)/eg;
    	print;
    }
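
The Perl snippet above can be carried over to C fairly directly. The following is a minimal sketch, not a tested production routine: the function name `greek_utf8_to_iso8859_7` is made up for illustration, and it assumes well-formed UTF-8 input. Like the Perl, it only remaps the two-byte sequences starting with 0xCE and 0xCF (the main Greek block) and copies every other byte through unchanged, so characters outside that range are not converted.

```c
#include <stddef.h>

/* Hypothetical helper: rewrites a NUL-terminated UTF-8 string in place,
 * mirroring the two Perl substitutions:
 *   0xCE 0x80..0xBF  ->  one byte, second byte + 48
 *   0xCF 0x80..0x8F  ->  one byte, second byte + 112
 * All other bytes are copied unchanged. Returns the new length. */
size_t greek_utf8_to_iso8859_7(char *s)
{
    unsigned char *in = (unsigned char *)s;
    unsigned char *out = (unsigned char *)s;

    while (*in != '\0') {
        /* in[1] is safe to read: at worst it is the terminating NUL */
        if (in[0] == 0xCE && in[1] >= 0x80 && in[1] <= 0xBF) {
            *out++ = (unsigned char)(in[1] + 48);
            in += 2;
        } else if (in[0] == 0xCF && in[1] >= 0x80 && in[1] <= 0x8F) {
            *out++ = (unsigned char)(in[1] + 112);
            in += 2;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
    return (size_t)(out - (unsigned char *)s);
}
```

The conversion is done in place because the output is never longer than the input. Call it on a writable buffer (for example the buffer the row was fetched into), not on a string literal.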

  4. #4
    Registered User
    Join Date: Jun 2004
    Posts: 84
    Quote:
        wctob() is for wide character strings. ISO-8859-7 is a regular 8-bit encoding for Greek letters (at least I think it's Greek).

    Yes, it is Greek. So? How about setlocale?
    Code:
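    #include <stdio.h>   /* EOF */
    #include <wchar.h>
    #include <locale.h>

    /* szFrom: wide string to narrow; szTo: room for wcslen(szFrom) + 1 chars */
    void narrow_greek(const wchar_t *szFrom, char *szTo)
    {
      size_t i, len = wcslen(szFrom);

      setlocale(LC_CTYPE, "greek.1253");
      for (i = 0; i < len; i++) {
        int c = wctob(szFrom[i]);             /* EOF if no single-byte form */
        szTo[i] = (c == EOF) ? '?' : (char)c;
      }
      szTo[i] = '\0';
    }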
    EDIT: I'm not sure about code page 1253 (I guess it's Windows). It seems that Greek is on codepages 813 (ISO) and 869 (DOS)
    Last edited by iwabee; 09-10-2004 at 03:48 AM.

  5. #5
    Registered User (Codeplug)
    Join Date: Mar 2003
    Posts: 4,981
    UTF-8 is an encoding method for Unicode.
    ISO-8859-7 is a single-byte character set: it specifies which character (mostly Latin and Greek) each byte value stands for.

    So you don't convert one to the other directly....

    You can decode a UTF-8 stream into Unicode (UTF-16 on Windows) using MultiByteToWideChar() while specifying CP_UTF8.

    gg

  6. #6
    Yes, my avatar is stolen (anonytmouse)
    Join Date: Dec 2002
    Posts: 2,544
    To get from UTF-8 to ISO-8859-7 you need to do a round trip: first from UTF-8 to native Unicode (UTF-16 on Windows), and then from native Unicode to ISO-8859-7. However, UTF-8 can encode over a million code points (only a fraction of which are currently assigned) covering almost all modern scripts, while ISO-8859-7 allows only 256. Therefore, you should be aware of data loss: any character with no ISO-8859-7 equivalent cannot survive the conversion. Try to keep your data in Unicode if possible.

    Assuming Windows, you can use the round trip function provided in this post.
    Code:
    /* Note: 28597 is the Windows code page identifier for ISO 8859-7 Greek. */
    myGreekStr = MBStr2MBStr(myUTF8Str, CP_UTF8, 28597);
    
    /* Other code here. */
    
    /* Release memory for myGreekStr when you are done. */
    free(myGreekStr);
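
If the program does not run on Windows, the same conversion can be sketched with the POSIX iconv() interface, which goes through Unicode internally. This is a rough example under assumptions: the function name `utf8_to_greek` is invented for illustration, and it assumes the system's iconv recognizes the encoding names "UTF-8" and "ISO-8859-7". It treats any lossy or failed conversion as an error instead of silently dropping characters.

```c
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: converts a NUL-terminated UTF-8 string to ISO-8859-7.
 * Returns 0 on success, -1 on failure (conversion not available, malformed
 * input, a character with no ISO-8859-7 equivalent, or output too small). */
int utf8_to_greek(const char *in, char *out, size_t outsize)
{
    iconv_t cd;
    char *src = (char *)in;     /* iconv() takes char **, the input is only read */
    char *dst = out;
    size_t inleft = strlen(in);
    size_t outleft;
    size_t rc;

    if (outsize == 0)
        return -1;
    outleft = outsize - 1;      /* reserve room for the terminator */

    cd = iconv_open("ISO-8859-7", "UTF-8");   /* arguments are (to, from) */
    if (cd == (iconv_t)-1)
        return -1;

    rc = iconv(cd, &src, &inleft, &dst, &outleft);
    iconv_close(cd);

    /* (size_t)-1 signals an error; a positive count means some characters
     * were converted non-reversibly. Both count as data loss here. */
    if (rc != 0)
        return -1;

    *dst = '\0';
    return 0;
}
```

Since ISO-8859-7 uses one byte per character, an output buffer as large as the UTF-8 input (plus the terminator) is always big enough.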
