Thread: unicode string manipulation & check in C (wchar.h)

  1. #1
    Registered User
    Join Date
    Jan 2011
    Posts
    2

    unicode string manipulation & check in C (wchar.h)

    Hi guys!

    I am developing a unicode application in C/Win32 using wchar.h.

    I want to do some string manipulation, including concatenating strings, converting strings to numbers and back, and searching within strings, as well as checks like isnumber, isalphanumeric and so on. Are there functions in wchar.h for this, or is there a library that provides such string functions?

    Thanks in advance

  3. #3
    Make Fortran great again
    Join Date
    Sep 2009
    Posts
    1,413
    From Stack Overflow:

    UTF-8 is specially designed so that many byte-oriented string functions continue to work or only need minor modifications.

    C's strstr function, for instance, will work perfectly as long as both its inputs are valid, null-terminated UTF-8 strings. strcpy works fine as long as its input string starts at a character boundary (for instance the return value of strstr).

    So you may not even need a separate library!
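    To illustrate the quoted point, here is a minimal sketch of byte-oriented searching on UTF-8 data. The helper names utf8_find and utf8_code_points are made up for this example, and the code assumes both inputs are valid, null-terminated UTF-8. The non-ASCII text is written with byte escapes so the result does not depend on the source file's encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte offset of the first occurrence of needle in
   haystack, or -1 if it is absent. Plain strstr is enough because a valid
   UTF-8 sequence can never match in the middle of another character. */
long utf8_find(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    const char *hit = strstr(haystack, needle);
    return hit ? (long)(hit - haystack) : -1L;
}

/* Hypothetical helper: number of code points in s. Every byte that is not
   a continuation byte (bit pattern 10xxxxxx) starts a new code point. */
size_t utf8_code_points(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

    Note that strlen still counts bytes, not characters, so offsets returned by strstr are byte offsets.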

  4. #4
    and the Hat of Guessing tabstop
    Join Date
    Nov 2007
    Posts
    14,336
    So all the things that you would want from ctype.h are then in wctype.h.
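    As a rough sketch of what that looks like for the operations asked about, the following uses only standard wchar.h and wctype.h calls (wcscat, wcslen, wcstol, swprintf, wcsstr, iswdigit, iswalnum). The wrapper names are invented for this example. It assumes the standard swprintf signature that takes a buffer size; some older Windows toolchains ship a variant without that parameter, so check your compiler's documentation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Concatenation: append src to dst if it fits in cap wide characters
   (terminator included). Returns 1 on success, 0 if it would overflow. */
int append_wide(wchar_t *dst, size_t cap, const wchar_t *src)
{
    assert(dst != NULL && src != NULL);
    if (wcslen(dst) + wcslen(src) + 1 > cap)
        return 0;
    wcscat(dst, src);
    return 1;
}

/* String to number: accepts only a complete base-10 integer. */
int parse_long(const wchar_t *s, long *out)
{
    wchar_t *end = NULL;
    errno = 0;
    long v = wcstol(s, &end, 10);
    if (end == s || *end != L'\0' || errno == ERANGE)
        return 0;
    *out = v;
    return 1;
}

/* Number to string: returns 1 if the formatted value fit in dst. */
int format_long(wchar_t *dst, size_t cap, long v)
{
    int n = swprintf(dst, cap, L"%ld", v);
    return n >= 0 && (size_t)n < cap;
}

/* In-string search: index of needle in haystack, or -1 if absent. */
long find_wide(const wchar_t *haystack, const wchar_t *needle)
{
    const wchar_t *hit = wcsstr(haystack, needle);
    return hit ? (long)(hit - haystack) : -1L;
}

/* Check: non-empty and made up of decimal digits only. */
int is_all_digits(const wchar_t *s)
{
    if (*s == L'\0')
        return 0;
    for (; *s != L'\0'; ++s) {
        if (!iswdigit((wint_t)*s))
            return 0;
    }
    return 1;
}

/* Check: non-empty and made up of letters and digits only. */
int is_all_alnum(const wchar_t *s)
{
    if (*s == L'\0')
        return 0;
    for (; *s != L'\0'; ++s) {
        if (!iswalnum((wint_t)*s))
            return 0;
    }
    return 1;
}
```

    The classification functions depend on the current locale, so results for non-ASCII letters can differ between setups.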

  5. #5
    Registered User
    Join Date
    Jan 2011
    Posts
    2
    Unicode and UTF8 are two different things. Unicode is a letter to 2-byte integer mapping and UTF8 is a way to represent that 2-byte integer. I don't really know if UTF8 is used when I include Unicode support in my Win32 applications.

    What I want is to find functions that do the aforementioned things.

  6. #6
    and the Hat of Guessing tabstop
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by orestis1987
    Unicode and UTF8 are two different things. Unicode is a letter to 2-byte integer mapping and UTF8 is a way to represent that 2-byte integer. I don't really know if UTF8 is used when I include Unicode support in my Win32 applications.

    What I want is to find functions that do the aforementioned things.
    You need to use wchar if you are using an encoding where every single character takes up more than one byte. If you are using an encoding where some characters are a single byte and some characters are multiple bytes (like UTF8), then you don't want wchars.

    (And even though you are on Windows, you can still type "man wchar.h" and similar things into your browser search box to access the man pages for these headers.)

  7. #7
    Registered User Codeplug
    Join Date
    Mar 2003
    Posts
    4,981
    Unicode is a standard. Among other things, it assigns a code point to all characters covered by the standard.

    UTF8 is a method for encoding Unicode characters where each code unit is 8 bits.
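    To make the 8-bit code unit idea concrete, here is a minimal sketch of turning one code point into its UTF-8 code units. The function name utf8_encode is made up for this example and is not part of any standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the UTF-8 code units for cp into out and
   returns how many were written (1 to 4). Returns 0 for values that are
   not encodable: surrogates and anything above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x80) {                      /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

    For example, U+00E9 comes out as the two bytes C3 A9, and U+20AC as the three bytes E2 82 AC.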

    On Windows, wchar_t's are encoded as UTF16LE. That's 16 bit code units with little endian byte order. Most *nix flavors use UTF32 with native byte order.
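    One practical consequence of 16-bit code units is that characters outside the Basic Multilingual Plane take two units (a surrogate pair), so one wchar_t on Windows is not always one character. A minimal sketch of decoding, using uint16_t so it behaves the same on any platform; the name utf16_decode and the choice to return U+FFFD for unpaired surrogates are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes one code point from units[0..len) and
   stores the number of code units consumed (1 or 2) in *used.
   Unpaired surrogates yield U+FFFD, the replacement character. */
uint32_t utf16_decode(const uint16_t *units, size_t len, size_t *used)
{
    assert(units != NULL && used != NULL && len > 0);
    uint16_t first = units[0];
    *used = 1;
    if (first >= 0xD800 && first <= 0xDBFF) {          /* high surrogate */
        if (len >= 2 && units[1] >= 0xDC00 && units[1] <= 0xDFFF) {
            *used = 2;
            return 0x10000u
                 + (((uint32_t)first - 0xD800u) << 10)
                 + ((uint32_t)units[1] - 0xDC00u);
        }
        return 0xFFFDu;                                /* missing low half */
    }
    if (first >= 0xDC00 && first <= 0xDFFF)
        return 0xFFFDu;                                /* stray low surrogate */
    return first;                                      /* BMP: one unit */
}
```

    With the pair D83D DE00 this returns U+1F600 and consumes two units, which is why wcslen on Windows can report a larger number than the count of visible characters.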

    http://msdn.microsoft.com/en-us/library/t9zea13t.aspx
    http://msdn.microsoft.com/en-us/library/f0151s4x.aspx
    http://msdn.microsoft.com/en-us/library/0heszx3w.aspx

    The MS-CRT does not support UTF8. It only supports codepage encoded char (8bit) strings.

    gg
