Thread: UTF-16 and sprintf

  1. #1
    Registered User
    Join Date
    Jan 2013
    Posts
    7

    Question: UTF-16 and sprintf

    Hi,

    I have a "template" string into which I insert values, typically XML content.
    For instance:
    Code:
    static const char *template = "<value>%s</value>";
    const char *value = "blabla";
    char buf[256];
    int ret = sprintf(buf, template, value);
    This has been working fine, but now all of a sudden I need to insert a value that is encoded in UTF-16BE. So my value string "Hello", for instance, looks something like this in hex: 00 48 00 65 00 6c 00 6c 00 6f

    It appears that the 00 is stopping the string from being read any further. Maybe this is the internal \0 terminator? I say this because, for test purposes, I tried little endian (UTF-16LE) and I was able to get the 'H' but then nothing after that.
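    A small test that seems to reproduce it (the byte values here are just for illustration):
    Code:
    #include <stdio.h>

    int main(void)
    {
        /* "Hello" in UTF-16BE: every other byte is 0x00 */
        const char utf16be[] = { 0x00, 'H', 0x00, 'e', 0x00, 'l', 0x00, 'l', 0x00, 'o', 0x00, 0x00 };
        char buf[64];

        /* %s stops copying at the first 0x00 byte (here the very first byte),
           so buf ends up holding just "<value></value>" */
        int ret = sprintf(buf, "<value>%s</value>", utf16be);
        printf("ret=%d buf=%s\n", ret, buf);
        return 0;
    }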

    As a Java developer trying to fix an old piece of C code, I'm a bit confused as to how to handle this.

    Any help would be appreciated, thanks!

  2. #2
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    You will have to convert the UTF-16BE into the encoding that the XML expects, e.g. UTF-8: <?xml version="1.0" encoding="UTF-8"?>

    You could do it yourself or use library APIs to do it for you. What OS(s) does this need to run on?

    gg

  3. #3
    Registered User
    Join Date
    Jan 2013
    Posts
    7
    Unfortunately, UTF-16 is a requirement, so I'll be sending it out with encoding="UTF-16".
    It's running on Linux, but I'm sending it out as the body of an HTTP request, if that changes anything...

    I am not sure if that means I can put a UTF-16 value in a UTF-8 encoded XML document, or whether the entire document and value(s) must be UTF-16. The problem is that I can't get either one to work, as it's ignoring my UTF-16 encoded values.

  4. #4
    Registered User
    Join Date
    Apr 2013
    Posts
    1,658
    For Microsoft compilers, _tprintf(), _stprintf(), and _TCHAR will be ASCII or 16-bit Unicode depending on compile options. wprintf(), swprintf(), and wchar_t use 16-bit Unicode characters on Windows. A wide-character literal is prefixed with L: L'a' or L"example".
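    A rough sketch of the wide-character version, for what it's worth (tmpl/value are just placeholder names, and note that outside of Windows wchar_t is often 32-bit, not 16):
    Code:
    #include <wchar.h>

    int main(void)
    {
        /* wide template and value; %ls expects a wchar_t string */
        const wchar_t *tmpl  = L"<value>%ls</value>";
        const wchar_t *value = L"Hello";
        wchar_t buf[256];

        swprintf(buf, sizeof buf / sizeof buf[0], tmpl, value);
        wprintf(L"%ls\n", buf);
        return 0;
    }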
    Last edited by rcgldr; 09-17-2013 at 03:38 PM.

  5. #5
    Registered User
    Join Date
    Jan 2013
    Posts
    7
    Thanks rcgldr. Someone previously pointed me to swprintf, but it seems this only works when both my "template" and my "value" are wide chars. I will make a duplicate wide version of the template if I have to, I guess, but it would be great to figure out how to get the wide chars into my ASCII/UTF-8 template. I'm thinking at this point I'll just write my own routine to copy the bytes...

  6. #6
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    The size and encoding of wchar_t is implementation-defined, so you should stay away from it if you're not on Windows (where it is well-defined as UTF-16LE). If you must deal with UTF-16 code points, just use a type that you know is 2 bytes in size, and forget about sprintf or swprintf. Also, if you know the code points are in the machine's native byte order, you can use htons() to ensure they are in network byte order (BE) before transport.
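    For example, a quick sketch of that byte-order step, assuming the code units are held in a uint16_t array (utf16_to_be is just a made-up name):
    Code:
    #include <arpa/inet.h>   /* htons() */
    #include <stdint.h>
    #include <stddef.h>

    /* Convert UTF-16 code units from host byte order to big-endian
       (network byte order) in place. 'len' is the number of 16-bit
       units, not bytes. */
    static void utf16_to_be(uint16_t *units, size_t len)
    {
        for (size_t i = 0; i < len; i++)
            units[i] = htons(units[i]);
    }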

    >> Unfortunately, UTF-16 is a requirement, so I'll be sending it out with encoding="UTF-16"
    Why is UTF-16, as the transport encoding, a requirement? Is that the only encoding the "receiver" supports? UTF-8 would be much easier to deal with.

    So your HTTP header contains "charset = csUnicode"?
    http://www.w3.org/Protocols/rfc2616/...c3.html#sec3.4
    http://www.freesoft.org/CIE/RFC/1700/20.htm

    On Linux, you may find the iconv library useful for performing any conversions.
    http://linux.die.net/man/3/iconv_open
    Sample: http://www.gnu.org/software/libc/man...-Examples.html
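    Off the top of my head, a bare-bones use of iconv to turn a UTF-16BE value into UTF-8 would look something like this (error handling mostly omitted, just a sketch):
    Code:
    #include <iconv.h>
    #include <stdio.h>

    int main(void)
    {
        /* "Hello" as raw UTF-16BE bytes (10 bytes) */
        char value16[] = { 0x00, 'H', 0x00, 'e', 0x00, 'l', 0x00, 'l', 0x00, 'o' };
        char value8[64];
        char *inp = value16, *outp = value8;
        size_t inleft = sizeof value16, outleft = sizeof value8;

        iconv_t cd = iconv_open("UTF-8", "UTF-16BE");   /* to-encoding, from-encoding */
        if (cd == (iconv_t)-1) { perror("iconv_open"); return 1; }

        if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
            perror("iconv");
        iconv_close(cd);

        /* print the converted UTF-8 bytes */
        printf("%.*s\n", (int)(sizeof value8 - outleft), value8);
        return 0;
    }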

    gg

  7. #7
    Registered User
    Join Date
    Jan 2013
    Posts
    7
    Ah, good information. I was just working on a possible wchar_t solution...
    I am using iconv for conversion already; the real issue I am having is merging the template with the value.
    The HTTP header will contain whatever the XML encoding is set to. UTF-16BE seems to be the only thing that will work for my requirements, so both the Content-Type header and the XML encoding value will be set to that, and I will encode the full XML doc in UTF-16BE.

    At the end of the day, I have a string literal template, and I have a UTF-16BE char*. I need to insert the char* into the template where the %s is, and the resulting string needs to be in UTF-16BE encoding.
    I can re-encode any of the strings, but the only way I know how to insert easily into the template is sprintf, and that doesn't seem to work for wide chars. So I was going to make my template use wide chars and try swprintf, but now you say that's a bit sketchy on Linux. I might just have to manipulate the bytes myself at this point. I just can't believe there isn't an easy way to do this in C... so painful...

  8. #8
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    If you "normalize" your template and input strings to UTF-8 first, then you can use sprintf() to build things up as UTF-8. Then, just before transport, convert the UTF-8 into UTF-16BE.
    In other words, use a consistent "internal encoding" that's easy to work with. Once the data is ready to become "external", convert as needed.
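    Something along these lines, just as a sketch of the idea (iconv error handling trimmed, and tmpl/doc8/doc16 are only placeholder names):
    Code:
    #include <iconv.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* 1) Build the document in UTF-8 with plain sprintf.
              (Any UTF-16BE input values would be converted to UTF-8
              first, e.g. with iconv, before being inserted here.) */
        static const char *tmpl = "<value>%s</value>";
        char doc8[256];
        int len8 = sprintf(doc8, tmpl, "Hello");

        /* 2) Just before transport, convert the whole document to UTF-16BE. */
        char doc16[512];
        char *inp = doc8, *outp = doc16;
        size_t inleft = (size_t)len8, outleft = sizeof doc16;

        iconv_t cd = iconv_open("UTF-16BE", "UTF-8");
        if (cd == (iconv_t)-1) { perror("iconv_open"); return 1; }
        if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
            perror("iconv");
        iconv_close(cd);

        size_t len16 = sizeof doc16 - outleft;   /* bytes to send in the HTTP body */
        printf("UTF-8: %d bytes, UTF-16BE: %zu bytes\n", len8, len16);
        return 0;
    }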

    gg

  9. #9
    Registered User
    Join Date
    Jan 2013
    Posts
    7
    Yes, that's the same conclusion I just came to. Unfortunately it took me two days instead of coming to you right away. Thanks! That will work, I think...
