Thread: cast unsigned char* to (the default) signed char*

  1. #16
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Quote Originally Posted by brewbuck
    Does the following link help?
    Absolutely. Thanks. I think I can finally take it from here. I'll try to build my own just for fun, and in the process better understand this concept of traits. However, if deriving from char_traits<char> doesn't cause any problems after enough testing, that will be the final solution.

    Most excellent. Thanks again.
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  2. #17
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Deriving from char_traits<char> doesn't sound like a good idea.

    Honestly, in my opinion, C++'s whole character handling is broken beyond repair. That's not much of a consolation to you, of course.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  3. #18
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by CornedBee View Post
    Deriving from char_traits<char> doesn't sound like a good idea.
    I can't think of a reason why not.

  4. #19
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Because the members inherited have different signatures than those needed for char_traits<unsigned char>.
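    For illustration, here's a minimal sketch of what a standalone traits class for unsigned char has to provide (the name uchar_traits and the memcmp-based implementations are just this example's choices). Note how every member that handles characters must take unsigned char, which is exactly the signature the members inherited from char_traits<char> don't have:

    #include <cstddef>  // std::size_t
    #include <cstring>  // std::memcmp, std::memcpy, std::memmove, std::memchr, std::memset
    #include <cwchar>   // std::mbstate_t
    #include <ios>      // std::streamoff, std::streampos
    #include <string>   // std::basic_string

    struct uchar_traits
    {
        typedef unsigned char  char_type;
        typedef int            int_type;
        typedef std::streamoff off_type;
        typedef std::streampos pos_type;
        typedef std::mbstate_t state_type;

        static void assign(char_type& c1, const char_type& c2) { c1 = c2; }
        static bool eq(char_type a, char_type b) { return a == b; }
        static bool lt(char_type a, char_type b) { return a < b; }

        static int compare(const char_type* a, const char_type* b, std::size_t n)
        { return std::memcmp(a, b, n); }

        static std::size_t length(const char_type* s)
        { std::size_t n = 0; while (s[n]) ++n; return n; }

        static const char_type* find(const char_type* s, std::size_t n, const char_type& c)
        { return static_cast<const char_type*>(std::memchr(s, c, n)); }

        static char_type* move(char_type* d, const char_type* s, std::size_t n)
        { return static_cast<char_type*>(std::memmove(d, s, n)); }

        static char_type* copy(char_type* d, const char_type* s, std::size_t n)
        { return static_cast<char_type*>(std::memcpy(d, s, n)); }

        static char_type* assign(char_type* s, std::size_t n, char_type c)
        { return static_cast<char_type*>(std::memset(s, c, n)); }

        // unsigned char promotes to int (0-255) without ever colliding with eof()
        static int_type to_int_type(char_type c) { return c; }
        static char_type to_char_type(int_type i) { return static_cast<char_type>(i); }
        static bool eq_int_type(int_type a, int_type b) { return a == b; }
        static int_type eof() { return -1; }
        static int_type not_eof(int_type i) { return i == eof() ? 0 : i; }
    };

    // Usable as: std::basic_string<unsigned char, uchar_traits>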
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  5. #20
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by CornedBee View Post
    Because the members inherited have different signatures than those needed for char_traits<unsigned char>.
    Hrm. Yeah. That sure is stupid. Methods in a "traits" class?

  6. #21
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Static methods, yes. That's what the traits class is for.

    OK, so by modern terminology it's a policy class.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  7. #22
    Registered User
    Join Date
    May 2003
    Posts
    1,619
    What exactly are you trying to DO with the string?

    If you're using Win32, I recommend you use the conversion function (MultiByteToWideChar) to get a UTF-16 string; the WinAPI can use UTF-16 (it's the native character set on anything Win2K and beyond).
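    For example, a minimal sketch of that conversion (the function name is just for this example, and error handling is kept to the bare minimum):

    #include <windows.h>
    #include <string>
    #include <vector>

    // Null-terminated UTF-8 in, std::wstring (UTF-16) out.
    std::wstring utf8_to_utf16(const char* utf8)
    {
        // First call: ask how many wide characters the result needs.
        // Because we pass -1, the count includes the terminating NUL.
        int len = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);
        if (len <= 0)
            return std::wstring();

        std::vector<wchar_t> buf(len);
        MultiByteToWideChar(CP_UTF8, 0, utf8, -1, &buf[0], len);
        return std::wstring(&buf[0]);  // drops the trailing NUL
    }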
    You ever try a pink golf ball, Wally? Why, the wind shear on a pink ball alone can take the head clean off a 90 pound midget at 300 yards.

  8. #23
    Registered User
    Join Date
    May 2003
    Posts
    1,619
    Oh, and you absolutely can't just cast a UTF-8 string and expect it to properly display any characters outside of the ASCII values (0-127).

    Every character not in that range occupies a minimum of two bytes, and can be as many as 4 bytes. Any byte value between 0x80 and 0xFF (that is, any byte with the MSB set) is part of a two, three, or four byte character.
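    In code, those byte ranges give a simple classification (a sketch; the function name is made up for this example):

    // Length of the UTF-8 sequence started by byte b, or 0 if b cannot
    // start one (continuation bytes, and values that are invalid in UTF-8).
    int utf8_sequence_length(unsigned char b)
    {
        if (b < 0x80) return 1;  // 0x00-0x7F: plain ASCII
        if (b < 0xC0) return 0;  // 0x80-0xBF: continuation byte
        if (b < 0xE0) return 2;  // 0xC0-0xDF: lead byte of a 2-byte character
        if (b < 0xF0) return 3;  // 0xE0-0xEF: lead byte of a 3-byte character
        if (b < 0xF8) return 4;  // 0xF0-0xF7: lead byte of a 4-byte character
        return 0;                // 0xF8-0xFF: never valid in UTF-8
    }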

    Your best bet is to write an actual Unicode app; the Win32 API supports Unicode fully. Use UTF-16 (aka WCHAR, LPWCSTR, etc.) internally within your program, as that's the native character set of Windows and the character set it will accept all strings in, and then convert to and from UTF-8 when accessing the database.
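    The database-boundary direction goes through WideCharToMultiByte; a minimal sketch (again, the function name is illustrative and error handling minimal):

    #include <windows.h>
    #include <string>
    #include <vector>

    // std::wstring (UTF-16) in, null-terminated UTF-8 out.
    std::string utf16_to_utf8(const wchar_t* utf16)
    {
        int len = WideCharToMultiByte(CP_UTF8, 0, utf16, -1, NULL, 0, NULL, NULL);
        if (len <= 0)
            return std::string();

        std::vector<char> buf(len);
        WideCharToMultiByte(CP_UTF8, 0, utf16, -1, &buf[0], len, NULL, NULL);
        return std::string(&buf[0]);  // drops the trailing NUL
    }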

    2. What if data loss becomes a concern? How can I handle this? The problem is that the character 'ó', for instance, is well within the capabilities of a signed char.
    Maybe. Maybe not. It certainly wouldn't work on my machine; it would not only display that character incorrectly, it would also "eat" the following character, assuming it to be the second byte of a multibyte Japanese character.

    It all depends on what character set the end user has set to default on their computer. Are they using Latin-1? Windows-1251? Windows-1252? Shift-JIS? Big5? EUC-KR? Unless you're specifically overriding the default character encoding, how Windows treats a char* string depends entirely on the language options the end user has set in their Control Panel. Some of those won't even HAVE the letter 'ó', and others will have it but encode it with a different value. Don't make assumptions that the user of your program has the same default code page you do.

    That's the beauty of Unicode, and why you should not "work around" it, but rather embrace it. Unicode is universal, it will work on my machine (default code page: Shift-JIS) just as easily as it works on yours.
    Last edited by Cat; 07-27-2007 at 03:32 AM.
    You ever try a pink golf ball, Wally? Why, the wind shear on a pink ball alone can take the head clean off a 90 pound midget at 300 yards.

  9. #24
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Probably that's my best bet. As you say, I need to embrace Unicode. And I'm sure that's the best advice one can give.
    I'm not prepared or willing to do it for now, though. I have other, more pressing concerns, like learning C++. And at my age, I no longer find it easy to multitask. I prefer the one-thing-at-a-time approach.

    Anyway, given that there's no one-size-fits-all solution to this issue, I decided to go for the next best thing; that is, lower my requirements to the bare essentials. I promise I'll sooner or later start to become interested in Unicode under C++. But until then...

    A recap:

    - SQLite's text-extraction functions (there are actually two, but I'm only interested in the UTF-8 one) return a null-terminated const unsigned char*.
    - I need to translate this return value into a std::string.
    - My system defines char as signed char and doesn't implement basic_string<unsigned char>.
    - The code wants to be portable across systems.
    - The database is expected to contain only characters in the 0-127 ASCII range. Conversion seems painless in this case. But...
    - I need to deal with those cases where, exceptionally, it may not.
    - The resulting std::string is expected to be used in every context where basic_string<char> is normally used: streams, for instance.
    - Programming for Unicode is not an option at this moment.

    Solution 1.
    reinterpret_cast<const char*>()

    Solution 2.
    typedef std::basic_string<unsigned char> utf8_string;

    Solution 3.
    a class implementing:
    . for each char in the utf8 string
    . . boost::numeric_cast<char>()

    Solution 1 is my current solution, and the one I wish to not use. Solution 2 is appealing, but seems to be more trouble than it's worth and not possible to implement with streams. Or is it? Solution 3 is sexy but slow (in some contexts outside C++, that's actually a good thing).
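    For the record, a minimal sketch of what Solution 3 boils down to (the function name is made up for the example; it assumes Boost's numeric conversion header):

    #include <string>
    #include <boost/numeric/conversion/cast.hpp>

    // Convert byte by byte; boost::numeric_cast throws
    // boost::numeric::positive_overflow for any byte >= 0x80
    // on a platform where char is signed.
    std::string to_std_string(const unsigned char* s)
    {
        std::string result;
        for (; *s; ++s)
            result += boost::numeric_cast<char>(*s);
        return result;
    }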

    What would you do?
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  10. #25
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Implement a converter that does some minimal UTF-8 parsing: it should simply replace every multi-byte UTF-8 sequence (i.e. every non-ASCII character) with a question mark.
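    A minimal sketch of such a converter, assuming well-formed UTF-8 input (the function name is just for the example):

    #include <string>

    // Copy ASCII bytes through; collapse each multi-byte UTF-8 sequence
    // into a single '?'. Continuation bytes are skipped, so a 2-4 byte
    // character produces exactly one '?'.
    std::string utf8_to_ascii(const unsigned char* s)
    {
        std::string result;
        for (; *s; ++s)
        {
            if (*s < 0x80)
                result += static_cast<char>(*s);  // plain ASCII byte
            else if (*s >= 0xC0)
                result += '?';                    // lead byte of a multi-byte character
            // bytes 0x80-0xBF are continuation bytes: skip them
        }
        return result;
    }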
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law
