Thread: How to convert string with double high/wide characters to normal string [VC++6]

  1. #1
    Registered User
    Join Date
    Aug 2009
    Posts
    5

    My application typically receives a string in the following format:
    " Item $5.69 "

    Some constants I always expect:
    - the LENGTH is always 20 characters
    - the start index of the text is always [5]
    - and most importantly, the index of the DECIMAL for the price is always [14]
    In order to identify this string correctly I validate all the expected constants listed above ....

    Some of my clients have now started sending the string with Double-High / Double-Wide values (pairs of characters which each represent a single readable character) similar to the following:
    " Item $x80x90.x81x91x82x92 "

    For testing I simply scan the string character-by-character, compare char[i] and char[i+1] and replace these pairs with their corresponding single character when a match is found (works fine) as follows:

    Code:
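    // Replace each known two-byte pair with its single character, in place.
    // Assumes sData is a std::string. The bound i + 1 < length() keeps
    // sData[i+1] inside the string on the last iteration.
    for (std::string::size_type i = 0; i + 1 < sData.length(); i++)
    {
       unsigned char ch  = static_cast<unsigned char>(sData[i]);
       unsigned char ch2 = static_cast<unsigned char>(sData[i+1]);

       // std::string::replace takes (position, count, replacement text)
       if (ch == 0x80 && ch2 == 0x90)
          sData.replace(i, 2, "0");
       else if (ch == 0x81 && ch2 == 0x91)
          sData.replace(i, 2, "1");
       else if (ch == 0x82 && ch2 == 0x92)
          sData.replace(i, 2, "2");
       ...
       ...
       ...
    }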
    But the result is something like this:
    " Item $5.69 "
    Notice how this no longer matches my expectation: the length is now 17 (instead of 20) due to the 3 conversions, and the decimal is now at index 13 (instead of 14) due to the conversion of the "5" before the decimal point.


    Ideally I would like to convert the string to a normal readable format while keeping the constants (length, index of text, index of decimal) in the same place, so the rest of my application stays reusable ... or any other suggestion (I'm pretty much stuck with this)... Is there a STANDARD way of dealing with this type of character?
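
    One possible way to meet the requirement above, sketched under assumptions rather than tested against real TSNT data: decode the pairs first, then shift the price right until the decimal point is back at index 14 and pad the result to 20 characters. `realignPrice` is a hypothetical helper name, and the sketch assumes the price starts with '$' and sits to the right of the item text.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of any library): takes a string whose
// two-byte pairs have ALREADY been decoded and restores the fixed layout.
// Assumption: the price starts with '$' and contains the last '.' in the line.
std::string realignPrice(const std::string& decoded,
                         std::size_t dotIndex,   // e.g. 14
                         std::size_t totalLen)   // e.g. 20
{
    std::string out = decoded;

    // Locate the decimal point of the price (last '.' in the string).
    const std::string::size_type dot = out.rfind('.');
    if (dot != std::string::npos && dot < dotIndex) {
        // Find the '$' that introduces the price and push it right,
        // so the text to its left keeps its original start index.
        const std::string::size_type dollar = out.rfind('$', dot);
        if (dollar != std::string::npos)
            out.insert(dollar, dotIndex - dot, ' ');
    }

    // Pad with spaces (or truncate) to the expected fixed length.
    out.resize(totalLen, ' ');
    return out;
}
```

    With a 17-character decoded line whose decimal sits at index 13, `realignPrice(line, 14, 20)` inserts one space before the '$' and pads the end, so the existing length and index checks see the layout they expect.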

    Any help would be greatly appreciated, I've been stuck on this for a while now ...
    Thanks,

  2. #2
    Algorithm Dissector iMalc
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    I see several issues:
    Your clients are somehow sending you what is presumably UTF-8 data when you were expecting plain old ASCII. How does that occur, i.e. where does this data come from such that you are not in control of it? Not from your own application, I take it?

    If your code makes certain assumptions about exactly which bits are where in the string, and these assumptions are being violated, then have you not been validating those assumptions and sanitising the input? Can you simply return an error and force the client to correct what they're sending?

    Your code that does replacements is brittle. It should probably pass the data through a UTF8 to ASCII converter (if that's what this data actually is).

    You have a problem with the fixed length of the string being different afterwards. This seems to me more like a problem of not receiving the entire string. You're probably accepting 20 bytes somewhere and then converting them, when you should be accepting 20 characters (yes, there's a difference), with any necessary conversion happening within the function that receives the data. In other words, you are streamed some number of bytes, and you keep receiving until you have read 20 characters from that data stream or you run out of data. Presumably the data is not being truncated on the client end.
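
    A minimal sketch of the byte/character distinction, assuming a "character" is either one ordinary byte or one of the three two-byte pairs quoted in the first post (0x80 0x90, 0x81 0x91, 0x82 0x92). The names `isKnownPair` and `bytesForChars` are made up for illustration, and the real set of pairs would have to come from the sending system.

```cpp
#include <cstddef>
#include <string>

// Assumption: only the three pairs shown in the thread are recognised,
// i.e. a byte 0x80..0x82 followed by that same value plus 0x10.
bool isKnownPair(unsigned char a, unsigned char b)
{
    return a >= 0x80 && a <= 0x82 && b == a + 0x10;
}

// Returns how many BYTES of buf make up its first `wanted` CHARACTERS,
// or std::string::npos if buf does not yet hold that many characters.
// Note: a lone first byte at the very end of buf is counted as one
// character, so a real receiver should read more data before deciding.
std::size_t bytesForChars(const std::string& buf, std::size_t wanted)
{
    std::size_t i = 0;      // byte position
    std::size_t chars = 0;  // characters consumed so far
    while (chars < wanted) {
        if (i >= buf.length())
            return std::string::npos;   // ran out of data
        if (i + 1 < buf.length() &&
            isKnownPair(static_cast<unsigned char>(buf[i]),
                        static_cast<unsigned char>(buf[i + 1])))
            i += 2;                     // a pair is one character
        else
            i += 1;                     // an ordinary single byte
        ++chars;
    }
    return i;
}
```

    A receive loop could keep appending incoming bytes to a buffer until `bytesForChars(buffer, 20)` stops returning `std::string::npos`, and only then hand the line to the conversion step.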

  3. #3
    Registered User
    Join Date
    Aug 2009
    Posts
    5

    First point is what scares me the most, presumably UTF8 data (so far people have not recognized the encoding as Unicode, which is what I had assumed at first - maybe it is some form of print data - no clue yet).

    Correct, we get the data from TSNT which provides it from the customer back-end application running who-knows-what.

    As for the assumption, it is not bytes but characters - we check indexes in the string sData[x] for expected characters.

    As for the UTF8 to ASCII converter, you mean something like WideCharToMultiByte?

  4. #4
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,656
    > which represent a single readable character) similar to the following:
    " Item $x80x90.x81x91x82x92 "
    How similar?
    Like do you have actual examples to show us?

    Because by themselves, those examples are neither wide characters, Unicode, nor UTF-8 encoded.

    > Some of my clients have now started sending the string with Double-High / Double-Wide values
    Some archaic point-of-sale kit perhaps?
    Originally, these devices would have plugged into simple printers (and not your s/w). Such simple formatting would have made sense at that time.

    It seems unlikely there will be anything 'standard' about this.

    Dig out the reference manuals for the kit you're interfacing to, and then write a more general purpose decode module that can deal with whatever different data streams it can produce.
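
    As a rough illustration of what a more general decode module might look like, here is a table-driven sketch. It lists only the three pairs quoted in the original post; every other entry is deliberately left out and would have to come from the device's reference manual. `PairEntry`, `kPairs`, `lookupPair` and `decodePairs` are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// One table row per two-byte sequence and the plain character it stands for.
struct PairEntry {
    unsigned char first;
    unsigned char second;
    char plain;
};

// Only the pairs shown in the thread; extend from the device manual.
static const PairEntry kPairs[] = {
    { 0x80, 0x90, '0' },
    { 0x81, 0x91, '1' },
    { 0x82, 0x92, '2' }
};
static const std::size_t kPairCount = sizeof(kPairs) / sizeof(kPairs[0]);

// Returns the plain character for a known pair, or 0 if the pair is unknown.
char lookupPair(unsigned char a, unsigned char b)
{
    for (std::size_t k = 0; k < kPairCount; ++k)
        if (kPairs[k].first == a && kPairs[k].second == b)
            return kPairs[k].plain;
    return 0;
}

// Builds a new string in which every known pair is replaced by its plain
// character; all other bytes are copied through unchanged.
std::string decodePairs(const std::string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.length(); ++i) {
        char plain = 0;
        if (i + 1 < in.length())
            plain = lookupPair(static_cast<unsigned char>(in[i]),
                               static_cast<unsigned char>(in[i + 1]));
        if (plain != 0) {
            out += plain;
            ++i;                // skip the second byte of the pair
        } else {
            out += in[i];
        }
    }
    return out;
}
```

    Keeping the mapping in a table means a new device or print mode only needs new rows, not another chain of if/else branches.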
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  5. #5
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,656
    How to convert string with double high/wide characters to normal string [VC++6] - Dev Shed
    Another crappy cross-poster!

    Damn, I knew that as soon as I saw the "linkback" in the corner - what a waste of time....

  6. #6
    Registered User
    Join Date
    Aug 2009
    Posts
    5
    Salem: Sorry for cross-posting, I wasn't getting much feedback in some forums so decided to "branch out" - I am sorry for breaking the rules ...

    And you are dead-on, this does come from a Point-Of-Sale application (QVS TSNT). I had also assumed it might not be Unicode (but print data); however, most people seem to have recognized it as Unicode, and I still have my doubts. I have the mapping already of what represents what (sample in the code tags of the original posting and full map from my posting on DevShed)... and so far I've not been able to come up with any solution that actually works ...

    Anyways, thanks for responding even with my rule-breaking - it was much appreciated, and sorry for causing trouble.
    Thanks,
