Thread: "Unescaping" certain characters from binary file

  1. #1
    Registered User MartinR's Avatar
    Join Date
    Dec 2013
    Posts
    200

    "Unescaping" certain characters from binary file

    Hello,

    I have a binary file in which some bytes were escaped according to the following rule:
    Code:
    if (byte == XX || byte == YY || ...)
    {
    temp = byte;
    byte = 0x7d; 
    byte+1 = tmp ^ 0x20;
    }
    So when I receive such a file in my program I try to decipher it by this for loop:
    Code:
    char escaped_file[file_len];
    char original_data[file_len];
    for (i = 0, j = 0; i < file_len; i++, j++)
    {
        if (escaped_file[i] == 0x7d) {
            /* escape marker: the next byte is the original XOR 0x20 */
            original_data[j] = escaped_file[++i] ^ 0x20;
        } else {
            original_data[j] = escaped_file[i];
        }
    }
    Now this works perfectly for text files. However, when I do the same with binary files something goes wrong, as the md5sum doesn't match. I have no idea why that is, do you? Thanks for any suggestions.

  2. #2
    Registered User Sir Galahad's Avatar
    Join Date
    Nov 2016
    Location
    The Round Table
    Posts
    277
    And you're escaping this special value 0x7d as well, I assume? If not, try that. Otherwise, maybe post the code for the encoder. Also, your decoding loop has a possible buffer-overflow bug: you increment `i` when the escape value is seen, but you forget to check that the new index is in bounds.
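
    A minimal sketch of the bounds check described above, assuming the escape rule from the first post (0x7d followed by the original byte XOR 0x20). The function name `unescape` and its error convention are hypothetical, not taken from the poster's stub. It also returns the decoded length, since the output is one byte shorter than the input for every escape pair.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the thread): undo the 0x7d escaping.
 * Returns the number of decoded bytes, or (size_t)-1 if the input ends
 * with a dangling escape marker. `out` must hold at least `in_len` bytes. */
size_t unescape(const unsigned char *in, size_t in_len, unsigned char *out)
{
    size_t i, j = 0;

    assert(in != NULL && out != NULL);
    for (i = 0; i < in_len; i++) {
        if (in[i] == 0x7d) {
            if (i + 1 >= in_len)
                return (size_t)-1;      /* marker with nothing after it */
            out[j++] = in[++i] ^ 0x20;  /* restore the original byte */
        } else {
            out[j++] = in[i];
        }
    }
    return j;
}
```

    Using `unsigned char` and an explicit length keeps the loop independent of the platform's `char` signedness and of any zero bytes in the data.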

  3. #3
    Registered User
    Join Date
    May 2010
    Posts
    4,633
    However, when I do the same with binary files something goes wrong, as the md5sum doesn't match.

    Could the problem be due to different line endings?

    But really, you're not showing nearly enough content. For example, a small sample of your input file, how you're computing the md5sum, etc.

  4. #4
    Registered User MartinR's Avatar
    Join Date
    Dec 2013
    Posts
    200
    Let me give you some more information then. I am writing a stub for GDB: it parses the commands (packets) that GDB sends and responds accordingly. This particular problem is with GDB's "remote put host_file target_file", which transfers host_file to the target machine. As I said in the very first post, the transfer of TEXT files works FINE. For binary files I thought the problem was in those "escaped characters", and that maybe I "unescape" some that should not be, or something like that. However, today I disabled the unescaping procedure and checked whether the md5sum of the transferred data is constant, and as it turned out it isn't! Every time I send the file it has a different md5sum! This is strange, as GDB sends the file in XX small chunks, each along with a one-byte checksum, which I of course verify at my end, proceeding only if it matches the host's. What do you think may cause such an issue? oO

    As for how I calculate the md5sum: I just use the md5sum program; it's from GNU coreutils, I believe.
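
    For reference, a sketch of the per-packet check mentioned above. As far as I understand the GDB remote serial protocol, a packet looks like `$payload#xx`, where `xx` is the modulo-256 sum of the payload bytes as transmitted, written as two hex digits. The helper name `packet_checksum` is hypothetical. Note that a matching checksum only shows the packet arrived intact; it says nothing about what the stub does with the buffer afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: modulo-256 sum of the payload bytes
 * (everything between '$' and '#', exclusive). */
unsigned char packet_checksum(const unsigned char *payload, size_t len)
{
    unsigned int sum = 0;
    size_t i;

    assert(payload != NULL || len == 0);
    for (i = 0; i < len; i++)
        sum += payload[i];              /* zero bytes are summed like any other */
    return (unsigned char)(sum & 0xff);
}
```

    For example, the payload "OK" sums to 0x4f + 0x4b = 0x9a, which matches the familiar `$OK#9a` reply.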

  5. #5
    Registered User
    Join Date
    May 2010
    Posts
    4,633
    What operating system are you using?

    What parameters are you using with md5sum?

    What is md5sum returning?

  6. #6
    Registered User MartinR's Avatar
    Join Date
    Dec 2013
    Posts
    200
    Quote Originally Posted by jimblumberg View Post
    What operating system are you using?

    What parameters are you using with md5sum?

    What is md5sum returning?
    Ubuntu 16.04. As for parameters, of course I use the original file and the one transferred over GDB; what else could I use?

    And what is md5sum returning? The md5 sum of each file, of course; again, what else could it do? Btw, where are you heading?

    I also noticed that sometimes these md5sums match! Sometimes they don't but are persistent, and at yet other times the md5sum changes each time I send the file oO

  7. #7
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    Quote Originally Posted by MartinR View Post
    I have a binary file in which some bytes were escaped according to the following rule:
    Code:
    if (byte == XX || byte == YY || ...)
    {
    temp = byte;
    byte = 0x7d; 
    byte+1 = tmp ^ 0x20;
    }
    Is 'byte' a pointer? 'byte+1' isn't an lvalue and will cause a compilation error... and you are using 'tmp' instead of 'temp', did you notice?
    What is this code supposed to do?

  8. #8
    Registered User MartinR's Avatar
    Join Date
    Dec 2013
    Posts
    200
    @flp1969, yes, you are right, there are bugs, but I guess you realized this is pseudocode just to illustrate how the encoder works.

    Anyway, I have fixed the bug. It happened because GDB sometimes sends an ack together with the data packet itself. This is a very rare case, and to handle it I had:

    Code:
    strcpy(packet, packet+1);
    to remove the preceding + sign. I don't know why I wrote it this way, as a simple:

    Code:
    packet = packet+1;
    is both faster and safer.
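
    Two facts about `strcpy` fit the symptoms here: calling it with overlapping source and destination is undefined behavior in C, and it stops at the first zero byte, which binary payloads can contain. If the buffer has to stay at the same address, a length-based in-place shift is one option. This is only a sketch; the name `strip_ack` and the explicit length parameter are assumptions, not the poster's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: drop a leading '+' ack from a received buffer
 * in place and return the new length. */
size_t strip_ack(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len > 0 && buf[0] == '+') {
        memmove(buf, buf + 1, len - 1); /* memmove permits overlapping regions */
        return len - 1;
    }
    return len;
}
```

    Advancing the pointer and decrementing the length, as in the post, avoids the copy entirely when nothing else depends on the original buffer address.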

    Thanks to everybody involved in the discussion.
