Thread: Help !! Any advice for reverse engineering an encoded network stream ??

  1. #1
    Registered User
    Join Date
    Feb 2010
    Posts
    8

    Help !! Any advice for reverse engineering an encoded network stream ??

    Hi

    I work for A.Big.Bank and we use a third-party application (the Gateway) for consuming stock price messages from A.Big.Stock.Exchange. The Gateway connects to a remote system using BSD sockets over TCP and receives messages; it then processes them and sends them on to downstream subscribers. We believe the Gateway is badly written and is taking a long time to process the messages it receives. The messages it sends to the downstream clients are purely ASCII and I can tell exactly what they contain. However, the ones sent from the Exchange to the Gateway over that TCP connection are encoded (though they are not compressed or encrypted, since the information they contain is publicly available, for example via Yahoo Finance, albeit delayed by 15 minutes). The messages are a mixture of binary and ASCII fields. This is how they appear in a libpcap capture piped through less:

    Code:
    ^@^@^@\^@^@^@<CC>^@^@^@^D^@^@^@^C^@?^@^@<FF><FF><FF><FF>K<90><CA><86>AOAKZ^@^@^@\^A^B^AAOAKZ100604200P^
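
    Reading the capture through less makes it hard to count bytes, because each non-printable byte becomes ^@ or <CC> notation. A minimal sketch of a hex-dump row formatter follows, so offsets and field widths can be read directly. The names hex_row and HEX_ROW_SIZE are made up for this example, and the row layout is just one common convention.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

#define HEX_ROW_SIZE 80 /* enough for offset + 16 hex bytes + 16 ASCII chars */

/* Format one row (at most 16 bytes) of a hex dump into out, which must
   hold at least HEX_ROW_SIZE chars: offset, hex bytes, then the same
   bytes as ASCII with '.' standing in for anything non-printable. */
void hex_row(char *out, unsigned long offset,
             const unsigned char *buf, size_t n)
{
    size_t i;
    int pos;

    if (n > 16)
        n = 16;
    /* Offset is masked to 32 bits so the row width stays fixed. */
    pos = sprintf(out, "%08lx ", offset & 0xffffffffUL);
    for (i = 0; i < 16; i++) {
        if (i < n)
            pos += sprintf(out + pos, " %02x", (unsigned)buf[i]);
        else
            pos += sprintf(out + pos, "   "); /* pad short rows */
    }
    pos += sprintf(out + pos, "  ");
    for (i = 0; i < n; i++)
        out[pos++] = isprint(buf[i]) ? (char)buf[i] : '.';
    out[pos] = '\0';
}
```

    Calling hex_row() once per 16-byte slice of each captured payload and printing the result gives a column-aligned view in which fixed-length fields tend to stand out from one message to the next.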

    Now, I know that the message is composed of fixed-length fields of different types, e.g.:

    - message length
    - message sequence number
    - instrument
    - bid price
    - ask price

    and I know that each of the numerical fields is stored using the most appropriate storage type (for example, in these messages the message length field is stored in a uint8_t). So I know that (given it's not compressed or encrypted) if I mess around for long enough I will be able to write some code that can decode all the fields correctly. The problem is that this is horribly time-consuming.
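
    For fixed-length fields like these, a handful of small readers usually covers most of the decoding work. The sketch below assumes multi-byte integers arrive in big-endian (network) order, which is only a guess about this feed. The function names are invented for the example. Purely as an illustration, the first four bytes of the dump above (^@^@^@\, i.e. 00 00 00 5C) would read as 92 if they were a big-endian 32-bit value.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read an unsigned 16-bit big-endian value from p. */
uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

/* Read an unsigned 32-bit big-endian value from p. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Copy a fixed-width ASCII field of n bytes and NUL-terminate it.
   dst must have room for n + 1 bytes. */
void read_ascii(char *dst, const unsigned char *p, size_t n)
{
    memcpy(dst, p, n);
    dst[n] = '\0';
}
```

    Building values from individual bytes like this sidesteps alignment and host byte-order problems. If the values come out implausible, swapping the byte order in the readers is a cheap next thing to try.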

    The question is: is there a quicker way to do it? Does anyone have any ideas, or know of a C library with functions that can do the work of programmatically decoding the message stream? Essentially this is reverse engineering, of course.

    I haven't been using C very long, but I'm not a n00b (I am usually required to write in Perl). I'm making very slow progress and I need to speed this up : (

    Any ideas on how to go about this in a more efficient way would be greatly appreciated : * )

  2. #2
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Probably this data is just "serialized" in a basic way.
    Quote Originally Posted by stevehicks View Post
    and I know that each of the numerical fields is stored using the most appropriate storage type (for example, in these messages the message length field is stored in a uint8_t). So I know that (given it's not compressed or encrypted) if I mess around for long enough I will be able to write some code that can decode all the fields correctly.
    For an idea of what "serialization" is about:

    SourceForge.net: Serialization - cpwiki

    This is less often an issue of interest in Perl because of the dynamic typing and such -- data is most often stored/transmitted as text or in a database with a formal protocol (e.g. YAML). Using pack() and unpack() in Perl is APITA, so transmission is almost always plain text.

    But sometimes the easiest thing to do in C is to not translate the data into text. The only "box" into which transmission must fit is an 8-bit byte format.* You can write(), read(), send() and recv() non-char types, which makes the data human-unreadable but very easy to use, because you do not have to apply scanf()-style functions to extract, for example, numbers from a string. This is particularly true if both ends of the connection were written in a "strongly typed" language such as C (since no human needs to read the data, it is never turned into text).

    It sounds like you may have access to the source of at least one end of this software, and should be able to find out the order and type of each unit of data.

    *maybe "must" is too strong a word.
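
    To make the idea of sending non-char types concrete, here is a minimal sketch of packing two integers into raw bytes and unpacking them again. The struct tick layout and the function names are hypothetical. htonl() and ntohl() put the values in network byte order, which is a common convention but not necessarily what this exchange uses.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record, not the real exchange format. */
struct tick {
    uint32_t seq;
    uint32_t price;
};

#define TICK_WIRE_SIZE 8 /* two 32-bit fields, no padding on the wire */

/* Serialize t into out as two big-endian 32-bit integers. */
void tick_pack(unsigned char out[TICK_WIRE_SIZE], const struct tick *t)
{
    uint32_t seq = htonl(t->seq);
    uint32_t price = htonl(t->price);

    memcpy(out, &seq, sizeof seq);
    memcpy(out + 4, &price, sizeof price);
}

/* Reverse of tick_pack(): rebuild the struct from the wire bytes. */
void tick_unpack(struct tick *t, const unsigned char in[TICK_WIRE_SIZE])
{
    uint32_t seq, price;

    memcpy(&seq, in, sizeof seq);
    memcpy(&price, in + 4, sizeof price);
    t->seq = ntohl(seq);
    t->price = ntohl(price);
}
```

    The sender would hand the packed buffer to send() and the receiver would call tick_unpack() on what recv() returns. No text parsing is involved at either end, which is why the stream looks like gibberish in less.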
    Last edited by MK27; 03-12-2010 at 05:33 AM.

  3. #3
    Registered User
    Join Date
    Feb 2010
    Posts
    8

    Hi

    Thanks very much for your response and the link !!

    I do have access to the messages after they are decoded and sent out of the gateway to the downstream client. However, because the gateway removes, adds and changes fields, it's very difficult to use those messages to work out what's entering the gateway itself. This is why I have to decode the incoming messages: as soon as I have done that, I can compare them with the messages that are leaving the gateway and calculate the time spent in the gateway.
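
    Once an inbound and an outbound message have been matched up, the timing part is simple arithmetic on the capture timestamps. libpcap supplies a struct timeval with each packet, so a sketch could look like the one below. It assumes both sides are captured on the same host so that the clocks agree. The function name is made up, and how the two messages get matched (sequence number, instrument, etc.) is left open.

```c
#include <stdint.h>
#include <sys/time.h>

/* Microseconds elapsed between the packet entering the gateway (in)
   and the matching packet leaving it (out). Negative if out precedes in. */
int64_t elapsed_usec(const struct timeval *in, const struct timeval *out)
{
    int64_t sec = (int64_t)out->tv_sec - (int64_t)in->tv_sec;
    int64_t usec = (int64_t)out->tv_usec - (int64_t)in->tv_usec;

    return sec * 1000000 + usec;
}
```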


    Cheers

    Steve

  4. #4
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    Quote Originally Posted by stevehicks View Post
    Does anyone have any ideas or know of a C lib that have functions that can do the work of programmatically decoding the message stream ?? Essentially this is reverse engineering of course.
    Can you post the code that you have written using libpcap? If the message format and its member types are known, you can dump the message into an in-core struct and print out the fields. The only thing faster than a C program would be one written in assembly.
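
    A rough sketch of what dumping a message into a struct and printing it could look like is below. The layout (a 32-bit length, a 32-bit sequence number, then a 5-character instrument) is invented for illustration and is not the real exchange format. Big-endian integers are also an assumption. Fields are copied out one at a time rather than casting the buffer to a struct pointer, since padding, alignment and byte order can all make a direct cast go wrong.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-core form of one message. */
struct quote_msg {
    uint32_t length;
    uint32_t seq;
    char instrument[6]; /* 5 characters plus terminating NUL */
};

#define QUOTE_WIRE_SIZE 13 /* 4 + 4 + 5 bytes on the wire (assumed) */

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Fill m from buf. Returns 0 on success, -1 if buf is too short. */
int quote_decode(struct quote_msg *m, const unsigned char *buf, size_t len)
{
    if (len < QUOTE_WIRE_SIZE)
        return -1;
    m->length = be32(buf);
    m->seq = be32(buf + 4);
    memcpy(m->instrument, buf + 8, 5);
    m->instrument[5] = '\0';
    return 0;
}

/* Print the decoded fields on one line. */
void quote_print(FILE *fp, const struct quote_msg *m)
{
    fprintf(fp, "len=%" PRIu32 " seq=%" PRIu32 " instrument=%s\n",
            m->length, m->seq, m->instrument);
}
```

    In a libpcap callback, quote_decode() would be called on the TCP payload (after skipping the link, IP and TCP headers), bearing in mind that one TCP segment is not guaranteed to hold exactly one message.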
    Quote Originally Posted by MK27 View Post
    ... "strongly typed" language such as C ...
    Btw, C is a "weakly typed" language as it allows conversion from one type to another with relative ease using casts, whereas a "strongly typed" language won't let you do that.
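
    Two quick illustrations of the kind of conversion meant here, as a sketch only (function names made up). The first is a plain cast that silently drops the fractional part. The second reinterprets the bytes of a float as an integer, with memcpy() used because it is the well-defined way to do it. The bit pattern in the comment assumes IEEE 754 single-precision floats.

```c
#include <stdint.h>
#include <string.h>

/* An explicit cast converts double to int, discarding the fraction. */
int truncate_price(double price)
{
    return (int)price;
}

/* View the raw bits of a float as a 32-bit unsigned integer.
   Assumes float is 32 bits wide; on IEEE 754 systems 1.0f
   comes back as 0x3f800000. */
uint32_t float_bits(float f)
{
    uint32_t u;

    memcpy(&u, &f, sizeof u);
    return u;
}
```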
