Help !! Any advice for reverse engineering an encoded network stream ??

**stevehicks** · 03-12-2010

Hi

I work for A.Big.Bank and we use a third-party application (the Gateway) for consuming stock price messages from A.Big.Stock.Exchange. The Gateway connects to a remote system using BSD sockets over TCP and receives messages; it then processes them and sends them on to downstream subscribers. Anyway, we believe that the Gateway is badly written and is taking a long time to process the messages it receives. The messages it sends to the downstream clients are purely ASCII and I can tell exactly what they contain. However, the ones sent from the Exchange to the Gateway over that TCP connection are encoded (though they are not compressed or encrypted, since the actual information they contain is publicly available, for example via Yahoo Finance, albeit delayed by 15 minutes). The messages are a mixture of binary and ASCII fields. This is how they appear from a libpcap session piped through less:

Code:

^@^@^@\^@^@^@<CC>^@^@^@^D^@^@^@^C^@?^@^@<FF><FF><FF><FF>K<90><CA><86>AOAKZ^@^@^@\^A^B^AAOAKZ100604200P^
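A hex dump is usually easier to count offsets in than the caret notation above, where less shows a NUL byte as `^@` and the byte 0xCC as `<CC>`. A minimal sketch of a dump helper follows; `hex_row`, `hex_dump`, and the 8-bytes-per-row width are illustrative names and choices, not part of libpcap or the Gateway.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

enum { BYTES_PER_ROW = 8, ROW_BUF_SIZE = 64 };

/* Format one row of a hex dump: offset, hex bytes, then printable ASCII.
 * Returns the number of bytes of buf shown in the row (0 when off >= len
 * or when out is smaller than ROW_BUF_SIZE). */
size_t hex_row(const unsigned char *buf, size_t len, size_t off,
               char *out, size_t outsz)
{
    if (off >= len || outsz < (size_t)ROW_BUF_SIZE)
        return 0;

    size_t n = len - off;
    if (n > BYTES_PER_ROW)
        n = BYTES_PER_ROW;

    int pos = snprintf(out, outsz, "%04zx:", off);
    for (size_t i = 0; i < BYTES_PER_ROW; i++) {
        if (i < n)
            pos += snprintf(out + pos, outsz - (size_t)pos, " %02x",
                            (unsigned)buf[off + i]);
        else
            pos += snprintf(out + pos, outsz - (size_t)pos, "   "); /* pad short rows */
    }
    pos += snprintf(out + pos, outsz - (size_t)pos, "  ");

    /* ASCII column: non-printable bytes are shown as '.' */
    for (size_t i = 0; i < n; i++) {
        unsigned char c = buf[off + i];
        out[pos++] = isprint(c) ? (char)c : '.';
    }
    out[pos] = '\0';
    return n;
}

/* Dump a whole buffer to fp, one row per line. */
void hex_dump(FILE *fp, const unsigned char *buf, size_t len)
{
    char row[ROW_BUF_SIZE];
    size_t off = 0, n;
    while ((n = hex_row(buf, len, off, row, sizeof row)) > 0) {
        fprintf(fp, "%s\n", row);
        off += n;
    }
}
```

Calling `hex_dump(stdout, payload, payload_len)` on each captured TCP payload lines the bytes up in columns, which makes repeating fixed-width fields easier to spot across messages.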

Now, I know that the message is composed of fixed-length fields of different types, e.g.:

- message length
- message sequence number
- instrument
- bid price
- ask price

and I know that each of the numerical fields is stored using the most appropriate storage type (for example, in these messages the message length field is stored in a uint8_t). So I know that (given it's not compressed or encrypted) if I mess around for long enough I will be able to write some code that can decode all the fields correctly. The problem is that this is horribly time-consuming.
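Decoding fixed-length fields one at a time can be sketched as below. The field widths, their order, the 5-character instrument, and the big-endian (network) byte order are all assumptions made for illustration; the real layout has to be confirmed against captured data. `struct quote`, `read_be32`, and `decode_quote` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL decoded form of one message; not the Exchange's real format. */
struct quote {
    uint8_t  length;         /* message length */
    uint32_t seq;            /* message sequence number */
    char     instrument[6];  /* 5 ASCII characters plus a terminating NUL */
    uint32_t bid;            /* bid price, units unknown */
    uint32_t ask;            /* ask price, units unknown */
};

/* Bytes occupied on the wire under the assumed layout. */
enum { QUOTE_WIRE_SIZE = 1 + 4 + 5 + 4 + 4 };

/* Read a 32-bit big-endian value without relying on host byte order. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Walk the buffer with a cursor, one field at a time.
 * Returns 0 on success, -1 if the buffer is too short. */
int decode_quote(const unsigned char *buf, size_t len, struct quote *q)
{
    assert(buf != NULL && q != NULL);
    if (len < QUOTE_WIRE_SIZE)
        return -1;

    const unsigned char *p = buf;
    q->length = *p;              p += 1;
    q->seq    = read_be32(p);    p += 4;
    memcpy(q->instrument, p, 5); p += 5;
    q->instrument[5] = '\0';
    q->bid    = read_be32(p);    p += 4;
    q->ask    = read_be32(p);
    return 0;
}
```

Reading byte by byte keeps the decoder independent of struct padding and of the capturing machine's endianness, so a wrong guess about one field only means changing one line.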

The question is: is there a quicker way to do it? Does anyone have any ideas, or know of a C lib that has functions that can do the work of programmatically decoding the message stream? Essentially this is reverse engineering, of course.

I haven't been using C very long, but I'm not a n00b (I am usually required to write in Perl). Still, I'm making very slow progress and I need to speed this up :(

Any ideas on how to go about this in a more efficient way would be greatly appreciated : * )

**MK27** · 03-12-2010

Probably this data is just "serialized" in a basic way.

Originally Posted by stevehicks

and I know that each of the numerical fields is stored using the most appropriate storage type (for example, in these messages the message length field is stored in a uint8_t). So I know that (given it's not compressed or encrypted) if I mess around for long enough I will be able to write some code that can decode all the fields correctly.

For an idea of what "serialization" is about:

SourceForge.net: Serialization - cpwiki

This is less often an issue of interest in Perl because of the dynamic typing and such -- data is most often stored/transmitted as text, in a database, or in a formal text format (e.g. YAML). Using pack() and unpack() in Perl is a pain, so transmission is almost always plain text.

But sometimes the easiest thing to do in C is to not translate the data into text. The only "box" into which transmission must fit is an 8-bit byte format.* You can write(), read(), send() and recv() non-char types, which makes the data human-unreadable but very easy to use, because you do not have to apply sscanf()-style functions or something similar to extract, for example, numbers from a string. This is particularly true if both ends of the connection were written in a "strongly typed" language such as C (since no human needs to read the data, it is never turned into text).
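As a rough illustration of passing non-char types through write() and read(), the sketch below pushes a small struct through a POSIX pipe as raw bytes and reads it back. `struct tick` and `roundtrip_tick` are made-up names, and the pipe stands in for a socket. This only works unchanged when both ends agree on byte order, type sizes, and struct padding, which is an assumption here and not something known about the Exchange feed.

```c
#include <stdint.h>
#include <unistd.h>

/* HYPOTHETICAL record; stands in for whatever the sender's struct is. */
struct tick {
    uint32_t seq;
    double   price;
};

/* Write one struct tick into a pipe as raw bytes and read it back.
 * No conversion to or from text happens at any point.
 * Returns 0 on success, -1 on any error or short transfer. */
int roundtrip_tick(const struct tick *in, struct tick *out)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    /* The struct is far smaller than the pipe buffer, so write() will not block. */
    int ok = write(fds[1], in, sizeof *in) == (ssize_t)sizeof *in &&
             read(fds[0], out, sizeof *out) == (ssize_t)sizeof *out;

    close(fds[0]);
    close(fds[1]);
    return ok ? 0 : -1;
}
```

On the receiving side the bytes land directly in a usable `uint32_t` and `double`, which is the convenience being described; the cost is that the stream is opaque to anyone without the struct definition.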

It sounds like you may have access to the source of at least one end of this software and should be able to find out the order and type of each unit of data.

*maybe "must" is too strong a word.

**stevehicks** · 03-12-2010

Hi

Thanks very much for your response and the link !!

I do have access to the messages after they are decoded and sent out of the gateway to the downstream client. However, because the gateway removes, adds and changes fields, it's very difficult to use those messages to work out what's entering the gateway itself. This is why I have to decode the messages: as soon as I have done that, I can compare them with the messages that are leaving the gateway and calculate the time spent in the gateway.
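Once inbound messages are decoded, the time spent in the gateway for one message is the difference between two capture timestamps. libpcap reports each packet's capture time as a `struct timeval` in the `ts` member of `struct pcap_pkthdr`. The sketch below assumes that some key (hypothetically the sequence number) survives the trip through the gateway so inbound and outbound messages can be paired; `struct stamp`, `elapsed_usec`, and `gateway_latency` are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* One observed message: a matching key plus its capture time. */
struct stamp {
    uint32_t       key;  /* ASSUMED to be preserved by the gateway */
    struct timeval ts;   /* e.g. copied from pcap_pkthdr.ts */
};

/* Microseconds from t_in to t_out (negative if t_out is earlier). */
int64_t elapsed_usec(struct timeval t_in, struct timeval t_out)
{
    return ((int64_t)t_out.tv_sec - (int64_t)t_in.tv_sec) * 1000000
         + ((int64_t)t_out.tv_usec - (int64_t)t_in.tv_usec);
}

/* Find the inbound stamp with the same key as 'out' (linear search,
 * adequate for a sketch) and store the latency in *usec.
 * Returns 0 on a match, -1 if no inbound message has that key. */
int gateway_latency(const struct stamp *in, size_t n,
                    const struct stamp *out, int64_t *usec)
{
    for (size_t i = 0; i < n; i++) {
        if (in[i].key == out->key) {
            *usec = elapsed_usec(in[i].ts, out->ts);
            return 0;
        }
    }
    return -1;
}
```

Both captures need to be taken on the same host, or on hosts with synchronized clocks, for the difference to be meaningful.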

Cheers

Steve

**itCbitC** · 03-12-2010

Originally Posted by stevehicks

Does anyone have any ideas, or know of a C lib that has functions that can do the work of programmatically decoding the message stream? Essentially this is reverse engineering, of course.

Can you post the code that you have written using libpcap? If the message format and its member types are known, you can dump it into an in-core struct and print out the fields. The only thing faster than a C program would be one written in assembly.
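A minimal sketch of the "dump it into a struct and print the fields" approach follows. The layout of `struct wire_quote` is invented for illustration, the multi-byte fields are assumed to be in network byte order, and `__attribute__((packed))` is a GCC/Clang extension used to stop the compiler inserting padding. `load_quote` and `print_quote` are hypothetical helpers.

```c
#include <arpa/inet.h>  /* ntohl */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL wire layout; packed so it matches the bytes one-to-one. */
struct __attribute__((packed)) wire_quote {
    uint8_t  length;
    uint32_t seq;
    char     instrument[5];  /* not NUL-terminated on the wire */
    uint32_t bid;
    uint32_t ask;
};

/* Copy the payload into the struct and fix up byte order.
 * memcpy avoids alignment problems that a pointer cast could cause.
 * Returns 0 on success, -1 if the payload is too short. */
int load_quote(const unsigned char *payload, size_t len, struct wire_quote *q)
{
    if (len < sizeof *q)
        return -1;
    memcpy(q, payload, sizeof *q);
    q->seq = ntohl(q->seq);  /* ASSUMES network (big-endian) order */
    q->bid = ntohl(q->bid);
    q->ask = ntohl(q->ask);
    return 0;
}

/* Print the decoded fields on one line. */
void print_quote(FILE *fp, const struct wire_quote *q)
{
    fprintf(fp, "len=%u seq=%u instrument=%.5s bid=%u ask=%u\n",
            (unsigned)q->length, (unsigned)q->seq, q->instrument,
            (unsigned)q->bid, (unsigned)q->ask);
}
```

In a libpcap callback the `payload` pointer would be the packet data past the link, IP, and TCP headers; a message split across TCP segments would need reassembly first, which this sketch does not attempt.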

Originally Posted by MK27

... "strongly typed" language such as C ...

Btw, C is a "weakly typed" language as it allows conversion from one type to another with relative ease using casts, whereas a "strongly typed" language won't let you do that.
