C Board  

03-12-2010, 04:17 AM   #1
Registered User
 
Join Date: Feb 2010
Posts: 8
Help !! Any advice for reverse engineering an encoded network stream ??

Hi

I work for A.Big.Bank and we use a third-party application (the Gateway) for consuming stock price messages from A.Big.Stock.Exchange. The Gateway connects to a remote system using BSD sockets over TCP and receives messages; it then processes them and sends them on to downstream subscribers. We believe that the Gateway is badly written and is taking a long time to process the messages it receives. The messages it sends to the downstream clients are purely ASCII and I can tell exactly what they contain. However, the ones sent from the Exchange to the Gateway over that TCP connection are encoded (though they are not compressed or encrypted, since the actual information they contain is publicly available, for example via Yahoo Finance, albeit delayed by 15 mins). The messages are a mixture of binary and ASCII fields. This is how they appear from a libpcap session piped through less:

Code:
^@^@^@\^@^@^@<CC>^@^@^@^D^@^@^@^C^@?^@^@<FF><FF><FF><FF>K<90><CA><86>AOAKZ^@^@^@\^A^B^AAOAKZ100604200P^
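In the caret notation that less uses, ^@ is byte 0x00 and <CC> is byte 0xCC, so the sample above begins 00 00 00 5c 00 00 00 cc. A plain hex dump is usually easier to work with than that notation. The following is a minimal sketch, not part of the original post; the names hexdump_line and hexdump are made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

#define BYTES_PER_LINE 16

/* Format one dump line for buf[offset .. offset+15] into out.
   Layout: 8-digit hex offset, hex byte values, then the same bytes as
   ASCII between '|' marks ('.' for non-printable bytes).
   Returns the number of bytes shown on the line. */
size_t hexdump_line(const unsigned char *buf, size_t len,
                    size_t offset, char *out, size_t out_size)
{
    assert(out_size >= 80);            /* longest line is 77 chars plus NUL */
    if (offset >= len) {
        out[0] = '\0';
        return 0;
    }
    size_t n = len - offset;
    if (n > BYTES_PER_LINE)
        n = BYTES_PER_LINE;

    int pos = snprintf(out, out_size, "%08zx  ", offset);
    for (size_t i = 0; i < BYTES_PER_LINE; i++) {
        if (i < n)
            pos += snprintf(out + pos, out_size - pos, "%02x ", buf[offset + i]);
        else
            pos += snprintf(out + pos, out_size - pos, "   ");  /* pad short lines */
    }
    pos += snprintf(out + pos, out_size - pos, " |");
    for (size_t i = 0; i < n; i++) {
        unsigned char c = buf[offset + i];
        pos += snprintf(out + pos, out_size - pos, "%c", isprint(c) ? c : '.');
    }
    snprintf(out + pos, out_size - pos, "|");
    return n;
}

/* Dump a whole buffer, 16 bytes per line, to fp. */
void hexdump(FILE *fp, const unsigned char *buf, size_t len)
{
    char line[80];
    for (size_t off = 0; off < len; off += BYTES_PER_LINE) {
        hexdump_line(buf, len, off, line, sizeof line);
        fprintf(fp, "%s\n", line);
    }
}
```

Calling hexdump(stdout, payload, payload_len) on each captured TCP payload lines the binary fields up in columns, which makes fixed-length fields much easier to spot across several messages.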

Now, I know that the message is composed of fixed-length fields of different types, e.g.:

- message length
- message sequence number
- instrument
- bid price
- ask price

and I know that each of the numerical fields is stored using the most appropriate storage type (for example, the message length field is stored in a uint8_t). So I know that (given it's not compressed or encrypted) if I mess around for long enough I will be able to write some code that can decode all the fields correctly. The problem is that this is horribly time consuming.
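One way to cut down the trial and error is to print every plausible integer reading at each offset and see which ones make sense across several messages. This is only a sketch under assumptions: the helper names are hypothetical, and nothing here is known about the real feed format. As a data point, the first four bytes of the sample (00 00 00 5c) read as 92 when taken as a big-endian 32-bit value, which might hint at a wider length field, but that is a guess.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Read unsigned integers byte by byte, so the result does not depend on
   the host's own byte order or on the alignment of p. */
uint16_t read_u16_be(const unsigned char *p) { return (uint16_t)((p[0] << 8) | p[1]); }
uint16_t read_u16_le(const unsigned char *p) { return (uint16_t)((p[1] << 8) | p[0]); }

uint32_t read_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

uint32_t read_u32_le(const unsigned char *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Print each candidate interpretation of the bytes starting at off. */
void print_candidates(FILE *fp, const unsigned char *buf, size_t len, size_t off)
{
    assert(off < len);
    fprintf(fp, "offset %zu: u8=%u", off, (unsigned)buf[off]);
    if (off + 2 <= len)
        fprintf(fp, " u16be=%u u16le=%u",
                (unsigned)read_u16_be(buf + off), (unsigned)read_u16_le(buf + off));
    if (off + 4 <= len)
        fprintf(fp, " u32be=%" PRIu32 " u32le=%" PRIu32,
                read_u32_be(buf + off), read_u32_le(buf + off));
    fputc('\n', fp);
}
```

Running print_candidates over offsets 0, 1, 2, ... of a few consecutive messages tends to expose sequence numbers (values that increase by one) and lengths (values that match the payload size) quickly.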

The question is: is there a quicker way to do it? Does anyone have any ideas, or know of a C lib that has functions that can do the work of programmatically decoding the message stream? Essentially this is reverse engineering, of course.

I haven't been using C very long but I'm not a n00b (I am usually required to write in Perl), but I'm making very slow progress and I need to get some speed up on this : (

Any ideas on how to go about this in a more efficient way would be greatly appreciated : * )
stevehicks
03-12-2010, 05:30 AM   #2
dat is, vast staat
 
 
Join Date: Jul 2008
Location: SE Queens
Posts: 6,612
Probably this data is just "serialized" in a basic way.
Quote:
Originally Posted by stevehicks View Post
and I know that each of the numerical fields is stored using the most appropriate storage type (for example, the message length field is stored in a uint8_t). So I know that (given it's not compressed or encrypted) if I mess around for long enough I will be able to write some code that can decode all the fields correctly.
For an idea of what "serialization" is about:

SourceForge.net: Serialization - cpwiki

This is less often an issue of interest in Perl because of the dynamic typing and such -- data is most often stored/transmitted as text or in a database with a formal protocol (e.g. YAML). Using pack() and unpack() in Perl is APITA, so transmission is almost always plain text.

But sometimes the easiest thing to do in C is to not translate the data into text. The only "box" into which transmission must fit is an 8-bit byte format.* You can write(), read(), send() and recv() non-char types, which makes the data human-unreadable but very easy to use, because you do not have to apply scanf()-style functions or something similar to extract, for example, numbers from a string. This is particularly true if both ends of the connection were written in a "strongly typed" language such as C (since no human needs to read the data, it is never turned into text).
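To illustrate the point about sending non-char types as raw bytes, here is a small sketch that pushes a struct through a POSIX pipe with write() and gets it back with read(). The struct quote type and its fields are invented for the example. It only works this simply because both ends are the same program on the same machine; across different hosts, byte order and struct padding have to be agreed on.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical record, for illustration only. */
struct quote {
    uint32_t seq;        /* message sequence number */
    int32_t  bid_cents;  /* bid price in cents */
    int32_t  ask_cents;  /* ask price in cents */
};

/* Write one struct to a pipe as raw bytes and read it back into out.
   Returns 0 on success, -1 on any failure. */
int roundtrip_quote(const struct quote *in, struct quote *out)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    /* No text conversion: the in-memory bytes go out exactly as they are. */
    ssize_t written = write(fds[1], in, sizeof *in);
    ssize_t got = -1;
    if (written == (ssize_t)sizeof *in)
        got = read(fds[0], out, sizeof *out);

    close(fds[0]);
    close(fds[1]);
    return (got == (ssize_t)sizeof *out) ? 0 : -1;
}
```

On a TCP socket, recv() may return fewer bytes than requested, so a real reader would loop until the whole record has arrived.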

It sounds like you may have access to the source of at least one end of this software and should be able to find out the order and type of each unit of data.

*maybe "must" is too strong a word.
__________________
C programming resources:
GNU C Function and Macro Index -- glibc reference manual
The C Book -- nice online learner guide
Current ISO draft standard
CCAN -- new CPAN like open source library repository
GDB tutorial #1 -- gnu debugger tutorials -- GDB tutorial #2
cpwiki -- our wiki on sourceforge

Last edited by MK27; 03-12-2010 at 05:33 AM.
MK27
03-12-2010, 07:29 AM   #3
Registered User
 
Join Date: Feb 2010
Posts: 8
Help !! Any advice for reverse engineering an encoded network stream ??

Hi

Thanks very much for your response and the link !!

I do have access to the messages after they are decoded and sent out of the gateway to the downstream client. However, because the gateway removes, adds and changes fields, it's very difficult to use those messages to work out what's entering the gateway itself. This is why I have to decode the messages: as soon as I have done that, I can compare them with the messages that are leaving the gateway and calculate the time spent in the gateway.
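Once an inbound message has been matched to its outbound counterpart, the time spent in the gateway is just the difference between the two capture timestamps. libpcap reports each packet's capture time as a struct timeval in the ts field of struct pcap_pkthdr. The sketch below assumes both captures were taken against the same clock and that the matching (for example by sequence number) has already been done; elapsed_usec is a hypothetical helper name.

```c
#include <stdint.h>
#include <sys/time.h>

/* Microseconds elapsed from the inbound timestamp to the outbound one.
   The result is negative if the outbound timestamp is the earlier of the two. */
int64_t elapsed_usec(const struct timeval *inbound, const struct timeval *outbound)
{
    int64_t sec  = (int64_t)outbound->tv_sec  - (int64_t)inbound->tv_sec;
    int64_t usec = (int64_t)outbound->tv_usec - (int64_t)inbound->tv_usec;
    return sec * 1000000 + usec;
}
```

Doing the subtraction in 64-bit integers avoids the rounding that creeps in when timestamps are converted to floating-point seconds first.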


Cheers

Steve
stevehicks
03-12-2010, 08:44 AM   #4
Registered User
 
Join Date: Oct 2008
Location: TX
Posts: 1,628
Quote:
Originally Posted by stevehicks View Post
Does anyone have any ideas, or know of a C lib that has functions that can do the work of programmatically decoding the message stream? Essentially this is reverse engineering, of course.
Can you post the code that you have written using libpcap? If the message format and its member types are known, you can dump it into an in-core struct and print out the fields. The only thing faster than a C program would be one written in assembly.
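A sketch of the "dump it into a struct" approach follows. The layout of struct wire_header is entirely hypothetical (the real message format is unknown), and it assumes the numeric fields arrive in network (big-endian) byte order. The fields are chosen so the compiler inserts no padding, which the static assertion checks.

```c
#include <arpa/inet.h>   /* ntohl */
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL layout, for illustration only. */
struct wire_header {
    uint32_t length;         /* message length */
    uint32_t sequence;       /* message sequence number */
    char     instrument[8];  /* ASCII, not NUL-terminated */
};
_Static_assert(sizeof(struct wire_header) == 16, "unexpected struct padding");

/* Copy the start of a captured payload into hdr and convert the integer
   fields to host byte order. Returns 0 on success, -1 if payload is too short. */
int decode_header(const unsigned char *payload, size_t len, struct wire_header *hdr)
{
    if (len < sizeof *hdr)
        return -1;
    memcpy(hdr, payload, sizeof *hdr);   /* memcpy avoids unaligned access */
    hdr->length   = ntohl(hdr->length);
    hdr->sequence = ntohl(hdr->sequence);
    return 0;
}

/* Print the decoded fields on one line. */
void print_header(FILE *fp, const struct wire_header *hdr)
{
    fprintf(fp, "length=%" PRIu32 " sequence=%" PRIu32 " instrument=%.8s\n",
            hdr->length, hdr->sequence, hdr->instrument);
}
```

Copying with memcpy rather than casting the packet pointer to a struct pointer keeps the code safe on platforms that fault on misaligned reads.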
Quote:
Originally Posted by MK27 View Post
... "strongly typed" language such as C ...
Btw, C is a "weakly typed" language as it allows conversion from one type to another with relative ease using casts, whereas a "strongly typed" language won't let you do that.
itCbitC
All times are GMT -6.