| | #1 |
| Registered User Join Date: Feb 2010
Posts: 8
| I work for A.Big.Bank and we use a third-party application (the Gateway) for consuming stock price messages from A.Big.Stock.Exchange. The Gateway connects to a remote system using BSD sockets over TCP and receives messages; it then processes them and sends them on to downstream subscribers.
Anyway, we believe that the Gateway is badly written and is taking a long time to process the messages it receives. The messages it sends to the downstream clients are purely ASCII and I can tell exactly what they contain. However, the ones sent from the Exchange to the Gateway over that TCP connection are encoded (though they are not compressed or encrypted, since the information they contain is publicly available, for example via Yahoo Finance, albeit delayed by 15 mins).
The messages are a mixture of binary and ASCII fields. This is how they appear in a libpcap capture piped through less:
Code:
^@^@^@\^@^@^@<CC>^@^@^@^D^@^@^@^C^@?^@^@<FF><FF><FF><FF>K<90><CA><86>AOAKZ^@^@^@\^A^B^AAOAKZ100604200P^
Now, I know that the message is composed of fixed-length fields of different types, e.g.:
- message length
- message sequence number
- instrument
- bid price
- ask price
I also know that each of the numerical fields is stored using the most appropriate storage type (for example, in these messages the message length field is stored in a uint8_t). So I know that (given it's not compressed or encrypted) if I mess around for long enough I will be able to write some code that can decode all the fields correctly. The problem is that this is horribly time consuming.
The question is: is there a quicker way to do it? Does anyone have any ideas, or know of a C lib that has functions that can do the work of programmatically decoding the message stream? Essentially this is reverse engineering, of course.
I haven't been using C very long but I'm not a n00b (I am usually required to write in Perl), but I'm making very slow progress and I need to speed this up : ( Any ideas on how to go about this in a more efficient way would be greatly appreciated : * ) |
| stevehicks is offline | |
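A minimal sketch of the kind of decoder post #1 is working towards, assuming a made-up layout: a 4-byte big-endian length, a 4-byte big-endian sequence number, then a 5-character ASCII instrument. The real field widths and byte order are not known from the thread, and the names `quote_header`, `read_be32`, `read_le32` and `decode_header` are hypothetical. Taken at face value, the dump opens with `^@^@^@\`, which is bytes 00 00 00 5C, or 92 when read as a big-endian 32-bit integer; that may or may not be the length field.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit unsigned integer stored most significant byte first. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read a 32-bit unsigned integer stored least significant byte first. */
uint32_t read_le32(const unsigned char *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Hypothetical header layout, for illustration only. */
struct quote_header {
    uint32_t length;        /* assumed: bytes 0..3, big-endian  */
    uint32_t sequence;      /* assumed: bytes 4..7, big-endian  */
    char     instrument[6]; /* assumed: bytes 8..12, ASCII, NUL added here */
};

/* Decode the assumed header from a raw buffer.
   Returns 0 on success, -1 if the buffer is too short. */
int decode_header(const unsigned char *buf, size_t len,
                  struct quote_header *out)
{
    if (len < 13)
        return -1;
    out->length   = read_be32(buf);
    out->sequence = read_be32(buf + 4);
    memcpy(out->instrument, buf + 8, 5);
    out->instrument[5] = '\0';
    return 0;
}
```

Reading bytes one at a time like this avoids alignment and host byte-order problems. Trying both `read_be32` and `read_le32` at the same offset and seeing which yields plausible values (a sequence number that increases by one per message, say) is one quick way to guess the byte order of an unknown field.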
| | #2 | |
| dat is, vast staat Join Date: Jul 2008 Location: SE Queens
Posts: 6,612
| Probably this data is just "serialized" in a basic way. Quote:
SourceForge.net: Serialization - cpwiki
This is less often an issue of interest in Perl because of the dynamic typing and such -- data is most often stored/transmitted as text or in a database with a formal protocol (e.g. YAML). Using pack() and unpack() in Perl is APITA, so transmission is almost always plain text.
But sometimes the easiest thing to do in C is to not translate the data into text. The only "box" into which transmission must fit is an 8-bit byte format.* You can write(), read(), send() and recv() non-char types, which makes the data human-unreadable but very easy to use, because you do not have to apply scanf()-style functions or something similar to extract, for example, numbers from a string. This is particularly true if both ends of the connection were written in a "strongly typed" language such as C (since no human needs to read the data, it is never turned into text).
It sounds like you maybe have access to the source of at least one end of this software and should be able to find out the order and type of each unit of data.
*maybe "must" is too strong a word.
__________________ C programming resources: GNU C Function and Macro Index -- glibc reference manual The C Book -- nice online learner guide Current ISO draft standard CCAN -- new CPAN like open source library repository GDB tutorial #1 -- gnu debugger tutorials -- GDB tutorial #2 cpwiki -- our wiki on sourceforge Last edited by MK27; 03-12-2010 at 05:33 AM. | |
| MK27 is offline | |
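To illustrate the point in post #2 about passing non-char types straight to write() and read(), here is a small sketch that pushes a struct through a POSIX pipe as raw bytes and reads it back. The `tick` struct and `roundtrip_through_pipe` are invented for the example and are not the exchange's format.

```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical record, for illustration only. */
struct tick {
    uint32_t sequence;
    double   bid;
    double   ask;
};

/* Write one struct into a pipe as raw bytes and read it back.
   Returns 0 on success, -1 on any error or short transfer. */
int roundtrip_through_pipe(const struct tick *in, struct tick *out)
{
    int fds[2];
    int rc = -1;

    if (pipe(fds) != 0)
        return -1;

    /* No conversion to text: the object's bytes are sent as they are. */
    if (write(fds[1], in, sizeof *in) == (ssize_t)sizeof *in &&
        read(fds[0], out, sizeof *out) == (ssize_t)sizeof *out)
        rc = 0;

    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

This only works as-is when both ends agree on struct padding, integer byte order and floating-point format, which is guaranteed here because the same process does both sides. Across machines, a format normally fixes those details explicitly, which is why the captured bytes have to be decoded field by field.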
| | #3 |
| Registered User Join Date: Feb 2010
Posts: 8
| Help !! Any advice for reverse engineering an encoded network stream ??
Hi,
Thanks very much for your response and the link!
I do have access to the messages after they are decoded and sent out of the Gateway to the downstream client. However, because the Gateway removes, adds and changes fields, it's very difficult to use those messages to work out what's entering the Gateway itself. This is why I have to decode the incoming messages: as soon as I have done that, I can compare them with the messages leaving the Gateway and calculate the time spent in the Gateway.
Cheers,
Steve |
| stevehicks is offline | |
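Once an inbound and an outbound message have been matched up, the time spent in the Gateway is just the difference of their capture timestamps. libpcap reports each packet's timestamp as a `struct timeval`, so a sketch of the arithmetic could look like this; `elapsed_usec` is a hypothetical helper, and how the two messages are paired (by sequence number, instrument, or something else) is left open because the thread does not say.

```c
#include <stdint.h>
#include <sys/time.h>

/* Microseconds elapsed from timestamp `in` to timestamp `out`.
   The result is negative if `out` is earlier than `in`. */
int64_t elapsed_usec(const struct timeval *in, const struct timeval *out)
{
    return ((int64_t)out->tv_sec  - (int64_t)in->tv_sec) * 1000000
         + ((int64_t)out->tv_usec - (int64_t)in->tv_usec);
}
```

Both timestamps should come from the same capture host, otherwise clock skew between machines is mixed into the measurement.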
| | #4 | |
| Registered User Join Date: Oct 2008 Location: TX
Posts: 1,628
| Quote:
Btw, C is a "weakly typed" language as it allows conversion from one type to another with relative ease using casts, whereas a "strongly typed" language won't let you do that. | |
| itCbitC is offline | |
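The remark in post #4 about casts can be seen in a few lines. This sketch shows a value conversion with a cast, a byte-level reinterpretation of a float, and a pointer cast that looks at an object's first byte. The function names are made up for the example, and `float_bits` assumes `float` is a 32-bit IEEE-754 type, which is common but not guaranteed by the C standard.

```c
#include <stdint.h>
#include <string.h>

/* Value conversion: the cast discards the fractional part (toward zero). */
int truncate_price(double price)
{
    return (int)price;
}

/* Reinterpretation: copy a float's storage into a 32-bit integer.
   Assumes sizeof(float) == 4. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Pointer cast: any object may be inspected as unsigned char bytes. */
unsigned char first_byte(const void *obj)
{
    return *(const unsigned char *)obj;
}
```

The same freedom is what makes decoding a captured byte stream practical in C: a buffer of `unsigned char` can be read as whatever field types the format turns out to use.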