Thread: c serializer - approaches to cross machine support

  1. #1
    Registered User
    Join Date
    Jul 2009
    Posts
    35

    c serializer - approaches to cross machine support

    Hey All!

    I've been working on a really, really nice C serializer that's just about done. I slightly re-invented the wheel because I didn't like my options, and they didn't have some features I needed.

    Anyway, what I'm trying to wrap my head around now is dealing with cases where the size of a type on the source (serializing) machine is larger or smaller than the same type on another machine. And actually - I can't really think of a case where this could be a problem, because of the way I've implemented the interfaces - I use all standard types: uint8_t, uint16_t, etc., which in theory should guarantee that the type is n bits, right?

    If you wouldn't mind taking a look at some code that would be awesome.

    header: https://github.com/beheadedmyway/gwl...er/src/gwser.h
    source: https://github.com/beheadedmyway/gwl...er/src/gwser.c
    example: https://github.com/beheadedmyway/gwl...calars/test1.c
    example: https://github.com/beheadedmyway/gwl..._keyed/test1.c

    I was thinking of storing type size information in the serialized stream, but I think a better way would be to define my own internal types that I use instead of the uint8_t types - so that I can use macros to check the available type sizes and redefine my own internal types to make sure that they're big enough... meaning something like this:

    (pseudocode)

    #if UINT8_MAX == XXX
    typedef uint8_t gw_uint8_t;
    #endif

    I'm thinking if I do this, then when it's built on another machine I can control what gets used in the typedefs to make sure the sizes are at least n bits. Or maybe I don't even need to worry about this now? Does using the standard uint8_t types alleviate you from having to do this?
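    For what it's worth, here's a compilable sketch of that idea (the gw_ names are mine, purely for illustration; this assumes a C99 `<stdint.h>`). Note that an undefined macro evaluates to 0 inside `#if`, so a missing `UINT8_MAX` safely selects the fallback:

```c
#include <stdint.h>
#include <limits.h>

/* If the exact-width type exists, use it; otherwise fall back to the
   least-width type, which C99 guarantees on every implementation and
   which is at least n bits wide. */
#if UINT8_MAX == 0xFF
typedef uint8_t gw_uint8_t;
#else
typedef uint_least8_t gw_uint8_t;   /* at least 8 bits, always present in C99 */
#endif

#if UINT16_MAX == 0xFFFF
typedef uint16_t gw_uint16_t;
#else
typedef uint_least16_t gw_uint16_t; /* at least 16 bits, always present in C99 */
#endif
```

    In practice the `#else` branches rarely fire, since an implementation that lacks an exact-width type simply doesn't define the corresponding `_MAX` macro at all.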

    thanks for any help or suggestions!

  2. #2
    Banned
    Join Date
    May 2009
    Posts
    37
    beheadedmyway: [shakes head] i don't think he gets it! look, here i've demonstrated that ANY ordinary project will have lines of codes in the thousands!:
    https://github.com/beheadedmyway/gwl...er/src/gwser.c

    and judging by this character's simpleton mind, it's probably in half a dozen of files anyway. this source code i've written is REALLY GOOD (if i do say so myself). i'm sure!! just because I THINK SO, BECAUSE I ACTUALLY HAD A HARD TIME WRAPPING MY HEAD AROUND IT AND THE TYPICAL SNIPPETS OF CODES PROVIDED BY INSTRUCTORS ARE NO BETTER....

    seriously, i mean, i've seen that guy's posted code a few weeks ago and i can understand it and it's nothing [pfft .... air].


    me (this is me talking now): hey, kid, just because you are ENTERTAINED by your own "skills" of using dereferencing (more effectively than c programming students can, anyways) and mighty proud of the mind-numbing get/set nature of the library -- that's all too common in the industry. yah ain't gonna impress.

  3. #3
    Banned
    Join Date
    May 2009
    Posts
    37
    A LOT of things that you are so proud of are unoptimized (which is basically universal in ALL machines using c). i'll give 1 CONCRETE example:

    int16_t * flagsp = (uint16_t *)&buf[gwser_header_flags_offset];



    should be:

    uint16_t * flagsp = (uint16_t *)(buf + gwser_header_flags_offset);
    /* unless you're basing your arithmetic/conditional branches on SIGNED values, there is almost no reason to declare the pointed-to type as signed */

    of course this is just trivial and i'm expecting a "[gaaaaassppp!!!! look, loook, here!! see i knew it he's a beginner trying to fool us he actually knows anything! see? the only ones he talks about are those basic ones!!"

  4. #4
    Registered User
    Join Date
    Jul 2009
    Posts
    35
    hey renzokuken01,

    Sorry if I came off like I was trying to impress. There's always something in C that I'm trying to understand better. In this case, I don't quite understand how to make sure that the streams of data will be portable to another machine in cases where type sizes might be different.

    So I'm not perfect - that's why I was asking for help.

    And thanks for pointing out that optimization. If you see anything else please point it out.

    Thanks!

  5. #5
    tabstop - and the Hat of Guessing
    Join Date
    Nov 2007
    Posts
    14,336
    Without actually reading the code, just the question:

    IF uint16_t exists on a system, it is guaranteed to be a 16-bit integer. If for some reason a system doesn't have a 16-bit data type, then uint16_t won't exist on that system.

    If you keep reading through <stdint.h>, you'll come across uint_least16_t, which is the smallest unsigned type on that system that can hold 16 bits. You are guaranteed 8-, 16-, 32-, and 64-bit versions of this type.
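    To make that distinction concrete, here's a tiny compilable illustration (the typedef names are arbitrary, just for the demo):

```c
#include <stdint.h>
#include <limits.h>

/* uint16_t, where it exists, is exactly 16 bits wide with no padding bits.
   uint_least16_t always exists in C99 and is at least 16 bits wide. */
typedef uint16_t exact16;
typedef uint_least16_t least16;
```

    On a typical 8-bit-char machine both are two bytes; on an exotic machine without an exact 16-bit type, only the `least16` line would compile.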

  6. #6
    Registered User
    Join Date
    Jul 2009
    Posts
    35
    Quote Originally Posted by tabstop View Post
    Without actually reading the code, just the question:

    IF uint16_t exists on a system, it is guaranteed to be a 16-bit integer. If for some reason a system doesn't have a 16-bit data type, then uint16_t won't exist on that system.

    If you keep reading through <stdint.h>, you'll come across uint_least16_t, which is the smallest unsigned type on that system that can hold 16 bits. You are guaranteed 8-, 16-, 32-, and 64-bit versions of this type.
    Thanks - that's kind of what I was thinking, but I always run into little snippets that say no machine is guaranteed to have types of the "standard" size - of course, I run into that in books from the '80s or early '90s. So I would imagine it's different now - and with these types in the C99 standard I should be ok.

  7. #7
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by beheadedmyway View Post
    So I would imagine it's different now - and with these types in the C99 standard I should be ok.
    You should not assume that.

    The other system you're trying to talk to might not be running C or any language with compatible types. I can't find the reference right at the moment, but a while back I blundered into a website for a supercomputer where the text standard was 32-bit Unicode...

    Never assume anything...

  8. #8
    Registered User
    Join Date
    Jul 2009
    Posts
    35
    Quote Originally Posted by CommonTater View Post
    You should not assume that.

    The other system you're trying to talk to might not be running C or any language with compatible types. I can't find the reference right at the moment, but a while back I blundered into a website for a supercomputer where the text standard was 32-bit Unicode...

    Never assume anything...
    Thanks. Yeah, that's what I'm always thinking in the back of my mind. But I can never find any concrete references on dealing with these types of situations.

  9. #9
    tabstop - and the Hat of Guessing
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by beheadedmyway View Post
    Thanks. Yeah, that's what I'm always thinking in the back of my mind. But I can never find any concrete references on dealing with these types of situations.
    If there's no C compiler on the target architecture and no cross-compiler that targets that architecture, then this is all irrelevant, because there's no way to run this program that we're writing - so why are we even writing it?

    If there is a C compiler on the target or a cross-compiler that targets the architecture, then that compiler will know what the "right" things are and will choose them (assuming we have a C99 compiler, which we very likely will). If the system has 32-bit chars, then uint_least32_t will be unsigned char -- but that doesn't really matter to us; all we care about is that it has 32 bits.

    If you're doing text and you want to be sure how big your chars are for that reason, there is CHAR_BIT. This would only come up if we want the output from one machine to carry over as the input to another machine, though.

  10. #10
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by beheadedmyway View Post
    Thanks. Yeah, that's what I'm always thinking in the back of my mind. But I can never find any concrete references on dealing with these types of situations.
    For communications to work, there has to be an understanding between both ends... and thus the programmers at both ends... of exactly what should be passed back and forth and what it means.

    Think of it as a mini-language... You have to define each word individually. On your end you may be sending a uint16_t... but on a different OS at the other end they might be receiving text... it doesn't matter so long as the two are compatible enough to recover the data.

    When programming network sequences I generally packetize with guard values between data fields. The guard value is something normally impossible in your communications: for example, -127 between character strings. So basically you end up with data-guard-data-guard etc. If you hit a guard value while decoding the packet you know you're off by at least one, and can often reparse to the previous guard and recover the packet. It's not perfect, but it does improve reliability.

    The important thing is that both ends need to know exactly what the other is sending.
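    A rough sketch of that data-guard-data-guard layout (not CommonTater's actual code - the function and packet layout here are invented for illustration):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* 0x81 is -127 as a signed byte -- a value assumed never to appear
   inside ordinary character-string fields, so it can act as a guard. */
#define GUARD 0x81u

/* Append one field followed by a guard byte; returns the new write
   position. Sketch only: assumes the field data never contains GUARD
   and that the caller's buffer is big enough. */
static size_t put_field(uint8_t *pkt, size_t pos,
                        const void *data, size_t len)
{
    memcpy(pkt + pos, data, len);
    pos += len;
    pkt[pos++] = GUARD;
    return pos;
}
```

    Building a packet is then just repeated calls to put_field; a decoder that lands on a guard byte mid-field knows it has lost sync and can scan back to the previous guard.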

  11. #11
    Registered User
    Join Date
    Jul 2009
    Posts
    35
    Quote Originally Posted by CommonTater View Post
    For communications to work, there has to be an understanding between both ends... and thus the programmers at both ends... of exactly what should be passed back and forth and what it means.

    Think of it as a mini-language... You have to define each word individually. On your end you may be sending a uint16_t... but on a different OS at the other end they might be receiving text... it doesn't matter so long as the two are compatible enough to recover the data.

    When programming network sequences I generally packetize with guard values between data fields. The guard value is something normally impossible in your communications: for example, -127 between character strings. So basically you end up with data-guard-data-guard etc. If you hit a guard value while decoding the packet you know you're off by at least one, and can often reparse to the previous guard and recover the packet. It's not perfect, but it does improve reliability.

    The important thing is that both ends need to know exactly what the other is sending.
    I see what you mean for network programming - it's up to both programmers to ensure that, for example, if some piece of data in the stream is 32 bits, the other programmer makes sure to read 32 bits.

    I guess my ultimate question is more related to passing back and forth files that are binary streams like this. If I save a file that gets emailed to some random computer that happens to use my serialization library for deserialization - but the compiler that compiled the library had a different size for a type - how can I account for that, either at runtime or compile time?

    An example I can think of is a machine that has a 32-bit long, and a different machine that has a 64-bit long. If on the source machine I write a long to a stream, it outputs 32 bits. But on the destination machine the deserializer goes to read a long, and the size of it on that machine is 64 bits, so the deserializer will eat up 4 extra bytes that it shouldn't. What's the best way to protect against those scenarios?

    Thanks!

  12. #12
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by beheadedmyway View Post
    An example I can think of is a machine that has a 32-bit long, and a different machine that has a 64-bit long. If on the source machine I write a long to a stream, it outputs 32 bits. But on the destination machine the deserializer goes to read a long, and the size of it on that machine is 64 bits, so the deserializer will eat up 4 extra bytes that it shouldn't. What's the best way to protect against those scenarios?

    Thanks!
    By being exactly specific about your protocol.

    Really... right down to defining the exact position of important items in each packet...

    In your example where you are sending a file... for example... he would request the file, you would respond with the file size in bytes, he would say Yay or Nay... and from there it's up to him.

    In this case the internal sizes of the systems and the variables in play in code are far less important, since files are generally manipulated whole-cloth at this level. Where it does get important is, for example, a buffer with a 32-bit value that must be correctly received (let's say it's a passkey). This is where the size of variables is critical... If you expect 32 bits, then he'd better send 32 bits... and that's where you have to communicate very precisely with other programmers.

    In case you haven't seen it yet, Internet protocols are defined in the precise way I've been describing... It just couldn't be done any other way.

    RFC Sourcebook

  13. #13
    Registered User
    Join Date
    Jul 2009
    Posts
    35
    I guess maybe I should change my question slightly, or maybe I'm not understanding you correctly. I'm not concerned about sending it. What I'm concerned about is that when the source code is built on different computers, the types I'm expecting are the right sizes.

    Going back to my previous example of using the long, say I have two functions:

    void write_long(serializer,long);
    long read_long(deserializer);

    Let's say on my computer a long is 32 bits; I write one long to the serializer, then save it to a file.
    I give that file to someone who has my C library and is going to use it to deserialize the file.
    On their computer a long is 64 bits, and read_long would read too many bytes when parsing the file I gave them.

    This is more along the lines of what I'm trying to understand: how do you ensure that your library, when compiled on another machine, uses the right sizes? Does that make more sense in terms of the question I'm trying to understand?

    Thanks!
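    One common way out - a sketch, not gwser's actual API, and the function names here are hypothetical - is to pick a fixed on-the-wire width (say, 64 bits for anything that comes from a long) and copy the bytes explicitly, so sizeof(long) on either machine never enters into it:

```c
#include <stdint.h>

/* Serialize a value as exactly 8 bytes, low byte first (little-endian),
   regardless of how wide 'long' happens to be on this machine. */
static void write_i64(uint8_t out[8], int64_t v)
{
    uint64_t u = (uint64_t)v;
    int i;
    for (i = 0; i < 8; i++)
        out[i] = (uint8_t)(u >> (8 * i));
}

/* Read back exactly 8 bytes and reassemble the value. */
static int64_t read_i64(const uint8_t in[8])
{
    uint64_t u = 0;
    int i;
    for (i = 0; i < 8; i++)
        u |= (uint64_t)in[i] << (8 * i);
    return (int64_t)u;
}
```

    write_long could then be a thin wrapper that converts its long argument to int64_t before serializing; the reader always consumes exactly 8 bytes no matter how wide long is locally.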

  14. #14
    Registered User
    Join Date
    Jul 2009
    Posts
    35
    And I think tabstop might have answered my question earlier: if a machine has uint8_t, uint16_t, etc., each is guaranteed to be exactly that size.

  15. #15
    brewbuck - Officially An Architect
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by beheadedmyway View Post
    And I think tabstop might have answered my question earlier: if a machine has uint8_t, uint16_t, etc., each is guaranteed to be exactly that size.
    I think you've basically figured out the gist of it, which is that you cannot rely on the native C data types for this purpose; you need to carefully control the width of the types you are using.

    Another thing that hasn't come up yet is different endian conventions between machines. Intel processors are little-endian, most of the rest of the world is big-endian. So you need to also define what the byte ordering will be in your serialized streams.

    Despite the fact that most consumer hardware is Intel and therefore little-endian, it's common practice to use big-endian in situations where portability is an issue. It doesn't really matter, but you need to decide on one convention and stick to it.
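    A minimal sketch of sticking to one convention (big-endian here), assuming the stream is written and read a byte at a time:

```c
#include <stdint.h>

/* Pack a 32-bit value in big-endian ("network") byte order.
   Shifting bytes out explicitly means the host's own endianness
   never affects the stream layout. */
static void put_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Unpack a big-endian 32-bit value from the stream. */
static uint32_t get_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

    Because both sides agree on the byte order of the stream rather than on any in-memory representation, a little-endian and a big-endian machine can exchange these four bytes without ever knowing about each other.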
