Thread: c serializer - approaches to cross machine support

  1. #1
    Registered User
    Join Date
    Jul 2009
    Posts
    35

    c serializer - approaches to cross machine support

    Hey All!

    I've been working on a really, really nice C serializer that's just about done. I slightly re-invented the wheel because I didn't like my options, and they didn't have some features I needed.

    Anyway, what I'm trying to wrap my head around now is dealing with cases where the size of a type on the source (serializing) machine is larger or smaller than the same type on another machine. And actually - I can't really think of a case where this could be a problem, because of the way I've implemented the interfaces - I use all standard types: uint8_t, uint16_t, etc., which in theory should guarantee that the type is n bits, right?

    If you wouldn't mind taking a look at some code that would be awesome.

    header: https://github.com/beheadedmyway/gwl...er/src/gwser.h
    source: https://github.com/beheadedmyway/gwl...er/src/gwser.c
    example: https://github.com/beheadedmyway/gwl...calars/test1.c
    example: https://github.com/beheadedmyway/gwl..._keyed/test1.c

    I was thinking of storing type size information in the serialized stream, but I think a better way would be to define my own internal types that I use instead of the uint8_t types - so that I can use macros to check the available type sizes and redefine my own internal types to make sure that they're big enough... meaning something like this:

    (pseudocode)

    #if UINT8_MAX == XXX
    typedef uint8_t gw_uint8_t;
    #endif

    I'm thinking if I do this, then when it's built on another machine I can control what gets used in the typedefs to make sure the sizes are at least n bits. Or maybe I don't even need to worry about this now? Does using the standard uint8_t types alleviate you from having to do this?
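    For what it's worth, here's a compilable sketch of that idea (the gw_ names are mine, purely for illustration; this assumes a C99 `<stdint.h>`). Note that an undefined macro evaluates to 0 inside `#if`, so a missing `UINT8_MAX` safely selects the fallback:

```c
#include <stdint.h>
#include <limits.h>

/* If the exact-width type exists, use it; otherwise fall back to the
   least-width type, which C99 guarantees on every implementation and
   which is at least n bits wide. */
#if UINT8_MAX == 0xFF
typedef uint8_t gw_uint8_t;
#else
typedef uint_least8_t gw_uint8_t;   /* at least 8 bits, always present in C99 */
#endif

#if UINT16_MAX == 0xFFFF
typedef uint16_t gw_uint16_t;
#else
typedef uint_least16_t gw_uint16_t; /* at least 16 bits, always present in C99 */
#endif
```

    In practice the `#else` branches rarely fire, since an implementation that lacks an exact-width type simply doesn't define the corresponding `_MAX` macro at all.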

    thanks for any help or suggestions!

  2. #2
    Banned
    Join Date
    May 2009
    Posts
    37
    beheadedmyway: [shakes head] i don't think he gets it! look, here i've demonstrated that ANY ordinary project will have lines of codes in the thousands!:
    https://github.com/beheadedmyway/gwl...er/src/gwser.c

    and judging by this character's simpleton mind, it's probably in half a dozen of files anyway. this source code i've written is REALLY GOOD (if i do say so myself). i'm sure!! just because I THINK SO, BECAUSE I ACTUALLY HAD A HARD TIME WRAPPING MY HEAD AROUND IT AND THE TYPICAL SNIPPETS OF CODES PROVIDED BY INSTRUCTORS ARE NO BETTER....

    seriously, i mean, i've seen that guy's posted code a few weeks ago and i can understand it and it's nothing [pfft .... air].


    me (this is me talking now): hey, kid, just because you are ENTERTAINED by your own "skills" of using dereferencing (more effectively than c programming students can, anyways) and mighty proud of the mind-numbing get/set nature of the library -- that's all too common in the industry. yah ain't gonna impress.

  3. #3
    Banned
    Join Date
    May 2009
    Posts
    37
    A LOT of things that you are so proud of are unoptimized (which is basically universal in ALL machines using c). i'll give 1 CONCRETE example:

    int16_t * flagsp = (uint16_t *)&buf[gwser_header_flags_offset];



    should be:

    uint16_t * flagsp = (uint16_t *)(buf + gwser_header_flags_offset);
    /* unless you're basing your arithmetic/conditional branches on SIGNED values, there is almost no reason to declare the pointed-to type as signed */

    of course this is just trivial and i'm expecting a "[gaaaaassppp!!!! look, loook, here!! see i knew it he's a beginner trying to fool us he actually knows anything! see? the only ones he talks about are those basic ones!!"

  4. #4
    Registered User
    Join Date
    Jul 2009
    Posts
    35
    hey renzokuken01,

    Sorry if I came off like I was trying to impress. There's always something in C that I'm trying to understand better. In this case, I don't quite understand how to make sure that the streams of data will be portable to another machine in cases where type sizes might be different.

    So I'm not perfect - that's why I was asking for help.

    And thanks for pointing out that optimization. If you see anything else please point it out.

    Thanks!

  5. #5
    tabstop - and the Hat of Guessing
    Join Date
    Nov 2007
    Posts
    14,336
    Without actually reading the code, just the question:

    IF uint16_t exists on a system, it is guaranteed to be a 16-bit integer. If for some reason a system doesn't have a 16-bit data type, then uint16_t won't exist on that system.

    If you keep reading through <stdint.h>, you'll come across uint_least16_t, which is the smallest unsigned type on that system that can hold 16 bits. You are guaranteed 8-, 16-, 32-, and 64-bit versions of this type.
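    To make that distinction concrete, here's a tiny compilable illustration (the typedef names are arbitrary, just for the demo):

```c
#include <stdint.h>
#include <limits.h>

/* uint16_t, where it exists, is exactly 16 bits wide with no padding bits.
   uint_least16_t always exists in C99 and is at least 16 bits wide. */
typedef uint16_t exact16;
typedef uint_least16_t least16;
```

    On a typical 8-bit-char machine both are two bytes; on an exotic machine without an exact 16-bit type, only the `least16` line would compile.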

  6. #6
    Registered User
    Join Date
    Jul 2009
    Posts
    35
    Quote Originally Posted by tabstop View Post
    Without actually reading the code, just the question:

    IF uint16_t exists on a system, it is guaranteed to be a 16-bit integer. If for some reason a system doesn't have a 16-bit data type, then uint16_t won't exist on that system.

    If you keep reading through <stdint.h>, you'll come across uint_least16_t, which is the smallest unsigned type on that system that can hold 16 bits. You are guaranteed 8-, 16-, 32-, and 64-bit versions of this type.
    Thanks - that's kind of what I was thinking, but I always run into little snippets that say no machine is guaranteed to have types of the "standard" size - of course, I run into that in books from the '80s or early '90s. So I would imagine it's different now - and with these types in the C99 standard I should be ok.

  7. #7
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by beheadedmyway View Post
    So I would imagine it's different now - and with these types in the C99 standard I should be ok.
    You should not assume that.

    The other system you're trying to talk to might not be running C or any language with compatible types. I can't find the reference right at the moment, but a while back I blundered into a website for a supercomputer where the text standard was 32-bit Unicode...

    Never assume anything...

  8. #8
    Registered User
    Join Date
    Jul 2009
    Posts
    35
    Quote Originally Posted by CommonTater View Post
    You should not assume that.

    The other system you're trying to talk to might not be running C or any language with compatible types. I can't find the reference right at the moment, but a while back I blundered into a website for a supercomputer where the text standard was 32-bit Unicode...

    Never assume anything...
    Thanks. Yeah, that's what I'm always thinking in the back of my mind. But I can never find any concrete references on dealing with these types of situations.

  9. #9
    tabstop - and the Hat of Guessing
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by beheadedmyway View Post
    Thanks. Yeah, that's what I'm always thinking in the back of my mind. But I can never find any concrete references on dealing with these types of situations.
    If there's no C compiler on the target architecture and no cross-compiler that targets that architecture, then this is all irrelevant, because there's no way to run this program that we're writing - so why are we even writing it?

    If there is a C compiler on the target or a cross-compiler that targets the architecture, then that compiler will know what the "right" things are and will choose them (assuming we have a C99 compiler, which we very likely will). If the system has 32-bit chars, then uint_least32_t will be unsigned char -- but that doesn't really matter to us; all we care about is that it has 32 bits.

    If you're doing text and you want to be sure how big your chars are for that reason, there is CHAR_BIT. This would only come up if we want the output from one machine to carry over as the input to another machine, though.

  10. #10
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by beheadedmyway View Post
    Thanks. Yeah, that's what I'm always thinking in the back of my mind. But I can never find any concrete references on dealing with these types of situations.
    For communications to work, there has to be an understanding between both ends... and thus the programmers at both ends... of exactly what should be passed back and forth and what it means.

    Think of it as a mini-language... You have to define each word individually. On your end you may be sending a uint16_t... but on a different OS at the other end they might be receiving text... it doesn't matter so long as the two are compatible enough to recover the data.

    When programming network sequences I generally packetize with guard values between data fields. The guard value is something normally impossible in your communications: for example, -127 between character strings. So basically you end up with data-guard-data-guard etc. If you hit a guard value while decoding the packet you know you're off by at least one, and can often reparse to the previous guard and recover the packet. It's not perfect, but it does improve reliability.

    The important thing is that both ends need to know exactly what the other is sending.
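    A rough sketch of that data-guard-data-guard layout (not CommonTater's actual code - the function and packet layout here are invented for illustration):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* 0x81 is -127 as a signed byte -- a value assumed never to appear
   inside ordinary character-string fields, so it can act as a guard. */
#define GUARD 0x81u

/* Append one field followed by a guard byte; returns the new write
   position. Sketch only: assumes the field data never contains GUARD
   and that the caller's buffer is big enough. */
static size_t put_field(uint8_t *pkt, size_t pos,
                        const void *data, size_t len)
{
    memcpy(pkt + pos, data, len);
    pos += len;
    pkt[pos++] = GUARD;
    return pos;
}
```

    Building a packet is then just repeated calls to put_field; a decoder that lands on a guard byte mid-field knows it has lost sync and can scan back to the previous guard.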

  11. #11
    Registered User
    Join Date
    Jul 2009
    Posts
    35
    Quote Originally Posted by CommonTater View Post
    For communications to work, there has to be an understanding between both ends... and thus the programmers at both ends... of exactly what should be passed back and forth and what it means.

    Think of it as a mini-language... You have to define each word individually. On your end you may be sending a uint16_t... but on a different OS at the other end they might be receiving text... it doesn't matter so long as the two are compatible enough to recover the data.

    When programming network sequences I generally packetize with guard values between data fields. The guard value is something normally impossible in your communications: for example, -127 between character strings. So basically you end up with data-guard-data-guard etc. If you hit a guard value while decoding the packet you know you're off by at least one, and can often reparse to the previous guard and recover the packet. It's not perfect, but it does improve reliability.

    The important thing is that both ends need to know exactly what the other is sending.
    I see what you mean for network programming - it's up to both programmers to ensure that, for example, if some piece of data in the stream is 32 bits, the other programmer makes sure to read 32 bits.

    I guess my ultimate question is more related to passing back and forth files that are binary streams like this. If I save a file that gets emailed to some random computer that happens to use my serialization library for deserialization - but the compiler that compiled the library had a different size for a type - how can I account for that, either at runtime or compile time?

    An example I can think of is a machine that has a 32-bit long, and a different machine that has a 64-bit long. If on the source machine I write a long to a stream, it outputs 32 bits. But on the destination machine the deserializer goes to read a long, and the size of it on that machine is 64 bits, so the deserializer will eat up 4 extra bytes that it shouldn't. What's the best way to protect against those scenarios?

    Thanks!

  12. #12
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by beheadedmyway View Post
    An example I can think of is a machine that has a 32-bit long, and a different machine that has a 64-bit long. If on the source machine I write a long to a stream, it outputs 32 bits. But on the destination machine the deserializer goes to read a long, and the size of it on that machine is 64 bits, so the deserializer will eat up 4 extra bytes that it shouldn't. What's the best way to protect against those scenarios?

    Thanks!
    By being exactly specific about your protocol.

    Really... right down to defining the exact position of important items in each packet...

    In your example where you are sending a file... for example... he would request the file, you would respond with the file size in bytes, he would say Yay or Nay... and from there it's up to him.

    In this case the internal sizes of the systems and the variables in play in code are far less important, since files are generally manipulated whole-cloth at this level. Where it does get important is, for example, a buffer with a 32-bit value that must be correctly received (let's say it's a passkey). This is where the size of variables is critical... If you expect 32 bits, then he'd better send 32 bits... and that's where you have to communicate very precisely with other programmers.

    In case you haven't seen it yet, Internet protocols are defined in the precise way I've been describing... It just couldn't be done any other way.

    RFC Sourcebook

  13. #13
    Registered User
    Join Date
    Jul 2009
    Posts
    35
    I guess maybe I should change my question slightly, or maybe I'm not understanding you correctly. I'm not concerned about sending it. What I'm concerned about is that when the source code is built on different computers, the types I'm expecting are the right sizes.

    Going back to my previous example of using the long, say I have two functions:

    void write_long(serializer,long);
    long read_long(deserializer);

    Let's say on my computer a long is 32 bits; I write one long to the serializer, then save it to a file.
    I give that file to someone who has my C library and is going to use it to deserialize the file.
    On their computer a long is 64 bits, and read_long would read too many bytes when parsing the file I gave them.

    This is more along the lines of what I'm trying to understand: how do you ensure that your library, when compiled on another machine, uses the right sizes? Does that make more sense in terms of the question I'm trying to understand?

    Thanks!
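    One common way out - a sketch, not gwser's actual API, and the function names here are hypothetical - is to pick a fixed on-the-wire width (say, 64 bits for anything that comes from a long) and copy the bytes explicitly, so sizeof(long) on either machine never enters into it:

```c
#include <stdint.h>

/* Serialize a value as exactly 8 bytes, low byte first (little-endian),
   regardless of how wide 'long' happens to be on this machine. */
static void write_i64(uint8_t out[8], int64_t v)
{
    uint64_t u = (uint64_t)v;
    int i;
    for (i = 0; i < 8; i++)
        out[i] = (uint8_t)(u >> (8 * i));
}

/* Read back exactly 8 bytes and reassemble the value. */
static int64_t read_i64(const uint8_t in[8])
{
    uint64_t u = 0;
    int i;
    for (i = 0; i < 8; i++)
        u |= (uint64_t)in[i] << (8 * i);
    return (int64_t)u;
}
```

    write_long could then be a thin wrapper that converts its long argument to int64_t before serializing; the reader always consumes exactly 8 bytes no matter how wide long is locally.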

  14. #14
    Registered User
    Join Date
    Jul 2009
    Posts
    35
    And I think tabstop might have answered my question earlier: if a machine has uint8_t, uint16_t, etc., each is guaranteed to be exactly that size.

  15. #15
    brewbuck - Officially An Architect
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by beheadedmyway View Post
    And I think tabstop might have answered my question earlier: if a machine has uint8_t, uint16_t, etc., each is guaranteed to be exactly that size.
    I think you've basically figured out the gist of it, which is that you cannot rely on the native C data types for this purpose; you need to carefully control the width of the types you are using.

    Another thing that hasn't come up yet is different endian conventions between machines. Intel processors are little-endian, most of the rest of the world is big-endian. So you need to also define what the byte ordering will be in your serialized streams.

    Despite the fact that most consumer hardware is Intel and therefore little-endian, it's common practice to use big-endian in situations where portability is an issue. It doesn't really matter, but you need to decide on one convention and stick to it.
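    A minimal sketch of sticking to one convention (big-endian here), assuming the stream is written and read a byte at a time:

```c
#include <stdint.h>

/* Pack a 32-bit value in big-endian ("network") byte order.
   Shifting bytes out explicitly means the host's own endianness
   never affects the stream layout. */
static void put_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Unpack a big-endian 32-bit value from the stream. */
static uint32_t get_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

    Because both sides agree on the byte order of the stream rather than on any in-memory representation, a little-endian and a big-endian machine can exchange these four bytes without ever knowing about each other.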
