Thread: Save File Organization Question

  1. #1
    Registered User
    Join Date
    May 2011
    Posts
    3

    Save File Organization Question

    I've recently started to play with the idea of coding for more than just a passing hobby and one of the first problems that I ran into was what to do with save files. I understand how to write information to disk, but where I get lost is on how that information is typically organized and identified for retrieval later. Is every line given some sort of a tag so it can be searched and assigned to a variable when it's read after the program starts up? Does the program just read everything on line 1 and assign that to some variable?

    I guess what I'm looking for here is whether there are any standardized ways to handle reading and writing from disk in an organized and efficient way. I did search the forums, but most of what I found dealt with specific problems; I'm looking for more general knowledge. If someone could point me to a link or give me an idea of what to search for, I would be grateful.

    Also, in case it helps, the file I am playing with would have to store a very large number of different variables from many different objects (many thousands of objects).

    Thanks for any help you can give.

  2. #2
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    A lot of games, and not a few businesses, use record (struct) based filing systems...
    The basic idea is that you organize blocks of data as structs and then just write the struct itself to disk... memory to disk... disk to memory. No translation or intermediate steps needed. In larger storage systems, files are made up of lots and lots of structs (millions in some cases) and you can access them very rapidly using two common techniques, "Binary Search" and "Random Access". With Random Access you know the size of the struct, so if you want the 10,000th one you can seek straight to it and read it in... blink of an eye, really. When you don't know which struct you need, Binary Search lets you find 1 of a thousand in only 10 tries, one of 2000 in only 11... even if it's the last record in the file... not quite the blink of an eye, but surprisingly fast.
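
    Here's a quick sketch of the idea... the struct and field names are just made up for illustration, and it assumes every record is the same fixed size and the file is sorted on the key:
    Code:
    #include <cstdio>
    #include <cstdint>

    // One fixed-size record -- every record in the file is exactly this big.
    struct Record
    {
        uint32_t id;        // the key the file is sorted on
        char     name[32];
        int32_t  quantity;
    };

    // Random Access: jump straight to record number 'index' (0 based).
    bool read_record(FILE *fp, long index, Record &out)
    {
        if (fseek(fp, index * (long)sizeof(Record), SEEK_SET) != 0)
            return false;
        return fread(&out, sizeof(Record), 1, fp) == 1;
    }

    // Binary Search: find a record by id in a file sorted on id.
    // record_count is the file size divided by sizeof(Record).
    bool find_by_id(FILE *fp, long record_count, uint32_t id, Record &out)
    {
        long lo = 0, hi = record_count - 1;
        while (lo <= hi)
        {
            long mid = lo + (hi - lo) / 2;
            if (!read_record(fp, mid, out))
                return false;
            if (out.id == id)
                return true;
            if (out.id < id)
                lo = mid + 1;
            else
                hi = mid - 1;
        }
        return false;
    }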

    Do some googling on these techniques; it's a fascinating read.

  3. #3
    Registered User
    Join Date
    May 2011
    Posts
    3

    Thanks

    Thanks for the info, this gives me a good place to start. I had an inkling that adding tags to every line was a terribly inefficient way to go about saving and loading stuff.

  4. #4
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Zafaron View Post
    Thanks for the info, this gives me a good place to start. I had an inkling that adding tags to every line was a terribly inefficient way to go about saving and loading stuff.
    There are situations where it's the right way to do it...
    For example... the Random Access method works perfectly when you have the exact same information in large numbers of structs (i.e. they're all the same)... say, an inventory program tracking 100,000 items.

    However, this system ain't worth crap if you can't standardize the records. When blocks are of differing sizes or you have huge amounts of non-repeating data to store, Random Access ain't gonna do it for ya... That's when you break out the "Formatted Text File" method (look at any Windows .ini file) and start reading sequentially.
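
    For illustration, here's a bare-bones sequential reader for that kind of key=value text file (the names are invented, and it skips sections and other real .ini features):
    Code:
    #include <fstream>
    #include <map>
    #include <string>

    // Read "key=value" lines into a map, skipping blanks and ';' comments.
    std::map<std::string, std::string> load_settings(const char *path)
    {
        std::map<std::string, std::string> settings;
        std::ifstream in(path);
        std::string line;
        while (std::getline(in, line))
        {
            if (line.empty() || line[0] == ';')
                continue;                        // blank line or comment
            std::string::size_type eq = line.find('=');
            if (eq == std::string::npos)
                continue;                        // not a key=value line
            settings[line.substr(0, eq)] = line.substr(eq + 1);
        }
        return settings;
    }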

    It really depends on what you're storing, how standardized it is and how many repetitions you have... You may end up with a sequential file of 30,000 variables that are otherwise unrelated or a Random Access file of 10 identical structs that are clearly related...

    It's about analysing your needs, doing the research and deciding the best course...
    No professional would settle for less.

  5. #5
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    A lot of games, and not a few businesses, use record (struct) based filing systems...The basic idea is that you organize blocks of data as structs and then just write the struct itself to disk... memory to disk... disk to memory. No translation or intermediate steps needed.
    Boy do I get tired of hearing that garbage repeated.

    No, dude, as I've said, in the real world it is almost always more complicated than dumping whatever binary chunk the compiler gives you to a file; even when a simple flat binary is used in the real world, a specific endianness, packing, alignment, size, and format (such as IEEE floating point or some other form) are expected. No two compilers, or even one compiler with different options used at compile time, will agree on all of those details.
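
    Taking control means writing each field a byte at a time, in an order you chose, so the compiler's in-memory layout never reaches the file. A minimal sketch of the idea (the function names are mine, for illustration only):
    Code:
    #include <cstdint>
    #include <cstdio>

    // Write a 32 bit value as exactly four bytes, little-endian,
    // regardless of host byte order, padding, or struct layout.
    void write_u32_le(FILE *fp, uint32_t v)
    {
        unsigned char b[4];
        b[0] = (unsigned char)( v        & 0xFF);
        b[1] = (unsigned char)((v >> 8)  & 0xFF);
        b[2] = (unsigned char)((v >> 16) & 0xFF);
        b[3] = (unsigned char)((v >> 24) & 0xFF);
        fwrite(b, 1, 4, fp);
    }

    // Read it back the same way, byte by byte.
    uint32_t read_u32_le(FILE *fp)
    {
        unsigned char b[4] = {0, 0, 0, 0};
        if (fread(b, 1, 4, fp) != 4)
            return 0; // real code would report the error
        return (uint32_t)b[0]
             | ((uint32_t)b[1] << 8)
             | ((uint32_t)b[2] << 16)
             | ((uint32_t)b[3] << 24);
    }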

    [Edit]
    Oh, and as this is the C++ side of things, the number of things that compiler vendors can and do change between their different products to make the "simply dump all structures to a disk" route fail is significantly larger.
    [/Edit]

    It can work as long as you limit yourself to a single platform, provide tools to convert the file, or treat the given file only as a temporary cache; otherwise you are setting yourself up for failure.

    [Edit]
    Oh, I should have made a mention of this:

    It's about analysing your needs, doing the research and deciding the best course...
    No professional would settle for less.
    This bit though, is simply excellent advice.

    There is no universal standard that will match every need.

    If you want advice on any particular data, you'll have to discuss what you have.

    [/Edit]

    Soma
    Last edited by phantomotap; 05-11-2011 at 07:29 PM. Reason: none of your business

  6. #6
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by phantomotap View Post
    even when a simple flat binary is used in the real world, a specific endianness, packing, alignment, size, and format (such as IEEE floating points or some other form), is expected. No two compilers, or even one compiler with different options used during compile time, will agree on all of those details.
    I think I'm confused. Do companies often change compilers between printing copy number 50000 and 50001 of a production run?

  7. #7
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Do companies have data around for years? Yes.
    Do companies upgrade their development environment? Yes.
    Do companies often change compilers between printing copy number 50000 and 50001 of a production run? No.

    But then, that is irrelevant if they upgrade after the original production run only to ship a new product built from the same source with a different compiler that is still supposed to behave correctly in the face of data written by the original product.

    *gasp*

    By simply dumping the binary chunk the compiler handed you into a file you have no control over how it is laid out. That chunk can change in any number of ways between compiler versions, compiler vendors, different compiler flags, different operating systems, different hardware profiles, and pretty much any other reason the compiler vendor may choose to change an internal mechanism.

    Soma

  8. #8
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by phantomotap View Post
    But then, that is irrelevant if they upgrade after the original production run only to ship a new product built from the same source with a different compiler that is still supposed to behave correctly in the face of data written by the original product.
    I think I'm confused. Has that ever happened? (I mean, I'm cynical enough to believe that it's happened, and that a company would release a product without even testing it once, but I wouldn't expect it.)

  9. #9
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    I think I'm confused.
    I think you are.

    I'm sure that it has happened as an isolated occurrence, but I doubt it is a common one. My problem isn't that you couldn't trap the case of a new version of a product failing to manipulate data generated by an old version; it's that the product vendor never had any control over the format of that data in the first place. Sure, you absolutely could, and must if old data has to be handled, take the effort in the new version to go through discovery of how that given compiler from that vendor with those options produces those binary characteristics, and then write code to handle those exact characteristics.

    That situation just doesn't happen in the real world. (It happens, but it is an extreme outlier.) The developer will either specify the expected format and write code to handle that format (very definitely not just dumping whatever binary chunk the compiler hands you into a file), or the old data will be marked "outdated" and no effort will be made to support old data versions. Both of these happen.

    The point of my concern is dishing out such horrible advice as "just dump the memory out to a file" without at least explaining the absolute, undeniable fact that you then have no control over the binary layout, its stability, or its portability.

    Soma

  10. #10
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by phantomotap View Post
    Boy do I get tired of hearing that garbage repeated.
    And boy do I ever get tired of you repeatedly harping along about how it's somehow a bad thing.

    It's done all the time!

    Give it a rest, ok?

  11. #11
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by phantomotap View Post
    By simply dumping the binary chunk the compiler handed you into a file you have no control over how it is laid out.
    Of course you do... look up #pragma pack().... You have total control over that.

    I wrote an inventory package in Turbo Pascal just before Borland killed the language (and yes, they really did that) that was essentially a records-based random access file with a couple of indexes for secondary sorts. It still handles (yes, they still use it, in its 7th version) about 20,000 to 30,000 items in an electronic parts inventory. When I switched over to C, I rewrote the package for the company and installed it... while we did make a backup "just in case", C was perfectly able to access all the records just like nothing had happened...

    Really... you may be more experienced and smarter than me (at least you seem to think so) but about THIS, you are flat out wrong.

    EDIT: And exactly what do you think databases like SQL write? Yep, record-based storage...
    Last edited by CommonTater; 05-11-2011 at 08:41 PM.

  12. #12
    Registered User
    Join Date
    May 2011
    Posts
    3
    I did a little bit more research on the topic after reading the comments and found a link that is a good place to start for anybody else having the same sort of issues: [36] Serialization and Unserialization, C++ FAQ

    Thanks again for the insight, just having a place to start searching was a great help.

  13. #13
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Give it a rest, ok?
    Sorry, but that's just not going to happen.

    If I see this advice again, I'll say something again. I'm not going to leave some poor newbie with bad information just because you are arrogant.

    I'd hope you'd do me the same favor.

    #pragma pack().... You have total control over that.
    The `pragma' directive is compiler specific, and that particular one only controls packing. It doesn't affect endianness, the size of the data types, or the format of floating point data elements on platforms with multiple possibilities. Naturally, anyone with any sense would recommend using the standard sized types. The combination of compiler specific directives with the portable standard sized types still only gives you about one-fourth of the picture.
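
    If you doubt it, print a few of the things `pragma' never touches; a trivial check along these lines (illustrative only, and the exact numbers depend entirely on your toolchain):
    Code:
    #include <cstdio>

    int main()
    {
        // Packing directives squeeze out padding; none of these are
        // under their control, and all of them vary across compilers.
        std::printf("wchar_t     : %u bytes\n", (unsigned)sizeof(wchar_t));     // 2 on MSVC, 4 on GCC/Linux
        std::printf("long        : %u bytes\n", (unsigned)sizeof(long));        // 4 on Win64, 8 on Linux x64
        std::printf("long double : %u bytes\n", (unsigned)sizeof(long double)); // 8, 12, or 16
        return 0;
    }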

    Really... you may be more experienced and smarter than me (at least you seem to think so) but about THIS, you are flat out wrong.
    Unfortunately, what I've said about this situation are statements of fact. They are not opinion.

    Here, instead of giving bad examples of why you think you are right, why not put your money where your mouth is: try an example, actually learn something, and be able to pass that along?

    Code:
    #include <cstdint>
    #include <cstdio>

    #pragma pack(push, 1) // add packing prefix if you like
    struct some_packed_data
    {
        uint8_t  u8;         // unsigned 8 bit int
        uint16_t u16;        // unsigned 16 bit int
        uint32_t u32;        // unsigned 32 bit int
        int8_t   s8;         // signed 8 bit int
        int16_t  s16;        // signed 16 bit int
        int32_t  s32;        // signed 32 bit int
        uint8_t  u8a[8];     // unsigned 8 bit int array of 8 elements
        uint16_t u16a[8];    // unsigned 16 bit int array of 8 elements
        uint32_t u32a[8];    // unsigned 32 bit int array of 8 elements
        int8_t   s8a[8];     // signed 8 bit int array of 8 elements
        int16_t  s16a[8];    // signed 16 bit int array of 8 elements
        int32_t  s32a[8];    // signed 32 bit int array of 8 elements
        char     ascii[64];  // ASCII string, works with standard string functions
        wchar_t  wide[64];   // wide string, works with standard wide string functions
        float       f1;      // whatever happens to be native floating point type 1
        double      f2;      // whatever happens to be native floating point type 2
        long double f3;      // whatever happens to be native floating point type 3
    };
    #pragma pack(pop) // add packing postfix if you like

    // later in code, with FILE *fp opened for update
    some_packed_data data;
    fread(&data, sizeof data, 1, fp);
    // print the above structure's contents
    // randomly mutate the above structure's contents
    fwrite(&data, sizeof data, 1, fp);
    Now, compile on "MSVC2K3" for "Windows", "Intel C++ Compiler" for "Windows", "GCC" for "Windows", "GCC" for "GNU/Linux", and "Intel C++ Compiler" for "GNU/Linux".

    Now compile again with each compiler's variation of optimize-for-speed, again for size, again with whatever passes for "fast math", and again with whatever passes for "use native floating point width".

    Be amazed as you get about six different sizes and about twelve different binaries.

    And exactly what do you think data bases like sql write? Yep, records based storage...
    I sincerely hope you are joking, but just in case you are not, I'll give you an example of what a few "SQL" database engine providers' applications actually write, from my time with them in years past.

    "Oracle" serializes data as a collection of "B*Tree" suites with a variation on branching not normally seen in the wild with all the pointers having the same endianness which is a determined from a flag from earlier in a given database chunk (file). Different elemental "SQL" types can have different underlying representations depending on flags used when creating the database, but one commonality is length prefixed strings in trees created as part of an indexing operation for a given table.

    "SQLite" serializes the possible 64 bit values of all of its native "SQL" integer types as well as indexes for internal use as a variation on a length prefix string targeting space optimization meaning that a seven bit value takes up a byte while a 64 bit value takes up nine bytes.

    "MySQL" with some community patches for faster text searching in text fields using a generalization of the same algorithm "Firefox" uses for spell checking builds two index trees for an indexing operating on a table with a full text search field where on table is sorted on fixed length 64 bit integers representing "UNIQUE_KEY" and while the other uses variable length, length prefixed strings of the shortened word from the variation of the spell checking algorithm as keys.

    Strangely enough, not one of these tools simply dumps the binary chunk the compiler hands them into a file.

    Soma
    Last edited by phantomotap; 05-11-2011 at 09:02 PM. Reason: none of your business

  14. #14
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by phantomotap View Post
    Sorry, but that's just not going to happen.

    If I see this advice again, I'll say something again. I'm not going to leave some poor newbie with bad information just because you are arrogant.

    I'd hope you'd do me the same favor.
    I see... so basically you are threatening me to shut me up about something you simply do not understand.

    Back to the bozo bin you go! (This is your 4th trip in by the way...and this time you ain't coming back out.)

  15. #15
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by phantomotap View Post
    Now, compile on "MSVC2K3" for "Windows", "Intel C++ Compiler" for "Windows", "GCC" for "Windows", "GCC" for "GNU/Linux", and "Intel C++ Compiler" for "GNU/Linux".
    I don't have MSVC2K3 here, only at work, so I'll have to run the test tomorrow. (I don't have Intel C++ at all. I seem to recall that that's not free for windows, but I'll double check, and if it's free I'll get it tonight and test it too.)

    The main point I was hoping you'd get to is that, so far as I can see, with all these questions about endianness and packing etc., the native format (i.e. how the struct is already stored) is The Right Answer >95% of the time. If you're in a situation where you actually have to choose one endianness over another, you'll know it well ahead of time. So why not start there and change it only if necessary?
