Thread: Serializing classes

  1. #31
    3735928559
    Join Date
    Mar 2008
    Location
    RTP
    Posts
    838
    i have two cents here:

    if you're going to use text, use XML.

    if you're going to use binary, consider picking up SQLite.

  2. #32
    Master Apprentice phantomotap
    Join Date
    Jan 2008
    Posts
    5,108
    [Edit]
    Okay, the post was edited after I responded, but this has apparently already been seen.

    *shrug*

    I don't know.

    [/Edit]

    But to turn that back into what it was, you can't just iterate and replace like you did before -- you are going to need a state machine parser, lookaheads, etc.
    O_o

    The only context in that serialization (a simple self-escaping character scheme) is the escape character and the very next character that follows it. So, actually, you pretty much do iterate and replace exactly as you did before.

    In the great before, you scan and dump until you find a character that needs to be escaped, at which point you dump the escape character followed by the character that represents the escaped character. You essentially perform a single lookup.

    In reading it back in, you scan and dump until you find the escape character, at which point you look at the next character and dump the actual value it represents. You essentially perform a reverse lookup.

    A simple transition table, one entry per escaped character, gets you everything you need.

    This isn't something as complex as XML.
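
    A minimal sketch of the table-driven scheme described above, assuming '/' as the escape character with "/n" for a newline and "//" for a literal '/' (the example from earlier in the thread). The names EscapeEntry, kTable, escape_text and unescape_text are made up for illustration, and malformed input is not diagnosed here.

```cpp
#include <cstddef>
#include <string>

// Illustrative table: one entry per character that needs escaping.
struct EscapeEntry { char raw; char code; };
const char kEscape = '/';
const EscapeEntry kTable[] = { { '\n', 'n' }, { '/', '/' } };
const std::size_t kTableSize = sizeof kTable / sizeof kTable[0];

// Writing: scan and copy; on a character found in the table, emit the
// escape character followed by its code (a single lookup).
std::string escape_text(const std::string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        bool escaped = false;
        for (std::size_t k = 0; k < kTableSize; ++k)
        {
            if (in[i] == kTable[k].raw)
            {
                out += kEscape;
                out += kTable[k].code;
                escaped = true;
                break;
            }
        }
        if (!escaped) out += in[i];   // ordinary character: copy as-is
    }
    return out;
}

// Reading: scan and copy; on the escape character, look at the next
// character and emit the raw value it stands for (a reverse lookup).
// An unknown code or a trailing escape is silently dropped in this sketch.
std::string unescape_text(const std::string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        if (in[i] != kEscape) { out += in[i]; continue; }
        if (++i == in.size()) break;  // trailing escape: nothing follows it
        for (std::size_t k = 0; k < kTableSize; ++k)
        {
            if (in[i] == kTable[k].code) { out += kTable[k].raw; break; }
        }
    }
    return out;
}
```

    Adding another escaped character (a tab, say) would only mean adding one more row to the table; neither loop changes.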

    Soma
    Last edited by phantomotap; 06-14-2011 at 11:31 AM. Reason: none of your business

  3. #33
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Quote Originally Posted by MK27
    Think about the parsing getting that out. Here's a string:

    "blah/
    hah-ha"

    So before serialization we iterate thru and turn that into:

    "blah///nhah-ha"

    Easy. But to turn that back into what it was, you can't just iterate and replace like you did before -- you are going to need a lookahead/lookback deal. Not so complicated, but unless there's a reason to do so...this is not going to make the task easier.

    Or maybe that's just a matter of style. I defer, point taken. The OP needs to decide whether human readability is important or not, because the latter is quicker and simpler.
    It's quite easy, you copy characters one at a time, unless you read in the escape character. If you read an escape character, you switch on the next character to decide what to do. If there is no next character, and possibly if the next character is not a valid escape, you report an error.
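
    Roughly, and only as a sketch: the decoder below follows those three rules, assuming the same '/' escape with "/n" and "//" as the only valid sequences. The function name decode_escaped and the choice to report errors by throwing std::runtime_error are assumptions, not something from the thread.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical decoder for a '/'-escaped string ("/n" = newline, "//" = '/').
// Throws std::runtime_error on a dangling escape or an unknown escape code.
std::string decode_escaped(const std::string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        if (in[i] != '/')
        {
            out += in[i];              // ordinary character: copy it
            continue;
        }
        if (i + 1 == in.size())        // escape with no next character
            throw std::runtime_error("escape character at end of input");
        switch (in[++i])               // decide based on the next character
        {
        case 'n': out += '\n'; break;
        case '/': out += '/';  break;
        default:                       // not a valid escape
            throw std::runtime_error("invalid escape sequence");
        }
    }
    return out;
}
```

    The only state carried between characters is "the previous one was the escape", which is why no general lookahead or lookback is needed.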
    It is too clear and so it is hard to see.
    A dunce once searched for fire with a lighted lantern.
    Had he known what fire was,
    He could have cooked his rice much sooner.

  4. #34
    Master Apprentice phantomotap
    Join Date
    Jan 2008
    Posts
    5,108
    As a matter of interest, my posts weren't intended to offer support for the comments made by Elysia.

    I realize now that my original intent was completely lost when I responded to what MK27 said directly instead of to the total context of the thread; instead I offered only a bit of flaky context to support what I didn't even explain.

    *shrug*

    I guess I need more sleep.

    My intent in dropping by was to say that serialization is a complex field and there is no easy answer unless one does use a library designed specifically for that purpose. The real world will pretty much guarantee that it will not be as simple as "just use $X" even then.

    Saying simply "use operators >> and <<" is pretty much as foolish and harmful as the other serialization related advice I very vocally despise.

    Soma

  5. #35
    spurious conceit MK27
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by King Mir
    It's quite easy, you copy characters one at a time, unless you read in the escape character. If you read an escape character, you switch on the next character to decide what to do.
    Yeah, that's what I meant by lookahead. I'll admit I've been a bit bombastic here, sorry. My point was that while using >> and plain text is possible, it is NOT the easy and efficient way and is only a good choice if you need human readability in the file*, which I have not seen the OP (Whyrusleeping) state as a goal.

    * or have some bizarre aversion to low level I/O and binary files.
    Last edited by MK27; 06-14-2011 at 11:53 AM.

  6. #36
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Quote Originally Posted by MK27
    Yeah, that's what I meant by lookahead. I'll admit I've been a bit bombastic here, sorry. My point was that while using >> and plain text is possible, it is NOT the easy and efficient way and is only a good choice if you need human readability in the file*, which I have not seen the OP (Whyrusleeping) state as a goal.

    * or have some bizarre aversion to low level I/O and binary files.
    It's not lookahead. But yeah, you don't want to use >> and << for strings; copy character by character instead. Strings and character arrays would be a special case in that kind of setup.

    As for whether you want human readability, the answer is always yes for unencrypted content. The question is whether that readability is important enough to trump other considerations. For a beginner, readability is particularly important, because it makes debugging that much easier.

  7. #37
    Registered User
    Join Date
    May 2011
    Posts
    29
    Well, with a little help from here and a lot of man page referencing i finally got around to what i wanted to do (sorry if i was ever unclear on what i was asking, ive never really worked with file i/o before unless you count pickling in python)
    Code:
    class datas
    {
    public:
      char filename[64];
      void s(const char *fname)
      {
        int l = strlen(fname);
        for(int i = 0; i < l; i++)
          {
    	filename[i] = fname[i];
          }
        filename[l] = 0;
      }
    
      void add(const char *word)
      {
        ofstream file (filename, ios::out | ios::app | ios::binary);
        file.seekp(0, ios::end);
        int m = strlen(word) + 1;
        char num[16];
        sprintf(num, "%d", m);
        file.write (num, 4);
        file.write (word, m);
        file.close();
      }
    
      int search(const char *word)
      {
        ifstream file (filename, ios::in | ios::binary);
        int s = 0;
        int a = 0;
        char num[16];
        char w[32];
        while(! file.eof())
          {
    	file.read(num, 4);
    	a = atoi(num);
    	file.read (w, a);
    	if(!strcmp(word, w))
    	  {
    	    file.close();
    	    return 1;
    	  }
          }
        file.close();
        return 0;
      }
    	
    };
    
    
    
    int main()
    {
      datas test;
      test.s("dt.bin");
      system("rm dt.bin");
      int run = 1;
      char inp[64];
      char srch[64];
      int a = 0;
      while (run == 1)
        {
          cin.getline(inp, 64);
          a = strlen(inp);
          cout << a;
          inp[a + 1] = 0;
          if(!strcmp(inp, "!exit"))
    	{
    	  run = 0;
    	  cout << "exiting\n";
    	    }
    	else if(!strncmp(inp, "!search", 7))
    	  {
    	    a = strlen(inp) - 8;
    	    for(int i = 0; i < a; i++)
    	      {
    		srch[i] = inp[8 + i];
    	      }
    	    srch[a] = 0;
    	    if(test.search(srch) == 1)
    	      {
    		cout << "found in list!\n";
    	      }
    	  }
    	else
    	  {
    	    test.add(inp);
    	  }
        }
        return 0;
    }
    basically, any word you type will be added to the file and typing !search will search the file to see if it contains that word.
    [edit]
    this strays from what i was originally intending to do, but works much better for what i wanted
    [/edit]
    Last edited by Whyrusleeping; 06-14-2011 at 05:01 PM.

  8. #38
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    I was feeling generous today. Remove all red, add all blue. Also, read comments.

    Code:
    class datas//conventionally classes should start with a capital letter. 
    {
    public:
      char filename[64];//file names can be longer than 63 characters. Use an std::string.
      void s(const char *fname)
      {
        int l = strlen(fname);
        for(int i = 0; i < l; i++)//what if l>63? Never write a program that can crash.
          {
    	filename[i] = fname[i];
          }
        filename[l] = 0;
      }
    
      void add(const char *word)
      {
        ofstream file (filename, ios::out | ios::app | ios::binary);
        file.seekp(0, ios::end);
        int m = strlen(word) + 1;//you should just store the length.
        char num[16];//only need 11
        sprintf(num, "%d", m);
        file.write (num, 4);//red: remove this line
        /*Write the first four digits of the length? It won't crash, but it will cause an 
           error. As long as you're aware, it's not vital to fix this. 
           But consider:*/
        file << setw(10) << m << std::ends;//blue: add this line
        /*better not to use ends, but this is what you intended. */
        file.write (word, m);
        /*Better to write a \n instead of \0, so a text editor can read it.*/
        file.close();
      }
    
      int search(const char *word)
      {
        ifstream file (filename, ios::in | ios::binary);
        int s = 0;
        int a = 0;
        char num[16]={'\0'};//ensures null termination.
        /*also, this should be 12 chars long, not 16*/
        char w[32];
        while(! file.eof())
          {
    	file.read(num, 11);//red: 4, blue: 11 (ten digits from setw(10) plus the '\0' from std::ends)
    	a = atoi(num);
            /*error if num is not null terminated. No program should crash when fed 
               bad data*/
    	file.read (w, a);
            /* reading can fail, check for this so you don't use bad data.*/
    	if(!strcmp(word, w))//red: remove this line
            /*what if w>31? what if no null is read?*/
            if(std::string(w,a-1) == word)//blue: add this line
            /*strings are safer, because they store their own length, and don't
               overflow*/
    	  {
    	    file.close();
    	    return 1;
    	  }
          }
        file.close();
        return 0;
      }
    	
    };
    
    
    
    int main()
    {
      datas test;
      test.s("dt.bin");
      /*since a datas object must have a database file, this should be a constructor, not
         a method*/
      system("rm dt.bin");
      int run = 1;
      char inp[64];//just use an std::string
      char srch[64];
      int a = 0;
      while (run == 1)
        {
          cin.getline(inp, 64);
          a = strlen(inp);
          cout << a;
          inp[a + 1] = 0;//you don't want to do this.
          if(!strcmp(inp, "!exit"))
    	{
    	  run = 0;
    	  cout << "exiting\n";
    	    }
    	else if(!strncmp(inp, "!search", 7))
    	  {
    	    a = strlen(inp) - 8;
    	    for(int i = 0; i < a; i++)
    	      {
    		srch[i] = inp[8 + i];
    	      }
    	    srch[a] = 0;
    	    if(test.search(srch) == 1)
    	      {
    		cout << "found in list!\n";
    	      }
    	  }
    	else
    	  {
    	    test.add(inp);
    	  }
        }
        return 0;
    }
    Also, everywhere, use better variable names.
    Last edited by King Mir; 06-14-2011 at 07:09 PM.

  9. #39
    Registered User
    Join Date
    May 2011
    Posts
    29
    on the length of the input, i was meaning to cap it at 64, the database is meant to hold individual words, and i dont know any words that are 64 letters long...

  10. #40
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    While I won't discourage you from what you're trying to do, this code is extremely dangerous. You need to patch it up a bit.
    Code:
      void s(const char *fname)
      {
        int l = strlen(fname);
        for(int i = 0; i < l; i++)
          {
    	filename[i] = fname[i];
          }
        filename[l] = 0;
      }
    Why you feel you need to do this is beyond me. A simple solution would be:
    Code:
      void s(const std::string& fname)
    {
    	filename = fname;
    }
    Btw, s is a very poor name for a member function.

    Code:
        char num[16];
        sprintf(num, "%d", m);
    Dangerous and a ticking time bomb. The size of an int is implementation-defined, and therefore so is the length of its text form.
    Furthermore, should you change %d to something else, or reduce the size of num, you can find yourself with buffer overruns.
    A better approach might be:
    Code:
    std::string num = boost::lexical_cast<std::string>(m);
    (Requires boost library.)
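
    If Boost is not available, one standard-library option is to format through a std::ostringstream, which sizes its own buffer. This is a sketch of that alternative rather than what the post proposes, and the helper name int_to_text is made up for illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: format an int as decimal text without a fixed buffer.
std::string int_to_text(int value)
{
    std::ostringstream out;
    out << value;        // the stream grows as needed, so there is no buffer to overrun
    return out.str();
}
```

    Changing the type of the argument later only requires changing the signature; there is no format string or array size to keep in sync.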

    By getting rid of the C stuff, we can rewrite the add function:
    Code:
      void add(const std::string& word)
    {
        ofstream file (filename, ios::out | ios::app | ios::binary);
        file.seekp(0, ios::end);
        auto length = word.size();
        file << length << std::ends;
        file.write (word.c_str(), length + 1);
    }
    (file.close() is not necessary; the destructor will do it for us.)

    We can also rewrite search:
    Code:
    int search(const std::string& word)
    {
        ifstream file (filename, ios::in | ios::binary);
        int length = 0;
    
        for (;;)
        {
            file >> length;
            if (!file || length < 0) return 0; // end of file or malformed length
            file.get(); // skip the '\0' that std::ends wrote after the length
            std::vector<char> buf(length + 1);
            file.read(&buf[0], buf.size()); // A null terminator was written to file, as well
            if (!file) return 0; // short read
            std::string found(buf.begin(), buf.end() - 1); // drop the null terminator
    
            if (word == found)
                return 1;
        }
        return 0;
    }
    Once again, file.close is not necessary.
    Undoubtedly, this has bugs in it, but it is much cleaner and safer than your C code.
    Last edited by Elysia; 06-16-2011 at 12:16 AM. Reason: Fixed buffer overrun + infinite loop

  11. #41
    Registered User
    Join Date
    May 2011
    Posts
    29
    Hey, thanks for all the help. but im a little lost on what some lines do (it works great, i just want to understand it):
    in this line what does setw(10) do? also, i was warned by quite a few people about using << for binary files.
    Code:
    file << std::setw(10) << length << std::ends;
    and im not sure what the vector is doing here:
    Code:
    std::vector<char> buf(length);
    also, in your rewritten search function, what happens when the word being searched for isnt in the file? it looks like an infinite loop

  12. #42
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Quote Originally Posted by Whyrusleeping
    on the length of the input, i was meaning to cap it at 64, the database is meant to hold individual words, and i dont know any words that are 64 letters long...
    The thing is, you don't want your program to crash when sent bad data. It doesn't need to report an error, although that would be nice, but it should never crash. So that's why I suggest using std::string.

    Similarly for the file name.
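
    As a rough illustration of the std::string suggestion, the "!search" handling from main could be written without any fixed-size arrays. The helper name parse_search is hypothetical, and the input line is assumed to come from something like std::getline(std::cin, line).

```cpp
#include <string>

// Hypothetical helper: if `line` is a "!search <word>" command, store the
// word in `term` and return true. Works for input of any length.
bool parse_search(const std::string& line, std::string& term)
{
    const std::string prefix = "!search ";
    if (line.compare(0, prefix.size(), prefix) != 0)
        return false;                    // not a search command
    term = line.substr(prefix.size());   // everything after the prefix
    return !term.empty();                // "!search " with no word is rejected
}
```

    A line of 10 characters and a line of 10,000 characters go through the same code path, so there is no length at which this overruns a buffer.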

  13. #43
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Quote Originally Posted by Elysia
    Code:
        char num[16];
        sprintf(num, "%d", m);
    Dangerous and a ticking time bomb. The size of an int is implementation-defined, and therefore so is the length of its text form.
    Furthermore, should you change %d to something else, or reduce the size of num, you can find yourself with buffer overruns.
    Of all the things wrong with his code, assuming that int is 4 bytes isn't one that counts. That's a pretty reasonable assumption.

  14. #44
    Registered User
    Join Date
    Apr 2006
    Posts
    2,149
    Quote Originally Posted by Whyrusleeping
    Hey, thanks for all the help. but im a little lost on what some lines do (it works great, i just want to understand it):
    in this line what does setw(10) do? also, i was warned by quite a few people about using << for binary files.
    Code:
    file << std::setw(10) << length << std::ends;
    You're not really writing a binary file. You're writing the number of characters lexically.

    setw(10) sets the width of the next field written to at least 10 characters. It pads with ' ' on the left by default, so to write the number 10 it would write eight spaces followed by "10". 10 characters is the most that a non-negative four-byte int would need; the most negative value needs 11 with its sign. This is the same as printf("%10d",length).
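
    To see the padding for yourself, the same expression can be sent to a string stream instead of a file. This is only a sketch; format_length is a made-up helper name.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format a length the way the quoted line does,
// into a string instead of a file so the result can be inspected.
std::string format_length(int length)
{
    std::ostringstream out;
    out << std::setw(10) << length << std::ends;  // pad to width 10, then write '\0'
    return out.str();
}
```

    For a length of 10 the result holds 11 characters: eight spaces, the digits "10", and the '\0' written by std::ends. The width applies only to the next item written, so std::ends is not padded.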

    and im not sure what the vector is doing here:
    Code:
    std::vector<char> buf(length);
    Instead of a string or char array, Elysia chose to write the data into a char vector. I presume that this is because std::string data is not guaranteed to be contiguous in memory, if that is in fact the case (C++03 did not promise it; C++11 does). A vector is like an array, but safer.
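
    A small sketch of that pattern in isolation: size a vector at run time, read into it, and then build a string from it. The helper name read_bytes is hypothetical, and it takes any std::istream so it can be tried on a string stream as well as a file.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>   // for std::istringstream when trying this out
#include <string>
#include <vector>

// Hypothetical helper: read exactly `count` bytes from `in` into `out`.
// Returns false if the stream runs out first.
bool read_bytes(std::istream& in, std::size_t count, std::string& out)
{
    if (count == 0) { out.clear(); return true; }
    std::vector<char> buf(count);          // contiguous storage, sized at run time
    in.read(&buf[0], static_cast<std::streamsize>(buf.size()));
    if (!in) return false;                 // short read or stream error
    out.assign(buf.begin(), buf.end());
    return true;
}
```

    Unlike char w[32], the buffer is exactly as large as the length read from the file, so a long record cannot overflow it.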

    also, in your rewritten search function, what happens when the word being searched for isnt in the file? it looks like an infinite loop
    Indeed.

  15. #45
    Lurking whiteflags
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,613
    I'm a little floored that this has taken so long to get anywhere.

    First decide how many bytes you want to write (see mask):
    Code:
    for (unsigned int mask = 0xff000000, shift = CHAR_BIT * 3; mask > 0; mask >>= CHAR_BIT, shift -= CHAR_BIT) 
    	myfile.put(static_cast<char>((len & mask) >> shift)); // shift the masked byte down before writing it
    You may want to pick a smaller byte mask for smaller numbers, or if you know you're working on a machine with a crippled CPU. (BTW, ostream::put, along with istream::get, is safe for binary files because they work with bytes only.)


    Then when you read, you just fetch that many bytes again during the unserialize part. Now, when you open a strange binary file, it might be in Big Endian or Little Endian; one of these is different from the byte order on your machine. So you should call a byte-swapping routine after you do this part.
    Code:
    size_t len = 0;
    vector<char> bytes(4, '\0');
    
    myfile.read(&bytes[0], bytes.size());
    // go through unsigned char so bytes of 0x80 and above are not sign-extended
    for (size_t i = 0; i < bytes.size(); ++i)
    	len = (len << CHAR_BIT) | static_cast<unsigned char>(bytes[i]);
    If you're opening files you write on the host machine, endianness doesn't matter.

    Dump the string.

    Now you know about all you need to know about writing and reading strings and integers portably in binary, which will be about 90% of the data you put in those files. ID3v1 tags found in MP3 files, for example, fit into 128 bytes. It should be fairly obvious you don't need length information there, just grab the whole block. So planning your data format is essential. It may turn out with substantial abuse that certain fields aren't long enough but that's just the way it goes. You put out another version of the format to address the problem.
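
    Putting the two halves together, a length-prefixed string round trip might look like the sketch below. It assumes 8-bit bytes, a 4-byte big-endian length, and strings shorter than 4 GiB; the names write_string and read_string are made up, and a real reader would also want to cap the length before allocating.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>   // for std::stringstream when trying this out
#include <string>
#include <vector>

// Hypothetical helper: write a 4-byte big-endian length, then the raw bytes.
void write_string(std::ostream& out, const std::string& s)
{
    const unsigned long len = static_cast<unsigned long>(s.size());
    for (int shift = 24; shift >= 0; shift -= 8)
        out.put(static_cast<char>((len >> shift) & 0xff));  // most significant byte first
    out.write(s.data(), static_cast<std::streamsize>(s.size()));
}

// Hypothetical helper: read back what write_string() wrote.
// Returns false on a short read.
bool read_string(std::istream& in, std::string& s)
{
    unsigned char header[4];
    in.read(reinterpret_cast<char*>(header), 4);
    if (!in) return false;
    unsigned long len = 0;
    for (int i = 0; i < 4; ++i)
        len = (len << 8) | header[i];   // rebuild the length byte by byte
    s.clear();
    if (len == 0) return true;          // empty string: nothing more to read
    std::vector<char> buf(len);
    in.read(&buf[0], static_cast<std::streamsize>(len));
    if (!in) return false;
    s.assign(buf.begin(), buf.end());
    return true;
}
```

    Because the byte order is fixed by the shifts rather than by the machine, the same file reads back correctly on either kind of host, and no separate byte-swapping step is needed.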
    Last edited by whiteflags; 06-15-2011 at 09:23 PM.
