i have two cents here:
if you're going to use text, use XML.
if you're going to use binary, consider picking up SQLite.
[Edit]
Okay, the post was edited after I responded, but this has apparently already been seen.
*shrug*
I don't know.
[/Edit]
Quote:
    But to turn that back into what it was, you can't just iterate and replace like you did before -- you are going to need a state machine parser, lookaheads, etc.
O_o
The only context in that serialization (a simple self-escaping character scheme) is the escape character and the very next character that follows it. So, actually, you pretty much do iterate and replace exactly as you did before.
In the great before, you scan and dump until you find a character that needs to be escaped, at which point you dump the escape character followed by the character that represents the escaped character. You essentially perform a single lookup.
In reading it back in, you scan and dump until you find the escape character, at which point you look at the next character and dump the actual value it represents. You essentially perform a reverse lookup.
A simple transition table, one entry per escaped character, gets you everything you need.
This isn't something as complex as XML.
Soma
Last edited by phantomotap; 06-14-2011 at 11:31 AM. Reason: none of your business
It's quite easy: you copy characters one at a time, unless you read in the escape character. If you read an escape character, you switch on the next character to decide what to do. If there is no next character, and possibly if the next character is not a valid escape, you report an error.
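The scheme described above can be sketched roughly as follows. This is only an illustration under assumptions, not code from anyone in the thread: the backslash as escape character, the set of escaped characters (newline, tab and the backslash itself), and the function names `escape` and `unescape` are all made up for the example.

```cpp
#include <stdexcept>
#include <string>

// Assumed scheme: '\\' is the escape character; newline, tab and the
// backslash itself are the only characters that get escaped.
// Both function names are hypothetical.
std::string escape(const std::string& in)
{
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        switch (in[i]) {              // the single lookup
        case '\n': out += "\\n";  break;
        case '\t': out += "\\t";  break;
        case '\\': out += "\\\\"; break;
        default:   out += in[i];  break;
        }
    }
    return out;
}

std::string unescape(const std::string& in)
{
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] != '\\') {          // ordinary character: copy it
            out += in[i];
            continue;
        }
        if (++i == in.size())         // escape with nothing after it
            throw std::runtime_error("dangling escape character");
        switch (in[i]) {              // the reverse lookup
        case 'n':  out += '\n'; break;
        case 't':  out += '\t'; break;
        case '\\': out += '\\'; break;
        default:   throw std::runtime_error("unknown escape sequence");
        }
    }
    return out;
}
```

The only state carried between characters is "was the previous character the escape?", which here is handled by advancing the index by one, so no separate parser is needed.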
It is too clear and so it is hard to see.
A dunce once searched for fire with a lighted lantern.
Had he known what fire was,
He could have cooked his rice much sooner.
As a matter of interest, my posts weren't intended to offer support for the comments made by Elysia.
I realize now that my original intent was completely lost when I responded to what MK27 said directly instead of to the total context of the thread; instead I offered only a bit of flaky context to support what I didn't even explain.
*shrug*
I guess I need more sleep.
My intent in dropping by was to say that serialization is a complex field and there is no easy answer unless one does use a library designed specifically for that purpose. The real world will pretty much guarantee that it will not be as simple as "just use $X" even then.
Saying simply "use operators >> and <<" is pretty much as foolish and harmful as the other serialization related advice I very vocally despise.
Soma
Yeah, that's what I meant by lookahead. I'll admit I've been a bit bombastic here, sorry. My point was that while using >> and plain text is possible, it is NOT the easy and efficient way and is only a good choice if you need human readability in the file*, which I have not seen the OP (Whyrusleeping) state as a goal.
* or have some bizarre aversion to low level I/O and binary files.
Last edited by MK27; 06-14-2011 at 11:53 AM.
C programming resources:
GNU C Function and Macro Index -- glibc reference manual
The C Book -- nice online learner guide
Current ISO draft standard
CCAN -- new CPAN like open source library repository
3 (different) GNU debugger tutorials: #1 -- #2 -- #3
cpwiki -- our wiki on sourceforge
It's not lookahead. But yeah you don't want to use >> and << for strings, but instead copy character by character. Strings and character arrays would be a special case in that kind of setup.
As for whether you want human readability, the answer is always yes for unencrypted content. The question is whether that readability is important enough to trump other considerations. For a beginner, readability is particularly important, because it makes debugging that much easier.
Well, with a little help from here and a lot of man page referencing i finally got around to what i wanted to do (sorry if i was ever unclear on what i was asking, ive never really worked with file i/o before unless you count pickling in python)
Code:
class datas
{
public:
    char filename[64];

    void s(const char *fname)
    {
        int l = strlen(fname);
        for(int i = 0; i < l; i++)
        {
            filename[i] = fname[i];
        }
        filename[l] = 0;
    }

    void add(const char *word)
    {
        ofstream file (filename, ios::out | ios::app | ios::binary);
        file.seekp(0, ios::end);
        int m = strlen(word) + 1;
        char num[16];
        sprintf(num, "%d", m);
        file.write (num, 4);
        file.write (word, m);
        file.close();
    }

    int search(const char *word)
    {
        ifstream file (filename, ios::in | ios::binary);
        int s = 0;
        int a = 0;
        char num[16];
        char w[32];
        while(! file.eof())
        {
            file.read(num, 4);
            a = atoi(num);
            file.read (w, a);
            if(!strcmp(word, w))
            {
                file.close();
                return 1;
            }
        }
        file.close();
        return 0;
    }
};

int main()
{
    datas test;
    test.s("dt.bin");
    system("rm dt.bin");
    int run = 1;
    char inp[64];
    char srch[64];
    int a = 0;
    while (run == 1)
    {
        cin.getline(inp, 64);
        a = strlen(inp);
        cout << a;
        inp[a + 1] = 0;
        if(!strcmp(inp, "!exit"))
        {
            run = 0;
            cout << "exiting\n";
        }
        else if(!strncmp(inp, "!search", 7))
        {
            a = strlen(inp) - 8;
            for(int i = 0; i < a; i++)
            {
                srch[i] = inp[8 + i];
            }
            srch[a] = 0;
            if(test.search(srch) == 1)
            {
                cout << "found in list!\n";
            }
        }
        else
        {
            test.add(inp);
        }
    }
    return 0;
}
basically, any word you type will be added to the file and typing !search will search the file to see if it contains that word.
[edit]
this strays from what i was orignally intending to do, but works much better for what i wanted
[/edit]
Last edited by Whyrusleeping; 06-14-2011 at 05:01 PM.
I was feeling generous today. Remove all red, add all blue. Also, read comments.
Code:
class datas//conventionally classes should start with a capital letter.
{
public:
    char filename[64];//file names can be longer than 63 characters. Use an std::string.
    void s(const char *fname)
    {
        int l = strlen(fname);
        for(int i = 0; i < l; i++)//what if l>63? Never write a program that can crash.
        {
            filename[i] = fname[i];
        }
        filename[l] = 0;
    }
    void add(const char *word)
    {
        ofstream file (filename, ios::out | ios::app | ios::binary);
        file.seekp(0, ios::end);
        int m = strlen(word) + 1;//you should just store the length.
        char num[16];//only need 11
        sprintf(num, "%d", m);
        file.write (num, 4); // [red]
        /*Write the first four digits of the length? It won't crash, but it will cause an error.
          As long as you're aware, it's not vital to fix this. But consider:*/
        file << setw(10) << m << std::ends; // [blue]
        /*better not to use ends, but this is what you intended. */
        file.write (word, m);
        /*Better to write a \n instead of \0, so a text editor can read it.*/
        file.close();
    }
    int search(const char *word)
    {
        ifstream file (filename, ios::in | ios::binary);
        int s = 0;
        int a = 0;
        char num[16]={'\0'};//ensures null termination.
        /*also, this should be 12 chars long, not 16*/
        char w[32];
        while(! file.eof())
        {
            file.read(num, 11); // [red: 4, blue: 11]
            a = atoi(num);
            /*error if num is not null terminated. No program should crash when fed bad data*/
            file.read (w, a);
            /* reading can fail, check for this so you don't use bad data.*/
            if(!strcmp(word, w)) // [red]
            /*what if w>31? what if no null is read?*/
            if(std::string(w,a-1) == word) // [blue]
            /*strings are safer, because they store their own length, and don't overflow*/
            {
                file.close();
                return 1;
            }
        }
        file.close();
        return 0;
    }
};

int main()
{
    datas test;
    test.s("dt.bin");
    /*since a datas must have a database file, this should be a constructor, not a method*/
    system("rm dt.bin");
    int run = 1;
    char inp[64];//just use an std::string
    char srch[64];
    int a = 0;
    while (run == 1)
    {
        cin.getline(inp, 64);
        a = strlen(inp);
        cout << a;
        inp[a + 1] = 0;//you don't want to do this.
        if(!strcmp(inp, "!exit"))
        {
            run = 0;
            cout << "exiting\n";
        }
        else if(!strncmp(inp, "!search", 7))
        {
            a = strlen(inp) - 8;
            for(int i = 0; i < a; i++)
            {
                srch[i] = inp[8 + i];
            }
            srch[a] = 0;
            if(test.search(srch) == 1)
            {
                cout << "found in list!\n";
            }
        }
        else
        {
            test.add(inp);
        }
    }
    return 0;
}
Also, everywhere, use better variable names.
Last edited by King Mir; 06-14-2011 at 07:09 PM.
on the length of the input, i was meaning to cap it at 64, the database is meant to hold individual words, and i dont know any words that are 64 letters long...
While I won't discourage you from what you're trying to do, this code is extremely dangerous. You need to patch it up a bit.
Code:
void s(const char *fname)
{
    int l = strlen(fname);
    for(int i = 0; i < l; i++)
    {
        filename[i] = fname[i];
    }
    filename[l] = 0;
}
Why you feel you need to do this is beyond me. A simple solution would be:
Code:
void s(const std::string& fname)
{
    filename = fname;
}
Btw, s is a very poor name for a member function.
Code:
char num[16];
sprintf(num, "%d", m);
Dangerous and a ticking time bomb. The size of an int is implementation-defined, and so is the length of its text representation.
Furthermore, should you change %d to something else, or reduce the size of num, you can find yourself with buffer overruns.
A better approach might be:
Code:
std::string num = boost::lexical_cast<std::string>(m);
(Requires boost library.)
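If Boost is not available, the standard library alone can do the same conversion without a fixed-size buffer. The helper below is a sketch; the name `int_to_string` is hypothetical, and the stream-based version is used because it also works on pre-C++11 compilers. With a C++11 compiler, `std::to_string(m)` should give the same result for an int.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats an int through a string stream, so the
// result grows as needed and there is no char buffer to overrun.
std::string int_to_string(int value)
{
    std::ostringstream out;
    out << value;
    return out.str();
}
```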
By getting rid of the C stuff, we can rewrite the add function:
Code:
void add(const std::string& word)
{
    ofstream file (filename, ios::out | ios::app | ios::binary);
    file.seekp(0, ios::end);
    auto length = word.size();
    file << length << std::ends;
    file.write (word.c_str(), length + 1);
}
(file.close() is not necessary; the destructor will do it for us.)
We can also rewrite search:
Code:
int search(const std::string& word)
{
    ifstream file (filename, ios::in | ios::binary);
    int length = 0;
    for (;;)
    {
        file >> length;
        if (file.eof())
            return 0;
        std::vector<char> buf(length + 1);
        file.read(&buf[0], buf.size()); // A null terminator was written to file, as well
        if (file.eof())
            return 0;
        std::string _Word(buf.begin(), buf.end());
        if (word == _Word)
            return 1;
    }
    return 0;
}
Once again, file.close is not necessary.
Undoubtedly, this has bugs in it, but it is much cleaner and safer than your C code.
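For comparison, here is one way the same idea could be written so that every read is checked and a missing word simply ends the loop. It is a sketch under assumptions rather than a drop-in replacement: the free functions `add_word` and `contains_word` are hypothetical names, and the record format (decimal length, one space, then exactly that many bytes) is chosen for the example and differs from the format used in the posts above.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Assumed record format: decimal length, one space, then exactly that
// many bytes of the word. Both function names are hypothetical.
void add_word(const std::string& filename, const std::string& word)
{
    std::ofstream file(filename.c_str(),
                       std::ios::out | std::ios::app | std::ios::binary);
    file << word.size() << ' ';
    file.write(word.data(), static_cast<std::streamsize>(word.size()));
}

bool contains_word(const std::string& filename, const std::string& word)
{
    std::ifstream file(filename.c_str(), std::ios::in | std::ios::binary);
    std::string::size_type length = 0;
    // The loop stops as soon as a length cannot be read: end of file,
    // a missing file and malformed data all put the stream in a failed state.
    while (file >> length) {
        if (file.get() != ' ')            // consume the separator
            return false;
        std::vector<char> buf(length);
        if (length > 0 &&
            !file.read(&buf[0], static_cast<std::streamsize>(length)))
            return false;                 // truncated record
        if (std::string(buf.begin(), buf.end()) == word)
            return true;
    }
    return false;
}
```

A real version would probably also cap `length` before allocating the buffer, since a corrupt file could otherwise request an enormous allocation.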
Hey, thanks for all the help. but im a little lost on what some lines do (it works great, i just want to understand it):
in this line what does setw(10) do? also, i was warned by quite a few people about using << for binary files.
Code:
file << std::setw(10) << length << std::ends;
and im not sure what the vector is doing here:
Code:
std::vector<char> buf(length);
also, in your rewritten search function, what happens when the word being searched for isnt in the file? it looks like an infinite loop
You're not really writing a binary file. You're writing the number of characters lexically.
setw(10) sets the minimum width of the next field written to 10 characters. By default it pads on the left with ' ', so the number 10 is written as "        10". 10 characters is the most that a non-negative four-byte int would need. This is the same as printf("%10d",length).
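The padding is easy to see by writing into a string instead of a file. In this small sketch the helper name `fixed_width_length` is hypothetical; only `std::setw` and `std::ostringstream` come from the standard library.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats a length with setw(10), but into a
// string so the result can be inspected.
std::string fixed_width_length(int length)
{
    std::ostringstream out;
    out << std::setw(10) << length;   // pads on the left with spaces
    return out.str();
}
```

Note that the width is a minimum, not a limit: a value whose text is longer than 10 characters, such as a large negative number, is written in full and the field simply becomes wider.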
Quote:
    and im not sure what the vector is doing here:
    Code:
    std::vector<char> buf(length);
Instead of a string or char array, Elysia chose to write the data into a char vector. I presume that this is because std::string data is not guaranteed to be contiguous in memory, if that is in fact the case. A vector is like an array, but safer.
Quote:
    also, in your rewritten search function, what happens when the word being searched for isnt in the file? it looks like an infinite loop
Indeed.
I'm a little floored that this has taken so long to get anywhere.
First decide how many bytes you want to write (see mask):
Code:
int shift = CHAR_BIT * 3;
for (unsigned int mask = 0xff000000; mask > 0; mask >>= CHAR_BIT, shift -= CHAR_BIT)
    myfile.put(static_cast<char>((len & mask) >> shift)); // move the masked byte down to the low bits before writing it
You may want to pick a smaller byte mask for smaller numbers, or if you know you're working on a machine with a crippled CPU. (BTW, ostream::put, along with istream::get, is safe for binary files because they work with bytes only.)
Then when you read, you just fetch that many bytes again during the unserialize part. Now, when you open a strange binary file, it might be in Big Endian or Little Endian; one of these is different from the byte order on your machine. So you should call a byte-swapping routine after you do this part.
Code:
size_t len = 0;
vector<char> bytes(4, '\0');
myfile.read(&bytes[0], bytes.size());
// Go through unsigned char so bytes of 0x80 and above are not sign-extended.
for (size_t i = 0; i < bytes.size(); ++i)
    len = (len << CHAR_BIT) | static_cast<unsigned char>(bytes[i]);
If you're opening files you write on the host machine, endianness doesn't matter.
Dump the string.
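Put together, the steps above might look like the sketch below. The assumptions are a fixed 4-byte big-endian length, 8-bit bytes, and streams opened in binary mode. The names `write_string`, `read_string` and `round_trip` are invented for the example.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes a 4-byte big-endian length followed by the raw bytes.
// Assumes the string is shorter than 2^32 bytes.
void write_string(std::ostream& out, const std::string& s)
{
    const unsigned long len = static_cast<unsigned long>(s.size());
    for (int shift = 24; shift >= 0; shift -= 8)
        out.put(static_cast<char>((len >> shift) & 0xff));
    out.write(s.data(), static_cast<std::streamsize>(s.size()));
}

// Reads one record back; returns false on end of file or truncated data.
bool read_string(std::istream& in, std::string& s)
{
    char header[4];
    if (!in.read(header, 4))
        return false;
    unsigned long len = 0;
    for (int i = 0; i < 4; ++i)   // unsigned char avoids sign extension
        len = (len << 8) | static_cast<unsigned char>(header[i]);
    std::vector<char> buf(len);
    if (len > 0 && !in.read(&buf[0], static_cast<std::streamsize>(len)))
        return false;
    s.assign(buf.begin(), buf.end());
    return true;
}

// Serializes into memory and reads the value back, as a quick self-check.
std::string round_trip(const std::string& s)
{
    std::stringstream buf(std::ios::in | std::ios::out | std::ios::binary);
    write_string(buf, s);
    std::string result;
    read_string(buf, result);
    return result;
}
```

Because the length is assembled with shifts rather than copied as raw memory, the bytes in the file have the same order no matter which machine wrote them.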
Now you know about all you need to know about writing and reading strings and integers portably in binary, which will be about 90% of the data you put in those files. ID3v1 tags found in MP3 files, for example, fit into 128 bytes. It should be fairly obvious you don't need length information there, just grab the whole block. So planning your data format is essential. It may turn out with substantial abuse that certain fields aren't long enough but that's just the way it goes. You put out another version of the format to address the problem.
Last edited by whiteflags; 06-15-2011 at 09:23 PM.