Thread: Question about file IO

  1. #1
    Registered User
    Join Date
    Nov 2005
    Posts
    673

    Question about file IO

    I am thinking of creating an archive format similar to the MoPaQ format used by Blizzard Entertainment, but there are a few things I do not completely understand.

    1) What is the best way to pad the data so that certain offsets are fixed?
    For example, at these offsets:
    1. 0x00 Version String ( 1.1B )
    2. 0x04 Hash Table Offset ( e.g. 123456789 )
    3. 0x08 MetaData Table Offset ( e.g. 2395321 )
    4. 0x0C BlockSize ( e.g. 1024 )
    I guess my main question is this: if I write 123456789 to a file, it takes 9 bytes. Is there a way to make it take only 4 bytes of file space? Otherwise, how can I keep the padding consistent, so that locations are known before writing?

  2. #2
    Registered User
    Join Date
    Aug 2010
    Location
    Poland
    Posts
    733
    It occupies 9 bytes because you open the stream in text mode and each digit occupies 1 byte. You need to open it in binary mode. There is a lot to say about this, so it would be better to google it yourself.
    The main difference is that writing in binary format simply dumps memory to the disk (the data is copied and stored in the same format, size, etc.), which is why stored integers keep the same size and endianness.

    Padding bytes are a different thing, which you also need to care about, especially when writing/reading structures.

  3. #3
    Registered User
    Join Date
    Nov 2005
    Posts
    673
    I know how to use std::ios::binary, but I'm not exactly sure what you mean by dumping memory to disk. I may be wrong, but setting the binary flag only stops line-ending conversions, correct?

    If you know of any good resources on binary files, please post them. I will be googling in the meantime.

  4. #4
    C++ Witch
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by Raigne
    I know how to use std::ios::binary, but Im not exactly sure what you mean by dumping memory to disk.
    You're probably using the overloaded operator<< to write the integer as formatted output. What you want to do instead is to treat the integer as a sequence of four bytes and write those bytes to disk. Things like endianness and whether the integer is exactly four bytes may come into play.

    Quote Originally Posted by Raigne
    I may be wrong but setting binary flag only stops conversions for line endings correct?
    Yes.

  5. #5
    Registered User
    Join Date
    Nov 2005
    Posts
    673
    I am trying to use ostream::write(), but I am not sure how to convert an integer to char* to use that method.

  6. #6
    and the Hat of Guessing
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by Raigne
    I am trying to use ostream::write(), but I am not sure how to convert an integer to char* to use that method.
    You don't. You pass in a pointer to the location in memory of the thing you want to write.

  7. #7
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Raigne
    I am thinking of creating an archive format similar to the MoPaQ format used by Blizzard Entertainment, but there are a few things I do not completely understand.

    1) What is the best way to pad the data so that certain offsets are fixed?
    For example, at these offsets:
    1. 0x00 Version String ( 1.1B )
    2. 0x04 Hash Table Offset ( e.g. 123456789 )
    3. 0x08 MetaData Table Offset ( e.g. 2395321 )
    4. 0x0C BlockSize ( e.g. 1024 )
    I guess my main question is this: if I write 123456789 to a file, it takes 9 bytes. Is there a way to make it take only 4 bytes of file space? Otherwise, how can I keep the padding consistent, so that locations are known before writing?
    Note that the actual tables do not go at items 2 and 3; those fields are offsets into the file where the tables are stored (think of them like pointers in memory). So item 2 (starting at byte 4 of the file) contains the position of the first byte of the hash table, not the hash table itself.

    Also note that since they use 4-byte offsets, the file format is naturally limited to 4 GB minus overhead (hash table, metadata, header, etc.).
    Last edited by CommonTater; 05-15-2011 at 11:42 AM.

  8. #8
    Registered User
    Join Date
    Nov 2005
    Posts
    673
    This is the header of the MoPaQ (.mpq) archive format that Blizzard Entertainment uses.
    Code:
    00h: char(4) Magic             Indicates that the file is a MoPaQ archive. Must be ASCII "MPQ" 1Ah.
    04h: int32 HeaderSize          Size of the archive header.
    08h: int32 ArchiveSize         Size of the whole archive, including the header. Does not include the strong digital signature, 
                                   if present. This size is used, among other things, for determining the region to hash in computing 
                                   the digital signature. This field is deprecated in the Burning Crusade MoPaQ format, and the size 
                                   of the archive is calculated as the size from the beginning of the archive to the end of the 
                                   hash table, block table, or extended block table (whichever is largest).
    0Ch: int16 FormatVersion       MoPaQ format version. MPQAPI will not open archives where this is negative. Known versions:
    	0000h                  Original format. HeaderSize should be 20h, and large archives are not supported.
    	0001h                  Burning Crusade format. Header size should be 2Ch, and large archives are supported.
    0Eh: int8 SectorSizeShift      Power of two exponent specifying the number of 512-byte disk sectors in each logical sector 
                                   in the archive. The size of each logical sector in the archive is 512 * 2^SectorSizeShift. 
                                   Bugs in the Storm library dictate that this should always be 3 (4096 byte sectors).
    10h: int32 HashTableOffset     Offset to the beginning of the hash table, relative to the beginning of the archive.
    14h: int32 BlockTableOffset    Offset to the beginning of the block table, relative to the beginning of the archive.
    18h: int32 HashTableEntries    Number of entries in the hash table. Must be a power of two, and must be less than 2^16 
                                   for the original MoPaQ format, or less than 2^20 for the Burning Crusade format.
    1Ch: int32 BlockTableEntries   Number of entries in the block table.
    
    Fields only present in the Burning Crusade format and later:
    
    20h: int64 ExtendedBlockTableOffset   Offset to the beginning of the extended block table, relative to the beginning of the archive.
    28h: int16 HashTableOffsetHigh        High 16 bits of the hash table offset for large archives.
    2Ah: int16 BlockTableOffsetHigh       High 16 bits of the block table offset for large archives.
    MPQ files can get much larger than 4 GB.

    I am looking for a similar approach.

    Thanks for the valued input.

    How would I go about getting the address of the first byte?

  9. #9
    and the Hat of Guessing
    Join Date
    Nov 2007
    Posts
    14,336
    As you can see, they have a 64-bit offset field at 0x20, plus high-order offset fields at 0x28 and 0x2A, to allow for >4 GB files.

    As for the "first byte", look at the description again: "Offset to the beginning of the hash table, relative to the beginning of the archive." So when you are writing your file, you count how many bytes you've written before you reach the hash table (the header and whatever else goes first).

  10. #10
    Registered User
    Join Date
    Nov 2005
    Posts
    673
    Yes, but I do not need mine to be larger than 4 GB, since my projects will be nowhere near the size of World of Warcraft.

    Sorry, I didn't make clear what I meant by "first byte". I wanted to know how to get the memory address of the first byte of a given variable, so that I can write() it to a file.

  11. #11
    and the Hat of Guessing
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by Raigne
    Sorry I didn't clarify my meaning of "first byte". I was wanting to know how to get the memory location of the first byte of a given data type, so that I can write() it to a file.
    I would like to reply with the post "&", but that is too short to meet forum regulations.

  12. #12
    Registered User
    Join Date
    Nov 2005
    Posts
    673
    I'm sorry, I am not explaining myself well enough. I know how to get the address; I don't know how to pass it correctly to the ofstream::write() method.

  13. #13
    and the Hat of Guessing
    Join Date
    Nov 2007
    Posts
    14,336
    Look at the manual for write: you will see it takes an address and a size, in that order. So give write the address of the data and how much data there is to write.

  14. #14
    Registered User
    Join Date
    Nov 2005
    Posts
    673
    Thank you for the informative responses, and not so much for #11. It was a simple question.

    This is a little test case. Is the approach in read() the only way to get the correct result?
    Code:
    #include <cstdlib>
    #include <fstream>
    #include <iostream>
    
    int gint = 0; // global that read() stores its result in
    
    void write()
    {
    	int t = 1000;
    	std::ofstream out("Test", std::ios::trunc | std::ios::binary );
    	out.write( (char*)&t, 4);
    	
    	out.close();
    	out.clear();
    }
    
    void read()
    {
    	char buffer[5] = {0};
    	std::ifstream in("Test", std::ios::binary );
    	in.read(buffer, 4);
    	int t = 0;
    	// copy the four bytes into t one at a time
    	((char*)&t)[0] = buffer[0];
    	((char*)&t)[1] = buffer[1];
    	((char*)&t)[2] = buffer[2];
    	((char*)&t)[3] = buffer[3];
    	gint = t;
    }
    	
    int main()
    {
    	write();
    	read();
    	std::cout << gint << std::endl;
    	std::system("pause"); // Windows-only: keeps the console window open
    	return 0;
    }
    If that is the case, will I have to write a function for each type I need to convert back from the file?

  15. #15
    and the Hat of Guessing
    Join Date
    Nov 2007
    Posts
    14,336
    You're joking, right?
    Code:
    void read()
    {
        int t;
        std::ifstream in("Test", std::ios::binary);
        in.read( (char*) &t, sizeof(t));
    }
    (EDIT: After all, you got write right. Why wouldn't you read the same way you write?)
