Thread: file size with binary write larger than it should be?

  1. #1
    Registered User
    Join Date
    Feb 2012
    Posts
    10

    file size with binary write larger than it should be?

    Hi everyone.

    I have what seemed to be a simple problem at the start (famous last words), but now I am pretty confused as to what's going wrong.

    I need to write several structures to a binary file. Within these structures are more, smaller structures, and within those are the actual data types (unsigned shorts, unsigned chars) that I want to keep track of. The following are the 'base' structures:

    Code:
    struct Time
    {
     unsigned short hour;
     unsigned char minute;
    };
    struct LogVersion
    {
     unsigned char digit1;
     unsigned char digit2;
    };
    struct Date
    {
     unsigned char month;
     unsigned char day;
     unsigned char year;
    };
    struct Counter
    {
     unsigned short count;
    };
    struct BatterySlope
    {
     unsigned char AverageSlope;
     unsigned short ComputationCount;
    };
    struct PowerParams
    {
     unsigned char AverageSlope;
     unsigned char AverageOffset;
    };
    struct AverageValue
    {
     unsigned char Average;
    };
    Now, when I write all of the structures to a file, my math says the file should be near 550 bytes. However, when I am done writing and check, the size is about 710. Big whoop, you might say, but it is a requirement for this project that I get the file size under 512 bytes, and it ought to be much closer to that than 710 right now. Keep in mind that I am only testing the application, so these data types have had few, if any, manipulations and are being written with their default values of 0. So, in trying to figure out what I am doing wrong, I wrote only one of the larger structures, which has a collection of 3 sub-structures, each with 5 ushort values:

    Code:
    struct mlog
    {
       struct Counter Count1[5];
       struct Counter Count2[5];
       struct Counter Count3[5];
    };
    This ought to yield a file size of 30 bytes, and it does. I move onto the next one, which is larger:

    Code:
    struct glog 
    {
     struct LogVersion Version; 
     struct Date CurrentDate; 
     struct Counter BootCount[6]; 
     struct Counter LockCount[5]; 
     struct Counter ProgramCount[5]; 
     struct Counter StandbyCount[6]; 
     struct Counter RCVolumeCount;
     struct Counter RCSensitivityCount;
     struct Counter RCProgramCount;
     struct Counter RCZoomControlCount; 
     struct Counter DuoPhoneCount; 
     struct Counter ListeningCheckCount; 
     struct Counter iComBTPhoneCount; 
     struct Counter iComA2DPCount; 
     struct Counter iComJackCount; 
     struct Counter iComFMCount;
     struct Counter ErrorCount;
     struct Time ZoomCtlUseTime;
     struct Time DuoPhoneUseTime;
     struct Time iComStreamingUseTime; 
    };

    I did a little counting, and since there are 80 bytes' worth of data scattered in this structure, the file size ought to be 80 bytes. I check the file size after the write, and it is 88 bytes??? I have also written the items to a file individually and got the same result, and I have written just one item to a file to be absolutely sure that my items are the right size. For some reason, if I write, for example, CurrentVersion.digit1 and then CurrentVersion.digit2 to a file by themselves, the file size is 4 bytes when I have only written two 1-byte chars??? I assumed that writing members individually would bypass the whole 'structure padding' problem I have read about, where extra bytes are added to structures and end up in the write, but if padding is the problem, it has not done so. This is the point where I am seriously questioning my understanding of everything I am working with, and it's not just 'whoops, I put in a short where I should have put in a char.'

    I have a third structure that I won't list because this is already getting a little wordier than I wanted, but it is even larger than 'glog' and has more of this mystery error in file size. To top it off, I have an array of 5 of those large structures, which creates the gobs of extra file space that I sorely need to remove. Can someone tell me if there's some kind of fundamental understanding I am missing about structures, writing them to files, or something else I've overlooked?

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Look up "padding and alignment".

    sizeof(struct BatterySlope) is almost certainly NOT going to be 3.
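
    To make the padding point concrete, here is a minimal sketch that measures how much padding a compiler adds to struct BatterySlope. The helper names battery_slope_padding and print_battery_slope_layout are made up for illustration; the results are platform dependent, so no particular output is claimed.

```c
#include <stddef.h>
#include <stdio.h>

struct BatterySlope
{
    unsigned char AverageSlope;
    unsigned short ComputationCount;
};

/* Padding the compiler added, measured in units of sizeof(char). */
size_t battery_slope_padding(void)
{
    size_t members = sizeof(unsigned char) + sizeof(unsigned short);
    return sizeof(struct BatterySlope) - members;
}

/* sizeof and offsetof yield size_t, so cast before printing with %lu. */
void print_battery_slope_layout(void)
{
    printf("sizeof(struct BatterySlope) = %lu\n",
           (unsigned long)sizeof(struct BatterySlope));
    printf("offset of ComputationCount  = %lu\n",
           (unsigned long)offsetof(struct BatterySlope, ComputationCount));
    printf("padding                     = %lu\n",
           (unsigned long)battery_slope_padding());
}
```

    Calling print_battery_slope_layout() from main shows whether the second member starts right after the first or after a gap.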
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Feb 2012
    Posts
    10
    Quote Originally Posted by Salem View Post
    Look up "padding and alignment".

    sizeof(struct BatterySlope) is almost certainly NOT going to be 3.
    See that's another thing. My 'sizeof' function is not working as I expect it to. If I print out a 'sizeof' anything, it will tell me the number of items it contains, as opposed to the number of bytes contained in the data. For example, BatterySlope's size, according to sizeof, is 2, and not 3, you are right. But in testing further I have written just a simple printf("%d", sizeof(unsigned short)) and I get 1 instead of 2 (bytes) as I expect I should.

    Is the extra space really a padding/alignment problem though? Because I thought I would have bypassed that by accessing each individual item within a structure and writing just that to the file, but it doesn't seem to work. Perhaps I am getting an extra byte of space put into structures that contain more than one data type. For example struct BatterySlope having a slope and an offset will cause a single byte to be placed between them, increasing the size of the structure and therefore the size of the file?

    I'll look into padding and alignment more, thank you.

  4. #4
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    What machine are you compiling for?

    What value is CHAR_BIT on your machine?
    Code:
    #include <stdio.h>
    #include <limits.h>
    int main ( ) {
      printf("%d\n",CHAR_BIT);
      return 0;
    }

  5. #5
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    Quote Originally Posted by Rath View Post
    See that's another thing. My 'sizeof' function is not working as I expect it to. If I print out a 'sizeof' anything, it will tell me the number of items it contains, as opposed to the number of bytes contained in the data. For example, BatterySlope's size, according to sizeof, is 2, and not 3, you are right. But in testing further I have written just a simple printf("%d", sizeof(unsigned short)) and I get 1 instead of 2 (bytes) as I expect I should.
    I wonder if you're just not doing something right. Like, sizeof returns a size_t, which is an unsigned integer type. But from the symptoms you describe, it's not the signedness of what you print that makes it wrong. It's more like you're making it off by one somehow. If you're curious enough, compile and run sys-sizes. It will tell you the various sizes of types for your machine correctly. If there are differences from what you've experienced, you definitely have a bug in your code somewhere.
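
    Since sizeof yields a size_t, passing it straight to %d is a mismatch. A minimal sketch of a safer way to print sizes, assuming an older compiler without %zu support: cast to unsigned long and use %lu. The helper names bits_in and print_type_sizes are hypothetical. Multiplying by CHAR_BIT also shows how many bits each type really occupies.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Width in bits of an object whose sizeof is n. */
unsigned long bits_in(size_t n)
{
    return (unsigned long)n * CHAR_BIT;
}

/* Print sizeof and bit width for the basic integer types. */
void print_type_sizes(void)
{
    printf("CHAR_BIT = %d\n", CHAR_BIT);
    printf("char : sizeof %lu, %lu bits\n",
           (unsigned long)sizeof(char), bits_in(sizeof(char)));
    printf("short: sizeof %lu, %lu bits\n",
           (unsigned long)sizeof(short), bits_in(sizeof(short)));
    printf("int  : sizeof %lu, %lu bits\n",
           (unsigned long)sizeof(int), bits_in(sizeof(int)));
}
```

    If sizeof(short) prints as 1 but the bit width prints as 16, the compiler is counting in units larger than 8 bits.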

  6. #6
    Registered User
    Join Date
    Feb 2012
    Posts
    10
    After reading a link I found here regarding padding/alignment: Structure Member Alignment, Padding and Data Packing | GeeksforGeeks

    I think I roughly understand a little bit of the problem I seem to be having. When I try and write a 'char' value to file (through fwrite if that makes a difference), it seems to always, no matter what I do, want to add another single byte with it. This is because the way the padding works between chars and other memory items, the memory is in 4 byte blocks, so that integers will have 0 bytes padded. Also, ushorts will have 0 bytes padded because it is possible to access memory in 2 byte chunks. However with 1 byte chars, there is no way to access this single byte of memory because there will always be an extra 1 byte of padding, as the minimum size of a single padded memory read is 2 bytes? Am I somewhat on base with this?

    If all that is true, then my problem seems clear. I need to find a way to be able to get that single byte into a file, without the extra byte that's attached to it. A solution I found was to use #pragma pack, but it seems to be very taboo from what people say about it, causing unsafe memory accessing and the like. Is there anything else that you guys can think of that would get me to put a single byte into a file? Frankly, I don't care about the padding of the structures, it seems like a good way to keep everything straight. But I do need to be able to remove the padding when I go to write the char into my file.
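
    One alternative to #pragma pack is to leave the structures alone and copy each member into a buffer yourself before writing. Below is a minimal sketch for struct Time, assuming the file format stores hour as two octets, low octet first, followed by minute. The function names and the octet order are choices made for this example, not something given in the thread.

```c
#include <stddef.h>

struct Time
{
    unsigned short hour;
    unsigned char minute;
};

/* Copy a Time into exactly three values in the range 0..255:
   hour low octet, hour high octet, minute. Returns the count written. */
size_t time_to_octets(const struct Time *t, unsigned char out[3])
{
    out[0] = (unsigned char)(t->hour & 0xFFu);
    out[1] = (unsigned char)((t->hour >> 8) & 0xFFu);
    out[2] = (unsigned char)(t->minute & 0xFFu);
    return 3;
}

/* Rebuild a Time from the three values produced above. */
void time_from_octets(const unsigned char in[3], struct Time *t)
{
    unsigned lo = in[0] & 0xFFu;
    unsigned hi = in[1] & 0xFFu;
    t->hour = (unsigned short)(lo | (hi << 8));
    t->minute = (unsigned char)(in[2] & 0xFFu);
}
```

    With 8-bit chars, fwrite(out, 1, 3, fp) would then put exactly three bytes in the file regardless of struct padding. If CHAR_BIT is larger than 8, each array element still occupies a full char, so this alone does not shrink the file.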


    EDIT: I just ran the "CHAR_BIT" code you wrote Salem, and it returned 16....... what the heck! Does this mean that all my chars are defined as 16 bit, i.e. 2 bytes, no matter what??? If that's the case, how would I even be able to have a 1 byte data type? I guess that would completely mess up my padding theory huh? Note: I am writing this for an embedded firmware application that uses a proprietary (though I believe gcc based) compiler.

  7. #7
    Registered User
    Join Date
    Sep 2008
    Location
    Toronto, Canada
    Posts
    1,834
    Put #pragma pack(1) before you define the structures. It will get rid of padding.

  8. #8
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    Could you run this program please.
    Code:
    #include <stdio.h>
    
    int main() {
        char str[7];
        /* sizeof yields a size_t, so cast before printing with %d */
        printf("%d %d %d %d\n", (int)sizeof(char), (int)sizeof(short),
                                (int)sizeof(int),  (int)sizeof(str));
        return 0;
    }
    The cost of software maintenance increases with the square of the programmer's creativity. - Robert D. Bliss

  9. #9
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    With a 16-bit char and 16-bit short, sizeof(struct glog) would be 44 without padding. So the compiler is doubling the size of that struct (from 88 bytes to 176 bytes).

    What machine are you compiling for?

  10. #10
    Registered User hk_mp5kpdw's Avatar
    Join Date
    Jan 2002
    Location
    Northern Virginia/Washington DC Metropolitan Area
    Posts
    3,817
    Could a macro somewhere be overriding his compiler's sizeof op?
    "Owners of dogs will have noticed that, if you provide them with food and water and shelter and affection, they will think you are god. Whereas owners of cats are compelled to realize that, if you provide them with food and water and shelter and affection, they draw the conclusion that they are gods."
    -Christopher Hitchens

  11. #11
    Registered User
    Join Date
    Feb 2012
    Posts
    10
    Quote Originally Posted by oogabooga View Post
    Could you run this program please.
    Code:
    #include <stdio.h>
    
    int main() {
        char str[7];
        /* sizeof yields a size_t, so cast before printing with %d */
        printf("%d %d %d %d\n", (int)sizeof(char), (int)sizeof(short),
                                (int)sizeof(int),  (int)sizeof(str));
        return 0;
    }
    The result of this program yields:

    1 1 1 7

    So it would seem my sizeof function does not work in the same way as most of what I have read online seems to be suggesting.
    I am compiling for a DSP Core, made by Ceva called Teaklite.

    I am thinking I should try the #pragma pack(1) just to see what the file size results will be... if they are what I expected, then it's packing. If not, I have 16-bit char types and need to figure out something else.

  12. #12
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    In C, a char is the smallest data type that can have a unique address in memory. C only requires that CHAR_BIT be at least 8 (it could be more, as in your case).

    This in itself isn't a problem (as far as the code is concerned), though it does mess about with your expectations.

    All that really matters is that you can fwrite() out the structures and fread() them back in again, and be back where you started.

    > Put #pragma pack(1) before you define the structures. It will get rid of padding.
    But it won't change the underlying architecture of the machine, which has CHAR_BIT set to 16.
    So char and short are both the same size, and there is no padding to squeeze out.
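
    If the file really must shrink on a machine where every char takes 16 bits, one option is to pack two 8-bit values into each 16-bit unit by hand before writing. This is a rough sketch under the assumption that unsigned short holds at least 16 bits (which C guarantees) and that every input value fits in 0..255. The names pack_octets and unpack_octets are invented for this example.

```c
#include <stddef.h>

/* Pack n values (each 0..255) into 16-bit words, two per word,
   low octet first. An odd trailing value is paired with zero.
   Returns the number of words written. */
size_t pack_octets(const unsigned char *octets, size_t n, unsigned short *words)
{
    size_t i, w = 0;
    for (i = 0; i < n; i += 2) {
        unsigned lo = octets[i] & 0xFFu;
        unsigned hi = (i + 1 < n) ? (octets[i + 1] & 0xFFu) : 0u;
        words[w++] = (unsigned short)(lo | (hi << 8));
    }
    return w;
}

/* Reverse of pack_octets: recover n values from the packed words. */
void unpack_octets(const unsigned short *words, size_t n, unsigned char *octets)
{
    size_t i;
    for (i = 0; i < n; i++) {
        unsigned word = words[i / 2];
        unsigned value = (i % 2 == 0) ? (word & 0xFFu) : ((word >> 8) & 0xFFu);
        octets[i] = (unsigned char)value;
    }
}
```

    Whether this halves the file on the actual device depends on how its file system stores each 16-bit unit, which the thread does not confirm.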

  13. #13
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    It looks like you may have 16-bit ints as well.
    sizeof doesn't actually return the number of 8-bit bytes, but the number of chars, and here each char is CHAR_BIT = 16 bits wide.
    One last piece of code. Please try this:
    Code:
    #include <stdio.h>
    #include <limits.h>
    
    int main() {
        printf("%d %d\n", INT_MIN, INT_MAX);
        return 0;
    }

  14. #14
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    Quote Originally Posted by Salem View Post
    So char and short are both the same size, and there is no padding to squeeze out.
    There may still be padding. struct glog should be 44 "chars" long. But instead he said sizeof gives 88. (Unless I've miscounted.)

  15. #15
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Dunno - I read it as being "44" as reported by sizeof(), and "88" as reported by say "ls -l" on the file system, which may be counting bytes in the more traditional sense.
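
    The arithmetic behind the 80, 44 and 88 figures can be checked by counting the members of struct glog. The sketch below tallies them by hand from the definition in post #1; the enum and function names are made up here, and it assumes no padding in either layout.

```c
/* Member counts taken from struct glog in post #1. */
enum {
    /* 2 LogVersion digits + 3 Date fields + 3 Time minutes */
    GLOG_UCHARS = 2 + 3 + 3,
    /* counter arrays (6 + 5 + 5 + 6) + 11 single counters + 3 Time hours */
    GLOG_USHORTS = 6 + 5 + 5 + 6 + 11 + 3
};

/* Data size in octets if char is 8 bits and short is 16 bits. */
unsigned glog_octets_8bit_char(void)
{
    return GLOG_UCHARS * 1u + GLOG_USHORTS * 2u;
}

/* Size as sizeof would report it if char and short are both one 16-bit unit. */
unsigned glog_units_16bit_char(void)
{
    return (unsigned)GLOG_UCHARS + (unsigned)GLOG_USHORTS;
}

/* File size in octets if each 16-bit unit is stored as two octets. */
unsigned glog_octets_16bit_char(void)
{
    return glog_units_16bit_char() * 2u;
}
```

    These give 80, 44 and 88 respectively, which matches the reading that sizeof reports 44 and the file system reports 88.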
