Thread: Read/Write a binary variable length sequential file in C

  1. #1
    Registered User
    Join Date
    Jul 2005
    Location
    Austin Texas
    Posts
    7

    Read/Write a binary variable length sequential file in C

    Help! Here is the deal: I've been a COBOL coder for my entire career. I now work for a new company that has asked me to pick up a C book and become an expert in one week! Hooraaa! My first assignment is to read in a binary variable-length file (format yet to be defined) and then write it back out (format also yet to be defined). So for now I am trying to figure out how to read this type of structure in and write it back out, and I will plug in the details later. Trouble is, I can't find any definitive info on how to do this, or it is staring me right in the face and I don't understand it (probably the latter). Can anyone cut/paste an example of this for me? I'm not sure if I should be using seeks or scans or what the heck I should do...

  2. #2
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >I now work for a new company that has asked me to pick up a C book and become an expert in one week!
    That's not unheard of, or even unrealistic depending on your past experience.

    >Can anyone cut/paste an example of this for me?
    Look up the fopen function. The mode you want to use for reading is "rb" and for writing, "wb". Once you have the files open, the simplest method is to copy byte by byte:
    Code:
    int ch;
    
    while ( ( ch = fgetc ( in ) ) != EOF )
      fputc ( ch, out );
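    To put the loop above in context, here is a minimal sketch of a complete copy routine. The name copy_file and its return convention are made up for illustration; only fopen, fgetc, fputc, ferror and fclose come from the standard library.

```c
#include <stdio.h>

/* Copy src to dst one byte at a time.
   Returns 0 on success, -1 on any open, read, or write failure.
   (copy_file is a hypothetical helper, not part of the C library.) */
int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");   /* "rb": read, binary mode */
    FILE *out;
    int ch;                        /* int, not char, so EOF is distinguishable */
    int status = 0;

    if (in == NULL)
        return -1;

    out = fopen(dst, "wb");        /* "wb": write, binary mode */
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    while ((ch = fgetc(in)) != EOF) {
        if (fputc(ch, out) == EOF) {
            status = -1;           /* write error */
            break;
        }
    }

    if (ferror(in))                /* loop ended on a read error, not end of file */
        status = -1;

    fclose(in);
    if (fclose(out) == EOF)        /* fclose flushes; a failed flush is a write error */
        status = -1;

    return status;
}
```

    A caller would use it as copy_file("input.dat", "output.dat") and check for a zero return.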
    My best code is written with the delete key.

  3. #3
    Registered User
    Join Date
    Jul 2005
    Location
    Austin Texas
    Posts
    7
    Thanks, but I have been told that reading byte by byte will kill performance, and that I should dump data into a buffer and read from there using fread calls. Also, since I am reading a binary file, shouldn't I be using an FEOF instead of EOF? I'm still totally confused... I don't understand how to dump data into a buffer and how to detect the end of each record. It's so easy to do in COBOL... I know I will have a record length at the beginning of each record, so I guess I can use that to read each record; I just need to figure out how to do it.

  4. #4
    Registered User mitakeet's Avatar
    Join Date
    Jun 2005
    Location
    Maryland, USA
    Posts
    212
    I would not worry about performance until the program works correctly AND the code in question has been shown to be a bottleneck. In this case the stdio library and the OS both buffer reads for you, and getc (the sibling of fgetc) is commonly implemented as a macro, so there is not a huge amount of per-byte overhead; it may not even be measurable if the file is not already in cache. However, explore fread and fwrite for handling data in chunks.

    I am not sure if COBOL files are always in 'text', but all files are 'binary' as stored on the disk (and in memory); there are only different ways of interpreting the data. When you read a file in 'text' mode, a lot of assumptions are made (particularly with respect to line endings and non-printable characters) that are not made in 'binary' mode. Since file contents are just a string of bits to the OS and the program, any 'structure' is purely imposed by the programmer.

    Free code: http://sol-biotech.com/code/.

    It is not that old programmers are any smarter or code better, it is just that they have made the same stupid mistake so many times that it is second nature to fix it.
    --Me, I just made it up

    The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore, all progress depends on the unreasonable man.
    --George Bernard Shaw

  5. #5
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    There is no "FEOF". EOF is a... aw hell, just go read the FAQs on both.


    Quzah.
    Hope is the first step on the road to disappointment.

  6. #6
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >but I have been told that reading byte to byte will kill performance
    You shouldn't listen to people who overgeneralize, especially when they have no idea what they're talking about. Since they were probably thinking about fread and fwrite, note that both methods rely on the buffering of the stream for performance. The standard describes fread as if it called fgetc repeatedly, and fwrite as if it called fputc repeatedly, though an implementation is free to copy whole chunks to and from the buffer instead.

    In my implementation of stdio, there was no noticeable performance difference between the fgetc/fputc loop and the equivalent code using fread and fwrite, because any device I/O was avoided until the buffer filled, and device I/O is the big hitter when it comes to performance in I/O libraries.

    >shouldn't I be using an FEOF instead of EOF?
    No, FEOF doesn't exist. There is an feof function, but beginners to C have a hard time using it correctly.
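    The usual mistake is to write the loop as "while ( !feof ( in ) )", which tests for end of file before any read has been attempted. A minimal sketch of the safer pattern follows: let the return value of the read drive the loop, and ask feof or ferror afterwards why it stopped. The function name count_bytes is made up for illustration.

```c
#include <stdio.h>

/* Count the bytes remaining in a stream.
   Returns the count, or -1 on a read error.
   (count_bytes is a hypothetical helper for illustration.) */
long count_bytes(FILE *in)
{
    unsigned char buf[256];
    size_t n;
    long total = 0;

    /* The return value of fread controls the loop, not feof. */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        total += (long)n;

    /* fread returned 0: only now check why it stopped. */
    if (ferror(in))
        return -1;

    return total;   /* no error, so feof(in) is true at this point */
}
```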

    >I'm still totally confused
    Get used to it. These days, I'm still confused, but it's a whole new level of confused.

    >I don't understand how to dump data into a buffer and know how to detect the end of each record??
    Now, you didn't say anything about individual records. Your words were "variable length file", with no mention of records. If you want a good answer, you need to ask a question accurate to your problem. But, if you want to copy blocks, you can do it like this:
    Code:
    char block[1024];
    size_t n;
    
    while ( ( n = fread ( block, 1, sizeof block, in ) ) == sizeof block ) {
      if ( fwrite ( block, 1, sizeof block, out ) != sizeof block ) {
        /* Write error */
      }
    }
    
    if ( feof ( in ) ) {
      /* Copy the last block */
      if ( fwrite ( block, 1, n, out ) != n ) {
        /* Write error */
      }
    }
    else {
      /* Read error */
    }
    When you get into separating the file into records, you have to format the file so as to make it easy. For example, if the file is written so that the byte count of each record is prepended to the record, you can read the file as you would in COBOL using fread as shown above.
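    As a rough sketch of that idea, the function below reads one length-prefixed record per call. Everything about the layout is an assumption, since the file format had not been defined at this point in the thread: the prefix is taken to be 2 bytes, big-endian, and to count the whole record including the prefix itself. The name read_record is hypothetical.

```c
#include <stdio.h>

/* Read one length-prefixed record into buf (capacity cap bytes).
   ASSUMPTIONS, not confirmed by the thread: the length is a 2-byte
   big-endian value and it counts the whole record, prefix included.
   Returns the record length, 0 at a clean end of file, or -1 on a
   read error, a truncated record, a bad length, or a buffer that is
   too small. */
long read_record(FILE *in, unsigned char *buf, size_t cap)
{
    size_t got, len;

    if (cap < 2)
        return -1;

    got = fread(buf, 1, 2, in);            /* the length prefix */
    if (got == 0 && feof(in))
        return 0;                          /* no more records */
    if (got != 2)
        return -1;                         /* error or half a prefix */

    len = ((size_t)buf[0] << 8) | buf[1];  /* big-endian, byte by byte */
    if (len < 2 || len > cap)
        return -1;

    /* The rest of the record follows the prefix. */
    if (fread(buf + 2, 1, len - 2, in) != len - 2)
        return -1;

    return (long)len;
}
```

    Calling it in a loop until it returns 0 walks the file record by record; writing a record back out is then a single fwrite of the returned length.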

  7. #7
    Registered User
    Join Date
    Jul 2005
    Location
    Austin Texas
    Posts
    7
    First of all, thanks to everyone who has taken a moment of their busy day to help me out! I appreciate it, as this is a great company and I would hate to have to leave it. Looking at Prelude's information, I'm starting to get a vague picture of what I need to do, and I will try to be more detailed! It looks like part of my problem has been that I didn't know the true structure of my input file, other than that it is going to be a variable-length binary file. I pushed for some kind of record format and here is what I got:

    HDR_LL SHORT; /* Length of the record */
    HDR_ZZ SHORT; /* ZZ field (should be zeros) */
    HDR_RTYP SHORT; /* Record Type */
    HDR_STYP SHORT; /* Record sub-type */
    HDR_M_VER SHORT; /* Meta Data Version */
    HDR_ABX_VER SHORT; /* File Export Version */
    HDR_CREATE_TIME INT; /* Creation Time */
    HDR_CREATE_DATE INT; /* Creation Date */
    HDR_DB2_PTR ????? /* Pointer to DB2 segment */
    HDR_IMS_PTR ????? /* Pointer to IMS segment */
    HDR_SEQ_PTR ????? /* Pointer to SEQ segment */
    HDR_VSAM_PTR ????? /* Pointer to VSAM segment */

    So I should have the record length from the first field. So by your statements Prelude, do I understand it that I read in up to 1024 bytes into my buffer and then determine the record length by reading the value in the SHORT bytes at the beginning? Then I read the next rec in the buffer using the same process?

  8. #8
    Registered User mitakeet's Avatar
    Join Date
    Jun 2005
    Location
    Maryland, USA
    Posts
    212
    Since you cannot rely on structure memory layout (due to padding), you need to read in each header variable as a separate read. I presume short is two bytes on your machine and further that there are no endian issues, so do an fread of two bytes into the address of a short (no cast is needed in C or C++, since fread takes a void *). Repeat that for all the shorts, then switch to sizeof(int) (presumably 4, but be certain of that) for the time/date, then get the ptrs (I presume they would also be sizeof(int), but you must verify).

    If you have endian issues, you are going to have to do byte swapping after you do your reads.
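    An alternative that sidesteps both padding and byte swapping is to read the raw header bytes into an unsigned char array and assemble each field by hand. This is a different technique from reading straight into a short, and the sketch below rests on assumptions the thread does not confirm: SHORT is 2 bytes, INT is 4 bytes, the fields are stored back to back in the listed order, values are big-endian (as written on a mainframe), and all fields are treated as unsigned. The struct and function names are made up; the four segment pointers are left out because their type is not given.

```c
/* Fixed part of the header listed in post #7, segment pointers omitted.
   All names here are hypothetical. */
struct record_header {
    unsigned int  ll, zz, rtyp, styp, m_ver, abx_ver;  /* SHORT fields */
    unsigned long create_time, create_date;            /* INT fields   */
};

/* 6 * 2 + 2 * 4 bytes, ASSUMING 2-byte SHORT and 4-byte INT in the file */
#define HEADER_FIXED_BYTES 20

/* Assemble a 16-bit big-endian value from two bytes. */
static unsigned int get_u16_be(const unsigned char *p)
{
    return ((unsigned int)p[0] << 8) | p[1];
}

/* Assemble a 32-bit big-endian value from four bytes. */
static unsigned long get_u32_be(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16)
         | ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
}

/* Decode HEADER_FIXED_BYTES raw bytes into *h.
   The offsets follow the assumed back-to-back layout. */
void decode_header(const unsigned char *raw, struct record_header *h)
{
    h->ll          = get_u16_be(raw + 0);
    h->zz          = get_u16_be(raw + 2);
    h->rtyp        = get_u16_be(raw + 4);
    h->styp        = get_u16_be(raw + 6);
    h->m_ver       = get_u16_be(raw + 8);
    h->abx_ver     = get_u16_be(raw + 10);
    h->create_time = get_u32_be(raw + 12);
    h->create_date = get_u32_be(raw + 16);
}
```

    Because the bytes are combined with shifts, the same code gives the same result on a big-endian mainframe and on a little-endian PC.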


  9. #9
    Registered User
    Join Date
    Jul 2005
    Location
    Austin Texas
    Posts
    7
    I have to use a C compiler only, because they told me that the code has to be multi-platform compatible (LUW), although this is being built strictly on the mainframe (z/OS). They say that C is better at that than C++.

  10. #10
    Registered User mitakeet's Avatar
    Join Date
    Jun 2005
    Location
    Maryland, USA
    Posts
    212
    There are not very many environments where there is a lack of capable C++ compilers, though I can't speak about mainframes. C++ compilers do better type checking (even on C code) and almost all warnings/errors you get from a C++ compiler are likely bugs in your C code. Just stick with ANSI C and use a C++ compiler while doing your development and testing and when you port it to your mainframe environment you should not have any problems at all.


  11. #11
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    Quote Originally Posted by pdkhoury
    So I should have the record length from the first field. So by your statements Prelude, do I understand it that I read in up to 1024 bytes into my buffer and then determine the record length by reading the value in the SHORT bytes at the beginning? Then I read the next rec in the buffer using the same process?
    This old post seems to be somewhat related to what you are trying to do. Maybe there is something there that will be helpful.
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

  12. #12
    Registered User
    Join Date
    Jul 2005
    Location
    Austin Texas
    Posts
    7
    Thanks for the old link. I have been looking at the BasicIO.ccp and it is helping; however, I did a cut/paste into my Studio and got "cannot convert from 'void *' to 'char *' " on the memory allocation lines. I wonder, is this because I am compiling for C and not C++?
    The malloc line looks like this:
    info.name = malloc(info.names * sizeof(*info.name));

  13. #13
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    Quote Originally Posted by pdkhoury
    I did a cut/paste into my Studio and go "cannot convert from 'void *' to 'char *' " on the memory allocation lines. I wonder, is this caused because I am compiling for C and not C++?
    That message is typical of compiling C code with a C++ compiler.
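    In C, the void * returned by malloc converts to any object pointer type without a cast; C++ requires the cast, which is why the same line is rejected there. If the source file has a C++ extension, the compiler may well be treating it as C++, so renaming it to .c or forcing C mode is worth trying. A minimal sketch of the C form, with a made-up helper name and types that are not taken from the linked code:

```c
#include <stdlib.h>

/* Allocate an array of count string pointers, all set to NULL.
   Returns NULL if count is 0, the size would overflow, or malloc fails.
   (alloc_names is a hypothetical helper for illustration.) */
char **alloc_names(size_t count)
{
    char **names;
    size_t i;

    if (count == 0 || count > (size_t)-1 / sizeof *names)
        return NULL;

    /* Valid C: void * converts to char ** implicitly.
       A C++ compiler rejects this line unless a cast is added:
       names = (char **)malloc(count * sizeof *names); */
    names = malloc(count * sizeof *names);
    if (names == NULL)
        return NULL;

    for (i = 0; i < count; i++)
        names[i] = NULL;

    return names;
}
```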

  14. #14
    Registered User
    Join Date
    Jul 2005
    Location
    Austin Texas
    Posts
    7
    Well I'm ready to give up. I think I can hang out here a few more weeks before they fire me. Then maybe I can try truck driving school!

  15. #15
    End Of Line Hammer's Avatar
    Join Date
    Apr 2002
    Posts
    6,231
    Maybe some of these will help (take a look at the source; hopefully it will give you some idea of how to read a file!).

    http://www.dignus.com/freebies/

    Or try a redbook:
    http://www.redbooks.ibm.com/abstract...5992.html?Open
    When all else fails, read the instructions.
    If you're posting code, use code tags: [code] /* insert code here */ [/code]
