Thread: Parsing a file of ints into an int array?

  1. #1
    Registered User
    Join Date
    May 2012
    Posts
    8

    I am trying to parse a file filled with integers (one per line, presumably with EOF at the end) into an array of ints. The file will contain several thousand ints, but I want to produce well-designed code that works with a file of any size and does some error checking.

    What is the most efficient way of figuring out the number of ints before asking malloc to reserve the memory for the array of ints?

    If I use atoi() or strtol() for parsing, I presume I will have an outer loop that checks whether the next char is EOF ( while (c != EOF) ) and an inner loop that parses line by line (looking for \n?).

    Should I use strtol() (better error checking?) instead of atoi()? All the ints will fit easily within a 32-bit int.

    How do I go about doing this efficiently and well?

    Thanks.

  2. #2
    Registered User
    Join Date
    Sep 2007
    Posts
    1,012
    > What is the most efficient way of figuring out the number of ints before asking malloc to reserve the memory for the array of ints?
    Depends on where you want the efficiency to be. If you're willing to use a platform-specific (or non-portable, at least) construct, you can get the size of the file and use that to find an upper bound on potential memory usage: assume each line will be a single digit (plus newline), so divide the file size by 2 and allocate space for that many ints. Then you only have to read the file once; however, you'll allocate more memory than you actually need. This might not be an issue, and you can always realloc() at the end.
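
    A minimal sketch of the upper-bound idea, assuming the file is a regular file opened in binary mode. The function names are made up for illustration. It uses fseek()/ftell() to get the size, which ISO C does not guarantee for every stream (POSIX fstat() is the usual alternative), and it rounds up so that a final line with no trailing newline is still covered:

```c
#include <stdio.h>
#include <stdlib.h>

/* Upper bound on the number of ints in a file of `size` bytes, assuming the
   shortest possible line is one digit plus '\n'. Rounding up covers a final
   line that has no trailing newline. */
size_t max_ints_for_size(long size)
{
    return size <= 0 ? 0 : ((size_t)size + 1) / 2;
}

/* Returns the size of an open file in bytes, or -1 on failure, and leaves
   the stream positioned at the start. Not guaranteed by ISO C for every
   stream; it works for regular files on common platforms. */
long file_size(FILE *fp)
{
    long size;

    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    size = ftell(fp);
    rewind(fp);
    return size;
}

/* Allocates room for the worst case; *capacity receives the element count.
   Returns NULL if the size cannot be determined or malloc fails. */
int *alloc_upper_bound(FILE *fp, size_t *capacity)
{
    long size = file_size(fp);

    if (size < 0)
        return NULL;
    *capacity = max_ints_for_size(size);
    /* Ask for at least one element so an empty file does not request 0 bytes. */
    return malloc((*capacity ? *capacity : 1) * sizeof(int));
}
```

    After filling the array you can realloc() it down to the number of ints actually read.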

    If you don't want to over-allocate, you'll just have to read the file twice, assuming you need to allocate the proper number up front: one pass to count newlines, the next to read the ints. The second pass, on most sensible systems, will be fast because the file will be cached by the OS.
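
    The two-pass approach could look roughly like this. It is a sketch with hypothetical function names, and it assumes one int per line with no blank lines (a blank line would be stored as 0 here because validation is left out):

```c
#include <stdio.h>
#include <stdlib.h>

/* First pass: count lines. A final line without '\n' still counts. */
size_t count_lines(FILE *fp)
{
    size_t lines = 0;
    int c, last = '\n';

    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')
            lines++;
        last = c;
    }
    if (last != '\n')
        lines++;
    return lines;
}

/* Counts the lines, allocates exactly that many slots, then rewinds and
   fills them in a second pass. Returns NULL on allocation failure;
   *n receives the number of values stored. */
int *read_ints_two_pass(FILE *fp, size_t *n)
{
    char line[64];
    size_t cap = count_lines(fp);
    int *values = malloc((cap ? cap : 1) * sizeof *values);

    *n = 0;
    if (values == NULL)
        return NULL;
    rewind(fp);
    while (*n < cap && fgets(line, sizeof line, fp) != NULL)
        values[(*n)++] = (int)strtol(line, NULL, 10); /* no validation here */
    return values;
}
```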

    There's also the possibility of guessing at a reasonable number, and calling realloc() as necessary while looping. This doesn't fit your requirement of allocating all beforehand, but it's worth a mention.
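
    For the realloc() route, one common pattern is to double the capacity whenever the array fills up. This is only a sketch: read_ints_growing and its initial_cap parameter are names invented for the example, and doubling is one growth policy among several.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads one int per line, doubling the array whenever it is full.
   `initial_cap` is the starting guess. Returns NULL on allocation failure;
   *n receives the number of values stored. */
int *read_ints_growing(FILE *fp, size_t initial_cap, size_t *n)
{
    size_t cap = initial_cap ? initial_cap : 1;
    int *values = malloc(cap * sizeof *values);
    char line[64];

    *n = 0;
    if (values == NULL)
        return NULL;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (*n == cap) {
            /* Keep the old pointer until realloc() is known to have worked. */
            int *tmp = realloc(values, 2 * cap * sizeof *tmp);
            if (tmp == NULL) {
                free(values);
                return NULL;
            }
            values = tmp;
            cap *= 2;
        }
        values[(*n)++] = (int)strtol(line, NULL, 10); /* no validation here */
    }
    return values;
}
```

    A call such as read_ints_growing(fp, 1024, &count) starts with room for 1024 ints and grows from there.
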
    > If I use atoi() or strtol() for parsing, I presume I will have an outer loop that checks whether the next char is EOF ( while (c != EOF) ) and an inner loop that parses line by line (looking for \n?).

    > Should I use strtol() (better error checking?) instead of atoi()? All the ints will fit easily within a 32-bit int.
    Read lines with fgets(), parse with strtol(). atoi() returns int, which might be 16 bits, and if you overflow with atoi(), it's undefined behavior. strtol() returns a long, which cannot be smaller than 32 bits, plus it will tell you if something went wrong.
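
    As a rough sketch of the fgets() + strtol() loop (read_int_lines is a made-up name, and the 32-byte buffer is an arbitrary choice): it stops at a line that does not fit the buffer, and it range-checks the long before narrowing it to int. Checking endptr and errno is left out here to keep it short.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads up to `max` ints, one per line, into `out`.
   Returns the number stored. Stops early at a line that does not fit
   the buffer or whose value does not fit in an int. */
size_t read_int_lines(FILE *fp, int *out, size_t max)
{
    char line[32];
    size_t n = 0;

    while (n < max && fgets(line, sizeof line, fp) != NULL) {
        long v;

        /* No '\n' and not at end of file: treat the line as too long. */
        if (strchr(line, '\n') == NULL && !feof(fp))
            break;
        v = strtol(line, NULL, 10);
        if (v < INT_MIN || v > INT_MAX) /* long may be wider than int */
            break;
        out[n++] = (int)v;
    }
    return n;
}
```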

  3. #3
    Registered User
    Join Date
    Mar 2011
    Posts
    546
    1. You could make two passes through the file: the first pass counts the ints, the second pass allocates the array and stores them. The tradeoff is reading the file twice.
    2. You could make a single pass, starting with a guess at how many ints the file holds (maybe get the file size and estimate from that), and realloc to a larger size if the count turns out higher than expected. The tradeoff is possible copying when realloc has to move the block.

    strtol's error checking is not really enough, as it will return 0 if the value is not parseable but you don't know if the value was actually 0 or garbage. sscanf returns a separate indication that the value was convertible.
    End of file is indicated by the return value of whichever read function you use (fread, fgetc, etc.), and the exact indication differs depending on how you read a line.
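
    To illustrate how the indication differs, here is a small sketch with invented helper names: fgetc() returns EOF, fgets() returns NULL, and feof()/ferror() tell a real end of file apart from a read error afterwards.

```c
#include <stdio.h>

/* fgetc() signals the end of input by returning EOF. */
long count_chars(FILE *fp)
{
    long n = 0;
    int c; /* int, not char, so EOF can be told apart from real data */

    while ((c = fgetc(fp)) != EOF)
        n++;
    return n;
}

/* fgets() signals the end of input by returning NULL.
   Counts successful calls, which equals the line count when every
   line fits in the buffer. */
long count_fgets_calls(FILE *fp)
{
    char line[64];
    long n = 0;

    while (fgets(line, sizeof line, fp) != NULL)
        n++;
    return n;
}

/* After either loop: nonzero if reading stopped at end of file
   rather than because of a read error. */
int ended_cleanly(FILE *fp)
{
    return feof(fp) && !ferror(fp);
}
```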

  4. #4
    Salem (and the hat of int overfl)
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > strtol's error checking is not really enough, as it will return 0 if the value is not parseable but you don't know if the value was actually 0 or garbage.
    strtol(3): convert string to long integer - Linux man page
    But coupled with analysing the endptr and the value of errno, it is possible to work out what happened.
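
    Something along these lines, as a sketch only (parse_int and the enum are names invented for the example, and treating trailing whitespace as acceptable is an assumption):

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

enum parse_result {
    PARSE_OK,
    PARSE_NO_DIGITS,     /* nothing converted, e.g. "abc" or "" */
    PARSE_TRAILING_JUNK, /* digits followed by something else, e.g. "12x" */
    PARSE_OUT_OF_RANGE   /* does not fit in a long, or not in an int */
};

/* Converts `s` to an int, using endptr and errno to tell a genuine 0
   apart from garbage and to detect overflow. *out is written only on
   PARSE_OK. */
enum parse_result parse_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0; /* strtol() sets errno on overflow but never clears it */
    v = strtol(s, &end, 10);

    if (end == s)
        return PARSE_NO_DIGITS;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return PARSE_OUT_OF_RANGE;

    /* Allow trailing whitespace such as the '\n' left by fgets(). */
    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
        end++;
    if (*end != '\0')
        return PARSE_TRAILING_JUNK;

    *out = (int)v;
    return PARSE_OK;
}
```

    With this, "0\n" gives PARSE_OK with a value of 0, while "abc" gives PARSE_NO_DIGITS, so the two cases are no longer confused.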

    > sscanf returns a separate indication that the value was convertible.
    scanf functions do not detect overflow.
