Thread: Reading large data sets from a file

  1. #1
    Registered User
    Join Date
    Jan 2011
    Posts
    5

    Reading large data sets from a file

    This question is about reading large sets of formatted data from a file.

    For a file containing two columns, we can read each column in with the fscanf function:


    Code:
    fscanf(file, "%f %f", &data[0], &data[1]);

    This is great when there is a small, known number of columns of data. But what is typically done when the number of columns is very large or unknown? The above strategy becomes very limiting. Is there an easy way to read in formatted data item by item in a loop until the end of the line is reached? How is this task typically coded?

    Thanks in advance

  2. #2
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    fscanf doesn't require (or know about, really) lines. You can just put fscanf(file, "%f", &something) in a loop and suck that file dry.
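
    A minimal sketch of that loop, assuming the values fit in a caller-supplied array. The name read_all_floats is made up for illustration. It relies on fscanf returning the number of items it converted, so the loop ends at end of file or at the first token that is not a number.

```c
#include <stdio.h>

/* Reads whitespace-separated numbers from fp until EOF or a bad token.
   Stores at most max values in out and returns how many were stored.
   read_all_floats is a hypothetical helper name, not from the thread. */
size_t read_all_floats(FILE *fp, float *out, size_t max)
{
    size_t n = 0;

    /* fscanf returns the number of items converted, so 1 means success.
       %f skips spaces and newlines alike, so line breaks are invisible. */
    while (n < max && fscanf(fp, "%f", &out[n]) == 1)
        n++;

    return n;
}
```

    Calling it with an open FILE * and a float array of size max fills the array in file order, regardless of how many numbers sit on each line.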

    If you need to know when a line break happens, you can use fgets and a large buffer to read in one line from the file, then use strtod to get all the numbers out of it. (The advantage of strtod over something like sscanf is that strtod will tell you where to start reading the next number.)
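
    The fgets plus strtod approach might look roughly like this. It is only a sketch: parse_line and count_columns are hypothetical names, and the buffer sizes assume no line is longer than 4095 characters or holds more than 512 values.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parses every number on one line of text using strtod.
   Returns the count stored in out (at most max).
   parse_line is a hypothetical helper name. */
size_t parse_line(const char *line, double *out, size_t max)
{
    size_t n = 0;
    const char *p = line;

    while (n < max) {
        char *end;
        double v = strtod(p, &end);
        if (end == p)       /* no conversion: end of line or a non-number */
            break;
        out[n++] = v;
        p = end;            /* strtod reports where the next number starts */
    }
    return n;
}

/* Reads fp one line at a time and prints how many values each line holds.
   Assumes no line is longer than the buffer. Returns the number of lines. */
size_t count_columns(FILE *fp)
{
    char line[4096];
    double vals[512];
    size_t lines = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        size_t n = parse_line(line, vals, 512);
        lines++;
        printf("line %zu: %zu values\n", lines, n);
    }
    return lines;
}
```

    Because each line is parsed on its own, the number of columns does not need to be known in advance and can even differ from line to line.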

  3. #3
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Depends.

    If your row (or record) can be defined as a struct, you can read it a struct at a time (or a chunk at a time, and pull your structs out of the chunk).
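
    A rough sketch of the struct-at-a-time idea, assuming the file was written with fwrite by a program using the same struct layout on the same platform. The struct record fields and the name read_records are invented for illustration; the real layout depends on the file.

```c
#include <stdio.h>

/* Hypothetical fixed-size record; the real layout depends on the file. */
struct record {
    int    id;
    double x;
    double y;
};

/* Reads up to max records from a binary stream.
   Returns the number of complete records read, which may be less than
   max at end of file. Only valid if the file was written with the same
   struct layout (padding and byte order included). */
size_t read_records(FILE *fp, struct record *out, size_t max)
{
    return fread(out, sizeof out[0], max, fp);
}
```

    This only applies to binary files; a text file of formatted numbers still has to be parsed value by value.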

    If you don't know the format of the data you are reading, and need to get data out, then that's a tougher nut to crack, unless you can programmatically find the data you need to get to based on what knowledge you do have of the data.

    Or, you can use a loop like you suggest, parsing as you go.

    This task is probably tackled many different ways each day.
    Mainframe assembler programmer by trade. C coder when I can.

  4. #4
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    There has to be *some* format there or no other program would ever read it...

    Generally there are only 3 ways data is written to a file...

    1) Records ... using Structs that are simply dumped onto disk and read back later (binary mode)

    2) Delimited... values separated by commas or some other convenient delimiter.

    3) Sized ... where the size of each variable is known at write time and read back accordingly.

    Method 1 can usually be spotted by long strings of spaces or zeros after blocks of text, interspersed with random looking, odd, characters that turn out to be numerical values.

    Method 2 will look something like

    123,this is fred,44,56,he lives in Canada,3362

    followed by a newline.
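
    For a delimited line like that one, a splitter could be sketched as below. split_fields is a hypothetical name. It assumes a comma delimiter, modifies the line in place, and does not handle commas inside quoted fields.

```c
#include <string.h>

/* Splits line in place at each comma and stores pointers to the fields.
   Returns the number of fields found (at most max). Modifies line.
   split_fields is hypothetical; it does not handle quoted commas. */
size_t split_fields(char *line, char **fields, size_t max)
{
    size_t n = 0;
    char *p = line;

    line[strcspn(line, "\r\n")] = '\0';   /* drop the trailing newline */

    while (n < max) {
        fields[n++] = p;
        char *comma = strchr(p, ',');
        if (comma == NULL)                /* last field on the line */
            break;
        *comma = '\0';                    /* terminate this field */
        p = comma + 1;
    }
    return n;
}
```

    Each field comes back as a string; numeric fields can then be converted with strtol or strtod.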

    Method 3 is likely to be text interspersed with random "squiggles" but no padding like you'd see in method 1.

    Which is your file?
    Last edited by CommonTater; 01-17-2011 at 07:32 PM.

  5. #5
    Registered User
    Join Date
    Jan 2011
    Posts
    5

    using fscanf

    fscanf doesn't seem to behave as described for me. Specifically, it seems to only store zero.

    When I run this structure, I expect floating point numbers to be read from the file one-by-one, stored temporarily in 'data', then printed to the screen.

    Code:
    if ((fp = fopen(argv[1], "rb")) == NULL) {
        printf("cannot open %s\n", argv[1]);
    }
    else {
        for (i = 0; i < 3; i++) {
            for (j = 0; j < 3; j++) {
                fscanf(fp, "%f", &data);
                printf("%f \n", data);
            }
        }
        fclose(fp);
    }
    Instead, the output is:
    0.000000
    0.000000
    0.000000
    0.000000
    0.000000
    0.000000
    0.000000
    0.000000
    0.000000



    The input file is:
    0.000000 0.000000 0.000000
    0.000000 1.000000 2.000000
    0.000000 2.000000 4.000000

    Is it clear what I am doing wrong?

    Thanks again!

  6. #6
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    You are not checking whether fscanf() worked as you expected; look at its return value. See here: fscanf [C++ Reference]
    Mainframe assembler programmer by trade. C coder when I can.

  7. #7
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    The output you describe is consistent with data being declared as type double rather than type float. With fscanf, %f expects a pointer to float; reading into a double needs %lf.
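
    A sketch of the 3x3 read from post #5, assuming the values are meant to be stored as double. The name read_grid is made up. It uses %lf to match double and checks the return value of every fscanf call.

```c
#include <stdio.h>

/* Reads a 3x3 grid of numbers into doubles.
   Returns the number of values read successfully (9 when the file
   holds a complete grid). read_grid is a hypothetical helper name. */
int read_grid(FILE *fp, double grid[3][3])
{
    int count = 0;

    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            /* %lf matches double*; %f would need a float* */
            if (fscanf(fp, "%lf", &grid[i][j]) != 1)
                return count;   /* stop on EOF or a non-numeric token */
            count++;
        }
    }
    return count;
}
```

    A return value below 9 tells the caller that the file ran out or contained something that is not a number, instead of silently printing stale data.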

  8. #8
    Registered User
    Join Date
    Jan 2011
    Posts
    5

    Thanks

    Thanks Tabstop! Your advice about fscanf and the observation about the double declaration have gotten me over this latest obstacle. I now get the output that I expect.

    Thanks also to everyone else!
