Thread: Reading data file efficiently

  1. #1
    Confused
    Join Date
    Nov 2002
    Location
    Warwick, UK
    Posts
    209

    Reading data file efficiently

    Hello,

    I'd like some advice on how to read a data file efficiently - that is, with as little wasted work as possible.

    The files I'm trying to read are in this format :

    Code:
    # This is a comment.
    1     1     0.24452     5.58872     3.54826     1.58262     -1
    # Comments can appear anywhere
    2     1     7.47274     0.37462     -1.28472     8.27462     1
    The columns should appear as either integers or floats, but in some files, I have E-notation on all columns, even those constrained to be integers ( a quick test tells me reading this isn't a problem ).

    Columns represent entry ID ( int ), structural location ( int ), x, y, z coordinates ( floats ), radius ( float ) and parent node ( int ).

    I need to read in all non-comment lines where the second column has a value of 3. I'm not sure what the best way to do this is. Knowing that each line is either a comment or consists of seven numbers, I could potentially read in a line using getline into a string, and test if the first character is a hash. If not, I could perhaps use a stringstream to pull out the numbers, though I still haven't worked out how.

    So far, I'm working with commentless files, and the temptation is simply to read in two doubles from the line, and if the second has a value of 3, then continue reading and use the line appropriately. Obviously, this won't work once I have files with comments.

    So, my question is simply - what would be the best way to deal with this ? I'm trying to waste as little computation as possible because the files can get relatively long, and each line with a 3 in the second column needs to be placed in some kind of structure, possibly a vector of double arrays, but that's still undecided. First, I'd love some advice on how you'd go about reading this kind of file, with these constraints in mind. I'm not asking for code, but suggestions on methodology would be fantastic.

    Thanks very much,
    Quentin

  2. #2
    Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,656
    Use fgets() to read one line at a time - use a while loop, there are lots of examples of how to do this.

    If the first character isn't #, then parse the line.
    Crudely, use sscanf(), otherwise use strtod() and strtol() to walk through the string one conversion at a time.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
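
As a rough illustration of the fgets() plus strtod() approach suggested above, here is a minimal sketch. The names `Node`, `parseLine` and `readNodes`, the field names, and the 512-byte line buffer are all assumptions made for this example; none of them come from the thread. Every column is converted with strtod() so that E-notation in the integer columns is accepted, and the integer fields are then cast.

```cpp
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical record type; field names are assumptions based on the
// column description in the first post.
struct Node {
    long   id;
    long   location;
    double x, y, z;
    double radius;
    long   parent;
};

// Walk through one line with strtod(), one conversion at a time.
// Returns false if fewer than seven numbers could be converted
// (this also rejects blank lines and comment lines).
bool parseLine(const char* line, Node& out) {
    double v[7];
    const char* p = line;
    for (int i = 0; i < 7; ++i) {
        char* end = nullptr;
        v[i] = std::strtod(p, &end);
        if (end == p) return false;  // nothing was converted
        p = end;
    }
    out.id       = static_cast<long>(v[0]);
    out.location = static_cast<long>(v[1]);
    out.x        = v[2];
    out.y        = v[3];
    out.z        = v[4];
    out.radius   = v[5];
    out.parent   = static_cast<long>(v[6]);
    return true;
}

// Read the file line by line with fgets(), skip comments, and keep only
// the rows whose second column equals `wanted`.
std::vector<Node> readNodes(std::FILE* fp, long wanted) {
    std::vector<Node> nodes;
    char buf[512];  // assumption: no line is longer than 511 characters
    while (std::fgets(buf, sizeof buf, fp)) {
        if (buf[0] == '#') continue;  // comment line
        Node n;
        if (parseLine(buf, n) && n.location == wanted) nodes.push_back(n);
    }
    return nodes;
}
```

A caller would open the file with std::fopen(), pass the handle to `readNodes(fp, 3)`, and close it afterwards. Lines longer than the buffer would be split across two fgets() calls, so the buffer size needs to match the real data.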

  3. #3
    Confused
    Join Date
    Nov 2002
    Location
    Warwick, UK
    Posts
    209
    Thanks for the help, Salem.

    I've decided to use the following, though it's probably not optimal :

    Code:
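    #include <fstream>
    #include <list>
    #include <sstream>
    #include <string>
    #include <vector>
    using namespace std;

    int main()
    {
        ifstream fin("data.txt"); // placeholder file name
        string myString;
        istringstream mySStream;
        list <double> myList; // Holds one row; note that std::list is sequential, not random access
        vector <list <double> > data;
        double temp;

        while(getline(fin, myString))
        {
            // Skip blank lines and comments; at(0) would throw on an empty string
            if(myString.empty() || myString[0] == '#')
                continue;

            mySStream.clear(); // Reset the eof/fail flags left by the previous line
            mySStream.str(myString);
            myList.clear();

            for(int i = 0; i < 7; i++)
            {
                mySStream >> temp;
                myList.push_back(temp);
            }

            data.push_back(myList);
        }
    }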
    This seems to work well enough, though again, it's not optimal... I'll look into optimisations, but this is certainly not a bottleneck in the code.

    Thanks again !
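
For comparison, here is a hedged sketch of the same getline plus istringstream idea that also applies the "second column equals 3" filter from the first post, and stops parsing a line as soon as the second column fails the test. The names `Row` and `readRows` are invented for this example, and storing each row as a fixed-size array of seven doubles is just one possible choice of structure.

```cpp
#include <array>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical alias: one row of seven values, indexable in constant time.
using Row = std::array<double, 7>;

// Read rows from any input stream (an ifstream or an istringstream),
// skipping comments and keeping rows whose second column equals `wanted`.
std::vector<Row> readRows(std::istream& in, double wanted = 3.0) {
    std::vector<Row> rows;
    std::string line;
    std::istringstream fields;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;  // blank or comment
        fields.clear();    // reset state flags from the previous line
        fields.str(line);
        Row r{};
        // Read only the first two columns, then bail out early if the
        // second one is not the value being looked for.
        if (!(fields >> r[0] >> r[1]) || r[1] != wanted) continue;
        bool ok = true;
        for (std::size_t i = 2; ok && i < r.size(); ++i)
            ok = static_cast<bool>(fields >> r[i]);
        if (ok) rows.push_back(r);  // malformed rows are dropped
    }
    return rows;
}
```

Taking a std::istream& rather than a file name keeps the function easy to try out on an in-memory string before pointing it at a real file.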

  4. #4
    Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,656
    Why bother to optimise something which isn't a bottleneck to begin with?
