Thread: Handling Large Amounts of Data

  1. #1
    Registered User
    Join Date
    May 2002
    Posts
    208

    Handling Large Amounts of Data

    Hey Guys,

    I was hoping to get some people's opinions on something. I am not new to C/C++ programming, but by no means am I an expert. I am a physics student who uses it in his research. I have an extremely large data set which I want to visualize. I have the graphics portion of the code done; I just need a way to handle reading all the data into the program. The data is organized into 100 different files (each file essentially represents a frame); within each file there are 1000 rows and 11 columns. Reading this data in, normalizing it and displaying it file by file is not working (maybe I am just implementing it inefficiently), and reading it all in at the beginning seems counterproductive and slow. Does anyone know of an efficient way to handle this much information within a program?
    Jeff Paddon
    Undergraduate Research Assistant
    Physics Department
    St. Francis Xavier University

  2. #2
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Depends on many factors, kas2002. You will have to be a little more precise about how the data is being displayed and what type of data we are talking about (strings, numbers, ...), and probably also about what type of operations you are performing on that data.

    Nonetheless, the best way is usually to follow a data processing and display scheme that lets you use lazy evaluation techniques. But do provide more info.
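
    As a rough illustration of what lazy evaluation could look like here, the sketch below loads a frame only the first time it is requested and keeps it for later. The names `LazyFrames` and `Frame`, and the idea of passing the file-reading routine in as a callback, are assumptions made for the example, not anything from the original program.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// One frame = the parsed contents of one file. The element type is a placeholder.
using Frame = std::vector<float>;

// Loads frames on first use and remembers them (hypothetical helper).
// Note: nothing is ever evicted, so memory grows with the number of frames visited.
class LazyFrames {
public:
    explicit LazyFrames(std::function<Frame(std::size_t)> loader)
        : loader_(std::move(loader)) {}

    // Returns frame i, calling the loader only the first time i is requested.
    const Frame &get(std::size_t i) {
        auto it = cache_.find(i);
        if (it == cache_.end()) {
            it = cache_.emplace(i, loader_(i)).first;
        }
        return it->second;
    }

    // Number of frames that have actually been loaded so far.
    std::size_t loadedCount() const { return cache_.size(); }

private:
    std::function<Frame(std::size_t)> loader_;
    std::map<std::size_t, Frame> cache_;
};
```

    The loader would be whatever routine reads one file; frames that are never displayed are never read.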
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  3. #3
    Registered User
    Join Date
    Jan 2005
    Location
    Estonia
    Posts
    131
    Could you specify the things that the program has to do, and post the code that you have written for doing it?

  4. #4
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > The data is organized into 100 different files
    Well the simple test is to do
    Code:
    std::string line;
    while ( std::getline(f, line) ) {
    }
    inside the loop which opens and closes all 100 files.

    This will give you the benchmark for simply reading all the data.

    At least you will have something to compare against if the next step doubles the time.
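
    A fuller version of that baseline might look like the sketch below. `timeRawRead` and `ReadStats` are made-up names, and the list of file paths is assumed to be built by the caller; the function only reads and discards lines, so it measures nothing but the raw read cost.

```cpp
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

struct ReadStats {
    std::size_t lines = 0;  // total lines read across all files
    double seconds = 0.0;   // wall-clock time for the whole pass
};

// Reads every line of every file and throws it away.
ReadStats timeRawRead(const std::vector<std::string> &paths) {
    ReadStats stats;
    const auto start = std::chrono::steady_clock::now();
    for (const std::string &path : paths) {
        std::ifstream f(path);  // closed automatically at the end of each iteration
        std::string line;
        while (std::getline(f, line)) {
            ++stats.lines;
        }
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    stats.seconds = elapsed.count();
    return stats;
}
```

    Run it once over the 100 file names, note `seconds`, then compare against the same loop with parsing added.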
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  5. #5
    Registered User
    Join Date
    May 2002
    Posts
    208
    The data files are just numbers given as follows:

    int int int float float float float float float float float float float float

    The first 3 integers are just the coordinates of a small area of the system (called a bin). Within the simulation program (which is separate from the visualization) we do specific types of averaging in these bins, and the floats in the data file are that output. What I want to do is read in the x, y, z coords and plot a box at that point with its translucency dependent on one of the floats.

    As I said above, each file represents a frame, or in other words a timestep in the simulation. What I need is an efficient way to read in this data, display it, and move on to the next timestep without lag or absolutely clogging memory, i.e. some sort of data structure which can either efficiently access the data on the fly OR store all of the data within the program in a concise, easy-to-access manner. I have tried two-dimensional vectors, but that hogged memory and was awkward to access.

    When I only needed to display one timestep at a time this is how I went about it (I use the Irrlicht game engine for rendering purposes).

    1) I read in the file storing all the data in 11 different vectors.
    2) Using Irrlicht and glut primitives I plot the boxes and render the scene.

  6. #6
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > As I said above each file represents a frame,
    So if you get say
    2 3 4
    in two different frame files, which obviously update the same part of the visualisation, what do you expect to happen?

    From what you've said, you need
    - the bins from summing all the previous frames
    - the current row of data from the current file.

    I don't think you need to read in the whole file in order to update the visualisation.

  7. #7
    Registered User
    Join Date
    May 2002
    Posts
    208
    Quote Originally Posted by Salem
    > As I said above each file represents a frame,
    So if you get say
    2 3 4
    in two different frame files, which obviously update the same part of the visualisation, what do you expect to happen?
    I expect the color of each bin to change according to the difference in values, i.e. 2 3 4 represents the same bin in each frame; however, one of the 8 corresponding floats will change with each timestep. It is that value which I am attempting to correlate to spatial properties.

    And I do need to read in the whole file, because each file defines the entire system for one timestep and I would like to see the entire system.
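
    For the normalizing step mentioned at the start of the thread, one possible form is a plain min-max rescale of the chosen float to [0, 1], which could then be used directly as an alpha value. This is only a guess at what "normalizing" means here; the function name `normalize` and the min-max rule are assumptions for the sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Linearly rescales values to [0, 1]; a constant (or empty) input maps to all zeros.
std::vector<float> normalize(const std::vector<float> &values) {
    std::vector<float> out(values.size(), 0.0f);
    if (values.empty()) return out;
    const auto mm = std::minmax_element(values.begin(), values.end());
    const float lo = *mm.first;
    const float hi = *mm.second;
    if (hi > lo) {
        for (std::size_t i = 0; i < values.size(); ++i) {
            out[i] = (values[i] - lo) / (hi - lo);
        }
    }
    return out;
}
```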

  8. #8
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > and I do need to read in the whole file
    I said, you only need to store the current row, not the entire file.

    Code:
    std::string line;
    while ( std::getline(f, line) ) {
      // process line
    }
    // display final result
    Vs.
    Code:
    std::vector<std::string> lines;
    std::string line;
    while ( std::getline(f, line) ) {
      lines.push_back(line);  // store line in a vector
    }
    for ( const std::string &l : lines ) {
      // process l
    }
    // display final result
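
    The first variant could be fleshed out roughly as below. `Row`, `parseRow` and `forEachRow` are hypothetical names, and the count of 11 floats per row is taken from the sample line earlier in the thread, so adjust it to the real files. Each row is parsed, handed to a callback, and then forgotten.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// One row: three bin coordinates followed by the averaged floats.
// The float count (11) is an assumption based on the sample line in the thread.
struct Row {
    int x, y, z;
    float values[11];
};

// Parses one text line into a Row; returns false if the line is malformed.
bool parseRow(const std::string &line, Row &row) {
    std::istringstream in(line);
    if (!(in >> row.x >> row.y >> row.z)) return false;
    for (float &v : row.values) {
        if (!(in >> v)) return false;
    }
    return true;
}

// Streams a frame: each valid row is passed to `process` and not stored.
// Returns the number of rows processed.
template <typename Fn>
std::size_t forEachRow(std::istream &f, Fn process) {
    std::size_t count = 0;
    std::string line;
    Row row;
    while (std::getline(f, line)) {
        if (parseRow(line, row)) {
            process(row);
            ++count;
        }
    }
    return count;
}
```

    The callback is where the box for that bin would be updated, so only one row is in memory at a time.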

  9. #9
    Registered User
    Join Date
    Jan 2003
    Posts
    311
    If your files are text you can also improve performance by writing a preprocessing step that will convert your text files into something like
    Code:
    struct element {
        int x,y,z;
        float value;
    };
    and then use ifstream::read() calls. If each line in the text file is exactly the same number of characters, you can use seekg() to skip to the next line faster.
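
    A minimal sketch of that binary round trip, using the `element` struct above, might look like this. `writeBinary` and `readBinary` are made-up names. Raw struct dumps like this depend on the compiler's layout and the machine's byte order, so the files are only safe to read back with the same build on the same kind of machine.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

struct element {
    int x, y, z;
    float value;
};

// Writes the records as raw bytes (the one-off preprocessing step).
bool writeBinary(const std::string &path, const std::vector<element> &data) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char *>(data.data()),
              static_cast<std::streamsize>(data.size() * sizeof(element)));
    return static_cast<bool>(out);
}

// Reads the whole file back with a single read() call.
// Returns an empty vector if the file cannot be opened or read.
std::vector<element> readBinary(const std::string &path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return {};
    const std::streamsize bytes = in.tellg();  // opened at end, so this is the size
    in.seekg(0);
    std::vector<element> data(static_cast<std::size_t>(bytes) / sizeof(element));
    in.read(reinterpret_cast<char *>(data.data()),
            static_cast<std::streamsize>(data.size() * sizeof(element)));
    if (!in) data.clear();
    return data;
}
```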

  10. #10
    Registered User
    Join Date
    Mar 2006
    Posts
    725
    SQLite seems appropriate for this...
    Code:
    #include <stdio.h>
    
    void J(char*a){int f,i=0,c='1';for(;a[i]!='0';++i)if(i==81){
    puts(a);return;}for(;c<='9';++c){for(f=0;f<9;++f)if(a[i-i%27+i%9
    /3*3+f/3*9+f%3]==c||a[i%9+f*9]==c||a[i-i%9+f]==c)goto e;a[i]=c;J(a);a[i]
    ='0';e:;}}int main(int c,char**v){int t=0;if(c>1){for(;v[1][
    t];++t);if(t==81){J(v[1]);return 0;}}puts("sudoku [0-9]{81}");return 1;}
