Thread: Reading CSV file

  1. #1
    Registered User
    Join Date
    Oct 2010
    Posts
    11

    Reading CSV file

    I am thinking of reading some data from a file and performing a calculation on it. The job has been broken down as follows:

    1. welcome message and ask for input filename
    2. read header from file that tells the size of the upcoming 2d array in the input file
    3. dynamically allocate a 2d array
    4.1 reading the remainder of the file in a do loop, line by line
    4.2 parse the line, break down the value and insert into each element of the array defined earlier
    5. ask for further parameters
    6. calculation
    7. result output

    The file will look something like this, where the first line states the size of the array the program needs to deal with.

    ______
    3, 3
    1.0, 2.0, 3.0
    4.0, 5.0, 6.0
    7.0, 8.0, 9.0
    ______

    While I have no problem with 1, 2, 5, 6 and 7, steps 3 and 4 seem to pose some trouble for me, especially 4.

    I have been successful in dynamically allocating a 1d array using runtime input, so that's fine. I have read that a 2d array is just a nested allocation, so I guess that will also be fine (I will just have to try it).

    The thing I really stumble on is the parsing of the file. The parsing I have come across usually involves fgets() and strtok(); embarrassingly, I don't seem to be able to understand the long code others have written. However, I am wondering why there isn't a more "high level" way.

    Surely, as the designer of the program, I have already predefined the input format, so the easiest thing to do would be fscanf(fp, "%format%", ...). Because I have defined the input format (all float numbers), if I am able to produce a format string based on the header at the start of the file ("%d, %d, %d" in the example case), I will never have to go through the trouble of defining a parse function. The trouble is, if this were PHP or Fortran I would just write a do loop to create the format string; however, when it comes to C, which as far as I know doesn't support strings, I don't know how I should do this.

    Any suggestions?

  2. #2
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    My best suggestion is that you give it your best try. If you run into problems post your code and let the gang here make suggestions...

  3. #3
    Registered User
    Join Date
    Oct 2010
    Posts
    11
    OK, but before I continue, could I know if the fscanf() function takes a string variable as the format argument? Or is there actually a string type in C? Or does C treat a character array as a string?

    The only problem is that the array is dynamic; if it weren't, I could just hard-code the fscanf format.
    Last edited by jimmychauck; 10-27-2010 at 11:20 PM.

  4. #4
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote: Originally posted by jimmychauck
    OK, but before I continue, could I know if the fscanf() function takes a string variable as the format argument? Or is there actually a string type in C? Or does C treat a character array as a string?

    The only problem is that the array is dynamic; if it weren't, I could just hard-code the fscanf format.

    Yes, it does. The format argument is just a pointer to char, so a string literal or a char array you have filled in yourself will both work. For example, to read a word: fscanf(filePointerName, "%s", mycharArrayName);

    C handles strings as arrays of char ending in a '\0' character, and it has several functions just for string work. So you can "roll your own", or use the string functions declared in string.h, just by including it in your program.

    Since your input data is strictly formatted, and it's all numbers, I suggest using fscanf(), but use it for numbers, not strings. After the first two integers, the rest of the data is all doubles or floats.

    Welcome to the forum, Jimmy!
    Last edited by Adak; 10-27-2010 at 11:33 PM.
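
    To make the "use fscanf() for numbers" suggestion concrete, here is a minimal sketch that reads one row by calling fscanf() once per value in a loop, so no format string has to be built for the whole row. The name read_row and its return convention are assumptions made up for this illustration, not something from the posts above.

```c
#include <stdio.h>

/* Hypothetical helper: read up to ncols comma-separated doubles from fp
 * into out[]. Returns how many values were actually stored. */
int read_row(FILE *fp, double *out, int ncols)
{
    int j;

    for (j = 0; j < ncols; j++) {
        /* %lf skips leading whitespace (newlines included) by itself */
        if (fscanf(fp, "%lf", &out[j]) != 1)
            break;
        /* Skip whitespace, then consume one comma if there is one.
         * After the last value of a row there is none, which is fine. */
        fscanf(fp, " ,");
    }
    return j;
}
```

    Calling read_row(fp, row_buffer, cols) once per row, after the header has been read, should fill one row at a time; a return value smaller than cols would mean a short or malformed row.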

  5. #5
    Registered User
    Join Date
    Oct 2010
    Posts
    11
    ok, so I think I will try the following

    Code:
    fscanf(fp, "%d, %d", &row, &column);  /*to get the number of rows and columns in the array*/
    
    char formatstring[4 * column - 1]; /*declare format string array: 4 * column - 2 characters plus the terminating '\0'*/
    
    formatstring[0] = '%';   /*create the "%d, %d, %d, %d, ........%d" */
    formatstring[1] = 'd';
    i = 0;
    while (i < column - 1)   /*each pass appends one ", %d"*/
    {
    formatstring[i*4+2] = ',';
    formatstring[i*4+3] = ' ';
    formatstring[i*4+4] = '%';
    formatstring[i*4+5] = 'd';
    i++;
    }
    formatstring[4 * column - 2] = '\0';  /*terminate the string*/
    
    do
    {
    fscanf(fp, formatstring, arraydata[][i]);  /*the line in question*/
    }
    while(not eof)  /*pseudocode*/
    The code is incomplete; it's just the concept. However, I really want to know if the line

    fscanf(fp, formatstring, arraydata[][i]);

    will work.
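
    On that question: a char array can be passed as the format, but fscanf() still needs one separate pointer argument for every conversion in the format, and the number of arguments is fixed when the call is written. A subscript cannot be left empty either, so arraydata[][i] will not compile. The sketch below only shows the part that does work, building the format string at run time; build_format is a hypothetical name, and it uses %f on the assumption that the data are floats.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "%f, %f, ..., %f" (ncols conversions) into buf.
 * Returns 0 on success, -1 if ncols < 1 or buf is too small. */
int build_format(char *buf, size_t size, int ncols)
{
    size_t used = 0;
    int j;

    if (ncols < 1 || size == 0)
        return -1;
    buf[0] = '\0';
    for (j = 0; j < ncols; j++) {
        const char *piece = (j == 0) ? "%f" : ", %f";
        size_t len = strlen(piece);

        if (used + len + 1 > size)          /* +1 for the terminating '\0' */
            return -1;
        memcpy(buf + used, piece, len + 1); /* copies the '\0' as well */
        used += len;
    }
    return 0;
}
```

    With three columns the result can be used as sscanf(line, fmt, &a, &b, &c), but the three pointers still have to be spelled out in the source, which is why reading one value per call in a loop tends to be the simpler route when the column count is only known at run time.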

  6. #6
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    Where/how is your arraydata variable defined?
    If you understand what you're doing, you're not learning anything.

  7. #7
    Registered User
    Join Date
    Oct 2010
    Posts
    11
    I tested this batch of code, and it is working:
    Code:
    printf("please enter the number of elements you want to appear in the array: \n");
    	scanf("%d", &a);
    	array1 = (float*) malloc(a * sizeof *array1); /* sizeof *array1 is the size of one float */
    	
    	for (i = 0; i < a; i++)
    	{
    		array1[i] = i;
    		printf("%f\n", array1[i]);
    	}

    This batch of code I found on the internet. I have thought it through, I think it's what I want, and I will use it.
    Code:
    void allocate2D(int** array, int nrows, int ncols) {
         
         /*  allocate array of pointers  */
         array = ( int** )malloc( nrows*sizeof( int* ) );
         
         /*  allocate each row  */
         int i;
         for(i = 0; i < nrows; i++) {
              array[i] = ( int* )malloc( ncols*sizeof( int ) );
         }
     
    }
    The programming is under way; I will let everyone know the result.

  8. #8
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    That's how I malloc for a 2D array, as well. Just be sure to include stdlib.h. You also don't need to cast the return value of malloc in C (well, almost never).
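
    As a small illustration of those two points, here is a sketch of a 1D allocation with stdlib.h included and no cast. The function name make_ramp is hypothetical; the sizeof *p form is one common idiom, chosen here because it keeps the element size tied to the pointer's type.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate n floats and fill them with 0, 1, 2, ...
 * Returns NULL if n is not positive or the allocation fails. */
float *make_ramp(int n)
{
    float *p;
    int i;

    if (n <= 0)
        return NULL;
    /* sizeof *p is the size of one element (a float here).
     * No cast is needed: malloc returns void *, which converts implicitly. */
    p = malloc((size_t)n * sizeof *p);
    if (p == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        p[i] = (float)i;
    return p;
}
```

    The caller owns the result and should free() it when done.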

    This is how you can read the data:
    Code:
    #include <stdio.h>
    #define ROWS 3
    #define COLS 3
    
    int main() {
      int i,j, rows, cols; 
      double dat[ROWS][COLS];
      FILE *fp=fopen("0jimmy.txt", "r");
      if(!fp) {
        printf("Error opening file");
        return 1;
      }
      printf("\n\n\n");
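      //the * suppresses storing that item; fscanf returns the number of items stored
      if(fscanf(fp, "%d%*c %d%*c", &rows, &cols) != 2 || rows > ROWS || cols != COLS) {
        printf("Bad header or unexpected size");
        fclose(fp);
        return 1;
      }
    
      for(i=0;i<rows;i++) {
        if(fscanf(fp, "%lf%*c %lf%*c %lf%*c",&dat[i][0],&dat[i][1],&dat[i][2]) != 3)
          break; //stop at a short or malformed row
        printf("\n%.3lf %.3lf %.3lf", dat[i][0],dat[i][1],dat[i][2]);
      }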
    
      fclose(fp);
      return 0;
    }
    The %*c is for the commas and for the newline at the end of each row.
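
    A related sketch, in case it helps: %*c accepts any single character, so a header like "3;3" would pass too. If you want the comma to be required, a literal comma in the format does that, and checking the return value of the scanf family tells you whether both numbers were really read. The name parse_header is hypothetical, and the use of sscanf on a line already in memory is an assumption made to keep the example short.

```c
#include <stdio.h>

/* Hypothetical helper: parse a header such as "3, 3" into *rows and *cols.
 * Returns 1 on success, 0 if the text is malformed or a size is not positive. */
int parse_header(const char *line, int *rows, int *cols)
{
    /* A space in the format matches any amount of whitespace (even none);
     * the literal ',' must match a comma in the input. */
    if (sscanf(line, "%d ,%d", rows, cols) != 2)
        return 0;
    return *rows > 0 && *cols > 0;
}
```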

  9. #9
    Registered User
    Join Date
    Oct 2010
    Posts
    11
    Hmm, yes, but the whole point I am trying to stress here is that it is dynamic. The size of the array is unknown until the first line of the input file is read.

    I tried to do
    Code:
    float arraydat[row][col];
    after I read the numbers, but the compiler says "constant expression expected".

  10. #10
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Step by step, JC.

    That shows reading the file data. Plug in your dynamic memory array code that you already posted.

    This is an example of that:
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    void allocate2D(double **dat, int nrows, int ncols);
    
    int main() {
      int i,j, rows, cols; 
      double **dat;
      FILE *fp=fopen("0jimmy.txt", "r");
      if(!fp) {
        printf("Error opening file");
        return 1;
      }
      printf("\n\n\n");
      fscanf(fp, "%d%*c %d%*c", &rows, &cols); //the *suppresses storing that item
    
      allocate2D(dat, rows, cols);
    
      for(i=0;i<rows;i++) {
        fscanf(fp, "%lf%*c %lf%*c %lf%*c",&dat[i][0],&dat[i][1],&dat[i][2]);
        printf("\n%.3lf %.3lf %.3lf", dat[i][0],dat[i][1],dat[i][2]);
      }
    
      fclose(fp);
      for(i=0;i<rows;i++)
        free(dat[i]);
      free(dat);
    
      return 0;
    }
    void allocate2D(double **dat, int nrows, int ncols) {
         int i;
         /*  allocate array of pointers  */
         dat = malloc( nrows*sizeof( double* ) );
         
         /*  allocate each row  */
         
         for(i = 0; i < nrows; i++) {
              dat[i] = malloc( ncols*sizeof( double ) );
         }
        if(dat==NULL || dat[i-1]==NULL) {
           printf("\nError allocating memory\n");
           exit(1);
        }
       
    }
    Seems to work, but barely tested. You'll get your own file name, of course.
    Last edited by Adak; 10-28-2010 at 04:31 AM.

  11. #11
    Registered User
    Join Date
    Aug 2010
    Posts
    231
    Here is a variation on the code above, with dynamic memory allocation for your rows and cols. It uses fgets to read each line and sscanf to process each line string.
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    int main() {
      int i,j,rownum=0, rows, cols;
      double *dat;
      char row[512];
    
      FILE *fp=fopen("0jimmy.txt", "r");
      if(!fp) {
        printf("Error opening file");
        return 1;
      }
      printf("\n\n\n");
    
      /* read limits with error-handling */
      if( 2!=fscanf(fp, "%d%*[ ,]%d%*[ \n]", &rows, &cols) ) { puts("error"); fclose(fp); exit(1); }
    
      dat = malloc( rows*cols*sizeof*dat );
      if( !dat ) { puts("out of memory"); fclose(fp); exit(1); }
      while( fgets(row, 512, fp) ) /* first read line */
        if( rownum < rows )
        {
          char *c=row;
          int n=0,num=0;
          for( ; num<cols; ++num ) /* and THEN process each line */
          {
            c+=n; sscanf(c,"%*[ ,]%n",&n); c+=n;
            if( 1!=sscanf(c,"%lf%n",&dat[rownum*cols+num],&n))
              break;
          }
          if( num==cols )
            ++rownum;
        }
    
      for( i=0; i<rows; puts(""),++i )
        for( j=0; j<cols; ++j )
          printf("%5.2f", dat[i*cols+j]);
    
      free( dat );
      fclose(fp);
      return 0;
    }
    Last edited by BillyTKid; 10-28-2010 at 04:27 AM.
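
    For comparison, the per-line step can also be done with strtod() in place of sscanf() and %n. This is a different technique from the code above, not a rewrite of it; parse_row is a hypothetical name, and treating spaces, tabs and commas alike as separators is an assumption.

```c
#include <stdlib.h>

/* Hypothetical helper: parse up to ncols numbers from one line of text,
 * separated by commas and/or spaces. Returns how many were stored. */
int parse_row(const char *line, double *out, int ncols)
{
    const char *p = line;
    int j = 0;

    while (j < ncols) {
        char *end;
        double v;

        /* skip separators in front of the next number */
        while (*p == ' ' || *p == ',' || *p == '\t')
            p++;
        v = strtod(p, &end);
        if (end == p)       /* nothing converted: end of line or bad text */
            break;
        out[j++] = v;
        p = end;            /* carry on right after the number */
    }
    return j;
}
```

    Paired with fgets(), each line would be handed to parse_row(row, &dat[rownum*cols], cols), and a return value equal to cols would mean the line was complete.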

  12. #12
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Using an *[ ,] scanf() ignore set - clever.

  13. #13
    Registered User
    Join Date
    Oct 2010
    Posts
    11
    Thanks, everyone, for the help; however, I still have no luck with the dynamic allocation.

    This is the current program:
    Code:
    #define TRUE = 1
    #include <stdio.h>
    #include <math.h>
    main()
    {	char filename[100];
    	FILE *inputfile;
    	int row, col, i, j, k, l;
    	char* formatstring;
    	float** datapoint;
    
    	/*information*/
    	printf("Please enter the name of the input file (enter exit for termination): ");
    	
    	while(1==1)
    		{
    		scanf("%s", filename);
    		inputfile = fopen(filename, "r"	);
    		if (filename != NULL)/*file open without error*/
    			{
    			if (filename == "exit") exit(0);
    			if (inputfile == 0) /*file not found*/
    				{
    				printf("file not found"); exit(0);
    				}
    			else
    				{
    				printf("file opened\n");
    				fscanf(inputfile, "%d, %d", &row, &col);	 /*read from input file number of rows and colums*/
    				printf("%d rows and %d columns of data will be read\n", row, col);
    				formatstring = (char*) malloc(col * 4 - 2); /*allocate string array for format string*/
    				formatstring[0] = '%';   /*create the "%d, %d, %d, %d, ........%d" */
    				formatstring[1] = 'd';
    				i = 0;
    				do
    					{
    					formatstring[i*4+2] = ',';
    					formatstring[i*4+3] = ' ';
    					formatstring[i*4+4] = '%';
    					formatstring[i*4+5] = 'd';
    					i++;
    					}
    				while (i < col - 1);
    				
    				/*allocate 2d array*/
    				allocate2D(datapoint, row, col);
    				
    				for (i = 0; i < 13; i++)
    					{
    					for (j = 0; j < 14; j++)
    						{
    						datapoint[i][j] = (float) i + j; /*assignment*/
    						printf("%d\n", datapoint[i][j]); /*debug*/
    						}
    					}
    
    				for (i = 0; i < 13; i++)
    					{
    					for (j = 0; j < 14; j++)
    						{
    						printf("%d, %d, %d\n", datapoint[i][j] ,i , j);
    						}
    					}					
    				break;
    				}
    			}
    		else /*file opern error*/
    			{
    			printf("error when opening file! program abort"); exit(0);
    			}
    		}
    		free (datapoint);
    }
    
    allocate2D(array, nrows, ncols)
    float** array;
    int nrows;
    int ncols;
    {
         /*  allocate array of pointers  */
         array = (float**)malloc(nrows*sizeof(float*));
         /*  allocate each row  */
         int i;
         for(i = 0; i < nrows; i++) {
              array[i] = (float*)malloc(ncols*sizeof(float));
         }
    }
    Though the allocate array function went through without a problem, a problem occurs whenever I try to assign any value to the array in the lines marked /*assignment*/ and /*debug*/: the program just terminates there, stating "program.exe stopped working". The Visual Studio debugger seems to indicate an access violation error. I think it has everything to do with the dynamic array allocation.

    I get the idea that a 2d array is an array of pointers, each of which points to an array of its own, and that increasing the dimension of the array should work the same way. However, I just cannot get it to work.
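
    One likely cause, judging from the program above: C passes arguments by value, so allocate2D() receives a copy of datapoint. The malloc result is stored in that copy, and datapoint in main() is still uninitialized when it is indexed. The two loops also run to 13 and 14 regardless of row and col. A minimal sketch of one way around the first problem is to pass the address of the pointer; the names allocate2d and free2d and the 0 / -1 return convention are assumptions for illustration.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate an nrows x ncols array of float and store
 * its address in *out. Returns 0 on success, -1 on failure (nothing leaked). */
int allocate2d(float ***out, int nrows, int ncols)
{
    float **rows;
    int i;

    if (nrows <= 0 || ncols <= 0)
        return -1;
    rows = malloc((size_t)nrows * sizeof *rows);
    if (rows == NULL)
        return -1;
    for (i = 0; i < nrows; i++) {
        rows[i] = malloc((size_t)ncols * sizeof *rows[i]);
        if (rows[i] == NULL) {
            while (i-- > 0)         /* undo the rows already allocated */
                free(rows[i]);
            free(rows);
            return -1;
        }
    }
    *out = rows;    /* written through the pointer, so the caller sees it */
    return 0;
}

/* Free everything allocate2d() handed out. */
void free2d(float **rows, int nrows)
{
    int i;

    for (i = 0; i < nrows; i++)
        free(rows[i]);
    free(rows);
}
```

    In main() the call would then look like allocate2d(&datapoint, row, col), with the loops bounded by row and col, and free2d(datapoint, row) at the end.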

  14. #14
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    I just became aware of that. It needs a return from the allocating function, after malloc gives it an address.

    Try this:

    Code:
    /* dynamically creates a 2D array of pointers, in C */
    
    #include <stdio.h>
    #include <stdlib.h>
    
    double** allocate2D(int nrows, int ncols);
    
    int main() {
      int i,j, rows, cols; 
      double **dat;
    
      printf("\n\n\n How many rows do you want?\n ");
      scanf("%d", &rows);
      (void) getchar();
      printf(" How many columns do you want?\n ");
      scanf("%d", &cols);
      (void) getchar();
    
      dat = allocate2D(rows, cols);
      for(i=0;i<rows;i++) {
        for(j=0;j<cols;j++) {
          dat[i][j] = i+j;
        }
      }
    
      for(i=0;i<rows;i++) {
          for(j=0;j<cols;j++) 
            printf("\n%.3lf", dat[i][j]);
      }
       
      for(i=0;i<rows;i++)
        free(dat[i]);
    
      free(dat);
    
      printf("\n\t\t\t    press enter when ready");
      (void) getchar();
      return 0;
    }
    double** allocate2D(int nrows, int ncols) {
      int i;
      double **dat2;
      /*  allocate array of pointers  */
      dat2 = malloc( nrows*sizeof(double*));
         
      if(dat2==NULL) {        /* Thanks, Salem */
        printf("\nError allocating pointers-terminating");
        exit(1);
      } 
      /*  allocate each row  */
      for(i = 0; i < nrows; i++) {
        dat2[i] = malloc( ncols*sizeof(double));
        if(dat2[i]==NULL) {
          printf("\nError allocating array memory-terminating");
          exit(1);
        }
      }
      
    
      return dat2;
    }
    Last edited by Adak; 10-29-2010 at 12:27 AM.
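
    A possible variation, offered only as a sketch: allocate the row pointers in one block and all the values in a second block, then point each row into the second block. The dat[i][j] syntax still works, there are only two allocations to check, and freeing takes two calls. The names allocate2d_block and free2d_block are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical variant: one block for the row pointers, one block for all
 * the values. Returns NULL on failure. */
double **allocate2d_block(int nrows, int ncols)
{
    double **rows;
    double *cells;
    int i;

    if (nrows <= 0 || ncols <= 0)
        return NULL;
    rows = malloc((size_t)nrows * sizeof *rows);
    if (rows == NULL)
        return NULL;
    cells = malloc((size_t)nrows * (size_t)ncols * sizeof *cells);
    if (cells == NULL) {
        free(rows);
        return NULL;
    }
    for (i = 0; i < nrows; i++)
        rows[i] = cells + (size_t)i * (size_t)ncols;   /* row i starts here */
    return rows;
}

/* rows[0] points at the start of the value block, so free that first. */
void free2d_block(double **rows)
{
    if (rows != NULL) {
        free(rows[0]);
        free(rows);
    }
}
```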

  15. #15
    Salem, and the hat of int overfl
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    If dat2 is NULL at the point of checking, you're already dead at the previous dat2[i] reference.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
