Thread: Confused parsing a CSV txt file... strtok, fgets, sscanf

  1. #1
    Registered User towely
    Join Date
    Oct 2009
    Posts
    12

    Confused parsing a CSV txt file... strtok, fgets, sscanf

    Hi everyone.

    I'm working with a comma-delimited text file (in this example, it's abc.txt), with the following structure:

    Code:
    299, 26, 10, 7, 45, 87.688493, 112.055298, 54697, 3362, 1992, 2569, 793, 81638, 3
    299, 26, 10, 8, 0, 85.539068, 114.966376, 49899, 5755, 3751, 3014, 994, 89688, 5
    299, 26, 10, 8, 15, 83.402953, 117.925606, 47183, 7941, 4041, 3164, 1084, 93264, 5
    I'd like to parse this data into fourteen separate arrays, so that each 'column' is in its own array. For example, the first four columns should end up in arrays like this:

    Code:
    array1[299,299,299]
    array2[26,26,26]
    array3[10,10,10]
    array4[7,8,8]
    I'm incorporating this code into a program that reads from different user-selected text files. These files have different numbers of lines, but always the same fourteen-column structure. From what I've gathered, I'll need a while loop with fgets inside it to accomplish this.
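    For reference, here is a minimal sketch of the fgets-inside-a-while-loop pattern. It assumes the caller has already opened the file, and count_lines is a hypothetical name. fgets returns NULL at end of file (or on a read error), which ends the loop, so the same loop handles files of any length:

```c
#include <stdio.h>

/* Hypothetical helper: counts the lines in an already-opened text stream.
   Each fgets call reads at most sizeof line - 1 characters, stopping after
   a newline, and returns NULL at end of file or on error. */
int count_lines(FILE *fp)
{
    char line[200];
    int count = 0;

    while (fgets(line, sizeof line, fp) != NULL)
    {
        ++count;  /* one iteration per line, assuming every line fits in line[] */
    }
    return count;
}
```

    The parsing step (sscanf or strtok) goes inside the loop body, where line holds the current row.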

    I've tried several different methods to get the data into the arrays, such as strtok, sscanf, and a combination of the two. Every attempt produced either garbage numbers or blank output.

    Please forgive any newbie mistakes; this is my first outing with C programming. Here's the code I've worked with so far:


    This first attempt I made at the code produces garbage numbers.
    Code:
    	
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    int main()
    {
    
    char line[200];
    int i;
    int aJulianDay[50];
    int aDay[50];
    int aMonth[50];
    int aHour[50];
    int aMinute[50];
    float a1[50];
    float a2[50];
    int a3[50];
    int a4[50];
    int a5[50];
    int a6[50];
    int a7[50];
    int a8[50];
    int aHazardRating[50];
    
    
    	FILE *pFile;
    	pFile = fopen("c:\abc.txt", "rt");
    
    
    
    // Load each different 'column' of values into an array
    
    
    //fgets gets a string from a file
    //one string per line
    
        while(fgets(line, sizeof line, pFile) != NULL)
        {
    // sscanf reads the data from the strings and stores it in arrays
    // since the line is in a string, it finds each of the values and stores
    // them into the specified arrays. The loop does this for each line
    // so, each julian day etc will be read into array aJulianDay[] each time there's a line
    // Thus each variable is stored into a different array 
        if(sscanf(line, "%d,%d,%d,%d,%d,%lf,%lf,%d,%d,%d,%d,%d,%d,%d", &aJulianDay[i], &aDay[i], &aMonth[i], &aHour[i], &aMinute[i], &a1[i], &a2[i], &a3[i], &a4[i], &a5[i], &a6[i], &a7[i], &a8[i], &aHazardRating[i]) == 14)
        {
        ++i;
        }
        }
    
    //Close the file
    fclose(pFile);
    
    //Check the 2nd element of each array to see if it worked
    printf("%d\n",aDay[1]);
    printf("%d\n",aMonth[1]);
    printf("%d\n",aHour[1]);
    printf("%d\n",aMinute[1]);
    printf("%d\n",a1[1]);
    printf("%d\n",a2[1]);
    printf("%d\n",a3[1]);
    printf("%d\n",a4[1]);
    printf("%d\n",a5[1]);
    printf("%d\n",a6[1]);
    printf("%d\n",a7[1]);
    printf("%d\n",a8[1]);
    printf("%d\n",aHazardRating[1]);
    
    getch();
    
      return 0;
      
    }
    I scrapped that and took a different approach, using strtok to try to parse the files using the comma delimiters. I got as far as this:

    Code:
    //Working copy
    
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    int main()
    {
    
    
    	FILE *pFile;
    	pFile = fopen("c:\abc.txt", "r");
    
    
    //fgets gets a string from a file
    //one string per line
    
        fgets(line, sizeof line, pFile);
    	
    	char delimiter[] = ",";
    	char *token;
    	int i5 = 0;
    	int input[15];
    	
    	token = strtok(line, delimiter);
    	while (token != NULL)
        
    	{
    	
    
        sscanf(token, "%d", &JulianDay1); 
        input[i5++] = JulianDay1[i5];
        
    	token = strtok(NULL, delimiter);
    	}
    //Close the file
    fclose(pFile);
    
    //Test to see if elements are correct
    printf("%d\n",JulianDay1[0]);
    printf("%d\n",JulianDay1[1]);
    printf("%d\n",JulianDay1[2]);
    printf("%d\n",JulianDay1[3]);
    printf("%d\n",JulianDay1[5]);
    printf("%d\n",JulianDay1[7]);
    printf("%d\n",JulianDay1[8]);
    printf("%d\n",JulianDay1[9]);
    printf("%d\n",JulianDay1[10]);
    printf("%d\n",JulianDay1[11]);
    printf("%d\n",JulianDay1[12]);
    printf("%d\n",JulianDay1[13]);
    printf("%d\n",JulianDay1[13]);
    
    getch();
    
    
    
    	  getch(); //See exactly what you're doing
      
      return 0;
      
    }
    But, that didn't work either. So, I tried another approach:

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    int main()
    {
      
    
    	char line[300];
    	char delimiter[] = ",";
    	char *token;
    	int var;
    	int input[15];
    	int i = 0;
    
    	FILE *pFile;
    	pFile = fopen("c:\abc.txt", "r");  //Open that file 	in Read mode
    	
        while(fgets(line, sizeof line, pFile)) //Read the file one line at a time until the EOF
    	{
    	
    		token = strtok (line, delimiter); //Pull the string apart into tokens using the commas
    		while (token != NULL)
    			{
    			sscanf (token, "%d", &var); //Scan that token into your placeholder
    			input[i++] = var;
    			printf("%d/n", input[i]); //Print out the array
    
    			token = strtok (NULL, delimiter);
    			}
    
    	}
    	
    	fclose(pFile); //Close that file
    	
    	  getch(); //See exactly what you're doing
      
      return 0;
      
    }
    That didn't work either.
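    For comparison, the third attempt has three problems that are visible in the code. First, "/n" is not a newline escape (it should be "\n"). Second, input[i] is printed after i++ has already moved past the value just stored. Third, i is never reset between lines, so input[15] overflows on the second line. Below is a minimal corrected sketch of the strtok loop for a single line, assuming the fourteen-column layout above. parse_line_tokens and NCOLS are hypothetical names. strtod is used so the two decimal columns are not truncated the way "%d" truncates them:

```c
#include <stdlib.h>
#include <string.h>

#define NCOLS 14

/* Hypothetical helper: splits one CSV line into at most NCOLS numbers.
   strtok modifies 'line' in place, so pass a writable buffer (such as the
   one filled by fgets). The index starts at 0 for every line, and the
   loop stops at NCOLS so the array cannot overflow. Returns the number
   of fields stored. */
int parse_line_tokens(char *line, double values[NCOLS])
{
    int n = 0;
    char *token = strtok(line, ",\n");  /* treat the newline as a delimiter too */

    while (token != NULL && n < NCOLS)
    {
        values[n++] = strtod(token, NULL);  /* strtod skips the leading space */
        token = strtok(NULL, ",\n");
    }
    return n;
}
```

    strtok keeps hidden state between calls, so two strtok loops cannot be interleaved. That is one reason the single sscanf call from the first attempt is often simpler for fixed-format lines like these.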

    My head is spinning at this point, and I'm not sure which of these methods is closest to producing valid output. I would really appreciate any help getting these fourteen arrays populated with my data. I've tried every avenue I've found, but I can't quite figure it out. Sorry in advance for any newbie mistakes.

  2. #2
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Your first attempt was probably the closest... in fact, it might have worked if you had initialized your i variable to 0.
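    Here is a minimal sketch of the first attempt with i initialized to 0. It keeps the array names from the question and assumes a hypothetical limit, MAX_LINES, of 300 (the figure given later in the thread). It also uses %f for the two float columns, because %lf expects a pointer to double. The loop stops before the arrays can overflow:

```c
#include <stdio.h>

#define MAX_LINES 300  /* assumed upper bound on lines per file */

/* One array per column, named as in the original post. */
int aJulianDay[MAX_LINES], aDay[MAX_LINES], aMonth[MAX_LINES];
int aHour[MAX_LINES], aMinute[MAX_LINES];
float a1[MAX_LINES], a2[MAX_LINES];  /* the two decimal columns */
int a3[MAX_LINES], a4[MAX_LINES], a5[MAX_LINES], a6[MAX_LINES];
int a7[MAX_LINES], a8[MAX_LINES], aHazardRating[MAX_LINES];

/* Hypothetical helper: reads every well-formed line of 'fp' into the
   column arrays and returns how many rows were stored. */
int load_columns(FILE *fp)
{
    char line[200];
    int i = 0;  /* the initialization missing from the first attempt */

    while (i < MAX_LINES && fgets(line, sizeof line, fp) != NULL)
    {
        /* %f stores into a float (%lf would need a double). The spaces after
           the commas are fine because %d and %f skip leading whitespace. */
        if (sscanf(line, "%d,%d,%d,%d,%d,%f,%f,%d,%d,%d,%d,%d,%d,%d",
                   &aJulianDay[i], &aDay[i], &aMonth[i], &aHour[i],
                   &aMinute[i], &a1[i], &a2[i], &a3[i], &a4[i], &a5[i],
                   &a6[i], &a7[i], &a8[i], &aHazardRating[i]) == 14)
        {
            ++i;
        }
    }
    return i;
}
```

    The float columns then need %f when printing, e.g. printf("%f\n", a1[2]);.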

    Other necessary refinements like checking if the file actually opens and expandable arrays can always come after you have it parsing the file correctly...
    Last edited by CommonTater; 07-06-2011 at 05:25 AM.

  3. #3
    ATH0 quzah
    Join Date
    Oct 2001
    Posts
    14,826
    Quote Originally Posted by CommonTater View Post
    Your first attempt was probably the closest... in fact, it might have worked if you had initialized your i variable to 0.
    I've never been a fan of strtok, so I'd probably stick with the first one. Like Tater said here though:
    Quote Originally Posted by CommonTater View Post
    Other necessary refinements like checking if the file actually opens
    Because:
    Code:
    pFile = fopen("c:\abc.txt", "rt");
    That doesn't do what you think it does.
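    To spell out the problem: inside a C string literal, a backslash starts an escape sequence, and \a is the alert (bell) character. As a result, "c:\abc.txt" is a nine-character string that contains no backslash at all. Here is a small sketch showing the difference (path_has_backslash, wrong_path, and right_path are hypothetical names):

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the string contains a real backslash. */
int path_has_backslash(const char *path)
{
    return strchr(path, '\\') != NULL;
}

/* "\a" is the alert escape, so this literal holds no backslash. */
const char *wrong_path = "c:\abc.txt";
/* "\\" is the escape for a single backslash character. */
const char *right_path = "c:\\abc.txt";
```

    On Windows, forward slashes ("c:/abc.txt") are generally accepted by fopen as well, which sidesteps the escaping issue entirely.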


    Quzah.

  4. #4
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Nice catch Quzah...

  5. #5
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Indentation - cpwiki
    Good indentation will make your life easier, and make it more likely for us to read it when you do post here.

    One tip: make sure you have the "use spaces for tabs" option set in the IDE. Mixed spaces and tabs on a forum are a disaster.

  6. #6
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    fscanf() and sscanf() are probably the easiest input methods for a beginner, because scanf() is generally the first input function taught in C. strtok() involves working with pointers, so I doubt you'll like it.

    So:

    Set i = 0 before the while loop in the first example, and double the \ character in the file name: "c:\\abc.txt".

    I don't know if it applies to what you're doing here, but generally I'd put 14 related pieces of data into one struct with 14 struct members (fields), and then make one array of those structs.
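    Here is a minimal sketch of that idea. The struct and member names are hypothetical; the question only names some of the columns, so the unnamed ones are kept as a1 through a8. MAX_ROWS (300) is also an assumption. Each line of the file becomes one struct, and a single array holds all of the rows:

```c
#include <stdio.h>

#define MAX_ROWS 300  /* assumed upper bound on lines per file */

/* One record per line of the file; hypothetical type and field names. */
struct reading {
    int julianDay, day, month, hour, minute;
    float a1, a2;                /* the two decimal columns */
    int a3, a4, a5, a6, a7, a8;
    int hazardRating;
};

/* Parses one line into *r; returns 1 on success, 0 if any field is missing. */
int parse_reading(const char *line, struct reading *r)
{
    return sscanf(line, "%d,%d,%d,%d,%d,%f,%f,%d,%d,%d,%d,%d,%d,%d",
                  &r->julianDay, &r->day, &r->month, &r->hour, &r->minute,
                  &r->a1, &r->a2, &r->a3, &r->a4, &r->a5, &r->a6,
                  &r->a7, &r->a8, &r->hazardRating) == 14;
}

/* Fills rows[] from fp, skipping malformed lines; returns the row count. */
int load_readings(FILE *fp, struct reading rows[], int max_rows)
{
    char line[200];
    int n = 0;

    while (n < max_rows && fgets(line, sizeof line, fp) != NULL)
    {
        if (parse_reading(line, &rows[n]))
            ++n;
    }
    return n;
}
```

    The number of lines can still vary up to max_rows; only the number of fields per struct is fixed. rows[n].hour keeps every value from one line together.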

  7. #7
    Registered User towely
    Join Date
    Oct 2009
    Posts
    12
    Sorry about the indentation problems.

    Ah, I can't believe I forgot to initialize i to 0. I wasn't aware that a backslash in a path has to be doubled inside a C string; thanks for that. I figured I was missing something trivial. Thanks, guys.

    However, I'm still having one issue with the output: the floating-point numbers aren't coming out correctly. For example, using this code:

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    int main()
    {
    
    char line[200];
    int i = 0;
    int aJulianDay[50];
    int aDay[50];
    int aMonth[50];
    int aHour[50];
    int aMinute[50];
    float a1[50];
    float a2[50];
    int a3[50];
    int a4[50];
    int a5[50];
    int a6[50];
    int a7[50];
    int a8[50];
    int aHazardRating[50];
    
    
    FILE *pFile;
    pFile = fopen("c:\\abc.txt", "rt");
    
    
    
    while(fgets(line, sizeof line, pFile) != NULL)
    {
    
    	if(sscanf(line, "%d,%d,%d,%d,%d,%lf,%lf,%d,%d,%d,%d,%d,%d,%d", &aJulianDay[i], &aDay[i], &aMonth[i], &aHour[i], &aMinute[i], &a1[i], &a2[i], &a3[i], &a4[i], &a5[i], &a6[i], &a7[i], &a8[i], &aHazardRating[i]) == 14)
    	{
    		++i;
    	}
    }
    
    //Close the file
    fclose(pFile);
    
    //Check the 3rd element of each array to see if it worked
    printf("%d\n",aJulianDay[2]);
    printf("%d\n",aDay[2]);
    printf("%d\n",aMonth[2]);
    printf("%d\n",aHour[2]);
    printf("%d\n",aMinute[2]);
    printf("%d\n",a1[2]);
    printf("%d\n",a2[2]);
    printf("%d\n",a3[2]);
    printf("%d\n",a4[2]);
    printf("%d\n",a5[2]);
    printf("%d\n",a6[2]);
    printf("%d\n",a7[2]);
    printf("%d\n",a8[2]);
    printf("%d\n",aHazardRating[2]);
    
    getch();
    
    return 0;
    
    }
    on this line from the file:
    Code:
    299, 26, 10, 8, 15, 83.402953, 117.925606, 47183, 7941, 4041, 3164, 1084, 93264, 5
    yields this result:

    Code:
    299
    26
    10
    8
    15
    -1073741824
    -536870912
    47183
    7941
    4041
    3164
    1084
    93264
    5
    As you can see, the integers come out correctly, but the floating-point ones do not. Any advice on what I'm doing wrong?

    Quote Originally Posted by Adak View Post
    I don't know if it applies to what you're doing here, but generally I'd put 14 related pieces of data into one struct with 14 struct members (fields), and then make one array of those structs.
    If I'm understanding you correctly, I thought about doing that. But because the number of lines per CSV file is going to vary, I figured I'd rather have fourteen variable-length arrays than a variable number of arrays. (The number of lines in any file will never go above 300, so I'm assuming I can set the size of the arrays and the max number of lines in fgets to 300 and be good to go.)
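    One detail worth checking in that plan: the size argument to fgets limits how many characters are read from a single line (at most size - 1, plus the terminating '\0'). It does not limit how many lines are read. The line count has to be bounded separately, for example by testing i < 300 in the loop condition. Here is a small sketch showing a long line being split across two fgets calls (fgets_calls is a hypothetical helper):

```c
#include <stdio.h>

/* Hypothetical helper: counts how many fgets calls, each with a buffer of
   'bufsize' bytes, it takes to consume the whole stream. A line longer
   than bufsize - 1 characters comes back in several pieces. */
int fgets_calls(FILE *fp, int bufsize)
{
    char buf[256];
    int calls = 0;

    if (bufsize < 2 || bufsize > (int)sizeof buf)
        return -1;  /* keep the demo inside its own buffer */

    while (fgets(buf, bufsize, fp) != NULL)
        ++calls;
    return calls;
}
```

    With a 200-byte buffer, each data line shown above (about 85 characters) fits in a single call.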
    Last edited by towely; 07-06-2011 at 05:25 PM. Reason: grammar

  8. #8
    Registered User
    Join Date
    May 2009
    Posts
    4,183
    Code:
    printf("%f\n",a1[2]);
    Use %f to print a double (a float argument is promoted to double).
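    This is why %d prints garbage. A float passed to printf is promoted to double, and %d tells printf to read an int instead, which is undefined behavior. %f matches the promoted double, so it works for both float and double arguments. Here is a small sketch that uses snprintf so the result can be checked (format_reading is a hypothetical helper):

```c
#include <stdio.h>

/* Hypothetical helper: formats a float column value with two decimals.
   The float argument is promoted to double, which is what %f expects. */
void format_reading(char *out, size_t outsize, float value)
{
    snprintf(out, outsize, "%.2f", value);
}
```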

    Note: You need to learn what an array of structures is.
    Is the number of columns in your data fixed?
    If yes, then using a structure makes sense to me, too.

    Tim S.
    Last edited by stahta01; 07-06-2011 at 05:32 PM. Reason: grammer

  9. #9
    Registered User towely
    Join Date
    Oct 2009
    Posts
    12
    Quote Originally Posted by stahta01 View Post
    Code:
    printf("%f\n",a1[2]);
    Use %f to print a double.
    Hmm, I changed the code as directed, and the output is still wrong...
    Code:
    printf("%f\n",a1[2]);
    printf("%f\n",a2[2]);
    on the same line as before outputs these:

    Code:
    -116933779947120400000000000000000.000000
    0.000000
    Am I doing something wrong with the sscanf code?

    Quote Originally Posted by stahta01 View Post
    Note: You need to learn what an array of structures is.
    Is the number of columns in your data fixed?
    If yes, then using a structure makes sense to me, too.

    Tim S.
    Yes, the number of columns is always fourteen; only the number of lines will vary. I'm not sure what an array of structures is. What advantages would that have?

  10. #10
    Registered User
    Join Date
    May 2009
    Posts
    4,183
    Quote Originally Posted by towely View Post
    Yes, the number of columns is always fourteen; only the number of lines will vary. I'm not sure what an array of structures is. What advantages would that have?
    Readability and maintainability will be much better with a structure; look up the struct keyword.

    I now see the problem with sscanf: "%lf" is for the double data type, but you are using the float data type.
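    Unlike printf, sscanf receives pointers, and no promotion happens through a pointer. %f must be paired with a float * and %lf with a double *. A mismatch is undefined behavior and is the likely source of the garbage values. Here is a minimal sketch of both correct pairings (read_float and read_double are hypothetical helpers):

```c
#include <stdio.h>

/* %f stores into a float. Returns 1 if a number was read. */
int read_float(const char *text, float *out)
{
    return sscanf(text, "%f", out) == 1;
}

/* %lf stores into a double. Returns 1 if a number was read. */
int read_double(const char *text, double *out)
{
    return sscanf(text, "%lf", out) == 1;
}
```

    Either type works for this data. double simply keeps more significant digits than float (roughly 15 versus 7).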

    You need to add more error checking.
    For example, add an else to your "if(sscanf(line," statement that reports which line had the error.
    Also check that the file opened correctly, and print a message and exit if it fails to open.
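    Here is a minimal sketch of both checks, assuming the same fourteen-column format. open_data_file and count_bad_lines are hypothetical names. The first reports a failed fopen with perror. The second shows the else-style branch, reporting the line number of any line that sscanf could not fully parse:

```c
#include <stdio.h>

/* Opens the data file, or prints the path plus the system's reason and
   returns NULL. The caller should stop (e.g. return EXIT_FAILURE) on NULL. */
FILE *open_data_file(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        perror(path);
    return fp;
}

/* Reads every line, reports the ones that do not hold 14 fields on stderr,
   and returns how many lines were rejected. */
int count_bad_lines(FILE *fp)
{
    char line[200];
    int lineno = 0, bad = 0;
    int iv[12];    /* scratch space for the 12 int columns */
    float fv[2];   /* scratch space for the 2 float columns */

    while (fgets(line, sizeof line, fp) != NULL)
    {
        ++lineno;
        if (sscanf(line, "%d,%d,%d,%d,%d,%f,%f,%d,%d,%d,%d,%d,%d,%d",
                   &iv[0], &iv[1], &iv[2], &iv[3], &iv[4], &fv[0], &fv[1],
                   &iv[5], &iv[6], &iv[7], &iv[8], &iv[9], &iv[10],
                   &iv[11]) != 14)
        {
            fprintf(stderr, "line %d: expected 14 fields\n", lineno);
            ++bad;
        }
    }
    return bad;
}
```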

    Tim S.
    Last edited by stahta01; 07-06-2011 at 05:55 PM. Reason: error checking and sscanf problem

  11. #11
    Registered User towely's Avatar
    Join Date
    Oct 2009
    Posts
    12
    Quote Originally Posted by stahta01 View Post
    Readability and maintainability will be much better with a structure; look up the struct keyword.

    I now see the problem with sscanf: "%lf" is for the double data type, but you are using the float data type.

    You need to add more error checking.
    For example, add an else to your "if(sscanf(line," statement that reports which line had the error.
    Also check that the file opened correctly, and print a message and exit if it fails to open.

    Tim S.
    Aha! Changing %lf to %f in sscanf() fixed it. Thanks! I'm glad to know I had the right idea from the start, and it was just minor things keeping my output from being correct.

    I'll look into using struct, but honestly, at this point I think I'd rather stick to the code I have, unless there was an extremely compelling argument as to why I need to use struct.

    Now onto figuring out how to search through the arrays using pointers... If I have issues with that later on, should I recycle this thread, or should I start a new one?

  12. #12
    Registered User
    Join Date
    May 2009
    Posts
    4,183
    Quote Originally Posted by towely View Post
    If I have issues with that later on, should I recycle this thread, or should I start a new one?
    If the title still applies, add on to this thread (as long as it is not over a month old).
    If not, start a new thread with a link back to this one.

    Tim S.

  13. #13
    Registered User towely
    Join Date
    Oct 2009
    Posts
    12
    Sounds good, thanks.

  14. #14
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Oh good grief! Don't use pointers to work within the array. Use index numbers -- WAY easier, even if you are used to using pointers.
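    To compare the two styles, here is a minimal sketch of the same search written with an index and with a pointer. It assumes an int column such as aHazardRating that holds n valid entries, and both function names are hypothetical. Each returns the position of the first match, or -1:

```c
/* Index version: arr[i] reads naturally and the position is just i. */
int find_by_index(const int arr[], int n, int target)
{
    for (int i = 0; i < n; i++)
    {
        if (arr[i] == target)
            return i;
    }
    return -1;
}

/* Pointer version: walks p from arr up to arr + n; the position has to be
   recovered by pointer subtraction. */
int find_by_pointer(const int *arr, int n, int target)
{
    for (const int *p = arr; p < arr + n; p++)
    {
        if (*p == target)
            return (int)(p - arr);
    }
    return -1;
}
```

    Both do the same work. The index version is usually easier to read and debug, which is the point being made here.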

    Structs are records, and struct members are fields within the struct. Say you wanted to work with info from students. Your struct might include:
    firstName, lastName, age, sex, major, advisor, phoneNum, inDorm, level.

    Once you have an array of these structs, it becomes pretty easy to access all the data on a student using just that one array. The array keeps all the data grouped together, even though each student's struct holds integers, floats, strings, bools, etc.
    Last edited by Adak; 07-06-2011 at 07:55 PM.
