Thread: Help with getting data out of an excel file

  1. #16
    Registered User
    Join Date
    Jun 2015
    Posts
    1,640
    So know the field position meanings and maximum sizes:
    Code:
    Max line size 172 (with newline).
                                 max field length
    0   userid                   7
    1   age                      3
    2   dob_day                  7
    3   dob_year                 8
    4   dob_month                9
    5   gender                   6
    6   tenure                   6
    7   friend_count            12
    8   friendships_initiated   21
    9   likes                    5
    10  likes_received          14
    11  mobile_likes            12
    12  mobile_likes_received   21
    13  www_likes                9
    14  www_likes_received      18
    You have some sample data
    Code:
        
    2094382    14    19    1999    11    male    266   0    0    0    0    0    0    0    0
    1192601    14    2     1999    11    female  6     0    0    0    0    0    0    0    0
    2083884    14    16    1999    11    male    13    0    0    0    0    0    0    0    0
    1203168    14    25    1999    12    female  93    0    0    0    0    0    0    0    0
    1733186    14    4     1999    12    male    82    0    0    0    0    0    0    0    0
    1524765    14    1     1999    12    male    15    0    0    0    0    0    0    0    0
    1136133    13    14    2000    1     male    12    0    0    0    0    0    0    0    0
    1680361    13    4     2000    1     female  0     0    0    0    0    0    0    0    0
    1365174    13    1     2000    1     male    81    0    0    0    0    0    0    0    0
    A couple of problems. What's up with all those zeroes? It may be okay, but it seems strange.
    Also, the max field size for those fields seem far too large. Presumably mobile_likes_received is a number, but it's max field size is 21. It could be mostly spaces, though. You should take a look at that one with:
    Code:
    awk -F'\t' 'length($15) == 21'
    Anyway, with the info above and the example program you should be able to get the job done.

    Remember that you will need to skip the first line since it's just the field names. That just requires an extra call to fgets before the loop.

    Then you'd want to add an if inside the tokenizing loop. For example, to count the number of females:
    Code:
        if (i == 5 && strcmp(tok, "female") == 0)  // field at offset 5 is "gender"
            females++;   // a counter that we initialized to 0 and will print at the end

  2. #17
    Registered User
    Join Date
    Jun 2017
    Posts
    12
    Quote Originally Posted by algorism View Post
    So know the field position meanings and maximum sizes:
    Code:
    Max line size 172 (with newline).
                                 max field length
    0   userid                   7
    1   age                      3
    2   dob_day                  7
    3   dob_year                 8
    4   dob_month                9
    5   gender                   6
    6   tenure                   6
    7   friend_count            12
    8   friendships_initiated   21
    9   likes                    5
    10  likes_received          14
    11  mobile_likes            12
    12  mobile_likes_received   21
    13  www_likes                9
    14  www_likes_received      18
    You have some sample data
    Code:
        
    2094382    14    19    1999    11    male    266   0    0    0    0    0    0    0    0
    1192601    14    2     1999    11    female  6     0    0    0    0    0    0    0    0
    2083884    14    16    1999    11    male    13    0    0    0    0    0    0    0    0
    1203168    14    25    1999    12    female  93    0    0    0    0    0    0    0    0
    1733186    14    4     1999    12    male    82    0    0    0    0    0    0    0    0
    1524765    14    1     1999    12    male    15    0    0    0    0    0    0    0    0
    1136133    13    14    2000    1     male    12    0    0    0    0    0    0    0    0
    1680361    13    4     2000    1     female  0     0    0    0    0    0    0    0    0
    1365174    13    1     2000    1     male    81    0    0    0    0    0    0    0    0
    A couple of problems. What's up with all those zeroes? It may be okay, but it seems strange.
    Also, the max field size for those fields seem far too large. Presumably mobile_likes_received is a number, but it's max field size is 21. It could be mostly spaces, though. You should take a look at that one with:
    Code:
    awk -F'\t' 'length($15) == 21'
    Anyway, with the info above and the example program you should be able to get the job done.

    Remember that you will need to skip the first line since it's just the field names. That just requires an extra call to fgets before the loop.

    Then you'd want to add an if inside the tokenizing loop. For example, to count the number of females:
    Code:
        if (i == 5 && strcmp(tok, "female") == 0)  // field at offset 5 is "gender"
            females++;   // a counter that we initialized to 0 and will print at the end

    Thanks for the help. However, even with the example program, I do not know how to parse the data that I read from the file into my structures. I'm also using a linked list since I need to find the median of the amount of likes and I figured it should help. Also, I need to find the maximum and minimum number of likes. How can I do that? Any help will be appreciated.

    I tried the command but there was no output on the screen.
    Last edited by SomeNoob; 06-27-2017 at 10:53 AM.

  3. #18
    Registered User
    Join Date
    Jun 2017
    Posts
    12
    So to make it more clearer, my goal is to extract and print out the following statistic figures grouped by the user genders(ie. "male' and 'female'):
    -Valid entry count.
    -Highest likes received count.
    -Lowest likes received count.
    -Likes received count total (the sum of all the like received count).
    -Likes received count average.
    -Likes received count median.

    I would appreciate if someone could help me with the codes. I have been trying to figure out how to code it for 2 days straight, but I still have no idea how to do it. Many thanks in advance.

  4. #19
    Registered User
    Join Date
    Jun 2015
    Posts
    1,640
    Here's a cleaned-up version of the example program. Look up any library functions you don't know and figure out how it works.
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>
    
    #define MAXLINE 1000
    
    int main()
    {
        FILE *f = fopen("pseudo_facebook.tsv", "r");
        if (!f) { perror("fopen"); exit(EXIT_FAILURE); }
    
        char line[MAXLINE];
    
        fgets(line, MAXLINE, f); // consume the header line
    
        while (fgets(line, MAXLINE, f) != NULL)
        {
            int i = 0;
            char *tok = strtok(line, "\t\n");
            while (tok)
            {
                if (i == 5) // the field at offset 5 (gender)
                {
                    // do what you want with it
                    printf("%s\n", tok);
                }
    
                tok = strtok(NULL, "\t\n");
                i++;
            }
        }
    
        fclose(f);
        return 0;
    }
    Then modify it to do all your stats except for the median. You don't need any structs or lists or anything for that. Just some variables to hold the counts.

    Post that code when you're done.

    BTW, I was confused about the large field value lengths in the data (in post #16). Those were obviously just the lengths of the field names themselves! So that max-field-length data is useless, but you don't really need it.

  5. #20
    Registered User
    Join Date
    Jun 2017
    Posts
    12
    Quote Originally Posted by algorism View Post
    Here's a cleaned-up version of the example program. Look up any library functions you don't know and figure out how it works.
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>
    
    #define MAXLINE 1000
    
    int main()
    {
        FILE *f = fopen("pseudo_facebook.tsv", "r");
        if (!f) { perror("fopen"); exit(EXIT_FAILURE); }
    
        char line[MAXLINE];
    
        fgets(line, MAXLINE, f); // consume the header line
    
        while (fgets(line, MAXLINE, f) != NULL)
        {
            int i = 0;
            char *tok = strtok(line, "\t\n");
            while (tok)
            {
                if (i == 5) // the field at offset 5 (gender)
                {
                    // do what you want with it
                    printf("%s\n", tok);
                }
    
                tok = strtok(NULL, "\t\n");
                i++;
            }
        }
    
        fclose(f);
        return 0;
    }
    Then modify it to do all your stats except for the median. You don't need any structs or lists or anything for that. Just some variables to hold the counts.

    Post that code when you're done.

    BTW, I was confused about the large field value lengths in the data (in post #16). Those were obviously just the lengths of the field names themselves! So that max-field-length data is useless, but you don't really need it.
    Thanks for the help, but using structures is one of the requirement of my assignment. I have to store certain columns from the excel file somewhere, and I need a struct for it.

  6. #21
    Registered User
    Join Date
    Jun 2017
    Posts
    12
    Quote Originally Posted by algorism View Post
    Here's a cleaned-up version of the example program. Look up any library functions you don't know and figure out how it works.
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <ctype.h>
    
    #define MAXLINE 1000
    
    int main()
    {
        FILE *f = fopen("pseudo_facebook.tsv", "r");
        if (!f) { perror("fopen"); exit(EXIT_FAILURE); }
    
        char line[MAXLINE];
    
        fgets(line, MAXLINE, f); // consume the header line
    
        while (fgets(line, MAXLINE, f) != NULL)
        {
            int i = 0;
            char *tok = strtok(line, "\t\n");
            while (tok)
            {
                if (i == 5) // the field at offset 5 (gender)
                {
                    // do what you want with it
                    printf("%s\n", tok);
                }
    
                tok = strtok(NULL, "\t\n");
                i++;
            }
        }
    
        fclose(f);
        return 0;
    }
    Then modify it to do all your stats except for the median. You don't need any structs or lists or anything for that. Just some variables to hold the counts.

    Post that code when you're done.

    BTW, I was confused about the large field value lengths in the data (in post #16). Those were obviously just the lengths of the field names themselves! So that max-field-length data is useless, but you don't really need it.

    Could you provide an example of how I would get just the likes from the data instead of only the gender? I still do not understand how I can modify the code to get the other values. Also, how do I get the median if I do not use a linked list or a struct? Can you shed some light on that? Thanks.
    Last edited by SomeNoob; 06-28-2017 at 04:09 AM.

  7. #22
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Since all fields are simple numbers or strings without spaces, try this
    Code:
    if ( sscanf(buff,
      "%d %d %d %d %d"     // userid, age, dob
      "%s "                // gender
      "%d %d %d "          // tenure, friendships
      "%d %d %d %d %d %d", // likes
      &v, &v, &v, &v, &v,
      &v,
      &v, &v, &v,
      &v, &v, &v, &v, &v, &v ) == 15 ) {
      // successfully parsed the line
    } else {
      // header line won't parse, but everything else should.
    }
    I'll leave it to you to figure out the names and types of all the &v parameters to sscanf. We've been doing all the work up to now.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. [C] Need help to export data to excel
    By daydreamin in forum C Programming
    Replies: 2
    Last Post: 05-06-2013, 03:37 AM
  2. Sending Data to Excel with C++
    By chopshardiman in forum C++ Programming
    Replies: 9
    Last Post: 05-17-2012, 11:24 AM
  3. Writing data to Excel file
    By Hankyaku in forum C++ Programming
    Replies: 3
    Last Post: 04-05-2003, 08:33 AM
  4. how do access and read data in from an excel file
    By hamilton73 in forum C++ Programming
    Replies: 2
    Last Post: 09-26-2001, 10:00 PM

Tags for this Thread