Help with getting data out of an excel file

**algorism** · 06-26-2017

So know the field position meanings and maximum sizes:

Code:

Max line size 172 (with newline).
                             max field length
0   userid                   7
1   age                      3
2   dob_day                  7
3   dob_year                 8
4   dob_month                9
5   gender                   6
6   tenure                   6
7   friend_count            12
8   friendships_initiated   21
9   likes                    5
10  likes_received          14
11  mobile_likes            12
12  mobile_likes_received   21
13  www_likes                9
14  www_likes_received      18

You have some sample data

Code:

    
2094382    14    19    1999    11    male    266   0    0    0    0    0    0    0    0
1192601    14    2     1999    11    female  6     0    0    0    0    0    0    0    0
2083884    14    16    1999    11    male    13    0    0    0    0    0    0    0    0
1203168    14    25    1999    12    female  93    0    0    0    0    0    0    0    0
1733186    14    4     1999    12    male    82    0    0    0    0    0    0    0    0
1524765    14    1     1999    12    male    15    0    0    0    0    0    0    0    0
1136133    13    14    2000    1     male    12    0    0    0    0    0    0    0    0
1680361    13    4     2000    1     female  0     0    0    0    0    0    0    0    0
1365174    13    1     2000    1     male    81    0    0    0    0    0    0    0    0

A couple of problems. What's up with all those zeroes? It may be okay, but it seems strange.
Also, the max field size for those fields seem far too large. Presumably mobile_likes_received is a number, but it's max field size is 21. It could be mostly spaces, though. You should take a look at that one with:

Code:

awk -F'\t' 'length($15) == 21'

Anyway, with the info above and the example program you should be able to get the job done.

Remember that you will need to skip the first line since it's just the field names. That just requires an extra call to fgets before the loop.

Then you'd want to add an if inside the tokenizing loop. For example, to count the number of females:

Code:

    if (i == 5 && strcmp(tok, "female") == 0)  // field at offset 5 is "gender"
        females++;   // a counter that we initialized to 0 and will print at the end

**SomeNoob** · 06-27-2017

Originally Posted by algorism

So know the field position meanings and maximum sizes:

Code:

Max line size 172 (with newline).
                             max field length
0   userid                   7
1   age                      3
2   dob_day                  7
3   dob_year                 8
4   dob_month                9
5   gender                   6
6   tenure                   6
7   friend_count            12
8   friendships_initiated   21
9   likes                    5
10  likes_received          14
11  mobile_likes            12
12  mobile_likes_received   21
13  www_likes                9
14  www_likes_received      18

You have some sample data

Code:

    
2094382    14    19    1999    11    male    266   0    0    0    0    0    0    0    0
1192601    14    2     1999    11    female  6     0    0    0    0    0    0    0    0
2083884    14    16    1999    11    male    13    0    0    0    0    0    0    0    0
1203168    14    25    1999    12    female  93    0    0    0    0    0    0    0    0
1733186    14    4     1999    12    male    82    0    0    0    0    0    0    0    0
1524765    14    1     1999    12    male    15    0    0    0    0    0    0    0    0
1136133    13    14    2000    1     male    12    0    0    0    0    0    0    0    0
1680361    13    4     2000    1     female  0     0    0    0    0    0    0    0    0
1365174    13    1     2000    1     male    81    0    0    0    0    0    0    0    0

A couple of problems. What's up with all those zeroes? It may be okay, but it seems strange.
Also, the max field size for those fields seem far too large. Presumably mobile_likes_received is a number, but it's max field size is 21. It could be mostly spaces, though. You should take a look at that one with:

Code:

awk -F'\t' 'length($15) == 21'

Anyway, with the info above and the example program you should be able to get the job done.

Remember that you will need to skip the first line since it's just the field names. That just requires an extra call to fgets before the loop.

Then you'd want to add an if inside the tokenizing loop. For example, to count the number of females:

Code:

    if (i == 5 && strcmp(tok, "female") == 0)  // field at offset 5 is "gender"
        females++;   // a counter that we initialized to 0 and will print at the end

Thanks for the help. However, even with the example program, I do not know how to parse the data that I read from the file into my structures. I'm also using a linked list since I need to find the median of the amount of likes and I figured it should help. Also, I need to find the maximum and minimum number of likes. How can I do that? Any help will be appreciated.

I tried the command but there was no output on the screen.

**SomeNoob** · 06-27-2017

So to make it more clearer, my goal is to extract and print out the following statistic figures grouped by the user genders(ie. "male' and 'female'):
-Valid entry count.
-Highest likes received count.
-Lowest likes received count.
-Likes received count total (the sum of all the like received count).
-Likes received count average.
-Likes received count median.

I would appreciate if someone could help me with the codes. I have been trying to figure out how to code it for 2 days straight, but I still have no idea how to do it. Many thanks in advance.

**algorism** · 06-27-2017

Here's a cleaned-up version of the example program. Look up any library functions you don't know and figure out how it works.

Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

#define MAXLINE 1000

int main()
{
    FILE *f = fopen("pseudo_facebook.tsv", "r");
    if (!f) { perror("fopen"); exit(EXIT_FAILURE); }

    char line[MAXLINE];

    fgets(line, MAXLINE, f); // consume the header line

    while (fgets(line, MAXLINE, f) != NULL)
    {
        int i = 0;
        char *tok = strtok(line, "\t\n");
        while (tok)
        {
            if (i == 5) // the field at offset 5 (gender)
            {
                // do what you want with it
                printf("%s\n", tok);
            }

            tok = strtok(NULL, "\t\n");
            i++;
        }
    }

    fclose(f);
    return 0;
}

Then modify it to do all your stats except for the median. You don't need any structs or lists or anything for that. Just some variables to hold the counts.

Post that code when you're done.

BTW, I was confused about the large field value lengths in the data (in post #16). Those were obviously just the lengths of the field names themselves! So that max-field-length data is useless, but you don't really need it.

**SomeNoob** · 06-27-2017

Originally Posted by algorism

Here's a cleaned-up version of the example program. Look up any library functions you don't know and figure out how it works.

Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

#define MAXLINE 1000

int main()
{
    FILE *f = fopen("pseudo_facebook.tsv", "r");
    if (!f) { perror("fopen"); exit(EXIT_FAILURE); }

    char line[MAXLINE];

    fgets(line, MAXLINE, f); // consume the header line

    while (fgets(line, MAXLINE, f) != NULL)
    {
        int i = 0;
        char *tok = strtok(line, "\t\n");
        while (tok)
        {
            if (i == 5) // the field at offset 5 (gender)
            {
                // do what you want with it
                printf("%s\n", tok);
            }

            tok = strtok(NULL, "\t\n");
            i++;
        }
    }

    fclose(f);
    return 0;
}

Then modify it to do all your stats except for the median. You don't need any structs or lists or anything for that. Just some variables to hold the counts.

Post that code when you're done.

BTW, I was confused about the large field value lengths in the data (in post #16). Those were obviously just the lengths of the field names themselves! So that max-field-length data is useless, but you don't really need it.

Thanks for the help, but using structures is one of the requirement of my assignment. I have to store certain columns from the excel file somewhere, and I need a struct for it.

**SomeNoob** · 06-28-2017

Originally Posted by algorism

Here's a cleaned-up version of the example program. Look up any library functions you don't know and figure out how it works.

Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

#define MAXLINE 1000

int main()
{
    FILE *f = fopen("pseudo_facebook.tsv", "r");
    if (!f) { perror("fopen"); exit(EXIT_FAILURE); }

    char line[MAXLINE];

    fgets(line, MAXLINE, f); // consume the header line

    while (fgets(line, MAXLINE, f) != NULL)
    {
        int i = 0;
        char *tok = strtok(line, "\t\n");
        while (tok)
        {
            if (i == 5) // the field at offset 5 (gender)
            {
                // do what you want with it
                printf("%s\n", tok);
            }

            tok = strtok(NULL, "\t\n");
            i++;
        }
    }

    fclose(f);
    return 0;
}

Then modify it to do all your stats except for the median. You don't need any structs or lists or anything for that. Just some variables to hold the counts.

Post that code when you're done.

BTW, I was confused about the large field value lengths in the data (in post #16). Those were obviously just the lengths of the field names themselves! So that max-field-length data is useless, but you don't really need it.

Could you provide an example of how I would get just the likes from the data instead of only the gender? I still do not understand how I can modify the code to get the other values. Also, how do I get the median if I do not use a linked list or a struct? Can you shed some light on that? Thanks.

**Salem** · 06-28-2017

Since all fields are simple numbers or strings without spaces, try this

Code:

if ( sscanf(buff,
  "%d %d %d %d %d"     // userid, age, dob
  "%s "                // gender
  "%d %d %d "          // tenure, friendships
  "%d %d %d %d %d %d", // likes
  &v, &v, &v, &v, &v,
  &v,
  &v, &v, &v,
  &v, &v, &v, &v, &v, &v ) == 15 ) {
  // successfully parsed the line
} else {
  // header line won't parse, but everything else should.
}

I'll leave it to you to figure out the names and types of all the &v parameters to sscanf. We've been doing all the work up to now.

Thread: Help with getting data out of an excel file

Thread Tools

Search Thread

Display

Similar Threads

[C] Need help to export data to excel

Sending Data to Excel with C++

Writing data to Excel file

how do access and read data in from an excel file

Tags for this Thread