Thread: Help with getting data out of an excel file

  1. #1
    Registered User
    Join Date
    Jun 2017
    Posts
    12

    Help with getting data out of an excel file

    I have an excel file that contains the number of user in facebook, their ID, their gender, and their like count. I would like to know how I can retrieve the data from the excel file and output it into a statistic. For example, I would like to know how to retrieve the total number of entries for the male users, find the highest like received count, the lowest like received count, and the total number of likes received. This will be repeated for the female users as well.

    Code:
     
    #include <stdio.h>
    #include <stdlib.h>
    
    
    
    /* Structure to store all the data that is required */
    typedef struct{
        char *userid;
        int entryCount;
        int like;
        char gender[7];
    } statistics;
    
    
    /* Main function */
    int main (int argc, char * argv[]) {
        FILE *fp;
        fp = fopen ("./pseudo_facebook.tsv", "r");
        if (fp == NULL) {
            printf("File not found");
        }
        else {
            printf("File successfully opened");
        }
    
    
    }
    This is all I have so far. I would appreciate it if someone could point me in the right direction on how to complete this, as I have been through many sites but I still have not found a suitable solution to my problem. Thanks in advance.
    Last edited by SomeNoob; 06-25-2017 at 11:51 AM.

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    Your next step would be in the else
    Code:
    char buff[BUFSIZ];
    while ( fgets(buff,BUFSIZ,fp) != NULL ) {
      printf("%s",buff);
    }
    I'm guessing from the suffix that you have a Tab Separated Values (tsv) file, which excel will happily read, but it isn't an excel native file format (like .xls).

    >char *userid;
    I would make this
    char userid[100];
    until you know what you're doing.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Jun 2017
    Posts
    12
    Thanks for the answer. So fgets gets the data from the excel file and parses it into the columns in the struct? Also, you were right about it being a .tsv file. Does it matter if it is not in .xls? Another question, does buffer size means the number of data I have in my excel file?
    Last edited by SomeNoob; 06-26-2017 at 08:45 AM.

  4. #4
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    Yes, the next step is to parse the line into it's components, and assign to your struct variable.

    > Does it matter if it is not in .xls?
    Well .xls is Microsoft proprietary file format, which is a complex binary (not readable using any text editor). Whilst reading it is possible, it is very much not a newbie task.

    .tsv by comparison is easy with only a modicum of experience.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  5. #5
    Registered User
    Join Date
    Jun 2017
    Posts
    12
    Thanks. Also, does that buffer size means the number of user I have in the excel file?

  6. #6
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    BUFSIZ is the maximum number of characters fgets() will store in buff, and fgets() also stops at the first newline (whichever comes first).

    BUFSIZ is always at least 256 (it happens to be 8192 for me), so unless you have massively long lines containing many fields, it shouldn't be a problem.

    For your homework purpose, I'd say it's safe to assume that one call to fgets puts one complete record in buff for you to parse.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  7. #7
    Registered User
    Join Date
    Jun 2017
    Posts
    12
    I have many records in my excel file though. I have 99004 column of users, with 15 rows of data. So what should my buffer size be?

  8. #8
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    How long are your lines?

    What OS are you on?

    Code:
    $ cat foo.tsv
    one	two	theee	four
    1	2	3	4
    I	II	III	IV
    1	10	11	1000
    $ awk -F'\t' '{ print NR, NF, length($0) }' foo.tsv 
    1 4 18
    2 4 7
    3 4 11
    4 4 12
    This awk program (eg: on Ubuntu -> sudo apt-get install mawk) prints the number of tab separated fields, and the length of each line.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  9. #9
    Registered User
    Join Date
    Jun 2017
    Posts
    12
    Thanks for the command. My output was 99004 15 68.

  10. #10
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    > I have 99004 column of users, with 15 rows of data
    Er, what?

    > My output was 99004 15 68.
    This suggests exactly the opposite - 99004 rows, 15 columns, with 68 characters of data up to the newline.

    This seems trivial:
    while ( fgets(buff,BUFSIZ,fp) != NULL ) will just loop 99000+ times, and you parse out 15 fields of information;
    In what seems to amount to physically short lines (well within what any BUFSIZ you may have).
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  11. #11
    Registered User
    Join Date
    Jun 2015
    Posts
    1,640
    We need the max line size, so try
    Code:
    awk -F'\t' '{if (length($0) > m) m = length($0)}; END{print m}'
    
    Also, lets see if all lines have 15 fields:
    Code:
    awk -F'\t' 'NF != 15 {print NR, "has", NF, "fields"}'
    
    The output of head might be interesting too (shows the first 10 lines).

  12. #12
    Registered User
    Join Date
    Jun 2017
    Posts
    12
    Quote Originally Posted by Salem View Post
    > I have 99004 column of users, with 15 rows of data
    Er, what?

    > My output was 99004 15 68.
    This suggests exactly the opposite - 99004 rows, 15 columns, with 68 characters of data up to the newline.

    This seems trivial:
    while ( fgets(buff,BUFSIZ,fp) != NULL ) will just loop 99000+ times, and you parse out 15 fields of information;
    In what seems to amount to physically short lines (well within what any BUFSIZ you may have).

    I must've gotten rows and columns mixed up then. Sorry. So what should my buffer size be?

  13. #13
    Registered User
    Join Date
    Jun 2017
    Posts
    12
    Quote Originally Posted by algorism View Post
    We need the max line size, so try
    Code:
    awk -F'\t' '{if (length($0) > m) m = length($0)}; END{print m}'
    
    Also, lets see if all lines have 15 fields:
    Code:
    awk -F'\t' 'NF != 15 {print NR, "has", NF, "fields"}'
    
    The output of head might be interesting too (shows the first 10 lines).
    I got 171 for the first command, and the 2nd command did not show any output. Also, what do you mean by the output of head?

  14. #14
    Registered User
    Join Date
    Jun 2015
    Posts
    1,640
    So the max line size is 171 (including the newline).
    No output from the second program means that all lines have 15 fields, which is good.

    head is just another program. Just run head pseudo_facebook.tsv

    Here's a program that shows the maximum size for each field, as well as the maximum line size.
    It also demonstrates how we would access the fields (using strtok).
    Post it's output, too, and consider how you would modify it for your task.
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    #define MAXLINE   1000
    #define NUMFIELDS   15
    
    int main() {
        FILE *f = fopen("pseudo_facebook.tsv", "r");
        if (!f) { perror("fopen"); exit(EXIT_FAILURE); }
    
        char line[MAXLINE];
        int maxfields[NUMFIELDS] = {0};
        int maxline = 0;
    
        while (fgets(line, MAXLINE, f) != NULL) {
            int len = strlen(line);
            if (len > maxline) maxline = len;
    
            char *tok = strtok(line, "\t\n");
            int i = 0;
            while (tok) {
                int len = strlen(tok);
                if (len > maxfields[i]) maxfields[i] = len;
                tok = strtok(NULL, "\t\n");
                i++;
            }
        }
    
        printf("%3d: ", maxline);
        for (int i = 0; i < NUMFIELDS; i++)
            printf("%3d ", maxfields[i]);
        putchar('\n');
    
        fclose(f);
        return 0;
    }

  15. #15
    Registered User
    Join Date
    Jun 2017
    Posts
    12
    Quote Originally Posted by algorism View Post
    So the max line size is 171 (including the newline).
    No output from the second program means that all lines have 15 fields, which is good.

    head is just another program. Just run head pseudo_facebook.tsv

    Here's a program that shows the maximum size for each field, as well as the maximum line size.
    It also demonstrates how we would access the fields (using strtok).
    Post it's output, too, and consider how you would modify it for your task.
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    #define MAXLINE   1000
    #define NUMFIELDS   15
    
    int main() {
        FILE *f = fopen("pseudo_facebook.tsv", "r");
        if (!f) { perror("fopen"); exit(EXIT_FAILURE); }
    
        char line[MAXLINE];
        int maxfields[NUMFIELDS] = {0};
        int maxline = 0;
    
        while (fgets(line, MAXLINE, f) != NULL) {
            int len = strlen(line);
            if (len > maxline) maxline = len;
    
            char *tok = strtok(line, "\t\n");
            int i = 0;
            while (tok) {
                int len = strlen(tok);
                if (len > maxfields[i]) maxfields[i] = len;
                tok = strtok(NULL, "\t\n");
                i++;
            }
        }
    
        printf("%3d: ", maxline);
        for (int i = 0; i < NUMFIELDS; i++)
            printf("%3d ", maxfields[i]);
        putchar('\n');
    
        fclose(f);
        return 0;
    }

    userid age dob_day dob_year dob_month gender tenure friend_count friendships_initiated likes likes_received mobile_likes mobile_likes_received www_likes www_likes_received
    2094382 14 19 1999 11 male 266 0 0 0 00 0 0 0
    1192601 14 2 1999 11 female 6 0 0 0 00 0 0 0
    2083884 14 16 1999 11 male 13 0 0 0 00 0 0 0
    1203168 14 25 1999 12 female 93 0 0 0 00 0 0 0
    1733186 14 4 1999 12 male 82 0 0 0 00 0 0 0
    1524765 14 1 1999 12 male 15 0 0 0 00 0 0 0
    1136133 13 14 2000 1 male 12 0 0 0 00 0 0 0
    1680361 13 4 2000 1 female 0 0 0 0 00 0 0 0
    1365174 13 1 2000 1 male 81 0 0 0 00 0 0 0

    Output of head.

    172: 7 3 7 8 9 6 6 12 21 5 14 12 21 9 18

    Output of the program.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. [C] Need help to export data to excel
    By daydreamin in forum C Programming
    Replies: 2
    Last Post: 05-06-2013, 03:37 AM
  2. Sending Data to Excel with C++
    By chopshardiman in forum C++ Programming
    Replies: 9
    Last Post: 05-17-2012, 11:24 AM
  3. Writing data to Excel file
    By Hankyaku in forum C++ Programming
    Replies: 3
    Last Post: 04-05-2003, 08:33 AM
  4. how do access and read data in from an excel file
    By hamilton73 in forum C++ Programming
    Replies: 2
    Last Post: 09-26-2001, 10:00 PM

Tags for this Thread