So now you know the field positions, their meanings, and their maximum sizes:
Code:
Max line size 172 (with newline).
pos  field                  max length
  0  userid                  7
  1  age                     3
  2  dob_day                 7
  3  dob_year                8
  4  dob_month               9
  5  gender                  6
  6  tenure                  6
  7  friend_count           12
  8  friendships_initiated  21
  9  likes                   5
 10  likes_received         14
 11  mobile_likes           12
 12  mobile_likes_received  21
 13  www_likes               9
 14  www_likes_received     18
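To avoid magic numbers like 5 in the tokenizing loop, you could give those positions names. Here's a minimal sketch; the enum and macro names are made up for illustration, and only the order and the 172-character line size come from the table above.

```c
#include <assert.h>

/* Hypothetical names for the 0-based field positions in the table. */
enum field {
    F_USERID,                 /*  0 */
    F_AGE,                    /*  1 */
    F_DOB_DAY,                /*  2 */
    F_DOB_YEAR,               /*  3 */
    F_DOB_MONTH,              /*  4 */
    F_GENDER,                 /*  5 */
    F_TENURE,                 /*  6 */
    F_FRIEND_COUNT,           /*  7 */
    F_FRIENDSHIPS_INITIATED,  /*  8 */
    F_LIKES,                  /*  9 */
    F_LIKES_RECEIVED,         /* 10 */
    F_MOBILE_LIKES,           /* 11 */
    F_MOBILE_LIKES_RECEIVED,  /* 12 */
    F_WWW_LIKES,              /* 13 */
    F_WWW_LIKES_RECEIVED,     /* 14 */
    NUM_FIELDS                /* 15: number of fields per line */
};

/* The longest line is 172 chars including '\n', and fgets also needs
   room for the terminating '\0'. A little extra headroom costs nothing. */
#define BUF_SIZE 256

/* Compile-time sanity checks (C11 static_assert from <assert.h>). */
static_assert(NUM_FIELDS == 15, "the table lists 15 fields");
static_assert(BUF_SIZE >= 172 + 1, "buffer must hold the longest line plus '\\0'");
```

With that in place, the gender test reads `if (i == F_GENDER && ...)` and the line buffer is `char line[BUF_SIZE];`.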
And here is some sample data:
Code:
2094382 14 19 1999 11 male 266 0 0 0 0 0 0 0 0
1192601 14 2 1999 11 female 6 0 0 0 0 0 0 0 0
2083884 14 16 1999 11 male 13 0 0 0 0 0 0 0 0
1203168 14 25 1999 12 female 93 0 0 0 0 0 0 0 0
1733186 14 4 1999 12 male 82 0 0 0 0 0 0 0 0
1524765 14 1 1999 12 male 15 0 0 0 0 0 0 0 0
1136133 13 14 2000 1 male 12 0 0 0 0 0 0 0 0
1680361 13 4 2000 1 female 0 0 0 0 0 0 0 0 0
1365174 13 1 2000 1 male 81 0 0 0 0 0 0 0 0
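The fields are shown with spaces here, but the awk -F'\t' command further down suggests they are tab-separated in the file. Assuming that, a minimal sketch of splitting one line into its fields might look like this (split_fields is a hypothetical helper, not part of the example program):

```c
#include <string.h>

#define NUM_FIELDS 15

/* Split one tab-separated record in place and store a pointer to each
   field in fields[]. Returns the number of fields found. Note: strtok
   treats a run of tabs as a single separator, so an empty field would be
   silently skipped; check the returned count against NUM_FIELDS. */
int split_fields(char *line, char *fields[], int max_fields)
{
    int n = 0;

    line[strcspn(line, "\r\n")] = '\0';  /* drop the newline kept by fgets */
    for (char *tok = strtok(line, "\t"); tok != NULL && n < max_fields;
         tok = strtok(NULL, "\t"))
        fields[n++] = tok;
    return n;
}
```

For the first sample row this yields 15 fields, with fields[5] pointing at "male".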
A couple of problems. First, what's up with all those zeroes? In every sample row, all eight fields after tenure are 0. That may be legitimate, but it looks strange.
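One way to find out how common these rows are is to test each record and count the all-zero ones. Here's a small sketch, assuming the line has already been split into its 15 string fields and taking "activity" to mean positions 7 through 14 (friend_count through www_likes_received); the function name is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define FIRST_ACTIVITY_FIELD 7   /* friend_count */
#define NUM_FIELDS 15            /* last field is www_likes_received (14) */

/* Return 1 if every activity field is exactly "0", otherwise 0.
   fields[] holds the 15 tokens of one already-split record. */
int activity_all_zero(char *const fields[NUM_FIELDS])
{
    assert(fields != NULL);
    for (int i = FIRST_ACTIVITY_FIELD; i < NUM_FIELDS; i++)
        if (strcmp(fields[i], "0") != 0)
            return 0;
    return 1;
}
```

Call it once per data line and keep a count of how many return 1; comparing that to the total line count tells you whether the zeroes are the norm or just a quirk of these first rows.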
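Also, the max field sizes for those fields seem far too large. Presumably mobile_likes_received is a number, yet its max size is 21. The values could be padded with spaces, but notice that every max except userid's is exactly the length of the field's own name (mobile_likes_received is 21 characters long), and the header line, tabs and newline included, is 172 characters, the same as the max line size. So the measurement most likely included the header line. You can check whether any actual value is that long with: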
Code:
awk -F'\t' 'length($13) == 21'   # awk numbers fields from 1, so field 12 (mobile_likes_received) is $13
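If you'd rather recheck the maximums in C, here's a sketch that records the longest value in each field while skipping the header line, so the column names can't inflate the numbers. It assumes tab-separated fields with none empty (strtok collapses consecutive tabs), and max_field_lengths is a made-up name.

```c
#include <stdio.h>
#include <string.h>

#define NUM_FIELDS 15
#define BUF_SIZE 256   /* comfortably more than the 172-char max line + '\0' */

/* Fill maxlen[i] with the length of the longest value in field i,
   ignoring the header line. Returns 0 on success, -1 if there is no header. */
int max_field_lengths(FILE *fp, size_t maxlen[NUM_FIELDS])
{
    char line[BUF_SIZE];

    memset(maxlen, 0, NUM_FIELDS * sizeof maxlen[0]);
    if (fgets(line, sizeof line, fp) == NULL)  /* header: just field names */
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';    /* don't count the newline */
        int i = 0;
        for (char *tok = strtok(line, "\t"); tok != NULL && i < NUM_FIELDS;
             tok = strtok(NULL, "\t"), i++) {
            size_t len = strlen(tok);
            if (len > maxlen[i])
                maxlen[i] = len;
        }
    }
    return 0;
}
```

Printing maxlen[] afterwards should give much smaller numbers for the count fields if the header really was the culprit.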
Anyway, with the info above and the example program you should be able to get the job done.
Remember that you will need to skip the first line since it's just the field names. That just requires an extra call to fgets before the loop.
Then you'd add an if statement inside the tokenizing loop. For example, to count the number of females:
Code:
if (i == 5 && strcmp(tok, "female") == 0)  // field at offset 5 is "gender"; strcmp needs <string.h>
    females++;  // a counter initialized to 0 before the loop and printed at the end
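Putting the pieces together (skip the header with one fgets, tokenize each remaining line, test field 5), a minimal sketch could look like the following. It assumes tab-separated fields and an already-opened file; count_females is a hypothetical name, not something from the example program.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 256     /* more than the 172-char max line plus '\0' */
#define GENDER_FIELD 5   /* 0-based position of "gender" */

/* Count data lines whose gender field is "female".
   Returns -1 if the file doesn't even contain a header line. */
long count_females(FILE *fp)
{
    char line[BUF_SIZE];
    long females = 0;

    assert(fp != NULL);
    if (fgets(line, sizeof line, fp) == NULL)  /* skip the header line */
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        int i = 0;
        for (char *tok = strtok(line, "\t\r\n"); tok != NULL;
             tok = strtok(NULL, "\t\r\n"), i++) {
            if (i == GENDER_FIELD) {
                if (strcmp(tok, "female") == 0)
                    females++;
                break;  /* the remaining fields don't matter here */
            }
        }
    }
    return females;
}
```

From main you would open the file with fopen(path, "r"), check for NULL, and print the result with printf("%ld\n", count_females(fp)).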