Thread: reading strings from .txt and entering in an array

  1. #1
    Registered User
    Join Date
    Sep 2007
    Posts
    15

    reading strings from .txt and entering in an array

    I am trying to read a file of 500k+ words that will be entered into a binary search tree. First I am just trying to read these words into an array of strings but it does not seem to be working. Below is the code I wrote but after spending an hour figuring out why every entry is being set to the very last word in the .txt I am at a loss. Below the code is the output, I don't need the output but I was using it as an attempt to debug the code or to see what is going wrong... but it did not help.

    Code:
    int main() {
    	FILE *file;
    	int numWords=0, i, j;
    	char fileName[30], wordString[30];
    	char **wordArray;
    	printf("Enter the name of the file:");
    	scanf("%s", fileName);
    
    	file = fopen(fileName, "r");
    	if(!file) {
    	  printf("File does not exist.  Program terminated");
    	  return 1;
    	}
    
    	else {
    	  while(!feof(file)) {
    	    fgets(wordString, 30, file);
    	    numWords++;
    	  } rewind(file);
    	  
    	  wordArray = (char **)checked_malloc(numWords*sizeof(char*));
    	  for(i = 0; fgets(wordString, 30, file); ++i) {
    	    printf("%d %s", i, wordString);
    	    wordArray[i] = (char*)checked_malloc(strlen(wordString)*sizeof(char));
    	    wordArray[i] = wordString;
    	    
    	  } fclose(file);
    	}
    
    	printf("i=%d    words=%d  \n", i, numWords-1);
    	for(j = 0; j < numWords-1; j++)
    	  printf("%d: %s ", j, wordArray[j]);
    
    	free(wordArray);
    	return 0;
    }
    Enter the name of the file:small.txt
    0 cat
    1 mouse
    2 bear
    3 deer
    4 haha
    5 newbie
    6 llama
    7 dog
    8 animal
    9 rocket
    10 ship
    11 blanket
    12 snake
    13 ohnoes
    14 wtfbbq
    i=15 words=15
    0: wtfbbq
    1: wtfbbq
    2: wtfbbq
    3: wtfbbq
    4: wtfbbq
    5: wtfbbq
    6: wtfbbq
    7: wtfbbq
    8: wtfbbq
    9: wtfbbq
    10: wtfbbq
    11: wtfbbq
    12: wtfbbq
    13: wtfbbq
    14: wtfbbq
    As the output shows, every string in the array wordArray is set to the last word in the txt. If anyone could show me what part I am doing wrong it would be greatly appreciated.

    Thanks.

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    What does this line do [and think about it in the sense of what the compiler makes of it, rather than what it looks like to you]:
    Code:
    wordArray[i] = wordString;
    I guarantee, that this is exactly the only line you would need to change.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Not surprised it doesn't work...
    OK, here goes...

    Quote Originally Posted by trancekid View Post
    Code:
    int main() {
    	FILE *file;
    	int numWords=0, i, j;
    	char fileName[30], wordString[30];
    	char **wordArray;
    	printf("Enter the name of the file:");
    /* Don't use scanf to read strings! It's unsafe! Recommend use fgets. */
    	scanf("&#37;s", fileName);
    
    	file = fopen(fileName, "r");
    	if(!file) {
    	  printf("File does not exist.  Program terminated");
    	  return 1;
    	}
    
    	else {
    	  while(!feof(file)) {
    	    fgets(wordString, 30, file);
    	    numWords++;
    	  } rewind(file); /* Come on, put that on a separate line, it's confusing */	  
    
    	  wordArray = (char **)checked_malloc(numWords*sizeof(char*));
    	  for(i = 0; fgets(wordString, 30, file); ++i) {
    /* Reading the contents of the file twice. Waste of time. Merge it into one loop and read them */
    	    printf("%d %s", i, wordString);
    	    wordArray[i] = (char*)checked_malloc(strlen(wordString)*sizeof(char));
    	    wordArray[i] = wordString; /* *cough* Read a tutorial on strings *cough */
    	    
    	  } fclose(file);
    	}
    
    	printf("i=%d    words=%d  \n", i, numWords-1);
    	for(j = 0; j < numWords-1; j++)
    	  printf("%d: %s ", j, wordArray[j]);
    
    
    /* Bad! Free the elements of the array separately, THEN free wordArray */
    /* You allocated a char** array first, then EACH of that array allocated MORE memory, so you have a leak */
    	free(wordArray);
    	return 0;
    }

  4. #4
    Registered User
    Join Date
    Sep 2007
    Posts
    15
    Well, I kind of thought it was that line for a while but I guess my reasoning has failed me yet again. Anyways...

    wordString is obviously the first, or next, word in the file. So I am allocating enough memory at wordArray[i] (since it is a double pointer, for an array of strings when a malloc string is just an array of characters) for the size of whatever this word may be. I assume that once wordArray[i] has been given the correct amount of memory, I can simply set it = to the string of characters that make up wordString... I guess this is not the case but I don't see where it is going wrong. Do I have to run another loop inside the fgets loop that enters each char of said string? Like...

    for(j = 0; j < strlen(wordString); j++) {
    wordArray[i][j] = wordString[j];

    ??

  5. #5
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Yes, it's inefficient to read 500k lines of words. But it's also inefficient to realloc() many times over, if you don't get that part right.

    And as usual, "It's easier to make a correct program run faster than to fix a fast program full of bugs" [paraphrasing here], so optimize later on if needs be.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  6. #6
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Quote Originally Posted by trancekid View Post
    wordString is obviously the first, or next, word in the file. So I am allocating enough memory at wordArray[i] (since it is a double pointer, for an array of strings when a malloc string is just an array of characters) for the size of whatever this word may be. I assume that once wordArray[i] has been given the correct amount of memory, I can simply set it = to the string of characters that make up wordString... I guess this is not the case but I don't see where it is going wrong. Do I have to run another loop inside the fgets loop that enters each char of said string? Like...
    http://www.cprogramming.com/tutorial/c/lesson9.html

  7. #7
    Registered User
    Join Date
    Sep 2007
    Posts
    15
    Code:
    int main() {
    	FILE *file;
    	int numWords=0, i, j;
    	char fileName[30], wordString[30];
    	char **wordArray;
    	printf("Enter the name of the file:");
    	scanf("%s", fileName);
    
    	file = fopen(fileName, "r");
    	if(!file) {
    	  printf("File does not exist.  Program terminated");
    	  return 1;
    	}
    
    	else {
    	  while(!feof(file)) {
    	    fgets(wordString, 30, file);
    	    numWords++;
    	  } 
    	  rewind(file);  //just for you angry guy!!!
    	  
    	  wordArray = (char **)checked_malloc(numWords*sizeof(char*));
    	  for(i = 0; fgets(wordString, 30, file); ++i) {
    	    printf("%d %s", i, wordString);
    	    wordArray[i] = (char*)checked_malloc(strlen(wordString)*sizeof(char));
    	    for(j = 0; j < strlen(wordString); j++)
    	      wordArray[i][j] = wordString[j];
    	    
    	  } fclose(file);
    	}
    
    	printf("i=%d    words=%d  \n", i, numWords-1);
    	for(j = 0; j < numWords-1; j++) {
    	  printf("%d: %s ", j, wordArray[j]);
    	  free(wordArray[j]);
    	}
    
    	free(wordArray);
    	return 0;
    }
    Thank to both of you for the constructive criticism. It works like a charm now.

    But the reason I am rewinding the file and reading everything a second time is because I need to know how many words are in the file to allocate the correct amount of space. Would you suggest I simply allocated a whole bunch of space then trim it down to the correct size by deleting all null pointers in the **?

    Thanks again guys!

  8. #8
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Quote Originally Posted by trancekid View Post
    Thank to both of you for the constructive criticism. It works like a charm now.

    But the reason I am rewinding the file and reading everything a second time is because I need to know how many words are in the file to allocate the correct amount of space. Would you suggest I simply allocated a whole bunch of space then trim it down to the correct size by deleting all null pointers in the **?

    Thanks again guys!
    You could make a dynamic array. Guess how much space you need, allocate, read and if it turns out you don't have enough, use realloc to get more.
    Read the string tutorial I gave you and also think about your allocation/free procedure, because you're leaking memory.

    Oh, and you want to know what happens when you do:
    Code:
    char*p = "My string here";
    char* p2 = p;
    Hm? p's address is assigned to p2. The string isn't copied, so when p dies, p2 is invalid. Which turns out to be your case too (when you read a new line, the contents of your pointers change because it points to the address of your buffer).

  9. #9
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    And instead of making your own strcpy() implementation, why not use:
    Code:
    strcpy(wordArray[i], wordString);
    Also, you don't free all the entries in wordArray, only the first allocation.

    And there's another bug that both me and elysia missed: You are allocating strlen bytes, you need one more to store the zero at the end of the string.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  10. #10
    Registered User
    Join Date
    Sep 2007
    Posts
    15
    Quote Originally Posted by matsp View Post
    And instead of making your own strcpy() implementation, why not use:
    Code:
    strcpy(wordArray[i], wordString);
    Also, you don't free all the entries in wordArray, only the first allocation.

    And there's another bug that both me and elysia missed: You are allocating strlen bytes, you need one more to store the zero at the end of the string.

    --
    Mats
    When I do this...

    Code:
    	printf("i=%d    words=%d  \n", i, numWords-1);
    	for(j = 0; j < numWords-1; j++) {
    	  printf("%d: %s ", j, wordArray[j]);
    	  for( i = 0; i < strlen(wordArray[j]); i++)
    	    free(wordArray[j][i]);
    	  free(wordArray[j]);
    	}
    I get this error...

    Dict.c:69: warning: passing argument 1 of ‘free’ makes pointer from integer without a cast
    I also added one extra byte for the \0.

    Code:
    	    wordArray[i] = (char*)checked_malloc(strlen(wordString)*sizeof(char)+1);

  11. #11
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Quote Originally Posted by trancekid View Post
    When I do this...

    Code:
    	printf("i=&#37;d    words=%d  \n", i, numWords-1);
    	for(j = 0; j < numWords-1; j++) {
    	  printf("%d: %s ", j, wordArray[j]);
    	  for( i = 0; i < strlen(wordArray[j]); i++)
    	    free(wordArray[j][i]);
    	  free(wordArray[j]);
    	}
    I get this error...
    I also added one extra byte for the \0.
    Well, first it's a warning.
    And duh... have a look at your allocation code:
    Code:
    	  wordArray = (char **)checked_malloc(numWords*sizeof(char*));
    	    wordArray[i] = (char*)checked_malloc(strlen(wordString)*sizeof(char));
    wordArray is allocated.
    wordArray[i] is allocated.
    But you're freeing:
    wordArray[j][i]
    wordArray[j]

    Btw, wordChar[j][i] == char. It's a dereferenced pointer. You need to release the pointers, not the data.

  12. #12
    Registered User
    Join Date
    Sep 2007
    Posts
    15
    well then duh it was right the second time I fixed it.

    Code:
    	for(j = 0; j < numWords-1; j++) {
    	  printf("%d: %s ", j, wordArray[j]);
    	  free(wordArray[j]);
    	}
    
    	free(wordArray);
    	return 0;

  13. #13
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    The thing is, it's not.
    Do the reverse of your allocation. Replace malloc with free.
    Free every element in wordArray. From wordArray[0] to wordArray[numWords - 1].
    THEN free wordArray itself.
    You're just freeing the FIRST element of wordArray.

  14. #14
    Registered User
    Join Date
    Sep 2007
    Posts
    15
    Quote Originally Posted by Elysia View Post
    The thing is, it's not.
    Do the reverse of your allocation. Replace malloc with free.
    Free every element in wordArray. From wordArray[0] to wordArray[numWords - 1].
    THEN free wordArray itself.
    You're just freeing the FIRST element of wordArray.
    Code:
    	for(j = 0; j < numWords-1; j++) {
    	  printf("&#37;d: %s ", j, wordArray[j]);
    	  free(wordArray[j]);
    	}
    
    	free(wordArray);
    	return 0;
    That is exactly how I have had it since you first mentioned the memory leak. If that is not the solution I don't know what is and at this point I couldn't care less.

  15. #15
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Haha, must be getting tired. Missed that part.
    You're right obviously. You outsmarted your teacher
    However... scanf("&#37;s", ...) is unsafe. There's still one thing to fix
    And you can still do dynamic memory allocation (though it's a bit more advanced). So there's plenty more work to do if you want it.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. entering numbers to array in a row whithout length input
    By transgalactic2 in forum C Programming
    Replies: 55
    Last Post: 01-02-2009, 04:02 AM
  2. Entering data into array of char pointers
    By richdb in forum C Programming
    Replies: 13
    Last Post: 04-09-2006, 07:30 AM
  3. Entering strings into a list?
    By chadsxe in forum C++ Programming
    Replies: 10
    Last Post: 06-02-2005, 12:40 PM
  4. Entering strings with spaces
    By spudtheimpaler in forum C Programming
    Replies: 6
    Last Post: 12-08-2003, 01:23 PM
  5. entering into 2d array
    By gammacad in forum C Programming
    Replies: 7
    Last Post: 06-09-2002, 03:21 PM