Thread: I can't tell what is wrong with this code ARRR!

  1. #1
    Temporal Apparition qubit67's Avatar
    Join Date
    Jan 2007
    Posts
    85

    I can't tell what is wrong with this code ARRR!

    I have written a get_line function to read lines from a file. The file contains dictionary phrases and defintion file names separated by a ";". I have also written a read_ index function that reads form the file a line using get_line then splits the line up into it's needed parts and then inserts the 2 parts into a BST. For the last 2 weeks, I have not been able to get this to work on my school server, although it works fine on my home pc. Is there anyone that can help me spot the bug? I am totally exhausted and have tried over 15 different configurations to quell the problem with no success. Here is the code, I suspect the problem may lye in freeing the buffer after it is used...
    Please HeLp Me!
    Code:
    char *get_line(FILE *file) 
    {
    	FILE *fp = file;
    	
    	char *buffer;
    	int ch,  i=0;
    	unsigned int buffer_size = BUFFER_SIZE, new_buffer_size;
    
    	assert(buffer_size > 0);
    
    	buffer = (char *) safe_malloc (buffer_size * sizeof (char));
    	
    	while ((ch = getc(fp)) != EOF && ch != '\n')
    	{
    		if (buffer_size == i) 
    		{
    			new_buffer_size = buffer_size * 2;
    			assert(new_buffer_size >= buffer_size);
    			buffer = (char *) safe_realloc (buffer, new_buffer_size * sizeof (char));
    			buffer_size = new_buffer_size;
    		}
    		buffer[i] = ch;
    		i++;
    	}
    	if (ch == '\n')			 
    	{
    		buffer[i] = '\0';
    		return buffer; 		/*finish it off with a null byte*/ 
    	}
    	if ((ch == EOF) && (i >= 1))
    	{
    		buffer[i] = '\0';
    		return buffer;
    	}
    	else
    	{
    		free (buffer);
    		return NULL;
    	}
    }
    Code:
    bstree_t *read_index (char *file_name)
    {
    	FILE *fp;
    	char *buffer, *phrase, *def_file_name;
    	bstree_t *index_tree = NULL;
    	size_t str_len, phrase_len, def_file_name_len;
    	
    	fp = safe_open (file_name, "r");
    		
    	while ((buffer = get_line(fp)) != NULL)
    	{
    		/*split the string at the point ";" into phrase and def_file_name then insert the value
    		*/
    		str_len = strlen (buffer);
    		phrase_len = strcspn (buffer, ";");
    		phrase = (char *) safe_malloc (phrase_len + 1 * sizeof (char));
    		strncpy (phrase, buffer, phrase_len);
    		phrase[phrase_len+1] = '\0';
    	
    		def_file_name_len = str_len - phrase_len - 1;
    	
    		def_file_name = (char *) safe_malloc ((def_file_name_len+1) * sizeof (char));
    	
    		strcpy (def_file_name, buffer + phrase_len + 1);
    	
    		def_file_name[def_file_name_len-1] = '\0';
    		
    		index_tree = bstree_insert (index_tree, phrase, def_file_name);
    
    	}
    	
       return index_tree;
    }
    [/code]

  2. #2
    Registered User ssharish2005's Avatar
    Join Date
    Sep 2005
    Location
    Cambridge, UK
    Posts
    1,732
    What happenes when you run this in the school. Do you get any runtime error. What compile do you use at school? Did you use this same input of what u used at home.

    ssharish

  3. #3
    Registered User
    Join Date
    Sep 2007
    Posts
    1,012
    There's one definite error I see in the code. You wind up doing this:
    phrase = (char *) safe_malloc (phrase_len + 1 * sizeof (char));
    phrase[phrase_len+1] = '\0';
    You're writing one beyond the end of your allocated space. If you allocate n bytes, you can only write up to the n-1st byte. That is to say, if you do:
    char *s = malloc(5);
    You're only allowed to access s[0], s[1], s[2], s[3], and s[4] (assuming, of course, the malloc() succeeds).

  4. #4
    Temporal Apparition qubit67's Avatar
    Join Date
    Jan 2007
    Posts
    85
    I use gcc with the flags "-Wall -ansi -g -o" both at home and at school. School uses solaris, I use debian. Debugging with gdb shows different values on the school computer compared to my home computer. I.E, the right ones on my pc, the wrong ones on the school pc. For example at home I can read in the line : "DNCRI;def2034" and get "DNCRI" and "def2034", but when using the school ocmputers, a few values turn out wrong "@NCRI" etc. and cause a segmentation fault when reading the index into the BST. BTW my line endings are set at LF for unix.

  5. #5
    Temporal Apparition qubit67's Avatar
    Join Date
    Jan 2007
    Posts
    85
    cas are you saying that :

    phrase = (char *) safe_malloc (phrase_len + 1 * sizeof (char));
    phrase[phrase_len+1] = '\0';

    should be :

    phrase = (char *) safe_malloc (phrase_len + 1 * sizeof (char));
    phrase[phrase_len] = '\0'; ?



    I know I need one more for the null, but does this mean that phrase_len+1 updates phrase_len by one and thus only phrase_len alone needs to be used for string phrase?

  6. #6
    Registered User ssharish2005's Avatar
    Join Date
    Sep 2005
    Location
    Cambridge, UK
    Posts
    1,732
    Code:
    def_file_name_len = str_len - ( phrase_len - 1 );
    change that, some time you will end up with the negative value.

    Code:
    def_file_name[def_file_name_len-1] = '\0';
    Wouldn't this need to be

    Code:
    def_file_name[def_file_name_len + 1] = '\0';
    ssharish

  7. #7
    Temporal Apparition qubit67's Avatar
    Join Date
    Jan 2007
    Posts
    85
    thanks ssharish, will give your advice a shot

  8. #8
    Registered User
    Join Date
    Sep 2007
    Posts
    1,012
    Quote Originally Posted by qubit67 View Post
    cas are you saying that :

    phrase = (char *) safe_malloc (phrase_len + 1 * sizeof (char));
    phrase[phrase_len+1] = '\0';

    should be :

    phrase = (char *) safe_malloc (phrase_len + 1 * sizeof (char));
    phrase[phrase_len] = '\0'; ?



    I know I need one more for the null, but does this mean that phrase_len+1 updates phrase_len by one and thus only phrase_len alone needs to be used for string phrase?
    I assume if buffer points to "DNCRI;def2034" then you want phrase to point to (the new string) "DNCRI". If that's the case then yes, the substitution you mentioned is correct.

    In this case, phrase_len would be 5, because "DNCRI" has a length of 5. So to make a string, you'd want to allocate 6 bytes, which is phrase_len + 1. Because strncpy() is a bizarre little function (as you are apparently aware, it doesn't always create a string), you need to add the null byte yourself. In this case, it's at location 5, or phrase[phrase_len]. Array members 0, 1, 2, 3, 4 are the characters you just copied over, so member 5 is where a null byte goes. As you can see, 0-5 is 6 elements, which is why you used phrase_len + 1 as the size to allocate.

  9. #9
    Temporal Apparition qubit67's Avatar
    Join Date
    Jan 2007
    Posts
    85
    ok thankyou, I get it now, the array starts at 0 and the length needs to be re-adjusted for this.

    Still trying to get it right, the chages i have made have not solved my problem

  10. #10
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    Your get_line() function still writes past the end of allocated memory if an input line is exactly a multiple of BUFFER_SIZE in length.

    Add
    assert( i < buffer_size );
    at the exit of the while loop.

    Why are you casting the result of safe_malloc(). Isn't it like normal malloc in that it returns a void* ?
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  11. #11
    Temporal Apparition qubit67's Avatar
    Join Date
    Jan 2007
    Posts
    85
    #define BUFFER_SIZE 1

    i should have included this in my post

    As far as i can see from gdb :

    read_index (file_name=0x8061a68 "index") at project_libs.v7.c:131
    131 while ((buffer = get_line(fp)) != NULL)
    (gdb)
    get_line (file=0x80616c0) at project_libs.v7.c:48
    48 FILE *fp = file;
    (gdb)
    51 int ch, i=0;
    (gdb)
    52 unsigned int buffer_size = BUFFER_SIZE, new_buffer_size;
    (gdb)
    54 assert(buffer_size > 0);
    (gdb)
    56 buffer = (char *) safe_malloc (buffer_size * sizeof (char));
    (gdb)
    safe_malloc (bytes=1) at project_libs.v7.c:38
    38 if ((p = malloc (bytes)) == NULL)
    (gdb)
    43 return p;
    (gdb)
    44 }
    (gdb)

    buffer is reset to 1 every time the function is used,
    Am i still at risk of overflow with this?

  12. #12
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    OK, so trace through that function when you have a line which contains only 1 character and a newline.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  13. #13
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by qubit67 View Post
    Am i still at risk of overflow with this?
    If your line is exactly the same lenght as whatever buffer_size you have when you find the newline, the if to check for the i == buffersize will be missed out, and then you will write to buffer[buffersize] which is one byte too far.

    I would also recommend that you start with a BUFFER_SIZE of (say) 16 or 32 bytes. Doing realloc 4/5 times for a 16/32 byte string is too much overhead. The amount of memory used behind the scenes in malloc/realloc is probably in the order of 16 bytes anyways (at the very least 4 bytes on a 32-bit machine, but most likely more, if nothing else because malloc aligns your allocation to some nice even boundary of 4, 8 or 16), so the overhead of allocating a few too many bytes for really short lines is a marginal saving.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  14. #14
    Temporal Apparition qubit67's Avatar
    Join Date
    Jan 2007
    Posts
    85
    ok, so i think if i take out the guard of the new line char in the while loop, and put it into the body of the while loop, after the i++ in an if statement it will work for the new line.

    But what about the EOF in the while guard?

    I need it there for the end of file, the function should return null for the read index function if it is encountered on a new line as the first char.

    If the EOF is encountered at the end of a string, I need to insert a '/0' in place of the EOF and return the buffer as opposed to a new line with just the EOF : return NULL.

    How do I design for this?

  15. #15
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    One way would be to ensure that the string always remains \0 terminated in the loop, then you don't need extra code outside the loop.

    Then inside the loop, all you need is
    if ( ch == '\n' ) break;
    after you've done everything else.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. what is wrong in this simple code
    By vikingcarioca in forum C Programming
    Replies: 4
    Last Post: 04-23-2009, 07:10 AM
  2. what is wrong with this code please
    By korbitz in forum Windows Programming
    Replies: 3
    Last Post: 03-05-2004, 10:11 AM
  3. I cant find what is wrong with this code
    By senegene in forum C Programming
    Replies: 1
    Last Post: 11-12-2002, 06:32 PM
  4. Anyone see what is wrong with this code?
    By Wise1 in forum C Programming
    Replies: 2
    Last Post: 02-13-2002, 02:01 PM
  5. very simple code, please check to see whats wrong
    By Unregistered in forum C Programming
    Replies: 3
    Last Post: 10-10-2001, 12:51 AM