Thread: Critique my string splitting API

  1. #1
    Registered User | Join Date: Oct 2019 | Posts: 82

    Critique my string splitting API

    Hello,

    This is probably petty, if not very petty, but...

    Can someone critique my string splitting API?

    Code:
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>
    
    
    static char** splitstring(char *str, int *noOfTokens)
    { 
    	char **result = malloc(sizeof(char));
    	char *token = strtok(str, " ");
    	*noOfTokens = 0;
    
    
    	while (token)
    	{
    		result[*noOfTokens] = malloc(strlen(token));
    		strncpy(result[*noOfTokens], token, strlen(token));
    
    
    		token = strtok(NULL, " ");
    		(*noOfTokens)++;
    	}
    
    
    	return result;
    }
    
    
    int main(int argc, char *argv[])
    {
    	int noOfTokens;
    	char tokenize[] = "This is a string\0";
    	char ** tokens = splitstring(tokenize, &noOfTokens);
    
    
    	for (int i = 0; i < noOfTokens; i++)
    	{
    		printf("%s\n", tokens[i]);
    	}
    
    
    	return 0;
    }

  2. #2
    Salem | and the hat of int overfl | Join Date: Aug 2001 | Location: The edge of the known universe | Posts: 39,660
    > char **result = malloc(sizeof(char));
    This doesn't allocate anywhere near enough space.
    At the very least, you need to start with
    Code:
    char **result = malloc(sizeof(char*));
    Ultimately, you want to end up with the equivalent of
    Code:
    char **result = malloc( sizeof(char*)*(*noOfTokens) );
    but you don't know *noOfTokens in advance.

    So you need to use realloc to expand result as you discover each additional token.
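    A minimal sketch of that advice, keeping the original splitstring(char *, int *) signature: start with room for one pointer, then realloc the array each time strtok finds another token. It goes through a temporary pointer so the old block isn't leaked if realloc fails; the failure handling (returning NULL and zeroing *noOfTokens) is my own assumption, not something from the thread.

```c
#include <stdlib.h>
#include <string.h>

/* Splits str on spaces, growing the pointer array with realloc as each
   token is found. Note: strtok modifies str in place. Returns NULL (and
   sets *noOfTokens to 0) if any allocation fails. */
static char **splitstring(char *str, int *noOfTokens)
{
    char **result = malloc(sizeof *result);   /* room for one pointer to start */
    *noOfTokens = 0;
    if (!result)
        return NULL;

    for (char *token = strtok(str, " "); token; token = strtok(NULL, " "))
    {
        /* Make room for *noOfTokens + 1 pointers. Use a temporary so the
           old block is still reachable (and freeable) if realloc fails. */
        char **tmp = realloc(result, sizeof *result * ((size_t)*noOfTokens + 1));
        if (!tmp)
            goto fail;
        result = tmp;

        size_t len = strlen(token);
        result[*noOfTokens] = malloc(len + 1);         /* +1 for the '\0' */
        if (!result[*noOfTokens])
            goto fail;
        memcpy(result[*noOfTokens], token, len + 1);   /* copies the '\0' too */
        (*noOfTokens)++;
    }
    return result;

fail:
    /* Free every token copied so far, then the array itself. */
    for (int i = 0; i < *noOfTokens; i++)
        free(result[i]);
    free(result);
    *noOfTokens = 0;
    return NULL;
}
```

    The caller is then responsible for freeing each tokens[i] and then tokens itself. An input with no tokens returns a valid (non-NULL) array with *noOfTokens set to 0.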

    > result[*noOfTokens] = malloc(strlen(token));
    This doesn't allocate space for the \0 at the end.

    > strncpy(result[*noOfTokens], token, strlen(token));
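    Nor will this strncpy, as written, append a \0 to the string: strncpy only writes a terminator when the source is shorter than the count, and here the count is exactly strlen(token).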
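    Putting both of those fixes together, a small copy helper might look like the sketch below (copytoken is a hypothetical name, not from the thread). It allocates strlen + 1 bytes and copies the terminator along with the characters. POSIX strdup does the same job (and C23 adds it to the standard library), if it's available to you.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of token, including its terminating '\0',
   or NULL if malloc fails. The caller frees the result. */
static char *copytoken(const char *token)
{
    size_t len = strlen(token);        /* length without the '\0' */
    char *copy = malloc(len + 1);      /* +1 leaves room for the '\0' */
    if (copy)
        memcpy(copy, token, len + 1);  /* copy the characters and the '\0' */
    return copy;
}
```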
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User | Join Date: May 2012 | Location: Arizona, USA | Posts: 945
    This line allocates enough space for only one char:

    Originally Posted by ghoul
    Code:
        char **result = malloc(sizeof(char));
    Since you don't know how many tokens are in the string, you'll need to dynamically (re)allocate result for each token. Alternatively, whenever you do need to reallocate, allocate more than you need (growing the size exponentially) so you don't have to reallocate for every single token.

    This line allocates one char fewer than needed (it doesn't leave room for the null terminator):

    Originally Posted by ghoul
    Code:
            result[*noOfTokens] = malloc(strlen(token));
    Edit: Salem beat me to it while I was writing my response!

  4. #4
    Registered User | Join Date: Oct 2019 | Posts: 82
    Noticed this after making the post here:

    Code:
        char**result = malloc(sizeof(char));
    And was even more surprised that code was working... What exactly was happening?

    So I now have this, up to the point where it still reallocates memory for each token found; the idea of over-allocating hasn't been done yet.

    Code:
    static char** splitstring(char *str, int *noOfTokens)
    { 
    	char **result = malloc(sizeof(char*));
    	char *token = strtok(str, " ");
    	*noOfTokens = 0;
    
    
    	while (token)
    	{
    		if (*noOfTokens)
    			result = realloc(result, sizeof(char*) * ((*noOfTokens) + 1));
    		result[*noOfTokens] = malloc(strlen(token) + 1);
    		strncpy(result[*noOfTokens], token, strlen(token) + 1);
    
    
    		token = strtok(NULL, " ");
    		(*noOfTokens)++;
    	}
    
    
    	return result;
    }
    I'm not sure how to do the part about reallocating more memory than I need, to avoid calling realloc on every iteration, though I'm sure I have seen this used in some code before. Any hints?

  5. #5
    Registered User | Join Date: May 2012 | Location: Arizona, USA | Posts: 945
    Originally Posted by ghoul
    Noticed this after making the post here:

    Code:
        char**result = malloc(sizeof(char));
    And was even more surprised that code was working... What exactly was happening?
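    My guess is you were just lucky. Most memory allocators allocate a somewhat larger block than you request (to keep memory blocks aligned), so you were probably writing into that extra memory (and possibly past it), which technically doesn't "belong" to your program. Writing past the end of an allocation is undefined behaviour, so it could just as easily have crashed or silently corrupted other data.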

    Originally Posted by ghoul
    I'm not sure how to do the part about reallocating more memory than I need, to avoid calling realloc on every iteration, though I'm sure I have seen this used in some code before. Any hints?
    This is more of an optimization to reduce the number of times memory is reallocated. I would focus on getting your code working correctly before optimizing it.

    To use this technique, you need to keep track of how much space is allocated (the capacity) separately from how much is in use. When you need more room than is currently allocated, increase the capacity by some percentage (50% is common); if the increased size still isn't big enough, just use the size you need instead. Remember the new capacity, then realloc the memory block to that size. That's basically it.
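    A rough sketch of that scheme, assuming a small hypothetical PtrVec struct (not from the thread) that stores the capacity next to the count. The 50% growth and the "use what you need if that's still too small" rule follow the description above; realloc is only called when the count would exceed the capacity.

```c
#include <stdlib.h>

/* Hypothetical growable array of char* that tracks capacity separately from count. */
typedef struct {
    char **items;
    size_t count;     /* slots in use */
    size_t capacity;  /* slots allocated */
} PtrVec;

/* Ensures room for at least `needed` slots, growing capacity by 50%.
   Returns 0 on success, -1 if realloc fails (vec is left unchanged). */
static int ptrvec_reserve(PtrVec *vec, size_t needed)
{
    if (needed <= vec->capacity)
        return 0;                                      /* already big enough */

    size_t newcap = vec->capacity + vec->capacity / 2; /* grow by 50% */
    if (newcap < needed)
        newcap = needed;                               /* still too small: use what's needed */

    char **tmp = realloc(vec->items, newcap * sizeof *tmp);
    if (!tmp)
        return -1;
    vec->items = tmp;
    vec->capacity = newcap;                            /* remember the new size */
    return 0;
}

/* Appends item, reallocating only when the array is full. */
static int ptrvec_push(PtrVec *vec, char *item)
{
    if (ptrvec_reserve(vec, vec->count + 1) != 0)
        return -1;
    vec->items[vec->count++] = item;
    return 0;
}
```

    Starting from an empty PtrVec {NULL, 0, 0}, the capacity goes 1, 2, 3, 4, 6, 9, ..., so pushes 6 and 8 (for example) don't touch realloc at all.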

    Once you get that working you could make some improvements on the basic idea, like starting with a size greater than 1 (perhaps 32 bytes, or 4 64-bit pointers, or whatever) so you don't get a lot of memory reallocations in the beginning, or changing the growth to linear once the allocation gets above a certain size (say, 1 MB) to limit internal fragmentation.
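    Those two refinements could be folded into a single capacity function. This is only a sketch: next_capacity, INITIAL_CAPACITY, LINEAR_THRESHOLD and LINEAR_STEP are hypothetical names, capacities are counted in pointer slots rather than bytes, the starting size and switch-over point follow the examples in the post (4 pointers, about 1 MB), and using about 1 MB as the linear step is my own assumption.

```c
#include <stddef.h>

/* Hypothetical tuning constants (assumptions, not from the thread). */
#define INITIAL_CAPACITY 4                                      /* 4 pointers = 32 bytes on a 64-bit system */
#define LINEAR_THRESHOLD ((size_t)1024 * 1024 / sizeof(char *)) /* about 1 MB worth of pointers */
#define LINEAR_STEP      LINEAR_THRESHOLD                       /* add about 1 MB per step past the threshold */

/* Returns a new capacity (in slots) that is at least `needed`. */
static size_t next_capacity(size_t capacity, size_t needed)
{
    size_t newcap;
    if (capacity == 0)
        newcap = INITIAL_CAPACITY;          /* skip the tiny early reallocations */
    else if (capacity < LINEAR_THRESHOLD)
        newcap = capacity + capacity / 2;   /* geometric growth: +50% */
    else
        newcap = capacity + LINEAR_STEP;    /* linear growth: fixed increment */

    if (newcap < needed)
        newcap = needed;                    /* never return less than what's needed */
    return newcap;
}
```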
