Thread: Removing Leading & Trailing Whitespace

  1. #1
    Registered User
    Join Date
    Apr 2005
    Posts
    42

    Removing Leading & Trailing Whitespace

    Hi, I need help removing leading and trailing whitespace from a char array. Does anyone have any advice on how I should do this? I can remove trailing whitespace by working backwards through a pointer (e.g. char *end = input + (strlen(input) - 1)) and then doing this:
    Code:
    while (*end && *end == ' ') end--;
    *(end + 1) = '\0';
    Leading whitespace is harder: I can work out how much of it there is with a simple loop, but I then need to move everything 'forwards' a few places to overwrite the whitespace (e.g. '  hello' would be moved two places to the left, making 'hello').
    Can anyone help me with moving the whole string (up to a '\0') to the left by a few places?
    Thanks for all of your help.

  2. #2
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    Check out the memmove() function.
    If you understand what you're doing, you're not learning anything.

  3. #3
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    Basically you do:
    Code:
    char *realstart = string + num_leading_spaces; // Pointer to first non-space character
    
    memmove(string, realstart, strlen(realstart) + 1);

  4. #4
    Registered User
    Join Date
    Apr 2005
    Posts
    42
    I made this quick test function (well, program) to try it out; however, when I run it I get a Bus Error (aka a seg fault). Here is the code that I am trying to use:
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    int stripWhitespace (char *inputStr)
    {
    	char *start, *end;
    	start = inputStr;
    	while (*start && *start == ' ') start++;
    	printf("It is %p and %p\n", (void *)inputStr, (void *)start);
    	printf("Strlen + 1 is: %zu\n", strlen(start) + 1);
    	memmove(inputStr, start, strlen(start) + 1);
    	return 0;
    }
    
    int main ()
    {
    	char *mychr  = "   see spot   see";
    	stripWhitespace(mychr);
    	puts(mychr);
    	return 0;
    }
    My debugging printf's seem to show that the problem is with memmove but I am not sure what I am doing wrong with it.

  5. #5
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    The problem is that you're using a string that's not modifiable. Try changing:
    Code:
    char *mychr = "   see spot   see";
    to:
    Code:
    char mychr[] = "   see spot   see";
    The first way just creates a pointer to a string literal which is stored in read-only memory (on most systems). The second way creates an array in which the string is copied. The array contents are modifiable.
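
To make the array-versus-literal point concrete, here is a minimal sketch. The helper name `shift_left` is made up for illustration and is not from the thread. It drops the first n characters of a string in place with memmove(), which only works when the caller passes writable storage:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: drops the first n characters of s in place.
 * s must point to writable storage (an array or a malloc'd buffer),
 * never to a string literal. */
char *shift_left(char *s, size_t n)
{
    size_t len;

    assert(s != NULL);

    len = strlen(s);
    if (n > len)
        n = len;                    /* never read past the terminator */

    memmove(s, s + n, len - n + 1); /* +1 copies the '\0' as well */
    return s;
}
```

Called as `char buf[] = "   see spot   see"; shift_left(buf, 3);` this leaves buf holding "see spot   see". Passing a `char *` that points at a literal to the same function is undefined behavior, which is what shows up as the bus error on systems that place literals in read-only memory.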

    EDIT: By the way, your usage of memmove() itself looks fine. It's just that the pointers that you're passing to memmove() point to read-only memory.
    Last edited by itsme86; 11-30-2005 at 03:51 PM.

  6. #6
    Registered User
    Join Date
    Apr 2005
    Posts
    42
    Thank you very much. It all seems to work fine now! In the end I came up with this function (which works perfectly) and would like to know if you have any suggestions for how it could be improved:
    Code:
    void stripWhitespace (char *inputStr)
    {
    	char *start, *end;
    	start = inputStr;
    	while (*start && *start == ' ') start++;
    	memmove(inputStr, start, strlen(start) + 1);
    	end = inputStr + strlen(inputStr) - 1;
    	while (*end && *end == ' ') end--;
    	*(end + 1) = '\0';
    }
    Thanks again.

  7. #7
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    Code:
    while (*start && *start == ' ') start++;
    The first condition is unnecessary. If *start is a space then there's no need to check if *start is non-zero. I'd just do:
    Code:
    while (*start == ' ') start++;
    Code:
    while (*end && *end == ' ') end--;
    You're working backwards through the string so *end will never be zero (as long as you're still pointing to somewhere within the string). Instead you want to make sure that end has not moved before inputStr, to avoid running past the beginning of the string (which happens when the string is empty or all spaces). I'd write it like:
    Code:
    while (end >= inputStr && *end == ' ') end--;
    And just for efficiency you might want to avoid calling strlen() twice. You could create a size_t len and set that just before the memmove(). Then you could pass len to memmove() and use it in the calculation for your end pointer.

    Also, sometimes it's helpful, if your function isn't going to return anything else, to return a pointer to the finished string. Sort of like strcpy() and the rest of the standard string functions do.

    Looks good though!

    My finished function, if I were to write it, would probably look like:
    Code:
    char *stripWhitespace (char *inputStr)
    {
      char *start, *end;
      size_t len;
    
      /* Strip leading whitespace */
      start = inputStr;
      while(*start == ' ') start++;
      len = strlen(start);
      memmove(inputStr, start, len + 1);
    
      /* Strip trailing whitespace */
      end = inputStr + len - 1;
      while(end >= inputStr && *end == ' ') end--;
      *(end + 1) = '\0';
    
      return inputStr;
    }
    Last edited by itsme86; 12-01-2005 at 01:24 PM.

  8. #8
    Registered User
    Join Date
    Dec 2005
    Posts
    15
    I usually handle constant string parsing with a struct that maintains two endpoints. I seldom use the standard C runtime string handling, because for parsing I find this technique far superior.

    The beauty of doing things like this is that I can chew up and pass around substrings from very large sets of input data without moving any memory around, or allocating any memory. It's fairly easy to write even an entire XML parser this way that never allocates a thing, and only (recursively) passes around pointers to itself as it finds the ends of fields within the file.

    Code:
    #include <ctype.h>   /* isspace */
    #include <string.h>  /* strlen */
    
    typedef struct cstr
    {
        const char* begin; /* Beginning of string */
        const char* end;   /* End of string (one past the last character) */
    } cstr;
    
    /* Initialization from various scenarios... */
    #define cstr_ptr( cstr, sz )  \
    do { \
       (cstr)->begin = (sz); \
       (cstr)->end = (cstr)->begin + strlen((cstr)->begin); \
    } while (0)
    
    /* How big is a cstr */
    #define cstr_len( cstr )  ( (cstr)->end - (cstr)->begin )
    
    /* Strip white space from either end of string */
    void cstr_stripwhite( cstr* self )
    {
       const char* begin = self->begin;
       const char* end = self->end;
       /* Cast to unsigned char: isspace() is undefined for negative values */
       while( begin < end && isspace((unsigned char)*begin) )
          begin++;
       while( end > begin && isspace((unsigned char)*(end-1)) )
          end--;
       self->begin = begin;
       self->end = end;
    }
    Of course, if you don't want to do this whole structured approach, do keep in mind that you can go a long way with changing a pointer (such as a pointer to the beginning of a string), rather than moving data around, which can be very time consuming.
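
As a rough illustration of the "change a pointer instead of moving data" idea without the struct, the sketch below never writes to the string, so it also works on string literals. The names `skip_spaces`, `trimmed_len` and `print_trimmed` are hypothetical and not from the thread:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns a pointer to the first non-space character of s. */
const char *skip_spaces(const char *s)
{
    assert(s != NULL);
    while (*s == ' ')
        s++;
    return s;
}

/* Returns the length of s with any trailing spaces excluded. */
size_t trimmed_len(const char *s)
{
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    while (len > 0 && s[len - 1] == ' ')
        len--;
    return len;
}

/* Prints the trimmed text in brackets without copying or modifying s. */
void print_trimmed(const char *s)
{
    const char *start = skip_spaces(s);

    /* "%.*s" prints at most the given number of characters */
    printf("[%.*s]\n", (int)trimmed_len(start), start);
}
```

The trade-off is the same one described above: nothing is moved, but the trimmed text is only a pointer plus a length, so every consumer has to respect that length rather than rely on the terminator.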

    Added...
    (Of course, the actual code I use is a macro template that swaps out char/wchar_t/etc. functions and manufactures the appropriate string handlers....)
    Last edited by evildave; 12-01-2005 at 01:54 PM.

  9. #9
    Devil's Advocate SlyMaelstrom's Avatar
    Join Date
    May 2004
    Location
    Out of scope
    Posts
    4,079
    What's with all the "evil" people in this topic?

    Anyway, as far as I can tell, that's a pretty damn good first post evildave. *claps*
    Sent from my iPad®

  10. #10
    Registered User KidA's Avatar
    Join Date
    Nov 2005
    Location
    Ohio, USA
    Posts
    26
    One suggestion I would make is that "whitespace" includes other characters as well, such as horizontal tab and linefeed. For this reason I would replace direct comparisons to ' ' with a call to isspace(), like evildave did.

    While you may only want to strip space characters today, later you may want to use it to strip any whitespace...
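
A minimal sketch of that suggestion applied to the in-place version, assuming the default "C" locale's idea of whitespace. The name `strip_ws` is made up for illustration. The cast to unsigned char is needed because isspace() is undefined for negative char values, and trimming by length rather than by pointer avoids ever forming a pointer before the start of the buffer:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Sketch: strips leading and trailing whitespace (as classified by
 * isspace()) from s in place and returns s. s must be writable. */
char *strip_ws(char *s)
{
    char *start;
    size_t len;

    assert(s != NULL);

    /* Skip leading whitespace; isspace('\0') is false, so this stops
     * at the terminator at the latest. */
    start = s;
    while (isspace((unsigned char)*start))
        start++;

    /* Shrink the length past any trailing whitespace. */
    len = strlen(start);
    while (len > 0 && isspace((unsigned char)start[len - 1]))
        len--;

    /* Move only the characters that are kept, then terminate. */
    memmove(s, start, len);
    s[len] = '\0';
    return s;
}
```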
    "So I was sitting in my cubicle today, and I realized, ever since I started working, every single day of my life has been worse than the day before it. So that means that every single day that you see me, that's on the worst day of my life" - Peter Gibbons

  11. #11
    Registered User
    Join Date
    Apr 2005
    Posts
    42
    What's with all the "evil" people in this topic?
    It is a well known fact that a common enemy to all evil people is whitespace, as it is the one thing that stands between well obfuscated code.

    I really do like your object-oriented approach, evildave, and will try to adopt it. On a side note, the function was not meant to be void; I am planning to make it return the number of chars that it stripped (but just have not got round to doing it yet).

    Thanks for all of your help guys.

    *EDIT* A quick question about memmove: if inputStr is equal to start, will it actually bother to move the memory, or will it realise that there is no point as both pointers point to the same place?
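
On the memmove() question: the standard defines memmove() for overlapping regions, which includes identical source and destination, so the call is safe either way. Whether a given library skips the copy in that case is an implementation detail. If it matters, the check is easy to make explicit. The sketch below does that and also returns the number of characters stripped; the name `strip_spaces_count` is hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Sketch: strips leading and trailing spaces from s in place and
 * returns how many characters were removed. s must be writable. */
size_t strip_spaces_count(char *s)
{
    size_t lead = 0, len, kept;

    assert(s != NULL);

    /* Count leading spaces */
    while (s[lead] == ' ')
        lead++;

    /* Length of the rest, then shrink it past trailing spaces */
    len = strlen(s + lead);
    kept = len;
    while (kept > 0 && s[lead + kept - 1] == ' ')
        kept--;

    /* Skip the move entirely when there is nothing to shift */
    if (lead > 0)
        memmove(s, s + lead, kept);
    s[kept] = '\0';

    return lead + (len - kept);
}
```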
    Last edited by EvilGuru; 12-01-2005 at 03:35 PM.

  12. #12
    Registered User
    Join Date
    Dec 2005
    Posts
    15
    Aww, shucks! Well, just for all that back-patting...

    Code:
    /* Strip white space from either end of string, return number of characters stripped */
    size_t cstr_stripwhite( cstr* self )
    {
       const char* begin = self->begin;
       const char* end = self->end;
       size_t len = end-begin;
       while( begin < end && isspace((unsigned char)*begin) )
          begin++;
       while( end > begin && isspace((unsigned char)*(end-1)) )
          end--;
       self->begin = begin;
       self->end = end;
       return len-(end-begin);
    }
    Keep in mind that strings handled in this way are not necessarily null-terminated (i.e. the spaces are all still there, just outside the pointers). If you want to put the result back into a null-terminated (hence C runtime compatible) string, you'll finally have to copy it.
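
That final copy might look something like the sketch below. The struct mirrors the one posted earlier in the thread; the function name `cstr_copy` and its truncating behavior are assumptions made for illustration, not part of the posted code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct cstr
{
    const char *begin; /* Beginning of string */
    const char *end;   /* One past the last character */
} cstr;

/* Hypothetical helper: copies the range [begin, end) into dst as a
 * null-terminated string, truncating if dst is too small.
 * Returns the number of characters copied, excluding the '\0'. */
size_t cstr_copy(const cstr *self, char *dst, size_t dstsize)
{
    size_t len;

    assert(self != NULL && dst != NULL && dstsize > 0);

    len = (size_t)(self->end - self->begin);
    if (len >= dstsize)
        len = dstsize - 1;      /* leave room for the terminator */

    memcpy(dst, self->begin, len);
    dst[len] = '\0';
    return len;
}
```

memcpy() is enough here on the assumption that dst is a separate buffer; copying back into the original storage would call for memmove() instead.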
