Thread: Is there a better way to trim extra whitespace from a string?

  1. #1
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657

    Is there a better way to trim extra whitespace from a string?

    Code:
    Example input:
    "         Foo  Bar                                   Xip Bas                      "
    Output:
    "Foo Bar Xip Bas"
    My function just shifts each of the characters by a certain number of places, which is determined by the number of spaces already encountered.
    Code:
    char* alltrim(char* s)
    {
        int l=strlen(s);
        int shift=0,i,prev=0,last=-1;
        
        for(i=0;i<l;++i)
        {
            //fprintf(stderr,"i=%d\tshift=%d\tprev=%d,\tlast=%d\n",i,shift,prev,last);
            if(isspace(s[i]))
            {
                if(!prev)
                    shift++;
                prev=0;
            }
            else
            {
                if(shift)
                {
                    s[i-shift]=s[i];
                    s[i]=' ';
                }
                prev=1;
                last=i-shift;
            }
        }
        s[last+1]='\0';
        return s;
    }
    Btw, this works only when I pass it a char [] but not for char*, where it segfaults (when shifting) -- why does that happen?

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > but not for char*, where it segfaults (when shifting) -- why does that happen?
    You're only just now realising that "string constants" are stored in read-only memory?

    You can't modify them - they're read-only!
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657
    Quote Originally Posted by Salem View Post
    > but not for char*, where it segfaults (when shifting) -- why does that happen?
    You're only just now realising that "string constants" are stored in read-only memory?

    You can't modify them - they're read-only!
    Totally forgot that..
    So far, going through C is turning up these little things I missed/forgot that were abstracted away in C++.

  4. #4
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by manasij7479 View Post
    Btw, this works only when I pass it a char [] but not for char*, where it segfaults (when shifting) -- why does that happen?
    Let me guess. You tried something like
    Code:
    char *test = alltrim(" some test string ");
    You cannot modify string literals: writing to one is undefined behaviour in C, and on most platforms they are stored in read-only memory, which is why you get a segfault. You can do
    Code:
    char test[] = " some test string ";
    
    alltrim(test);
    because then you don't have a literal string constant, but just a character array that is initialized to a string.
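
    When the text is only available as a literal (or through a const char *), an array initializer is not always convenient. A minimal sketch of one alternative, assuming a heap copy is acceptable; the helper name writable_copy is made up for this illustration and is not part of the thread's code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated, writable copy of text,
 * or NULL if the allocation fails. The caller must free() the result. */
char *writable_copy(const char *text)
{
    size_t size;
    char *copy;

    assert(text != NULL);

    size = strlen(text) + 1;   /* +1 for the terminating '\0' */
    copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, text, size);
    return copy;
}
```

    The copy can then be passed to alltrim() like any other character array, and released with free() once it is no longer needed.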

    However, I found your shifting logic extremely difficult to follow.

    Why not just do a copy operation?
    Code:
    #include <ctype.h>   /* isspace() */
    #include <stddef.h>  /* size_t, NULL */
    
    char *alltrim(char *const string)
    {
        size_t  r = 0; /* Read index */
        size_t  w = 0; /* Write index */
    
        if (!string)
            return NULL;
    
        /* Skip leading whitespace. The cast keeps isspace()
         * well-defined for characters with negative values. */
        while (isspace((unsigned char)string[r]))
            r++;
    
        /* Copy loop. */
        while (string[r] != '\0') {
            if (isspace((unsigned char)string[r])) {
                /* Skip all consecutive whitespaces. */
                while (isspace((unsigned char)string[r]))
                    r++;
    
                /* End copy loop if at end of string. */
                if (string[r] == '\0')
                    break;
    
                /* Since there must be at least one
                 * non-white-space character, add a space. */
                string[w++] = ' ';
            }
    
            /* Copy the current non-whitespace character. */
            string[w++] = string[r++];
        }
    
        /* The output string length is w. Terminate string. */
        string[w] = '\0';
    
        return string;
    }

  5. #5
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657
    Our versions of the function basically follow the same process.
    To explain my logic in terms of yours.. consider my 'i - shift' value to be your 'w' and 'i' to be your 'r'.

    I agree that your code is clearer than mine.
    Last edited by manasij7479; 11-05-2012 at 02:39 PM.

  6. #6
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    O_o

    You could express the same logic more generally and clearly with iterators.

    Out of curiosity, what is the intent of this bit of code?

    Soma

  7. #7
    Registered User
    Join Date
    May 2012
    Posts
    1,066
    Quote Originally Posted by manasij7479 View Post
    Our versions of the function basically follow the same process.
    There is one small but important difference. Your version goes through the string two times because you use strlen() to compute the length of the string.

    If you change your comparison to
    Code:
    for (i = 0; s[i] != '\0'; i++)
    you also have a one pass solution.

    Bye, Andreas
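
    For reference, the suggestion above can be sketched as follows. This is the function from post #1 with only the loop condition changed, plus an unsigned char cast for isspace() and a NULL check; the name alltrim_onepass is an assumption made here to keep it apart from the original.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Sketch: post #1's shifting logic, but without the strlen() pass. */
char *alltrim_onepass(char *s)
{
    int shift = 0, i, prev = 0, last = -1;

    assert(s != NULL);

    /* Stop at the terminator instead of precomputing the length. */
    for (i = 0; s[i] != '\0'; ++i) {
        if (isspace((unsigned char)s[i])) {
            if (!prev)          /* not the first blank after a word: drop it */
                shift++;
            prev = 0;
        } else {
            if (shift) {
                s[i - shift] = s[i];   /* move the character left */
                s[i] = ' ';            /* the vacated slot becomes a blank */
            }
            prev = 1;
            last = i - shift;          /* index of the last character kept */
        }
    }
    s[last + 1] = '\0';
    return s;
}
```

    The writes only ever touch index i or something to its left, so the terminator that ends the loop is never overwritten before it is reached.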

  8. #8
    [](){}(); manasij7479's Avatar
    Join Date
    Feb 2011
    Location
    *nullptr
    Posts
    2,657
    Quote Originally Posted by phantomotap View Post
    You could express the same logic more generally and clearly with iterators.
    Can 'r' and 'w' of post#4 be considered iterators?
    Out of curiosity, what is the intent of this bit of code?
    Sanitizing input, in general.
    (The words can come from sources which can occasionally send blanks.)

  9. #9
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by manasij7479 View Post
    To explain my logic in terms of yours.. consider my 'i - shift' value to be your 'w' and 'i' to be your 'r'.
    Right. I don't have trouble reading the code, I just found it very difficult to see it from your perspective/approach.

    In the long term, I've found it much more important to understand the idea and the approach used. The code can always be dissected to see exactly what it does, but it is not so easy to see the algorithm, the "why", the intent behind the code.

    In some cases, a big comment block outlining the algorithm before a chunk of code is the best solution. Sometimes, as in my example, using an approach that is familiar to many is good too.

    Whatever you do, do keep your own perspective on things: it is what brings richness to the software world, and is at the root of innovation. (You just sometimes have to spend a little extra effort explaining your view and approach to others.)

  10. #10
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Can 'r' and 'w' of post#4 be considered iterators?
    Nope.

    The words can come from sources which can occasionally send blanks.
    It is usually "better" to code consumption of spaces into routines which process input.

    (I'm not implying directly copying the mechanism into every such routine.)

    If we take the iterator-based approach as our foundation, moving the mechanism into such routines may significantly reduce the burden of correct usage without clouding intent, at only a trivial implementation cost.

    Code:
    void ProcessInput
    (
        char * fStart
      , char * fEnd
      , int fTrim
    )
    {
        do {
            if(fTrim)
            {
                fStart = Trim(fStart, fEnd);
            }
            // Process the part of the string we care about.
        } while(fStart != fEnd);
    }
    Soma
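
    One possible reading of the idea above, as a rough sketch only: the string is described by a [first, last) pointer pair, and the routine that consumes the input also consumes the blanks. The Trim() from the post is not shown in the thread, so the helpers below (skip_spaces, skip_word, count_words) are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns the first position in [first, last) that is not whitespace,
 * or last if there is none. */
const char *skip_spaces(const char *first, const char *last)
{
    while (first != last && isspace((unsigned char)*first))
        ++first;
    return first;
}

/* Returns the first position in [first, last) that is whitespace,
 * or last if there is none. */
const char *skip_word(const char *first, const char *last)
{
    while (first != last && !isspace((unsigned char)*first))
        ++first;
    return first;
}

/* Example consumer: "processing" a word here just means stepping over it. */
size_t count_words(const char *first, const char *last)
{
    size_t words = 0;

    assert(first != NULL && last != NULL);

    for (;;) {
        first = skip_spaces(first, last);  /* blanks are eaten inside the routine */
        if (first == last)
            break;
        first = skip_word(first, last);    /* the part of the string we care about */
        ++words;
    }
    return words;
}
```

    Because the range is passed explicitly, the same helpers work on a string literal, on a slice in the middle of a larger buffer, or on data that is not null-terminated, and nothing is modified.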

  11. #11
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Another approach:

    Code:
    #include <stdio.h>
    #include <string.h>
    
    int main(void) {
       char str1[BUFSIZ]={"Night   time      would   find me in    Rosa's Cantina.\n"
          "Music   would   play and     Felina would                        whirl.      "};
       char str2[BUFSIZ];
       
       char str3[]={"Night time would find me in Rosa's Cantina.\n"   //to test against
          "Music would play and Felina would whirl."};
       int i,j,space=0;
    
       for(i=0,j=0;str1[i];i++) {
          while(str1[i]==' ') {   //skip a run of spaces, remembering that we saw one
             space=1;
             ++i;
          }
          if(!str1[i])            //only trailing spaces were left: stop at the terminator
             break;
          if(space) {
             space=0;
             if(j>0)              //no separator before the first word
                str2[j++]=' ';
          }
          str2[j++]=str1[i];
       }
       
       str2[j]='\0';
      
       printf("str1: \n%s\n\nstr2:\n%s\n\n",str1,str2);
    
       printf("str2: %zu    str3: %zu \n",strlen(str2),strlen(str3));   //strlen() returns size_t
       
       printf("\n\n%s\n%s\n\n",str2,str3);
       printf("str2 and str3 %s\n",strcmp(str2,str3)==0 ? "match" : "differ");
       return 0;
    }
    Last edited by Adak; 11-05-2012 at 04:03 PM.
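
    The two-buffer idea can also be packaged as a function that takes the destination size, so it cannot overrun the output and can read from a string literal. A minimal sketch under those assumptions; the name squeeze_copy and its truncation behaviour are choices made for this example, not something taken from the posts above:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copies src into dst, dropping leading and trailing
 * whitespace and collapsing every inner run of whitespace to one space.
 * Never writes more than dst_size bytes (terminator included); the output
 * is silently truncated if dst is too small.
 * Returns the number of characters stored, not counting the '\0'. */
size_t squeeze_copy(char *dst, size_t dst_size, const char *src)
{
    size_t w = 0;            /* write index into dst */
    int pending_space = 0;   /* a separator is owed before the next word */

    assert(dst != NULL && src != NULL && dst_size > 0);

    for (; *src != '\0'; ++src) {
        if (isspace((unsigned char)*src)) {
            /* Owe a separator only once something has been written. */
            pending_space = (w > 0);
            continue;
        }
        /* Room is needed for the optional separator, this character,
         * and the terminator. */
        if (w + (size_t)pending_space + 1 >= dst_size)
            break;
        if (pending_space) {
            dst[w++] = ' ';
            pending_space = 0;
        }
        dst[w++] = *src;
    }
    dst[w] = '\0';
    return w;
}
```

    Since the separator is only written once the next word is known to exist and to fit, the result never ends in a space, even when it is truncated.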
