Thread: Extra Chars Inserted

  1. #1
    Old Fashioned
    Join Date
    Nov 2016
    Posts
    137

    Question Extra Chars Inserted

    I wrote this code:

    Code:
    char* remove_WS(char* input)
    {
      char *temp, *temp2;
      unsigned int counter = 0;
      int original_length = strlen(input), i;
      printf("ORIGINAL INPUT LENGTH: %d\n", original_length);
      temp = malloc(sizeof(char)*original_length);
      for(i = 0; i < original_length; i++)
        {
          if (input[i] == ' ' || input[i] == '\n')
        continue;
          else
        {
          printf("Char recorded %d: %c\n",i,input[i]);
          counter++;
          temp[i] = input[i];
        }
        }
      printf("temp after loop: %s\n", temp);
      printf("temp STRLEN after loop: %d\n", strlen(temp));
      temp2 = realloc(temp,strlen(temp));
      printf("temp2 strlen: %d\n", strlen(temp2));
      return temp2;
    }
    The problem I am having is that somehow, there is junk being inserted into the char* and I have no idea how/why, given the output from the code:


    Code:
    ORIGINAL INPUT LENGTH: 12
    Char recorded 0: Z
    Char recorded 1: e
    Char recorded 2: l
    Char recorded 3: l
    Char recorded 4: o
    Char recorded 6: W
    Char recorded 7: o
    Char recorded 8: r
    Char recorded 9: l
    Char recorded 10: d
    Char recorded 11: !
    temp after loop: Zello+World!)\MiΘÑP
    temp STRLEN after loop: 22
    temp2 strlen: 17
    You can sorta see my feeble debugging attempts here, and I am stumped. Clearly, the iteration is only going through 12 times because Char recorded only happens 12 times total. However, the char* temp that the loop has stored the characters in is length 22???? That's where I'm lost. I've also tried using counter in realloc instead of strlen but no change. Thank you!

    Btw, sorry for the 'Z' this function is part of a bigger program that I am writing to work with strings and I had earlier used a different function to change the H to a Z!

  2. #2
    Registered User
    Join Date
    Jun 2015
    Posts
    1,640
    Don't you know by now that a C-string MUST have a zero-terminator to mark the end? It doesn't happen by magic. You need to put it in.

  3. #3
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    General notes first:

    • Neatness matters! If code is easy to read, it's harder to make mistakes and easier to find and fix them when you do. On the flip side, messy code can make debugging more difficult.
    • Neat code also makes it easier for us to help you, and thus we're more likely to help. You can preview your post before submitting. Also, double check after submission that all looks good. If not, edit your post to fix it.
    • The declaration on line 5 looks messy. Declare original_length and i on separate lines. If you're using C99 or later (basically anything but MS Visual C or Turbo C), you can (and arguably should) declare i in the for loop: for (int i = 0; i < ...).
    • Your indentation could use a little work around lines 11-17.
    • The if (is a space) continue else { do stuff } is unnecessary. You can clean it up by reversing the if condition so that you basically have if (is not a space) { do stuff }.
    • sizeof(char) is redundant, the C standard guarantees it will always be 1. Remove it from the malloc.
    • You should check the return value of malloc (yes, it can fail). If it does, print a useful error message (look into perror or strerr functions) and return immediately from your function (probably returning NULL). Your function can't continue if you can't allocate a new buffer.
    • Likewise realloc can fail. If it does, you may want to log an error, but no need to return. You still have temp, which is a pointer to valid memory. Just return that.
    • The ctype.h header provides a function called isspace that may be of use to you.


    On to the problem you asked about. Nothing is inserting junk, strictly speaking. When you allocate memory with malloc/calloc/realloc, it comes uninitialized (just like local variables1). That means the memory you have is pre-loaded with junk. Unless you do something to deal with that junk (overwrite it with the data you want), it remains junk.

    So how did that '+' end up in between "Zello" and "World"? As written, the counter variable is unnecessary as you never use it's value (most compilers have an option to warn you if you never use a variable's value -- enable it in yours). It seems like counter is tracking the number of non-space characters, i.e. the number of chars in temp. This happens to also be the index of the next free position in temp, which is the index of temp where you should storing input[i], something like
    Code:
    temp[counter++] = input[i];
    Why is the length of temp/temp2 22 chars? Well, strings in C need to be terminated by a null character, '\0'. However, when you copy characters to temp, you never copy the null character. Just do so explicitly after the for loop:
    Code:
    temp[counter++] = '\0';
    One more thing to remember. strlen returns the length of the string without the terminating null character. So strlen("foobar") returns 6, though "foobar" takes up 7 bytes in memory. Make sure you malloc/realloc enough space for the string plus 1 more char for the null terminator. If you don't allocate space for the null, and you copy it anyway, you will have a buffer overflow, writing to memory you don't technically own. This results in undefined behavior, which is bad.

    1 Technically this only applies to automatic storage duration local variables (non-static), but that's the default and by far the most common case, so no need to worry about static local variables just yet.
    Last edited by anduril462; 12-14-2016 at 12:41 AM.

  4. #4
    Old Fashioned
    Join Date
    Nov 2016
    Posts
    137
    Quote Originally Posted by algorism View Post
    Don't you know by now that a C-string MUST have a zero-terminator to mark the end? It doesn't happen by magic. You need to put it in.
    Naw don't give me too much credit, I came from C# so null termination is a sticking point. I am also quite new to C but I may at times seem more experienced just because I'm not new to programming but am absolutely new to C. Not just that but frankly, string manipulation has never been my strength in any language. I'm trying to fix that! Thanks for pointing that out as that was definitely the heart of the problem here.

    anduril462 -

    1. I apologize for the formatting. Believe it or not, the formatting in my actual code is actually standard, but somehow it didn't paste well into here and I neglected to check that. I actually tried to edit but it had been after 60 minutes of course so that's that. So again, I apologize and thank you for pointing out the preview. If I'm going to be on here a lot (and it seems like I am, because I love C), I need to be paying better attention to the quality of my posts and it starts with that, so thanks again.

    2. Thanks for all the other info as well. Your post was very resourceful and shows your experience with the language and char arrays. Frankly, I forgot about the \0 termination string thing because it just so happened that this is the first time I set myself up with a situation that required me to manually enter it, especially since I've picked up bad habits (so to speak) from C# and other languages which don't deal with strings that way at the end-user level. I also appreciate the isspace, strlen, and memory allocation suggestions you provided.

  5. #5
    Registered User
    Join Date
    Jun 2011
    Posts
    4,513
    Quote Originally Posted by Asymptotic View Post
    1. I apologize for the formatting. Believe it or not, the formatting in my actual code is actually standard, but somehow it didn't paste well into here ...
    If you haven't done so already, I'd suggest setting up your editor to use "spaces instead of tabs". This will help ensure consistency when copy/pasting the code.

  6. #6
    Old Fashioned
    Join Date
    Nov 2016
    Posts
    137
    Thank you! I had wondered why my code pasted online was always turning out a mess (ppl yelled at me at Stack Exchange before too)! I never really knew that spaces/tabs was a "thing."

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. 0 being inserted at start of linked-list
    By latte123 in forum C Programming
    Replies: 5
    Last Post: 02-15-2013, 01:17 AM
  2. Replies: 8
    Last Post: 08-29-2011, 08:37 AM
  3. Check if user inserted value in a TextBox is a float
    By jmarcos in forum Windows Programming
    Replies: 6
    Last Post: 09-08-2009, 01:04 PM
  4. Start program when USB-memory is inserted?
    By Marcux in forum C Programming
    Replies: 3
    Last Post: 04-18-2008, 11:32 PM
  5. just inserted this into my code
    By DavidP in forum A Brief History of Cprogramming.com
    Replies: 10
    Last Post: 11-11-2003, 01:33 PM

Tags for this Thread