Thread: A bullet-proof fgets wrapper?

  1. #1
    Registered User
    Join Date
    Sep 2004
    Posts
    44

    A bullet-proof fgets wrapper?

    I have been working on a wrapper for fgets that will return all input even if the destination buffer length is exceeded. This is what I came up with, tear it apart.
    Code:
    #include <stdio.h>
    #include <conio.h>
    #include <string.h>
    #include <malloc.h>
    
    char *sgets(char *buf, int bufsize);
    size_t strlcpy(char *d, const char *s, size_t bufsize);
    size_t strlcat(char *d, const char *s, size_t bufsize);
    
    int main(void)
    {
        char buf1[6];
        char *pbuf1;
        char buf2[6];
        char *pbuf2;
    
        printf("Enter String 1: ");
        pbuf1 = sgets(buf1, sizeof(buf1));
        if(pbuf1 == NULL)
        {
    	printf("Input error: String 1\n");
    	return(1);
        }
    
        printf("Enter String 2: ");
        pbuf2 = sgets(buf2, sizeof(buf2));
        if(pbuf2 == NULL)
        {
    	printf("Input error: String 2\n");
    	return(2);
        }
    
        printf("\n");
        printf("String 1 = %s \n", pbuf1);
        printf("String 2 = %s \n", pbuf2);
    
        printf("\nPress any key to continue...");
        getch();
        return(0);
    }
    
    char *sgets(char *buf, int bufsize)
    {
        char *newbuf;
        char tempbuf[BUFSIZ];
        size_t newsize;
    		
        if(fgets(buf, bufsize, stdin) == NULL)
        {
    	return(NULL);  
        }
        else
        {
            if( strchr(buf, '\n') == NULL )
            {	
                newbuf = malloc(bufsize);
                strlcpy(newbuf, buf, bufsize);
                newsize = bufsize;
                }
                else
                {
                    return(buf);
                }
    		
                while( strchr(newbuf, '\n') == NULL )
                {
                    newsize += (BUFSIZ * sizeof(char));
                    newbuf = realloc(newbuf, newsize);
                    if( newbuf ==  NULL )
                    {
                        return(NULL);
                    }
                    else
                    {
                            if(fgets(tempbuf, sizeof(tempbuf), stdin) == NULL)
    			{
                                 return(NULL);
                            }
                            else
    			{
                                strlcat(newbuf, tempbuf, newsize);
                            }
                    }
            }
            return(newbuf);
        }
    }
    
    size_t strlcpy(char *d, const char *s, size_t bufsize)
    {
        size_t len;
        size_t ret;
        
        if (!d || !s)
        {
    	return(0);
        }
        len = strlen(s);
        ret = len;
        if (bufsize <= 0)
        {
    	return(0);
        }
        if (len >= bufsize)
        {		
    	len = bufsize-1;
        }
        memcpy(d, s, len);
        d[len] = 0;
        
        return(ret);
    }
    
    size_t strlcat(char *d, const char *s, size_t bufsize)
    {
        size_t len1;
        size_t len2;
        size_t ret;
        
        if (!d || !s || bufsize <= 0)
        {
    	return(0);
        }
        
        len1 = strlen(d);
        len2 = strlen(s);
        ret = len1 + len2;
        if (len1+len2 >= bufsize) 
        {
            len2 = bufsize - (len1+1);
        }
        if (len2 > 0) 
        {
            memcpy(d+len1, s, len2);
            d[len1+len2] = 0;
        }
        return(ret);
    }
    Last edited by n7yap; 12-11-2004 at 11:22 PM.

  2. #2
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >tear it apart
    As ordered. I'll even ignore the non-portable stuff that you probably have already been flamed about.

    >#include <malloc.h>
    This is an archaic header that was supplanted by stdlib.h.

    >char *sgets(char *buf, int bufsize);
    This is the first scary part. You expect to pass in a primary buffer and buffer size, yet the function is supposed to return a pointer to dynamic memory. What happens to buf?

    >size_t strlcpy(char *d, const char *s, size_t bufsize);
    >size_t strlcat(char *d, const char *s, size_t bufsize);
    Both of these invade the implementation's namespace by using str followed by a lower case letter. Even worse, they're clones of two common extension functions, so the chances of a name conflict with the imlementation are pretty good.
    Code:
    if(fgets(buf, bufsize, stdin) == NULL)
    {
      return(NULL);  
    }
    All is well so far. If fgets fails then sgets fails. That's a reasonable expectation.
    Code:
    if( strchr(buf, '\n') == NULL )
    {	
      newbuf = malloc(bufsize);
      strlcpy(newbuf, buf, bufsize);
      newsize = bufsize;
    }
    Everything is still okay, but now I begin to wonder why you only allocate memory if a newline wasn't in the buffer, and whether the lack of + 1 for a null character in malloc is an error or you already accounted for it.
    Code:
    else
    {
      return(buf);
    }
    Bzzt! Your code will be the source of endless confusion because the client will expect a pointer to dynamic memory to be returned. Because buf is actually an array, when the client tries to do the right thing and free their memory when they're done, the program will likely crash. Either return the buffer you're given or a pointer to memory allocated within the function, but not both.

    >newsize += (BUFSIZ * sizeof(char));
    sizeof(char) is guaranteed to be 1, so you can omit that part of the expression.

    >newbuf = realloc(newbuf, newsize);
    Once again, how do I know that you've left room for a null character at the end? People expect to see this, so if you don't include it then a comment explaining why is a good idea.
    Code:
    if( newbuf ==  NULL )
    {
      return(NULL);
    }
    This causes a memory leak. No problem if your system reclaims memory when the short running program terminates, not okay if the function is used in a long running program or if the system doesn't reclaim memory properly.

    >return(newbuf);
    This is where you return freeable memory, but if clients crash half the time they use your function, they'll probably break the code even more by not calling free. Or they'll use fgets, which is clearly easier and more predictable.

    Let's try again, this time with the following goals in mind:
    • Handle empty lines
    • Handle EOF
    • Handle input errors
    • Work with any input stream
    • Always return freeable memory
    • Be conservative with the returned memory if possible
    • Remove the trailing newline character
    • Return NULL only if no input could be processed, otherwise return what we could get
    • Avoid leaking memory

    The simplest possible declaration for such a function would be:
    Code:
    char *readline(FILE *in);
    It's equally simple to use, and a test program is immediately obvious:
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    char *readline(FILE *in);
    
    int main(void)
    {
      char *p = readline(stdin);
    
      if (p != NULL) {
        printf("[%s]\n", p);
        free(p);
      }
    
      return 0;
    }
    
    /* Definition for readline here */
    As with your sgets, my readline will use a temporary buffer and fgets, but I'll be liberal with the memory allocation to be on the safe side (I whipped this out pretty quickly, so I'd rather err on the side of having too much). Because of this, and one of the goals, I'll attempt to shrink wrap the memory to fit perfectly just before returning.

    The outline might look like this if we know that a pointer to char is needed for the final string, an array of char for the buffer, a save pointer for safe realloc calls, and the current allocation size:
    Code:
    char *readline(FILE *in)
    {
      char *ret = NULL, *save;
      char buf[BUFSIZ];
      int ret_size = 0;
    
      while (fgets(buf, sizeof buf, in) != NULL) {
        /* Resize */
        /* Copy */
        /* Are we done? */
      }
    
      /* Shrink wrap if possible */
      if (ret != NULL) {
        save = realloc(ret, strlen(ret) + 1);
        if (save != NULL)
          ret = save;
      }
    
      return ret;
    }
    The resize part is easy, but I do make use of an obscure feature of realloc. If the pointer argument is NULL then realloc acts like malloc, so there's no need for a preliminary malloc call.
    Code:
    save = realloc(ret, ret_size + BUFSIZ + 1);
    if (save == NULL)
      return ret;
    ret = save;
    The copy part is equally easy once you figure out how to copy on the first go around and concatenate every time after. The quick and dirty solution is just to test and call either strcpy or strcat. ret_size can't be modified until after this test though, so the increment is a part of the copy code.
    Code:
    ret_size == 0 ? strcpy(ret, buf) : strcat(ret, buf);
    ret_size += BUFSIZ;
    Lastly, check to see if the buffer had a newline. If it did then to meet one of the requirements we need to turn it into a null character before breaking from the loop and trying to shrink wrap. There's a slight performance penalty here because strchr returns a pointer to within the string. If we used strchr and buf then the newline wouldn't be removed in ret, so ret must be used with strchr and ret might be long.
    Code:
    save = strchr(ret, '\n');
    if (save != NULL) {
      *save = '\0';
      break;
    }
    Here's the finished code:
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    char *readline(FILE *in);
    
    int main(void)
    {
      char *p = readline(stdin);
    
      if (p != NULL) {
        printf("[%s]\n", p);
        free(p);
      }
    
      return 0;
    }
    
    char *readline(FILE *in)
    {
      char *ret = NULL, *save;
      char buf[BUFSIZ];
      int ret_size = 0;
    
      while (fgets(buf, sizeof buf, in) != NULL) {
        /* Resize */
        save = realloc(ret, ret_size + BUFSIZ + 1);
        if (save == NULL)
          return ret;
        ret = save;
    
        /* Copy */
        ret_size == 0 ? strcpy(ret, buf) : strcat(ret, buf);
        ret_size += BUFSIZ;
    
        /* Are we done? */
        save = strchr(ret, '\n');
        if (save != NULL) {
          *save = '\0';
          break;
        }
      }
    
      /* Shrink wrap if possible */
      if (ret != NULL) {
        save = realloc(ret, strlen(ret) + 1);
        if (save != NULL)
          ret = save;
      }
    
      return ret;
    }
    Compare and contrast.
    My best code is written with the delete key.

  3. #3
    ---
    Join Date
    May 2004
    Posts
    1,379
    Wow, Prelude, you managed to cut that down pretty well.

  4. #4
    Registered User
    Join Date
    Jan 2002
    Location
    Vancouver
    Posts
    2,212
    I'm never letting prelude anywhere near my code. I'm too scared.

  5. #5
    Registered /usr
    Join Date
    Aug 2001
    Location
    Newport, South Wales, UK
    Posts
    1,273
    Quote Originally Posted by Prelude
    >newsize += (BUFSIZ * sizeof(char));
    sizeof(char) is guaranteed to be 1, so you can omit that part of the expression.
    Er, what about Unicode?
    Please don't hurt me.

  6. #6
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >Er, what about Unicode?
    If you're working with Unicode, then you shouldn't be using char in the first place. But to answer your question (sort of), sizeof(char) must be 1, regardless of how char is represented. For example: char, short, and int could all be the same size (say, 4 bytes) and sizeof(char) would still evaluate to 1 because it's a hard limit specified by the C standard.
    My best code is written with the delete key.

  7. #7
    Registered User
    Join Date
    Sep 2004
    Posts
    44
    Prelude,

    Wow, you're good. All good points and all points taken. I'll work on it some more.

    [edit]
    Your 'readline' function works very well. So far, it seems to be pretty 'bullet-proof'.
    [edit]
    Last edited by n7yap; 12-12-2004 at 02:28 PM.

  8. #8
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    Just for grins (and because I'm bored), here's readline with a little more effort put into it:
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    char *readline(FILE *in);
    
    int main(void)
    {
      char *p = readline(stdin);
    
      if (p != NULL) {
        printf("[%s]\n", p);
        free(p);
      }
    
      return 0;
    }
    
    char *readline(FILE *in)
    {
      char buf[BUFSIZ];
      char *save, *ret = NULL;
      size_t len, ret_size = 0;
    
      while (fgets(buf, sizeof buf, in) != NULL) {
        /* Resize */
        len = strlen(buf);
        save = realloc(ret, ret_size + len + 1);
        if (save == NULL)
          break;
    
        /* Copy */
        ret = save;
        ret_size == 0 ? strcpy(ret, buf) : strcat(ret, buf);
        ret_size += len;
    
        /* Are we done? */
        if (ret[ret_size - 1] == '\n') {
          ret[ret_size - 1] = '\0';
          break;
        }
      }
    
      return ret;
    }
    My best code is written with the delete key.

  9. #9
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    I don't like strcat() and it's really unncessary here:
    Code:
    ret_size == 0 ? strcpy(ret, buf) : strcat(ret, buf);
    You can save having to loop through ret again by using strcpy() like this:
    Code:
    ret_size == 0 ? strcpy(ret, buf) : strcpy(ret + ret_size, buf);
    Or replace the whole line with a simple:
    Code:
    strcpy(ret + ret_size, buf);
    Just a simple performance gain. But it also makes the code look cleaner I think.
    If you understand what you're doing, you're not learning anything.

  10. #10
    Banned master5001's Avatar
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    I'm uncertain of how much work you are intending to put into this venture, but it seems to me you are merely trying to get a line from a file without buffer overflow issues. Would it not be prudent to make a function that uses lower level calls as a replacement for fgets() over a wrapper that incorporates it?

  11. #11
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >But it also makes the code look cleaner I think.
    Cleaner, but not necessarily clearer. My primary goal was to make the operation blatantly obvious. If I had bothered to optimize then my next pass probably would have made a similar change.

    >Would it not be prudent to make a function that uses lower level calls
    It depends on the needs of the function. Since we write portable code here, fgets was the function of choice. Also, if you use a lower level system call then you may not get the same semantics for error and EOF as fgets, and you would need to write them yourself, thus making the function more complicated.
    My best code is written with the delete key.

  12. #12
    Banned master5001's Avatar
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    >Also, if you use a lower level system call then you may not get the same semantics for error and EOF as fgets, and you would need to write them yourself, thus making the function more complicated.

    Along those lines I also question reinventing the wheel. More buffer friendly functions do exist. I'm not too sure what operating system/compiler n7yap uses, but gcc's libraries do include buffer friendly (compiler specific) functions. Also, the newer msvc++ has m$'s own buffer safe functions (the standard is begging for something to bridge this gap).

    >Cleaner, but not necessarily clearer.

    Debatable, on this one I'm going to just go with Prelude is an older and more experienced programmer, she probably has had more instances of a co-worker asking "huh?" But if my opinion matters, I think the more optimized code is clear enough. I also dislike what you put since the compiler can't assume anything on that one and probably won't optimize it down as much as it would the alternative. But the point is moot since as Prelude said, this is an example showing a point, not "the most blazing fast, buffer friendly version of fgets EVER!"

  13. #13
    Registered User
    Join Date
    Sep 2004
    Posts
    44
    I am using MS Visual C++ 2003. I am aware of the compiler specific functions in 'strsafe.h'. I can just imagine the response if I posted code using those functions on this board. The idea was to use ANSI/ISO standard C only. Personally, I don't have a problem with using compiler specific extensions. But on this board it is probably best to stick with ANSI/ISO C. As you can tell from the code I posted am not an expert in C, I am in the process of improving my C skills.

  14. #14
    Banned master5001's Avatar
    Join Date
    Aug 2001
    Location
    Visalia, CA, USA
    Posts
    3,685
    I port a lot of code using compiler specific functions. Since its usually from GCC to MSVC++ (or vice versa) I'll just make some of those evil headers that define a function name based on what OS is being used. Usually the "safe" functions have the same arguments but different names so I most typically just define the GNU name to the M$ function. That usually works out just fine. Plus if it catches on, the GNU functions will make the standard and not the M$ ones (the way it usually tends to go). That said, what the hell is up with having fopen_s()??? Damn you Microsoft! Can't you leave well enough alone!

  15. #15
    Registered User
    Join Date
    Sep 2004
    Posts
    44
    I tried out Visual C++ 2005 Express Edition beta. I was surprised when I got compiler warnings for using standard CRT functions. Microsoft re-wrote a bunch of CRT functions to be more secure. They deprecated the old ones. These functions are in addition to the new 'strsafe.h' functions in Visual C++ 2003. In a way it's a good thing, but they are straying further from the standards. They submitted the new functions to the standards committee, but I'm not sure if they would ever become standard. Here is an article about it:

    http://msdn.microsoft.com/library/de...re03102004.asp

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. problem with fgets
    By learninC in forum C Programming
    Replies: 3
    Last Post: 05-19-2005, 08:10 AM
  2. problem with fgets
    By Smoot in forum C Programming
    Replies: 4
    Last Post: 12-07-2003, 03:35 AM
  3. fgets crashing my program
    By EvBladeRunnervE in forum C++ Programming
    Replies: 7
    Last Post: 08-11-2003, 12:08 PM
  4. pic movement
    By pode in forum Game Programming
    Replies: 31
    Last Post: 08-21-2002, 09:30 PM
  5. Allegro help-o
    By Vicious in forum Game Programming
    Replies: 109
    Last Post: 06-11-2002, 08:38 PM