Thread: mygets

  1. #1
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005

    mygets

    A common problem that arises is obtaining text from user input. I know this is addressed in the FAQ. But a couple of recent threads prompted me to revisit this a little. With that, here are some thoughts.
    • The standard library function scanf is more complex than many want to realize. (I like Chris Torek's lengthy explanations; here is one example.)
    • The standard library function gets is notorious for being unsafe. A very real issue with it is buffer overflow error.
    • The standard library function fgets is safe but can be annoying due to the fact that it may retain the new-line character in the string.
    Inspired by libclc (but preferring this forum), I'd like to implement a function that is a "bulletproof" version of gets. I also think it makes a good discussion of things that can go wrong in this "simple" process of obtaining text from the user.

    Here is some code to start with.
    Code:
    /*
     * Description:
     *    Define the functions mygets, a safe version of gets, and myprompt.
     *    Test the functions when the macro TEST is defined.
     */
    
    #include <stdio.h>
    #include <string.h>
    
    /*
     * Synopsis:
     *    char *mygets(char *buffer, int size);
     *
     * Description:
     *    The mygets function reads at most one less than the number of 
     *    characters specified by 'size' from the input stream pointed to by
     *    stdin, into the array pointed to by 'buffer'. No additional characters
     *    are read after a new-line character or after end-of-file. A null 
     *    character is written immediately after the last character read into
     *    the array. If present, the new-line character is overwritten with a
     *    null character.
     *
     * Returns:
     *    The mygets function returns 'buffer' if successful. If end-of-file is
     *    encountered and no characters have been read into the array, the
     *    contents of the array remain unchanged and a null pointer is returned.
     *    If a read error occurs during the operation, the array contents are
     *    indeterminate and a null pointer is returned.
     *
     * Remarks:
     *    The standard library function gets is notorious for being unsafe.
     *    Because you cannot specify the size of the buffer into which the data
     *    will be placed, it can never be used safely. The standard library
     *    function fgets, on the other hand, is safe but can be annoying due
     *    to the fact that it retains the new-line character when fewer than
     *    'size' - 1 characters are obtained. This function attempts to combine
     *    safety with convenience.
     */
    char *mygets(char *buffer, int size)
    {
       /*
        * Ensure that we won't use a null pointer with standard functions.
        */
       if ( buffer != NULL )
       {
          /*
           * Use fgets to obtain text from the stdin and put it in 'buffer'.
           * It stops reading when it reads either 'size' - 1 characters
           * or a newline character '\n', whichever comes first. It retains
           * the newline character, if present.
           */
          if ( fgets(buffer, size, stdin) != NULL )
          {
             /*
              * If a new-line character is present in 'buffer', replace it with
              * a null character. Use the function strchr to search for the
              * new-line character. (There are other methods like strtok.)
              */
             char *newline = strchr(buffer, '\n');
             if ( newline != NULL ) /* fewer than 'size' - 1 characters */
             {
                /*
                 * A new-line character was found, and is located where
                 * 'newline' points. Replace the new-line character '\n'
                 * with a null character '\0'.
                 */
                *newline = '\0';
                /*
                 * Note: Since a new-line character was found, fewer than
                 * 'size' - 1 characters were present in the stdin.
                 */
             }
             else /* No new-line characters was found */
             {
                /*
                 * Since a new-line character was not found, 'size' or more
                 * characters (too much for 'buffer') were present in the stdin.
                 */
                for ( ;; )
                {
                   /*
                    * Consume the 'extra' characters remaining in the stdin.
                    * This is done by reading the character and discarding
                    * it (by not placing it in 'buffer' which is already full).
                    */
                   int ch = getchar();
                   if ( ch == EOF )
                   {
                      /*
                       * End-of-file was encountered or an error occurred.
                       * Live up to our promise to return a null pointer.
                       */
                       return NULL;
                   }
                   if ( ch == '\n' )
                   {
                      /*
                       * The new-line character was found. Stop consuming
                       * characters from the stdin.
                       */
                      break;
                   }
                }
             }
          }
          else
          {
             /*
              * End-of-file was encountered or an error occurred with fgets.
              * Live up to our promise to return a null pointer.
              */
             return NULL;
          }
       }
       /*
        * Return a pointer to 'buffer' as promised. If a null pointer was
        * passed to this function, it will be returned.
        */
       return buffer;
    }
    
    /*
     * Synopsis:
     *    int myprompt(const char *prompt);
     *
     * Description:
     *    The myprompt function writes the string pointed to by 'prompt' to the
     *    output stream pointed to by stdout. The terminating null character is
     *    not written. The output stream pointed to by stdout is flushed, which
     *    causes any unwritten data for that stream to be delivered to the host
     *    environment to be written to the stdout.
     *
     * Returns:
     *    The myprompt function returns EOF if a write error occurs, otherwise
     *    it returns zero. If an error occurs attempting to flush the stream
     *    pointed to by stdout, the error indicator for the stdout is set.
     *
     * Remarks:
     *    When prompting for user input, flushing the stdout encourages the host
     *    to display the prompt. The function myprompt combines presenting the
     *    prompt text with flushing the output buffer.
     */
    int myprompt(const char *prompt)
    {
       /*
        * Ensure that we won't use a null pointer with standard functions.
        */
       if ( prompt != NULL )
       {
          /*
           * Write the string pointed to by 'prompt' to the stdout.
           */
          if ( fputs(prompt, stdout) < 0 )
          {
             return EOF;
          }
          /*
           * Flush the stdout.
           */
          if ( fflush(stdout) < 0 )
          {
             return EOF;
          }
       }
       return 0;
    }
    
    #define TEST
    #if defined(TEST)
    /*
     * Description:
     *    Test the functions mygets and myprompt defined in this module.
     */
    int main(void)
    {
       char text[10] = {0};
       while ( *text != '0' )
       {
          if ( myprompt("Enter text (\"0\" to exit): ") != EOF )
          {
             if ( mygets(text, sizeof(text)) != NULL )
             {
                printf("text = \"%s\"\n", text);
             }
             else
             {
                break;
             }
          }
       }
       return 0;
    }
    
    /* my output
    C:\Test>test
    Enter text ("0" to exit): here is my text
    text = "here is m"
    Enter text ("0" to exit): okay
    text = "okay"
    Enter text ("0" to exit): 1234567890
    text = "123456789"
    Enter text ("0" to exit): 123456789
    text = "123456789"
    Enter text ("0" to exit): 12345678
    text = "12345678"
    Enter text ("0" to exit): 0
    text = "0"
    
    C:\Test>test
    Enter text ("0" to exit): okay^Z..
    
    C:\Test>test
    Enter text ("0" to exit): here is some text^Z...
    
    C:\Test>
    */
    #endif
    What are recommendations for improvement?
    • Is there anything I might have overlooked?
    • What might need to be added?
    • What could be removed?
    • Are there any other comments?
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

  2. #2
    Registered User Vber's Avatar
    Join Date
    Nov 2002
    Posts
    807
    Hey Dave, nice comments, it explains very well the function, but, what's the problem with fgets(), its a pretty good function, and if the problem is removing \n when possible, you can make this:
    Code:
    char *mygets (char *buf, int size) {
    	char *p;
    	
    	fgets(buf,size,stdin);
    	p = strchr(buf,'\n');
    	if(p) *p = '\0';
    	return buf;
    }

  3. #3
    Un Artiste Extraordinaire volk's Avatar
    Join Date
    Dec 2002
    Posts
    357
    Originally posted by vVv
    Two points.

    1: No offense, but is this a joke? Why do you open a new thread for a trivial snippet like this that I'm sure almost everyone around here has written a dozen times already? If you just want to see it in the FAQ, you could have PM'd Hammer.

    2: Since it's your own function, why retain the meaningless standard library return values? You could return the number of bytes read on success and EOF for an error or EOF.
    You sure told him. Dave is a little hurt right now, but he'll recover.

  4. #4
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >/*
    >* Consume the 'extra' characters remaining in the stdin.
    >* This is done by reading the character and discarding
    >* it (by not placing it in 'buffer' which is already full).
    >*/
    I'm not sure I like this part. If you create the library ad hoc then it's okay to assume that the one using the library wants something that in other situations may be undesirable. Generic libraries cannot be that careless, they have to assume that any further input would be needed and may not be modified (such decisions are the job of the one using the library). In this case your mygets function reads not only everything that it can into the buffer, it also reads and discards whatever is left in the input stream. What if I wanted to use your mygets to write a function that reads arbitrarily long lines (which happens to be a common occurance, I've done it many times)? I would find the functionality you support to be stifling and would turn to other libraries that better suit my purpose.

    Note that checking for a null argument may be overkill, you have to consider how much your users have to be protected from themselves at the cost of performance. Also, the newline is a valuable error checking tool, if you remove it and still return a null pointer, the user has no clue whether a recoverable error occurred, or everything is going to die. vVv's suggestion of returning a more useful value is a good idea, perhaps a set of constants that mark end of file, read error, success, empty line, leftover input, etc...

    -Prelude
    My best code is written with the delete key.

  5. #5
    End Of Line Hammer's Avatar
    Join Date
    Apr 2002
    Posts
    6,231
    >>if ( ch == EOF )
    I definately don't like the idea of a line being considered invalid, and ignored if a newline character is not returned within the fgets() call. (this has probably been covered, but this is my 2-cents worth). A problem would is very visible if you redirect input into your program from a file. The last line is simply ingored. Example usage:

    c:\>davesprog.exe <file_with_three_lines.txt
    Enter text ("0" to exit): text = "line 1"
    Enter text ("0" to exit): text = "line 2"
    Enter text ("0" to exit):

    Good effort though, don't be put off by negative comments
    When all else fails, read the instructions.
    If you're posting code, use code tags: [code] /* insert code here */ [/code]

  6. #6
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    but, what's the problem with fgets(), its a pretty good function, and if the problem is removing \n when possible, you can make this:
    Code:
    char *mygets (char *buf, int size) {
    	char *p;
    	
    	fgets(buf,size,stdin);
    	p = strchr(buf,'\n');
    	if(p) *p = '\0';
    	return buf;
    }
    Because if this was the function and it was done more than once, and you entered a string that exceeded the buffer size, you would get something like this.
    Code:
    Enter text ("0" to exit): here is my text
    text = "here is m"
    Enter text ("0" to exit): text = "y text"
    Enter text ("0" to exit): 0
    text = "0"
    -----
    1: No offense, but is this a joke?
    No, no joke. If we took the code from the FAQ and made a function out of it, we'd have code similar to the above. Then to solve that problem, we might respond, "flush the input buffer". And I'm sure we could search and find some code that looks like this.
    Code:
    while ((ch = getchar()) != '\n' && ch != EOF);
    And we'd add this to the 'exceeded buffer size' condition. But being a function, maybe we'd return a value that tells us if an error occurred. That is essentially what I posted, with long-winded and standard-speak comments.
    If you just want to see it in the FAQ, you could have PM'd Hammer.
    If I thought it was a complete solution, I might have. But it was what is was. A start, with questions and a solicitation for suggestions. (And I'm pleased with the thoughtful responses so far.)
    2: Since it's your own function, why retain the meaningless standard library return values? You could return the number of bytes read on success and EOF for an error or EOF.
    >/*
    >* Consume the 'extra' characters remaining in the stdin.
    >* This is done by reading the character and discarding
    >* it (by not placing it in 'buffer' which is already full).
    >*/
    I'm not sure I like this part. [...] assume that any further input would be needed and may not be modified [...]
    Also, the newline is a valuable error checking tool, if you remove it and still return a null pointer, the user has no clue whether a recoverable error occurred, or everything is going to die. vVv's suggestion of returning a more useful value is a good idea, perhaps a set of constants that mark end of file, read error, success, empty line, leftover input, etc...
    What if I wanted to use your mygets to write a function that reads arbitrarily long lines (which happens to be a common occurance, I've done it many times)? [...]
    Then I might need to do another iteration of this function! Issues like these are what I wanted to hear. Thanks.

    "Trivial" as this may be, I don't work with hosted systems 'for real', so I have not handled such "common occurances" "dozens of times". And I am meaning no offense either. I'm just trying to learn the language better. And others stand a chance to learn something new as well.
    Note that checking for a null argument may be overkill, you have to consider how much your users have to be protected from themselves at the cost of performance.
    I considered making the null check an assert, but if the function stops the system to wait for user input, I wouldn't think performance was a big issue.
    I definately don't like the idea of a line being considered invalid, and ignored if a newline character is not returned within the fgets() call. (this has probably been covered, but this is my 2-cents worth). A problem would is very visibible if you redirect input into your program from a file. The last line is simply ingored. Example usage:

    c:\>davesprog.exe <file_with_three_lines.txt
    Enter text ("0" to exit): text = "line 1"
    Enter text ("0" to exit): text = "line 2"
    Enter text ("0" to exit):
    And another glaring problem I overlooked. Thanks for pointing it out.
    don't be put off by negative comments
    Where?
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

  7. #7
    End Of Line Hammer's Avatar
    Join Date
    Apr 2002
    Posts
    6,231
    Just for fun, here's another version. Sorry, I just couldn't bring myself to write as many comments as Dave

    Code:
    #include <stdio.h>
    #include <string.h>
    
    enum RC { RC_END = -1, RC_COMPLETE_LINE, RC_MAYBE_MORE_TO_COME };
    
    enum RC mygets (FILE *fp, char buf[], size_t buflen, size_t *len)
    {
      /*
       *  Parameters
       *    fp      Input file stream pointer, assumed to be open.
       *    buf     Buffer to store the incoming data. Must not be NULL
       *    buflen  Length of the buffer.
       *    len     Pointer where number of bytes read will be stored.
       *            This number includes the trailing \0
       */
       
      enum RC rc;
      char *p;
      
      if (fgets(buf, buflen, fp) != NULL)
      {
        if ((p = strchr(buf, '\n')) != NULL)
        {
          *p = '\0';
          rc = RC_COMPLETE_LINE;
        }
        else rc = RC_MAYBE_MORE_TO_COME;
      }
      else
      {
        rc = RC_END;
        buf[0] = '\0';
      }
      
      if (len != NULL)
      {
        *len = strlen(buf) + 1;
      }
      
      return rc;
      
    }
    
    int main(void)
    {
      char buf[10];
      size_t len;
      enum RC rc;
      
      while ((rc = mygets(stdin, buf, sizeof(buf), &len)) != RC_END)
      {
        printf ("rc:%d len:%lu buf:>%s<\n", rc, (unsigned long)len, buf);
      }
      
      return(0);
    }
    When all else fails, read the instructions.
    If you're posting code, use code tags: [code] /* insert code here */ [/code]

Popular pages Recent additions subscribe to a feed