Thread: Scanning new line

  1. #16
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by phantomotap View Post
    If you are going to try to recover gracefully in the face of "user error" nothing shown so far is sufficient.
    The suggestion by Adak causes a buffer underrun bug. It reads a byte just before the buffer, and if that byte equals 10, it replaces it with a zero.

    My suggestion does not have that bug. I would also be happy with
    Code:
    int len = strlen(line);
    if (len > 0 && line[len - 1] == '\n')
        line[--len] = '\0';
    which, due to the short circuit evaluation rules in C, does not try to access the byte just before the buffer, either. And the len ends up with the correct value in all cases (even when the line was longer than the available buffer and was only partially read).

    Quote Originally Posted by phantomotap View Post
    The `line[strlen(line)]' character will never be a newline character unless something goes horribly wrong.
    Right. It will always be a NUL character. It is safe to replace it with another NUL character.

    (.. assuming line != NULL. Which should be checked right after the fgets() call anyway.)

    Quote Originally Posted by phantomotap View Post
    All joking aside, the code forwarded by you will not get rid of the newline in the "\0bad input\n" form of string so it is just as likely to cause a serious problem at a later point.
    My code does not try to access the char just before the buffer, though.

    Given that input, my code yields *line == '\0' (and strlen(line) == 0 ), i.e. and empty string, but nothing bad happens.

    To be honest, I did first write the proper solution, using POSIX.1-2008 getline(), but I thought it would either scare the OP, or I'd just get some negative feedback about it, so I rewrote my reply before posting it.
    Code:
    char    *data = NULL;
    size_t   size = 0;
    
    char    *line;
    size_t   len;
    ssize_t full;
    
    /* Some form of input loop .. */
    while (1) {
    
        full = getline(&data, &size, file handle);
        if (full < (ssize_t)1)
            end of file or error, abort/break
    
        /* Input sanitization */
        if (!clean(data, (size_t)full, &line, &len))
            input is suspect, ignore/warn/abort
    
        /* Ignore empty lines (and comments, if cleaned) */
        if (len < 1)
            continue;
    
        /* Now have len chars at line. */
    
    }
    
    free(data);
    data = NULL;
    size = 0;
    The input sanitization function,
    Code:
    int clean(char *const input, const size_t inputlen, char **const result, size_t *const resultlen);
    takes the size bytes of input in input, removes unwanted characters or at least checks the string, saves the start of the contents and length of the contents in the last two pointers supplied by the user unless NULL, and returns 0 if the input was safe, and nonzero otherwise.

    The implementation details are very domain-specific. You might want to simply remove all embedded NUL bytes and control characters, and replace consecutive whitespace with a single space; this works well with semi-interactive human input, say a game. In other cases you might wish to replace NULs and (some) other ASCII control characters with escape codes (for example, ASCII NULs with "\\0" or ("&" "#0;"), the numeric HTML entity reference for ASCII NUL character).

    If there is interest, I'd be happy to show a couple of practical examples.
    Last edited by Nominal Animal; 10-11-2012 at 10:29 PM.

  2. #17
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Right. It will always be a NUL character. It is safe to replace it with another NUL character.
    You misunderstand.

    The line of code I was referring to was the bug in the `while' condition.

    Code:
    while (len > 0 && (line[len] == '\n' || line[len] == '\r'))
    My code does not try to access the char just before the buffer, though.
    I didn't say it did.

    I just said that what you were implying about your code being sufficient in the face of embedded nulls was wrong.

    Soma

  3. #18
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by phantomotap View Post
    You misunderstand.
    I did! :headpalm:

    I apologize. It is a typo; the code should of course be
    Code:
    size_t  len;
     
    len = strlen(line);
    while (len > 0 && (line[len-1] == '\n' || line[len-1] == '\r'))
        len--;
    line[len] = '\0';
    Big thanks for noticing it.

    Hmm.. I don't see how to edit the post. It might be good to fix the post, so nobody gets tripped by it.

  4. #19
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    I don't see how to edit the post. It might be good to fix the post, so nobody gets tripped by it.
    O_o

    I know that the forum software prevents edits after a certain time.

    I don't know what that time is, but it has probably passed.

    *shrug*

    I should have been more specific in the first pass.

    Soma

  5. #20
    Registered User
    Join Date
    Mar 2010
    Location
    Australia
    Posts
    174
    Actually, I much prefer the simpler methods rather than using functions that haven't been mentioned in class.

    Quote Originally Posted by Adak View Post
    If you use fgets(), you can only have one position for the newline: strlen() - 1. It can't be located anywhere else in the string.

    Code:
    //if line is the char array name:
    int len = strlen(line)-1;
    if(line[len]=='\n'
       line[len]='\0';
    I don't know why I didn't think of this. I instead had

    Code:
    for (i=0; i<CMDLENGTH && command[i+1] != '\0'; i++);
            if (command[i] == '\n' && i != 0) {
                command[i] = '\0';
            }
    But yours is much simpler. Thanks!

    Oh and I've already set my string length limits to be much larger than user input should be, plus our professor specifically told us he doesn't want us spending unnecesssary amounts of time concocting crazy scenarios that could break our program and trying to defend against that.
    Last edited by Mentallic; 10-12-2012 at 01:16 AM.

  6. #21
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Err... Adak's solution is wrong. If you want to stick to strlen, then it should be:
    Code:
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n')
    {
        --len;
        line[len] = '\0';
    }
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  7. #22
    Registered User
    Join Date
    Mar 2010
    Location
    Australia
    Posts
    174
    I realized that, but I only needed the concept

    Now that I've tried adding it to a function, I'm getting an error that has to do with arrays vs pointers:

    Code:
    char command[CMDLENGTH] = {};
            command = removeNewLine(command);
    
    char *removeNewLine(char *command) {
        int length = strlen(command)-1;
        
        if(command[length] == '\n' && length != 0) {
            command[length]='\0';
        }
        
        return command;
    }
    Is giving me the error

    error: incompatible types when assigning to type ‘char[102]’ from type ‘char *’
    It's been a while since I've used arrays, so I've clearly forgotten how to pass and return arrays as opposed to strings.

    EDIT:
    Quote Originally Posted by laserlight View Post
    Err... Adak's solution is wrong. If you want to stick to strlen, then it should be:
    Code:
    size_t len = strlen(line);
    if (len > 0 && line[len - 1] == '\n')
    {
        --len;
        line[len] = '\0';
    }
    The if statement should test len > 1 should'nt it? If len = 1 then the user input must've only been a new line, which I want to keep.
    Last edited by Mentallic; 10-12-2012 at 01:35 AM.

  8. #23
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    You cannot assign to an array. In this case, it will suffice to call:
    Code:
    removeNewLine(command);
    By the way, I did not subtract the 1 immediately from the result of strlen in case strlen is 0. Then, I did the check for > 0 before I accessed line[len - 1]. Granted, if the line comes from fgets within the constraints of your classroom environment, you will not need to be this paranoid, but you might as well be.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  9. #24
    Registered User
    Join Date
    Mar 2010
    Location
    Australia
    Posts
    174
    Quote Originally Posted by laserlight View Post
    You cannot assign to an array. In this case, it will suffice to call:
    Code:
    removeNewLine(command);
    Ahh thanks

    Quote Originally Posted by laserlight View Post
    By the way, I did not subtract the 1 immediately from the result of strlen in case strlen is 0. Then, I did the check for > 0 before I accessed line[len - 1]. Granted, if the line comes from fgets within the constraints of your classroom environment, you will not need to be this paranoid, but you might as well be.
    Before calling the function removeNewLine, I already check that the return value of fgets is not NULL, so then strlen can't be 0 from what I can see.

  10. #25
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by Mentallic
    Before calling the function removeNewLine, I already check that the return value of fgets is not NULL, so then strlen can't be 0 from what I can see.
    You could read Nominal Animal's post #9 for a weird situation where strlen can still return 0 after reading with fgets. However, that aside, consider: separate from the issue of fgets, someone reading your code would ask if the removeNewLine function works correctly on an empty string. With your implementation, it does not. So you should either document that fact (e.g., with a comment, or maybe even an assertion), or just handle empty strings gracefully.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  11. #26
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,613
    int length = strlen(command)-1;
    if(command[length] == '\n' && length != 0)

    Forgive me for not posting this in the original context but this is still exhibiting out of bounds access. The undefined behavior can only be mitigated if the length != -1 check is performed first.

    Order of execution always matters.

  12. #27
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by whiteflags
    The undefined behavior can only be mitigated if the length != -1 check is performed first.
    It is a little more complicated than that, methinks: strlen returns a size_t, which is an unsigned integer type, so if the return value is 0, then the subtraction would result in a large positive number. This is then converted to int, the conversion being implementation defined. Assuming that we do arrive at -1 in the end, the check that Mentallic wrote compares with 0, not -1, so it is wrong anyway.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  13. #28
    Registered User
    Join Date
    Mar 2010
    Location
    Australia
    Posts
    174
    Quote Originally Posted by laserlight View Post
    You could read Nominal Animal's post #9 for a weird situation where strlen can still return 0 after reading with fgets. However, that aside, consider: separate from the issue of fgets, someone reading your code would ask if the removeNewLine function works correctly on an empty string. With your implementation, it does not. So you should either document that fact (e.g., with a comment, or maybe even an assertion), or just handle empty strings gracefully.
    You make a good point.

    Quote Originally Posted by whiteflags View Post
    int length = strlen(command)-1;
    if(command[length] == '\n' && length != 0)

    Forgive me for not posting this in the original context but this is still exhibiting out of bounds access. The undefined behavior can only be mitigated if the length != -1 check is performed first.

    Order of execution always matters.
    I'll change my code to

    Code:
    int length = strlen(command)-1;
    
    if (length > 0 && command[length] == '\n')
    which looks like it'll handle NULL strings.

  14. #29
    Registered User
    Join Date
    Mar 2010
    Location
    Australia
    Posts
    174
    Quote Originally Posted by laserlight View Post
    It is a little more complicated than that, methinks: strlen returns a size_t, which is an unsigned integer type, so if the return value is 0, then the subtraction would result in a large positive number. This is then converted to int, the conversion being implementation defined. Assuming that we do arrive at -1 in the end, the check that Mentallic wrote compares with 0, not -1, so it is wrong anyway.
    Unsigned ints do that? Oh boy...

    Ok, I'm going to leave strlen as it is then.

    Code:
        int length = strlen(command);
        
        if (length > 1 && command[length-1] == '\n') {
            command[length-1]='\0';
        }

  15. #30
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Yep. Why not just make length a size_t? Also, note that if you are replacing the last character of a string with a null character, then you have effectively reduced the length by 1. Thus, if you want to use the length variable later on, then for correctness you should subtract 1 from it too, hence the decrement in my example in post #21. In your case, you are writing it in a function that will not use length afterwards, so it is arguably okay.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 6
    Last Post: 06-07-2012, 02:50 AM
  2. Replies: 3
    Last Post: 04-29-2011, 01:02 PM
  3. Replies: 7
    Last Post: 12-13-2010, 02:13 PM
  4. scanning till new line
    By dpp in forum C++ Programming
    Replies: 2
    Last Post: 06-29-2009, 10:31 PM
  5. Scanning a line in a document
    By bc120 in forum C Programming
    Replies: 5
    Last Post: 01-02-2002, 02:06 PM