Scanning new line

Printable View

Show 80 post(s) from this thread on one page

10-11-2012
Nominal Animal

Quote:

Originally Posted by phantomotap

If you are going to try to recover gracefully in the face of "user error" nothing shown so far is sufficient.

The suggestion by Adak causes a buffer underrun bug. It reads a byte just before the buffer, and if that byte equals 10, it replaces it with a zero.

My suggestion does not have that bug. I would also be happy with

Code:

int len = strlen(line); if (len > 0 && line[len - 1] == '\n') line[--len] = '\0';

which, due to the short circuit evaluation rules in C, does not try to access the byte just before the buffer, either. And the len ends up with the correct value in all cases (even when the line was longer than the available buffer and was only partially read).

Quote:

Originally Posted by phantomotap

The `line[strlen(line)]' character will never be a newline character unless something goes horribly wrong.

Right. It will always be a NUL character. It is safe to replace it with another NUL character.

(.. assuming line != NULL. Which should be checked right after the fgets() call anyway.)

Quote:

Originally Posted by phantomotap

All joking aside, the code forwarded by you will not get rid of the newline in the "\0bad input\n" form of string so it is just as likely to cause a serious problem at a later point.

My code does not try to access the char just before the buffer, though.

Given that input, my code yields *line == '\0' (and strlen(line) == 0 ), i.e. and empty string, but nothing bad happens.

To be honest, I did first write the proper solution, using POSIX.1-2008 getline(), but I thought it would either scare the OP, or I'd just get some negative feedback about it, so I rewrote my reply before posting it.

Code:

char *data = NULL; size_t size = 0; char *line; size_t len; ssize_t full; /* Some form of input loop .. */ while (1) { full = getline(&data, &size, file handle); if (full < (ssize_t)1) end of file or error, abort/break /* Input sanitization */ if (!clean(data, (size_t)full, &line, &len)) input is suspect, ignore/warn/abort /* Ignore empty lines (and comments, if cleaned) */ if (len < 1) continue; /* Now have len chars at line. */ } free(data); data = NULL; size = 0;

The input sanitization function,

Code:

int clean(char *const input, const size_t inputlen, char **const result, size_t *const resultlen);

takes the size bytes of input in input, removes unwanted characters or at least checks the string, saves the start of the contents and length of the contents in the last two pointers supplied by the user unless NULL, and returns 0 if the input was safe, and nonzero otherwise.

The implementation details are very domain-specific. You might want to simply remove all embedded NUL bytes and control characters, and replace consecutive whitespace with a single space; this works well with semi-interactive human input, say a game. In other cases you might wish to replace NULs and (some) other ASCII control characters with escape codes (for example, ASCII NULs with "\\0" or ("&" "#0;"), the numeric HTML entity reference for ASCII NUL character).

If there is interest, I'd be happy to show a couple of practical examples.
10-11-2012
phantomotap

Quote:

Right. It will always be a NUL character. It is safe to replace it with another NUL character.

You misunderstand.

The line of code I was referring to was the bug in the `while' condition.

Code:

while (len > 0 && (line[len] == '\n' || line[len] == '\r'))

Quote:

My code does not try to access the char just before the buffer, though.

I didn't say it did.

I just said that what you were implying about your code being sufficient in the face of embedded nulls was wrong.

Soma
10-11-2012
Nominal Animal

Quote:

Originally Posted by phantomotap

You misunderstand.

I did! :headpalm:

I apologize. It is a typo; the code should of course be

Code:

size_t len; len = strlen(line); while (len > 0 && (line[len-1] == '\n' || line[len-1] == '\r')) len--; line[len] = '\0';

Big thanks for noticing it.

Hmm.. I don't see how to edit the post. It might be good to fix the post, so nobody gets tripped by it.
10-12-2012
phantomotap

Quote:

I don't see how to edit the post. It might be good to fix the post, so nobody gets tripped by it.

O_o

I know that the forum software prevents edits after a certain time.

I don't know what that time is, but it has probably passed.

*shrug*

I should have been more specific in the first pass.

Soma
10-12-2012
Mentallic

Actually, I much prefer the simpler methods rather than using functions that haven't been mentioned in class.

Quote:

Originally Posted by Adak

If you use fgets(), you can only have one position for the newline: strlen() - 1. It can't be located anywhere else in the string.

Code:

//if line is the char array name: int len = strlen(line)-1; if(line[len]=='\n' line[len]='\0';

I don't know why I didn't think of this. I instead had

Code:

for (i=0; i<CMDLENGTH && command[i+1] != '\0'; i++); if (command[i] == '\n' && i != 0) { command[i] = '\0'; }

But yours is much simpler. Thanks!

Oh and I've already set my string length limits to be much larger than user input should be, plus our professor specifically told us he doesn't want us spending unnecesssary amounts of time concocting crazy scenarios that could break our program and trying to defend against that.
10-12-2012
laserlight

Err... Adak's solution is wrong. If you want to stick to strlen, then it should be:

Code:

size_t len = strlen(line); if (len > 0 && line[len - 1] == '\n') { --len; line[len] = '\0'; }
10-12-2012
Mentallic

I realized that, but I only needed the concept :)

Now that I've tried adding it to a function, I'm getting an error that has to do with arrays vs pointers:

Code:

char command[CMDLENGTH] = {}; command = removeNewLine(command); char *removeNewLine(char *command) { int length = strlen(command)-1; if(command[length] == '\n' && length != 0) { command[length]='\0'; } return command; }

Is giving me the error

Quote:

error: incompatible types when assigning to type ‘char[102]’ from type ‘char *’

It's been a while since I've used arrays, so I've clearly forgotten how to pass and return arrays as opposed to strings.

EDIT:

Quote:

Originally Posted by laserlight

Err... Adak's solution is wrong. If you want to stick to strlen, then it should be:

Code:

size_t len = strlen(line); if (len > 0 && line[len - 1] == '\n') { --len; line[len] = '\0'; }

The if statement should test len > 1 should'nt it? If len = 1 then the user input must've only been a new line, which I want to keep.
10-12-2012
laserlight

You cannot assign to an array. In this case, it will suffice to call:

Code:

removeNewLine(command);

By the way, I did not subtract the 1 immediately from the result of strlen in case strlen is 0. Then, I did the check for > 0 before I accessed line[len - 1]. Granted, if the line comes from fgets within the constraints of your classroom environment, you will not need to be this paranoid, but you might as well be.
10-12-2012
Mentallic

Quote:

Originally Posted by laserlight

You cannot assign to an array. In this case, it will suffice to call:

Code:

removeNewLine(command);

Ahh thanks :)

Quote:

Originally Posted by laserlight

By the way, I did not subtract the 1 immediately from the result of strlen in case strlen is 0. Then, I did the check for > 0 before I accessed line[len - 1]. Granted, if the line comes from fgets within the constraints of your classroom environment, you will not need to be this paranoid, but you might as well be.

Before calling the function removeNewLine, I already check that the return value of fgets is not NULL, so then strlen can't be 0 from what I can see.
10-12-2012
laserlight

Quote:

Originally Posted by Mentallic

Before calling the function removeNewLine, I already check that the return value of fgets is not NULL, so then strlen can't be 0 from what I can see.

You could read Nominal Animal's post #9 for a weird situation where strlen can still return 0 after reading with fgets. However, that aside, consider: separate from the issue of fgets, someone reading your code would ask if the removeNewLine function works correctly on an empty string. With your implementation, it does not. So you should either document that fact (e.g., with a comment, or maybe even an assertion), or just handle empty strings gracefully.
10-12-2012
whiteflags

int length = strlen(command)-1;
if(command[length] == '\n' && length != 0)

Forgive me for not posting this in the original context but this is still exhibiting out of bounds access. The undefined behavior can only be mitigated if the length != -1 check is performed first.

Order of execution always matters.
10-12-2012
laserlight

Quote:

Originally Posted by whiteflags

The undefined behavior can only be mitigated if the length != -1 check is performed first.

It is a little more complicated than that, methinks: strlen returns a size_t, which is an unsigned integer type, so if the return value is 0, then the subtraction would result in a large positive number. This is then converted to int, the conversion being implementation defined. Assuming that we do arrive at -1 in the end, the check that Mentallic wrote compares with 0, not -1, so it is wrong anyway.
10-12-2012
Mentallic

Quote:

Originally Posted by laserlight

You could read Nominal Animal's post #9 for a weird situation where strlen can still return 0 after reading with fgets. However, that aside, consider: separate from the issue of fgets, someone reading your code would ask if the removeNewLine function works correctly on an empty string. With your implementation, it does not. So you should either document that fact (e.g., with a comment, or maybe even an assertion), or just handle empty strings gracefully.

You make a good point.

Quote:

Originally Posted by whiteflags

int length = strlen(command)-1;
if(command[length] == '\n' && length != 0)

Forgive me for not posting this in the original context but this is still exhibiting out of bounds access. The undefined behavior can only be mitigated if the length != -1 check is performed first.

Order of execution always matters.

I'll change my code to

Code:

int length = strlen(command)-1; if (length > 0 && command[length] == '\n')

which looks like it'll handle NULL strings.
10-12-2012
Mentallic

Quote:

Originally Posted by laserlight

It is a little more complicated than that, methinks: strlen returns a size_t, which is an unsigned integer type, so if the return value is 0, then the subtraction would result in a large positive number. This is then converted to int, the conversion being implementation defined. Assuming that we do arrive at -1 in the end, the check that Mentallic wrote compares with 0, not -1, so it is wrong anyway.

Unsigned ints do that? Oh boy...

Ok, I'm going to leave strlen as it is then.

Code:

int length = strlen(command); if (length > 1 && command[length-1] == '\n') { command[length-1]='\0'; }
10-12-2012
laserlight

Yep. Why not just make length a size_t? Also, note that if you are replacing the last character of a string with a null character, then you have effectively reduced the length by 1. Thus, if you want to use the length variable later on, then for correctness you should subtract 1 from it too, hence the decrement in my example in post #21. In your case, you are writing it in a function that will not use length afterwards, so it is arguably okay.

Show 80 post(s) from this thread on one page