The suggestion by Adak causes a buffer underrun bug. It reads a byte just before the buffer, and if that byte equals 10, it replaces it with a zero.
My suggestion does not have that bug. I would also be happy with
Code:
int len = strlen(line);
if (len > 0 && line[len - 1] == '\n')
    line[--len] = '\0';
which, due to the short-circuit evaluation rules in C, does not try to access the byte just before the buffer, either. And len ends up with the correct value in all cases (even when the line was longer than the available buffer and was only partially read).
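To make the edge cases easy to check, the same idiom can be wrapped in a small function. This is only a sketch: the name trim_newline() and its signature are my own illustration and not part of anyone's posted code, and it assumes line is not NULL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (name and signature are illustrative only).
 * Removes one trailing '\n', if present, and returns the new length.
 * Assumes line != NULL. */
size_t trim_newline(char *line)
{
    size_t len = strlen(line);

    /* len > 0 is tested first. When it is false, the && operator
     * short-circuits and line[len - 1] is never evaluated, so the
     * byte just before the buffer is never read. */
    if (len > 0 && line[len - 1] == '\n')
        line[--len] = '\0';

    return len;
}
```

An empty string and a lone "\n" both come back with length 0, and a partially read line without a trailing newline is left untouched.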
Right. It will always be a NUL character. It is safe to replace it with another NUL character.
(... assuming line != NULL, which should be checked right after the fgets() call anyway.)
My code does not try to access the char just before the buffer, though.
Given that input, my code yields *line == '\0' (and strlen(line) == 0), i.e. an empty string, but nothing bad happens.
To be honest, I did first write the proper solution, using POSIX.1-2008 getline(), but I thought it would either scare the OP, or I'd just get some negative feedback about it, so I rewrote my reply before posting it.
Code:
char *data = NULL;
size_t size = 0;
char *line;
size_t len;
ssize_t full;

/* Some form of input loop .. */
while (1) {
    full = getline(&data, &size, stream);   /* stream: placeholder for your open FILE * */
    if (full < (ssize_t)1)
        break;      /* end of file or error */

    /* Input sanitization; clean() returns nonzero for suspect input */
    if (clean(data, (size_t)full, &line, &len))
        continue;   /* input is suspect: ignore, warn, or abort */

    /* Ignore empty lines (and comments, if cleaned) */
    if (len < 1)
        continue;

    /* Now have len chars at line. */
}

free(data);
data = NULL;
size = 0;

The input sanitization function,
Code:
int clean(char *const input, const size_t inputlen, char **const result, size_t *const resultlen);
takes the inputlen bytes of input in input, removes unwanted characters or at least checks the string, saves the start of the contents and the length of the contents via the last two pointers supplied by the user (unless they are NULL), and returns 0 if the input was safe, and nonzero otherwise.
The implementation details are very domain-specific. You might want to simply remove all embedded NUL bytes and control characters, and replace consecutive whitespace with a single space; this works well with semi-interactive human input, say a game. In other cases you might wish to replace NULs and (some) other ASCII control characters with escape codes (for example, ASCII NULs with "\\0", or with "&#0;", the numeric HTML character reference for the ASCII NUL character).
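As a rough illustration of the first variant (drop embedded NULs and control characters, collapse whitespace runs into a single space), here is a minimal sketch of one possible clean(). The policy choices are assumptions of mine, not a definitive implementation: it works in place, also trims leading and trailing whitespace, treats only a NULL input as unsafe, and assumes the buffer has room for inputlen + 1 bytes (which holds for a buffer filled by getline()).

```c
#include <stddef.h>

/* Sketch only: cleans input in place. Assumes input has room for
 * inputlen + 1 bytes. Returns 0 if the input was safe, nonzero otherwise
 * (here, only a NULL input counts as unsafe; that is an assumption). */
int clean(char *const input, const size_t inputlen,
          char **const result, size_t *const resultlen)
{
    size_t in, out = 0;
    int pending_space = 0;

    if (input == NULL)
        return 1;

    for (in = 0; in < inputlen; in++) {
        const unsigned char c = (unsigned char)input[in];

        if (c == ' ' || c == '\t' || c == '\n' ||
            c == '\v' || c == '\f' || c == '\r') {
            /* Whitespace: emit nothing yet; leading whitespace is dropped. */
            if (out > 0)
                pending_space = 1;
        } else if (c < 32 || c == 127) {
            /* Embedded NUL or other ASCII control character: drop it. */
        } else {
            /* A run of whitespace between words becomes one space. */
            if (pending_space) {
                input[out++] = ' ';
                pending_space = 0;
            }
            input[out++] = (char)c;
        }
    }

    /* out never exceeds inputlen, so this stays inside the buffer. */
    input[out] = '\0';

    if (result != NULL)
        *result = input;
    if (resultlen != NULL)
        *resultlen = out;

    return 0;
}
```

Because the write position never gets ahead of the read position, the cleanup needs no extra memory. A stricter policy could return nonzero whenever a control character is seen instead of silently removing it.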
If there is interest, I'd be happy to show a couple of practical examples.