Critique me!

Printable View

09-12-2017
tehwilsonator

Critique me!
Hello to the board! Found myself writing some C over the weekend for the first time in a long time, and thought I'd shoot it over to anyone who wants to take the time to pick it apart. The tool I've written doesn't have any problem doing its very limited functionality, I just want to use it as a teaching moment for myself as kind of an entryway into best-practices and so forth. Be as pedantic as you want, I won't take any offense and I'm specifically looking to have this torn apart.

The setup here is that I'm trying to write a minimal-dependency thrift protocol parser for a particular thrift message type (a reply from a "check_service_status call, guaranteed to be a string return type). If I were dealing with thrift in general, there'd be much more work involved and I'd probably be pulling in libraries, but because the message name and type are both guaranteed, I can basically depend on every translated message having a set "prelude", and the job after that is just to translate the thrift protocol string encoding. The input is just binary piped straight from the socket (via netcat or whatever), and doesn't need to support any other features of thrift to do its job.

Here's the source:

Code:

#include <stdbool.h> #include <stdio.h> #include <stdint.h> #define MAX_BUFFER 128 #define PRELUDE_LENGTH 35 // This should be the same every time, but it boils down to: // (all big-endian) // (2 bytes): thrift protocol version // (2 bytes): message type (2 is "reply") // (4 bytes): name size (20, length of "check_service_status") // (20 bytes): ASCII for "check_service_status" // (4 bytes): sequence id, always zero here // response struct begins here // (1 byte): field type, 11 is "string" // (2 bytes): field id, always zero here // Then follows a 4-byte size and the string, which we care about, then // a field stop (0x00), which we don't. char PRELUDE[] = { 0x80, 0x01, 0x00, 0x02, 0x00, 0x00, 0x00, 0x14, 0x63, 0x68, 0x65, 0x63, 0x6b, 0x5f, 0x73, 0x65, 0x72, 0x76, 0x69, 0x63, 0x65, 0x5f, 0x73, 0x74, 0x61, 0x74, 0x75, 0x73, 0x00, 0x00, 0x00, 0x00, 0x0b, 0x00, 0x00 }; bool cpu_is_le() { int n = 1; return *(char *)&n == 1; } uint32_t be32_to_cpu(const uint32_t num) { return ((num>>24)&0xff) | ((num<<8)&0xff0000) | ((num>>8)&0xff00) | ((num<<24)&0xff000000); } int discard_prelude(FILE *file) { char buffer[PRELUDE_LENGTH]; int read; int i; read = fread(buffer, sizeof(char), PRELUDE_LENGTH, file); if (read != PRELUDE_LENGTH) { printf("Only read %d bytes\n", read); return 16; } for(i = 0; i < PRELUDE_LENGTH; i++) { if (!(PRELUDE[i] == buffer[i])) { printf("Buffer did not match prelude at %d: %d -- %d\n", i, buffer[i], PRELUDE[i]); return 32; } } } int string_field_value(char *buffer, FILE *file) { uint32_t size; int read; fread(&size, sizeof(uint32_t), 1, file); if(cpu_is_le()){ size = be32_to_cpu(size); } if(size > MAX_BUFFER - 1) { printf("Refusing to read %lu bytes\n", (long unsigned) size); return 64; } read = fread(buffer, sizeof(char), size, stdin); if ((uint32_t) read != size) { printf("Size read terminated unexpectedly, %d instead of %d\n", read, size); return 128; } buffer[size] = (char) 0; // We are told it is a string, so null-terminate it return 0; } int status(char *full_status) { char status_sub[2]; sprintf(status_sub, "%.*s", 2, full_status); return strcmp(status_sub, "OK"); } int main() { char buffer[MAX_BUFFER]; uint32_t size; char *read_string; int discard_return_status = discard_prelude(stdin); if (discard_return_status != 0) { return discard_return_status; } int field_value_status = string_field_value(buffer, stdin); if (field_value_status != 0) { return field_value_status; } printf("%s", buffer); return status(buffer); }

Here's some specific questions I have:
1. How cross-platform is this? Am I using the guaranteed-size type specifier (uint32_t) in the right place, and is my runtime-based big/little-endian detection method reasonable?
2. Am I doing anything unsafe? Since all my char arrays are defined with specific bounds, and since I'm explicitly sticking a null zero terminator on the read string, it seems like I miss all the traditional vectors for buffer overflows, but I'd like to know if I'm mistaken there.
3. Is there any specific binary input I should try throwing at this to make it act in weird ways?
4. Are there any great resources for coding standards for making good C that go beyond teaching the language itself and more into how to make robust code that handles failure modes well and is easy to read?
Thanks!
09-12-2017
Salem

> char PRELUDE[]
You should say
char PRELUDE[PRELUDE_LENGTH]
- too many initialisers will get you an error
- too few will mean your code won't fail mysteriously when accessing it in a loop.
- uppercase symbols are conventionally reserved for macros only.

> bool cpu_is_le()
This isn't portable (or even necessary).
Question 20.9
Question 12.42
If you build your int's one byte at a time directly from the stream, you don't care about endian at all.
Stuffing all the bytes directly in an int, then wondering how to rearrange them is the wrong way.

> return 16;
Too many magic numbers.
Create an enum / #define for your error codes.
Also, what can portably be returned from main consists of 0, EXIT_SUCCESS and EXIT_FAILURE.
Your internal error codes might not be meaningful to the outside environment.

> int string_field_value(char *buffer, FILE *file)
Get into the habit of also passing your buffer size along with any buffer argument.
In a library, there would be no convenient #define for the actual buffer size - and if you did specify a max, people would find it hard to use.

> fread(&size, sizeof(uint32_t), 1, file);
Check the return result.

> int status(char *full_status)
Making this a const pointer helps to inform (and ensure) that the function doesn't modify the string.

> char status_sub[2];
> sprintf(status_sub, "%.*s", 2, full_status);
This is a buffer overflow!
sprintf will copy 2 chars and append a \0

> return strcmp(status_sub, "OK");
You may as well have done this and be done with it.
return strncmp(full_status, "OK", 2);
09-12-2017
tehwilsonator

Quote:

Originally Posted by Salem

> char PRELUDE[]
You should say
char PRELUDE[PRELUDE_LENGTH]
- too many initialisers will get you an error
- too few will mean your code won't fail mysteriously when accessing it in a loop.
- uppercase symbols are conventionally reserved for macros only.

> bool cpu_is_le()
This isn't portable (or even necessary).
Question 20.9
Question 12.42
If you build your int's one byte at a time directly from the stream, you don't care about endian at all.
Stuffing all the bytes directly in an int, then wondering how to rearrange them is the wrong way.

This is good to hear, especially since the Google result I was working from said exactly the opposite (i.e. that shoving directly into an int is the right way). But could you elaborate on "building one byte at a time directly from the stream"? Something like this?

Code:

uint32_t result_i32; char *some_stream; for (i = 0; i < 4; i++) { result_i32 += some_stream[i] << i; }

Quote:

Originally Posted by Salem

> return 16;
Too many magic numbers.
Create an enum / #define for your error codes.
Also, what can portably be returned from main consists of 0, EXIT_SUCCESS and EXIT_FAILURE.
Your internal error codes might not be meaningful to the outside environment.

Awesome. To clarify, are EXIT_SUCCESS and EXIT_FAILURE macros that already exist in some header somewhere? Also, is EXIT_SUCCESS just a portable way to refer to zero in most cases?

Quote:

Originally Posted by Salem

> int status(char *full_status)
Making this a const pointer helps to inform (and ensure) that the function doesn't modify the string.

Weirdly, this was a const pointer to begin when I started writing it, but then it disappeared in an iteration. Although of course strncmp makes it unnecessary to begin with, but thanks for the pointer.

Quote:

Originally Posted by Salem

> char status_sub[2];
> sprintf(status_sub, "%.*s", 2, full_status);
This is a buffer overflow!
sprintf will copy 2 chars and append a \0

You may as well have done this and be done with it.
return strncmp(full_status, "OK", 2);

Thanks for that, I was looking for exactly that and an errant Googling into a StackOverflow post gave me the other thing. JOY.
09-12-2017
tehwilsonator

Oh, and you mention that uppercase symbols are conventionally reserved for macros, is there a convention for global constants that aren't macros? Alternatively, would it be more conventional (is it even possible?) to make that into a macro array literal rather than the char * it is right now?

(EDIT: this is going to look weird until the moderator approves my other reply, but this is an addendum to that. Also, it evidently moderates it again when I edit out typos, so please forgive things that are uncorrected, that's why. For example, the somevalue << i was meant to be somevalue << i*8)
09-12-2017
Salem

> "building one byte at a time directly from the stream"? Something like this?
Yes, read the links I gave you.

> To clarify, are EXIT_SUCCESS and EXIT_FAILURE macros that already exist in some header somewhere?
EXIT_SUCCESS, EXIT_FAILURE - cppreference.com

> is there a convention for global constants that aren't macros?
Not so much.
Some people prefix all global variables with say 'g'.

#define PRELUDE "\x80\x01\x00\x02" // you get the idea
09-12-2017
tehwilsonator

>Yes, read the links I gave you.

After my post, I read them more carefully and was enlightened, but I appreciate your patience in re-pointing me to the links in any case. For thread posterity, what finally gave me the least compiler warnings was to use getc rather than fread, and use a bitwise | accumulator (as in the example):

Code:

//i, size_buffer, size defined previous to this for(i = sizeof(uint32_t) - 1; i >= 0; i--) { size_buffer = getc(file); if (size_buffer == EOF) { printf("Size read failed"); return 1; } size |= ((uint32_t) size_buffer) << i * 8; }

And, in general, thanks for the tear-down, it helped a lot.
09-13-2017
tehwilsonator

My original reply to this got re-moderated after an edit and it's pretty confusing to review, any mods wanna poke that?