Hello to the board! Found myself writing some C over the weekend for the first time in a long while, and thought I'd share it with anyone who wants to take the time to pick it apart. The tool does its very limited job without any problems; I just want to use it as a teaching moment for myself, as a kind of entryway into best practices and so forth. Be as pedantic as you want; I won't take any offense, and I'm specifically looking to have this torn apart.
The setup here is that I'm trying to write a minimal-dependency Thrift protocol parser for one particular Thrift message type (a reply from a "check_service_status" call, guaranteed to have a string return type). If I were dealing with Thrift in general, there'd be much more work involved and I'd probably be pulling in libraries, but because the message name and type are both guaranteed, I can basically depend on every message having a fixed "prelude", and the job after that is just to decode the Thrift protocol's string encoding. The input is binary piped straight from the socket (via netcat or whatever), and the tool doesn't need to support any other Thrift features to do its job.
Here's the source:
Code:#include <stdbool.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h> // for strncmp
#define MAX_BUFFER 128
#define PRELUDE_LENGTH 35
// This should be the same every time, but it boils down to:
// (all big-endian)
// (2 bytes): thrift protocol version
// (2 bytes): message type (2 is "reply")
// (4 bytes): name size (20, length of "check_service_status")
// (20 bytes): ASCII for "check_service_status"
// (4 bytes): sequence id, always zero here
// response struct begins here
// (1 byte): field type, 11 is "string"
// (2 bytes): field id, always zero here
// Then follows a 4-byte size and the string, which we care about, then
// a field stop (0x00), which we don't.
char PRELUDE[] = {
0x80, 0x01, 0x00, 0x02,
0x00, 0x00, 0x00, 0x14,
0x63, 0x68, 0x65, 0x63,
0x6b, 0x5f, 0x73, 0x65,
0x72, 0x76, 0x69, 0x63,
0x65, 0x5f, 0x73, 0x74,
0x61, 0x74, 0x75, 0x73,
0x00, 0x00, 0x00, 0x00,
0x0b, 0x00, 0x00
};
bool cpu_is_le(void) {
int n = 1;
return *(char *)&n == 1;
}
uint32_t be32_to_cpu(const uint32_t num) {
return ((num>>24)&0xff) | ((num<<8)&0xff0000) | ((num>>8)&0xff00) | ((num<<24)&0xff000000);
}
int discard_prelude(FILE *file) {
char buffer[PRELUDE_LENGTH];
int read;
int i;
read = fread(buffer, sizeof(char), PRELUDE_LENGTH, file);
if (read != PRELUDE_LENGTH) {
fprintf(stderr, "Only read %d bytes\n", read);
return 16;
}
for(i = 0; i < PRELUDE_LENGTH; i++) {
if (PRELUDE[i] != buffer[i]) {
fprintf(stderr, "Buffer did not match prelude at %d: %d -- %d\n", i, buffer[i], PRELUDE[i]);
return 32;
}
}
return 0;
}
int string_field_value(char *buffer, FILE *file) {
uint32_t size;
int read;
if (fread(&size, sizeof(uint32_t), 1, file) != 1) {
fprintf(stderr, "Failed to read the string size\n");
return 8;
}
if(cpu_is_le()){
size = be32_to_cpu(size);
}
if(size > MAX_BUFFER - 1) {
fprintf(stderr, "Refusing to read %lu bytes\n", (long unsigned) size);
return 64;
}
read = fread(buffer, sizeof(char), size, file);
if ((uint32_t) read != size) {
fprintf(stderr, "String read terminated unexpectedly, %d bytes instead of %u\n", read, size);
return 128;
}
buffer[size] = (char) 0; // We are told it is a string, so null-terminate it
return 0;
}
int status(const char *full_status) {
// Returns 0 when the status string starts with "OK"; comparing in
// place avoids copying into a fixed-size scratch buffer
return strncmp(full_status, "OK", 2);
}
int main() {
char buffer[MAX_BUFFER];
int discard_return_status = discard_prelude(stdin);
if (discard_return_status != 0) {
return discard_return_status;
}
int field_value_status = string_field_value(buffer, stdin);
if (field_value_status != 0) {
return field_value_status;
}
printf("%s", buffer);
return status(buffer);
}
Here's some specific questions I have:
- How cross-platform is this? Am I using the guaranteed-size type specifier (uint32_t) in the right place, and is my runtime-based big/little-endian detection method reasonable?
- Am I doing anything unsafe? Since all my char arrays are declared with specific bounds, and since I'm explicitly sticking a null terminator on the read string, it seems like I avoid all the traditional vectors for buffer overflows, but I'd like to know if I'm mistaken there.
- Is there any specific binary input I should try throwing at this to make it act in weird ways?
- Are there any great resources for coding standards for making good C that go beyond teaching the language itself and more into how to make robust code that handles failure modes well and is easy to read?
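One follow-up thought on the endianness question, in case it helps the discussion: an alternative I've been weighing is to decode the length from raw bytes with shifts instead of reading into a uint32_t and conditionally swapping. This is just a sketch (the helper name is my own invention), but shifts operate on values rather than memory layout, so it should give the same answer on any host without the runtime check:

```c
#include <stdint.h>

/* Decode a big-endian 32-bit value from four raw bytes. No endianness
 * detection needed: the shifts are defined on values, not on how the
 * host lays out integers in memory. */
static uint32_t be32_decode(const unsigned char b[4]) {
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}
```

The caller would fread into an `unsigned char[4]` and call this, instead of freading directly into the uint32_t.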
Thanks!