Thread: Community Code review - let me have it!

  1. #1
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332

    Community Code review - let me have it!

    Here's some code I wrote this evening, gearing up for my bigger MySQL driver application.

    I'm writing a driver to read in a series of SQL statements from a text file, parse them using a semicolon as a delimiter, and then identifying them as either a SELECT clause or not.

    It works fine right now. I'm putting it out here for a critique and to learn better practices.

    If you have the time, I would appreciate your feedback. Here's the code and a small input file to test with.

    Code:
    #include <stdio.h>
    #include <string.h> 
    #include <stdlib.h> 
    
    #define FILENAME "/Users/toddburch/Desktop/sql.txt"
    
    char * readSQL(FILE * fp) {   // File is already open 
    	
    	char buffer[32000] = {0} ;   // SQL statement buffer - assume it's big enough 
    	char line[81] = {0} ;     //  fgets buffer 
    	
    	char * ptr ; 
    	char * end_of_line ; 
    	
    	// Read data from the input file until a semicolon is found or end of file is hit 
    	
    	while (fgets(line, sizeof(line), fp) ) { 
    		if ( strncmp(line, "--", 2) == 0 ) continue ;   // skip comments 
    		if ( (end_of_line = strrchr(line, '\n')) != NULL ) *end_of_line = 0 ; // Remove newline  
    		if ( (end_of_line = strrchr(line, '\r')) != NULL ) *end_of_line = 0 ; // Remove cr   
    		ptr = line ; 
    		while (*ptr == ' ') ptr++ ;   // index past leading blanks 
    		if (strlen(ptr) == 0) continue ;  
    
    		// look for trailing semi colon - that terminates a statement 
    		end_of_line = ptr + strlen(ptr)-1 ;   // position to last character 
    		while (*end_of_line == ' ') *end_of_line-- = 0  ;  // back up to last non-blank 
    		if (*end_of_line == ';') { 
    			strcat(buffer, ptr) ;   // append to buffer 
    			break ; 
    		} 
    		else {
    			strcat(buffer, ptr) ;   // append to buffer 
    			strcat(buffer, " ") ;   // separate it 
    		}
    	} 
    	if ( strlen(buffer) == 0 ) return NULL ; 
    	
    	// get some storage and pass back a pointer to it 
    	ptr = malloc(strlen(buffer)+1) ;  
    	if (!ptr) return NULL ; 
    	strcpy(ptr,buffer)  ; 
    	return ptr ; 
    }
    
    int isSelect(char * sql) {  // Determine if this is a SELECT statement 
    	char word[] = "SELECT" ; 
    	int i ; 
    	if (strlen(sql) < 7) return 0 ;  // no match 
    	for (i = 0 ; i < 6 ; i++ ) if ( word[i] != toupper(sql[i]) ) return 0 ;  // no match 
    	return 1 ; // match 
    } 
    
    int main (int argc, const char * argv[]) {
    
    	FILE * fp ; 
    	char * sql ;  
    	fp = fopen(FILENAME , "r") ; 
    	if (!fp) { 
    		printf("cannot open input file " FILENAME "\n") ; 
    		return -1 ; 
    	} 
    	
    	while (1) { 
    		sql = readSQL(fp) ; 
    		if (!sql) break ; 
    		if (isSelect(sql)) printf("Select Statement!\n") ; 
    		else printf("Not a select Statement!\n") ;
    		
    		printf("SQL = >%s<\n\n", sql) ; 
    		free(sql) ; 
    	} 
    	
    	fclose(fp) ; 
    	
        return 0;
    }
    Input file. Lines starting with "--" are comment lines.

    Code:
     select * from table where (col1 = 5 ) ; 
    -- this is a comment 
    -- 
    
           insert into table values (1, ';')   
    ;
    
          select name, age, sex, phone from othertable order by 
    --  age
    -- sex 
          name;
    --
    I'm not validating SQL, just splitting the input into individual statements.

    Thanks, Todd
    Mainframe assembler programmer by trade. C coder when I can.

  2. #2
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    parameter of isSelect I would make const char*
    The word I would make "SELECT " to be sure the select is a whole word

    in the readSQL you are actually droping \n, \r and spaces at the and of line - but to do it you are walking to the end of line 3 times
    I would change it in the one iteration like
    Code:
    end_of_line = ptr + strlen(ptr)-1 ;   // position to last character 
    while (*end_of_line == ' ' || *end_of_line == '\n' || *end_of_line == '\r')
    {
      *end_of_line-- = 0  ;  // back up to last non-blank 
      if(end_of_line < ptr) break; /* maybe we get to the beginning of the line? */
    }
    also you are using the result value of readSQL only in the loop - in this case I would get rid of malloc/free - moving the buffer iutside the readSQL and providing it as a parameter (with buffer length)
    All problems in computer science can be solved by another level of indirection,
    except for the problem of too many layers of indirection.
    – David J. Wheeler

  3. #3
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Code:
    int isSelect(char * sql) {  // Determine if this is a SELECT statement 
    	char word[] = "SELECT" ; 
    	int i ; 
    	if (strlen(sql) < 7) return 0 ;  // no match 
    	for (i = 0 ; i < 6 ; i++ ) if ( word[i] != toupper(sql[i]) ) return 0 ;  // no match 
    	return 1 ; // match 
    }
    Why not use strnicmp() instead, like this:
    Code:
    int isSelect(char * sql) {  // Determine if this is a SELECT statement 
    	static const char word[] = "SELECT" ; 
            return !strnicmp(word, sql, strlen(word);
    }
    Using static const means that the string isn't being initialized every call.


    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  4. #4
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    Maybe you could write strnicmp yourself to be ultra-portable.

  5. #5
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by citizen View Post
    Maybe you could write strnicmp yourself to be ultra-portable.
    Good point, strnicmp is not available in ALL systems, but most do have it. Writing your own isn't a difficult task [just take the loop from your original code and wrap it in a new function, although you should of course do "toupper" (or tolower) on both sides)].

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  6. #6
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    All excellent suggestions! Will do ASAP. I'll have to write my own strnicmp - not available on a Mac. Good idea.

    vart - you are absolutely correct. I struggled when writing to be sure and not back up past the start of the line, but your suggestion is quite succinct.

    I haven't spent a lot of time studying type modifiers. I need to reread that chapter.

    I'll change it and repost.

    Thanks again! Todd
    Mainframe assembler programmer by trade. C coder when I can.

  7. #7
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Here's the updated code, major new bits in red.

    Code:
    #include <stdio.h>
    #include <string.h> 
    #include <stdlib.h> 
    
    #define FILENAME "/Users/toddburch/Desktop/sql.txt"
    
    int possibleOverrun(const char * s1, const char * s2, const int size) { 
    	if (strlen(s1) + strlen(s2) > size) { 
    		fprintf(stderr, "Overrun avoided - SQL statement ignored\n") ; 
    		return 1 ; 
    	} 
    	return 0 ;  // no overrun 
    }  
    
    
    char * readSQL(FILE * fp, char * buffer, const int size ) {   // File is already open 
    	
    	char line[81] = {0} ;     //  fgets buffer 
    	char * ptr ; 
    	char * end_of_line ; 
    	
    	buffer[0] = 0 ;  // Set to empty 
    	
    	// Read data from the input file until a semicolon is found or end of file is hit 
    	
    	while (fgets(line, sizeof(line), fp) ) { 
    		if ( strncmp(line, "--", 2) == 0 ) continue ;   // skip comments 
    
    		ptr = line ; 
    		end_of_line = ptr + strlen(ptr)-1 ;   // position to last character 
    		while (*end_of_line == ' ' || *end_of_line == '\n' || *end_of_line == '\r')
    		{
    			*end_of_line-- = 0  ;  // back up to last non-blank 
    			if (end_of_line < ptr) break; /* maybe we get to the beginning of the line? */
    		} 
    		
    		while (*ptr == ' ') ptr++ ;   // index past leading blanks 
    		if (strlen(ptr) == 0) continue ;  
    
    		// look for trailing semi colon - that terminates a statement 
    		if (*end_of_line == ';') { 
    			if (possibleOverrun(buffer, ptr, size)) return NULL ;
    			
    			strcat(buffer, ptr) ;   // append to buffer 
    			break ; 
    		} 
    		else {
    			if (possibleOverrun(buffer, ptr, size-1 )) return NULL ; // allow for blank 
    			strcat(buffer, ptr) ;   // append to buffer 
    			strcat(buffer, " ") ;   // separate it 
    		}
    	} 
    	if ( strlen(buffer) == 0 ) return NULL ; 
    	return buffer ; 
    }
    
    int strnicmp(const char * s1, const char * s2, const size_t length) { 
    
    // Return 0 if the strings match 
    // Return i of s1 != s2, where i is the differing character relative to 1. 
    
    	int i ; 
    	char c1, c2 ; 
    
    	for (i = 0 ; i < length ; i++) { 
    		c1 = toupper(s1[i]) ; 
    		c2 = toupper(s2[i]) ; 
    		if ( c1==0 && c2==0) return 0 ;      // match 
    		else if (c1==0 || c2==0) return 1 ;  // no match
    		else if ( c1 == c2 ) continue ;      // so far, so good 
    		else return i+1 ;           // Return position of the first non-matching char
    	}
    	return 0 ; // match 
    
    } 
    
    int isSelect(const char * sql) {  // Determine if this is a SELECT statement 
    	static const char keyword[] = "SELECT " ; 
    	return !strnicmp(sql, keyword, strlen(keyword) ) ; 
    } 
    
    int main (int argc, const char * argv[]) {
    
    	FILE * fp ; 
    	char * sql ;  
    	char * buffer ;
    	int sqlcount = 0 ;
    	static const int bufsize = 32768 ;  
    
    	if (( buffer = malloc(bufsize) ) == NULL) { 
    		fprintf(stderr, "Cannnot get buffer for SQL.  Exiting.\n") ; 
    		return -1 ; 
    	} 
     
    	fp = fopen(FILENAME , "r") ; 
    	if (!fp) { 
    		fprintf(stderr, "Cannot open input file " FILENAME "\n") ; 
    		return -1 ; 
    	} 
    	while (1) { 
    		sql = readSQL(fp, buffer, bufsize ) ; 
    		if (!sql) break ; 
    		sqlcount++ ;                  // Bump count of SQL statements processed  
    		if (isSelect(sql)) printf("Select Statement!\n") ; 
    		else printf("Not a select Statement!\n") ;
    		
    		printf("SQL = >%s<\n\n", sql) ; 
    	} 
     
    	printf("%d SQL statements processed.\n", sqlcount) ; 
    	
    	free(buffer) ;  
    	fclose(fp) ; 
    	
        return 0;
    }
    Mainframe assembler programmer by trade. C coder when I can.

  8. #8
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Any particular reason you are malloc'ing the buffer now, when you weren't before?

    Also, something that I should have noted before:
    is there any reason why you need to use strcat() to append the data - presumably we could just copy [using strctpy for example] the latest data into the current buffer, as long as you track how much you have - and that would also affect the "possiblerOverRun()", in that you'd pass in "how much I've got so far" and then only do strlen on the bit you are adding.

    perhaps you should rename possibleOverrun to checkForOverrun instead - that would make it more along the lines of what the function actually does.

    This line is repeated:
    Code:
    			strcat(buffer, ptr) ;   // append to buffer
    Try to not repeat the same bit of code.


    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  9. #9
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Thanks matsp.

    I was malloc'ing only what was needed for a buffer to pass the SQL back before, now I just malloc the whole thing and hold on to it. ?? 6 one way, 1/2 a dozen the other.

    I've rearranged the (now) checkForOverrun and the subsequent strcat(buffer,ptr) statement to place them prior to that if logic. Good suggestion - I too dislike redundant code.

    Code:
    		if (checkForOverrun(buffer, ptr, size-1 )) return NULL ; // allow for potential blank
    		strcat(buffer, ptr) ;   // append to buffer 
    
    		// look for trailing semi colon - that terminates a statement 
    		if (*end_of_line == ';') { 
    			break ; 
    		} 
    		else {
    			strcat(buffer, " ") ;   // separate it 
    		}
    As far as why am I strcat'ing - and not keeping track of a running pointer and just strcpy'ing? Laziness. It's just easier and less things for me to have to keep track of - at the expense of a nit in overhead. I won't be processing millions of these things. That's my logic anyway!
    Mainframe assembler programmer by trade. C coder when I can.

  10. #10
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Sure, it's your code, I'm just allergic to code that does unnecessary "find the zero at the end of this string", and making a function that appends your string using it's current "last entry" would not require much extra effort, and you can sort of combine it with checkForOverRun(), something like this:
    Code:
    int concat(char *buf, int *pos, int size, char *str)
    {
          int len = strlen(str);
          if (len + *pos  > size) 
          { 
              Overrun error;
              return 1; 
          }
          strcpy(&buf[pos], str);
          (*pos) += len;
          
          return 0;      
    }
    ...
    
         int usedSize = 0;
    ...
    		if (concat(buffer, &usedSize, ptr, size-1 )) return NULL ; // allow for potential blank
    
    		// look for trailing semi colon - that terminates a statement 
    		if (*end_of_line == ';') { 
    			break ; 
    		} 
    		else {
    			if (concat(buffer, &usedSize, " ", size)) return NULL;   // separate it 
    		}
    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  11. #11
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Yes, easy enough. I implemented it. Thanks again.
    Mainframe assembler programmer by trade. C coder when I can.

  12. #12
    Cogito Ergo Sum
    Join Date
    Mar 2007
    Location
    Sydney, Australia
    Posts
    463
    Where's Elysia's input
    =========================================
    Everytime you segfault, you murder some part of the world

  13. #13
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    I didn't use void main(), gets() or scanf(), and did code "return 0", so there was nothing to comment on in that department!
    Mainframe assembler programmer by trade. C coder when I can.

  14. #14
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    I didn't use void main(), gets() or scanf(), and did code "return 0", so there was nothing to comment on in that department!
    Indentation?
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  15. #15
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Yep, did that right too! lol
    Mainframe assembler programmer by trade. C coder when I can.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Code Review for upcoming contest
    By Darryl in forum Contests Board
    Replies: 2
    Last Post: 02-28-2006, 02:39 PM
  2. Binary Search Trees Part III
    By Prelude in forum A Brief History of Cprogramming.com
    Replies: 16
    Last Post: 10-02-2004, 03:00 PM
  3. Problem : Threads WILL NOT DIE!!
    By hanhao in forum C++ Programming
    Replies: 2
    Last Post: 04-16-2004, 01:37 PM
  4. True ASM vs. Fake ASM ????
    By DavidP in forum A Brief History of Cprogramming.com
    Replies: 7
    Last Post: 04-02-2003, 04:28 AM
  5. Interface Question
    By smog890 in forum C Programming
    Replies: 11
    Last Post: 06-03-2002, 05:06 PM