Thread: Segfault with simple lexer

  1. #1
    Registered User
    Join Date
    Dec 2012
    Posts
    34

    Segfault with simple lexer

    Hi all -

    I'm trying to get this lexer to work but it gives a segfault at the moment. Here's the code -

    Code:
    /* lex.c */ 
    /* This code is released to the public domain. */ 
    /* "Share and enjoy......"  :)     */ 
    
    #include <stdio.h>
    #include <ctype.h>
    #include <string.h>
    #include <stdlib.h>
    
    
    /* Array of our keywords in string form. */ 
    char *kw_strings[10] = { 
       "select", "from", "where", "and", "or", "not", "in", "is", "null" 
        } ; 
       
        
    /*  Search function to search the array of keywords. */ 
    int search(char *arr[], int dim, char *str) { 
            
        int i;      
        int found_match;
        
        for (i=0; i<dim; i++) { 
            if ( !strcmp(arr[i], str ) )  {   
                found_match = 1;        
                break; 
        }   else found_match = 0;    
     }  /* For */     
    
        return found_match; 
    }  /* search */ 
    
    
    void print_token(char *str) {       
        char token[80];
        char *toktype;    
        int i=0;
        char c;      
    
    while (*str != '\0') { 
        c = str[i];
                  
        /* Keyword or identifier */ 
        if (isalpha(c) || c == '_') {        
           while (isalnum(c) || c == '_') { 
                 token[i] = c; 
                 str++;  i++;
           }
             
       if (search(kw_strings, 10, token) == 1 ) toktype = "Keyword";
       else toktype "Identifier" ;                                         
           printf("%s %s\n", toktype, token);            
        }    
        
        else if (ispunct(c)) { 
            toktype = "operator"; 
            printf("%s %s\n", toktype, token);          
            str++; i++;
        }    
        else if (isspace(c))  { 
           str++;  i++;    
        }      
       str++; i++;       
      }  // while    
                                                   
    }  // print_token 
    
    
    int main()  { 
        
    char *mystr = "select var1 from mytable ; " ; 
       
    print_token(mystr);
    
    return 0; 
    }
    If I can just get this cut-down lexer to work, getting the full version up and running should be no problem.
    I'll put it on Github for others to use too.
    Many thanks in advance for any help received!

    Cheers -
    Andy (latte123)

  2. #2
    Registered User
    Join Date
    May 2009
    Posts
    4,183
    1. Indent your code according to some style.
    2. Init your pointers to NULL or some other value
    char *toktype;
    3. Turn on Compiler warnings

    Tim S.
    "...a computer is a stupid machine with the ability to do incredibly smart things, while computer programmers are smart people with the ability to do incredibly stupid things. They are,in short, a perfect match.." Bill Bryson

  3. #3
    Registered User
    Join Date
    May 2009
    Posts
    4,183
    Code:
    /* lex.c */
    /* This code is released to the public domain. */
    /* "Share and enjoy......"  :)     */
    
    #include <stdio.h>
    #include <ctype.h>
    #include <string.h>
    #include <stdlib.h>
    
    
    /* Array of our keywords in string form. */
    char *kw_strings[10] =
    {
        "select", "from", "where", "and", "or", "not", "in", "is", "null"
    } ;
    
    
    /*  Search function to search the array of keywords. */
    int search(char *arr[], int dim, char *str)
    {
    
        int i;
        int found_match;
    
        for (i=0; i<dim; i++)
        {
            if ( !strcmp(arr[i], str ) )
            {
                found_match = 1;
                break;
            }
            else
                found_match = 0;
        }  /* For */
    
        return found_match;
    }  /* search */
    
    
    void print_token(char *str)
    {
        char token[80];
        char *toktype;
        int i=0;
        char c;
    
        while (*str != '\0')
        {
            c = str[i];
    
            /* Keyword or identifier */
            if (isalpha(c) || c == '_')
            {
                while (isalnum(c) || c == '_')
                {
                    token[i] = c;
                    str++;
                    i++;
                }
    
                if (search(kw_strings, 10, token) == 1 )
                    toktype = "Keyword";
                else
                    toktype "Identifier" ;
                printf("%s %s\n", toktype, token);
            }
    
            else if (ispunct(c))
            {
                toktype = "operator";
                printf("%s %s\n", toktype, token);
                str++;
                i++;
            }
            else if (isspace(c))
            {
                str++;
                i++;
            }
            str++;
            i++;
        }  // while
    
    }  // print_token
    
    
    int main()
    {
    
        char *mystr = "select var1 from mytable ; " ;
    
        print_token(mystr);
    
        return 0;
    }
    Code:
    C:\Users\stahta01\devel\open_source_code\no_version_control\Test\testsql\main.c:64:17: warning: statement with no effect [-Wunused-value]
                     toktype "Identifier" ;
                     ^
    C:\Users\stahta01\devel\open_source_code\no_version_control\Test\testsql\main.c:64:25: error: expected ';' before string constant
                     toktype "Identifier" ;
    Tim S.

  4. #4
    misoturbutc Hodor
    Join Date
    Nov 2013
    Posts
    1,787
    This while loop (below) is going to run straight past the end of the string (str): c is never re-read inside the loop, so the condition never sees the terminating '\0' character. As an added bonus, because str is going out of bounds, so will token[i] (most likely).

    Code:
            /* Keyword or identifier */
            if (isalpha(c) || c == '_')
            {
                while (isalnum(c) || c == '_')
                {
                    token[i] = c;
                    str++;
                    i++;
                }

  5. #5
    Registered User
    Join Date
    Dec 2012
    Posts
    34
    Quote: Originally posted by Hodor
    This while loop (below) is going to run straight past the end of the string (str): c is never re-read inside the loop, so the condition never sees the terminating '\0' character. As an added bonus, because str is going out of bounds, so will token[i] (most likely).

    Code:
            /* Keyword or identifier */
            if (isalpha(c) || c == '_')
            {
                while (isalnum(c) || c == '_')
                {
                    token[i] = c;
                    str++;
                    i++;
                }

    Hi - thanks Hodor and stahta01 - very helpful!

    As an aside (and an experiment), I've also been trying the alternative approach below (which looks very promising). It would certainly be easier to maintain -

    Code:
    #include <stdio.h>
    #include <ctype.h>
    #include <string.h>
    #include <stdlib.h>
    
    
    void lex_kwident(char *str, int i)  { 
      printf("I am at pos. %d in string %s\n", i, str);     
    }    
    
    
    void lex_string(char *str, int i) { 
      printf("I am at pos. %d in string %s\n", i, str);        
    }    
    
    
    void lex_number(char *str, int i) { 
      printf("I am at pos. %d in string %s\n", i, str);        
    }    
        
    
    void lex_punct(char *str, int i) { 
      printf("I am at pos. %d in string %s\n", i, str);            
    }     
    
    
    void lex_space(char *str, int i) { 
      printf("I am at pos. %d in string %s\n", i, str);    
    }     
    
    
    void lex(char *str) { 
        int i=0; 
        char c; 
    
    while (*str != '\0') { 
        c = str[i];
                  
        /* Keyword or identifier */ 
        if (isspace(c))                   lex_space(str, i); 
        else if (isalpha(c) || c == '_')  lex_kwident(str, i); 
        else if ( c == '"')               lex_string(str, i); 
        else if (isdigit(c))              lex_number(str, i);
        else if (ispunct(c))              lex_punct(str, i);
                    
        str++; i++;       
      }  // while    
                                                   
    }  // lex
    
     
    
    int main()  { 
        
    char *mystr = "select var1 from mytable ; " ; 
       
    lex(mystr);
    
    return 0; 
    
    }

  6. #6
    Registered User
    Join Date
    May 2009
    Posts
    4,183
    FYI:

    Code:
    c = str[i];
    The line above, which indexes from str, suggests that str itself stays fixed inside your print_token function.

    But print_token changes both "i" and "str".

    Edit: The problem seems to be that you are advancing both at the same time inside the function.

    Tim S.
    Last edited by stahta01; 01-07-2018 at 12:44 AM.

  7. #7
    Registered User
    Join Date
    May 2009
    Posts
    4,183
    Quote: Originally posted by stahta01
    1. Indent your code according to some style.
    2. Init your pointers to NULL or some other value
    char *toktype;
    3. Turn on Compiler warnings

    Tim S.
    4. Avoid the use of magic numbers!

    Your OP code, formatted, with minor fixes and one major logic change: increment i first, then update the str pointer from it.

    Code:
    /* lex.c */
    /* This code is released to the public domain. */
    /* "Share and enjoy......"  :)     */
    
    #include <stdio.h>
    #include <ctype.h>
    #include <string.h>
    #include <stdlib.h>
    
    #define NUMBER_OF_KEYWORDS 9
    
    /* Array of our keywords in string form. */
    char *kw_strings[NUMBER_OF_KEYWORDS] =
    {
        "select", "from", "where", "and", "or", "not", "in", "is", "null"
    } ;
    
    
    /*  Search function to search the array of keywords. */
    int search(char *arr[], int dim, char *str)
    {
    
        int i;
        int found_match = 0;   /* defined result even if dim is 0 */
    
        for (i=0; i<dim; i++)
        {
            if ( !strcmp(arr[i], str ) )
            {
                found_match = 1;
                break;
            }
            else
                found_match = 0;
        }  /* For */
    
        return found_match;
    }  /* search */
    
    
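    void print_token(char *str)
    {
        char token[80] = {0};
        char *toktype = "???";
        int i;
        char c;
    
        while (*str != '\0')
        {
            i = 0;          /* restart at each token: str itself moves forward */
            c = str[i];
    
            /* Keyword or identifier */
            if (isalpha((unsigned char)c) || c == '_')
            {
                /* stop before token[] is full; the rest becomes the next token */
                while ((isalnum((unsigned char)c) || c == '_')
                       && i < (int)sizeof token - 1)
                {
                    token[i] = c;
                    i++;
                    c = str[i];
                }
                token[i] = '\0';   /* cut off the tail of any longer, earlier token */
    
                if (search(kw_strings, NUMBER_OF_KEYWORDS, token) == 1 )
                    toktype = "Keyword";
                else
                    toktype = "Identifier" ;
                printf("%s %s\n", toktype, token);
            }
    
            else if (ispunct((unsigned char)c))
            {
                toktype = "operator";
                printf("%s %c\n", toktype, c);   /* print the operator, not the old token */
                i++;
            }
            else if (isspace((unsigned char)c))
            {
                i++;
            }
            else
            {
                i++;
            }
            str += i;       /* skip exactly what this pass consumed */
        }  // while
    
    }  // print_token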
    
    
    int main()
    {
    
        char *mystr = "select var1 from mytable ; " ;
    
        print_token(mystr);
    
        return 0;
    }

  8. #8
    Registered User
    Join Date
    Dec 2012
    Posts
    34
    Quote: Originally posted by stahta01
    FYI:

    Code:
    c = str[i];
    The line above, which indexes from str, suggests that str itself stays fixed inside your print_token function.

    But print_token changes both "i" and "str".

    Edit: The problem seems to be that you are advancing both at the same time inside the function.

    Tim S.
    Hi again -

    Ahh..... I think I may be seeing what you mean. The code outputs positions up to 13, then it jumps to 17, 19 and 21.
    If I think about it - we always want str[0] *anyway* when we use str++, so I should always keep c as str[0].

    I've now got the function as below and have dropped the integer parameter to the lex_ functions -
    Code:
    void lex(char *str) {    
        char c; 
    
    while (*str != '\0') { 
        c = str[0];
                  
        /* Keyword or identifier */ 
        if (isspace(c))                   lex_space(str); 
        else if (isalpha(c) || c == '_')  lex_kwident(str); 
        else if ( c == '"')               lex_string(str); 
        else if (isdigit(c))              lex_number(str);
        else if (ispunct(c))              lex_punct(str);
                    
         str++;       
      }  // while    
                                                   
    }  // lex
    The code now works much better - just as would be expected. Many thanks! My brain needed a good rattle and that's just what you've given.....

    Cheers -
    Andy

  9. #9
    Registered User
    Join Date
    Dec 2012
    Posts
    34
    Hey stahta01 - a HUGE thanks for posting the fixed code! *Big* improvement! I've just run it and it's great!

    Thanks again - cheers -
    - Andy
