Thread: need help with a lexer

  1. #1
    Registered User
    Join Date: Sep 2013
    Posts: 11

    need help with a lexer

    Hey all! I'm not new to forums in general, but I am new to this one, so if I'm inadvertently breaking a rule please let me know. Anyway, I need help with some code I'm writing and can't figure it out. I'm writing a compiler for a language I'm designing called Jade, and I'm on the lexer step. I'm basing it on the Udacity course that teaches you how to build a web browser (it's really about learning to interpret HTML/JavaScript) and am using the purple Dragon Book for reference. Anyway, here's the code:
    Code:
    #include <iostream>
    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>
    #include <regex.h>
    
    using std::ostream;
    using std::cout;
    using std::cerr;
    using std::endl;
    using std::cin;
    using std::ifstream;
    using std::istringstream;
    using std::string;
    using std::vector;
    
    class Token
    {
         string Type, Name;
         int LineNo, LineCol;
    
         public:
              Token(string type, string name, int lineno, int linecol)
                   : Type(type), Name(name), LineNo(lineno), LineCol(linecol) {}
              Token() {}
    
              void SetType   (string type) { Type    = type;    }
              void SetName   (string name) { Name    = name;    }
              void SetLineNo (int lineno)  { LineNo  = lineno;  }
              void SetLineCol(int linecol) { LineCol = linecol; }
    
              string GetType   () { return Type;    }
              string GetName   () { return Name;    }
              int    GetLineNo () { return LineNo;  }
              int    GetLineCol() { return LineCol; }
    };
    
    ostream& operator<<(ostream &out, Token token)
    {
         out<<"("<< token.GetType() <<", '"<< token.GetName() <<"', "<< token.GetLineNo() <<", "<< token.GetLineCol() <<")";
         return out;
    }
    
    void  ReadInFile (ifstream&, vector<string>&);
    void  Lex        (vector<string>&, vector<Token>&);
    Token Lex        (string, int&, int&);
    
    int main(int argc, char *argv[])
    {
         ifstream File(argv[1]);
         vector<string> FileContents;
         vector<Token> TokenList;
    
         ReadInFile(File, FileContents);
    
         Lex(FileContents, TokenList);
    
         for(auto &Counter : TokenList)
              cout<< Counter << endl;
    }
    
    void ReadInFile(ifstream &File, vector<string> &FileContents)
    {
         int Counter = 0;
         string Line;
    
         while(getline(File, Line))
              FileContents.push_back(Line);
    }
    
    void Lex(vector<string> &FileContents, vector<Token> &TokenList)
    {
         int LineNo = 1, MBegin, MEnd;
    
         for(auto &Counter : FileContents)
         {
              (MBegin = 0) && (MEnd = Counter.size() - 1);
    
              while(true)
              {
                   TokenList.push_back(Lex(Counter, MBegin, MEnd));
    
                   if(Counter.at(MEnd) == '\n')
                   {
                        if(TokenList[TokenList.size() - 1].GetType() == "UNINITIALIZED")
                             TokenList.pop_back();
    
                        break;
                   }
    
                   Counter = Counter.substr(MEnd, Counter.size() - MEnd - 1);
                   TokenList[TokenList.size() - 1].SetLineNo(LineNo);
              }
    
              LineNo++;
         }
    }
    
    Token Lex(string Line, int &MBegin, int &MEnd)
    {
         regex_t Regex;
         regmatch_t Match;
    
         regcomp(&Regex, "\"[^\"]+\"", REG_EXTENDED);
         if(regexec(&Regex, Line.c_str(), 1, &Match, 0) == 0)
         {
                   (MBegin = Match.rm_so) && (MEnd = Match.rm_eo);
                   return Token("STRING", Line.substr(MBegin + 1, MEnd - MBegin - 2), -1, MBegin + 1);
         }
         regfree(&Regex);
    
         (MBegin = 0) && (MEnd = Line.size() - 1);
         return Token("UNINITIALIZED", "", -1, -1);
    }
    The problem is that whenever I run it I get this:
    terminate called after throwing an instance of 'std::out_of_range'
    what(): basic_string::at
    Aborted (core dumped)

    I have asked on cplusplus.com, which I have found to be a great forum, but they were unable to help.

  2. #2
    oogabooga
    Join Date: Jan 2008
    Posts: 2,808
    This kind of thing is just silly, even if it worked correctly:
    Code:
    (MBegin = 0) && (MEnd = Line.size() - 1);
    The expression (MBegin = 0) has a value of 0, i.e., false, so the short-circuit nature of && will skip the other expression/assignment.

    Just write it normally.
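
    To make that concrete, here is a minimal sketch comparing the chained form with two plain statements. The `Bounds` struct and the helpers `ChainedBounds` and `PlainBounds` are made up for illustration and are not part of the posted code. Both fields start at -1 so you can see which assignments actually ran.

```cpp
#include <string>

// Hypothetical holder for the two indices used in the post.
struct Bounds
{
     int MBegin;
     int MEnd;
};

// Chained form from the post. (B.MBegin = 0) evaluates to 0, i.e. false,
// so && never evaluates its right-hand side and MEnd keeps its old value.
Bounds ChainedBounds(const std::string &Line)
{
     Bounds B = {-1, -1};
     bool BothRan = (B.MBegin = 0) && (B.MEnd = static_cast<int>(Line.size()) - 1);
     (void)BothRan; // always false here; kept only to show the expression has a value
     return B;
}

// Plain form: two separate statements, so both assignments always run.
Bounds PlainBounds(const std::string &Line)
{
     Bounds B = {-1, -1};
     B.MBegin = 0;
     B.MEnd = static_cast<int>(Line.size()) - 1;
     return B;
}
```

    `ChainedBounds("abc")` leaves `MEnd` at the -1 sentinel, while `PlainBounds("abc")` sets it to 2. In the posted code `MEnd` is not initialized at all, so after the skipped assignment `Counter.at(MEnd)` is probably being called with a garbage index, which would explain the exception.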

    And what is regex.h?
    Why aren't you using the C++11 regex?
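
    For comparison, a rough sketch of the same string-literal match with the C++11 `<regex>` header. The helper name `FindStringLiteral` and its signature are assumptions made for this example, not something from the posted lexer, and it needs a standard library with a working `<regex>` (for GCC that should mean 4.9 or later).

```cpp
#include <regex>
#include <string>

// Hypothetical helper: finds the first double-quoted string literal in Line.
// On success it stores the text between the quotes in Text and the offset of
// the opening quote in Begin, then returns true. On failure it returns false
// and leaves Text and Begin untouched.
bool FindStringLiteral(const std::string &Line, std::string &Text, int &Begin)
{
     // Same pattern as the regcomp call in the post, plus a capture group
     // around the characters between the quotes.
     static const std::regex Pattern("\"([^\"]+)\"");

     std::smatch Match;
     if(!std::regex_search(Line, Match, Pattern))
          return false;

     Text  = Match[1].str();                       // contents without the quotes
     Begin = static_cast<int>(Match.position(0));  // offset of the opening quote
     return true;
}
```

    The capture group removes the need for the `substr(MBegin + 1, MEnd - MBegin - 2)` arithmetic, and there is nothing to free afterwards, unlike the `regcomp`/`regfree` pair.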
    The cost of software maintenance increases with the square of the programmer's creativity. - Robert D. Bliss

  3. #3
    Registered User
    Join Date: Sep 2013
    Posts: 11
    Ah, that makes sense. I was actually just testing the && to see if it would work and decided not to take it out, but I forgot that it short-circuits. regex.h is the C (POSIX) regex library. I would use <regex>, and have tried to, but since I'm on Linux using GCC I can't, because it's not supported yet. I know regex.h is outdated, so it's really just a placeholder until <regex> is supported or I find the time to get Boost installed and built.

    Edit: thanks, it worked! You have no idea how helpful that is. My lexer is finally done!
    Last edited by DTSCode; 09-12-2013 at 05:20 PM.

  4. #4
    oogabooga
    Join Date: Jan 2008
    Posts: 2,808
    No problem. Easy one, actually.
    Get rid of both of those && contraptions and write what you actually mean.

  5. #5
    Registered User
    Join Date: Oct 2006
    Posts: 3,445
    Originally posted by DTSCode:
    but since I'm on Linux using GCC I can't, because it's not supported yet
    What Linux distribution and GCC version do you have? Do you know you can build your own custom version of GCC?
