Thread: Initialize a static map

  1. #1
    Registered User
    Join Date
    May 2008
    Posts
    87

    Initialize a static map

    I still haven't mastered static members and initialization. I'm working on a little calculator program, and I would like to have a std::map that can associate a TokenType with a std::string representation. Mainly I want this ability right now to be able to provide better error messages.

    My program has three objects that work together to do the work, a Parser, Lexer, and SymbolTable. Inside the parser, I have a function that matches Tokens from the input, like a "(" for example as part of an if statement: "if ( expr ) if-block [else if-block] endif". That function looks like:

    Code:
    bool Parser::match(TokenType t, bool get) {
      if (get) lexer.getNextToken();
      
      if (lexer.getCurrentToken().getTokenType() == t)
        return true;
    
      throw SyntaxError("Token Expected");
    }
    "Token Expected" isn't a very descriptive message. My first thought when thinking about associating strings with TokenTypes was to use a map. Then after some deliberation, I decided that such a map belongs in the Lexer, since that is the only part of the program that should have any say about what character input results in which Tokens. So, without further introduction, here is my Lexer code, with the sections I am interested in bolded.

    Lexer.h
    Code:
    #ifndef _LEXER_H_
    #define _LEXER_H_
    
    #include <iostream>
    #include <map>
    #include <string>
    #include "Token.h"
    
    class Lexer {
     public:
      Lexer(std::istream *in);
      Token getNextToken();
      Token getCurrentToken();
      int getLine();
    
      static const std::string getTokenString(Token t);
    
      class LexerError {
      public:
        const char *p;
        LexerError(const char* q) { p = q; }
      };
    
     private:
      Token currentToken;
      std::istream* input;
      int currentLine;
      static std::map<TokenType, std::string> tokenToString;
    };
    
    #endif //_LEXER_H_
    Lexer.cpp
    Code:
    #include <iostream>
    #include <cstdio>
    #include <cctype>   // isspace, isalpha, isalnum
    #include "Lexer.h"
    
    using namespace std;
    
    static std::map<TokenType, std::string> createTokenStringMap() {
      std::map<TokenType, std::string> m;
      
      m[END]   = "EOF";   m[NEWLINE] = "new line";
      m[SEMI]  = ";";     m[DIV]     = "/";
      m[MULT]  = "*";     m[POW]     = "^";
      m[MOD]   = "%";     m[PLUS]    = "+";
      m[MINUS] = "-";     m[OR]      = "|";
      m[AND]   = "&";     m[LP]      = "(";
      m[RP]    = ")";     m[ASSIGN]  = "=";
      m[NOT]   = "!";     m[LT]      = "<";
      m[GT]    = ">";     m[EQ]      = "==";
      m[NEQ]   = "!=";    m[LE]      = "<=";
      m[GE]    = ">=";    m[IF]      = "if";
      m[ELSE]  = "else";  m[ENDIF]   = "endif";
      m[NAME]  = "name";  m[NUMBER]  = "number";
    
      return m;
    }
    
    Lexer::tokenToString = createTokenStringMap();
    
    // Maybe provide more information in the case of a NAME or NUMBER?
    const std::string Lexer::getTokenString(Token t) {
      return tokenToString[t.getTokenType()];
    }
    
    Lexer::Lexer(std::istream* in) 
      : input(in), currentLine(1) { }
    
    // Could inline this in class definition
    Token Lexer::getCurrentToken() {
      return (*this).currentToken;
    }
    
    // Could inline this one too
    int Lexer::getLine() {
      return (*this).currentLine;
    }
    
    Token Lexer::getNextToken() {
      int ch = 0;  // int, rather than char so large enough to hold EOF
      
      do {  // Skip over whitespace
        ch = (*input).get();
        if (ch == EOF) return (*this).currentToken = Token(END);
      } while (ch != '\n' && isspace(ch));
      
      switch (ch) {
      case EOF:
        return (*this).currentToken = Token(END);
    
      case '\n':
        (*this).currentLine++;
        return (*this).currentToken = Token(NEWLINE);
    
      case ';':
        return (*this).currentToken = Token(SEMI);
    
      case '/':
        {
          int look = 0;  // Look ahead a character to detect comments
          look = (*input).get();
          if (look != '/') {
            if (look != EOF) (*input).putback(char(look));
            return (*this).currentToken = Token(TokenType(ch));  // Division
          }
    
          // Read over comment until the end of the line (or end of input)
          while ((look != EOF) && (look != '\n')) look=(*input).get();
    
          if (look == '\n') {
            (*this).currentLine++;  // The newline that ends a comment still ends a line
            return (*this).currentToken = Token(NEWLINE);
          } else
            return (*this).currentToken = Token(END);
        }
    
        // Simple operators
      case '^':
      case '*':
      case '%':
      case '+':
      case '-':
      case '|':
      case '&':
      case '(':
      case ')':
        return (*this).currentToken = Token(TokenType(ch));
    
      case '!':
        {
          int look = 0;
          look = (*input).get();
          if (look != '=') {
            if (look != EOF) (*input).putback(char(look));
            return (*this).currentToken = Token(TokenType(ch)); // Unary NOT
          } else
            return (*this).currentToken = Token(NEQ); // Not equal to
        }
    
      case '=':
        {
          int look = 0;
          look = (*input).get();
          if (look != '=') {
            if (look != EOF) (*input).putback(char(look));
            return (*this).currentToken = Token(TokenType(ch)); // Assignment
          } else
            return (*this).currentToken = Token(EQ);  // Equality test
        }
    
      case '<':
        {
          int look = 0;
          look = (*input).get();
          if (look != '=') {
            if (look != EOF) (*input).putback(char(look));
            return (*this).currentToken = Token(TokenType(ch));  // Less than
          } else
            return (*this).currentToken = Token(LE);  // Less than or equal to
        }
    
      case '>':
        {
          int look = 0;
          look = (*input).get();
          if (look != '=') {
            if (look != EOF) (*input).putback(char(look));
            return (*this).currentToken = Token(TokenType(ch));  // Greater than
          } else
            return (*this).currentToken = Token(GE); // Greater than or equal to
        }
    
        // Numbers
      case '0':
      case '1':
      case '2':
      case '3':
      case '4':
      case '5':
      case '6':
      case '7':
      case '8':
      case '9':
      case '.':
        {
          (*input).putback(ch);
          double d;
          (*input) >> d;
          return (*this).currentToken = Token(NUMBER, d);
        }
      
      // Keywords and names
      default:
        {
          // Names can begin with letters or underscore
          if (isalpha(ch) || ch == '_') {
            string s;
            s.push_back(char(ch));
    
            // And can contain letters, numbers, and underscores
            while ( (ch=(*input).get()) != EOF && (isalnum(ch) || ch == '_') )
              s.push_back(char(ch));
    
            // Read one character too many, so put it back (unless input ended)
            if (ch != EOF) (*input).putback(char(ch));
    
            // Check for keywords
            if (s.compare("if") == 0) return (*this).currentToken = Token(IF);
            if (s.compare("else") == 0) return (*this).currentToken = Token(ELSE);
            if (s.compare("endif") == 0) return (*this).currentToken = Token(ENDIF);
    
            // Not a keyword, must be a name
            return (*this).currentToken = Token(NAME,s);
          }
    
          // Not good if we reach this point
          throw LexerError("bad token");
        }
      }	
    }
    And for completeness I guess, Token.h
    Code:
    #ifndef _TOKEN_H_
    #define _TOKEN_H_
    
    #include <string>
    
    enum TokenType {
      END,       NEWLINE='\n', SEMI=';', DIV='/',
      MULT='*',  POW='^',      MOD='%',  PLUS='+',
      MINUS='-', OR='|',       AND='&',  LP='(',
      RP=')',    ASSIGN='=',   NOT='!',  LT='<',
      GT='>',    EQ,           NEQ,      LE,
      GE,        IF,           ELSE,     ENDIF,
      NAME,      NUMBER 
    };
    
    struct Token {
    private:
      TokenType tokenType;
      std::string stringValue;
      double numberValue;
    
    public:
      Token()
        : tokenType(END), stringValue(""), numberValue(0) { }
    
      Token(const TokenType t) 
        : tokenType(t), stringValue(""), numberValue(0) { }
      
      Token(const TokenType t, const std::string& s)
        : tokenType(t), stringValue(s), numberValue(0) { }
      
      Token(const TokenType t, double d)
        : tokenType(t), stringValue(""), numberValue(d) { }
    
      TokenType getTokenType() const { return tokenType; }
      std::string& getStringValue() { return stringValue; }
      double getNumberValue() const { return numberValue; }
    };
    
    #endif // _TOKEN_H_
    My basis for this approach was the second response at:
    Initializing a static std::map<int, int> in C++ - Stack Overflow

    However, when I compile (gcc) I get the following message:
    Lexer.cpp:27: error: expected constructor, destructor, or type conversion before ‘=’ token
    Lexer.cpp:7: warning: ‘std::map<TokenType, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<TokenType>, std::allocator<std::pair<const TokenType, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > createTokenStringMap()’ defined but not used
    How can I go about creating this static map? Or is that not even a good solution for what I am trying to accomplish?

    I have not posted the Parser or SymbolTable as I don't think they are relevant to the syntax / procedure here. I can do so if necessary.

    Jason
    Last edited by jason_m; 07-02-2010 at 08:42 PM. Reason: Remove comment full of lies from posted code

  2. #2
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    IIRC, static has a separate meaning for functions. It restricts visibility of the function name to the file (the translation unit), so it won't be usable outside of the file.

    If you make a local variable static, its memory is allocated in the same place globals are, but the name is still only accessible locally. An appropriate use is when a variable should be initialized only once and the same instance reused on every call to a function, even though the name goes out of scope each time the function returns.

    If you make class members static, they exist as part of their class, but, even if there is no object, you can use static methods as an interface to those.
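
    A minimal sketch of the three uses described above, in one file. The names helper, answer, nextId and Widget are made up for illustration and are not from the calculator program:

```cpp
// 1. static on a free function: internal linkage, so the name is only
//    visible inside this translation unit.
static int helper() { return 42; }

// An ordinary function (external linkage) that uses the file-local helper.
int answer() { return helper(); }

// 2. static on a local variable: initialized once, the first time control
//    reaches the declaration, and it keeps its value between calls.
int nextId() {
  static int counter = 0;
  return ++counter;
}

// 3. static on class members: one copy shared by every object of the class.
class Widget {  // hypothetical class, for illustration only
 public:
  Widget() { ++liveCount; }
  ~Widget() { --liveCount; }
  static int count() { return liveCount; }  // callable without an object

 private:
  static int liveCount;  // declaration only
};

// Definition of the static data member: the type is written again,
// the keyword static is not.
int Widget::liveCount = 0;
```

    Calling nextId() twice returns 1 and then 2, and Widget::count() can be called before any Widget exists.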

    In other words, static is C++'s swiss army knife. I don't think you want to use it here.

    This is also the first time I've seen enums used to name character constants. Note that these constants will be stored according to their integer representation, which is a pretty arbitrary order. It looks like a job for macros to be honest.

    I'd say you need to rethink the design of this parser.
    Last edited by whiteflags; 07-02-2010 at 09:10 PM.

  3. #3
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    And I believe if you make a global variable static, that means it will not be visible outside the file (cannot be linked from another file using extern).

    Static has 4-5 totally different meanings in different contexts... not sure what they were thinking. Ran out of keywords?

  4. #4
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Oh they love to reuse keywords. It means they can break less code.
    Otherwise they have to go poke around in others' sources and try compiling millions of lines of code to see how much code they broke with the new keyword.

  5. #5
    Registered User
    Join Date
    May 2008
    Posts
    87
    Quote Originally Posted by whiteflags View Post
    IIRC, static has a separate meaning for functions. It restricts visibility of the function name to the file (the translation unit), so it won't be usable outside of the file.

    If you make a local variable static, its memory is allocated in the same place globals are, but the name is still only accessible locally. An appropriate use is when a variable should be initialized only once and the same instance reused on every call to a function, even though the name goes out of scope each time the function returns.

    If you make class members static, they exist as part of their class, but, even if there is no object, you can use static methods as an interface to those.

    In other words, static is C++'s swiss army knife. I don't think you want to use it here.
    Such a map only needs to exist once, it does not need to be created for each instance of a Lexer, that is why I have chosen to make it a static member of class Lexer.

    static const std::string getTokenString(Token t) is also a static member of class Lexer, for the same reason. All it does is add the ability to put some logic around returning the string associated with a TokenType, using the static map.

    In Lexer's implementation file, Lexer.cpp, the function static std::map<TokenType, std::string> createTokenStringMap() is not a member of the Lexer. It is an implementation detail and I don't want this function to be accessible outside of this implementation file, therefore I made it static. My intention with this function was to initialize the map, but the compiler doesn't like this, which is what I am looking for help on.

    Quote Originally Posted by whiteflags View Post
    This is also the first time I've seen enums used to name character constants. Note that these constants will be stored according to their integer representation, which is a pretty arbitrary order. It looks like a job for macros to be honest
    Stroustrup uses this approach in The C++ Programming Language when he demonstrates building a parser and lexer.
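
    For anyone wondering what values the enumerators end up with, here is a small sketch using the TokenType enum from Token.h. The valueOf helper is hypothetical, added only for the illustration. An enumerator without an explicit value gets the previous enumerator's value plus one, so everything after GT continues counting from '>':

```cpp
enum TokenType {
  END,       NEWLINE='\n', SEMI=';', DIV='/',
  MULT='*',  POW='^',      MOD='%',  PLUS='+',
  MINUS='-', OR='|',       AND='&',  LP='(',
  RP=')',    ASSIGN='=',   NOT='!',  LT='<',
  GT='>',    EQ,           NEQ,      LE,
  GE,        IF,           ELSE,     ENDIF,
  NAME,      NUMBER
};

// Hypothetical helper: the integer value an enumerator is stored as.
int valueOf(TokenType t) { return static_cast<int>(t); }

// END has no initializer and comes first, so it is 0.
// GT is the character code of '>' (62 in ASCII).
// EQ follows GT, so it is '>' + 1 (63 in ASCII), NEQ is '>' + 2, and so on
// up to NUMBER, which is '>' + 9.
```

    On an ASCII system the multi-character tokens therefore land on the codes for '?', '@', 'A', 'B' and so on. None of those characters are single-character operators in this lexer, so a std::map keyed on TokenType still sees every token as a distinct key.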

  6. #6
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Try changing:
    Code:
    Lexer::tokenToString = createTokenStringMap();
    to:
    Code:
    std::map<TokenType, std::string> Lexer::tokenToString = createTokenStringMap();
    Defining createTokenStringMap as a static non-member function is fine, though you might consider defining it in an anonymous namespace instead. The effect will be more or less the same.
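
    Put together, a cut-down sketch of the whole pattern might look like this. The enum is shortened to four tokens and the "unknown token" fallback is an assumption made for the example, not something from the original Lexer:

```cpp
#include <map>
#include <string>

enum TokenType { PLUS = '+', MINUS = '-', LP = '(', RP = ')' };  // shortened

class Lexer {
 public:
  static std::string getTokenString(TokenType t);

 private:
  // Declaration only: this does not create the map.
  static std::map<TokenType, std::string> tokenToString;
};

namespace {  // anonymous namespace: the helper is visible in this file only

std::map<TokenType, std::string> createTokenStringMap() {
  std::map<TokenType, std::string> m;
  m[PLUS]  = "+";
  m[MINUS] = "-";
  m[LP]    = "(";
  m[RP]    = ")";
  return m;
}

}  // namespace

// Definition: the full type, then the class-qualified name, then the
// initializer. Leaving out the type is what produced the compiler error.
std::map<TokenType, std::string> Lexer::tokenToString = createTokenStringMap();

std::string Lexer::getTokenString(TokenType t) {
  // find() does not insert anything; operator[] would silently add an
  // empty string for a token that is missing from the map.
  std::map<TokenType, std::string>::const_iterator it = tokenToString.find(t);
  return it != tokenToString.end() ? it->second : std::string("unknown token");
}
```

    With this in place, Lexer::getTokenString(LP) returns "(" without any Lexer object having been created.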

  7. #7
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    Though, if we consider OP's statements:

    > Such a map only needs to exist once,
    > it does not need to be created for each instance of a Lexer

    Then why is the map a member of Lexer at all? Does a lexer object need to know the implementation details of what amounts to a toString() method?
    Code:
    std::string Lexer::ToString (TokenType token)
    {
      // Built once, on the first call; the helper is the one from Lexer.cpp
      static std::map<TokenType, std::string> m = createTokenStringMap();
    
      std::map<TokenType, std::string>::iterator found = m.find(token);
      if (found != m.end()) {
        return found->second;
      }
      throw SyntaxError("Unknown token");
    }
    I think we are after something of this sort.
    Last edited by whiteflags; 07-03-2010 at 10:24 AM. Reason: Excuse my ignnorance/

  8. #8
    Registered User
    Join Date
    May 2008
    Posts
    87
    @laserlight: This compiles, so when initializing the static member variable, its type definition needs to be repeated? Is that what I should take from this?

    @whiteflags: You are absolutely right, class Lexer doesn't need to have the map as a member. Didn't even occur to me that I could have a member function in class Lexer that has a static variable. I have followed your suggestion.
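
    For the record, a rough sketch of how the function-local static can feed the error message in Parser::match. The enum is shortened, and tokenToString and expectedMessage are hypothetical free functions written for this example. The real code would use the Lexer member function and pass the resulting string to SyntaxError:

```cpp
#include <map>
#include <string>

enum TokenType { LP = '(', RP = ')', SEMI = ';' };  // shortened for the sketch

namespace {

std::map<TokenType, std::string> createTokenStringMap() {
  std::map<TokenType, std::string> m;
  m[LP]   = "(";
  m[RP]   = ")";
  m[SEMI] = ";";
  return m;
}

}  // namespace

// Hypothetical stand-in for the Lexer's lookup function.
std::string tokenToString(TokenType t) {
  // Function-local static: the map is built on the first call only and
  // is not visible anywhere outside this function.
  static const std::map<TokenType, std::string> names = createTokenStringMap();

  std::map<TokenType, std::string>::const_iterator found = names.find(t);
  return found != names.end() ? found->second : std::string("unknown token");
}

// Builds the text that match() could throw instead of "Token Expected".
std::string expectedMessage(TokenType expected, TokenType actual) {
  return "Expected '" + tokenToString(expected) +
         "' but found '" + tokenToString(actual) + "'";
}
```

    Here expectedMessage(LP, RP) produces: Expected '(' but found ')'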

  9. #9
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    Quote Originally Posted by jason_m View Post
    @laserlight: This compiles, so when initializing the static member variable, its type definition needs to be repeated? Is that what I should take from this?
    Yep.
