I still haven't mastered static members and initialization. I'm working on a little calculator program, and I would like to have a std::map that can associate a TokenType with a std::string representation. Mainly I want this ability right now to be able to provide better error messages.
My program has three objects that work together to do the work, a Parser, Lexer, and SymbolTable. Inside the parser, I have a function that matches Tokens from the input, like a "(" for example as part of an if statement: "if ( expr ) if-block [else if-block] endif". That function looks like:
Code:
// Verifies that the current token (optionally fetching a fresh one first)
// has type t.  Returns true on a match; throws SyntaxError otherwise, so a
// false return never happens.
bool Parser::match(TokenType t, bool get) {
    if (get)
        lexer.getNextToken();
    const bool matched = (lexer.getCurrentToken().getTokenType() == t);
    if (!matched)
        throw SyntaxError("Token Expected");
    return true;
}
"Token Expected" isn't a very descriptive message. My first thought when thinking about associating strings with TokenTypes was to use a map. Then after some deliberation, I decided that such a map belongs in the Lexer, since that is the only part of the program that should have any say about what character input results in which Tokens. So, without further introduction, here is my Lexer code, with the sections I am interested in bolded.
Lexer.h
Code:
#ifndef LEXER_H_
#define LEXER_H_
// (guard renamed from _LEXER_H_: identifiers starting with an underscore
// followed by an uppercase letter are reserved for the implementation)
#include <iostream>
#include <map>
#include <string>
#include "Token.h"
// Tokenizer for the calculator: turns a character stream into Tokens and
// remembers the most recently produced one for the parser to inspect.
class Lexer {
public:
    // The stream pointer is borrowed, not owned; it must outlive the Lexer.
    Lexer(std::istream *in);
    Token getNextToken();     // consume input and return the next token
    Token getCurrentToken();  // last token returned by getNextToken()
    int getLine();            // current 1-based input line, for diagnostics
    // Human-readable spelling of t's token type, for error messages.
    static const std::string getTokenString(Token t);
    // Thrown by getNextToken() on a character that cannot start any token.
    // NOTE(review): consider deriving from std::exception for uniform catching.
    class LexerError {
    public:
        const char *p;
        LexerError(const char* q) { p = q; }
    };
private:
    Token currentToken;
    std::istream* input;
    int currentLine;
    // Declaration only; the definition with its initializer lives in Lexer.cpp.
    static std::map<TokenType, std::string> tokenToString;
};
#endif // LEXER_H_
Lexer.cpp
Code:
#include "Lexer.h"
#include <cctype>   // isspace, isalpha, isalnum — used below, include explicitly
#include <cstdio>
#include <iostream>
using namespace std;
// Builds the TokenType -> display-string table used for diagnostics.
// File-local (static) helper: it exists only so the Lexer's static map can be
// initialized in a single expression.
static std::map<TokenType, std::string> createTokenStringMap() {
    std::map<TokenType, std::string> m;
    m[END] = "EOF";    m[NEWLINE] = "new line";
    m[SEMI] = ";";     m[DIV] = "/";
    m[MULT] = "*";     m[POW] = "^";
    m[MOD] = "%";      m[PLUS] = "+";
    m[MINUS] = "-";    m[OR] = "|";   // MINUS was '-' (a char); now a string literal like every other entry
    m[AND] = "&";      m[LP] = "(";
    m[RP] = ")";       m[ASSIGN] = "=";
    m[NOT] = "!";      m[LT] = "<";
    m[GT] = ">";       m[EQ] = "==";
    m[NEQ] = "!=";     m[LE] = "<=";
    m[GE] = ">=";      m[IF] = "if";
    m[ELSE] = "else";  m[ENDIF] = "endif";
    m[NAME] = "name";  m[NUMBER] = "number";
    return m;
}
Lexer::tokenToString = createTokenStringMap();
// Returns a human-readable spelling for t's token type.
// Maybe provide more information in the case of a NAME or NUMBER?
const std::string Lexer::getTokenString(Token t) {
    // find() instead of operator[]: [] would silently insert an empty string
    // into the table for any TokenType that has no entry.
    std::map<TokenType, std::string>::const_iterator it =
        tokenToString.find(t.getTokenType());
    return it != tokenToString.end() ? it->second : "unknown token";
}
// Constructs a Lexer reading from *in; line counting starts at 1.
// NOTE(review): the stream pointer is borrowed, not owned — the caller must
// keep the stream alive for the Lexer's lifetime.  currentToken stays
// default-constructed (END) until getNextToken() is first called.
Lexer::Lexer(std::istream* in)
: input(in), currentLine(1) { }
// Accessor for the most recently lexed token.
// (Small enough to be defined inline in the class body.)
Token Lexer::getCurrentToken() {
    return currentToken;
}
// Could inline this one too
int Lexer::getLine() {
return (*this).currentLine;
}
// Consumes characters from *input and returns the next Token, also caching it
// in currentToken.  Spaces and tabs are skipped; '\n' is significant and is
// produced as a NEWLINE token.  "//" comments run to the end of the line.
// Two-character operators (==, !=, <=, >=) are recognized with one character
// of lookahead.  Throws LexerError on a character that cannot start any token.
Token Lexer::getNextToken() {
    int ch = 0; // int, rather than char, so it is large enough to hold EOF
    do { // Skip over whitespace (newline is itself a token, so it ends the skip)
        ch = input->get();
        if (ch == EOF) return currentToken = Token(END);
    } while (ch != '\n' && isspace(ch));
    switch (ch) {
    // (no "case EOF" needed: the loop above already returned on end of input)
    case '\n':
        currentLine++;
        return currentToken = Token(NEWLINE);
    case ';':
        return currentToken = Token(SEMI);
    case '/':
    {
        int look = input->get(); // Look ahead a character to detect comments
        if (look != '/') {
            if (look != EOF) input->putback(char(look));
            return currentToken = Token(TokenType(ch)); // Division
        }
        // Read over comment until the end of the line (or end of input)
        while (look != EOF && look != '\n') look = input->get();
        if (look == '\n') {
            currentLine++; // BUG FIX: line count previously drifted past comments
            return currentToken = Token(NEWLINE);
        }
        return currentToken = Token(END);
    }
    // Simple operators: the TokenType value is the character's own code
    case '^': case '*': case '%': case '+': case '-':
    case '|': case '&': case '(': case ')':
        return currentToken = Token(TokenType(ch));
    case '!':
    {
        int look = input->get();
        if (look == '=')
            return currentToken = Token(NEQ); // Not equal to
        if (look != EOF) input->putback(char(look));
        return currentToken = Token(TokenType(ch)); // Unary NOT
    }
    case '=':
    {
        int look = input->get();
        if (look == '=')
            return currentToken = Token(EQ); // Equality test
        if (look != EOF) input->putback(char(look));
        return currentToken = Token(TokenType(ch)); // Assignment
    }
    case '<':
    {
        int look = input->get();
        if (look == '=')
            return currentToken = Token(LE); // Less than or equal to
        if (look != EOF) input->putback(char(look)); // cast added: putback takes char
        return currentToken = Token(TokenType(ch)); // Less than
    }
    case '>':
    {
        int look = input->get();
        if (look == '=')
            return currentToken = Token(GE); // Greater than or equal to
        if (look != EOF) input->putback(char(look));
        return currentToken = Token(TokenType(ch)); // Greater than
    }
    // Numbers: put the first character back and let the stream parse a double
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
    case '.':
    {
        input->putback(char(ch));
        double d = 0; // initialized: extraction can fail (e.g. a lone '.')
        *input >> d;
        return currentToken = Token(NUMBER, d);
    }
    // Keywords and names
    default:
    {
        // Names can begin with letters or underscore
        if (isalpha(ch) || ch == '_') {
            string s;
            s.push_back(char(ch));
            // And can contain letters, numbers, and underscores
            while ((ch = input->get()) != EOF && (isalnum(ch) || ch == '_'))
                s.push_back(char(ch));
            // Read one character too many; but never putback EOF (sets failbit)
            if (ch != EOF) input->putback(char(ch));
            // Check for keywords
            if (s == "if")    return currentToken = Token(IF);
            if (s == "else")  return currentToken = Token(ELSE);
            if (s == "endif") return currentToken = Token(ENDIF);
            // Not a keyword, must be a name
            return currentToken = Token(NAME, s);
        }
        // Not good if we reach this point
        throw LexerError("bad token");
    }
    }
}
And for completeness I guess, Token.h
Code:
#ifndef TOKEN_H_
#define TOKEN_H_
// (guard renamed from _TOKEN_H_: identifiers starting with an underscore
// followed by an uppercase letter are reserved for the implementation)
#include <string>  // BUG FIX: this header uses std::string but relied on the
                   // including file to have pulled in <string> first
// Token categories.  Single-character operators use the character's own code
// as the enumerator value, so the lexer can build them as Token(TokenType(ch));
// the multi-character and abstract tokens (EQ .. NUMBER) continue from GT.
enum TokenType {
    END, NEWLINE='\n', SEMI=';', DIV='/',
    MULT='*', POW='^', MOD='%', PLUS='+',
    MINUS='-', OR='|', AND='&', LP='(',
    RP=')', ASSIGN='=', NOT='!', LT='<',
    GT='>', EQ, NEQ, LE,
    GE, IF, ELSE, ENDIF,
    NAME, NUMBER
};
// A single lexical token: its category plus an optional payload — a string
// for NAME tokens, a number for NUMBER tokens.
struct Token {
private:
    TokenType tokenType;
    std::string stringValue;
    double numberValue;
public:
    Token()
        : tokenType(END), stringValue(""), numberValue(0) { }
    Token(const TokenType t)
        : tokenType(t), stringValue(""), numberValue(0) { }
    Token(const TokenType t, const std::string& s)
        : tokenType(t), stringValue(s), numberValue(0) { }
    Token(const TokenType t, double d)
        : tokenType(t), stringValue(""), numberValue(d) { }
    TokenType getTokenType() const { return tokenType; }
    std::string& getStringValue() { return stringValue; }
    double getNumberValue() const { return numberValue; }
};
#endif // TOKEN_H_
My basis for this approach was the second response at:
Initializing a static std::map<int, int> in C++ - Stack Overflow
However, when I compile (gcc) I get the following message:
Lexer.cpp:27: error: expected constructor, destructor, or type conversion before ‘=’ token
Lexer.cpp:7: warning: ‘std::map<TokenType, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<TokenType>, std::allocator<std::pair<const TokenType, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > createTokenStringMap()’ defined but not used
How can I go about creating this static map? Or is that not even a good solution for what I am trying to accomplish?
I have not posted the Parser or SymbolTable as I don't think they are relevant to the syntax / procedure here. I can do so if necessary.
Jason