Scanner?
Lexical analyzer (Lexer)?
Tokenizer?
Is their job just to group characters into tokens based on a regular expression?
Are they the same thing?
I've read so many books, and they only make me more confused.
WTH.
I don't know much about this stuff.
But I think they are different phases of a compiler.
I could be wrong.
Edit:
According to Wikipedia:
Quote:
In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. Programs performing lexical analysis are called lexical analyzers or lexers. A lexer is often organized as separate scanner and tokenizer functions, though the boundaries may not be clearly defined.
http://en.wikipedia.org/wiki/Lexical_analysis
Quote:
Originally Posted by audinue
Is their job just to group characters into tokens based on a regular expression?
It might not actually be based on a regular expression, but I think that is the general idea.
Quote:
Originally Posted by audinue
Are they the same thing?
I would say yes, though I suppose it depends on context since "scanner" is a rather generic word.
And then you need a semantic analyzer as well, to work out what the code actually means.
But yes, a lexical analyzer and a tokenizer are essentially the same thing, though that also depends on who you talk to.
--
Mats
Oh, and Laserlight's point about "not a regular expression" is quite clear if you consider that sometimes a = would terminate the token, and in other cases it won't (+= or == for example).
Some things ALWAYS terminate a token; at other times it depends on the context. So a lexical analyzer (for C or a similar language) is more complex than simple regular-expression termination. Although with a reasonable set of regexps, you may be able to tokenize all of C.
--
Mats
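For what it's worth, here is a minimal Python sketch of the = versus += versus == point above. It is not a C lexer: the token names and the tiny operator set are made up for illustration. Python's re alternation takes the first alternative that matches rather than the longest, so the sketch lists the longer operators before their prefixes to get the longest-match behaviour.

```python
import re

# Hypothetical token set for a C-like language; the names are invented for this sketch.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"==|\+=|=|\+"),  # longer operators come before their prefixes
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("a += b == 1"))
print(tokenize("x=y"))
```

In "x=y" the = is a token on its own, while in "a += b == 1" it only ever appears as part of a longer operator, which is the context dependence described above.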
So,
Code:
lexer == scanner == tokenizer
And... a...
Quote:
A lexer is often organized as separate scanner and tokenizer functions
Scanner: scans; its job is to scan things.
Tokenizer: tokenizes; its job is to tokenize things.
But a Tokenizer needs a Scanner; without one, what is there to tokenize?
Code:
Scanner
|
Tokenizer
And is a Tokenizer always a Scanner?
So does that mean Tokenizer == Lexer and Scanner != Lexer, but the Scanner is part of the Lexer?
I think the scanner and the tokenizer are both parts of the lexer...
What does it mean to scan and what does it mean to tokenize?
Along the same lines, what is a lexeme and what is a token?
Quote:
So, it means a Tokenizer == Lexer and Scanner != Lexer, but it's part of Lexer
I think you got it correct.
That's what the Dragon Book on compilers says.
The lexical analyser has a scanner which scans the source program and produces tokens as output, which are later parsed by a parser to get a parse tree.
So the scanner is the part of the lexer that performs the tokenizing operation.
However, as the Wikipedia article suggests, the boundaries of what is what may vary depending on the context.
This is also mentioned in the Dragon Book.
PS: if you wonder what the Dragon Book is, it is a book written by Alfred Aho et al. titled
Compilers: Principles, Techniques, and Tools, which many consider an authoritative source on compilers.
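To make the scanner/tokenizer split a bit more concrete, here is a rough Python sketch of one way to organize it. This is only one reading of "separate scanner and tokenizer functions", not the Dragon Book's own code: the scan and tokenize names, the keyword set, and the token categories are all assumptions for illustration. The scanner only cuts the character stream into lexemes (raw substrings); the tokenizer then attaches a category to each lexeme to form a token.

```python
KEYWORDS = {"if", "else", "while", "return"}  # hypothetical keyword set


def scan(text):
    """Scanner: split the character stream into lexemes (raw substrings)."""
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
            continue
        start = i
        if ch.isalpha() or ch == "_":
            while i < len(text) and (text[i].isalnum() or text[i] == "_"):
                i += 1
        elif ch.isdigit():
            while i < len(text) and text[i].isdigit():
                i += 1
        else:
            i += 1  # any other character is a one-character lexeme
        yield text[start:i]


def tokenize(text):
    """Tokenizer: classify each lexeme, yielding (kind, lexeme) tokens."""
    for lexeme in scan(text):
        if lexeme in KEYWORDS:
            kind = "KEYWORD"
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            kind = "IDENT"
        elif lexeme[0].isdigit():
            kind = "NUMBER"
        else:
            kind = "PUNCT"
        yield (kind, lexeme)


print(list(scan("if x1 return 42;")))
print(list(tokenize("if x1 return 42;")))
```

In this sketch the lexeme is just the text (for example "42"), and the token is the lexeme plus its category (("NUMBER", "42")). Real lexers often fuse the two steps, which is probably why the boundaries are described as not clearly defined.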
Quote:
PS: if you wonder what the Dragon Book is, it is a book written by Alfred Aho et al. titled
Compilers: Principles, Techniques, and Tools, which many consider an authoritative source on compilers.
I read the second edition three times LOL.
Compilers: Design and Principles.