
Scanner? Lexical analyzer? Tokenizer?



audinue
12-23-2008, 12:03 PM
Scanner?
Lexical analyzer (Lexer)?
Tokenizer?

Is their job just to group characters into tokens based on a regular expression?

Are they the same thing?

I've read so many books, and they only leave me more confused.

WTH.

stevesmithx
12-23-2008, 12:06 PM
I don't know much about this stuff.
But I think they are different phases of a compiler.
I could be wrong.
Edit:
According to Wikipedia:

In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. Programs performing lexical analysis are called lexical analyzers or lexers. A lexer is often organized as separate scanner and tokenizer functions, though the boundaries may not be clearly defined.
http://en.wikipedia.org/wiki/Lexical_analysis

laserlight
12-23-2008, 12:14 PM
Is their job just to group characters into tokens based on a regular expression?
It might not actually be based on a regular expression, but I think that is the general idea.
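
To illustrate the "not necessarily a regular expression" point, here is a minimal hand-written sketch that groups characters into tokens with plain loops. The token kinds (NUMBER, IDENT, PUNCT) and the function name are made up for this example and do not come from any particular compiler.

```python
def tokenize(text):
    """Group characters into (kind, lexeme) pairs without using any regex."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1  # whitespace separates tokens but is not a token itself
        elif ch.isdigit():
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1  # keep consuming digits: one NUMBER token
            tokens.append(("NUMBER", text[start:i]))
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(text) and (text[i].isalnum() or text[i] == "_"):
                i += 1  # letters, digits and underscores: one IDENT token
            tokens.append(("IDENT", text[start:i]))
        else:
            tokens.append(("PUNCT", ch))  # anything else: one character each
            i += 1
    return tokens


print(tokenize("count = count + 42;"))
# [('IDENT', 'count'), ('PUNCT', '='), ('IDENT', 'count'),
#  ('PUNCT', '+'), ('NUMBER', '42'), ('PUNCT', ';')]
```

Each branch here does by hand what a pattern such as [0-9]+ would do, so the result is the same kind of token stream either way.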


Are they the same thing?
I would say yes, though I suppose it depends on context since "scanner" is a rather generic word.

matsp
12-23-2008, 12:15 PM
And then you need a semantic analyzer as well, to work out what the code actually means.

But yes, a lexical analyzer and a tokenizer are essentially the same thing, though that also depends on who you talk to.

--
Mats

matsp
12-23-2008, 12:18 PM
Oh, and Laserlight's point about "not a regular expression" becomes quite clear if you consider that sometimes an = terminates the token, and in other cases it doesn't (+= or ==, for example).

Some characters ALWAYS terminate a token; for others, it depends on the context. So a lexical analyzer (for C or a similar language) is more complex than simply stopping at the end of one regular expression match. With a reasonable set of regexps, though, you may be able to tokenize all of C.
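
As a rough sketch of the = versus += versus == case, here is a small regex-based tokenizer in Python. The pattern covers only a handful of operators and the group names are invented for illustration; it is nowhere near a full C lexer. The thing to notice is that the longer operators are listed before their prefixes, so that the longest operator wins.

```python
import re

# Python's alternation takes the first alternative that matches, not the
# longest one, so "+=" and "==" must be listed before "+" and "=".
TOKEN_RE = re.compile(r"""
    (?P<IDENT>[A-Za-z_]\w*)
  | (?P<NUMBER>\d+)
  | (?P<OP>\+=|==|\+\+|\+|=)
  | (?P<SKIP>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Return (kind, lexeme) pairs; raise on characters no pattern covers."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":  # drop whitespace
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("a += b == c = 1"))
# [('IDENT', 'a'), ('OP', '+='), ('IDENT', 'b'), ('OP', '=='),
#  ('IDENT', 'c'), ('OP', '='), ('NUMBER', '1')]
```

If the OP alternatives were reordered to put = first, "==" would come out as two separate = tokens, which is exactly the context problem described above.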

--
Mats

audinue
12-23-2008, 12:47 PM
So,


lexer == scanner == tokenizer

And... a...


A lexer is often organized as separate scanner and tokenizer functions

Scanner: scans; its job is to scan things.
Tokenizer: tokenizes; its job is to tokenize things.

But a tokenizer needs a scanner; without one, what would there be to tokenize?


Scanner
|
Tokenizer

And a Tokenizer is always a Scanner?

So does that mean Tokenizer == Lexer and Scanner != Lexer, but the scanner is part of the lexer?

I think the scanner and the tokenizer are both parts of the lexer...

laserlight
12-23-2008, 12:53 PM
What does it mean to scan and what does it mean to tokenize?

Along the same lines, what is a lexeme and what is a token?

stevesmithx
12-23-2008, 01:25 PM
So does that mean Tokenizer == Lexer and Scanner != Lexer, but the scanner is part of the lexer?
I think you got it correct.
That's what the dragon book on compilers says.
The lexical analyser has a scanner, which scans the source program and produces tokens as output; these are later consumed by a parser to build a parse tree.
So the scanner is the part of the lexer that performs the tokenizing.
However, as the Wikipedia article suggests, the boundaries of what is what may vary depending on the context.
This is also mentioned in the dragon book.
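
One possible reading of the "separate scanner and tokenizer functions" split, sketched in Python: the scanner stage cuts the input into lexemes (raw substrings), and the tokenizer stage attaches a kind to each lexeme to make a token. The function names, the keyword set and the token kinds are assumptions made for this example; they are not taken from the dragon book or from Wikipedia.

```python
import re

KEYWORDS = {"if", "else", "while", "return"}  # illustrative subset only


def scan(text):
    """Scanner stage: split the input into lexemes (raw substrings)."""
    return re.findall(r"[A-Za-z_]\w*|\d+|==|\+=|\S", text)


def classify(lexeme):
    """Tokenizer stage: turn one lexeme into a (kind, value) token."""
    if lexeme in KEYWORDS:
        return ("KEYWORD", lexeme)
    if lexeme[0].isalpha() or lexeme[0] == "_":
        return ("IDENT", lexeme)
    if lexeme.isdecimal():
        return ("NUMBER", int(lexeme))  # the token carries a value, not just text
    return ("PUNCT", lexeme)


def lex(text):
    """Lexer: the scanner feeding the tokenizer."""
    return [classify(lexeme) for lexeme in scan(text)]


print(scan("if (x == 10) return y;"))
# ['if', '(', 'x', '==', '10', ')', 'return', 'y', ';']
print(lex("if (x == 10) return y;"))
# [('KEYWORD', 'if'), ('PUNCT', '('), ('IDENT', 'x'), ('PUNCT', '=='),
#  ('NUMBER', 10), ('PUNCT', ')'), ('KEYWORD', 'return'), ('IDENT', 'y'),
#  ('PUNCT', ';')]
```

In practice the two stages are often fused into a single function, which is probably why the boundaries between the terms are so blurry.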

PS: If you're wondering what the dragon book is, it is a book by Alfred Aho et al. titled
Compilers: Principles, Techniques, and Tools, which many consider an authoritative source on compilers.

audinue
12-23-2008, 11:32 PM
PS: If you're wondering what the dragon book is, it is a book by Alfred Aho et al. titled
Compilers: Principles, Techniques, and Tools, which many consider an authoritative source on compilers.

I read the second edition three times LOL.

Compilers: Design and Principles.