Thread: Scanner? Lexical analyzer? Tokenizer?

  1. #1
    Ugly C Lover audinue's Avatar
    Join Date
    Jun 2008
    Location
    Indonesia
    Posts
    489

    Scanner? Lexical analyzer? Tokenizer?

    Scanner?
    Lexical analyzer (Lexer)?
    Tokenizer?

    Is their job just to group characters into tokens based on a regular expression?

    Are they the same thing?

    I've read so many books, and they only make me more confused.

    WTH.
    Just GET it OFF out my mind!!

  2. #2
    and the hat of copycat stevesmithx's Avatar
    Join Date
    Sep 2007
    Posts
    587
    I don't know much about this stuff.
    But I think they are different phases of a compiler.
    I could be wrong.
    Edit:
    According to Wikipedia:
    "In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. Programs performing lexical analysis are called lexical analyzers or lexers. A lexer is often organized as separate scanner and tokenizer functions, though the boundaries may not be clearly defined."
    http://en.wikipedia.org/wiki/Lexical_analysis
    Last edited by stevesmithx; 12-23-2008 at 12:12 PM.
    Not everything that can be counted counts, and not everything that counts can be counted
    - Albert Einstein.


    No programming language is perfect. There is not even a single best language; there are only languages well suited or perhaps poorly suited for particular purposes.
    - Herbert Mayer

  3. #3
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by audinue
    Is their job just to group characters into tokens based on a regular expression?
    It might not be actually based on a regular expression, but I think that is the general idea.

    Quote Originally Posted by audinue
    Are they the same thing?
    I would say yes, though I suppose it depends on context since "scanner" is a rather generic word.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  4. #4
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    And then you need a semantic analyzer as well, to work out what the code actually means.

    But yes, a lexical analyzer and a tokenizer are essentially the same thing, though that also depends on who you talk to.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  5. #5
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Oh, and laserlight's point about "not a regular expression" becomes clear if you consider that sometimes a = terminates the token, while in other cases it doesn't (+= or == for example).

    Some things ALWAYS terminate a token; others only do so depending on the context. So a lexical analyzer (for C or a similar language) is more complex than a simple regular-expression termination rule. With a reasonable set of regexps, though, you may be able to tokenize all of C.
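
    As a rough sketch of the = versus == versus += point: one common way to handle it is to try the longest operator first, so a = only ends the token when no longer operator matches. The operator table and the name match_operator below are made up for illustration and are not taken from any real compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operator table for illustration. Longer operators are
   listed first, so the first prefix match is also the longest one. */
static const char *const OPERATORS[] = { "==", "+=", "=", "+" };

/* Returns the length of the operator at the start of s, or 0 if s does
   not start with one of the operators in the table. */
size_t match_operator(const char *s)
{
    assert(s != NULL);
    for (size_t i = 0; i < sizeof OPERATORS / sizeof OPERATORS[0]; i++) {
        size_t len = strlen(OPERATORS[i]);
        if (strncmp(s, OPERATORS[i], len) == 0)
            return len;
    }
    return 0;
}
```

    With this table, match_operator("== b") consumes two characters while match_operator("= b") consumes only one, which is the "it depends on what follows" behaviour described above.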


  6. #6
    Ugly C Lover audinue's Avatar
    Join Date
    Jun 2008
    Location
    Indonesia
    Posts
    489
    So,

    Code:
    lexer == scanner == tokenizer
    And... a...

    A lexer is often organized as separate scanner and tokenizer functions
    Scanner: scans; its job is to scan things.
    Tokenizer: tokenizes; its job is to tokenize things.

    But a Tokenizer needs a Scanner; without one, what would there be to tokenize?

    Code:
    Scanner
       |
    Tokenizer
    And a Tokenizer is always a Scanner?

    So does that mean Tokenizer == Lexer and Scanner != Lexer, though the Scanner is part of the Lexer?

    I think the scanner and the tokenizer are both parts of the lexer...

  7. #7
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    What does it mean to scan and what does it mean to tokenize?

    Along the same lines, what is a lexeme and what is a token?

  8. #8
    and the hat of copycat stevesmithx's Avatar
    Join Date
    Sep 2007
    Posts
    587
    So, it means a Tokenizer == Lexer and Scanner != Lexer, but it's part of Lexer
    I think you got it right.
    That's what the Dragon Book on compilers says.
    The lexical analyzer has a scanner, which scans the source program and produces tokens as output; these are later parsed by a parser to build a parse tree.
    So a scanner is the part of the lexer that performs the tokenizing operation.
    However, as the Wikipedia article suggests, the boundaries of what is what may vary depending on the context.
    This is also mentioned in the Dragon Book.
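
    To make the split concrete, here is a minimal sketch, assuming a toy language with only identifiers, a few keywords, decimal numbers and single-character punctuation. The names Token, scan_lexeme, classify and next_token are hypothetical and are not how the Dragon Book or any particular compiler spells them. The scanner step only finds where the next lexeme (the raw characters) ends; the tokenizer step decides which kind of token that lexeme is.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOK_END, TOK_KEYWORD, TOK_IDENT, TOK_NUMBER, TOK_PUNCT } TokenKind;

typedef struct {
    TokenKind kind;      /* the token: which category the lexeme belongs to */
    const char *lexeme;  /* the lexeme: points into the source, not NUL-terminated */
    size_t length;       /* number of characters in the lexeme */
} Token;

/* Scanner step: skip whitespace, leave *cursor at the start of the next
   lexeme, and return that lexeme's length (0 at end of input). */
size_t scan_lexeme(const char **cursor)
{
    const char *p = *cursor;
    while (isspace((unsigned char)*p))
        p++;
    *cursor = p;
    if (*p == '\0')
        return 0;
    if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
    } else {
        p++;  /* toy rule: any other character is one punctuation lexeme */
    }
    return (size_t)(p - *cursor);
}

/* Tokenizer step: decide which kind of token a lexeme is. */
TokenKind classify(const char *lexeme, size_t length)
{
    static const char *const keywords[] = { "if", "while", "return" };
    if (length == 0)
        return TOK_END;
    unsigned char first = (unsigned char)lexeme[0];
    if (isdigit(first))
        return TOK_NUMBER;
    if (isalpha(first) || first == '_') {
        for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
            if (strlen(keywords[i]) == length &&
                strncmp(lexeme, keywords[i], length) == 0)
                return TOK_KEYWORD;
        }
        return TOK_IDENT;
    }
    return TOK_PUNCT;
}

/* The lexer: scanner and tokenizer combined, one token per call. */
Token next_token(const char **cursor)
{
    Token tok;
    assert(cursor != NULL && *cursor != NULL);
    tok.length = scan_lexeme(cursor);
    tok.lexeme = *cursor;
    tok.kind = classify(tok.lexeme, tok.length);
    *cursor += tok.length;
    return tok;
}
```

    Calling next_token repeatedly on "if count 42" gives a keyword, an identifier and a number, then TOK_END. Two different lexemes such as count and total would both come out as the same token kind, TOK_IDENT, which is the lexeme/token distinction asked about earlier in the thread.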

    PS: If you wonder what the Dragon Book is, it is a book by Alfred Aho et al. titled
    Compilers: Principles, Techniques, and Tools, which many consider an authoritative source on compilers.

  9. #9
    Ugly C Lover audinue's Avatar
    Join Date
    Jun 2008
    Location
    Indonesia
    Posts
    489
    PS: If you wonder what the Dragon Book is, it is a book by Alfred Aho et al. titled
    Compilers: Principles, Techniques, and Tools, which many consider an authoritative source on compilers.
    I read the second edition three times LOL.

    Compilers: Design and Principles.
