Reference Book:
Compilers: Principles, Techniques, and Tools
The roles of a lexical analyzer include -
- reading the input characters of the source program,
- grouping them into lexemes, and
- producing a sequence of tokens, one for each lexeme, as output for the parser.
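The three roles above can be sketched as a small regex-driven tokenizer. This is a minimal illustration, not the book's implementation: the token classes (NUMBER, ID, OP), their patterns, and the function name `tokenize` are hypothetical choices for a tiny expression language. The loop reads the character stream, matches one lexeme at a time, and emits a `(token_name, lexeme)` pair for each lexeme that is not whitespace.

```python
import re

# Hypothetical token classes for a tiny expression language (an assumption,
# not taken from the book). Alternatives are tried left to right.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Read the character stream, group characters into lexemes,
    and return a list of (token_name, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        # lastgroup names the alternative that matched this lexeme.
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("rate = x + 60"))
# [('ID', 'rate'), ('OP', '='), ('ID', 'x'), ('OP', '+'), ('NUMBER', '60')]
```

Whitespace is matched as a lexeme of its own (SKIP) but never turned into a token, so the parser only ever sees meaningful tokens.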
It interacts with the symbol table for two reasons -
- When it discovers a lexeme that constitutes an identifier, it enters that lexeme into the symbol table.
- It may read information from the symbol table about what kind of identifier a lexeme is, so that it can pass a more precise token to the parser.
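Both uses of the symbol table can be illustrated with a dictionary-backed table. This is a sketch under assumptions: `SymbolTable`, `classify_word`, and the token kinds `ID` and `TYPE_NAME` are hypothetical names, and pre-installing reserved words in the table is one common technique, not the only one. Looking a lexeme up first lets the lexer learn what kind of name it is; an unknown lexeme is installed as a plain identifier.

```python
class SymbolTable:
    """Hypothetical symbol table mapping a lexeme to its token kind."""

    def __init__(self):
        self.entries = {}

    def lookup(self, lexeme):
        return self.entries.get(lexeme)

    def install(self, lexeme, kind):
        # If the lexeme is already present, keep (and return) its existing kind.
        return self.entries.setdefault(lexeme, kind)


def classify_word(lexeme, table):
    """Return the token for an identifier-like lexeme.

    Reads the table first to learn what kind of name this is;
    if the lexeme is unknown, installs it as a plain identifier."""
    kind = table.lookup(lexeme)
    if kind is None:
        kind = table.install(lexeme, "ID")
    return (kind, lexeme)


table = SymbolTable()
for kw in ("if", "while", "return"):
    table.install(kw, kw.upper())      # reserved words pre-installed
table.install("size_t", "TYPE_NAME")  # e.g. a name previously declared as a type

print(classify_word("while", table))   # ('WHILE', 'while')
print(classify_word("size_t", table))  # ('TYPE_NAME', 'size_t')
print(classify_word("count", table))   # ('ID', 'count'), now installed
```

The `TYPE_NAME` entry shows why reading the table helps: the lexeme `size_t` looks like any other identifier, but the table tells the lexer to hand the parser a different token for it.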
Interaction between Lexical Analyzer and Parser
The interaction is commonly implemented by having the parser call the lexical analyzer through getnexttoken. Each call causes the lexical analyzer to read characters from its input until it can identify the next lexeme, produce the corresponding token, and return that token to the parser.
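This pull-style interaction can be sketched with a lexer object whose `get_next_token` method (a hypothetical Python spelling of getnexttoken) recognizes exactly one lexeme per call, and a parser that asks for tokens only when it needs them. The grammar `NUM (OP NUM)*` and all class and method names here are assumptions chosen to keep the example small.

```python
class Lexer:
    """Produces one token per call; input is consumed only on demand."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def get_next_token(self):
        # Skip blanks, then recognize exactly one lexeme.
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        if self.pos == len(self.text):
            return ("EOF", "")
        ch = self.text[self.pos]
        if ch.isdigit():
            start = self.pos
            while self.pos < len(self.text) and self.text[self.pos].isdigit():
                self.pos += 1
            return ("NUM", self.text[start:self.pos])
        if ch in "+-":
            self.pos += 1
            return ("OP", ch)
        raise SyntaxError(f"unexpected character {ch!r} at position {self.pos}")


class Parser:
    """Evaluates NUM (OP NUM)* by pulling tokens from the lexer on demand."""

    def __init__(self, lexer):
        self.lexer = lexer
        self.lookahead = lexer.get_next_token()  # one token of lookahead

    def expect(self, kind):
        tok = self.lookahead
        if tok[0] != kind:
            raise SyntaxError(f"expected {kind}, got {tok}")
        self.lookahead = self.lexer.get_next_token()  # ask for the next token
        return tok

    def parse(self):
        value = int(self.expect("NUM")[1])
        while self.lookahead[0] == "OP":
            op = self.expect("OP")[1]
            rhs = int(self.expect("NUM")[1])
            value = value + rhs if op == "+" else value - rhs
        self.expect("EOF")
        return value


print(Parser(Lexer("12 + 30 - 2")).parse())  # 40
```

The parser drives the whole process: the lexer never runs ahead by more than the single lookahead token the parser has requested.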
Lexical analyzers also strip out whitespace (blanks, newlines, tabs) and comments. They also correlate error messages with locations in the source program, for example by counting the newline characters (or other line delimiters) seen so far, so that each message can be tagged with a line number.
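Removing whitespace and comments while tracking line numbers can be sketched in a single hand-written loop. The `//` line-comment syntax, the token kinds, and the function name `lex_with_lines` are assumptions for illustration. Each newline increments a counter, and both tokens and error messages carry the current line number.

```python
def lex_with_lines(text):
    """Return (kind, lexeme, line) triples, dropping blanks, tabs,
    newlines, and '//' comments (the comment syntax is an assumption)."""
    tokens = []
    line = 1
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\n":
            line += 1  # the newline count gives the line for messages
            i += 1
        elif ch in " \t":
            i += 1
        elif text.startswith("//", i):
            # Skip to (but not past) the end of the line, so the
            # newline is still counted by the branch above.
            end = text.find("\n", i)
            i = len(text) if end == -1 else end
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(text) and (text[i].isalnum() or text[i] == "_"):
                i += 1
            tokens.append(("ID", text[start:i], line))
        elif ch.isdigit():
            start = i
            while i < len(text) and text[i].isdigit():
                i += 1
            tokens.append(("NUM", text[start:i], line))
        elif ch in "=;+":
            tokens.append(("PUNCT", ch, line))
            i += 1
        else:
            raise SyntaxError(f"line {line}: unexpected character {ch!r}")
    return tokens


source = "x = 1;  // set x\n\ty = x + 2;\n"
for tok in lex_with_lines(source):
    print(tok)
```

Because the comment branch stops at the newline rather than consuming it, line counting stays in one place and cannot drift when a comment ends a line.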
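A lexical analyzer is sometimes divided into a cascade of two processes -
- Scanning - simple processes that do not require tokenization of the input, such as deleting comments and compacting consecutive whitespace characters into one.
- Lexical analysis proper - the more complex portion, which produces the sequence of tokens from the output of the scanner.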
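The two-stage cascade can be sketched as two functions applied in sequence. The `/* ... */` comment syntax, the token patterns, and the names `scan` and `lex` are assumptions for illustration. The first stage only rewrites characters and produces no tokens; the second stage tokenizes the already-cleaned text.

```python
import re

# Hypothetical token patterns for stage 2: numbers, identifiers,
# and any other single non-space, non-word character as an operator.
TOKEN_RE = re.compile(r"(?P<NUM>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>[^\sA-Za-z0-9_])")


def scan(text):
    """Stage 1 (scanning): no tokens yet. Delete /* ... */ comments and
    compact each run of whitespace into a single blank."""
    without_comments = re.sub(r"/\*.*?\*/", " ", text, flags=re.DOTALL)
    return re.sub(r"\s+", " ", without_comments).strip()


def lex(scanned):
    """Stage 2 (lexical analysis proper): turn the cleaned text into
    (token_name, lexeme) pairs; single blanks between lexemes are skipped."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(scanned)]


src = "count = count + 1  /* bump\n  the counter */\n\n\tlimit = 10"
cleaned = scan(src)
print(cleaned)       # count = count + 1 limit = 10
print(lex(cleaned))
```

Keeping the stages separate means the tokenizer in stage 2 never has to know about comments; the trade-off is that compacting whitespace in stage 1 discards newlines, so a real implementation that reports line numbers would need to carry them through the first stage.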
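Why Lexical Analysis and Parsing are separated into distinct phases
- Simplicity of design - the parser does not have to deal with whitespace and comments, because the lexical analyzer has already removed them.
- Improved compiler efficiency - a separate lexical analyzer can apply specialized techniques, such as specialized input buffering, that serve only the lexical task.
- Enhanced compiler portability - peculiarities specific to the input device can be confined to the lexical analyzer.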