Compilation consists of two parts: analysis and synthesis. In the analysis stage, the source program is checked for syntactic and semantic correctness, any errors are reported back to the user, and an intermediate representation of the program is generated. The analysis stage also collects information about the source program in a data structure called the symbol table, which is passed along with the intermediate representation to the synthesis stage. In the synthesis stage, the symbol table is consulted and a target program is generated for the assembler. The phases of compilation are the more detailed, individual parts of these stages. We'll now briefly discuss each phase:
Lexical Analysis - Scanning
A lexical analyzer reads the source program as a stream of characters and groups the characters into meaningful sequences called lexemes. For each lexeme, it produces a token as output. A token is a pair consisting of a token name (an abstract symbol used during syntax analysis) and an optional attribute value, which typically points to the symbol-table entry for that token.
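The scanning step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a production lexer: the token classes in TOKEN_SPEC, the tokenize function, and the shape of the symbol-table entries are all assumptions made for this example.

```python
import re

# Hypothetical token classes for a tiny expression language (an assumption).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table) for a source string.

    Each token is a (token_name, attribute_value) pair. For identifiers,
    the attribute value is an index into the symbol table.
    """
    symbol_table = []   # one entry per distinct identifier
    index_of = {}       # identifier lexeme -> position in symbol_table
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue                      # whitespace produces no token
        if kind == "ID":
            if lexeme not in index_of:
                index_of[lexeme] = len(symbol_table)
                symbol_table.append({"name": lexeme})
            tokens.append(("ID", index_of[lexeme]))
        elif kind == "NUMBER":
            tokens.append(("NUMBER", lexeme))
        else:
            tokens.append((lexeme, None))  # operators need no attribute value
    return tokens, symbol_table
```

For example, tokenize("position = initial + rate * 60") yields the tokens ("ID", 0), ("=", None), ("ID", 1), ("+", None), ("ID", 2), ("*", None), ("NUMBER", "60"), together with a symbol table holding entries for position, initial and rate.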
Syntax Analysis - Parsing
The syntax analyzer, or parser, builds an intermediate tree-like structure from the tokens it receives as input. This tree, known as a syntax tree, depicts the grammatical structure of the token stream. Each interior node represents an operation, and its children represent the arguments of that operation.
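As a rough illustration of how a syntax tree is built, here is a minimal recursive-descent parser for arithmetic expressions. The grammar, the parse function, and the choice to represent tokens as plain strings and tree nodes as nested tuples are assumptions made for this sketch; real parsers are usually generated from a grammar or are considerably more elaborate.

```python
def parse(tokens):
    """Build a syntax tree from a list of token strings.

    Hypothetical grammar used for this sketch:
        expr   -> term (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NAME | NUMBER | '(' expr ')'
    Interior nodes are tuples (operator, left, right); leaves are strings.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        token = advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token in {"+", "-", "*", "/", ")"}:
            raise SyntaxError(f"unexpected token {token!r}")
        return token                      # a name or a number becomes a leaf

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())   # operation as interior node
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

Calling parse(["initial", "+", "rate", "*", "60"]) returns ("+", "initial", ("*", "rate", "60")): the multiplication sits deeper in the tree because it binds more tightly than the addition.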
Semantic Analysis
The semantic analyzer takes the syntax tree and the symbol table as input and checks the source program for semantic consistency with the language definition. It also performs an important task: type checking. Type checking verifies that each operator has operands of matching, permitted types, as required by the language's type rules. Where the language permits it, the semantic analyzer also inserts implicit type conversions, called coercions.
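Type checking with coercion can be sketched as a walk over the syntax tree. The example below assumes a toy language with only int and float types, trees written as nested tuples (operator, left, right), and a hypothetical inttofloat conversion node; none of these names come from a particular compiler.

```python
def check(node, declared):
    """Type-check a syntax tree and insert int-to-float coercions.

    `node` is a number, an identifier name (str), or a tuple
    (operator, left, right). `declared` maps identifier names to
    "int" or "float". Returns (annotated_tree, result_type).
    """
    if isinstance(node, int):
        return node, "int"
    if isinstance(node, float):
        return node, "float"
    if isinstance(node, str):
        if node not in declared:
            raise NameError(f"undeclared identifier {node!r}")
        return node, declared[node]

    op, left, right = node
    left, left_type = check(left, declared)
    right, right_type = check(right, declared)

    if left_type == right_type:
        return (op, left, right), left_type
    if {left_type, right_type} != {"int", "float"}:
        raise TypeError(f"operator {op!r} cannot mix {left_type} and {right_type}")

    # Mixed int/float operands: coerce the int side to float.
    if left_type == "int":
        left = ("inttofloat", left)
    else:
        right = ("inttofloat", right)
    return (op, left, right), "float"
```

With declared = {"initial": "float", "rate": "float"}, checking ("+", "initial", ("*", "rate", 60)) returns the tree ("+", "initial", ("*", "rate", ("inttofloat", 60))) with result type "float": the integer literal is wrapped in an explicit conversion before it is multiplied by a float.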
Intermediate Code Generation
After the syntax and semantic analysis phases, the compiler translates the source program into an explicit intermediate representation. This representation should be easy to produce and easy to translate into the target language. A popular form is three-address code, a sequence of assembly-like instructions with at most three operands each.
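To make three-address code concrete, the sketch below flattens a syntax tree into a list of instructions, introducing a fresh temporary for each interior node. The function name to_three_address, the temporary names t1, t2, ... and the nested-tuple tree format (operator, left, right) are assumptions chosen for this illustration.

```python
def to_three_address(tree):
    """Flatten a nested-tuple syntax tree into three-address instructions.

    Returns (code, result), where `code` is a list of strings of the form
    "dest = left op right" and `result` names the location of the value.
    """
    code = []
    counter = 0

    def emit(node):
        nonlocal counter
        if not isinstance(node, tuple):
            return str(node)              # a leaf is already an address
        op, left, right = node
        left_addr = emit(left)            # operands are computed first
        right_addr = emit(right)
        counter += 1
        temp = f"t{counter}"              # fresh temporary for this result
        code.append(f"{temp} = {left_addr} {op} {right_addr}")
        return temp

    result = emit(tree)
    return code, result
```

For the tree ("+", "initial", ("*", "rate", "60")) this produces the two instructions "t1 = rate * 60" and "t2 = initial + t1", with the final value in t2. Each instruction has one operator and at most three addresses, which is what makes the form easy to translate further.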
Code Optimization
In this phase, the intermediate code is transformed so that better target code results. "Better" usually means faster, but it can also mean smaller code or code that uses less memory.
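One simple optimization is constant folding: computing at compile time any operation whose operands are already known. The sketch below is a minimal illustration under stated assumptions, not a description of how any particular compiler does it. It assumes instructions are tuples (dest, left, op, right), that operands are either names (strings) or numbers, and that folded destinations are compiler temporaries used only by later instructions in the same list.

```python
import operator

# Operators this sketch knows how to evaluate at compile time.
# Division is left out to avoid folding a division by zero.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(code):
    """Fold constant operations in a list of (dest, left, op, right) tuples."""
    known = {}        # temporaries whose value is known at compile time
    optimized = []
    for dest, left, op, right in code:
        # Replace operands by their known constant values, if any.
        left = known.get(left, left)
        right = known.get(right, right)
        both_constant = isinstance(left, (int, float)) and isinstance(right, (int, float))
        if both_constant and op in OPS:
            known[dest] = OPS[op](left, right)   # no instruction is emitted
        else:
            known.pop(dest, None)                # dest is no longer a constant
            optimized.append((dest, left, op, right))
    return optimized
```

Given the instructions ("t1", 60, "*", 60), ("t2", "rate", "*", "t1"), ("t3", "initial", "+", "t2"), the function drops the first instruction and rewrites the second as ("t2", "rate", "*", 3600), so the multiplication 60 * 60 is never performed at run time.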
Code Generation