Language Processing Systems - u-aizu.ac.jp
![Page 1: Language Processing Systems - u-aizu.ac.jp](https://reader031.fdocuments.us/reader031/viewer/2022020623/61f1aabcff422815a92ace5f/html5/thumbnails/1.jpg)
Language Processing Systems
Prof. Mohamed Hamada
Software Engineering Lab. The University of Aizu
Japan
![Page 2: Language Processing Systems - u-aizu.ac.jp](https://reader031.fdocuments.us/reader031/viewer/2022020623/61f1aabcff422815a92ace5f/html5/thumbnails/2.jpg)
2
Review
![Page 3: Language Processing Systems - u-aizu.ac.jp](https://reader031.fdocuments.us/reader031/viewer/2022020623/61f1aabcff422815a92ace5f/html5/thumbnails/3.jpg)
3
Compiler Architecture
[Diagram: Source language -> Scanner (lexical analysis) -> tokens -> Parser (syntax analysis) -> Parse tree -> Semantic Analysis -> AST -> IC generator -> Intermediate Language -> Code Optimizer -> OIL -> Code Generator -> Target language. The Error Handler and the Symbol Table are connected to every phase.]
![Page 4: Language Processing Systems - u-aizu.ac.jp](https://reader031.fdocuments.us/reader031/viewer/2022020623/61f1aabcff422815a92ace5f/html5/thumbnails/4.jpg)
4
Compiler Architecture
[Diagram: the same compiler pipeline (Scanner, Parser, Semantic Analysis, IC generator, Code Optimizer, Code Generator, with the Error Handler and Symbol Table), now divided into a Front End and a Back End.]
![Page 5: Language Processing Systems - u-aizu.ac.jp](https://reader031.fdocuments.us/reader031/viewer/2022020623/61f1aabcff422815a92ace5f/html5/thumbnails/5.jpg)
5
Front-end and Back-end
[Diagram:
Source program in Language-1 -> Language-1 Front End
Source program in Language-2 -> Language-2 Front End
Both front ends -> Non-optimized Intermediate Code -> Intermediate-code Optimizer -> Optimized Intermediate Code
Optimized Intermediate Code -> Target-1 Code Generator -> Target-1 machine code
Optimized Intermediate Code -> Target-2 Code Generator -> Target-2 machine code]
![Page 6: Language Processing Systems - u-aizu.ac.jp](https://reader031.fdocuments.us/reader031/viewer/2022020623/61f1aabcff422815a92ace5f/html5/thumbnails/6.jpg)
6
Front-end and Back-end
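• Suppose you want to write compilers from 3 source languages to 4 computer platforms:
– Languages: C++, Java, FORTRAN
– Platforms: MIPS, SPARC, Pentium, PowerPC
With one compiler per (language, platform) pair, we need to write 3 × 4 = 12 programs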
![Page 7: Language Processing Systems - u-aizu.ac.jp](https://reader031.fdocuments.us/reader031/viewer/2022020623/61f1aabcff422815a92ace5f/html5/thumbnails/7.jpg)
7
Front-end and Back-end
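• But we can do better by splitting each compiler into a front end and a back end that share one intermediate representation
– IR: Intermediate Representation
– FE: Front-End
– BE: Back-End
[Diagram: C++, Java, FORTRAN -> FE (one per language) -> IR -> BE (one per platform) -> MIPS, SPARC, Pentium, PowerPC]
We need to write only 3 + 4 = 7 programs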
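The two counts can be checked with a small sketch. The helper names `direct_compilers` and `shared_ir_components` are made up for illustration; only the language and platform lists come from the slides.

```python
# Language and platform lists as given on the slides.
languages = ["C++", "Java", "FORTRAN"]
platforms = ["MIPS", "SPARC", "Pentium", "PowerPC"]


def direct_compilers(langs, targets):
    """One complete compiler per (language, platform) pair."""
    return [(lang, target) for lang in langs for target in targets]


def shared_ir_components(langs, targets):
    """One front end per language plus one back end per platform."""
    return [f"FE({lang})" for lang in langs] + [f"BE({target})" for target in targets]


print(len(direct_compilers(languages, platforms)))      # 12
print(len(shared_ir_components(languages, platforms)))  # 7
```

In general, m languages and n platforms need m × n direct compilers but only m + n components once an IR is shared.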
![Page 8: Language Processing Systems - u-aizu.ac.jp](https://reader031.fdocuments.us/reader031/viewer/2022020623/61f1aabcff422815a92ace5f/html5/thumbnails/8.jpg)
8
Scanner
[Diagram: the compiler pipeline, with the Scanner (lexical analysis) phase called out.]
How does it work? Regular expressions define the tokens; finite automata recognize them
How to write it? Use the Unix tool Lex
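As a rough illustration of the idea (not Lex itself), the sketch below defines each token class with a regular expression and lets Python's `re` module do the recognition. The token names and patterns are assumptions chosen for this example, and Python's `re` is a backtracking matcher standing in for the finite automaton that Lex would generate.

```python
import re

# Hypothetical token classes, each defined by a regular expression.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP",     r"[+\-*/]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),
    ("ERROR",  r"."),   # any other single character
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Yield (kind, lexeme) pairs, skipping whitespace."""
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {match.group()!r} at {match.start()}")
        yield kind, match.group()


for token in tokenize("position := initial + rate * 60"):
    print(token)
```

The order of the alternatives matters: `ASSIGN` must be tried before the catch-all `ERROR` pattern, much as rule order resolves ties in a Lex specification.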
![Page 9: Language Processing Systems - u-aizu.ac.jp](https://reader031.fdocuments.us/reader031/viewer/2022020623/61f1aabcff422815a92ace5f/html5/thumbnails/9.jpg)
9
Parser
[Diagram: the compiler pipeline, with the Parser (syntax analysis) phase called out.]
How does it work? Use top-down (LL(k)) or bottom-up (LR(k)) parsing to build the parse tree
How to write it? Use the Unix tool Yacc
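For comparison with a generated parser, here is a minimal hand-written top-down (predictive, one-token lookahead) parser. It is only a sketch: the grammar, the `Parser` class, and the nested-tuple tree format are assumptions for this example, and Yacc itself generates bottom-up (LALR) parsers rather than code like this.

```python
class Parser:
    """Predictive parser for:
       stmt   -> ID := expr
       expr   -> term   { (+|-) term }
       term   -> factor { (*|/) factor }
       factor -> ID | NUMBER | ( expr )
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + [("EOF", "")]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, kind):
        tok_kind, lexeme = self.peek()
        if tok_kind != kind:
            raise SyntaxError(f"expected {kind}, found {lexeme!r}")
        self.pos += 1
        return lexeme

    def statement(self):
        target = self.expect("ID")
        self.expect("ASSIGN")
        tree = (":=", target, self.expr())
        self.expect("EOF")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.expect("OP")
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            op = self.expect("OP")
            node = (op, node, self.factor())
        return node

    def factor(self):
        kind, lexeme = self.peek()
        if kind in ("ID", "NUMBER"):
            self.pos += 1
            return lexeme
        if kind == "LPAREN":
            self.pos += 1
            node = self.expr()
            self.expect("RPAREN")
            return node
        raise SyntaxError(f"unexpected token {lexeme!r}")


tokens = [("ID", "id1"), ("ASSIGN", ":="), ("ID", "id2"), ("OP", "+"),
          ("ID", "id3"), ("OP", "*"), ("NUMBER", "60")]
print(Parser(tokens).statement())
# (':=', 'id1', ('+', 'id2', ('*', 'id3', '60')))
```

Each grammar rule becomes one method, and the loops in `expr` and `term` replace the left-recursive rules that a top-down parser cannot handle directly.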
![Page 10: Language Processing Systems - u-aizu.ac.jp](https://reader031.fdocuments.us/reader031/viewer/2022020623/61f1aabcff422815a92ace5f/html5/thumbnails/10.jpg)
10
Parsing
• Top Down Parsing
– Predictive Parsing
– LL(k) Parsing
– Grammar issues to resolve first: Left Recursion, Left Factoring
• Bottom Up Parsing
– Shift-reduce Parsing
– LR(k) Parsing
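Left recursion has to be removed before a grammar can be parsed top-down. The sketch below applies the standard rewrite for immediate left recursion, turning A -> A α | β into A -> β A' and A' -> α A' | ε. The function name and the list-of-symbols grammar encoding are assumptions made for this example.

```python
def eliminate_immediate_left_recursion(nonterminal, productions):
    """Rewrite A -> A a | b as A -> b A', A' -> a A' | epsilon.

    `productions` is a list of right-hand sides, each a list of symbols.
    An empty list stands for epsilon.
    """
    recursive = [rhs[1:] for rhs in productions if rhs and rhs[0] == nonterminal]
    other = [rhs for rhs in productions if not rhs or rhs[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}   # nothing to do
    fresh = nonterminal + "'"               # new helper nonterminal A'
    return {
        nonterminal: [rhs + [fresh] for rhs in other],
        fresh: [alpha + [fresh] for alpha in recursive] + [[]],
    }


# E -> E + T | T
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
# {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []]}
```

The result reads E -> T E' and E' -> + T E' | ε, which generates the same strings without recursing on the leftmost symbol.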
![Page 11: Language Processing Systems - u-aizu.ac.jp](https://reader031.fdocuments.us/reader031/viewer/2022020623/61f1aabcff422815a92ace5f/html5/thumbnails/11.jpg)
11
Scanner and Parser
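[Diagram: Source Program -> Scanner -> Parser. The Scanner sends "get next char" to the Source Program and receives the next char; the Parser sends "get next token" to the Scanner and receives the next token. Both use the symbol table, which contains a record for each identifier.]
Scanner:
1. Uses Regular Expressions to define tokens
2. Uses Finite Automata to recognize tokens
Parser:
Uses Top-down parsing or Bottom-up parsing to construct a Parse tree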
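The "get next char" / "get next token" interaction can be sketched with a Python generator: the scanner reads characters one at a time and only produces a token when the consumer asks for one. This is a hand-coded automaton under simplifying assumptions (identifiers, integers, and single-character symbols only); the `scanner` function and the record layout in the symbol table are made up for the example.

```python
def scanner(source, symbol_table):
    """Yield one token per request, reading the source character by character."""
    chars = iter(source)
    ch = next(chars, "")                 # "get next char"; "" marks end of input
    while ch:
        if ch.isspace():
            ch = next(chars, "")
        elif ch.isalpha():               # state: inside an identifier
            lexeme = ""
            while ch.isalnum():
                lexeme += ch
                ch = next(chars, "")
            symbol_table.setdefault(lexeme, {"kind": "identifier"})
            yield ("ID", lexeme)
        elif ch.isdigit():               # state: inside a number
            lexeme = ""
            while ch.isdigit():
                lexeme += ch
                ch = next(chars, "")
            yield ("NUMBER", lexeme)
        else:                            # any other character is its own token
            yield ("SYMBOL", ch)
            ch = next(chars, "")


symbol_table = {}
tokens = scanner("a + b1 * 2", symbol_table)
print(next(tokens))          # "get next token" -> ('ID', 'a')
print(list(symbol_table))    # ['a']  (b1 has not been scanned yet)
print(list(tokens))          # remaining tokens, pulled on demand
```

Because the scanner is lazy, the symbol table only gains a record for an identifier once the parser has actually requested that token.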
![Page 12: Language Processing Systems - u-aizu.ac.jp](https://reader031.fdocuments.us/reader031/viewer/2022020623/61f1aabcff422815a92ace5f/html5/thumbnails/12.jpg)
12
Semantics analysis
[Diagram: the compiler pipeline, with the Semantic Analysis phase called out.]
Semantic Analysis topics:
• Abstract Syntax Tree
• Scope
• Symbol Table
• Type Checker
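A type checker can be sketched as a recursive walk over the AST that looks identifiers up in the symbol table and inserts a conversion where an int meets a real. Everything here is an assumption for illustration: the nested-tuple AST, the two types "int" and "real", and the rule that ints are widened to reals.

```python
def check(node, types):
    """Return (typed_node, type); insert ("int-to-real", x) where needed.

    `types` plays the role of the symbol table: identifier -> declared type.
    """
    if isinstance(node, int):
        return node, "int"
    if isinstance(node, float):
        return node, "real"
    if isinstance(node, str):
        if node not in types:
            raise NameError(f"undeclared identifier {node!r}")
        return node, types[node]

    op, left, right = node
    left, left_type = check(left, types)
    right, right_type = check(right, types)

    if op == ":=":
        if left_type == "int" and right_type == "real":
            raise TypeError("cannot assign a real value to an int variable")
        if left_type == "real" and right_type == "int":
            right = ("int-to-real", right)
        return (op, left, right), left_type

    if left_type != right_type:          # mixed int/real arithmetic
        if left_type == "int":
            left = ("int-to-real", left)
        else:
            right = ("int-to-real", right)
        return (op, left, right), "real"
    return (op, left, right), left_type


# Assumption: all three identifiers were declared as real.
types = {"id1": "real", "id2": "real", "id3": "real"}
tree = (":=", "id1", ("+", "id2", ("*", "id3", 60)))
print(check(tree, types)[0])
# (':=', 'id1', ('+', 'id2', ('*', 'id3', ('int-to-real', 60))))
```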
![Page 13: Language Processing Systems - u-aizu.ac.jp](https://reader031.fdocuments.us/reader031/viewer/2022020623/61f1aabcff422815a92ace5f/html5/thumbnails/13.jpg)
13
Intermediate Code (IC) Generator
[Diagram: the compiler pipeline, with the IC generator phase called out.]
Intermediate code forms:
• Three-address code
• Stack based (postfix)
• Directed Acyclic Graph (DAG)
• Control Flow Graph (CFG)
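Of these forms, three-address code is the easiest to sketch: a post-order walk of the AST emits one instruction per operator and stores each result in a fresh temporary. The function below is an illustrative sketch; the nested-tuple AST and the `tempN` naming are assumptions.

```python
from itertools import count


def generate_tac(tree):
    """Flatten an assignment AST into a list of three-address instructions."""
    code = []
    temps = count(1)

    def emit(node):
        if not isinstance(node, tuple):          # identifier or constant
            return str(node)
        if len(node) == 2:                       # unary node, e.g. ("int-to-real", x)
            op, arg = node
            operand = emit(arg)
            result = f"temp{next(temps)}"
            code.append(f"{result} := {op} ({operand})")
            return result
        op, left, right = node                   # binary operator
        left_operand = emit(left)
        right_operand = emit(right)
        result = f"temp{next(temps)}"
        code.append(f"{result} := {left_operand} {op} {right_operand}")
        return result

    op, target, value = tree
    assert op == ":=", "this sketch only handles a single assignment"
    code.append(f"{target} := {emit(value)}")
    return code


tree = (":=", "id1", ("+", "id2", ("*", "id3", ("int-to-real", 60))))
for line in generate_tac(tree):
    print(line)
# temp1 := int-to-real (60)
# temp2 := id3 * temp1
# temp3 := id2 + temp2
# id1 := temp3
```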
![Page 14: Language Processing Systems - u-aizu.ac.jp](https://reader031.fdocuments.us/reader031/viewer/2022020623/61f1aabcff422815a92ace5f/html5/thumbnails/14.jpg)
14
Code Generator
[Diagram: the compiler pipeline, with the Code Generator phase called out.]
Code Generator topics:
• Target Machine
• Instruction Selection
• Register Allocation
• Data Dependency Graph
• Memory Management
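Instruction selection and register allocation can be illustrated with a deliberately naive sketch that turns three-address code into two-operand pseudo-assembly of the form `OP source, destination`. The four-register file, the first-free allocation policy, and the assumption that every temporary is used exactly once are all simplifications made for this example; there is no spilling.

```python
def generate_code(tac):
    """Translate lines of the form 'x := y op z' into pseudo-assembly."""
    opcodes = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}
    free = ["R1", "R2", "R3", "R4"]   # hypothetical register file
    location = {}                      # temporary -> register holding its value
    asm = []

    def operand(name):
        if name in location:
            return location[name]      # value already lives in a register
        if name[0].isdigit():
            return "#" + name          # immediate constant
        return name                    # variable in memory

    for line in tac:
        target, expr = line.split(" := ")
        left, op, right = expr.split()
        reg = free.pop(0)                                   # register allocation
        asm.append(f"MOV {operand(left)}, {reg}")           # instruction selection
        asm.append(f"{opcodes[op]} {operand(right)}, {reg}")
        for name in (left, right):                          # temporaries are single-use
            if name in location:
                free.append(location.pop(name))
        if target.startswith("temp"):
            location[target] = reg                          # keep the result in a register
        else:
            asm.append(f"MOV {reg}, {target}")              # store to the variable
            free.append(reg)
    return asm


for instruction in generate_code(["temp1 := id3 * 60.0", "id1 := id2 + temp1"]):
    print(instruction)
# MOV id3, R1
# MUL #60.0, R1
# MOV id2, R2
# ADD R1, R2
# MOV R2, id1
```

A different allocation order would give different register names for the same computation, so only the structure of the output is significant here.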
![Page 15: Language Processing Systems - u-aizu.ac.jp](https://reader031.fdocuments.us/reader031/viewer/2022020623/61f1aabcff422815a92ace5f/html5/thumbnails/15.jpg)
15
Example

Source:
  position := initial + rate * 60

Scanner
  id1 := id2 + id3 * 60

Parser
  :=
    id1
    +
      id2
      *
        id3
        60

Semantic Analyzer
  :=
    id1
    +
      id2
      *
        id3
        int-to-real
          60

Intermediate Code Generator
  temp1 := int-to-real (60)
  temp2 := id3 * temp1
  temp3 := id2 + temp2
  id1 := temp3

Code Optimizer
  temp1 := id3 * 60.0
  id1 := id2 + temp1

Code Generator
  MOV id3, R2
  MUL #60.0, R2
  MOV id2, R1
  ADD R2, R1
  MOV R1, id1
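The Code Optimizer step of this example can be approximated with two simple passes: fold `int-to-real` applied to a constant at compile time, then merge a final copy into the instruction that produced the copied temporary. This is a sketch under the assumption that each temporary is used only once; the `optimize` function is hypothetical, and it keeps the original temporary names, so its output matches the slide only up to renaming (`temp2` instead of `temp1`).

```python
def optimize(tac):
    """Constant-fold int-to-real and remove a trailing copy of a temporary."""
    constants = {}
    folded = []
    for line in tac:
        target, expr = line.split(" := ")
        parts = expr.split()
        # Pass 1: "t := int-to-real (60)" becomes the compile-time constant 60.0.
        if parts[0] == "int-to-real" and parts[1].strip("()").isdigit():
            constants[target] = str(float(parts[1].strip("()")))
            continue
        folded.append((target, [constants.get(p, p) for p in parts]))

    result = []
    for target, parts in folded:
        # Pass 2: "t := e" followed directly by "x := t" becomes "x := e".
        is_copy_of_previous = (
            len(parts) == 1 and result
            and result[-1][0] == parts[0] and parts[0].startswith("temp")
        )
        if is_copy_of_previous:
            result[-1] = (target, result[-1][1])
        else:
            result.append((target, parts))
    return [f"{target} := {' '.join(parts)}" for target, parts in result]


tac = [
    "temp1 := int-to-real (60)",
    "temp2 := id3 * temp1",
    "temp3 := id2 + temp2",
    "id1 := temp3",
]
for line in optimize(tac):
    print(line)
# temp2 := id3 * 60.0
# id1 := id2 + temp2
```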