Introduction to Language Processing Technology Natawut Nupairoj, Ph.D. Department of Computer Engineering Chulalongkorn University


Outline

• Levels of Programming Languages
• Language Processors
• Specification of Programming Languages


Levels of Programming Languages

• High level: C / Java / Pascal
• Low level: Assembly / Bytecode
• Machine language

Figure: a C compiler translates a high-level C function into assembly code, and an assembler translates the assembly code into machine language.

High-level language (C):

    void swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

        ↓ C compiler

Assembly language (MIPS):

    swap:
        muli $2, $5, 4
        add  $2, $4, $2
        lw   $15, 0($2)
        ...

        ↓ Assembler

Machine language:

    000010001101101100110000
    ...
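To connect the C source with the assembly above, here is a hedged sketch of swap rewritten in C so that its first statements mirror the three instructions shown. Reading the instructions, $5 holds k and $4 holds the address of v; the function name swap_lowered and the variables base, addr, r2, r15 and r16 are hypothetical. The slide elides the remaining instructions ("..."), so the last three steps are our own assumption about what is left to do, not the slide's code.

```c
#include <stddef.h>

/* swap written so that each statement mirrors one low-level step.
   r2, r15, r16 are hypothetical stand-ins for the MIPS registers. */
void swap_lowered(int v[], int k)            /* assumes k >= 0 */
{
    char *base = (char *)v;                  /* $4: address of v[0] */
    size_t r2 = (size_t)k * sizeof(int);     /* muli $2, $5, 4: byte offset (int is 4 bytes on MIPS) */
    char *addr = base + r2;                  /* add  $2, $4, $2: address of v[k] */
    int r15 = *(int *)addr;                  /* lw   $15, 0($2): load v[k] */

    /* The slide elides the rest ("..."); these steps are an assumption. */
    int r16 = *(int *)(addr + sizeof(int));  /* load v[k+1] */
    *(int *)addr = r16;                      /* store old v[k+1] into v[k] */
    *(int *)(addr + sizeof(int)) = r15;      /* store old v[k] into v[k+1] */
}
```

Calling swap_lowered(a, 1) on an array {1, 2, 3} leaves {1, 3, 2}, the same effect as the high-level swap.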


High-Level Language Characteristics

• Expressions: a = b + (c - d)/2;
• Data types: integer, character, boolean; record, array.
• Control structures: selective, iterative.
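As a small illustration, the features listed above look like this in C. This is a sketch only: struct point, slide_expr and count_in_first_quadrant are hypothetical names chosen for the example.

```c
#include <stdbool.h>

/* Record type with integer and character fields. */
struct point {
    int x;
    int y;
    char label;
};

/* Expression: the slide's a = b + (c - d)/2, computed from b, c and d. */
int slide_expr(int b, int c, int d)
{
    return b + (c - d) / 2;
}

/* Array of records, processed with both kinds of control structure. */
int count_in_first_quadrant(const struct point pts[], int n)
{
    int count = 0;
    int i = 0;
    while (i < n) {                                  /* iterative */
        bool inside = pts[i].x > 0 && pts[i].y > 0;  /* boolean */
        if (inside) {                                /* selective */
            count++;
        }
        i++;
    }
    return count;
}
```

Note that / on int operands truncates toward zero in C, so slide_expr(1, 10, 3) is 1 + 7/2 = 4.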


High-Level Language Characteristics (continued)

• Declarations: an identifier can denote a constant, variable, procedure, function, or type.
• Abstraction (an object-oriented concept): be concerned only with what an entity does, not how it does it.
• Encapsulation (an object-oriented concept): information hiding.
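A hedged C sketch of these ideas, built around a hypothetical Stack type: the comments mark which kind of declaration each identifier is. C has no classes, so the abstraction here is only that callers use stack_push and stack_pop without touching items or top. C enforces information hiding only across files, for example by exposing just typedef struct Stack Stack; in a header and keeping the struct definition in the .c file.

```c
/* Declarations: one of each kind listed on the slide. */
enum { STACK_CAPACITY = 16 };             /* constant */

typedef struct {                          /* type */
    int items[STACK_CAPACITY];
    int top;                              /* number of stored items */
} Stack;

static int stacks_initialized = 0;        /* variable (file scope) */

/* Procedure (no result): make s an empty stack. */
void stack_init(Stack *s)
{
    s->top = 0;
    stacks_initialized++;
}

/* Functions: callers rely on what they do, not on the array inside Stack. */
int stack_push(Stack *s, int value)       /* 1 on success, 0 if full */
{
    if (s->top == STACK_CAPACITY)
        return 0;
    s->items[s->top++] = value;
    return 1;
}

int stack_pop(Stack *s, int *out)         /* 1 on success, 0 if empty */
{
    if (s->top == 0)
        return 0;
    *out = s->items[--s->top];
    return 1;
}

int stacks_created(void)
{
    return stacks_initialized;
}
```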


Language Processors

A language processor is a program that manipulates programs expressed in some programming language.

Examples:
• Editor
• Translator / compiler
• Interpreter


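Translator

A translator translates a “source” program into an “equivalent” “object” program, and reports error messages when the source program is invalid.

Figure: source program → Translator → object program (plus error messages).
• Example source languages: C, C++, FORTRAN, Java, VB
• Example object languages: assembly, C, bytecode, p-code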
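To make the idea concrete, here is a minimal translator sketch. It assumes a made-up source language of stack-machine mnemonics (PUSH n, ADD, MUL) and a made-up object code of integers; translate and the OP_ constants are hypothetical names. Like the translator in the figure, it either produces an equivalent object program or emits an error message.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical object-code opcodes. */
enum { OP_PUSH = 1, OP_ADD = 2, OP_MUL = 3 };

/* Translate source lines such as "PUSH 3", "ADD" and "MUL" into an
   equivalent object program of integers. Returns the number of integers
   written to out, or -1 after printing an error message to stderr. */
int translate(const char *src[], int nlines, int out[], int cap)
{
    int n = 0;
    for (int i = 0; i < nlines; i++) {
        int arg = 0;
        int need;                       /* object-code words for this line */
        if (sscanf(src[i], "PUSH %d", &arg) == 1)
            need = 2;
        else if (strcmp(src[i], "ADD") == 0 || strcmp(src[i], "MUL") == 0)
            need = 1;
        else {
            fprintf(stderr, "line %d: unknown instruction \"%s\"\n", i + 1, src[i]);
            return -1;
        }
        if (n + need > cap) {
            fprintf(stderr, "line %d: object program too large\n", i + 1);
            return -1;
        }
        if (need == 2) {
            out[n++] = OP_PUSH;
            out[n++] = arg;
        } else {
            out[n++] = (strcmp(src[i], "ADD") == 0) ? OP_ADD : OP_MUL;
        }
    }
    return n;
}
```

For the source lines PUSH 2, PUSH 3, ADD it emits 1 2 1 3 2; an unknown mnemonic such as JUMP produces an error message instead of an object program.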


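Tombstone Diagrams

Ordinary program: a tombstone with P at the top and L at the bottom denotes a program P written in language L.

Examples (figure): Sort written in Java; Sort written in x86 code; Web Browser written in x86 code.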


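Tombstone Diagrams: Machine

A machine symbol labeled M denotes machine M.

Examples (figure): an x86 machine and a SPARC machine. A program can run on a machine only if it is written in that machine's code, so the Web Browser written in x86 code runs on the x86 machine.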


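Tombstone Diagrams: Translator

A T-shaped tombstone labeled S → T over L denotes a translator that translates language S into language T and is itself written in language L.

Examples (figure):
• Java → x86 compiler written in C
• Java → x86 compiler written in x86 code
• Java → C translator written in C++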


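Tombstone Diagrams: Compilation

Figure: the program Sort, written in C, is fed to a C → x86 compiler that is written in x86 code and runs on an x86 machine. The output is Sort in x86 code, which can then run on the x86 machine.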
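The tombstone rules can be modeled as a few string checks. This is a minimal sketch, assuming languages and machines are identified by plain strings; Program, Translator, runs_on and compile are hypothetical names, not notation from the slides.

```c
#include <string.h>

/* Program tombstone: program `name` written in language `lang`. */
typedef struct {
    const char *name;
    const char *lang;
} Program;

/* Translator tombstone: translates `src` into `tgt`; itself written in `impl`. */
typedef struct {
    const char *src;
    const char *tgt;
    const char *impl;
} Translator;

/* Code runs directly on a machine only if it is written in that machine's code. */
int runs_on(const char *lang, const char *machine)
{
    return strcmp(lang, machine) == 0;
}

/* Run translator t on `machine` with input program p.
   Valid only if t runs on the machine and p is written in t's source language.
   On success stores the object program in *out and returns 1; otherwise 0. */
int compile(Translator t, const char *machine, Program p, Program *out)
{
    if (!runs_on(t.impl, machine) || strcmp(p.lang, t.src) != 0)
        return 0;
    out->name = p.name;   /* the same program ... */
    out->lang = t.tgt;    /* ... now expressed in the target language */
    return 1;
}
```

The same compile call also models cross compilation: with the translator {"C", "SPARC", "x86"}, compiling Sort on an x86 machine yields Sort in SPARC code, which runs on a SPARC machine but not on the x86 host.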


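Tombstone Diagrams: Cross Compilation

Figure: a C → SPARC compiler written in x86 code runs on an x86 machine and translates Sort, written in C, into Sort in SPARC code. The resulting object program does not run on the x86 host; it runs on a SPARC machine.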


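Tombstone Diagrams: Two-stage Compilation

Figure: Sort, written in Java, is first translated by a Java → C translator (written in x86 code, running on an x86 machine) into Sort in C. A C → x86 compiler (written in x86 code, running on an x86 machine) then translates it into Sort in x86 code.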


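Tombstone Diagrams: Compiling a Compiler

Figure: a Pascal → x86 compiler written in C is itself compiled by a C → x86 compiler (written in x86 code, running on an x86 machine). The result is a Pascal → x86 compiler written in x86 code, which can run on the x86 machine.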


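Tombstone Diagrams: Interpreter

A tombstone with S at the top and L at the bottom denotes an interpreter that interprets source language S and is written in language L.

Examples (figure): a Basic interpreter written in x86 code; an SQL interpreter written in SPARC code; the program Sort, written in Basic, running on the Basic interpreter on an x86 machine.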


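Tombstone Diagrams: Abstract Machine

An abstract machine is an interpreter for a low-level language, i.e., a hardware emulator.

Figure: a 370 interpreter written in C is compiled by a C → x86 compiler (written in x86 code, running on an x86 machine), producing a 370 interpreter written in x86 code. Running the program HW1, written in 370 code, on this interpreter on an x86 machine is equivalent (=) to running HW1 on a real 370 machine.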
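At its core, an abstract machine is a fetch-decode-execute loop over machine code. The sketch below is hedged: it assumes a made-up stack-machine instruction set (OP_HALT, OP_PUSH, OP_ADD and OP_MUL are hypothetical, not 370 instructions), and run is a hypothetical name.

```c
/* Hypothetical instruction set of a tiny stack machine (not real 370 code). */
enum { OP_HALT = 0, OP_PUSH = 1, OP_ADD = 2, OP_MUL = 3 };
enum { STACK_MAX = 64 };

/* Emulate the machine on code[0..len-1] with a fetch-decode-execute loop.
   Returns 1 and stores the top of the stack in *result when HALT is reached;
   returns 0 on an illegal instruction, stack underflow or overflow,
   or running off the end of the code. */
int run(const int code[], int len, int *result)
{
    int stack[STACK_MAX];
    int sp = 0;                        /* number of values on the stack */
    int pc = 0;                        /* program counter */

    while (pc < len) {
        int op = code[pc++];           /* fetch */
        switch (op) {                  /* decode and execute */
        case OP_HALT:
            *result = (sp > 0) ? stack[sp - 1] : 0;
            return 1;
        case OP_PUSH:
            if (pc >= len || sp == STACK_MAX)
                return 0;
            stack[sp++] = code[pc++];  /* the operand follows the opcode */
            break;
        case OP_ADD:
        case OP_MUL:
            if (sp < 2)
                return 0;
            sp--;
            if (op == OP_ADD)
                stack[sp - 1] += stack[sp];
            else
                stack[sp - 1] *= stack[sp];
            break;
        default:
            return 0;                  /* illegal instruction */
        }
    }
    return 0;                          /* no HALT reached */
}
```

For example, the code PUSH 2, PUSH 7, MUL, PUSH 7, MUL, HALT leaves 98 (that is, 2 * 7 * 7) on top of the stack.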


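Tombstone Diagrams: Java

• Portable environment: write once, run anywhere.
• Interpretive compiler: a Java → JVM compiler written in M, paired with a JVM interpreter written in M, both running on machine M.
• JVM code = bytecode.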


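Tombstone Diagrams: Java on Several Machines

Figure: Sort, written in Java, is compiled once by a Java → JVM compiler (written in x86 code, running on an x86 machine) into Sort in JVM bytecode. The same bytecode then runs on a JVM interpreter written in x86 code on an x86 machine, and on a JVM interpreter written in SPARC code on a SPARC machine.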


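Tombstone Diagrams: Bootstrapping

• Bootstrapping: building a compiler for language L that is itself written in L.
• Full bootstrap: start from nothing.
• Half bootstrap: start from a compiler that already exists on another machine.

Goal of the following examples (figure): a C → NNP compiler written in NNP code, where NNP is the target machine.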


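Tombstone Diagrams: Full Bootstrap

Figure, using Csub, a subset of C:
• Start with a Csub → NNP compiler written directly in NNP code.
• A Csub → NNP compiler written in Csub, compiled on the NNP machine by that first compiler, gives a Csub → NNP compiler written in NNP code.
• A C → NNP compiler written in Csub, compiled on the NNP machine by the Csub compiler, gives a C → NNP compiler written in NNP code.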


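Tombstone Diagrams: Full Bootstrap (continued)

Figure: once a C → NNP compiler written in NNP code runs on the NNP machine, the compiler can be written in full C. Compiling this C → NNP compiler written in C with the existing one yields a new C → NNP compiler written in NNP code.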


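Tombstone Diagrams: Full Bootstrap (summary)

Figure: the complete sequence of steps: Csub → NNP in NNP code; Csub → NNP in Csub, compiled into NNP code; C → NNP in Csub, compiled into NNP code; finally C → NNP in C, compiled by the resulting compiler into NNP code.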


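Tombstone Diagrams: Half Bootstrap

Figure, starting from an existing C → x86 compiler written in x86 code on an x86 machine:
• A C → NNP compiler written in C is compiled by the C → x86 compiler, giving a C → NNP cross-compiler written in x86 code.
• The same C → NNP compiler written in C is then compiled by this cross-compiler on the x86 machine, giving a C → NNP compiler written in NNP code, which runs on an NNP machine.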
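Compiling a compiler follows the same tombstone rule as compiling any program: only the translator's implementation language changes. The sketch below assumes languages and machines are plain strings and uses the hypothetical names Translator and compile_translator; replaying the two half-bootstrap steps with it gives first a cross-compiler, then a native compiler.

```c
#include <string.h>

/* Translator tombstone: translates src into tgt and is written in impl. */
typedef struct {
    const char *src;
    const char *tgt;
    const char *impl;
} Translator;

/* Use compiler c, running on `machine`, to compile translator t.
   Valid only if c is written in the machine's code and t is written in
   c's source language. On success *out is t re-expressed in c's target. */
int compile_translator(Translator c, const char *machine, Translator t, Translator *out)
{
    if (strcmp(c.impl, machine) != 0 || strcmp(t.impl, c.src) != 0)
        return 0;
    *out = t;              /* same source and target languages ... */
    out->impl = c.tgt;     /* ... but now written in c's target language */
    return 1;
}
```

Starting from host = {"C", "x86", "x86"} and the new compiler {"C", "NNP", "C"}, the first step yields {"C", "NNP", "x86"} (the cross-compiler) and the second yields {"C", "NNP", "NNP"}, which can no longer run on the x86 host.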


Specification of Programming Languages

A programming language specification has three parts:
• Syntax: defines the symbols and structure of the language, i.e., its grammar.
• Contextual constraints: constraints beyond the grammar, i.e., rules of the language such as scope rules and type rules.
• Semantics: the meaning of a program, i.e., its behavior when run; for a compiler, this determines how to translate a sentence S of language L into machine code on machine M.
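Contextual constraints must be checked separately from parsing, because the grammar alone cannot express them. With the declarations of the Mini-Triangle example program (const m ~ 7; var n: Integer), the commands m := 1 and x := 1 are both syntactically valid single-Commands, yet each breaks a rule. The sketch below assumes two illustrative rules, "identifiers must be declared" and "only variables may be assigned"; Symbol, lookup and check_assignment_target are hypothetical names.

```c
#include <string.h>

/* Hypothetical symbol-table entry, e.g. for "const m ~ 7" or "var n: Integer". */
typedef struct {
    const char *name;
    int is_const;                      /* 1 for const, 0 for var */
} Symbol;

/* Scope rule (assumed): an identifier must be declared before it is used. */
const Symbol *lookup(const Symbol table[], int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;                       /* undeclared */
}

/* Check the left-hand side of "name := Expression".
   Returns 0 if allowed, 1 if name is undeclared, 2 if name denotes a constant. */
int check_assignment_target(const Symbol table[], int n, const char *name)
{
    const Symbol *s = lookup(table, n, name);
    if (s == NULL)
        return 1;
    if (s->is_const)
        return 2;                      /* only variables may be assigned */
    return 0;
}
```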


Syntax

Syntax is specified by a context-free grammar, which consists of:
• Terminals
• Non-terminals (variables)
• A start symbol
• Production rules

The grammar is usually expressed in BNF notation.


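BNF Notation

BNF stands for Backus-Naur Form. A production rule

    N → α

where N is a non-terminal and α is a string of terminals and non-terminals, is written in BNF as:

    N ::= α

Alternatives for the same N are separated by |.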


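Example: Mini-Triangle Program

    ! This is a comment. It continues to the end-of-line.
    let
        const m ~ 7;
        var n: Integer
    in
        begin
            n := 2 * m * m;
            putint(n)
        end

Terminals: begin const do else end if in let then var while ; : := ~ ( ) + - * / < > = \

The semicolon separates commands (Command ::= Command ; single-Command), so no semicolon follows the last command before end.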


Mini-Triangle Syntax

    Program        ::= Command
    Command        ::= single-Command
                     | Command ; single-Command
    single-Command ::= V-name := Expression
                     | Identifier ( Expression )
                     | if Expression then single-Command
                           else single-Command
                     | while Expression do single-Command
                     | let Declaration in single-Command
                     | begin Command end


Mini-Triangle Syntax (continued)

    Expression         ::= primary-Expression
                         | Expression Operator primary-Expression
    primary-Expression ::= Integer-Literal
                         | V-name
                         | Operator primary-Expression
                         | ( Expression )
    V-name             ::= Identifier
    Declaration        ::= single-Declaration
                         | Declaration ; single-Declaration
    single-Declaration ::= const Identifier ~ Expression
                         | var Identifier : Type-denoter


Mini-Triangle Syntax (continued)

    Type-denoter    ::= Identifier
    Operator        ::= + | - | * | / | < | > | = | \
    Identifier      ::= Letter | Identifier Letter | Identifier Digit
    Integer-Literal ::= Digit | Integer-Literal Digit
    Comment         ::= ! Graphic* eol
    Letter          ::= a | b | … | z
    Digit           ::= 0 | 1 | 2 | … | 9
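The lexical rules above (Identifier, Integer-Literal, Operator, Comment) translate directly into a scanner. This is a minimal sketch with hypothetical names (TokenKind, next_token). Two assumptions go beyond the slide: upper-case letters are accepted in identifiers, because the example program uses Integer, and an identifier whose spelling is one of the keyword terminals is classified as a keyword.

```c
#include <ctype.h>
#include <string.h>

typedef enum { TOK_EOF, TOK_IDENT, TOK_KEYWORD, TOK_INTLIT, TOK_OP, TOK_PUNCT, TOK_ERROR } TokenKind;

static const char *const keywords[] = {
    "begin", "const", "do", "else", "end", "if", "in", "let", "then", "var", "while"
};

/* Scan one token at *p, copy its spelling into buf (cap >= 1 bytes),
   and advance *p past it. Blanks, newlines and comments are skipped first. */
TokenKind next_token(const char **p, char *buf, size_t cap)
{
    const char *s = *p;
    for (;;) {
        while (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r')
            s++;
        if (*s != '!')
            break;
        while (*s != '\0' && *s != '\n')      /* Comment ::= ! Graphic* eol */
            s++;
    }
    const char *start = s;
    TokenKind kind;
    if (*s == '\0') {
        kind = TOK_EOF;
    } else if (isalpha((unsigned char)*s)) {  /* Letter, then Letters or Digits */
        while (isalnum((unsigned char)*s))
            s++;
        kind = TOK_IDENT;
    } else if (isdigit((unsigned char)*s)) {  /* Integer-Literal: one or more Digits */
        while (isdigit((unsigned char)*s))
            s++;
        kind = TOK_INTLIT;
    } else if (s[0] == ':' && s[1] == '=') {  /* := must be checked before : */
        s += 2;
        kind = TOK_PUNCT;
    } else if (strchr("+-*/<>=\\", *s)) {     /* Operator */
        s++;
        kind = TOK_OP;
    } else if (strchr(";:~()", *s)) {
        s++;
        kind = TOK_PUNCT;
    } else {
        s++;
        kind = TOK_ERROR;                     /* character not in the language */
    }
    size_t len = (size_t)(s - start);
    if (len >= cap)
        len = cap - 1;                        /* truncate very long spellings */
    memcpy(buf, start, len);
    buf[len] = '\0';
    if (kind == TOK_IDENT)
        for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
            if (strcmp(buf, keywords[i]) == 0)
                kind = TOK_KEYWORD;
    *p = s;
    return kind;
}
```

For example, the input let const m ~ 7 yields the keyword let, the keyword const, the identifier m, the punctuation ~ and the integer literal 7.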


Syntax Tree

A syntax tree is an ordered tree in which:
• Internal nodes are labeled with non-terminals.
• Leaf nodes are labeled with terminals.

An N-tree of a grammar G is a syntax tree with the non-terminal N at its root.


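Mini-Triangle Syntax Tree

    Expression         ::= primary-Expression
                         | Expression Operator primary-Expression
    primary-Expression ::= Integer-Literal
                         | V-name
                         | Operator primary-Expression
                         | ( Expression )
    V-name             ::= Identifier
    …

Figure: syntax tree of the expression d + 10 * n. Because the Expression rule is left-recursive and all operators share one precedence level, the tree groups the expression as (d + 10) * n:

    Expression
        Expression
            Expression
                primary-Expr.
                    V-name
                        Ident.: d
            Op.: +
            primary-Expr.
                Int. Lit.: 10
        Op.: *
        primary-Expr.
            V-name
                Ident.: n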
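A recursive-descent parser can build this tree. A recursive function written directly from the left-recursive Expression rule would call itself forever, so this sketch uses the standard rewrite Expression ::= primary-Expression (Operator primary-Expression)* and grows the tree left to right, which reproduces the left-leaning shape above. Node, parse_expression and parse_and_render are hypothetical names; the sketch handles single-character operators only and renders each tree fully parenthesized.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical syntax-tree node for Mini-Triangle expressions. */
typedef enum { N_INT, N_VNAME, N_UNARY, N_BINARY } NodeKind;
typedef struct Node {
    NodeKind kind;
    char spelling[16];              /* literal, identifier or operator text */
    struct Node *left, *right;      /* children; unused ones are NULL */
} Node;

static Node *new_node(NodeKind kind, const char *text, size_t len, Node *l, Node *r)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL) abort();         /* sketch: no out-of-memory recovery */
    if (len >= sizeof n->spelling) len = sizeof n->spelling - 1;
    memcpy(n->spelling, text, len);
    n->spelling[len] = '\0';
    n->kind = kind; n->left = l; n->right = r;
    return n;
}

static void free_tree(Node *n) { if (n) { free_tree(n->left); free_tree(n->right); free(n); } }
static void skip_blanks(const char **p) { while (isspace((unsigned char)**p)) (*p)++; }
static int is_operator(char c) { return c != '\0' && strchr("+-*/<>=\\", c) != NULL; }
static Node *parse_expression(const char **p);

/* primary-Expression ::= Integer-Literal | V-name | Operator primary-Expression | ( Expression ) */
static Node *parse_primary(const char **p)
{
    skip_blanks(p);
    const char *start = *p;
    if (isdigit((unsigned char)**p)) {
        while (isdigit((unsigned char)**p)) (*p)++;
        return new_node(N_INT, start, (size_t)(*p - start), NULL, NULL);
    }
    if (isalpha((unsigned char)**p)) {
        while (isalnum((unsigned char)**p)) (*p)++;
        return new_node(N_VNAME, start, (size_t)(*p - start), NULL, NULL);
    }
    if (is_operator(**p)) {
        (*p)++;
        Node *operand = parse_primary(p);
        return operand ? new_node(N_UNARY, start, 1, operand, NULL) : NULL;
    }
    if (**p == '(') {
        (*p)++;
        Node *e = parse_expression(p);
        skip_blanks(p);
        if (e == NULL || **p != ')') { free_tree(e); return NULL; }
        (*p)++;
        return e;
    }
    return NULL;                    /* syntax error */
}

/* Expression ::= primary-Expression (Operator primary-Expression)*, left recursion removed */
static Node *parse_expression(const char **p)
{
    Node *left = parse_primary(p);
    while (left != NULL) {
        skip_blanks(p);
        if (!is_operator(**p)) break;
        const char *op = (*p)++;
        Node *right = parse_primary(p);
        if (right == NULL) { free_tree(left); return NULL; }
        left = new_node(N_BINARY, op, 1, left, right);  /* tree so far becomes left child */
    }
    return left;
}

static size_t append(char *buf, size_t cap, size_t len, const char *s)
{
    while (*s != '\0' && len + 1 < cap) buf[len++] = *s++;  /* truncate safely */
    buf[len] = '\0';
    return len;
}

static size_t render(const Node *n, char *buf, size_t cap, size_t len)
{
    if (n->kind == N_INT || n->kind == N_VNAME) return append(buf, cap, len, n->spelling);
    len = append(buf, cap, len, "(");
    if (n->kind == N_UNARY) {
        len = append(buf, cap, len, n->spelling);
        return append(buf, cap, render(n->left, buf, cap, len), ")");
    }
    len = append(buf, cap, render(n->left, buf, cap, len), " ");
    len = append(buf, cap, append(buf, cap, len, n->spelling), " ");
    return append(buf, cap, render(n->right, buf, cap, len), ")");
}

/* Parse src as one Expression; on success write a parenthesized rendering to buf (cap >= 1). */
int parse_and_render(const char *src, char *buf, size_t cap)
{
    const char *p = src;
    Node *tree = parse_expression(&p);
    skip_blanks(&p);
    int ok = tree != NULL && *p == '\0';
    if (ok) render(tree, buf, cap, 0);
    free_tree(tree);
    return ok;
}
```

With this grammar, d + 10 * n renders as ((d + 10) * n), matching the tree above, and the example program's 2 * m * m renders as ((2 * m) * m).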