Java (Object Oriented)


RDF (Horn Clause Deduction, Semantic Web)


Jython in Java

This Course

High Level Languages

Lexical and Syntactic Analysis

•  Chomsky Grammar Hierarchy

•  Lexical Analysis – Tokenizing

•  Syntactic Analysis – Parsing

•  Hmm Concrete Syntax

•  Hmm Abstract Syntax

Programming Languages

Noam Chomsky

•  Regular grammar – used for tokenizing
•  Context-free grammar (BNF) – used for parsing
•  Context-sensitive grammar – not really used for programming languages

Chomsky Hierarchy

•  Simplest; least powerful
•  Equivalent to:
   –  Regular expression (think of Perl)
   –  Finite-state automaton

•  Right regular grammar: ω ∈ Terminal*, A and B ∈ Nonterminal
      A → ω B
      A → ω

•  Example: Integer → 0 Integer | 1 Integer | ... | 9 Integer | 0 | 1 | ... | 9
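The Integer grammar above generates exactly the nonempty digit strings, which the regular expression [0-9]+ also describes. A minimal Python sketch of that equivalence follows; `is_integer` is a hypothetical helper name, not part of the course material. It follows the two right-regular forms A → ω B and A → ω directly and compares the result with the regular expression.

```python
import re

DIGITS = "0123456789"

def is_integer(s):
    """Right regular grammar: Integer -> d Integer | d, for each digit d."""
    if not s or s[0] not in DIGITS:
        return False
    # Integer -> d (last symbol), or Integer -> d Integer (more input follows)
    return len(s) == 1 or is_integer(s[1:])

# The grammar and the regular expression accept the same strings.
for text in ["352", "0", "", "35a"]:
    assert is_integer(text) == bool(re.fullmatch(r"[0-9]+", text))
```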

Regular Grammar

•  Less powerful than context-free grammars
•  The following is not a regular language:
      { aⁿ bⁿ | n ≥ 1 }
   i.e., a regular grammar cannot balance: ( ), { }, begin end
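To see why this language needs more than a finite-state automaton, it helps to look at what a recognizer has to remember: the number of a's, which is unbounded. The sketch below (the name `is_anbn` is an assumption for illustration) keeps that count in an integer, something a machine with a fixed number of states cannot do.

```python
def is_anbn(s):
    """Recognize { a^n b^n | n >= 1 } by counting the leading a's."""
    n = 0
    while n < len(s) and s[n] == "a":
        n += 1
    # The rest of the input must be exactly n b's.
    return n >= 1 and s[n:] == "b" * n

print(is_anbn("aabb"))   # True
print(is_anbn("aab"))    # False
```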

Regular Grammar

Regular Expressions

x          a character x
\x         an escaped character, e.g., \n
{ name }   a reference to a name
M | N      M or N
M N        M followed by N
M*         zero or more occurrences of M
M+         one or more occurrences of M
M?         zero or one occurrence of M
[aeiou]    the set of vowels
[0-9]      the set of digits
.          any single character
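Most of this notation carries over unchanged to Python's `re` module, which gives a quick way to experiment with it. The patterns below are illustrative choices, not taken from the slides. One caveat: the `{ name }` reference form belongs to lexer-generator notation; in `re`, braces express repetition counts instead.

```python
import re

# letter followed by zero or more letters or digits: M N and M*
identifier = re.compile(r"[A-Za-z][A-Za-z0-9]*")
# optional minus sign, then one or more digits: M? and M+
signed_int = re.compile(r"-?[0-9]+")
# alternation of two character sets: M | N
vowel_or_digit = re.compile(r"[aeiou]|[0-9]")
# any single character followed by an escaped newline: . and \x
char_then_newline = re.compile(r".\n")

print(bool(identifier.fullmatch("a2i")))   # True
print(bool(identifier.fullmatch("2ai")))   # False
```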


(S, a2i$) ├ (I, 2i$) ├ (I, i$) ├ (I, $) ├ (F, )

Thus: (S, a2i$) ├* (F, )
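The sequence of configurations above can be reproduced by simulating the automaton. The state names S, I and F come from the trace; the transition table is an assumption reconstructed from it (a letter moves S to I, letters and digits stay in I, and the end marker $ moves I to F), since the diagram itself is not reproduced here.

```python
def classify(ch):
    """Map an input character to the symbol class used in the transition table."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return ch  # '$' marks the end of the input

# Assumed transitions for the identifier automaton.
TRANSITIONS = {
    ("S", "letter"): "I",
    ("I", "letter"): "I",
    ("I", "digit"): "I",
    ("I", "$"): "F",
}

def run(text):
    """Return the list of configurations (state, remaining input), or None on rejection."""
    state, trace = "S", [("S", text)]
    for i, ch in enumerate(text):
        state = TRANSITIONS.get((state, classify(ch)))
        if state is None:
            return None          # no transition defined: the automaton is stuck
        trace.append((state, text[i + 1:]))
    return trace if state == "F" else None

print(run("a2i$"))
# [('S', 'a2i$'), ('I', '2i$'), ('I', 'i$'), ('I', '$'), ('F', '')]
```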

Finite State Automaton for Identifiers

Deterministic Finite State Automaton Examples

Production: α → β
   α ∈ Nonterminal
   β ∈ (Nonterminal ∪ Terminal)*

i.e., the left-hand side is a single nonterminal, and the right-hand side is a string of nonterminals and/or terminals (possibly empty).

Context-Free Grammar

Production: α → β
   |α| ≤ |β|
   α, β ∈ (Nonterminal ∪ Terminal)*

i.e., the left-hand side can be a string of terminals and nonterminals; however, the number of symbols on the left must not exceed the number of symbols on the right.

Context-Sensitive Grammar

•  The syntax of a programming language is a precise description of all its grammatically correct programs.

•  Precise syntax was first used with Algol 60, and has been used ever since.

•  Three levels:
   –  Lexical syntax - all the basic symbols of the language (names, values, operators, etc.)
   –  Concrete syntax - rules for writing expressions, statements and programs.
   –  Abstract syntax - internal representation of the program, favoring content over form.
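The lexical level can be made concrete with a small tokenizer. This is a sketch only: the token class names (VALUE, NAME, OPERATOR, PUNCT) and their patterns are assumptions chosen for illustration, not the token set of any particular language in this course.

```python
import re

# Each token class is a regular expression; order matters when patterns overlap.
TOKEN_SPEC = [
    ("VALUE", r"[0-9]+"),
    ("NAME", r"[A-Za-z][A-Za-z0-9]*"),
    ("OPERATOR", r"[-+*/%=]"),
    ("PUNCT", r"[;(){}]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Split text into (token class, lexeme) pairs, discarding whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("z = x + 2 * y;"))
```

The parser then works on this token list rather than on raw characters, which is why the regular grammar and the context-free grammar can be kept separate.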


Grammars

Grammars: Metalanguages used to define the concrete syntax of a language.

Backus Normal Form – Backus Naur Form (BNF)
•  Stylized version of a context-free grammar (cf. Chomsky hierarchy)
•  First used to define syntax of Algol 60
•  Now used to define syntax of most major languages

Production: α → β
   α ∈ Nonterminal
   β ∈ (Nonterminal ∪ Terminal)*

i.e., the left-hand side is a single nonterminal, and β is a string of nonterminals and/or terminals (possibly empty).

•  Example
      Integer → Digit | Integer Digit
      Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Extended BNF (EBNF)

Additional metacharacters
   { }   a series of zero or more
   ( )   must pick one from a list
   [ ]   pick none or one from a list


Expression -> Term { ( + | - ) Term }
IfStatement -> if ( Expression ) Statement [ else Statement ]
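One practical appeal of the EBNF form is that the { } repetition maps directly onto a loop in a hand-written parser. The sketch below illustrates this for the Expression rule. It assumes the input is already a list of token strings and that a Term is just an integer literal, since Term is not defined at this point; `parse_expression` is a hypothetical name.

```python
def parse_expression(tokens):
    """Expression -> Term { ( + | - ) Term }; returns nested tuples grouped to the left."""
    pos = 0

    def term():
        # Assumption: a Term is a single integer literal token.
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"expected an integer at token {pos}")
        pos += 1
        return int(tokens[pos - 1])

    tree = term()
    while pos < len(tokens) and tokens[pos] in ("+", "-"):   # the { ... } repetition
        op = tokens[pos]
        pos += 1
        tree = (op, tree, term())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse_expression(["5", "-", "4", "+", "3"]))   # ('+', ('-', 5, 4), 3)
```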

EBNF is no more powerful than BNF, but its production rules are often simpler and clearer.

JavaCC EBNF

   ( … )*   a series of zero or more
   ( … )+   a series of one or more
   [ … ]    optional

Internal Parse Tree

Abstract Syntax

int main ()
{
    return 0 ;
}

Program (abstract syntax):
  Function = main; Return type = int
    params =
    Block:
      Return:
        Variable: return#main, LOCAL addr=0
        IntValue: 0

Instance of a Programming Language:

Now we’ll focus on the internal parse tree

Parse Trees

Integer → Digit | Integer Digit
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Parse Tree for 352 as an Integer
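The shape of this parse tree can be sketched with nested tuples. Because the rule Integer → Integer Digit is left-recursive, the tree grows down its left side, with the last digit attached at the root. The helper name `integer_tree` and the tuple representation are assumptions for illustration.

```python
def integer_tree(s):
    """Parse tree for Integer -> Digit | Integer Digit, as nested tuples."""
    if not s or not all(c in "0123456789" for c in s):
        raise ValueError("not an Integer")
    digit = ("Digit", s[-1])
    if len(s) == 1:
        return ("Integer", digit)                     # Integer -> Digit
    return ("Integer", integer_tree(s[:-1]), digit)   # Integer -> Integer Digit

print(integer_tree("352"))
# ('Integer', ('Integer', ('Integer', ('Digit', '3')), ('Digit', '5')), ('Digit', '2'))
```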

Arithmetic Expression Grammar

Expr → Expr + Term | Expr - Term | Term
Term → 0 | ... | 9 | ( Expr )

Parse of 5 - 4 + 3

•  A grammar can be used to define associativity and precedence among the operators in an expression. E.g., + and - are left-associative operators in mathematics; * and / have higher precedence than + and -.

•  Consider the following grammar:
      Expr -> Expr + Term | Expr - Term | Term
      Term -> Term * Factor | Term / Factor | Term % Factor | Factor
      Factor -> Primary ** Factor | Primary
      Primary -> 0 | ... | 9 | ( Expr )

Associativity and Precedence


Parse of 4**2**3 + 5 * 6 + 7

Precedence   Associativity   Operators
3            right           **
2            left            * / %
1            left            + -

Note: These relationships are shown by the structure of the parse tree: highest precedence at the bottom, and left-associativity on the left at each level.
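A small recursive-descent evaluator makes the effect of the grammar's structure visible. In this sketch each nonterminal becomes one function: the left-recursive rules (Expr, Term) are written as loops, which gives left associativity, and the right-recursive rule (Factor) calls itself, which gives right associativity. The function names are illustrative, `/` is assumed to mean integer division, and there is no error handling for malformed input.

```python
import re

def evaluate(text):
    tokens = re.findall(r"\*\*|[0-9]|[-+*/%()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():      # Expr -> Expr + Term | Expr - Term | Term
        value = term()
        while peek() in ("+", "-"):
            op = advance()
            right = term()
            value = value + right if op == "+" else value - right
        return value

    def term():      # Term -> Term * Factor | Term / Factor | Term % Factor | Factor
        value = factor()
        while peek() in ("*", "/", "%"):
            op = advance()
            right = factor()
            if op == "*":
                value = value * right
            elif op == "/":
                value = value // right   # assumption: integer division
            else:
                value = value % right
        return value

    def factor():    # Factor -> Primary ** Factor | Primary
        base = primary()
        if peek() == "**":
            advance()
            return base ** factor()      # right recursion: right-associative
        return base

    def primary():   # Primary -> 0 | ... | 9 | ( Expr )
        tok = advance()
        if tok == "(":
            value = expr()
            advance()                    # consume ")"
            return value
        return int(tok)

    return expr()

print(evaluate("4**2**3 + 5 * 6 + 7"))   # 65573, i.e. 4**(2**3) + (5*6) + 7
```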

Associativity and Precedence

•  A grammar is ambiguous if one of its strings has two or more different parse trees.

•  Example:
      Expr -> Expr Op Expr | ( Expr ) | Integer
      Op -> + | - | * | / | % | **

•  Equivalent to previous grammar but ambiguous

Ambiguous Grammars

Ambiguous Parse of 5 – 4 + 3
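The ambiguity can be checked mechanically by enumerating every parse tree the grammar allows. The sketch below does this for the rules Expr -> Expr Op Expr | Integer, leaving out parentheses and handling only + and - when computing values; the function names and the tuple representation are assumptions for illustration.

```python
def all_parses(tokens):
    """Every parse tree of tokens under Expr -> Expr Op Expr | Integer."""
    if len(tokens) == 1:
        return [int(tokens[0])]
    trees = []
    for i in range(1, len(tokens), 2):           # choose each operator as the root
        for left in all_parses(tokens[:i]):
            for right in all_parses(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees

def value(tree):
    """Evaluate a tree built from + and - only."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return value(left) + value(right) if op == "+" else value(left) - value(right)

trees = all_parses(["5", "-", "4", "+", "3"])
print(trees)                       # [('-', 5, ('+', 4, 3)), ('+', ('-', 5, 4), 3)]
print([value(t) for t in trees])   # [-2, 4]
```

Two trees with two different values for the same string is exactly what an unambiguous grammar rules out.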

Ambiguous Grammars

Dangling Else Ambiguous Grammars

IfStatement -> if ( Expression ) Statement
             | if ( Expression ) Statement else Statement

Statement -> Assignment | IfStatement | Block
Block -> { Statements }
Statements -> Statements Statement | Statement

With which ‘if’ does the following ‘else’ associate?

if (x < 0) if (y < 0) y = y - 1; else y = 0;
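The two possible readings behave differently, which is why the ambiguity matters. Python's indentation forces a choice, so each reading can be written out explicitly; the function names below are hypothetical. C and Java resolve the ambiguity by attaching the else to the nearest unmatched if, which corresponds to the first function.

```python
def else_binds_to_inner_if(x, y):
    # if (x < 0) { if (y < 0) y = y - 1; else y = 0; }
    if x < 0:
        if y < 0:
            y = y - 1
        else:
            y = 0
    return y

def else_binds_to_outer_if(x, y):
    # if (x < 0) { if (y < 0) y = y - 1; } else y = 0;
    if x < 0:
        if y < 0:
            y = y - 1
    else:
        y = 0
    return y

print(else_binds_to_inner_if(-1, 5))   # 0
print(else_binds_to_outer_if(-1, 5))   # 5
```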

Dangling Else Ambiguous Grammars

Program : { [ Declaration ] | retType Identifier Function | MyClass | MyObject }
Function : ( ) Block
MyClass : Class Identifier { { retType Identifier Function } Constructor { retType Identifier Function } }
MyObject : Identifier Identifier = create Identifier callArgs
Constructor : Identifier ( [ { Parameter } ] ) block
Declaration : Type Identifier [ [ Literal ] ] { , Identifier [ [ Literal ] ] }
Type : int | bool | float | list | tuple | object | string | void
Statements : { Statement }
Statement : ; | Declaration | Block | ForEach | Assignment | IfStatement | WhileStatement | CallStatement | ReturnStatement
Block : { Statements }
ForEach : for ( Expression <- Expression ) Block
Assignment : Identifier [ [ Expression ] ] = Expression ;
Parameter : Type Identifier
IfStatement : if ( Expression ) Block [ elseifStatement | Block ]
WhileStatement : while ( Expression ) Block

Hmm BNF (i.e., Concrete Syntax)

Expression : Conjunction { || Conjunction }
Conjunction : Equality { && Equality }
Equality : Relation [ EquOp Relation ]
EquOp : == | !=
Relation : Addition [ RelOp Addition ]
RelOp : < | <= | > | >=
Addition : Term { AddOp Term }
AddOp : + | -
Term : Factor { MulOp Factor }
MulOp : * | / | %
Factor : [ UnaryOp ] Primary
UnaryOp : - | !
Primary : callOrLambda | IdentifierOrArrayRef | Literal | subExpressionOrTuple | ListOrListComprehension | ObjFunction
callOrLambda : Identifier callArgs | LambdaDef
callArgs : ( [ Expression | passFunc { , Expression | passFunc } ] )
passFunc : Identifier ( Type Identifier { Type Identifier } )
LambdaDef : ( \\ Identifier { , Identifier } -> Expression )

Hmm BNF (i.e., Concrete Syntax)


IdentifierOrArrayRef : Identifier [ [ Expression ] ]
subExpressionOrTuple : ( [ Expression [ , [ Expression { , Expression } ] ] ] )
ListOrListComprehension : [ Expression { , Expression } ] | | Expression [ <- Expression ] { , Expression [ <- Expression ] } ]
ObjFunction : Identifier . Identifier . Identifier callArgs
Identifier : ( a | b | … | z | A | B | … | Z ) { ( a | b | … | z | A | B | … | Z ) | ( 0 | 1 | … | 9 ) }
Literal : Integer | True | False | ClFloat | ClString
Integer : Digit { Digit }
ClFloat : 0 | 1 | … | 9 { 0 | 1 | … | 9 } . { 0 | 1 | … | 9 }
ClString : " { ~["] } "

Clite Operator    Associativity
Unary - !         none
* /               left
+ -               left
< <= > >=         none
== !=             none
&&                left
||                left

Associativity and Precedence for Hmm

Hmm Parse Tree Example

z = x + 2 * y;

Now we’ll focus on the Abstract Syntax

Hmm Parse Tree

z = x + 2 * y;


Very Approximate Hmm Abstract Syntax

Assignment = Variable target; Expression source
Expression = VariableRef | Value | Binary | Unary
VariableRef = Variable | ArrayRef
Variable = String id
ArrayRef = String id; Expression index
Value = IntValue | BoolValue | FloatValue | CharValue
Binary = Operator op; Expression term1, term2
Unary = UnaryOp op; Expression term
Operator = ArithmeticOp | RelationalOp | BooleanOp
IntValue = Integer intValue
…
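Each abstract syntax rule reads naturally as a record type. The sketch below models a few of them as Python dataclasses and builds the tree for z = x + 2 * y. It is an illustration only: the class and field names follow the rules above, but the Python encoding, the string-valued operator, and the `evaluate` helper (which handles only + and *) are assumptions, not the course's implementation.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Variable:
    id: str

@dataclass
class IntValue:
    int_value: int

@dataclass
class Binary:
    op: str
    term1: "Expression"
    term2: "Expression"

Expression = Union[Variable, IntValue, Binary]

@dataclass
class Assignment:
    target: Variable
    source: Expression

# z = x + 2 * y: the multiplication sits below the addition in the tree.
tree = Assignment(
    Variable("z"),
    Binary("+", Variable("x"), Binary("*", IntValue(2), Variable("y"))),
)

def evaluate(node, env):
    """Evaluate an expression tree; env maps variable names to integers."""
    if isinstance(node, Variable):
        return env[node.id]
    if isinstance(node, IntValue):
        return node.int_value
    left, right = evaluate(node.term1, env), evaluate(node.term2, env)
    return left + right if node.op == "+" else left * right

print(evaluate(tree.source, {"x": 1, "y": 3}))   # 7
```

Note that the tree carries no parentheses, semicolons, or precedence rules: the nesting alone records that 2 * y is computed first.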

Very Approximate Hmm Abstract Syntax

Hmm Abstract Syntax – Binary Example: z = x + 2 * y

[Figure: abstract syntax tree for z = x + 2 * y; node labels Binary, Operator, Variable, Value; leaves 2, y, *]
