Syntax and Semantics - Methods
Uploaded by giljohnferolino
Syntax and Semantics
Different Methods to describe syntax and semantics
Syntax
• Lexical Structure of Programming Languages
• Context-Free Grammars and BNFs
• Parse Trees and Abstract Syntax Trees
• Ambiguity, Associativity, and Precedence
• EBNFs and Syntax Diagrams
• Parsing Techniques and Tools
• Lexics versus Syntax versus Semantics
Lexical Structure of Programming Languages
• Tokens – the words that make up a programming language
• Lexical structure – the structure of a language's words, or tokens
• Scanning phase – collects sequences of characters from the input program into tokens
• Parsing phase – processes the tokens, determining the program's syntactic structure
Lexical Structure of Programming Languages
• Categories of Tokens:
• Reserved words or keywords
• Literals or constants
• Special symbols
• Identifiers
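The scanning phase and the four token categories can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the keyword set, the `TOKEN_SPEC` table, and the `scan` function are hypothetical names for a made-up toy language, not part of any real compiler.

```python
import re

# Hypothetical reserved words for a made-up toy language.
KEYWORDS = {"if", "else", "while"}

# One regular expression per token category (assumed spellings).
TOKEN_SPEC = [
    ("LITERAL", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("SPECIAL", r"==|<=|>=|[+\-*/=<>(){};]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Collect the characters of source into (category, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # white space only separates tokens
        if kind == "IDENTIFIER" and text in KEYWORDS:
            kind = "KEYWORD"  # reserved words look like identifiers but are set apart
        tokens.append((kind, text))
    return tokens


print(scan("if x1 <= 42"))
```

Reserved words are recognized here by first matching an identifier and then looking it up in a table, which is one common design choice among several.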
Lexical Structure of Programming Languages
• The format of a program can affect the way tokens are recognized.
• Certain tokens are separated by token delimiters or white space
• Indentation can also be used to determine program structure
• Free-format language – one in which format has no effect on the program structure
• Fixed-format language – one in which all tokens must occur in pre-specified locations on the page
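Python itself is a convenient example of a language in which indentation matters. The sketch below uses the standard-library `tokenize` module to list the token kinds its scanner reports; the helper name `token_kinds` and the sample strings are assumptions made for illustration.

```python
import io
import token
import tokenize


def token_kinds(source):
    """Return the names of the token types Python's own scanner produces."""
    readline = io.StringIO(source).readline
    return [token.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]


indented = "if x:\n    y = 1\n"
flat = "y = 1\n"
spaced = "y   =   1\n"

# Indentation shows up as explicit INDENT / DEDENT tokens.
print(token_kinds(indented))

# Extra blanks inside a line only separate tokens; the token kinds are unchanged.
print(token_kinds(flat) == token_kinds(spaced))
```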
Lexical Structure of Programming Languages
• Tokens in a programming language are often described in English, but they can also be described formally by regular expressions (descriptions of patterns of characters).
• Regular expressions have 3 basic operations:
- concatenation (sequencing the items without an explicit operation)
- repetition (‘*’)
- choice or selection (‘|’)
• Parentheses are also often included to allow for the grouping of operations
• Square brackets with a hyphen indicate a range of characters
• ‘+’ indicates one or more repetitions
• ‘?’ indicates an optional item
• ‘.’ indicates any character
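The operations listed above map directly onto Python's `re` module. The token descriptions below (`identifier`, `number`, and so on) are hypothetical examples chosen for illustration, not definitions taken from any particular language.

```python
import re

# concatenation + range + repetition: a letter or underscore, then zero or more word characters
identifier = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

# range + '+': one or more digits
unsigned_integer = re.compile(r"[0-9]+")

# grouping + '?': an optional fractional part
number = re.compile(r"[0-9]+(\.[0-9]+)?")

# choice: either alternative
boolean = re.compile(r"true|false")

# '.': any single character between 'a' and 'c'
a_any_c = re.compile(r"a.c")

for pattern, text in [(identifier, "x_1"), (identifier, "1x"),
                      (number, "3.14"), (number, "3."),
                      (boolean, "false"), (a_any_c, "abc")]:
    verdict = "matches" if pattern.fullmatch(text) else "does not match"
    print(f"{text!r} {verdict} {pattern.pattern!r}")
```

`fullmatch` is used so that the whole string must fit the pattern, which is how a token description is normally read.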
Context-Free Grammars and BNFs
• Context-free grammar – consists of a series of grammar rules:
• Each rule has a left-hand side that is a single structure name
• Then the metasymbol “→”
• Followed by the right-hand side, a sequence of items that can be symbols or other structure names
• Nonterminals – names of structures, which are broken down into further structures
• Terminals – words or token symbols, which are never broken down
• Productions – another name for grammar rules, since they “produce” the strings of the language using derivations
Context-Free Grammars and BNFs
(1) sentence → noun-phrase verb-phrase .
(2) noun-phrase → article noun
(3) article → a | the
(4) noun → girl | dog
(5) verb-phrase → verb noun-phrase
(6) verb → sees | pets
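One way to make the distinction between nonterminals and terminals concrete is to write the six rules as data. The dictionary layout below is an assumption made for illustration (each left-hand side maps to a list of alternative right-hand sides), and `expand` is a hypothetical helper that works only because this grammar has no recursion.

```python
from itertools import product

# Each structure name maps to its alternative right-hand sides.
GRAMMAR = {
    "sentence": [["noun-phrase", "verb-phrase", "."]],
    "noun-phrase": [["article", "noun"]],
    "article": [["a"], ["the"]],
    "noun": [["girl"], ["dog"]],
    "verb-phrase": [["verb", "noun-phrase"]],
    "verb": [["sees"], ["pets"]],
}

# Nonterminals appear on a left-hand side; every other symbol is a terminal.
NONTERMINALS = set(GRAMMAR)
TERMINALS = {sym for alts in GRAMMAR.values() for alt in alts for sym in alt} - NONTERMINALS


def expand(symbol):
    """Yield every token list that the rules can produce from symbol."""
    if symbol in TERMINALS:
        yield [symbol]
        return
    for alternative in GRAMMAR[symbol]:
        for parts in product(*(list(expand(s)) for s in alternative)):
            yield [tok for part in parts for tok in part]


sentences = [" ".join(tokens) for tokens in expand("sentence")]
print(sorted(TERMINALS))
print(len(sentences), "sentences, for example:", sentences[0])
```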
Context-Free Grammars and BNFs
• “→” – “consists of” or “is the same as”; metasymbol which separates the left-hand side from the right-hand side of a rule
• The italics serve to distinguish the names of the structures from the actual words or tokens that may appear in the language
• “|” – also a metasymbol; “or”
• Other metasymbols: “::=”, angle brackets, double quotes
Context-Free Grammars and BNFs
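• BNF (Backus-Naur Form) notation – the conventional notation for describing the syntax of programming languages; an ISO standard format exists for it
• A derivation begins with the start symbol (the left-hand side of the first rule, here sentence) and repeatedly replaces a nonterminal with the right-hand side of one of its rules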
Context-Free Grammars and BNFs
(1) sentence → noun-phrase verb-phrase .
(2) noun-phrase → article noun
(3) article → a | the
(4) noun → girl | dog
(5) verb-phrase → verb noun-phrase
(6) verb → sees | pets
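A derivation from the start symbol can be traced step by step. The sketch below is one possible illustration: it always replaces the leftmost nonterminal, and the caller supplies which alternative to use at each step. The dictionary form of the grammar, the function name `derive`, and the list of choices are all assumptions made for this example.

```python
GRAMMAR = {
    "sentence": [["noun-phrase", "verb-phrase", "."]],
    "noun-phrase": [["article", "noun"]],
    "article": [["a"], ["the"]],
    "noun": [["girl"], ["dog"]],
    "verb-phrase": [["verb", "noun-phrase"]],
    "verb": [["sees"], ["pets"]],
}


def derive(start, choices):
    """Replace the leftmost nonterminal once per entry in choices.

    Each entry is the index of the alternative to use for that replacement.
    Returns every intermediate form, starting with the start symbol.
    """
    form = [start]
    steps = [form[:]]
    for choice in choices:
        index = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form[index:index + 1] = GRAMMAR[form[index]][choice]
        steps.append(form[:])
    return steps


# Alternatives chosen so that the derivation ends in "the girl sees a dog ."
steps = derive("sentence", [0, 0, 1, 0, 0, 0, 0, 0, 1])
for step in steps:
    print(" ".join(step))
```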
Parse Trees and [Abstract] Syntax Trees
• Parse tree - describes graphically the replacement process in a derivation.
• “the girl sees a dog”
• “234”
• “3 + 4 * 5”
Parse Trees and Abstract Syntax Trees
• A parse tree is labelled by nonterminals at interior nodes and terminals at leaves.
• The structure of the parse tree is completely specified by the grammar rules of the language and a derivation of a particular sequence of terminals
• Abstract syntax trees – do away with terminals that are redundant once the structure of the tree is determined.
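The difference between the two kinds of tree can be sketched for “3 + 4 * 5”. The expression grammar assumed here (expr → expr + term | term, term → term * factor | factor, factor → number) does not appear on the slides, and the tuple encoding and the helper names `leaves` and `to_ast` are hypothetical.

```python
# Parse tree: interior nodes are (nonterminal, child, ...); leaves are token strings.
parse_tree = (
    "expr",
    ("expr", ("term", ("factor", "3"))),
    "+",
    ("term",
        ("term", ("factor", "4")),
        "*",
        ("factor", "5")),
)


def leaves(node):
    """Read the terminals at the leaves from left to right."""
    if isinstance(node, str):
        return [node]
    _label, *children = node
    return [tok for child in children for tok in leaves(child)]


def to_ast(node):
    """Drop the chains of single-child nodes, keeping only operators and operands."""
    if isinstance(node, str):
        return int(node)
    _label, *children = node
    if len(children) == 1:
        return to_ast(children[0])
    left, op, right = children
    return (op, to_ast(left), to_ast(right))


print(" ".join(leaves(parse_tree)))
print(to_ast(parse_tree))
```

The abstract syntax tree keeps the grouping of the parse tree while discarding the nodes that served only to record which rule was applied.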
Ambiguity, Associativity, and Precedence
• Two different derivations can lead to the same parse tree or syntax tree
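• Different derivations can also lead to different parse trees
• Ambiguity – a grammar is ambiguous when some string has two or more distinct parse trees; this presents difficulties because no single structure is clearly expressed
• Leftmost derivation – one in which the leftmost remaining nonterminal is singled out for replacement at each step; each parse tree has a unique leftmost derivation, so two different leftmost derivations of the same string reveal an ambiguity
• Disambiguating rule – states which of the possible parse trees is the intended one, for example by giving one operator precedence over another
• Associativity – an operator is right- or left-associative according to whether repeated uses of it group from the right or from the left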
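Why ambiguity matters becomes clear once the candidate trees are evaluated. In the sketch below, the tuple encoding of syntax trees and the `evaluate` helper are assumptions made for illustration; the subtraction example is added only to show associativity.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree):
    """Evaluate a syntax tree written as (operator, left, right) or a plain integer."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))


# Two syntax trees for the same string "3 + 4 * 5".
times_binds_tighter = ("+", 3, ("*", 4, 5))
plus_binds_tighter = ("*", ("+", 3, 4), 5)
print(evaluate(times_binds_tighter), evaluate(plus_binds_tighter))  # 23 35

# Two syntax trees for the same string "8 - 3 - 2".
left_associative = ("-", ("-", 8, 3), 2)
right_associative = ("-", 8, ("-", 3, 2))
print(evaluate(left_associative), evaluate(right_associative))  # 3 7
```

A precedence rule selects the first tree of the first pair, and left associativity selects the first tree of the second pair.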
EBNFs and Syntax Diagrams
• Extended Backus-Naur Form (EBNF) – special notation that expresses more clearly the repetitive nature of some grammar structures
• “{ }” – stands for zero or more repetitions
• “[ ]” – indicates optional parts of the structure
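The two EBNF brackets translate naturally into control flow: an optional part becomes an `if`, and a repetition becomes a `while` loop. The rule used below, expr → [ - ] number { + number }, is a hypothetical example, as are the function names.

```python
def parse_sum(tokens):
    """Parse a token list according to: expr -> [ '-' ] number { '+' number }"""
    pos = 0

    def number():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"number expected at token {pos}")
        value = int(tokens[pos])
        pos += 1
        return value

    negative = False
    if pos < len(tokens) and tokens[pos] == "-":  # [ ] : optional part
        negative = True
        pos += 1
    first = number()
    tree = -first if negative else first

    while pos < len(tokens) and tokens[pos] == "+":  # { } : zero or more repetitions
        pos += 1
        tree = ("+", tree, number())

    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse_sum(["1", "+", "2", "+", "3"]))
print(parse_sum(["-", "4"]))
```

Building the tree inside the loop makes the result group from the left, so the repetition also fixes the associativity of “+”.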
EBNFs and Syntax Diagrams
• Syntax diagrams – graphical representations of grammar rules that indicate the sequence of terminals and nonterminals encountered in the right-hand side of each rule
• They use circles or ovals for terminals and squares or rectangles for nonterminals, connecting them with lines and arrows to indicate appropriate sequencing
Parsing Techniques and Tools
• A grammar explicitly describes the strings of tokens that are syntactically legal in a programming language
• A grammar implicitly describes the actions that a parser must take to parse a string of tokens correctly
• Recognizer – simplest form of parser; a program that accepts or rejects strings, based on whether they are legal strings in the language. More general parsers also build parse trees
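A recognizer only answers yes or no. The sketch below handles the small English grammar from the earlier slides (sentence → noun-phrase verb-phrase . and so on); because that grammar has no recursion, every legal sentence has the same six-token shape, which is what this deliberately simple version exploits. The set names and the `recognize` function are hypothetical.

```python
ARTICLES = {"a", "the"}
NOUNS = {"girl", "dog"}
VERBS = {"sees", "pets"}

# article noun verb article noun "."
EXPECTED = [ARTICLES, NOUNS, VERBS, ARTICLES, NOUNS, {"."}]


def recognize(text):
    """Accept or reject text; no parse tree is built."""
    tokens = text.split()
    return len(tokens) == len(EXPECTED) and all(
        token in allowed for token, allowed in zip(tokens, EXPECTED)
    )


print(recognize("the girl sees a dog ."))   # True
print(recognize("girl the sees a dog ."))   # False
```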
Parsing Techniques and Tools
• Bottom-up parsers – match the input against the right-hand sides of rules; when a match occurs, the right-hand side is replaced by, or reduced to, the nonterminal on the left. They construct derivations and parse trees from the leaves to the root, and are also called shift-reduce parsers
• Top-down parsers – expand nonterminals to match incoming tokens and directly construct a derivation
• Recursive-descent parsers – operate by turning the nonterminals into a group of mutually recursive procedures whose actions are based on the right-hand sides of the rules
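A recursive-descent parser can be sketched with one method per nonterminal. The grammar assumed here (expr → term { + term }, term → factor { * factor }, factor → number | ( expr )) is a common textbook choice and is not taken from the slides; the class and function names are hypothetical.

```python
import re


class Parser:
    def __init__(self, text):
        # Simplification: characters that are not digits, '+', '*', or parentheses are ignored.
        self.tokens = re.findall(r"\d+|[+*()]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expr(self):
        # expr -> term { '+' term }
        tree = self.term()
        while self.peek() == "+":
            self.pos += 1
            tree = ("+", tree, self.term())
        return tree

    def term(self):
        # term -> factor { '*' factor }
        tree = self.factor()
        while self.peek() == "*":
            self.pos += 1
            tree = ("*", tree, self.factor())
        return tree

    def factor(self):
        # factor -> number | '(' expr ')'
        token = self.peek()
        if token == "(":
            self.pos += 1
            tree = self.expr()
            if self.peek() != ")":
                raise SyntaxError("')' expected")
            self.pos += 1
            return tree
        if token is not None and token.isdigit():
            self.pos += 1
            return int(token)
        raise SyntaxError(f"number or '(' expected, found {token!r}")


def parse(text):
    """Parse text into a syntax tree of (operator, left, right) tuples."""
    parser = Parser(text)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("3 + 4 * 5"))
print(parse("(3 + 4) * 5"))
```

Because `expr` calls `term` and `term` calls `factor`, multiplication is grouped before addition without any separate precedence table.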
Lexics versus Syntax versus Semantics
Next Topic: Semantics
• Axiomatic Semantics
• Denotational Semantics
• Translation Semantics
• Algebraic Semantics
• Operational Semantics