Professor Jeanne Ferrante - Computer Science


• Professor Jeanne Ferrante

1

http://www.jflap.org/jflaptmp/

CSE 105 Theory of Computation

Today’s Agenda
– Context-Free Grammars
– Ambiguity

Reminders and announcements:
– Exam 1 grades will be available…
• Very unlikely: Friday. More likely: Monday
– Reading Quiz 4: Due MONDAY Apr 25, 11:59 pm
– HW 4: Due WEDNESDAY, Apr 27, 11:59 pm
• Discussion helpful for homework
– Office Hours next week will be updated
• Solutions for exam problems

2

MODELING GRAMMATICAL UNDERSTANDING

Grammar intuition

3

Vote your first instinct!

Which is the correct order?
A. a beautiful blue sailing boat
B. a blue beautiful sailing boat
C. a sailing beautiful blue boat
D. a blue sailing beautiful boat

Credits: Quiz questions taken from: http://web2.uvcs.uvic.ca/elc/studyzone/410/grammar/adjord.htm

4

English Adjective Categories & Order

1. General opinion (good, brilliant, awful)

2. Specific opinion (intelligent, tasty, comfortable)

3. Size (small, large, tiny)

4. Shape (square, round, flat, thin)

5. Age (new, ancient)

6. Colour (blue, pink, reddish)

7. Nationality (French, eastern, Martian)

8. Material (steel, wooden, paper)

9. Purpose (sailing, carving)

Sources:
https://learnenglish.britishcouncil.org/en/english-grammar/adjectives/order-adjectives
http://web2.uvcs.uvic.ca/elc/studyzone/410/grammar/adjord.htm

5

Did you know you knew that?
• Could you have written down those rules?
– Then how did you apply them?
– Did anyone ever explicitly teach you how to correctly structure multi-adjective clauses using that list?
• Linguists, psychologists, neurologists, and anthropologists have long been amazed and puzzled by the human capacity for acquiring language
• Context-free grammars are one way that scholars have tried to model how language, and our brain’s models of language, might be structured
– Key concept: representing and generating infinite languages with a complex structure, using just a few simple rules
• Formalizing these rules allows software tools to use them

6

Why use grammars in computer science?
• Precise specification of language structure
– For example, begin–end hierarchy
• Good basis for automatic tools
– For example, parser generators
• Provides feedback to language designers
– For example, whether a new construct is easy or hard to add
• Adaptable
– Easy to add new language constructs

7

CONTEXT-FREE GRAMMARS (CFGs)

They generate languages!

8

Context-Free Grammars

Example Grammar G
• Finite set of Variables: {S}
• Designated Start variable: S
• Finite set of Terminals: {0}
• Finite set of Rules (Productions):

S → 0S (1)
S → 0 (2)

Example: S ⇒ 0S ⇒ 00S ⇒ 000, applying rules (1), (1), (2) in that order

9

Presentation Notes
Variables are written in CAPS. Terminals are the symbols of the alphabet (here only 0) and are distinct from the variables. S is the designated start variable. Each rule is a substitution rule: it has a single variable on the LHS of the arrow (a grammar may have more variables than just S) and a string of variables and terminals on the RHS, and you can substitute the RHS for any occurrence of the LHS variable. Grammars describe a language: the language of the grammar is the set of all terminal strings that can be generated from the start variable, replacing variables as you go along until you wind up with a string of terminals.

What is the language of this CFG?

S → 0S | 0 (Note: 2 rules, separated by | )
A. {0}
B. {0, 00, 000, …}
C. {ε, 0, 00, 000, …}
D. None of the above

10

Presentation Notes
Can’t get epsilon, so it’s B.
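One way to check the answer is to enumerate the short strings the grammar generates. The following is a minimal sketch, not part of the original slides: the function name `generate` and the dictionary encoding of the rules are assumptions made for illustration, and the search is only meant for small grammars like the ones on these slides (a grammar such as S → SS | ε could make it loop).

```python
from collections import deque

def generate(rules, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start.

    rules maps each variable (a single character) to a list of right-hand
    sides; every character that is not a key of rules is a terminal.
    """
    seen = {start}
    queue = deque([start])
    language = set()
    while queue:
        form = queue.popleft()
        # Expand the leftmost variable only; every generated string
        # has a leftmost derivation, so nothing is missed.
        pos = next((i for i, ch in enumerate(form) if ch in rules), None)
        if pos is None:
            language.add(form)          # no variables left: a terminal string
            continue
        for rhs in rules[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            # Terminals are never erased, so too many terminals is a dead end.
            terminals = sum(1 for ch in new_form if ch not in rules)
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return language

rules = {"S": ["0S", "0"]}              # S -> 0S | 0
print(sorted(generate(rules, "S", 3), key=len))   # ['0', '00', '000']
```

The empty string never appears in the output, which matches answer B.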

Formal Definition of CFG
A context-free grammar is a 4-tuple (V, Σ, R, S), where
1. V is a finite set of variables
2. Σ is a finite set of terminals, disjoint from V
3. R is a finite set of rules A → u, where A ∈ V and u ∈ (V ∪ Σ)*
4. S ∈ V is the designated start variable
Rules capture patterns of words in the language

11

Presentation Notes
The name “context-free” comes from the fact that only a single variable is allowed on the LHS of a rule, so the rule can be applied in any context!

Derivations in CFG
Given a CFG G, with start variable S and terminals Σ:
• uAv ⇒ uwv if A → w is a rule in G and u, v, and w are strings of variables and terminals of G.
• u ⇒* v if u = v, or there is a sequence u1, …, uk such that u = u1 ⇒ u2 ⇒ … ⇒ uk-1 ⇒ uk = v
• L(G) = { w in Σ* │ S ⇒* w }
– All strings derived when you start with S and apply 0 or more rules to get w
• G is equivalent to G’ if L(G) = L(G’)
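The two derivation relations can be sketched directly in code. This is an illustrative sketch under stated assumptions, not something from the slides: the names `one_step` and `derives`, the single-character encoding of variables, and the `max_steps` bound (needed so the search stops) are all hypothetical choices.

```python
def one_step(form, rules):
    """Yield every v with form => v: one rule applied to one variable occurrence."""
    for i, symbol in enumerate(form):
        for rhs in rules.get(symbol, []):
            yield form[:i] + rhs + form[i + 1:]

def derives(u, v, rules, max_steps):
    """Check u =>* v using at most max_steps rule applications."""
    frontier = {u}                      # forms reachable in exactly k steps
    for _ in range(max_steps + 1):
        if v in frontier:
            return True
        frontier = {w for form in frontier for w in one_step(form, rules)}
    return False

rules = {"S": ["0S", "0"]}              # S -> 0S | 0
print(derives("S", "000", rules, 3))    # True: S => 0S => 00S => 000
print(derives("S", "", rules, 10))      # False: this grammar cannot produce epsilon
```

Note that `derives("S", "S", rules, 0)` is also true, since u ⇒* v holds when u = v.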

12

G generates a LANGUAGE L(G)
• L(G) includes every string of terminals generated from the start variable S
– Might include the empty string ε
• L(G) does not include strings that cannot be generated from the start variable S
• L(G) includes ONLY strings of terminals, not strings of terminals and variables
• A language L is context-free if there is a CFG G with L(G) = L

13

What is the language of this CFG?

S → 0S | 1S | ε (Note: 3 separate rules!)
A. Same as regular expression 0*1*
B. Same as regular expression 0* ∪ 1*
C. Same as regular expression (0 ∪ 1)*
D. None of the above

14

Presentation Notes
It’s not A, since a 0 can follow a 1 (for example, 10). It’s not B, since a string can have both 0’s and 1’s in any order. It’s C.

Are all regular languages context-free? Discuss and try to prove or disprove.

A. Yes B. No C. Don’t know

15

Presentation Notes
YES. One way to prove this is to show that for every regular expression describing a language L there is a CFG that generates L. Prove by construction.

Designing CFG’s
• If you can divide the language you want to generate into pieces, write a separate grammar for each piece, with start variable Si, and then add a new start variable S with the rule S → S1 | S2 | … | Sn
• If you need to link information, such as the number of 0’s to the same number of 1’s, use a rule of the form R → uRv
• You can construct a CFG from a DFA!
– For state qi in DFA use variable Ri in CFG
– For start state q0 in DFA make R0 the start variable in CFG
– If δ(qi, a) = qj, add rule Ri → aRj in CFG
– Add Ri → ε if qi is a final state in DFA

16

Presentation Notes
Add an example here with 2 separate pieces: say, for the regular expression 1*01 ∪ 001*. Linking info: we’ll see that on the next slide! Example DFA for 1*0.
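The DFA-to-CFG construction can be sketched as a short function. This is a hedged illustration only: the function name `dfa_to_cfg`, the variable naming scheme `R_q`, and the particular three-state DFA for 1*0 (including its dead state `q2`) are assumptions, since the slides do not give the DFA itself.

```python
def dfa_to_cfg(states, alphabet, delta, start, finals):
    """Build CFG rules from a DFA, one variable per state.

    delta maps (state, symbol) -> state. Returns (rules, start_variable),
    where rules maps a variable to a list of right-hand sides, each a tuple
    of symbols; the empty tuple () stands for epsilon.
    """
    var = {q: f"R_{q}" for q in states}
    rules = {var[q]: [] for q in states}
    for q in states:
        for a in alphabet:
            # delta(q, a) = r  gives the rule  R_q -> a R_r
            rules[var[q]].append((a, var[delta[(q, a)]]))
        if q in finals:
            # final state q gives the rule  R_q -> epsilon
            rules[var[q]].append(())
    return rules, var[start]

# Assumed DFA for 1*0: q0 loops on 1, moves to accepting q1 on 0; q2 is a dead state.
states = ["q0", "q1", "q2"]
alphabet = ["0", "1"]
delta = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q2", ("q1", "1"): "q2",
    ("q2", "0"): "q2", ("q2", "1"): "q2",
}
rules, start_var = dfa_to_cfg(states, alphabet, delta, "q0", {"q1"})
for variable, alternatives in rules.items():
    print(variable, "->", " | ".join(" ".join(rhs) or "ε" for rhs in alternatives))
```

Because every regular language has a DFA, this construction is one way to argue that every regular language is context-free.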

CFG for L = {a^n b^n | n ≥ 0}
• What’s the shortest string in the language? ε
• If we have built a^n b^n, how do we modify it to create a longer string in the language? a^(n+1) b^(n+1) = a a^n b^n b
• What rules should we pick to use this idea? S → ε | aSb

17

Demonstrating CFG Power
Consider L = { 0^n 1^n | n ≥ 0}. Which of these CFG’s generate L?
A. S → QR
Q → 0Q | ε
R → 1R | ε
B. S → RQ
R → 0R1 | ε
Q → 0Q1 | ε
C. S → 0S1 | ε
D. S → 01S01 | ε
E. None of the above

18

Presentation Notes
It’s C: each use of S → 0S1 generates exactly one 0 for each 1. It’s not A, since Q and R generate independent (not necessarily equal) numbers of 0’s and 1’s. It’s not B, since it can put 1’s before 0’s (for example, 0101). It’s not D, since it can put a 0 after a 1 (for example, 0101).
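The grammar in option C can be read as a recursive membership test. The sketch below is an illustration rather than anything from the slides; the function name `in_L` is hypothetical, and each branch mirrors one rule of S → 0S1 | ε.

```python
def in_L(w):
    """Return True if w is in { 0^n 1^n | n >= 0 }, mirroring S -> 0S1 | epsilon."""
    if w == "":                                          # rule S -> epsilon
        return True
    if len(w) >= 2 and w[0] == "0" and w[-1] == "1":     # rule S -> 0S1
        return in_L(w[1:-1])
    return False

print(in_L("0011"))   # True
print(in_L("0101"))   # False: generated by options B and D, but not in L
print(in_L("0"))      # False: generated by option A, but not in L
```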

Design a CFG
• L = {a^n b^m c^n | n, m ≥ 0}
• Shortest string: ε
• Any number of b’s in middle? B → bB | ε
• Right order and right number of a’s and c’s?

S → ε | aSc | B
B → bB | ε
OR
S → aSc | B
B → bB | ε

19

Presentation Notes
Shortest string: epsilon. Note any number of b’s in the middle! How do we do that? B → bB | ε. Now how do we ensure the right order and the right number of a’s and c’s? S → ε | aSc | B, with B → bB | ε.
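As a quick sanity check of this design, each variable can be turned into a small recursive function. This is only a sketch; the names `matches_S` and `matches_B` are hypothetical, and the code follows the second version of the grammar (S → aSc | B, B → bB | ε).

```python
def matches_B(w):
    """B -> bB | epsilon: w consists only of b's (possibly none)."""
    return w == "" or (w[0] == "b" and matches_B(w[1:]))

def matches_S(w):
    """S -> aSc | B: strip one matching a...c pair, or hand over to B."""
    if len(w) >= 2 and w[0] == "a" and w[-1] == "c":
        return matches_S(w[1:-1])
    return matches_B(w)

for w in ["", "bbb", "ac", "aabcc", "abcc", "aabc"]:
    print(repr(w), matches_S(w))
```

The a's and c's are linked because they are always added in pairs by the same rule, while the b's in the middle are generated independently by B.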

Derivations and Parse Trees
• Leftmost (rightmost) derivation: derivation where you always replace the leftmost (rightmost) variable.
• Can put the same information in a parse tree
– But ignore the order of the derivation
• Each parse tree corresponds to a unique leftmost (rightmost) derivation, and vice versa

20

Parse Trees
For the following grammar, where Σ = {a, +, x, (, ) }, construct the parse tree for the string a+axa

E → E+T | T
T → TxF | F
F → ( E ) | a

Parse tree:
E
├─ E
│  └─ T
│     └─ F
│        └─ a
├─ +
└─ T
   ├─ T
   │  └─ F
   │     └─ a
   ├─ x
   └─ F
      └─ a
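A parser for this grammar can build the same tree mechanically. The sketch below is an assumption-laden illustration, not the method used in the course: the function name `parse` and the nested-tuple tree encoding are hypothetical, and because the rules E → E+T and T → TxF are left-recursive, the code uses loops that build the left-nested trees those rules describe instead of calling itself on the left.

```python
def parse(tokens):
    """Parse tokens with E -> E+T | T, T -> TxF | F, F -> (E) | a.

    Returns a nested tuple (variable, child, child, ...); terminals are strings.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}")
        pos += 1

    def parse_F():
        if peek() == "(":
            expect("(")
            inner = parse_E()
            expect(")")
            return ("F", "(", inner, ")")
        expect("a")
        return ("F", "a")

    def parse_T():
        tree = ("T", parse_F())                       # T -> F
        while peek() == "x":
            expect("x")
            tree = ("T", tree, "x", parse_F())        # T -> T x F
        return tree

    def parse_E():
        tree = ("E", parse_T())                       # E -> T
        while peek() == "+":
            expect("+")
            tree = ("E", tree, "+", parse_T())        # E -> E + T
        return tree

    tree = parse_E()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
    return tree

print(parse(list("a+axa")))
```

For a+axa the root is an E node with children E, +, T, and the x sits lower in the tree under T, so multiplication binds tighter than addition.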

21

Special Forms of CFG’s
“Chomsky Normal Form”
• Noam Chomsky, linguist who has taught at MIT for 55 years
• Every rule of form A → BC │ a or S → ε

“Greibach Normal Form”
• Sheila Greibach, UCLA CS professor
• Every rule of form A → a A1A2…An or S → ε

22

Presentation Notes
Chomsky Normal Form is interesting because it gives a simple and easily bounded algorithm to check whether a string is generated: a derivation of a string of length n ≥ 1 takes exactly 2n − 1 steps, so you never have to look beyond a bound set by the length of the string! Greibach Normal Form always gives you the first terminal of the string, which is useful for compilers!
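One well-known bounded membership algorithm for grammars in Chomsky Normal Form is CYK, a table-filling method. The slides do not name an algorithm, so this is a sketch of one possibility; the function name `cyk`, the rule encoding, and the example CNF grammar for {0^n 1^n | n ≥ 1} are all assumptions made for illustration.

```python
def cyk(word, unit_rules, pair_rules, start, start_derives_empty=False):
    """Decide whether a grammar in Chomsky Normal Form generates word.

    unit_rules: set of (A, a) for rules A -> a
    pair_rules: set of (A, B, C) for rules A -> BC
    start_derives_empty: True if the grammar has the rule S -> epsilon
    """
    n = len(word)
    if n == 0:
        return start_derives_empty
    # table[i][j] = set of variables that derive word[i..j] (inclusive)
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][i] = {A for (A, a) in unit_rules if a == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):        # split into word[i..k] and word[k+1..j]
                for (A, B, C) in pair_rules:
                    if B in table[i][k] and C in table[k + 1][j]:
                        table[i][j].add(A)
    return start in table[0][n - 1]

# Assumed CNF grammar for { 0^n 1^n | n >= 1 }:
#   S -> AB | AC,  C -> SB,  A -> 0,  B -> 1
unit_rules = {("A", "0"), ("B", "1")}
pair_rules = {("S", "A", "B"), ("S", "A", "C"), ("C", "S", "B")}
print(cyk("0011", unit_rules, pair_rules, "S"))   # True
print(cyk("010", unit_rules, pair_rules, "S"))    # False
```

The table only has entries for substrings of the input, which is the sense in which the work is bounded by the length of the string.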

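Example CFG
E → E + E | E x E | (E) | a
Consider the string a+axa. It has 2 different parse trees:

Tree with + at the root:
E
├─ E
│  └─ a
├─ +
└─ E
   ├─ E
   │  └─ a
   ├─ x
   └─ E
      └─ a

Tree with x at the root:
E
├─ E
│  ├─ E
│  │  └─ a
│  ├─ +
│  └─ E
│     └─ a
├─ x
└─ E
   └─ a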

23

Ambiguity
The existence of 2 distinct parse trees (or 2 leftmost derivations) for the same string means the grammar G is ambiguous.
Ambiguous grammars are a problem (for example, for compilers).
In practice, we often try to rewrite the grammar to eliminate the ambiguity.
See slide 21 for an unambiguous grammar for expressions.

24

Presentation Notes
We’ve already seen how to correct this! The E-T-F grammar on slide 21.
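The ambiguity of E → E + E | E x E | (E) | a can be confirmed by counting parse trees. The following is a minimal sketch written for this one grammar; the function name `count_parse_trees` is hypothetical, and the counting strategy (try every rule and every operator position at the root) is an illustrative choice, not something from the slides.

```python
from functools import lru_cache

def count_parse_trees(s):
    """Count parse trees for s under E -> E+E | ExE | (E) | a."""
    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving s[i:j] from E.
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and s[i] == "a":               # rule E -> a
            total += 1
        if s[i] == "(" and s[j - 1] == ")":          # rule E -> (E)
            total += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):                # E -> E+E or E -> ExE, root operator at s[k]
            if s[k] in "+x":
                total += count(i, k) * count(k + 1, j)
        return total
    return count(0, len(s))

print(count_parse_trees("a+axa"))   # 2
print(count_parse_trees("a+a"))     # 1
```

A count greater than 1 for any string is enough to show that the grammar is ambiguous.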

What are the limits of Context-Free Languages? What are their Closure Properties?

Regular

Context-Free

Not Context-Free ??

25

True or False: The Context-Free Languages are closed under Union.
A. True
B. False
C. We don’t know yet

26

Pushdown Automata: Sneak Peek

(Figure: an NFA reading the Input, with a STACK attached; the current input symbol is a and the top of the stack is x)

STACK with operations
• Read top
• Remove top (Pop)
• Write top (Push)

At each step, transition from state q, input a, top of stack x → state r, replace x with y on top of stack (for a, x ≠ ε)
Accept: IF have read ALL the input, AND in final state of PDA. BOTH MUST HOLD! (Don’t require the stack to be empty)

27

Presentation Notes
Note that the next move depends on the input AND the top of the stack. We can use the stack to do auxiliary checking. It’s useful to know when the stack is empty, so we often start with a special symbol so we can check that. PDAs can be nondeterministic.

JFLAP Diagram of PDA for {0^n 1^n │ n > 0}

a,b; c : “When PDA is reading input a, replace symbol b on the top of the stack with symbol c” (This example was from JFLAP, but written a, b → c in Sipser)
a = ε means don’t read input symbol
b = ε means don’t read or pop top of stack
c = ε means don’t write on top of stack

Example inputs: 0011, 010

28

Presentation Notes
This PDA is deterministic; PDAs can also be nondeterministic. In fact, nondeterministic PDAs are MORE powerful than deterministic PDAs.
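The behaviour on the two example inputs can be reproduced with a small simulator. This is a sketch under stated assumptions: the JFLAP diagram itself is not in this transcript, so the states `q0` to `q3`, the stack marker `$`, and the transition table below are a hypothetical PDA for {0^n 1^n | n > 0}, and the function name `pda_accepts` is made up for illustration. The search also assumes the PDA has no ε-loops that grow the stack forever.

```python
def pda_accepts(word, transitions, start, finals):
    """Simulate a (possibly nondeterministic) PDA.

    transitions maps (state, a, b) to a list of (next_state, c): read a,
    pop b, push c, where "" plays the role of epsilon. Accepts when all input
    has been read and the state is final; the stack need not be empty.
    """
    start_config = (start, 0, ())        # (state, input position, stack with top last)
    seen = {start_config}
    frontier = [start_config]
    while frontier:
        state, pos, stack = frontier.pop()
        if pos == len(word) and state in finals:
            return True
        for (q, a, b), targets in transitions.items():
            if q != state:
                continue
            if a and (pos >= len(word) or word[pos] != a):
                continue                 # required input symbol is not next
            if b and (not stack or stack[-1] != b):
                continue                 # required symbol is not on top of the stack
            new_pos = pos + (1 if a else 0)
            rest = stack[:-1] if b else stack
            for next_state, c in targets:
                config = (next_state, new_pos, rest + ((c,) if c else ()))
                if config not in seen:
                    seen.add(config)
                    frontier.append(config)
    return False

# Assumed transitions, written as (state, a, b): [(next_state, c)]
transitions = {
    ("q0", "", ""): [("q1", "$")],       # push a bottom-of-stack marker
    ("q1", "0", ""): [("q1", "0")],      # push a 0 for every 0 read
    ("q1", "1", "0"): [("q2", "")],      # first 1: pop a 0
    ("q2", "1", "0"): [("q2", "")],      # later 1's: pop a 0 each
    ("q2", "", "$"): [("q3", "")],       # marker on top: counts matched
}
print(pda_accepts("0011", transitions, "q0", {"q3"}))   # True
print(pda_accepts("010", transitions, "q0", {"q3"}))    # False
```

On 010 the machine reaches the final state after reading only 01, with one input symbol still unread, so the input is rejected: both acceptance conditions must hold.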