Syntactic Pattern Recognition Statistical PR:Find a feature vector x Train a system using a set of...

Date posted: 22-Dec-2015


Syntactic Pattern Recognition

• Statistical PR: find a feature vector x

 • Train a system using a set of labeled patterns

 • Classify unknown patterns

• Statistical PR ignores the relational information contained in a pattern's structure

• Most structural methods use hierarchical decomposition: a pattern is split into subpatterns, which are in turn split into primitives

• Note the similarity between the structure of a sentence and the description of a pattern

Figure: hierarchical description of a picture

Picture A → Triangle B, Rectangle C

Triangle B → edge a, edge b, edge c

Rectangle C → edge d, edge e, edge f, edge g
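The hierarchical description above can be sketched as a tree whose leaves are primitives; reading the leaves from left to right turns the picture into a "sentence" of primitives. This is a minimal Python sketch: the `(label, children)` node encoding and the function name `primitives` are illustrative assumptions, not something the slides define.

```python
def primitives(node):
    """Return the leaf labels (primitives) of a (label, children) tree, left to right."""
    label, children = node
    if not children:
        return [label]
    return [p for child in children for p in primitives(child)]


# Hypothetical encoding of the figure: picture A is made of triangle B
# (edges a, b, c) and rectangle C (edges d, e, f, g).
picture = ("Picture A", [
    ("Triangle B", [("a", []), ("b", []), ("c", [])]),
    ("Rectangle C", [("d", []), ("e", []), ("f", []), ("g", [])]),
])

sentence = "".join(primitives(picture))
print(sentence)  # abcdefg
```

The string alone loses the tree; keeping the tree is what lets a syntactic method reason about the relations between parts.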


Language

• Alphabet: a finite set of symbols, $V = \{x_1, x_2, \ldots, x_n\}$

• A sentence over V is a finite string of symbols from V, ordered from left to right

• Example: for $V = \{a, b, c\}$, valid sentences include "abb", "abba", "aaa" and the null (empty) sentence λ

• The length of a sentence s, $|s|$, is its number of symbols

• $s_1 \circ s_2$ is the concatenation of the two sentences

• $V \circ V \circ \cdots \circ V = V^n$ (n factors) is the set of all sentences with n symbols over V

• $V^+ = V \cup V^2 \cup V^3 \cup \cdots$ is the set of all non-empty sentences over V

• $V^* = V^+ \cup \{\lambda\}$ is the closure of V: all sentences over V, including the null sentence

• A language is an arbitrary subset L of $V^*$

• Example: for $V = \{0, 1\}$, $L_1 = \{001, 110, 111, 0, \lambda\}$ is a finite language

 • $L_2 = \{s \mid s = 1^n 0^2 1^m,\ n \ge 1,\ 1 \le m \le 10\}$ is an infinite language
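The definitions above translate directly into small set operations. This Python sketch builds $V^n$ with `itertools.product`, a length-bounded slice of $V^+$, and a membership test for $L_2$; it assumes $L_2$ is read as $1^n 0^2 1^m$, and the function names are hypothetical.

```python
import re
from itertools import product


def sentences_of_length(alphabet, n):
    """V^n: every sentence with exactly n symbols over the alphabet."""
    return {"".join(symbols) for symbols in product(sorted(alphabet), repeat=n)}


def nonempty_sentences(alphabet, max_len):
    """A finite slice of V+ = V ∪ V^2 ∪ ..., truncated at max_len symbols."""
    result = set()
    for n in range(1, max_len + 1):
        result |= sentences_of_length(alphabet, n)
    return result


def in_L2(s):
    """Membership in L2 = {1^n 0^2 1^m : n >= 1, 1 <= m <= 10} (assumed reading)."""
    return re.fullmatch(r"1+001{1,10}", s) is not None


V = {"a", "b", "c"}
print(len(sentences_of_length(V, 2)))  # 9
print(len(nonempty_sentences(V, 2)))   # 12
print(in_L2("11001"), in_L2("1001" + "1" * 10))  # True False
```

$L_2$ is infinite because n is unbounded, even though m is capped at 10; a regular expression captures both constraints.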


Languages

• $L_1 \circ L_2 = \{s \mid s = s_1 s_2,\ s_1 \in L_1,\ s_2 \in L_2\}$ is the concatenation of $L_1$ and $L_2$

• $L_1^{it} = \{s \mid s = s_1 s_2 \cdots s_n,\ n \ge 0,\ s_i \in L_1\}$ is the iterate of $L_1$ (also called its closure, $L_1^*$)

• $L_1 \circ L_2$ and $L_1^{it}$ are both languages

• Example: $V = \{a, b\}$, $L_1 = \{aa, ab, bb\}$, $L_2 = \{a, b\}$

 • $L_1 \circ L_2 = \{a^3, aba, b^2a, a^2b, ab^2, b^3\}$

 • $L_1^{it}$ is infinite; restricted to n = 0, 1, 2 it is $\{\lambda, a^2, ab, b^2, a^4, a^3b, a^2b^2, aba^2, abab, ab^3, b^2a^2, b^2ab, b^4\}$, where λ is the null sentence

• s is called a substring of t if $t = usv$ for some strings $u, v \in V^*$

• Every string is a substring of itself, taking u and v both null
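Concatenation and the iterate can be checked on the example by brute force. This is a minimal Python sketch; since the iterate is infinite, `iterate_up_to` (a hypothetical helper) only builds the part with at most `max_n` factors, and the empty string plays the role of the null sentence.

```python
def concat(L1, L2):
    """L1 ∘ L2: every s1 s2 with s1 in L1 and s2 in L2."""
    return {s1 + s2 for s1 in L1 for s2 in L2}


def iterate_up_to(L, max_n):
    """Finite slice of the iterate of L: concatenations of n strings of L, 0 <= n <= max_n."""
    result = {""}  # n = 0 contributes the null sentence
    layer = {""}
    for _ in range(max_n):
        layer = concat(layer, L)  # strings built from one more factor
        result |= layer
    return result


L1 = {"aa", "ab", "bb"}
L2 = {"a", "b"}
print(sorted(concat(L1, L2)))     # ['aaa', 'aab', 'aba', 'abb', 'bba', 'bbb']
print(len(iterate_up_to(L1, 2)))  # 13
```

The 13 strings are the null sentence, the 3 strings of $L_1$ and the 9 two-factor concatenations, matching the set listed for n = 0, 1, 2.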


Grammars

• A grammar $G = (V_T, V_N, P, S)$ has 4 entities:

 • $V_T$ is a set of terminal symbols, called primitives or constants

 • $V_N$ is a set of non-terminal symbols, called variables

 • $V_T \cup V_N = V$ and $V_T \cap V_N = \emptyset$

 • P is the set of production rules $\alpha \rightarrow \beta$, where α contains at least one variable and β is a string of variables and/or constants

 • S is the starting symbol or root; $S \in V_N$

• L(G) is the formal language (a set of strings) generated by the grammar G

 • Each string is composed only of primitives

 • Each string can be derived from S using the production rules in P

 • Example: $V_T = \{a, b\}$, $V_N = \{S\}$, $P = \{S \rightarrow aSb,\ S \rightarrow ab\}$ gives $L(G) = \{a^n b^n \mid n \ge 1\}$

• A grammar is used to: (i) generate the strings (sentences) of L(G), (ii) check whether a sentence belongs to L(G), (iii) analyze the structure of a sentence
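Use (i), generating L(G), can be sketched as a breadth-first search over sentential forms that always rewrites the leftmost variable. This Python sketch assumes single-character symbols (uppercase for variables, lowercase for constants), rules with a single variable on the left, and no null rules, so sentential forms never shrink and pruning by length is safe; `generate` is a hypothetical name.

```python
from collections import deque


def generate(productions, start="S", max_len=6):
    """Enumerate the strings of L(G) with at most max_len symbols."""
    results = set()
    queue = deque([start])
    seen = {start}
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None if the form is a sentence.
        idx = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if idx is None:
            results.add(form)
            continue
        for lhs, rhs in productions:
            if form[idx] == lhs:
                new_form = form[:idx] + rhs + form[idx + 1:]
                if len(new_form) <= max_len and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
    return results


P = [("S", "aSb"), ("S", "ab")]
print(sorted(generate(P), key=len))  # ['ab', 'aabb', 'aaabbb']
```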


Grammar Types

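Unrestricted Grammar (UR): $\alpha_1 A \alpha_2 \rightarrow \beta$

Context Sensitive Grammar (CS): $\alpha_1 A \alpha_2 \rightarrow \alpha_1 \beta \alpha_2$, with β non-empty

Context Free Grammar (CF): $A \rightarrow \beta$

Finite State Grammar (FS): $A \rightarrow aB$ or $A \rightarrow a$

where $\alpha_1, \alpha_2, \beta \in V^*$; $A, B \in V_N$; $a, b \in V_T$

Each type restricts the rule form of the one before it: UR ⊃ CS ⊃ CF ⊃ FS (setting aside null productions)

Example: $V_T = \{a, b, c\}$; $V_N = \{S, A, B\}$

[Figure: sample production sets for the UR, CS, CF and FS types over this alphabet]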
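The four rule forms can be checked mechanically. This Python sketch returns the most restrictive type whose form a single production fits, and types a grammar by its least restrictive rule. It assumes single-character symbols with uppercase variables and lowercase constants; the function names and the sample rules are illustrative only, not the productions of the slide's example.

```python
ORDER = ["FS", "CF", "CS", "UR"]  # most to least restrictive


def is_nonterminal(symbol):
    return symbol.isupper()


def production_type(lhs, rhs):
    """Most restrictive grammar type whose rule form fits lhs -> rhs."""
    if len(lhs) == 1 and is_nonterminal(lhs):
        # FS: A -> a  or  A -> aB
        if len(rhs) == 1 and not is_nonterminal(rhs):
            return "FS"
        if len(rhs) == 2 and not is_nonterminal(rhs[0]) and is_nonterminal(rhs[1]):
            return "FS"
        return "CF"  # A -> beta
    # CS: alpha1 A alpha2 -> alpha1 beta alpha2 with beta non-empty
    for i, symbol in enumerate(lhs):
        if not is_nonterminal(symbol):
            continue
        left, right = lhs[:i], lhs[i + 1:]
        if (rhs.startswith(left) and rhs.endswith(right)
                and len(rhs) > len(left) + len(right)):
            return "CS"
    return "UR"


def grammar_type(productions):
    """A grammar's type is set by its least restrictive production."""
    return max((production_type(l, r) for l, r in productions), key=ORDER.index)


print(production_type("A", "aB"))    # FS
print(production_type("S", "aSb"))   # CF
print(production_type("aB", "aAc"))  # CS
print(production_type("AB", "a"))    # UR
```

Testing the CS form literally (a variable rewritten inside an unchanged left and right context) is one choice; a non-contracting test $|\alpha| \le |\beta|$ would accept more rule shapes with the same generative power.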


Finite State Grammars, and Graphical Representations

• Nodes are the nonterminals in $V_N$ plus an additional terminal node T not in V

• A production $A_i \rightarrow aA_j$ is represented by an edge labeled a, directed from $A_i$ to $A_j$

• A production $A_i \rightarrow a$ is represented by an edge labeled a, directed from $A_i$ to T

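Example: $S \rightarrow aA$, $S \rightarrow bB$, $A \rightarrow cB$, $A \rightarrow c$, $B \rightarrow a$

[Figure: graph of this grammar with nodes S, A, B and T]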

For an FS grammar G, a string $x = x_1 x_2 \cdots x_n$ with $x_i \in V_T$ is in L(G) iff there exists at least one path from S to T whose edge labels, read in order, are $x_1, x_2, \ldots, x_n$
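The path criterion amounts to simulating the graph as a nondeterministic automaton: track every node reachable after each symbol and accept if T is reachable at the end. This Python sketch uses a small FS grammar (S → aA | bB, A → cB | c, B → a), which is our reading of the slide's example and should be treated as illustrative; `build_graph` and `accepts` are hypothetical names.

```python
from collections import defaultdict

FINAL = "T"  # the extra terminal node T, not in V


def build_graph(productions):
    """Edges of the FS graph: A -> aB gives A --a--> B, A -> a gives A --a--> T."""
    edges = defaultdict(set)
    for lhs, rhs in productions:
        target = rhs[1] if len(rhs) == 2 else FINAL
        edges[(lhs, rhs[0])].add(target)
    return edges


def accepts(productions, x, start="S"):
    """True iff some path from start to T is labeled x1, x2, ..., xn."""
    edges = build_graph(productions)
    current = {start}
    for symbol in x:
        # Follow every edge labeled `symbol` out of every node reached so far.
        current = {t for node in current for t in edges.get((node, symbol), ())}
        if not current:
            return False
    return FINAL in current


P = [("S", "aA"), ("S", "bB"), ("A", "cB"), ("A", "c"), ("B", "a")]
for word in ["ac", "aca", "ba", "ab"]:
    print(word, accepts(P, word))  # ac True, aca True, ba True, ab False
```

Tracking a set of nodes handles the case where one node has two edges with the same label (here A → cB and A → c).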


Syntactic Pattern Recognition

2-class problem: patterns in classes C1 and C2 are composed of primitives from a set $V_T$

Let G be a grammar such that L(G) consists only of sentences (patterns) from C1

Example: $V_T = \{a, b\}$, $V_N = \{S, A\}$, $P = \{S \rightarrow aSb,\ S \rightarrow b\}$

$L(G) = \{b\} \cup \{a^n b^{n+1} \mid n \ge 1\}$

Classification rule

x ∈ C1 iff x ∈ L(G)

x ∈ C2 otherwise

The classification algorithm must therefore decide correctly whether a given string is grammatically correct, i.e. whether it belongs to L(G).
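For this particular grammar the membership test can follow the productions directly: S → b ends the derivation, and S → aSb wraps a smaller sentence in one a and one b. This is a minimal top-down recognizer in Python, specific to S → aSb | b; the names `in_language` and `classify` are hypothetical.

```python
def in_language(x):
    """Top-down recognizer for S -> aSb | b, i.e. L(G) = {a^n b^(n+1) : n >= 0}."""
    if x == "b":                                   # S -> b
        return True
    if len(x) >= 3 and x[0] == "a" and x[-1] == "b":
        return in_language(x[1:-1])                # S -> aSb: peel one a and one b
    return False


def classify(x):
    """Classification rule: C1 iff x is in L(G), C2 otherwise."""
    return "C1" if in_language(x) else "C2"


for pattern in ["b", "abb", "aabbb", "ab", "aabb"]:
    print(pattern, classify(pattern))  # C1, C1, C1, C2, C2
```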


Pattern Grammars

2-class problem: rectangles and other quadrilaterals

Select primitives: a: 0° edge

 b: 90° edge

 c: 180° edge

 d: 270° edge

Set of rectangles:

If $a_0, b_0, c_0, d_0$ represent unit-length lines in these four directions, then

$L = \{a_0^n\, b_0^m\, c_0^n\, d_0^m \mid n, m = 1, 2, 3, \ldots\}$
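Membership in the rectangle language only requires checking that the string is four runs, in the order a, b, c, d, with opposite sides of equal length. This Python sketch writes the unit-length lines $a_0, b_0, c_0, d_0$ as single letters a, b, c, d (an encoding assumption) and uses `itertools.groupby` to find the runs; `is_rectangle` is a hypothetical name.

```python
from itertools import groupby


def is_rectangle(x):
    """Membership in {a^n b^m c^n d^m : n, m >= 1}."""
    runs = [(symbol, len(list(group))) for symbol, group in groupby(x)]
    if [symbol for symbol, _ in runs] != ["a", "b", "c", "d"]:
        return False
    (_, n1), (_, m1), (_, n2), (_, m2) = runs
    return n1 == n2 and m1 == m2  # opposite sides have equal length


print(is_rectangle("aabccd"))  # True: a 2 x 1 rectangle
print(is_rectangle("aabcd"))   # False: opposite sides differ
```

The equal-count constraint is what makes this language non-regular: no FS grammar can match n against n for unbounded n.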


Consider the primitives a: 0° horizontal unit-length line

b: 120° unit-length line

c: 240° unit-length line

L(G) represents the class of equilateral triangles with sides of 1 to 3 units:

$L = \{a^n b^n c^n \mid 1 \le n \le 3\}$

What is the grammar?

Make it up from domain knowledge

There is no unique solution
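One way to see why $a^n b^n c^n$ describes an equilateral triangle is to trace the primitives as unit vectors and check that the outline returns to its starting point. This Python sketch assumes the directions listed above (0°, 120°, 240°); `end_point` and `is_closed` are hypothetical names, and closure is only a necessary condition, since a string such as "abcabc" also closes.

```python
import math

# Assumed direction of each unit-length primitive, in degrees.
ANGLE = {"a": 0, "b": 120, "c": 240}


def end_point(x):
    """Trace unit-length primitives from the origin and return the final point."""
    px = sum(math.cos(math.radians(ANGLE[s])) for s in x)
    py = sum(math.sin(math.radians(ANGLE[s])) for s in x)
    return px, py


def is_closed(x, tol=1e-9):
    """True if the traced outline ends where it started (within a float tolerance)."""
    px, py = end_point(x)
    return math.hypot(px, py) < tol


for n in range(1, 4):
    word = "a" * n + "b" * n + "c" * n
    print(word, is_closed(word))  # True for n = 1, 2, 3
print(is_closed("aabbc"))         # False
```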


FS Grammar solution

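$V_T = \{a, b, c\}$, $V_N = \{S, A, B, C, D, E, F, G, H, I, J, K\}$

P: $S \rightarrow aA \mid aC$, $A \rightarrow aB \mid aD$, $B \rightarrow aE$, $C \rightarrow bI$, $D \rightarrow bF$, $E \rightarrow bG$, $F \rightarrow bJ$, $G \rightarrow bH$, $H \rightarrow bK$, $I \rightarrow c$, $J \rightarrow cI$, $K \rightarrow cJ$

Derivation of $a^2 b^2 c^2$: $S \Rightarrow aA \Rightarrow aaD \Rightarrow aabF \Rightarrow aabbJ \Rightarrow aabbcI \Rightarrow aabbcc$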
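Because this FS grammar has no cycles, its whole language can be listed by expanding each variable recursively. This Python sketch uses the FS production list as read from the slide; `expand` is a hypothetical helper, and it would not terminate on a grammar with cycles.

```python
def expand(productions, symbol):
    """All terminal strings derivable from `symbol` in an acyclic FS grammar.

    Assumes every rule has the form A -> a or A -> aB.
    """
    strings = set()
    for lhs, rhs in productions:
        if lhs != symbol:
            continue
        if len(rhs) == 1:                      # A -> a
            strings.add(rhs)
        else:                                  # A -> aB
            strings |= {rhs[0] + tail for tail in expand(productions, rhs[1])}
    return strings


P = [("S", "aA"), ("S", "aC"), ("A", "aB"), ("A", "aD"), ("B", "aE"),
     ("C", "bI"), ("D", "bF"), ("E", "bG"), ("F", "bJ"), ("G", "bH"),
     ("H", "bK"), ("I", "c"), ("J", "cI"), ("K", "cJ")]
print(sorted(expand(P, "S"), key=len))  # ['abc', 'aabbcc', 'aaabbbccc']
```

The output confirms that the grammar generates exactly the three triangles with sides of 1, 2 and 3 units; the cost of the FS solution is one chain of variables per side length.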

CS Grammar solution

$V_T = \{a, b, c\}$, $V_N = \{S, A, B, C, D, E, F\}$

[Figure: production set P of the CS grammar]


Syntax Analysis

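Consider m classes $C_1, \ldots, C_m$, where class $C_i$ is described by a grammar $G_i$ with language $L(G_i)$, $i = 1, \ldots, m$

• Let x be the unknown pattern. The recognition task is to find the $L(G_i)$ such that $x \in L(G_i)$; x is then assigned to class $C_i$

• That is, given a string x and a grammar G, construct a triangle with S at the top vertex and x along the bottom side; the derivation (parse) tree of x lies inside it

• Top-down and bottom-up parsing methods can be used

[Figure: triangle with S at the top vertex and x along the bottom side]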
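As one concrete bottom-up method, the CYK algorithm fills the parse triangle from the bottom side (single symbols of x) up to the top vertex, recording which variables derive each substring. This Python sketch requires grammars in Chomsky normal form (rules A → BC or A → a); the two class grammars G1 (for $a^n b^n$) and G2 (for $(ab)^n$) are hypothetical examples, not grammars from the slides.

```python
def cyk_accepts(rules, x, start="S"):
    """CYK recognizer for a grammar in Chomsky normal form.

    rules: (lhs, rhs) pairs where rhs is one terminal ('a') or two variables ('AB').
    """
    n = len(x)
    if n == 0:
        return False
    # table[i][k] holds the variables that derive the substring x[i : i + k + 1].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(x):
        table[i][0] = {lhs for lhs, rhs in rules if rhs == symbol}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for lhs, rhs in rules:
                    if len(rhs) == 2 and rhs[0] in left and rhs[1] in right:
                        table[i][length - 1].add(lhs)
    return start in table[0][n - 1]  # top vertex: does S derive all of x?


# Hypothetical class grammars in CNF: G1 generates a^n b^n, G2 generates (ab)^n, n >= 1.
G1 = [("S", "AB"), ("S", "AC"), ("C", "SB"), ("A", "a"), ("B", "b")]
G2 = [("S", "AB"), ("S", "AR"), ("R", "BS"), ("A", "a"), ("B", "b")]


def recognize(x, grammars=(G1, G2)):
    """Indices i (1-based) of the classes C_i with x in L(G_i)."""
    return [i for i, g in enumerate(grammars, start=1) if cyk_accepts(g, x)]


for pattern in ["aabb", "abab", "ab", "ba"]:
    print(pattern, recognize(pattern))  # [1], [2], [1, 2], []
```

Note that "ab" belongs to both languages, so in practice the class grammars need disjoint languages or a tie-breaking rule.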


Stochastic Languages

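A stochastic grammar associates a probability with each production rule

A stochastic language is one generated by such a grammar

If x is derived by applying the productions $P_1, P_2, \ldots, P_n$ in that order, the probability of obtaining x is

$p(x) = p(P_1)\, p(P_2 \mid P_1) \cdots p(P_n \mid P_1, P_2, \ldots, P_{n-1})$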
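Under the common simplifying assumption that each rule is chosen independently of the earlier ones, every conditional factor reduces to $p(P_k)$ and p(x) is a product of rule probabilities. This Python sketch applies that to the grammar S → aSb | b with hypothetical probabilities 0.3 and 0.7; the grammar is unambiguous, so each x has at most one derivation.

```python
import math

# Hypothetical rule probabilities; the rules for S must sum to 1.
P_RECURSE = 0.3   # S -> aSb
P_STOP = 0.7      # S -> b


def derivation(x):
    """Sequence of rules deriving x from S, or None if x is not in L(G)."""
    rules = []
    while x != "b":
        if len(x) < 3 or x[0] != "a" or x[-1] != "b":
            return None
        rules.append("S->aSb")
        x = x[1:-1]
    rules.append("S->b")
    return rules


def probability(x):
    """p(x) as a product of rule probabilities, assuming
    p(P_k | P_1, ..., P_(k-1)) = p(P_k)."""
    rules = derivation(x)
    if rules is None:
        return 0.0
    return math.prod(P_RECURSE if r == "S->aSb" else P_STOP for r in rules)


print(derivation("aabbb"))  # ['S->aSb', 'S->aSb', 'S->b']
print(probability("ab"))    # 0.0
```

Here $p(a^n b^{n+1}) = 0.3^n \cdot 0.7$, and these probabilities sum to 1 over the whole language.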


Tree representations

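A string $s_1$ is directly derived from a string $s_2$ in G, written $s_2 \Rightarrow_G s_1$, if there exists a rule $\alpha \rightarrow \beta$ in G such that $s_1$ is the result of replacing an occurrence of α in $s_2$ by β. In general, s is derived from the initial symbol S of G, written $S \overset{*}{\Rightarrow}_G s$, if there exists a sequence of strings through which s can be derived from S, i.e., $S = s_0 \Rightarrow_G s_1 \Rightarrow_G \cdots \Rightarrow_G s_k = s$

Parsing is the reverse of generation: starting from s, it recovers a derivation from S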
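Parsing as reversed generation can be illustrated by reduction: repeatedly replace an occurrence of some right-hand side β by its left-hand side α until only S remains; read backwards, the steps form a derivation. This Python sketch is greedy (first matching rule wins), which happens to work for S → aSb | ab but is not a general parser; `reduce_to_start` is a hypothetical name.

```python
def reduce_to_start(x, productions, start="S"):
    """Sequence of sentential forms from x back to the start symbol, or None."""
    forms = [x]
    while forms[-1] != start:
        current = forms[-1]
        for lhs, rhs in productions:
            pos = current.find(rhs)
            if pos != -1:
                # Undo one derivation step: replace beta by alpha.
                forms.append(current[:pos] + lhs + current[pos + len(rhs):])
                break
        else:
            return None  # no rule applies: x is rejected
    return forms


P = [("S", "aSb"), ("S", "ab")]
steps = reduce_to_start("aabb", P)
print(steps)                       # ['aabb', 'aSb', 'S']
print(list(reversed(steps)))       # ['S', 'aSb', 'aabb'], i.e. S => aSb => aabb
print(reduce_to_start("abab", P))  # None
```

Every reduction here shortens the string, so the loop always terminates.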
