Syntactic Pattern Recognition Statistical PR:Find a feature vector x Train a system using a set of...
Date posted: 22-Dec-2015
Syntactic Pattern Recognition
•Statistical PR: Find a feature vector x
• Train a system using a set of labeled patterns
• Classify unknown patterns
•Statistical PR ignores the relational information contained in the structure of a pattern
•Most structural methods use hierarchical decomposition
•Note similarity between a sentence structure and pattern description
[Figure: hierarchical decomposition of a picture. Picture A consists of Triangle B (edges a, b, c) and Rectangle C (edges d, e, f, g).]
Language
•Alphabet is a finite set of symbols, $V = \{x_1, x_2, \ldots, x_n\}$
•Sentence over V is a finite string of ordered symbols (left to right) from V
•Example: V = {a,b,c}, valid sentences are “abb”, “abba”, “aaa”, null
•Length of a sentence s, |s|, is the number of symbols in s
•$s_1 \circ s_2$ is the concatenation of the two sentences $s_1$ and $s_2$
•$V \circ V \circ \cdots \circ V = V^n$ is the set of all sentences with n symbols over V
•$V^+ = V \cup V^2 \cup V^3 \cup \cdots$ is the set of all non-empty sentences over V
•$V^*$ is the closure of V: $V^+$ together with the null sentence
•Language is an arbitrary subset L of $V^*$
•Example: V={0,1}, then L1 = {001, 110, 111, 0, null} is a finite language
• $L_2 = \{s \mid s = 1^n 0^2 1^m,\ n \ge 1,\ 1 \le m \le 10\}$ is an infinite language
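The definitions above can be tried out on small alphabets. The following is a minimal Python sketch, not part of the original slides: the function names are made up for illustration, and the pattern for L2 assumes the reading "one or more 1s, exactly two 0s, then one to ten 1s".

```python
import re
from itertools import product

def sentences_of_length(alphabet, n):
    """Return V^n: every sentence with exactly n symbols over the alphabet."""
    return {"".join(symbols) for symbols in product(sorted(alphabet), repeat=n)}

def positive_closure_up_to(alphabet, max_len):
    """Finite slice of V+: all non-empty sentences with at most max_len symbols."""
    result = set()
    for n in range(1, max_len + 1):
        result |= sentences_of_length(alphabet, n)
    return result

# Assumed reading of L2: 1^n 0^2 1^m with n >= 1 and 1 <= m <= 10.
L2_PATTERN = re.compile(r"1+0{2}1{1,10}")

def in_L2(s):
    """Membership test for the (assumed) language L2."""
    return L2_PATTERN.fullmatch(s) is not None

V = {"a", "b", "c"}
print(len(sentences_of_length(V, 2)))          # size of V^2
print("abb" in positive_closure_up_to(V, 3))   # "abb" is a sentence over V
print(in_L2("110011"), in_L2("100" + "1" * 11))
```

V* itself is infinite, so the sketch only ever materialises a finite slice of it.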
Languages
•$L_1 \circ L_2 = \{s \mid s = s_1 s_2,\ s_1 \in L_1,\ s_2 \in L_2\}$ is the concatenation of L1 and L2
•$L_1^{it} = \{s \mid s = s_1 s_2 \cdots s_n,\ n \ge 0,\ s_i \in L_1\}$ is the iterate of L1
•$L_1 \circ L_2$ and $L_1^{it}$ are both languages
•Example: V = {a,b}, L1 = {aa,ab,bb}, L2 = {a,b}
•$L_1 \circ L_2 = \{a^3, aba, b^2a, a^2b, ab^2, b^3\}$
•$L_1^{it}$ is infinite; for n = 0, 1, 2 it contains $\{\text{null}, a^2, ab, b^2, a^4, a^3b, a^2b^2, aba^2, abab, ab^3, b^2a^2, b^2ab, b^4\}$
•s is called a substring of t if t = usv for some strings u, v belonging to $V^*$
•Every string is a substring of itself, since u and v can both be null
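The example with L1 = {aa, ab, bb} and L2 = {a, b} can be checked mechanically. This is an illustrative Python sketch with made-up helper names; the iterate is truncated at a chosen n because the full iterate is infinite.

```python
def concatenate(l1, l2):
    """L1 o L2: every s1 followed by every s2."""
    return {s1 + s2 for s1 in l1 for s2 in l2}

def iterate_up_to(lang, max_n):
    """Union of lang^0, lang^1, ..., lang^max_n (lang^0 holds only the null sentence)."""
    power = {""}
    result = {""}
    for _ in range(max_n):
        power = concatenate(power, lang)
        result |= power
    return result

def is_substring(s, t):
    """True if t = u + s + v for some (possibly null) strings u and v."""
    return s in t

L1 = {"aa", "ab", "bb"}
L2 = {"a", "b"}
print(sorted(concatenate(L1, L2)))
print(len(iterate_up_to(L1, 2)))
```

With max_n = 2 the truncated iterate holds the null sentence, the three sentences of L1, and the nine sentences of L1 o L1.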
Grammars
•Grammar G = {VT, VN, P, S} has 4 entities
• VT is a set of terminal symbols, called primitives or constants
• VN is a set of non-terminal symbols, called variables
• VT and VN partition V: $V_T \cup V_N = V$, $V_T \cap V_N = \emptyset$
• P is the set of production rules A->B, where A contains at least one variable and B is a string of variables and constants
• S is the starting symbol or the root; S belongs to VN
•L(G) is the formal language (a set of strings) generated by the grammar G
• Each string is composed of only primitives
• Each string can be derived from S using the production rules P
• Example: VT = {a,b}, VN = {S}; P = {S->aSb, S->ab} => L(G) = {$a^n b^n$, n>=1}
•Grammar is used to:
• (i) generate the sentences of L(G), (ii) check whether a sentence belongs to L(G), (iii) analyze the structure of a sentence
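For the example grammar P = {S->aSb, S->ab}, the first two uses of a grammar (generation and membership checking) can be sketched as follows. This is an illustrative Python sketch specific to this one grammar; the function names are hypothetical.

```python
def derive_anbn(n):
    """Derive a^n b^n from S: apply S->aSb (n - 1 times), then S->ab."""
    if n < 1:
        raise ValueError("L(G) only contains a^n b^n for n >= 1")
    sentential_form = "S"
    steps = [sentential_form]
    for _ in range(n - 1):
        sentential_form = sentential_form.replace("S", "aSb", 1)
        steps.append(sentential_form)
    sentential_form = sentential_form.replace("S", "ab", 1)
    steps.append(sentential_form)
    return steps

def belongs_to_language(s):
    """Check membership in L(G) = {a^n b^n, n >= 1} directly from its closed form."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n

print(" => ".join(derive_anbn(3)))
print(belongs_to_language("aabb"), belongs_to_language("abab"))
```

The list returned by `derive_anbn` is the sequence of sentential forms, which is also the information needed for the third use, analyzing the structure of a sentence.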
Grammar Types
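Unrestricted Grammar (UR): no restrictions on the form of the productions
Context Sensitive Grammar (CS): productions $\alpha_1 A \alpha_2 \to \alpha_1 \beta \alpha_2$
Context Free Grammar (CF): productions $A \to \beta$
Finite State Grammar (FS): productions $A \to aB$ or $A \to a$
Here $\alpha_1, \alpha_2 \in V^*$; $\beta \in V^+$; $A, B \in V_N$; $a, b \in V_T$
Example: VT = {a,b,c}; VN = {S, A, B}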
UR, CS, CF, FS example productions (arrows and column alignment lost in extraction): B, aAB, cBS, cA, abAS, AcbBb, abcAc, aAcaB, abS, aBbcS, bB, cA, aBA, ABS, bBS, aAcS, cB, bB, aA, A, bBS, aAS
Finite State Grammars, and Graphical Representations
•Nodes are nonterminals in VN and an additional terminal node T not in V
•Productions of type Ai->aAj represented by edge a directed from Ai to Aj
•Productions of type Ai->a represented by edge a directed from Ai to T
Example: P = {S->aA, S->bB, A->cB, A->c, B->a}
[Figure: graph of this grammar with nodes S, A, B and terminal node T]
For an FS grammar G, an arbitrary string x = x1x2..xn, xi in VT, is in L(G) iff there exists at least one path from S to T whose edge labels, read in order, are (x1,x2,..,xn)
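The path criterion translates directly into a graph search. Below is a minimal Python sketch, assuming productions are given as (left-hand side, terminal, next nonterminal or None) triples; that encoding, the function names, and the sample production list are illustrative choices rather than anything fixed by the slides.

```python
from collections import defaultdict

TERMINAL_NODE = "T"

def build_graph(productions):
    """Map (node, symbol) to the set of successor nodes.

    Each production is (lhs, symbol, next_nonterminal_or_None);
    None encodes a rule Ai->a, drawn as an edge into T.
    """
    edges = defaultdict(set)
    for lhs, symbol, nxt in productions:
        edges[(lhs, symbol)].add(nxt if nxt is not None else TERMINAL_NODE)
    return edges

def accepts(edges, string, start="S"):
    """True if some path from start to T spells out the string."""
    current = {start}
    for symbol in string:
        current = {nxt for node in current for nxt in edges.get((node, symbol), ())}
        if not current:
            return False
    return TERMINAL_NODE in current

# Illustrative FS grammar: S->aA, S->bB, A->cB, A->c, B->a
GRAPH = build_graph([
    ("S", "a", "A"), ("S", "b", "B"),
    ("A", "c", "B"), ("A", "c", None),
    ("B", "a", None),
])
print(accepts(GRAPH, "aca"), accepts(GRAPH, "ab"))
```

The search tracks a set of nodes because two productions may share a left-hand side and a terminal, so more than one path can match the same prefix.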
Syntactic Pattern Recognition
2-class problem: C1 and C2 are composed of features from a set VT
Let G be a grammar such that L(G) consists only of sentences (patterns) from C1
Example: VT = {a,b} VN = {S,A} P:{S->aSb, S->b}
L(G): {b; $a^n b^{n+1}$, n>=1}
Classification Rule
x belongs to C1 if x belongs to L(G)
x belongs to C2 otherwise
Classification algorithm has to correctly answer whether or not a given string is grammatically correct.
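For the example grammar P = {S->aSb, S->b}, the classification rule can be sketched with a small recursive-descent recognizer. This is an illustrative Python sketch; `parse_S` and `classify` are hypothetical names, and the approach relies on the first symbol deciding which production applies.

```python
def parse_S(s, pos=0):
    """Try to match S starting at s[pos]; return the end position or None."""
    if pos < len(s) and s[pos] == "a":          # S -> aSb
        end = parse_S(s, pos + 1)
        if end is not None and end < len(s) and s[end] == "b":
            return end + 1
        return None
    if pos < len(s) and s[pos] == "b":          # S -> b
        return pos + 1
    return None

def classify(x):
    """x is in C1 if the whole string is derived from S, otherwise in C2."""
    return "C1" if parse_S(x) == len(x) else "C2"

for pattern in ["b", "abb", "aabbb", "ab", "bb"]:
    print(pattern, classify(pattern))
```

A string is assigned to C1 only when the parse consumes every symbol, which is what "grammatically correct" means here.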
Pattern Grammars
2-class problem: rectangles and other quadrilaterals
Select primitives: a: 0° edge
b: 90° edge
c: 180° edge
d: 270° edge
Set of rectangles:
If a0, b0, c0, d0 represent unit length lines, then
$L = \{a_0^n b_0^m c_0^n d_0^m ;\ n, m = 1, 2, 3, \ldots\}$
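Membership in the rectangle language can be checked by comparing run lengths. The sketch below is illustrative Python, assuming a boundary is encoded as a string in which the characters a, b, c, d stand for the unit primitives a0, b0, c0, d0; that encoding and the function name are assumptions.

```python
import re

RECTANGLE = re.compile(r"(a+)(b+)(c+)(d+)")

def is_rectangle(chain):
    """True if chain has the form a^n b^m c^n d^m with n, m >= 1."""
    match = RECTANGLE.fullmatch(chain)
    if match is None:
        return False
    a_run, b_run, c_run, d_run = (len(group) for group in match.groups())
    # Opposite sides of a rectangle must have equal length.
    return a_run == c_run and b_run == d_run

print(is_rectangle("aabccd"), is_rectangle("aabcd"))
```

The regular expression only enforces the order of the four sides; the equal-length test on opposite sides is the part a finite state pattern alone cannot express for unbounded n and m.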
Consider, a: 0° horizontal unit length
b: 120° unit length
c: 240° unit length
$L = \{a^n b^n c^n ;\ 1 \le n \le 3\}$
L(G) represents the class of equilateral triangles
What is the grammar?
Make it up from domain knowledge
There is no unique solution
FS Grammar solution
VT = {a,b,c} VN = {S, A, B, C, D, E, F, G, H, I, J, K}
P:
S->aA, S->aC
A->aB, A->aD
B->aE
C->bI, D->bF, F->bJ
E->bG, G->bH, H->bK
I->c, J->cI, K->cJ
Example derivation of $a^2 b^2 c^2$: S->aA, A->aD, D->bF, F->bJ, J->cI, I->c
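One way to check an FS grammar like this is to enumerate everything it generates. The Python sketch below assumes the productions read S->aA, S->aC, A->aB, A->aD, B->aE, C->bI, D->bF, F->bJ, E->bG, G->bH, H->bK, I->c, J->cI, K->cJ (each extracted fragment read as right-hand side followed by left-hand side); the dictionary layout and function name are illustrative.

```python
# Assumed reading of the FS production list for the triangle language.
PRODUCTIONS = {
    "S": ["aA", "aC"], "A": ["aB", "aD"], "B": ["aE"],
    "C": ["bI"], "D": ["bF"], "F": ["bJ"],
    "E": ["bG"], "G": ["bH"], "H": ["bK"],
    "I": ["c"], "J": ["cI"], "K": ["cJ"],
}

def generate(form="S"):
    """Yield every terminal string derivable from the sentential form.

    In an FS grammar the only nonterminal, if any, is the last symbol.
    """
    last = form[-1]
    if last not in PRODUCTIONS:          # only terminals are left
        yield form
        return
    for rhs in PRODUCTIONS[last]:
        yield from generate(form[:-1] + rhs)

print(sorted(generate()))
```

Under this reading the grammar has no cycles, so the enumeration terminates and the generated set is finite, matching the bound 1 <= n <= 3.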
CS Grammar solution
VT = {a,b,c} VN = {S, A, B, C, D, E, F}
P (arrows and layout lost in extraction): cFbDEaDFA, bCaEFBbA, bcDaBFAaAFS
Syntax Analysis
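Classes $C_i$, $1 \le i \le m$; languages and grammars $L_i$, $G_i$, $1 \le i \le m$
•Let x be the unknown pattern. The recognition task is to find the L(Gi) such that x belongs to L(Gi)
•i.e. given a string x and a grammar G, construct a triangle with top vertex S and bottom side x; the derivation (parse) tree fills the inside of the triangle
•Top-down and bottom-up parsing methods can be used
[Figure: triangle with the start symbol S at the top vertex and the string x along the bottom side]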
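A top-down parse can be sketched as a search that starts from S and expands the leftmost nonterminal until the string x is reached. The Python sketch below is one simple way to do this, not the method of the slides: it assumes upper-case letters are nonterminals, lower-case letters are terminals, and no production has a null right-hand side. The two sample grammars and all function names are illustrative.

```python
from collections import deque

def top_down_parse(productions, x, start="S"):
    """Breadth-first leftmost derivation search.

    Returns the list of sentential forms leading from start to x, or None.
    Without null productions a form longer than x can be discarded.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        form = path[-1]
        if form == x:
            return path
        idx = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if idx is None or form[:idx] != x[:idx]:
            continue                      # all terminals, or prefix mismatch
        for rhs in productions[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            if len(new_form) <= len(x) and new_form not in seen:
                seen.add(new_form)
                queue.append(path + [new_form])
    return None

def recognize(grammars, x):
    """Return every class index i (1-based) for which x is in L(G_i)."""
    return [i for i, g in enumerate(grammars, start=1)
            if top_down_parse(g, x) is not None]

G1 = {"S": ["aSb", "ab"]}   # a^n b^n
G2 = {"S": ["aSb", "b"]}    # a^n b^(n+1)
print(top_down_parse(G1, "aabb"))
print(recognize([G1, G2], "abb"))
```

The returned list of sentential forms, read from top to bottom, traces the triangle from S down to x.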
Stochastic Languages
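Probabilities are associated with production rules: a stochastic grammar
A stochastic language is one generated by such a grammar
If x is derived by applying the productions P1, P2, ..., Pn in sequence, the probability of obtaining x is
$p(x) = p(P_1)\, p(P_2 \mid P_1) \cdots p(P_n \mid P_1, P_2, \ldots, P_{n-1})$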
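As a worked illustration of the probability of a derivation, the Python sketch below makes the simplifying assumption that each production is chosen independently of the earlier ones, so every conditional factor reduces to a fixed rule probability. The grammar S->aSb, S->ab and the values 0.3 and 0.7 are hypothetical.

```python
from math import prod

# Hypothetical rule probabilities; the alternatives for S sum to 1.
RULE_PROB = {("S", "aSb"): 0.3, ("S", "ab"): 0.7}

def derivation_probability(derivation, rule_prob=RULE_PROB):
    """Product of rule probabilities along a derivation.

    Assumes p(P_k | P_1, ..., P_{k-1}) = p(P_k), i.e. the choice of a
    production does not depend on the productions applied before it.
    """
    return prod(rule_prob[rule] for rule in derivation)

def probability_of_anbn(n, rule_prob=RULE_PROB):
    """p(a^n b^n): S->aSb applied n - 1 times, then S->ab once."""
    derivation = [("S", "aSb")] * (n - 1) + [("S", "ab")]
    return derivation_probability(derivation, rule_prob)

print(probability_of_anbn(1), probability_of_anbn(3))
```

Because each string a^n b^n has exactly one derivation in this grammar, its derivation probability is also the probability of the string, and the probabilities over all n sum to 1.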
Tree representations
A string s1 is directly derived from string s2 in G ($s_2 \Rightarrow_G s_1$) if there exists a rule $\alpha \to \beta$ in G such that s1 is the result of replacing $\alpha$ by $\beta$ in s2. In general, s is derived from the initial symbol of G, S, if there exists a sequence of strings, each directly derived from the one before, that leads from S to s, i.e., $S \Rightarrow_G^{*} s$
Parsing is the reverse of generation
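Direct derivation is a single rewriting step, which can be sketched as follows. In this illustrative Python sketch a rule is a (left side, right side) pair of strings; the rule set S->aSb, S->ab and the function names are assumptions made for the example.

```python
def directly_derived(s2, rules):
    """All strings s1 obtained from s2 by one rule application (s2 => s1)."""
    results = set()
    for lhs, rhs in rules:
        start = s2.find(lhs)
        while start != -1:
            # Replace this one occurrence of the rule's left side.
            results.add(s2[:start] + rhs + s2[start + len(lhs):])
            start = s2.find(lhs, start + 1)
    return results

def is_derivation(steps, rules):
    """True if each string in steps is directly derived from the previous one."""
    return all(nxt in directly_derived(cur, rules)
               for cur, nxt in zip(steps, steps[1:]))

RULES = [("S", "aSb"), ("S", "ab")]
print(sorted(directly_derived("aSb", RULES)))
print(is_derivation(["S", "aSb", "aabb"], RULES))
```

Generation runs such steps forward from S; parsing has to recover a valid sequence of steps given only the final string.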