1

Computational Learning: Statistical & Soft-Computing Approaches: with focus on Language Structure using Fuzzy Similarity

Narendra S Chaudhari

School of Computer Engineering, Nanyang Technological University, Singapore

Emails: asnarendra@ntu.edu.sg; nsc183@gmail.com; nscmp@lycos.com

[Figure: parse tree of "the cat runs": sentence → noun phrase + predicate phrase; "the" = article, "cat" = noun, "runs" = verb (predicate)]

Alan Turing: son of (Ethel) Sara Turing; conceived 1911 in Chatrapur, India; born 23 June 1912, London.

Prof. Nikhil Pal, Editor-in-Chief, IEEE Trans. Fuzzy Systems & Professor, Indian Statistical Institute (ISI), Calcutta, India

2

Language Structure using Fuzzy Similarity: Language Structure – Some Approaches

• Model Construction
  – Grammar Learning
  – Grammar Models of Practical Interest
  – Regular Language Learning

• Context Free Grammar Learning
  – Alignment Based Learning (ABL) [Zaanen, 2000]
  – Alignment Profile
  – Profile Similarity
  – Indistinguishable Grammar Symbols
  – Profile Based Alignment Learning (PBAL)

• Concluding Remarks

• Research Interests

3

Human beings appear to be able to learn new concepts without needing to be programmed explicitly in any conventional sense.

[Leslie G. Valiant, 1984] Harvard University, Aiken Computation Laboratory, Cambridge, MA, USA

Grammar Learning: BIG PICTURE

Formal Language Learning (Grammar Learning): automating the learning of formal languages in the Chomsky hierarchy.

4

Grammar Learning: Model Construction

• Grammar Learning (also called Grammar Inference, GI) is the task of identifying a grammatical model from generated samples.

[Figure: sample strings 11, 111, 1111, … ? are used to infer a Grammar Model (S → …)]

5

Grammar Learning: Approach – Induction and Inductive Inference

Induction: reasoning from a part to the whole, from particulars to the general, or from the individual to the universal.

Inductive inference: process of hypothesizing a general rule from examples.

– Example:

100, 111100, 11000, 1110, 1100,…

– Guess:

“any number of 1’s followed by any number of 0’s”.
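
As a quick illustration of this inductive step, the guessed rule can be written as the regular expression 1*0* and checked against the given examples. A minimal Python sketch (the example strings are from the slide; everything else is illustrative):

    import re

    # Guessed rule: any number of 1's followed by any number of 0's.
    hypothesis = r"1*0*"

    examples = ["100", "111100", "11000", "1110", "1100"]
    for s in examples:
        consistent = re.fullmatch(hypothesis, s) is not None
        print(f"{s}: {'consistent with guess' if consistent else 'counterexample'}")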

6

E.M. Gold [1967] formulated the concept of “Identification in the Limit”. It views inductive inference as an infinite process.

PROVES THE “Negative” result:

Any “super-finite” language class (a class containing all finite languages and at least one infinite language) cannot be learnt from only “positive” examples.

E. M. Gold [1967] "Language identification in the limit", Information and Control, Vol. 10, pp. 447-474, 1967.

Grammar Learning: E.M. Gold’s “negative” result

7

E.M. Gold [1978] formulated the concept of “Language Learning in the Limit” (again viewing inductive inference as an infinite process).

PROVES THE “Positive” result:

“Regular” languages are “learnable in the limit”

“practicality problems” (good algorithms) still remain

E. M. Gold [1978] "Complexity of automaton identification from given data", Information and Control, Vol. 37, pp. 302-320, 1978.

Grammar Learning: E.M. Gold’s “positive” result

8

Grammar Models of practical interest

• Regular Languages
• Context-free Languages
• Stochastic Extensions, e.g. S → AB (80%), S → B (20%)
• Subclasses
• Extended Models
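
The stochastic extension can be made concrete with a tiny sampler. In the sketch below the two S-rules and their probabilities are from the slide, while the A and B rules are made-up placeholders added only so the grammar is complete and the code runs:

    import random

    # Stochastic CFG sketch: S -> A B (80%), S -> B (20%) as on the slide;
    # the A and B rules are hypothetical, added to make the grammar complete.
    rules = {
        "S": [(0.8, ["A", "B"]), (0.2, ["B"])],
        "A": [(1.0, ["a"])],
        "B": [(1.0, ["b"])],
    }

    def generate(symbol):
        if symbol not in rules:                    # terminal symbol
            return [symbol]
        probs, bodies = zip(*rules[symbol])
        body = random.choices(bodies, weights=probs)[0]
        return [tok for sym in body for tok in generate(sym)]

    print(" ".join(generate("S")))                 # "a b" about 80% of the time, "b" otherwise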

9

Regular Language (Finite Automaton) Learning

• Learning from Representative Samples and Membership Queries [Angluin 1981]

• Learning by Membership and Equivalence Queries [Angluin 1987b]
• Learning Reversible Languages [Angluin 1982]

[Angluin 1981] D. Angluin, "A Note on the Number of Queries Needed to Identify Regular Languages", Information and Control, 51, pp. 76-87, 1981.
[Angluin 1987b] D. Angluin, "Learning regular sets from queries and counterexamples", Information and Computation, 75, pp. 87-106, 1987.
[Angluin 1982] D. Angluin, "Inference of Reversible Languages", Journal of the ACM, 29, pp. 741-765, 1982.

10

Regular Language Learning: Some of Our Contributions

• We designed efficient algorithms for incremental learning of DFA.
  – A version-space-based framework (using the lattice of DFAs) is reported in:
  – Suresh Jain, Narendra S. Chaudhari, "An Incremental Algorithm for Learning DFA from Characteristic Sample," International Journal of Computational Intelligence Research (IJCIR), Vol. 3, No. 4, pp. 297-312, Dec 2007. (Online: http://www.ripublication.com/ijcirv3/ijcirv3n4_3.pdf)
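
For reference, the object being learned here is small. The sketch below is a hand-written DFA for the earlier 1*0* language (illustrative only; it is not the learned automaton or the version-space construction of the paper above), showing the kind of model such learners output:

    # Hand-written DFA for 1*0*: transitions are a partial function; a missing
    # edge (a "1" seen after a "0") means the word is rejected.
    transitions = {
        ("q0", "1"): "q0", ("q0", "0"): "q1",
        ("q1", "0"): "q1",
    }
    accepting = {"q0", "q1"}

    def accepts(word):
        state = "q0"
        for ch in word:
            state = transitions.get((state, ch))
            if state is None:
                return False
        return state in accepting

    print(accepts("1100"), accepts("1010"))     # True False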

11

Language Structure using Fuzzy Similarity

• Model Construction– Grammar Learning– Grammar Models of Practical Interest– Regular Language Learning

• Context Free Grammar Learning– Alignment Based Learning (ABL) [ Zaanen, 2000 ] – Alignment Profile

• Alignment Profile as a Fuzzy Set– Profile Similarity

• Profile Similarity formulated in terms of Fuzzy Operations– Indistinguishable Grammar Symbols

• Profile Similarity and Indistinguishable Grammar Symbols– Profile Based Alignment Learning (PBAL)

• Concluding Remarks

• Research Interests

13

Alignment Based Learning (ABL)

• Introduced by van Zaanen [Zaanen 2000, 2002a]
• For context-free languages
• Based on alignment information
• Unsupervised
• Does not require language details

[Zaanen 2000] Menno van Zaanen, "ABL: Alignment-Based Learning", Proceedings of the 18th International Conference on Computational Linguistics (COLING), Saarbrücken, Germany, pp. 961-967, 31 Jul-4 Aug 2000.

Induction of Linguistic Knowledge

ILK Research Group, Dept. of Communication and Information Sciences, Faculty of Humanities, Tilburg University, P.O. Box 90153, NL-5000 LE Tilburg, The Netherlands

14

ABL Step 1

• Get alignments between each pair of sentences from the samples.

Oscar sees the large, green apple.

Cookie monster sees the red apple.

15

ABL Step 2

• Extract context-free grammar

A → Oscar
A → Cookie monster
B → large, green
B → red
S → A sees the B apple.

Oscar sees the large, green apple.

Cookie monster sees the red apple.
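
A rough sketch of both steps is shown below, using Python's generic difflib aligner in place of van Zaanen's edit-distance alignment (so the details differ from ABL proper): equal spans are kept as shared context in the S-rule, and each unequal slot is given a fresh hypothesised nonterminal.

    from difflib import SequenceMatcher

    s1 = "Oscar sees the large , green apple .".split()
    s2 = "Cookie monster sees the red apple .".split()

    matcher = SequenceMatcher(a=s1, b=s2, autojunk=False)
    rules, skeleton = [], []
    for k, (tag, i1, i2, j1, j2) in enumerate(matcher.get_opcodes()):
        if tag == "equal":
            skeleton.extend(s1[i1:i2])            # shared context stays in the S-rule
        else:
            nt = f"N{k}"                          # hypothesised constituent slot
            skeleton.append(nt)
            rules.append((nt, " ".join(s1[i1:i2])))
            rules.append((nt, " ".join(s2[j1:j2])))

    print("S ->", " ".join(skeleton))             # S -> N0 sees the N2 apple .
    for lhs, rhs in rules:
        print(f"{lhs} -> {rhs}")                  # N0 -> Oscar, N0 -> Cookie monster, ...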

16

Problems with ABL

• Unable to identify the following alignment:
  – Book [a trip to Goa beach].
  – Show [me Big Bird’s house].

• Generates the following alignment, and hence generates incorrect grammatical rules:
  – Gopal eats [biscuits].
  – Gopal eats [well].

Book and Show are “verbs” but ABL cannot learn this concept of verbs.

biscuits and well should not be “clubbed” into a single concept (biscuits is a noun, well is an adverb), but ABL will derive them from the same non-terminal symbol (since the other parts, Gopal eats, are aligned in ABL).

17

OUR EXTENSION: Profile Based Alignment Learning [1,2]

• Objective:
  – Refine the learned model by reducing identified invalid rules
  – Improve the precision of context-free grammatical rules extracted from samples

1: X. Wang and N. S. Chaudhari, "Alignment Based Similarity Measure for Grammar Learning", 2006 IEEE International Conference on Fuzzy Systems, pp. 1902-1909, July 16-21, 2006.
2: X. Wang and N. S. Chaudhari, "Profile Based Alignment Learning System for Language Inference", in Proceedings of the 14th International Conference on Intelligent and Adaptive Systems and Software Engineering, pp. 94-99, Toronto, Canada, July 20-22, 2005.

biscuits and well should not be “clubbed” into a single concept … but ABL will derive them from the same non-terminal symbol (since the other parts, Gopal eats, are aligned in ABL).

Book and Show are of the “same category” (verbs), but ABL cannot learn this concept of “same category” (verb).

19

Alignment Profile

• Suppose
  – “apples” aligned with “pears” in one context:
      • I brought some [apples] this morning
      • I brought some [pears] this morning
  – “apples” aligned with “well” in another context:
      • Gopal eats [pears]
      • Gopal eats [apples]
      • Gopal eats [well]

• Then the alignment profile for “apples” is:
    apples: 2/5, pears: 2/5, well: 1/5
• The alignment profile for “pears” is:
    apples: 2/5, pears: 2/5, well: 1/5
• The alignment profile for “well” is:
    apples: 1/5, pears: 1/5, well: 1/5
(The profiles of “apples” and “pears” match; the profile of “well” does not match.)

20

Alignment Profile: as a Fuzzy Set

• The alignment profile for “apples” is:
    apples: 2/5, pears: 2/5, well: 1/5

We define the alignment profile as a fuzzy set AP on the set of symbols N ∪ Σ, denoted AP = { <v, μAP(v)> | v ∈ N ∪ Σ }, where μAP : N ∪ Σ → [0,1] gives the membership grade of each symbol v in the fuzzy set AP.
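
A minimal sketch of this definition in code: the membership grade of each symbol is taken as its relative alignment frequency. The counts below are illustrative, chosen to reproduce the 2/5, 2/5, 1/5 profile shown above.

    from collections import Counter

    # Illustrative alignment counts for the word "apples".
    align_counts = Counter({"apples": 2, "pears": 2, "well": 1})

    def alignment_profile(counts):
        total = sum(counts.values())
        # membership grade of each symbol v in the fuzzy set AP
        return {v: c / total for v, c in counts.items()}

    print(alignment_profile(align_counts))   # {'apples': 0.4, 'pears': 0.4, 'well': 0.2}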

22

Profile Similarity

• Our basic idea: words with higher “Profile similarity” tend to be generated from the same nonterminal in context-free languages

[Figure: similarity vs. dissimilarity (%)]

24

Profile Similarity: formulated in terms of Fuzzy operations

OUR approach to prove that words with higher “profile similarity” tend to be generated from the same nonterminal in context-free languages:
  – Formulate the “Alignment Profile” as a fuzzy set over grammar symbols
  – Formulate Profile Similarity in terms of fuzzy set operation(s)

The profile similarity of alignment profiles AP1 and AP2, denoted Ps(AP1, AP2), is formulated as:

Ps(AP1, AP2) = |AP1 ∩ AP2| / |AP1 ∪ AP2|
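
Reading AP1 and AP2 as fuzzy sets, the intersection and union can be taken pointwise as min and max, and |·| as the sigma-count (sum of membership grades); a small sketch with illustrative profiles:

    def profile_similarity(ap1, ap2):
        support = set(ap1) | set(ap2)
        inter = sum(min(ap1.get(v, 0.0), ap2.get(v, 0.0)) for v in support)  # |AP1 ∩ AP2|
        union = sum(max(ap1.get(v, 0.0), ap2.get(v, 0.0)) for v in support)  # |AP1 ∪ AP2|
        return inter / union if union else 0.0

    # Illustrative profiles (memberships as in the earlier apples/pears/well example).
    ap_apples = {"apples": 0.4, "pears": 0.4, "well": 0.2}
    ap_pears  = {"apples": 0.4, "pears": 0.4, "well": 0.2}
    ap_well   = {"apples": 0.2, "pears": 0.2, "well": 0.2}

    print(profile_similarity(ap_apples, ap_pears))   # 1.0  (candidates for the same nonterminal)
    print(profile_similarity(ap_apples, ap_well))    # 0.6  (kept apart)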

26

Indistinguishable Grammar Symbols: our definition †

Define indistinguishable (grammar) symbols: In a CFG G = (N, Σ, P, S), with A ∈ N and B ∈ N, if for every p ∈ P and every “surrounding context” α, β with RHS(p) = αAβ, there exists a q ∈ P with RHS(q) = αBβ and LHS(p) = LHS(q), then A and B are called indistinguishable symbols; otherwise, A and B are distinguishable.

p: X → αAβ
q: Y → αBβ   (with X = Y, i.e. the same left-hand side)

† : motivation: we introduce this definition to formalize the construction to “merge” grammar symbols (e.g. “apples”, “pears”) with high profile similarity
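
A hedged sketch of this definition as a direct check over a production list (applied in both directions; the example grammar fragment is hypothetical, in the spirit of the ABL example):

    def indistinguishable(productions, a, b):
        prods = set(productions)
        def covered(x, y):
            # every occurrence of x in a right-hand side needs a production with y
            # in the same surrounding context and with the same left-hand side
            for lhs, rhs in productions:
                for i, sym in enumerate(rhs):
                    if sym == x and (lhs, rhs[:i] + (y,) + rhs[i + 1:]) not in prods:
                        return False
            return True
        return covered(a, b) and covered(b, a)

    g = [("S", ("A", "sees", "the", "B", "apple")),
         ("S", ("C", "sees", "the", "B", "apple")),
         ("A", ("Oscar",)),
         ("C", ("Cookie", "monster"))]
    print(indistinguishable(g, "A", "C"))   # True: A and C occur in the same contexts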

27

Indistinguishable Grammar Symbols & Profile Similarity

Theorem 1. In a SCFG G = (N, Σ, P, S), with A ∈ N, w1 ∈ N ∪ Σ, w2 ∈ N ∪ Σ, {A → w1 (pr1), A → w2 (pr2)} ⊆ P, pr1 > 0, pr2 > 0, and assuming A can be generated from S: given enough generated sentences, Ps(AP(w1), AP(w2)) = 1.

Theorem 2. In a SCFG G = (N, Σ, P, S), with A ∈ N, B ∈ N, w1 ∈ N ∪ Σ, w2 ∈ N ∪ Σ, {A → w1 (pr1), B → w2 (pr2)} ⊆ P, pr1 > 0, pr2 > 0, and assuming A and B are distinguishable: the profile similarity Ps(AP(w1), AP(w2)) = 1 - σ, where σ is a positive real number less than 1.

Note: to represent the “frequency of occurrence” of a rule, we use the “stochastic” extension of the CFG, i.e. an SCFG.

28

Example

Sample text 1: London came out on top when the announcement was made by IOC President Jacques Rogge in Singapore.

Sample text 2: This was the fourth bid from Britain. London will become the first city to have hosted the Olympics three times.

Sample text 3: This was the fourth bid from Britain. London will become the first city to have hosted the Olympics three times.

Production rules and their counts are:

p1: Sn1 → [London] [came] [out] [on] [top] [when] [the] [announcement] [was] [made] [by] [IOC] [President] [Jacques] [Rogge] [in] [Singapore]   (C(p1) = 1)
p2: S → Sn1   (C(p2) = 1)
p3: Sn2 → [This] [was] [the] [fourth] [bid] [from] [Britain]   (C(p3) = 2)
p4: Sn3 → [London] [will] [become] [the] [first] [city] [to] [have] [hosted] [the] [Olympics] [three] [times]   (C(p4) = 2)
p5: S → Sn2 Sn3   (C(p5) = 2)
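
In the stochastic reading of the grammar, these counts become rule probabilities by normalising over rules that share a left-hand side. A small sketch using the two S-rules above (right-hand sides abbreviated to their nonterminals):

    from collections import defaultdict

    counts = {
        ("S", ("Sn1",)): 1,
        ("S", ("Sn2", "Sn3")): 2,
    }

    totals = defaultdict(int)
    for (lhs, _), c in counts.items():
        totals[lhs] += c           # total count per left-hand side

    for (lhs, rhs), c in counts.items():
        print(f"{lhs} -> {' '.join(rhs)}   (pr = {c / totals[lhs]:.2f})")
    # S -> Sn1       (pr = 0.33)
    # S -> Sn2 Sn3   (pr = 0.67)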

30

Profile Based Alignment Learning: PBAL Framework

Makes use of the following concepts:
  – Slot Alignment Score
  – Sentence Similarity
  – Sentence Similarity Threshold
  – Dynamic Sentence Similarity Threshold

STEPS in our PBAL Framework:
• Find a pairwise alignment for each sentence pair
• Calculate and accumulate alignment counts (slot alignment, sentence similarity)
• Calculate alignment profiles and profile similarities
• Redo the alignment until there is no further change
• Extract the grammar by extracting a non-terminal for each slot

31

Experiments, and Example Rules Generated

For experiments and testing, we used:
• the CHILDES database: a collection of child-related speech
• samples from the English-American corpora
• no editing / cleanup was done; the data was used directly for alignment
• number of sentences: more than 10,000

One sample rule is:

don't put that in the {1155}[dough]
(+) oh lets put that in the {1155}[middle]

where {1155} is the ID of a nonterminal of the CFG, with [dough] and [middle] identified as generated from {1155}.

(The number in parentheses is the number of times the rule is used.)

CHILDES Reference: B. MacWhinney, The CHILDES Project: Tools for analyzing talk, third ed., Lawrence Erlbaum Associates, Mahwah, NJ, 2000

32

Rules Generated: Example

\769 = \437 \510 \544 \757 .(1)

\769 = \470 \515 \508 \758 \534 \466 \759 \470 \515 .(1)
\769 = \578 \760 \439 \399 \761 .(1)
\769 = \478 \499 \617 \484 \510 \490 \762 \469 \490 \475 \527 \647 \465 .(1)
\769 = \385 \473 \763 \764 \445 \765 .(1)
\769 = \394 \768 \402 \533 \766 \765 \647 .(1)
\769 = \552 \442 \766 \765 \647 .(1)
\769 = \552 \606 \406 .(1)
\1073 = \594 .(1)
\1072 = \635 .(1)
\1073 = \679 .(1)
\1072 = \680 .(1)
\1078 = \419 .(1)
\1078 = \431 .(1)
\1079 = \450 \542 .(1)
\1080 = \675 .(1)
\1080 = \580 \507 \678 .(1)
\1081 = \450 .(1)
\1079 = \733 \734 .(1)
\1081 = \465 .(1)
\1090 = \386 \387 \388 \389 .(1)
\1092 = \497 .(1)
\1091 = \487 \498 .(1)
\1093 = \390 \516 .(1)
\1094 = \526 .(1)
\1094 = \527 \528 .(1)
\1090 = \536 \537 \537 \538 \521 .(1)
\1092 = \582 .(1)
\1091 = \599 \439 .(1)
\1095 = \435 \460 \545 \414 \530 .(1)
\1095 = \416 \622 \516 .(1)
\1093 = \400 \647 \646 .(1)
\1097 = \515 .(1)
\1096 = \503 .(1)
\1098 = \460 .(1)
\1097 = \563 \690 .(1)
\1096 = \691 .(1)
\1098 = \414 .(1)
\1099 = \385 \466 \437 .(1)
\1099 = \484 \401 .(1)
\1120 = \460 .(1)
\1120 = \1098 .(1)
\1124 = \400 \564 .(1)
\1123 = \565 .(1)
\1124 = \390 .(2)
\1123 = \1095 .(2)
\1129 = \668 \544 \440 \669 .(1)
\1129 = \536 \766 \765 \767 .(1)

33

Precision and Number of Rules Generated by Different PBAL Formulations

Formulation            Precision   # of Rules Generated
Dynamic SST PBAL       96.93%      99
R_Score based PBAL     94.97%      60
PBAL SA                91.47%      34
PBAL NSA               91.36%      22
ABL SST                85.82%      14

35

Language Structure Learning: Concluding Remarks

• Grammar Learning: computationally hard
  – More tractable with “additional information” given in terms of:
      • ‘negative’ examples
      • ‘structural’ information

• Learning of Regular Grammars:
  – Many approaches have been proposed by researchers
  – We have developed one approach based on “Version Spaces”: useful for incremental learning

• Learning of Context Free Grammars:
  – Researchers have investigated approaches based on soft-computing models
  – We extended Zaanen’s Alignment Based Learning (ABL) to make use of “Profile Based Alignment Learning”

• Applications:
  – One area: Games

36

Questions ? …

• More information:
  – Web: www.ntu.edu.sg/home/asnarendra
• Contact:
  – Email: asnarendra@ntu.edu.sg
  – Personal Emails: nsc183@gmail.com, nscmp@lycos.com

37

Research Interests

• Algorithms
  – Graph Isomorphism
  – Parsing of Context Free Grammars (CFGs)

• Soft Computing
  – Specialized Recurrent Neural Networks (RNNs) for Protein Secondary Structure (PSS) Prediction
      – Long Short-Term Memory (LSTM) Networks
      – Segmented Memory Recurrent Neural Networks (SMRNNs)
      – Bidirectional SMRNNs (BSMRNNs)
  – Binary Neural Networks (BNNs)
      – Construction Method(s)
  – Grammar Learning
      – For Regular Grammars: Use of Version Spaces
      – For CFGs: Use of Profile Based Alignment Learning

• Simulations and Games
  – BSP for Urban Terrain Modeling
  – Game AI: non-conventional

38

Non-conventional Game AI: Computational Learning, and Computer Science Models: Outline

• Non-conventional Game AI
  – Games with Computational Models
      • Computational Learning
      • Soft-computing
      • Neural Networks and Binary Neural Networks

• Concluding Remarks

39

Computational Models: Formal Language

• Language Structures
  – Verbal Explanation
  (Study of Forms of Language Structures)

• Formal Languages
• Chomsky’s classification:
  – Regular (Type 3) Languages
  – Context Free (Type 2) Languages
  – Context Sensitive (Type 1) Languages
  – Phrase Structured (Type 0) Languages

40

Automaton Model              Formal Languages they represent
Table (represents a mapping) … (phoneme – viseme)
Finite Automaton (FA)        Regular languages
Pushdown Automaton           Context-free languages
Linear Bounded Automaton     Context-sensitive languages
Turing Machine               Phrase-structured languages

41

Our Work: Computational Learning - I

• Learning of Regular Grammars
• Through “Reversible Automata”
• Construction through
  – Positive Examples, and
  – Negative Examples
• “Optimal Construction” remains NP-Complete
  – However, “good” algorithms are possible for construction
• Games with such learning models

Web: http://www.ntu.edu.sg/home/asnarendra

42

Our Work: Computational Learning - II

• Learning of Context Free Grammars
• Through “Structured Examples”
• Construction through
  – Positive “structured” Examples, and
  – Negative “structured” Examples

• Use of Soft-Computing Approaches for learning of Structures

• PBAL Based Approach for CFG Learning

• Games with such learning models

43

Soft Computing

• Current state of soft computing:
• A collection of the following techniques:
  – Fuzzy Logic
  – Neural Networks
  – Evolutionary Computing techniques inspired by behavioural studies, like
      • Genetic Algorithms
      • ant colony optimization
      • small world theory
      • theory of memes

44

Neural Networks

• Models cortical structures of the brain.
• Neurons: interconnected processing elements that work together to produce output.
• Co-operation: the output relies on the co-operation of the individual neurons within the network.
• Parallel Processing: processing of information is parallel (in contrast with the sequential nature of the Turing model).

45

Neural Networks
• Difficulties:
  – Initialization of the network's starting structure
  – Training parameters
  – Weight updates
• Other factors:
  – Number of hidden layers
  – Number of hidden nodes
  – Initial weights
• Some (famous) solutions:
  – Self Organizing Maps
  – Binary Neural Networks

46

Binary Neural Networks (BNNs)

• Hard-limiter nonlinearity
• Our Contribution: constructive methods for
  – Number of hidden layers
  – Number of hidden nodes
• Based on geometric (hypersphere) concepts, we have developed BNN construction methods.

[Figure: a binary neuron computing a weighted sum with w1 = +1, w2 = +2, threshold = +1, and hard-limiter output ±1]
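
A minimal sketch of the hard-limiter unit in the figure (weights and threshold as shown there; the bipolar test inputs are illustrative):

    def binary_neuron(x1, x2, w1=1, w2=2, threshold=1):
        s = w1 * x1 + w2 * x2
        return 1 if s >= threshold else -1        # hard-limiter: output is +1 or -1

    for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
        print((x1, x2), "->", binary_neuron(x1, x2))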

47

Future Game AI: Computational Learning, and Computer Science Models: Outline

• Non-conventional Game AI
  – Games with Computational Models
      • Computational Learning
      • Soft-computing
      • Neural Networks and Binary Neural Networks

• Concluding Remarks

48

Non-conventional Game AI: Concluding Remarks

• Two Computational Models
  – Chomsky’s Grammars: Formal Languages
  – Automata Models: FA, PDA, TMs

• Existing Computational Learning Techniques allow automatic construction of
  – Regular Languages, and
  – Part of Context Free Languages
  – Algorithms for such constructions remain compute-intensive, especially for Context Free Languages

49

Non-conventional Game AI: Concluding Remarks: Continued …

• Neural Network Construction: trial & error, and time consuming
  – “constructive methods”
      • Many “specialized” variants available
      • GA-based approaches: NEAT (2004), RT-NEAT (2006), etc.
  – Binary Neural Networks
      • Construction methods based on Geometric Space Expansion

• Future Game AI
  – would integrate these technologies

50

Questions ? …

Email: asnarendra@ntu.edu.sg; Personal Emails: nsc183@gmail.com, nscmp@lycos.com

Web: www.ntu.edu.sg/home/asnarendra