Transcript of Hammer & Hitzler, KI2005 Tutorial, Koblenz, Germany, September 2005 (60 slides).

Slide 1

Connectionist Knowledge Representation and Reasoning

(Part I)

Neural Networks and Structured Knowledge

SCREECH

Fodor, Pylyshyn: "What's deeply wrong with Connectionist architecture is this: Because it acknowledges neither syntactic nor semantic structure in mental representations, it perforce treats them not as a generated set but as a list." [Connectionism and Cognitive Architecture, 1988]

Our claim: state-of-the-art connectionist architectures do adequately deal with structures!

Slide 2

Tutorial Outline (Part I): Neural networks and structured knowledge

• Feedforward networks
  – The good old days: KBANN and co.
  – Useful: neurofuzzy systems, data mining pipeline
  – State of the art: structure kernels
• Recurrent networks
  – The basics: partially recurrent networks
  – Lots of theory: principled capacity and limitations
  – To do: challenges
• Recursive data structures
  – The general idea: recursive distributed representations
  – One breakthrough: recursive networks
  – Going on: towards more complex structures

Slide 3

Tutorial Outline (Part I): Neural networks and structured knowledge

• Feedforward networks
  – The good old days: KBANN and co.
  – Useful: neurofuzzy systems, data mining pipeline
  – State of the art: structure kernels
• Recurrent networks
  – The basics: partially recurrent networks
  – Lots of theory: principled capacity and limitations
  – To do: challenges
• Recursive data structures
  – The general idea: recursive distributed representations
  – One breakthrough: recursive networks
  – Going on: towards more complex structures

Slide 4

The good old days – KBANN and co.

[Figure: feedforward neural network computing f_w : ℝ^n → ℝ^o, x ↦ y; a single neuron with inputs x_1, …, x_n, weights w_1, …, w_n, and threshold θ computes σ(w^T x − θ), where σ(t) = sgd(t) = (1 + e^(−t))^(−1).]

Issues:
1. black box
2. distributed representation
3. connection to rules for symbolic I/O?
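To make the neuron model above concrete, here is a minimal sketch (all names such as sgd, neuron, and feedforward are our own illustrations, not code from the tutorial) of a sigmoidal neuron and a small feedforward network f_w : ℝ^n → ℝ^o:

```python
import numpy as np

def sgd(t):
    # logistic sigmoid: sgd(t) = (1 + e^(-t))^(-1)
    return 1.0 / (1.0 + np.exp(-t))

def neuron(x, w, theta):
    # single neuron: sigma(w^T x - theta)
    return sgd(np.dot(w, x) - theta)

def feedforward(x, layers):
    # layers: list of (W, theta) pairs; composing them gives f_w : R^n -> R^o
    for W, theta in layers:
        x = sgd(W @ x - theta)
    return x

# toy network: 3 inputs -> 2 hidden neurons -> 1 output neuron
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(2, 3)), np.zeros(2)),
          (rng.normal(size=(1, 2)), np.zeros(1))]
print(neuron(np.array([1.0, 0.0, 1.0]), np.array([0.5, -0.5, 0.5]), 0.2))
print(feedforward(np.array([1.0, 0.0, 1.0]), layers))
```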

Slide 5

The good old days – KBANN and co.

Knowledge Based Artificial Neural Networks [Towell/Shavlik, AI 94]:
• start with a network which represents known rules
• train using additional data
• extract a set of symbolic rules after training

[Pipeline: (partial) rules → initial network → train with data → trained network → (complete) rules]

Slide 6

The good old days – KBANN and co.

(partial) rules: propositional acyclic Horn clauses such as

A :- B, C, D
E :- A

These are inserted into an FNN with one neuron per boolean variable: the neuron for A receives weight 1 from each of B, C, D and threshold 2.5; the neuron for E receives weight 1 from A and threshold 0.5.
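A hedged sketch of this rule-insertion step for the two clauses above (our own toy code, not the original KBANN implementation, which uses a larger weight constant): each clause becomes one neuron whose weights and threshold realize the conjunction of its body atoms.

```python
import numpy as np

def sgd(t):
    return 1.0 / (1.0 + np.exp(-t))

def kbann_network(clauses, atoms):
    # each Horn clause HEAD :- B1,...,Bk becomes a neuron for HEAD with weight 1
    # on every body atom and threshold k - 0.5 (2.5 for A :- B,C,D; 0.5 for E :- A)
    idx = {a: i for i, a in enumerate(atoms)}
    neurons = [(idx[head], [idx[b] for b in body], len(body) - 0.5)
               for head, body in clauses]
    return idx, neurons

def forward(truth, idx, neurons):
    # propagate truth values through the acyclic rule network
    # (clauses are assumed to be listed in topological order)
    x = truth.copy()
    for head, body, theta in neurons:
        x[head] = sgd(10.0 * (sum(x[i] for i in body) - theta))  # steep sigmoid ~ AND
    return x

atoms = ["B", "C", "D", "A", "E"]
clauses = [("A", ["B", "C", "D"]), ("E", ["A"])]
idx, net = kbann_network(clauses, atoms)
truth = np.zeros(len(atoms))
truth[[idx["B"], idx["C"], idx["D"]]] = 1.0
print(forward(truth, idx, net)[idx["E"]])   # close to 1: E is derived from B, C, D
```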

Slide 7

The good old days – KBANN and co.

data + train: use some form of backpropagation; add a penalty term to the error, e.g. for changing the weights away from their initial values.

1. The initial network biases the training result, but
2. there is no guarantee that the initial rules are preserved, and
3. there is no guarantee that the hidden neurons maintain their semantics.

Slide 8

The good old days – KBANN and co.

(complete) rules:
1. There is no exact direct correspondence between a neuron and a single rule, although each neuron (and the overall mapping) can be approximated arbitrarily well by a set of rules.
2. It is NP-complete to find a minimum logical description for a trained network [Golea, AISB'96].
3. Therefore, a couple of different rule extraction algorithms have been proposed, and this is still a topic of ongoing research.

Slide 9

The good old days – KBANN and co.

(complete) rules can be extracted in two ways:
• decompositional approach: extract rules that describe individual neurons and weights, then compose them
• pedagogical approach: treat the trained network as a black box mapping x ↦ y and extract rules describing this overall input-output behavior

Slide 10

The good old days – KBANN and co.

Decompositional approaches:
• subset algorithm, MofN algorithm describe single neurons by sets of active predecessors [Craven/Shavlik, 94]
• local activation functions (RBF-like) allow an approximate direct description of single neurons [Andrews/Geva, 96]
• MLP2LN biases the weights towards 0/-1/1 during training and can then extract exact rules [Duch et al., 01]
• prototype-based networks can be decomposed along relevant input dimensions by decision tree nodes [Hammer et al., 02]

Observation:
• usually some variation of if-then rules is achieved
• small rule sets are only achieved if further constraints guarantee that single weights/neurons have a meaning
• there is a tradeoff between accuracy and size of the description

Slide 11

The good old days – KBANN and co.

Pedagogical approaches:
• extraction of conjunctive rules by extensive search [Saito/Nakano 88]
• interval propagation [Gallant 93, Thrun 95]
• extraction by minimum separation [Tickle/Diederich, 94]
• extraction of decision trees [Craven/Shavlik, 94]
• evolutionary approaches [Markovska, 05]

Observation:
• usually some variation of if-then rules is achieved
• this is symbolic rule induction with a little (or a bit more) help from a neural network
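To illustrate the pedagogical idea, the following sketch (our own toy setup built on scikit-learn, not one of the cited algorithms) trains a network on a hidden boolean concept, queries it on fresh inputs, and fits a readable decision tree to the network's answers rather than to the original labels:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(500, 4)).astype(float)          # boolean-style inputs
y = ((X[:, 0] == 1) & (X[:, 1] == 1)) | (X[:, 2] == 1)       # hidden target concept

net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0).fit(X, y)

# pedagogical step: use the trained net as an oracle on new queries
X_query = rng.integers(0, 2, size=(2000, 4)).astype(float)
y_net = net.predict(X_query)

tree = DecisionTreeClassifier(max_depth=3).fit(X_query, y_net)
print(export_text(tree, feature_names=["x1", "x2", "x3", "x4"]))  # readable if-then rules
```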

Slide 12

The good old days – KBANN and co.

What is this good for?
• Nobody uses FNNs these days.
• Insertion of prior knowledge might be valuable, but efficient training algorithms allow one to substitute it by additional training data (generated via the rules).
• Validation of the network output might be valuable, but there exist alternative (good) guarantees from statistical learning theory.
• If-then rules are not very interesting, since there exist good symbolic learners for learning propositional classification rules.

However:
• Propositional rule insertion/extraction is often an essential part of more complex rule insertion/extraction mechanisms.
• It demonstrates a key problem, namely different modes of representation, in a very nice way.
• Some people, e.g. in the medical domain, also want an explanation for a classification.
• There are at least two application domains where if-then rules are very interesting and not so easy to learn: fuzzy control and unsupervised data mining.

Slide 13

Tutorial Outline (Part I): Neural networks and structured knowledge

• Feedforward networks
  – The good old days: KBANN and co.
  – Useful: neurofuzzy systems, data mining pipeline
  – State of the art: structure kernels
• Recurrent networks
  – The basics: partially recurrent networks
  – Lots of theory: principled capacity and limitations
  – To do: challenges
• Recursive data structures
  – The general idea: recursive distributed representations
  – One breakthrough: recursive networks
  – Going on: towards more complex structures

Slide 14

Useful: neurofuzzy systems

[Figure: control loop; the process receives a control signal and yields an observation as input to the controller.]

Fuzzy control:

if (observation ∈ FM_I) then (control ∈ FM_O)

Slide 15

Useful: neurofuzzy systems

Fuzzy control:

if (observation ∈ FM_I) then (control ∈ FM_O)

Neurofuzzy control: a neural network maps the observation to the control signal and implements the fuzzy if-then rules.

Benefit: the form of the fuzzy rules (i.e. the neural architecture) and the shape of the fuzzy sets (i.e. the neural weights) can be learned from data!

Slide 16

Useful: neurofuzzy systems

• NEFCON implements Mamdani control [Nauck/Klawonn/Kruse, 94]
• ANFIS implements Takagi-Sugeno control [Jang, 93]
• and many others …
• Learning
  – of rules: evolutionary or clustering methods
  – of fuzzy set parameters: reinforcement learning or some form of Hebbian learning
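A minimal sketch of a Takagi-Sugeno style rule base (our own simplification for illustration, not the NEFCON or ANFIS code): Gaussian memberships play the role of the learnable fuzzy sets, and each rule contributes a linear conclusion weighted by its firing strength.

```python
import numpy as np

def gauss(x, c, s):
    # Gaussian fuzzy membership; center c and width s would be the trainable weights
    return np.exp(-((x - c) ** 2) / (2.0 * s ** 2))

def takagi_sugeno(x, rules):
    # rules: list of (centers, widths, (a, b)); firing strength = product of memberships,
    # output = firing-strength-weighted average of the rules' linear conclusions
    strengths, outputs = [], []
    for c, s, (a, b) in rules:
        strengths.append(np.prod(gauss(x, c, s)))
        outputs.append(np.dot(a, x) + b)
    strengths = np.array(strengths)
    return float(np.dot(strengths, outputs) / strengths.sum())

# two toy rules over a 2-dimensional observation
rules = [(np.array([0.0, 0.0]), np.array([1.0, 1.0]), (np.array([1.0, 0.0]), 0.0)),
         (np.array([1.0, 1.0]), np.array([1.0, 1.0]), (np.array([0.0, 1.0]), 1.0))]
print(takagi_sugeno(np.array([0.5, 0.5]), rules))
```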

Slide 17

Useful: data mining pipeline

Task: describe given inputs (no class information) by if-then rules

• Data mining with emergent SOM, clustering, and rule extraction [Ultsch, 91]

[Pipeline: data → emergent SOM → clustering → rule extraction → rules]
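As a sketch of the first stage of such a pipeline, here is a minimal one-dimensional self-organizing map update (our own toy code with illustrative parameters); clustering and rule extraction would then operate on the learned prototypes:

```python
import numpy as np

def train_som(data, n_units=10, epochs=20, lr=0.5, radius=2.0, seed=0):
    # Kohonen update: move the winning prototype and its grid neighbors towards each sample
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(n_units, data.shape[1]))
    for _ in range(epochs):
        for x in rng.permutation(data):
            winner = np.argmin(np.linalg.norm(w - x, axis=1))
            grid_dist = np.abs(np.arange(n_units) - winner)
            h = np.exp(-(grid_dist ** 2) / (2.0 * radius ** 2))   # neighborhood function
            w += lr * h[:, None] * (x - w)
    return w

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(loc=-2.0, size=(50, 2)), rng.normal(loc=2.0, size=(50, 2))])
print(train_som(data).round(2))   # prototypes to be clustered and described by rules
```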

Slide 18

Tutorial Outline (Part I): Neural networks and structured knowledge

• Feedforward networks
  – The good old days: KBANN and co.
  – Useful: neurofuzzy systems, data mining pipeline
  – State of the art: structure kernels
• Recurrent networks
  – The basics: partially recurrent networks
  – Lots of theory: principled capacity and limitations
  – To do: challenges
• Recursive data structures
  – The general idea: recursive distributed representations
  – One breakthrough: recursive networks
  – Going on: towards more complex structures

Slide 19

State of the art: structure kernels

[Pipeline: data (sets, sequences, tree structures, graph structures) → kernel k(x,x′) → SVM]

Just compute pairwise distances for this complex data, using the structure information.

Slide 20

State of the art: structure kernels

• Closure properties of kernels [Haussler, Watkins]
• Principled problems for complex structures: computing informative graph kernels is at least as hard as deciding graph isomorphism [Gärtner]
• Several promising proposals, arranged in a taxonomy ranging from syntax to semantics [Gärtner]:
  – count common substructures
  – derived from a probabilistic model
  – derived from local transformations

Slide 21

State of the art: structure kernels

Count common substructures:

Example (counting 2-mers):
           GA  AG  AT
  GAGAGA    3   2   0
  GAT       1   0   1
  kernel value: 3·1 + 2·0 + 0·1 = 3

Efficient computation: dynamic programming, suffix trees.

• locality improved kernel [Sonnenburg et al.], bag of words [Joachims]
• string kernel [Lodhi et al.], spectrum kernel [Leslie et al.]
• word-sequence kernel [Cancedda et al.]
• convolution kernels for language [Collins/Duffy, Kashima/Koyanagi, Suzuki et al.]
• kernels for relational learning [Zelenko et al., Cumby/Roth, Gärtner et al.]
• graph kernels based on paths or subtrees [Gärtner et al., Kashima et al.]
• kernels for prolog trees based on similar symbols [Passerini/Frasconi/deRaedt]
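For the "count common substructures" family, a spectrum-kernel-like computation is easy to sketch (a simplified illustration, not the cited implementations); it reproduces the 2-mer counting of the example above:

```python
from collections import Counter

def spectrum_kernel(s, t, k=2):
    # k(s, t) = sum over all k-mers u of (#occurrences of u in s) * (#occurrences of u in t)
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(cs[u] * ct[u] for u in cs)

print(spectrum_kernel("GAGAGA", "GAT"))   # 3*1 + 2*0 + 0*1 = 3
```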

Slide 22

State of the art: structure kernels

Derived from a probabilistic model: describe the data by a probabilistic model P(x), then compare characteristics of P(x).

• Fisher kernel [Jaakkola et al., Karchin et al., Pavlidis et al., Smith/Gales, Sonnenburg et al., Siolas et al.]
• tangent vector of log odds [Tsuda et al.]
• marginalized kernels [Tsuda et al., Kashima et al.]
• kernel of Gaussian models [Moreno et al., Kondor/Jebara]

Slide 23

State of the art: structure kernels

Derived from local transformations: a local neighborhood relation ("x is similar to x′", given by a generator H) is expanded to a global kernel.

• diffusion kernel [Kondor/Lafferty, Lafferty/Lebanon, Vert/Kanehisa]

Slide 24

State of the art: structure kernels

• Intelligent preprocessing (kernel extraction) allows an adequate integration of semantic/syntactic structure information.
• This can be combined with state-of-the-art neural methods such as the SVM.
• Very promising results for
  – classification of documents and text [Duffy, Leslie, Lodhi, …]
  – detecting remote homologies for genomic sequences and further problems in genome analysis [Haussler, Sonnenburg, Vert, …]
  – quantitative structure-activity relationships in chemistry [Baldi et al.]

Slide 25

Conclusions: feedforward networks

• propositional rule insertion and extraction are possible (to some extent)
• useful for neurofuzzy systems and data mining
• structure-based kernel extraction followed by learning with an SVM yields state-of-the-art results,
  but: this is a sequential rather than a fully integrated neuro-symbolic approach
• FNNs themselves are restricted to flat data which can be processed in one shot; there is no recurrence

Slide 26

Tutorial Outline (Part I): Neural networks and structured knowledge

• Feedforward networks
  – The good old days: KBANN and co.
  – Useful: neurofuzzy systems, data mining pipeline
  – State of the art: structure kernels
• Recurrent networks
  – The basics: partially recurrent networks
  – Lots of theory: principled capacity and limitations
  – To do: challenges
• Recursive data structures
  – The general idea: recursive distributed representations
  – One breakthrough: recursive networks
  – Going on: towards more complex structures

Slide 27

The basics: partially recurrent networks

x_{t+1} = f(x_t, I_t)

[Elman, Finding structure in time, CogSci 90]
A very natural architecture for processing speech, temporal signals, control, robotics:

1. can process time series of arbitrary length
2. interesting for speech processing [see e.g. Kremer, 02]
3. training uses a variation of backpropagation [see e.g. Pearlmutter, 95]
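A minimal sketch of such a partially recurrent (Elman-style) network with random, untrained weights (names and dimensions are our own choices):

```python
import numpy as np

def sgd(t):
    return 1.0 / (1.0 + np.exp(-t))

class ElmanNet:
    # partially recurrent network: x_{t+1} = f(x_t, I_t); the output is read from the state
    def __init__(self, n_in, n_state, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(scale=0.5, size=(n_state, n_in))
        self.W_rec = rng.normal(scale=0.5, size=(n_state, n_state))
        self.W_out = rng.normal(scale=0.5, size=(n_out, n_state))

    def run(self, inputs):
        x = np.zeros(self.W_rec.shape[0])             # initial context
        for I in inputs:                              # time series of arbitrary length
            x = sgd(self.W_in @ I + self.W_rec @ x)   # x_{t+1} = f(x_t, I_t)
        return self.W_out @ x

net = ElmanNet(n_in=2, n_state=5, n_out=1)
print(net.run([np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]))
```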

Slide 28

Tutorial Outline (Part I): Neural networks and structured knowledge

• Feedforward networks
  – The good old days: KBANN and co.
  – Useful: neurofuzzy systems, data mining pipeline
  – State of the art: structure kernels
• Recurrent networks
  – The basics: partially recurrent networks
  – Lots of theory: principled capacity and limitations
  – To do: challenges
• Recursive data structures
  – The general idea: recursive distributed representations
  – One breakthrough: recursive networks
  – Going on: towards more complex structures

Slide 29

Lots of theory: principled capacity and limitations

RNNs and finite automata [Omlin/Giles, 96]

[Figure: an RNN with input, internal state, and output; the recurrent state update mirrors the dynamics of the transition function of a DFA.]

Slide 30

Lots of theory: principled capacity and limitations

DFA → RNN:
• unary (one-hot) input representation
• unary (one-hot) state representation
• implement (approximately) the boolean formula corresponding to the state transition within a two-layer network

⇒ RNNs can exactly simulate finite automata
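The construction can be illustrated with a small sketch (our own toy code, not the Omlin/Giles construction itself): one-hot states and inputs, an AND layer over (state, symbol) pairs, and an OR layer collecting all transitions that lead to the same successor state.

```python
import numpy as np

def step(t):
    # hard threshold; a steep sigmoid sgd(beta * t) approximates this for large beta
    return (t > 0).astype(float)

def dfa_rnn_step(state, inp, delta, n_states, n_symbols):
    # layer 1: one AND-neuron per (state q, symbol a), active iff both q and a are active
    pair = np.zeros((n_states, n_symbols))
    for q in range(n_states):
        for a in range(n_symbols):
            pair[q, a] = step(state[q] + inp[a] - 1.5)
    # layer 2: one OR-neuron per successor state, collecting all pairs with delta(q, a) = q'
    new_state = np.zeros(n_states)
    for (q, a), q_next in delta.items():
        new_state[q_next] += pair[q, a]
    return step(new_state - 0.5)

# toy DFA over {0, 1} tracking the parity of the number of 1s
delta = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
state = np.array([1.0, 0.0])                       # one-hot code of the start state
for symbol in [1, 1, 0, 1]:
    state = dfa_rnn_step(state, np.eye(2)[symbol], delta, n_states=2, n_symbols=2)
print(state)                                       # one-hot code of the reached state
```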

Slide 31

Lots of theory: principled capacity and limitations

RNN → DFA:
• unary input
• in general: distributed state representation
• cluster the states into disjoint subsets corresponding to automaton states and observe their behavior → an approximate description

⇒ approximate extraction of automaton rules is possible

Slide 32

Lots of theory: principled capacity and limitations

The principled capacity of RNNs can be characterized exactly:

• RNNs with small weights or Gaussian noise = finite memory models [Hammer/Tino, Maass/Sontag]
• RNNs with limited noise = finite state automata [Omlin/Giles, Maass/Orponen]
• RNNs with rational weights = Turing machines [Siegelmann/Sontag]
• RNNs with arbitrary weights = non-uniform Boolean circuits (super-Turing capability) [Siegelmann/Sontag]

Slide 33

Lots of theory: principled capacity and limitations

However, learning might be difficult:
• gradient-based learning schemes face the problem of long-term dependencies [Bengio/Frasconi]
• RNNs are not PAC-learnable (infinite VC dimension); only distribution-dependent bounds can be derived [Hammer]
• there exist only few general guarantees for the long-term behavior of RNNs, e.g. stability [Suykens, Steil, …]

[Figure: error plot for increasingly long sequences of the form "tatata…", illustrating the long-term-dependency problem.]

Slide 34

Lots of theory: principled capacity and limitations

RNNs
• naturally process time series
• incorporate plausible regularization such as a bias towards finite memory models
• have sufficient power for interesting dynamics (context-free, context-sensitive; arbitrary attractors and chaotic behavior)

but:
• training is difficult
• there are only limited guarantees for the long-term behavior and generalization ability

⇒ symbolic description/knowledge can provide solutions

Slide 35

Lots of theory: principled capacity and limitations

RNN vs. recurrent symbolic system:
• RNN: real numbers; iterated function systems give rise to fractals/attractors/chaos; implicit memory
• recurrent symbolic system: discrete states; crisp boolean functions on the states plus explicit memory

Correspondence? E.g. an attractor/repellor pair for counting a^n b^n c^n.

Slide 36

Tutorial Outline (Part I): Neural networks and structured knowledge

• Feedforward networks
  – The good old days: KBANN and co.
  – Useful: neurofuzzy systems, data mining pipeline
  – State of the art: structure kernels
• Recurrent networks
  – The basics: partially recurrent networks
  – Lots of theory: principled capacity and limitations
  – To do: challenges
• Recursive data structures
  – The general idea: recursive distributed representations
  – One breakthrough: recursive networks
  – Going on: towards more complex structures

Slide 37

To do: challenges

Training RNNs:
• search for appropriate regularizations inspired by a focus on specific functionalities: architecture (e.g. local), weights (e.g. bounded), activation function (e.g. linear), cost term (e.g. additional penalties) [Hochreiter, Boden, Steil, Kremer, …]
• insertion of prior knowledge: finite automata and beyond (e.g. context-free/context-sensitive languages, specific dynamical patterns/attractors) [Omlin, Croog, …]

Long-term behavior:
• enforce appropriate constraints while training
• investigate the dynamics of RNNs: rule extraction, investigation of attractors, relating dynamics and symbolic processing [Omlin, Pasemann, Haschke, Rodriguez, Tino, …]

Slide 38

To do: challenges

Some further issues:
• processing spatial data x_1, …, x_10 [bicausal networks, Pollastri et al.; contextual RCC, Micheli et al.]
• unsupervised processing of sequences x_1, x_2, x_3, x_4, … [TKM, Chappell/Taylor; RecSOM, Voegtlin; SOMSD, Sperduti et al.; MSOM, Hammer et al.; general formulation, Hammer et al.]

Slide 39

Conclusions: recurrent networks

• the capacity of RNNs is well understood and promising e.g. for natural language processing, control, …

• recurrence of symbolic systems has a natural counterpart in the recurrence of RNNs

• training and generalization face problems which could be solved by hybrid systems

• discrete dynamics with explicit memory versus real-valued iterated function systems

• sequences are nice, but not enough

Slide 40

Tutorial Outline (Part I): Neural networks and structured knowledge

• Feedforward networks
  – The good old days: KBANN and co.
  – Useful: neurofuzzy systems, data mining pipeline
  – State of the art: structure kernels
• Recurrent networks
  – The basics: partially recurrent networks
  – Lots of theory: principled capacity and limitations
  – To do: challenges
• Recursive data structures
  – The general idea: recursive distributed representations
  – One breakthrough: recursive networks
  – Going on: towards more complex structures

Slide 41

The general idea: recursive distributed representations

How to turn tree structures / acyclic graphs into a connectionist representation, i.e. into a vector?

Slide 42

The general idea: recursive distributed representations

Tree structure → vector: recursion!

A network f : ℝ^i × ℝ^c × ℝ^c → ℝ^c (input label, left context, right context ↦ context) yields the encoding f_enc, where

f_enc(ξ) = 0
f_enc(a(l,r)) = f(a, f_enc(l), f_enc(r))

Slide 43

The general idea: recursive distributed representations

Encoding: a network f : ℝ^{n+2c} → ℝ^c (label, left context, right context ↦ context), possibly followed by g : ℝ^c → ℝ^o, yields

f_enc : (ℝ^n)^{2*} → ℝ^c
f_enc(ξ) = 0
f_enc(a(l,r)) = f(a, f_enc(l), f_enc(r))

Decoding: a network h : ℝ^o → ℝ^{n+2o} (code ↦ label, left code, right code) yields

h_dec : ℝ^o → (ℝ^n)^{2*}
h_dec(0) = ξ
h_dec(x) = h_0(x)( h_dec(h_1(x)), h_dec(h_2(x)) )
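To make the recursive encoding concrete, here is a hedged sketch of f_enc over binary trees (the dimensions, the random untrained weights, and the nested-tuple tree representation are our own choices):

```python
import numpy as np

def sgd(t):
    return 1.0 / (1.0 + np.exp(-t))

C, N = 4, 3                                       # context dimension c and label dimension n
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(C, N + 2 * C))    # weights of f : R^(n+2c) -> R^c

def f_enc(tree):
    # trees as nested tuples (label, left, right); None encodes the empty tree xi
    # f_enc(xi) = 0;  f_enc(a(l, r)) = f(a, f_enc(l), f_enc(r))
    if tree is None:
        return np.zeros(C)
    label, left, right = tree
    return sgd(W @ np.concatenate([label, f_enc(left), f_enc(right)]))

a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])
tree = (a, (b, None, None), (a, None, None))      # the tree a(b(xi, xi), a(xi, xi))
print(f_enc(tree))                                # fixed-size vector code of the tree
```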

Slide 44

The general idea: recursive distributed representations

• recursive distributed description [Hinton, 90]
  – general idea without a concrete implementation
• tensor construction [Smolensky, 90]
  – encoding/decoding given by (a,b,c) ↦ a⊗b⊗c
  – increasing dimensionality
• holographic reduced representation [Plate, 95]
  – circular correlation/convolution
  – fixed encoding/decoding with fixed dimensionality (but potential loss of information)
  – necessity of chunking or clean-up for decoding
• binary spatter codes [Kanerva, 96]
  – binary operations, fixed dimensionality, potential loss of information
  – necessity of chunking or clean-up for decoding
• RAAM [Pollack, 90], LRAAM [Sperduti, 94]
  – trainable networks, trained for the identity, fixed dimensionality
  – encoding optimized for the given training set


Slide 45

The general idea: recursive distributed representations

Nevertheless: results are not promising.

Theorem [Hammer]:
• There exists a fixed-size neural network which can uniquely encode tree structures of arbitrary depth with discrete labels.
• For every code, decoding of all trees up to height T requires Ω(2^T) neurons for sigmoidal networks.

⇒ encoding seems possible, but no fixed-size architecture exists for decoding

Slide 46

Tutorial Outline (Part I): Neural networks and structured knowledge

• Feedforward networks
  – The good old days: KBANN and co.
  – Useful: neurofuzzy systems, data mining pipeline
  – State of the art: structure kernels
• Recurrent networks
  – The basics: partially recurrent networks
  – Lots of theory: principled capacity and limitations
  – To do: challenges
• Recursive data structures
  – The general idea: recursive distributed representations
  – One breakthrough: recursive networks
  – Going on: towards more complex structures

Slide 47

One breakthrough: recursive networks

Recursive networks [Goller/Küchler, 96]:
• do not use decoding
• combine encoding and the mapping to the output
• train this combination directly for the given task with backpropagation through structure

⇒ an efficient, data- and problem-adapted encoding is learned

[Figure: label and the children's contexts feed the encoding part; a transformation part maps the resulting context to the output y.]

Slide 48

One breakthrough: recursive networks

Applications:
• term classification [Goller, Küchler, 1996]
• automated theorem proving [Goller, 1997]
• learning tree automata [Küchler, 1998]
• QSAR/QSPR problems [Schmitt, Goller, 1998; Bianucci, Micheli, Sperduti, Starita, 2000; Vullo, Frasconi, 2003]
• logo recognition, image processing [Costa, Frasconi, Soda, 1999; Bianchini et al., 2005]
• natural language parsing [Costa, Frasconi, Sturt, Lombardo, Soda, 2000, 2005]
• document classification [Diligenti, Frasconi, Gori, 2001]
• fingerprint classification [Yao, Marcialis, Roli, Frasconi, Pontil, 2001]
• prediction of contact maps [Baldi, Frasconi, Pollastri, Vullo, 2002]
• protein secondary structure prediction [Frasconi et al., 2005]
• …

Slide 49

One breakthrough: recursive networks

Desired: approximation completeness, i.e. for every (reasonable) function f and every ε > 0 there exists a RecNN which approximates f up to ε (with respect to an appropriate distance measure).

Approximation properties can be measured in several ways: given f, ε, a probability measure P, and data points x_i, find f_w such that
• P(x : |f(x) − f_w(x)| > ε) is small (L1 norm), or
• |f(x) − f_w(x)| < ε for all x (max norm), or
• f(x_i) = f_w(x_i) for all x_i (interpolation of the points).

Slide 50

One breakthrough: recursive networks

Approximation properties for RecNNs and tree-structured data. RecNNs are
• capable of approximating every continuous function in the max norm for restricted height, and every measurable function in the L1 norm (σ a squashing function) [Hammer]
• capable of interpolating every set {f(x_1), …, f(x_m)} with O(m^2) neurons (σ squashing and C^2 in a neighborhood of some t with σ''(t) ≠ 0) [Hammer]
• able to approximate every tree automaton for arbitrarily large inputs [Küchler]
• unable to approximate every f : {1}^{2*} → {0,1} (for realistic σ) [Hammer]

⇒ fairly good results: 3:1

Slide 51

Tutorial Outline (Part I): Neural networks and structured knowledge

• Feedforward networks
  – The good old days: KBANN and co.
  – Useful: neurofuzzy systems, data mining pipeline
  – State of the art: structure kernels
• Recurrent networks
  – The basics: partially recurrent networks
  – Lots of theory: principled capacity and limitations
  – To do: challenges
• Recursive data structures
  – The general idea: recursive distributed representations
  – One breakthrough: recursive networks
  – Going on: towards more complex structures

Slide 52

Going on: towards more complex structures

More general trees: an arbitrary number of unordered (non-positional) children.

f_enc(v) = f( (1/|ch(v)|) · Σ_{c ∈ ch(v)} w · f_enc(c) · label_{edge(v,c)}, label(v) )

Approximation complete for appropriate edge labels [Bianchini et al., 2005].
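A sketch of this child-averaging encoding, under our reading of the formula above (illustrative names and untrained weights; edge labels enter as scalar weights on the children's codes):

```python
import numpy as np

def sgd(t):
    return 1.0 / (1.0 + np.exp(-t))

C, N = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(C, N + C))        # weights of f over (label, averaged context)

def f_enc(node):
    # node: (label, [(edge_label, child), ...]) with an arbitrary number of unordered children;
    # the children's codes are averaged (scaled by the edge labels) instead of filling
    # fixed positional slots
    label, children = node
    ctx = np.zeros(C)
    if children:
        ctx = sum(edge * f_enc(child) for edge, child in children) / len(children)
    return sgd(W @ np.concatenate([label, ctx]))

leaf = (np.array([0.0, 1.0, 0.0]), [])
root = (np.array([1.0, 0.0, 0.0]), [(1.0, leaf), (0.5, leaf), (2.0, leaf)])
print(f_enc(root))
```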

Slide 53

Going on: towards more complex structures

Planar graphs:

[Baldi, Frasconi, …, 2002]

Slide 54

Going on: towards more complex structures

Acyclic graphs:

[Figure: contexts are collected via shift operators q^{-1} (from already processed vertices) and q^{+1} (contextual information from the reverse direction).]

Contextual cascade correlation [Micheli, Sperduti, 03]. Approximation complete (under a mild structural restriction), even for structural transduction [Hammer, Micheli, Sperduti, 05].

Slide 55

Going on: towards more complex structures

Cyclic graphs: context is collected from the neighbors of each vertex [Micheli, 05].

Slide 56

Conclusions: recursive networks

• Very promising neural architectures for direct processing of tree structures

• Successful applications and mathematical background

• Connections to symbolic mechanisms (tree automata)

• Extensions to more complex structures (graphs) are under development

• Only a few approaches achieve structured outputs so far

Slide 57

Tutorial Outline (Part I): Neural networks and structured knowledge

• Feedforward networks
  – The good old days: KBANN and co.
  – Useful: neurofuzzy systems, data mining pipeline
  – State of the art: structure kernels
• Recurrent networks
  – The basics: partially recurrent networks
  – Lots of theory: principled capacity and limitations
  – To do: challenges
• Recursive data structures
  – The general idea: recursive distributed representations
  – One breakthrough: recursive networks
  – Going on: towards more complex structures

Slide 58

Conclusions (Part I)

Overview literature:
• FNNs and rules: Duch, Setiono, Zurada: Computational intelligence methods for understanding of data, Proc. of the IEEE 92(5):771-805, 2004
• Structure kernels: Gärtner, Lloyd, Flach: Kernels and distances for structured data, Machine Learning 57, 2004 (a new overview is forthcoming)
• RNNs: Hammer, Steil: Perspectives on learning with recurrent networks, in: Verleysen (ed.), ESANN'2002, D-side publications, 357-368, 2002
• RNNs and rules: Jacobsson: Rule extraction from recurrent neural networks: a taxonomy and review, Neural Computation 17:1223-1263, 2005
• Recursive representations: Hammer: Perspectives on Learning Symbolic Data with Connectionistic Systems, in: Kühn, Menzel, Menzel, Ratsch, Richter, Stamatescu (eds.), Adaptivity and Learning, 141-160, Springer, 2003
• Recursive networks: Frasconi, Gori, Sperduti: A General Framework for Adaptive Processing of Data Structures, IEEE Transactions on Neural Networks 9(5):768-786, 1998
• Neural networks and structures: Hammer, Jain: Neural methods for non-standard data, in: Verleysen (ed.), ESANN'2004, D-side publications, 281-292, 2004

Slide 59

Conclusions (Part I)

• There exist networks which can directly deal with structures (sequences, trees, graphs) with good success: kernel machines, recurrent and recursive networks

• Efficient training algorithms and theoretical foundations exist

• (Loose) connections to symbolic processing have been established and indicate benefits

Now: towards strong connections. Part II: Logic and neural networks.
