© 2008 SRI International Systems Learning for Complex Pattern Problems Omid Madani AI Center, SRI...

© 2008 SRI International

Systems Learning for Complex Pattern Problems
Omid Madani
AI Center, SRI International


Foundations of Intelligence: Concepts (Categories)

• Intelligent systems categorize their perceptions (objects, events, relations)
• Categorization involves substantial abstraction: you rarely see the same exact thing again…
• Categorization is necessary for intelligence
• Categories are complex: have adaptive structure, composed of parts, of abstractions, …
• High intelligence (advanced animals) requires myriad categories

What are the principles behind such learning and development?

• Assumptions/Evidence: These (perceptual) categories are developed mainly in an unsupervised manner

– Doubtful they are all programmed in. Many are not (in particular, for humans)
– Explicit teacher is absent


Example Perceptual Concepts

• In text, every word, phrase, expression: “book”, “new”, “a”, …
• Single characters are primitive concepts: “a”, “b”, …, “1”, “2”, “;”, …
• Concepts can be composed of other concepts:
– “n” + “e” = “ne”
– “new” + “york” = “new york”
• Concepts can be abstractions:
– week-day = {Monday, Tuesday, …}
– Digits = {1, 2, 3, 4, …}
• Area code is a concept that involves both composing and abstraction:
– Composition of 3 digits
– A digit is a grouping, i.e., the set {0, 1, 2, …, 9} (2 is a digit)

• Other examples: phone number, address, resume page, face (in visual domain), etc.
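The two ways of building concepts described above, composition and abstraction, can be sketched in a few lines of Python. This is only an illustration under assumptions: the class names `Primitive`, `Abstraction`, and `Composition` and the `matches` method are hypothetical and do not come from the slides or from the author's system.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Primitive:
    """A primitive concept: a single fixed string such as "a" or "1"."""
    symbol: str

    def matches(self, text: str) -> bool:
        return text == self.symbol


@dataclass(frozen=True)
class Abstraction:
    """A grouping of concepts: matches if any member matches."""
    name: str
    members: Tuple[object, ...]

    def matches(self, text: str) -> bool:
        return any(m.matches(text) for m in self.members)


@dataclass(frozen=True)
class Composition:
    """A sequence of concepts: matches if the text splits into the parts in order."""
    name: str
    parts: Tuple[object, ...]

    def matches(self, text: str) -> bool:
        return _match_parts(self.parts, text)


def _match_parts(parts, text):
    # Try every non-empty prefix for the first part, then recurse on the rest.
    if not parts:
        return text == ""
    head, rest = parts[0], parts[1:]
    return any(
        head.matches(text[:i]) and _match_parts(rest, text[i:])
        for i in range(1, len(text) + 1)
    )


# Abstraction: a digit is the set {0, 1, ..., 9}.
digit = Abstraction("digit", tuple(Primitive(str(d)) for d in range(10)))
# Composition: "n" + "e" = "ne".
ne = Composition("ne", (Primitive("n"), Primitive("e")))
# Both at once: an area code is a composition of 3 digits.
area_code = Composition("area code", (digit, digit, digit))

print(digit.matches("2"), ne.matches("ne"), area_code.matches("650"))
```

Here the concepts are written by hand; in the setting of the talk they would have to be acquired by the system itself.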


Acquiring and Developing Concepts

• Higher intelligence, such as “advanced” pattern recognition/generation (e.g. vision), may require

– Long term learning (weeks, months, years, …)
– Cumulative learning (learn these first, then these, then these, …)
– Massive learning: myriad inter-related categories/concepts
– Systems learning: multiple algorithms working together
– Autonomy (relatively little human involvement)

What are the learning processes?


Applications: learning to segment words in a speech stream in any language, visual object recognition, learning to play Go/Chess


Learning by Repeatedly Predicting in a Rich World

• In a nutshell, we seek a system such that:

[Figure: a prediction system reads an input stream (…. 0011101110000 ….) in a repeated "predict, observe & update" loop. At first it works with low-level or “hard-wired” categories (input, say, text: characters, … or vision: edges, curves, …). After a while (much learning), the prediction system works with higher-level categories, i.e. bigger chunks (e.g. words, digits, phrases, phone numbers, faces, visual objects, home pages, sites, …).]

(Prediction Games in Infinitely Rich Worlds, AAAI FSS07)
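The predict, observe & update cycle can be illustrated with a very small online predictor. The sketch below is an assumption made for illustration only: it counts which symbol followed each fixed-length context and predicts the most frequent one. The names `ContextCounter` and `play` are hypothetical, and the sketch does not build higher-level categories the way the system in the talk is meant to.

```python
from collections import defaultdict


class ContextCounter:
    """Predicts the next symbol from counts of what followed each context."""

    def __init__(self):
        # context string -> {next symbol -> count}
        self.counts = defaultdict(lambda: defaultdict(int))

    def predict(self, context):
        seen = self.counts.get(context)
        if not seen:
            return None  # nothing observed yet for this context
        return max(seen, key=seen.get)

    def update(self, context, symbol):
        self.counts[context][symbol] += 1


def play(stream, order=2):
    """Run the predict / observe / update loop; return (correct, total)."""
    model = ContextCounter()
    correct = 0
    for t in range(order, len(stream)):
        context = stream[t - order:t]
        guess = model.predict(context)   # predict
        actual = stream[t]               # observe
        correct += guess == actual
        model.update(context, actual)    # update
    return correct, len(stream) - order


correct, total = play("01" * 50, order=2)
print(correct, total)
```

On this perfectly periodic stream the predictor is wrong only the first time it meets each of the two contexts, which is the "after a while (much learning)" effect in miniature.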


Example Category Node (processed Jane Austen’s online books)

[Figure: an example category node. Category strings shown: “ther ”, “and ”, “heart”, “love ”, “by ”, “ bro”, “ far”, “toge”, “nei”. Figure labels: "prediction weights", "categories appearing before", "(keep local statistics)". Values shown: 0.087, 0.07, 0.057, 0.052, 0.13, 0.11, 0.10, and 7.1, 0.41.]

(Exploring Massive Learning via a Prediction System, AAAI FSS’07)


Some Challenges or Features of the Task

• Lots of
– features/predictors (input dimensionality)
– classes (output dimensionality)
– instances (episodes)
• Uncertainty in the value of features, classes, adequate segmentation, …
– No one segments them for us! (what about written language?)
• Require algorithms that are primarily:
– incremental, handle nonstationarities, uncertainty, asymptotic convergence, efficient sample complexity
• Objectives and evaluation criteria?


Many-Class Learning (.. A Wiring Problem)

• The questions raised during this research:
1. Given the need to quickly classify (a given instance) into one of myriad classes (e.g. millions), how can this be done?
  1. How about space efficiency?
2. How can we efficiently learn such efficient classification systems?

[Figure: many-class learning produces a classification system that maps an instance $x \in \mathbb{R}^n$ to a class (“?”).]


A Solution: Index Learning

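[Figure: Input: a tripartite graph (features, instances, categories). Learning produces the output: an index = a sparse weighted bipartite graph (features, categories), in which the edge from feature $f_i$ to category $c_j$ carries weight $w_{ij}$. Equivalently, the output is a (sparse) matrix $W$ with entries $w_{ij}$, most of them 0.]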
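One possible way to hold such an index in memory is a dictionary of dictionaries that stores only the nonzero weights. The sketch below is an assumption for illustration: the function names `build_index` and `to_dense`, the `min_weight` threshold, and the example weights are all hypothetical, and how the weights are learned is not shown here.

```python
from collections import defaultdict


def build_index(edges, min_weight=0.0):
    """Build a sparse index {feature: {category: weight}} from weighted edges.

    Edges whose weight is not above min_weight are dropped, which keeps
    the index sparse (the threshold is an illustrative assumption).
    """
    index = defaultdict(dict)
    for feature, category, weight in edges:
        if weight > min_weight:
            index[feature][category] = weight
    return dict(index)


def to_dense(index, features, categories):
    """Expand the index into the full matrix W, with 0.0 for missing edges."""
    return [[index.get(f, {}).get(c, 0.0) for c in categories] for f in features]


# Made-up edges (feature, category, weight) for illustration.
edges = [("f1", "c1", 0.6), ("f2", "c3", 0.4), ("f2", "c4", 0.3)]
index = build_index(edges)

W = to_dense(index, ["f1", "f2", "f3"], ["c1", "c2", "c3", "c4"])
for row in W:
    print(row)
```

The dense form is shown only to make the correspondence with the matrix $W$ explicit; with millions of classes only the sparse form would be practical.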


Classification/Prediction (retrieval & scoring)

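[Figure: a bipartite graph from features f1, f2, f3, f4 to classes c1, c2, c3, c4, c5, shown for the instance $x = \{f_2, f_3\}$; edge weights shown: 0.4, 0.3, 0.2, 0.1, 0.1.]

1. Features are “activated”
2. Edges are activated
3. Receiving classes are activated
4. Classes sorted/ranked

Sorted list: $(c_4, 0.5), (c_3, 0.4), (c_5, 0.1), (c_1, 0.1)$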

see omadani.net for the learning algorithms
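The four retrieval and scoring steps can be sketched as follows, assuming the index is stored as a dictionary from features to weighted class edges and that a class's score is the sum of the weights on its activated edges. The function name `rank_classes`, the summation rule, and the particular wiring and weights below are assumptions made up for illustration, not the slide's actual graph.

```python
def rank_classes(index, active_features):
    """Score and rank classes for an instance given as a set of active features."""
    scores = {}
    for feature in active_features:                       # 1. features are activated
        for cls, weight in index.get(feature, {}).items():  # 2. edges are activated
            scores[cls] = scores.get(cls, 0.0) + weight   # 3. receiving classes are activated
    # 4. classes sorted/ranked: highest score first, ties broken by name
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical index: feature -> {class: weight}.
index = {
    "f1": {"c1": 0.7, "c2": 0.3},
    "f2": {"c3": 0.4, "c4": 0.3},
    "f3": {"c4": 0.2, "c5": 0.1, "c1": 0.1},
    "f4": {"c5": 0.9},
}

ranked = rank_classes(index, {"f2", "f3"})
for cls, score in ranked:
    print(cls, round(score, 2))
```

Only the edges of active features are touched, so the cost depends on the number of activated edges rather than on the total number of classes.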


Summary

• Encouraging signs that elements of unsupervised (more “autonomous”) long-term learning systems are developing:
– For instance, efficient many-class learning is a good possibility
– Good progress in machine learning (e.g. some evidence that hierarchical networks are useful)
• Our work stresses large-scale and long-term learning
– A “systems” approach (compared to traditional neural network approaches): we need to solve multiple problems, and this requires multiple algorithms
– Many challenges:
  – Uncertainties (e.g. feature noise and label noise)
  – Nonstationarities (concepts evolve, the system evolves and develops)
  – System objective(s)?
  – Avoiding accumulation of error, local minima, slow learning
  – Understanding the interaction between different modules (segmentation and concept learning, etc.)
• Driven by the goal of robustly solving practical problems (versus driven by “modeling” the brain), but problems that we think intelligence in the biological world solves.


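Expedition (a 1st System)

[Figure: a window slides over the text “… New Jersey in …”. The window contains the context and the target: the target is the category to predict, and the predictors are the active categories in the context. At the next time step the window advances, giving new predictors and a new target.]

In this example, context contains one category on each side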


.. Some Time Later ..

[Figure: the window over the text “… loves New York life …”, containing the context (predictors) and the target (category to predict).]

In terms of supervised learning/classification, in this learning activity (prediction games):
• The set of concepts grows over time
• Same for features/predictors (concepts ARE the predictors!)
• Instance representation (segmentation of the data stream) changes/grows over time ..
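To make the window of context and target concrete, the sketch below turns a segmented stream into (predictors, target) episodes with a fixed number of context categories on each side. The function name `prediction_episodes` and the two segmentations are assumptions for illustration; how the system decides on a segmentation is not modeled.

```python
def prediction_episodes(categories, context=1):
    """Yield (predictors, target) pairs from a list of segmented categories.

    The predictors are the `context` categories on each side of the target.
    """
    for t in range(context, len(categories) - context):
        left = categories[t - context:t]
        right = categories[t + 1:t + 1 + context]
        yield left + right, categories[t]


# Early on: a hypothetical segmentation in which "New" and "York" are separate categories.
early = ["loves", "New", "York", "life"]
print(list(prediction_episodes(early)))

# Some time later: a hypothetical segmentation in which "New York" has become one category.
later = ["loves", "New York", "life"]
print(list(prediction_episodes(later)))
```

The same text yields different instances once a larger category exists, which is the sense in which both the classes and the instance representation change over time.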


On Learning a Task (or a Dilemma of AI!)

Program It!

Learn It!

Program to Learn It!

Program to Learn to Learn It!...


A View of ML: On the Source of Classes
(A Spectrum of Feedback-Driven (“Supervised”) Learning)

Along the spectrum: more machine autonomy (less human involvement), more noise/uncertainty, more training data, more classes. More open problems! More interesting!

• Classic supervised learning
1. Human defined
2. Human/explicitly assigned (human procures training data)
– Examples: annotator/editorial label assignment (Reuters RCV1, ODP, …), controlled image tagging, ~Mechanical Turk, explicit personalization (news filtering, spam, …)

• Classes defined by humans, labels assigned implicitly
1. Human defined
2. Implicitly assigned (by the “world” or a “natural” activity, or by machine)
– Examples: predict a word using context in text, the Newsgroup data set, image tagging in Flickr, users as classes, queries as classes, predict clicks, …

• Autonomous learning systems (systems acquiring and developing their own concepts, prediction games, complex sensory input streams, cumulative learning, life-long learning, development, ...)
1. Machine defined
2. Implicitly assigned (by the “world” or a “natural” activity/machine)


Summary

See omadani.net/publications.html
