Tom M. Mitchell
E. Fredkin Professor and Department Head
March 2007
The Discipline and Future of Machine Learning
The Discipline of Machine Learning
The defining question:
• How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?
A process learns with respect to <T,P,E> if it
• Improves its performance P
• at task T
• through experience E
Machine Learning - Practice
Object recognition
Mining Databases
Speech Recognition
Control learning
Extracting facts from text
• Reinforcement learning
• Supervised learning
• Bayesian networks
• Hidden Markov models
• Unsupervised clustering
• Explanation-based learning
• ....
Machine Learning - Theory
PAC Learning Theory
# examples (m)
representational complexity (H)
error rate (ε)
failure probability (δ)
Other theories for
• Reinforcement skill learning
• Semi-supervised learning
• Active student querying
• …
… also relating:
• # of mistakes during learning
• convergence rate
• asymptotic performance
• bias, variance
• VC dimension
(for supervised concept learning)
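To make the relationship among these PAC quantities concrete, here is a minimal sketch. It assumes the textbook bound for a finite hypothesis space and a learner that outputs a hypothesis consistent with the training data, m ≥ (1/ε)(ln|H| + ln(1/δ)). The slide does not state which bound it has in mind, so the formula and the function name `pac_sample_bound` are assumptions for illustration.

```python
import math


def pac_sample_bound(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Number of examples sufficient for a consistent learner over a finite H.

    Assumed bound: m >= (1/epsilon) * (ln|H| + ln(1/delta)).
    """
    if hypothesis_count < 1:
        raise ValueError("hypothesis_count must be at least 1")
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    bound = (math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon
    return math.ceil(bound)


# |H| = 2**10 hypotheses, error rate 0.1, failure probability 0.05
print(pac_sample_bound(2 ** 10, 0.1, 0.05))  # 100
```

Under this bound, halving the error rate roughly doubles the number of examples, while the dependence on |H| and on the failure probability is only logarithmic.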
The Discipline of Machine Learning
Machine Learning:
• How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?
Computer Science:
• How can we build machines that solve problems, and which problems are inherently tractable/intractable?
Statistics:
• What can be learned from data with a set of modeling assumptions, while taking into account the data-collection process?
[Figure labels: Machine learning; Statistics; Computer science; Animal learning (Cognitive science, Psychology, Neuroscience); Adaptive Control Theory and Robotics; Evolution; Economics]
ML and CS
• Machine learning already the preferred approach to
– Speech recognition, Natural language processing
– Computer vision
– Medical outcomes analysis
– Many robot control problems
– …
• The ML niche will grow
– Why?
[Figure labels: All software; ML software]
ML and Empirical Sciences
• Empirical science is a learning process, subject to automation and to study
– improve performance P (accuracy)
– at task T (predict which gene knockouts will impact the aromatic AA pathway, and how)
– with experience E (active experimentation)
Functional genomic hypothesis generation and experimentation by a robot scientist, King et al., Nature, 427(6971), 247-252
Which protein ORFs influence which enzymes in the AAA pathway
Our current state:
• The problem of tabula-rasa function approximation is solved (in an 80-20 sense):
– Given:
• Class of hypotheses H = {h: X → Y}
• Labeled examples {<xi, f(xi)>}
– Determine:
• The h from H that best approximates f
• It’s time to move on
– Enrich the function approx problem definition
– Use function approx as building block
– Work on new problems
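The function approximation problem stated on this slide (given a hypothesis class H and labeled examples, pick the h in H that best approximates f) can be sketched as follows. The threshold-rule hypothesis class, the target function, and all names here are hypothetical, chosen only to keep the example small, and "best approximates" is read as "lowest error on the labeled examples", which is one common interpretation rather than the only one.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, int]          # (input x, label f(x))
Hypothesis = Callable[[float], int]  # h: X -> Y


def threshold_rule(t: float) -> Hypothesis:
    """Hypothetical hypothesis class: h_t(x) = 1 if x >= t, else 0."""
    return lambda x: 1 if x >= t else 0


def empirical_error(h: Hypothesis, examples: List[Example]) -> float:
    """Fraction of labeled examples on which h disagrees with the label."""
    return sum(h(x) != y for x, y in examples) / len(examples)


def best_threshold(thresholds: List[float], examples: List[Example]) -> float:
    """Return the threshold whose rule has the lowest training error."""
    return min(thresholds, key=lambda t: empirical_error(threshold_rule(t), examples))


def target(x: float) -> int:
    """Assumed target function f, unknown to the learner."""
    return 1 if x >= 0.5 else 0


examples = [(i / 10, target(i / 10)) for i in range(10)]
chosen = best_threshold([0.0, 0.25, 0.5, 0.75], examples)
print(chosen, empirical_error(threshold_rule(chosen), examples))  # 0.5 0.0
```

Enriching the problem definition, as the slide proposes, would mean changing what counts as experience or as the target, while a routine like `best_threshold` stays a reusable building block.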
Some Current Research Questions
• When/how can unlabeled data be useful in function approximation?
• How can assumed sparsity of relevant features be exploited in high dimensional nonparametric learning?
• How can information learned from one task be transferred to simplify learning another?
• What algorithms can learn control strategies from delayed rewards and other inputs?
• What are the best “active learning” strategies for different learning problems?
• To what degree can one preserve data privacy while obtaining the benefits of data mining?
The Future of Machine Learning
A Quick Look Back
[Timeline figure spanning 1960 to 2000. Labels in extraction order (their positions along the time axis are not recoverable): Samuel’s checker learner; Perceptrons; Winston’s symbolic concept learner; Rule learning; Decision tree learning; Neural networks; Explanation-based learning; Dimensionality reduction; Bayes nets; PAC learning theory; Architectures for learning and problem solving; Reinforcement learning; Semi-supervised learning; Non-parametric methods; Statistical perspective on learning; HMMs; SVMs; Theories of grammar induction; Large scale datamining; Speech applications; Robot control; Privacy preserving data mining; Transfer learning; Version Spaces; Theories of perceptron capacity and learnability]
Evolutionary and revolutionary changes
What might lead to the next revolution?
1. Use Machine Learning to help understand Human Learning (and vice versa)
Models of Learning Processes
Machine Learning:
• # of examples
• Error rate
• Reinforcement learning
• Explanations
• Learning from examples
• Complexity of learner’s representation
• Probability of success
• Prior probabilities
• Loss functions
Human Learning:
• # of examples
• Error rate
• Reinforcement learning
• Explanations
• Human supervision
– Lectures
– Questions, Homeworks
• Attention, motivation
• Skills vs. Principles
• Implicit vs. Explicit learning
• Memory, retention, forgetting
• Hebbian learning, consolidation
Reinforcement Learning
$V^*(s) = E[r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots]$
[Sutton and Barto 1981; Samuel 1957]
Observed immediate reward
Learned sum of future rewards
Reinforcement Learning in ML
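[Figure: chain of states S0, S1, S2, S3 with $\gamma = .9$; labels r = 100, 0, V = 72, V = 81, V = 90, V = 100]
$V(s_t) = E[r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots]$
$V(s_t) = E[r_t] + \gamma V(s_{t+1})$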
To learn V, use each transition to generate a training signal:
Reinforcement Learning in ML
• Variants of RL have been used for a variety of practical control learning problems – Temporal Difference learning– Q learning – Learning MDPs, POMDPs
• Theoretical results too– Assured convergence to optimal V(s) under certain conditions– Assured convergence for Q(s,a) under certain conditions
training error $= r_t + \gamma V(s_{t+1}) - V(s_t)$
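The training signal above, the error r_t + γV(s_{t+1}) − V(s_t) computed on each observed transition, is the core of temporal difference learning. Below is a minimal TD(0) sketch on a four-state chain like the one in the earlier figure. The reward placement (0 for the first three steps, 100 on the last step, followed by a zero-valued terminal state), the step size `alpha`, and the episode count are assumptions made for illustration; only γ = 0.9 and the general shape of the chain come from the slides.

```python
from typing import List


def td0_chain(rewards: List[float], gamma: float = 0.9,
              alpha: float = 0.5, episodes: int = 200) -> List[float]:
    """Learn V for a deterministic chain S0 -> S1 -> ... -> terminal with TD(0).

    rewards[s] is the reward observed when leaving state s.
    """
    n = len(rewards)
    values = [0.0] * (n + 1)  # last entry is the terminal state, fixed at 0
    for _ in range(episodes):
        for s in range(n):
            # training error = r_t + gamma * V(s_{t+1}) - V(s_t)
            td_error = rewards[s] + gamma * values[s + 1] - values[s]
            values[s] += alpha * td_error
    return values[:n]


learned = td0_chain([0.0, 0.0, 0.0, 100.0])
print([round(v, 2) for v in learned])  # [72.9, 81.0, 90.0, 100.0]
```

Under these assumptions the learned values approach 100, 90, 81 and 72.9 moving back from the rewarded state, which is consistent with the V labels in the figure.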
Dopamine As Reward Signal
[Schultz et al., Science, 1997]
error $= r_t + \gamma V(s_{t+1}) - V(s_t)$
RL Models for Human Learning
[Seymour et al., Nature 2004]
Human and Machine Learning
Additional overlaps:
• Learning of perceptual representations
– Dimensionality reduction methods, low level percepts
– Lewicki et al.: optimal sparse codes of natural scenes yield Gabor filters found in primate visual cortex. Similar result for auditory cortex.
• Learning with redundant sensory input
– CoTraining methods, Sensory redundancy hypothesis in development
– De Sa & Ballard; Coen: co-clustering voice/video yields phonemes
– Mitchell & Perfetti: co-training in second language learning
• Learning and explanations
– Explanation-based learning, teaching concepts & skills, chunking
– VanLehn et al.: explanation-based learning accounts for some human learning behaviors
– Chi: students learn best when forced to explain
– Newell; Anderson: chunking/knowledge-compilation models
2. Never-ending learning
Never-Ending Learning
Current machine learning systems:
• Learn one function
• Are shut down after they learn it
• Start from scratch when programmed to learn the next function
Let’s study and construct learning processes that:
• Learn many different things
• Formulate their own next learning task
• Use what they have already learned to help learn the next thing
Example: Never-ending learning robot
Imagine a robot with three goals: (1) avoid collisions, (2) recharge when battery low, and (3) find and collect trash
What is stopping us from giving it some trash examples, then letting it learn for a year?
What must it start with to formulate and solve relevant learning subtasks?
• Learn to recognize trash in scene
• Learn where to search for trash, and when
• Learn how close to get to find out whether trash is there
• Learn to manipulate trash
• Transfer what it learned about paper trash to help with bottle trash
• Discover relevant subcategories of trash (e.g., plastic versus glass bottles), and of other objects in the environment
Core Questions for Never-Ending Learning Agent
• What function or fact to learn next?
– Self-reflection on performance, credit assignment
• What representation for this target function or fact?
– Choice of input-output representation for target function
– E.g., “classify whether it’s trash”
• How to obtain (which type of) training experience?
– Primarily self-supervised, but occasional teacher input
– E.g., “classify whether it’s trash”
• Guided by what prior knowledge?
– Transfer learning, but transfer between what?
– X → PaperTrash help learn X → PlasticTrash?
– State(t) x Action(t) → State(t+1) help learn X → PlasticTrash?
Example: Never-ending language learner
Read the Web project: Create 24x7 web agent that each day:
• Extracts more facts from the web into structured database
• Learns to extract facts better than yesterday
Starting point:
• Ontology of hundreds of categories and relations
– and 6-10 training examples of each
• Never-ending learning architecture
– State of art language processing primitives
– Learning mechanisms
• Top level task:
– Populate a database of these categories and relations by reading the web, and improve continually
[Carlson, Cohen, Fahlman, Hong, Nyberg, Wang, ...]
Q: how can it obtain useful training experience (i.e., self-supervise)?
A: redundancy
Bootstrapping: Learning to extract named entities
I arrived in Pittsburgh on Saturday.
location?
x1: I arrived in _________ on Saturday.
x2: Pittsburgh
Bootstrap learning to extract named entities
[Riloff and Jones, 1999], [Collins and Singer, 1999], ...
Iterations
Initialization: Australia, Canada, China, England, France, Germany, Japan, Mexico, Switzerland, United_states
locations in ?x: South Africa, United Kingdom, Warrenton, Far_East, Oregon, Lexington, Europe, U.S._A., Eastern Canada, Blair, Southwestern_states, Texas, States, Singapore, …
operations in ?x: Thailand, Maine, production_control, northern_Los, New_Zealand, eastern_Europe, Americas, Michigan, New_Hampshire, Hungary, south_america, district, Latin_America, Florida, ...
republic of ?x: …...
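The bootstrapping loop illustrated above alternates between two steps: use known entities to find context patterns such as "locations in ?x", then use those patterns to find new entities. The sketch below is a deliberately naive version of that loop. The tiny corpus, the two-word-context pattern format, and the function names are all hypothetical, and it has no pattern scoring or filtering, which practical systems typically add.

```python
from typing import List, Set, Tuple

Pattern = Tuple[str, str]  # the two tokens that precede the ?x slot


def learn_patterns(sentences: List[List[str]], entities: Set[str]) -> Set[Pattern]:
    """Collect the two-word contexts that precede any known entity."""
    return {(tokens[i - 2], tokens[i - 1])
            for tokens in sentences
            for i in range(2, len(tokens))
            if tokens[i] in entities}


def apply_patterns(sentences: List[List[str]], patterns: Set[Pattern]) -> Set[str]:
    """Extract every token that follows a known pattern."""
    return {tokens[i]
            for tokens in sentences
            for i in range(2, len(tokens))
            if (tokens[i - 2], tokens[i - 1]) in patterns}


def bootstrap(sentences: List[List[str]], seeds: Set[str],
              rounds: int) -> Tuple[Set[str], Set[Pattern]]:
    """Alternate between pattern learning and entity extraction."""
    entities: Set[str] = set(seeds)
    patterns: Set[Pattern] = set()
    for _ in range(rounds):
        patterns |= learn_patterns(sentences, entities)
        entities |= apply_patterns(sentences, patterns)
    return entities, patterns


# Hypothetical corpus, already tokenized on whitespace.
corpus = [s.split() for s in [
    "we have locations in Canada",
    "new locations in Texas",
    "they run operations in Texas",
    "it has operations in Thailand",
    "it has operations in production_control",
]]
entities, patterns = bootstrap(corpus, {"Canada"}, rounds=2)
print(sorted(entities))  # ['Canada', 'Texas', 'Thailand', 'production_control']
```

The spurious "production_control" shows the same drift visible in the slide's "operations in ?x" column: without a check on pattern quality, one loose pattern admits non-locations, which then feed the next round.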
Co-Training
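[Figure: the sentence “I flew to New York today.” is split into two inputs, the phrase “New York” and the context “I flew to ____ today”. Classifier1 and Classifier2 each see one input and produce Answer1 and Answer2.]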
Idea: Train Classifier1 and Classifier2 to:
1. Correctly classify labeled examples
2. Agree on classification of unlabeled examples
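One simple way to approximate the two goals above is the iterative scheme in which each classifier labels unlabeled examples it is confident about, and those labels become training data for the other. The sketch below assumes two views per example (the noun phrase and its context) and uses a trivial majority-vote lookup as each classifier. The data, class names, and the rule "accept a label when at least one view predicts it and no view contradicts it" are illustrative assumptions, not the formulation analyzed in the cited papers.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Example = Tuple[str, str]  # (view 1: noun phrase, view 2: context)


class ViewClassifier:
    """Predicts the majority label seen so far for one view's feature value."""

    def __init__(self, view: int) -> None:
        self.view = view
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def fit(self, labeled: Dict[Example, str]) -> None:
        self.counts.clear()
        for example, label in labeled.items():
            self.counts[example[self.view]][label] += 1

    def predict(self, example: Example) -> Optional[str]:
        votes = self.counts.get(example[self.view])
        return votes.most_common(1)[0][0] if votes else None


def co_train(labeled: Dict[Example, str], unlabeled: List[Example],
             max_rounds: int = 10) -> Dict[Example, str]:
    """Grow the labeled set using whichever view recognizes each example."""
    labeled = dict(labeled)
    pool = list(unlabeled)
    classifiers = [ViewClassifier(0), ViewClassifier(1)]
    for _ in range(max_rounds):
        for clf in classifiers:
            clf.fit(labeled)
        newly_labeled: Dict[Example, str] = {}
        for example in pool:
            guesses = {clf.predict(example) for clf in classifiers} - {None}
            if len(guesses) == 1:  # some view is confident, none disagrees
                newly_labeled[example] = guesses.pop()
        if not newly_labeled:
            break
        labeled.update(newly_labeled)
        pool = [ex for ex in pool if ex not in newly_labeled]
    return labeled


seed = {("New York", "I flew to ____ today"): "location",
        ("Tom", "____ gave a talk"): "person"}
unlabeled = [("Pittsburgh", "I flew to ____ today"),
             ("Pittsburgh", "I arrived in ____ on Saturday"),
             ("Boston", "I arrived in ____ on Saturday"),
             ("Ann", "____ gave a talk")]
result = co_train(seed, unlabeled)
print(result[("Boston", "I arrived in ____ on Saturday")])  # location
```

"Boston" shares no feature with any seed example. It is labeled only because the context view first labels "Pittsburgh", the noun-phrase view then labels the "arrived in" context, and the context view finally carries that label to "Boston".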
Co-Training Theory [Blum&Mitchell 98; Dasgupta 04, ...]
[Figure: final accuracy in relation to # labeled examples, # unlabeled examples, number of redundant inputs, and conditional dependence among inputs]
want inputs less dependent, increased number of redundant inputs, …
CoTraining setting:
learn $f : X \to Y$
where $X = X_1 \times X_2$
where $x$ is drawn from an unknown distribution
and $\exists g_1, g_2 \; (\forall x) \; g_1(x_1) = g_2(x_2) = f(x)$
disagreement over unlabeled examples can bound true error
Example Bootstrap learning algorithms:
• Classifying web pages [Blum&Mitchell 98; Slattery 99]
• Classifying email [Kiritchenko&Matwin 01; Chan et al. 04]
• Named entity extraction [Collins&Singer 99; Jones 05]
• Wrapper induction [Muslea et al., 01; Mohapatra et al. 04]
• Word sense disambiguation [Yarowsky 96]
• Discovering new word senses [Pantel&Lin 02]
• Synonym discovery [Lin et al., 03]
• Relation extraction [Brin et al.; Yangarber et al. 00]
• Statistical parsing [Sarkar 01]
What is relation between “Elvis” and “January 8”?
Q: how can it choose next learning task?
A: self-reflect on where it is failing, then formulate learning task to repair failure
Some strategies for generating new tasks
• Collect more data from web
– To learn about specific entities (e.g., “Rolling Stones”)
– To learn meaning of particular language (e.g., “will attend”)
– To locate easy-to-extract facts (e.g., web pages with lists)
• Learn regularities from the populated KB
– “Most LTI office names are of the form ‘NSH dddd’”
• Explore specializations of ontological categories
– What distinguishes events occurring on CMU campus from those occurring elsewhere? Can this be predicted? What subsets of events warrant becoming categories?
• Explore specializations of language structures
– Which ‘location’ entities share surrounding language (e.g., “the city of ?x”)? Do they share other properties?
Some Types of Knowledge to Learn
• Linguistic regularities
– {“spoon”, “fork”, “chopsticks”} occur often in “eat with my ___”
– They’re instances of ontology class “eating implement”
• HTML layout regularities
– HTML lists often contain items of the same class
• Web site regularities
– University departments often have page listing all faculty
• Regularities over extracted facts
– ‘Professors typically have more publications than their advisees’
– ‘Professors typically received their BS degree before their advisees’
• Temporal stability
– Birthdays don’t change. Stock prices do.
Research Issues
• What target knowledge representation?
• How can initial ontology be extended?
• What types of self-reflection are required?
• Can one learn language without non-linguistic knowledge?
• How can we manage mapping between text tokens and non-text entities they describe?
• What curriculum for staging the learning?
• What active learning methods?
More Revolutionary Research Directions
• Can we design new kinds of computer programming languages with explicit learning primitives?
• Can we build robot scientists?
• What are the fundamental tradeoffs between computational efficiency and statistical efficiency?
• How can we build systems that learn from instruction, dialogs and problem sets, in addition to labeled examples?
• How can we unify machine learning theories and models with those from other fields studying adaptation, e.g., adaptive control theory, economics, evolution?
Summary
• Machine Learning research is (should be more) connected to understanding all learning processes
• Field is ripe for new revolutionary directions:
– Computational models for human learning
– Never-ending learners
– <your idea here>
Thank you!