Practical Probabilistic Relational Learning Sriraam Natarajan.
Practical Probabilistic Relational Learning
Sriraam Natarajan
Take-Away Message
Learn from rich, highly structured data!
Traditional Learning
Data = attributes (features); data is i.i.d.
(Figure: a table of i.i.d. binary training examples over the attributes B, E, A, M, J is fed to learning, which produces the Burglary network: Burglary and Earthquake are parents of Alarm, which is a parent of MaryCalls and JohnCalls, with a learned conditional probability table at each node.)
Real-World Problem: Predicting Adverse Drug Reactions

Patient Table
  PatientID  Gender  Birthdate
  P1         M       3/22/63

Visit Table
  PatientID  Date    Physician  Symptoms      Diagnosis
  P1         1/1/01  Smith      palpitations  hypoglycemic
  P1         2/1/03  Jones      fever, aches  influenza

Lab Tests
  PatientID  Date    Lab Test       Result
  P1         1/1/01  blood glucose  42
  P1         1/9/01  blood glucose  45

SNP Table
  PatientID  SNP1  SNP2  …  SNP500K
  P1         AA    AB       BB
  P2         AB    BB       AA

Prescriptions
  PatientID  Date Prescribed  Date Filled  Physician  Medication  Dose  Duration
  P1         5/17/98          5/18/98      Jones      prilosec    10mg  3 months
Logic + Probability = Probabilistic Logic aka Statistical Relational Learning Models
(Figure: adding probabilities to logic, and relations to probabilities, yields Statistical Relational Learning (SRL).)
• Several SRL workshops in the past decade; this year: StaRAI @ AAAI 2013
(Figure: a map of fields along three axes: propositional vs. first-order, deterministic vs. stochastic, and learning vs. no learning. Propositional logic, first-order logic, probability theory, and probabilistic logic sit on the no-learning side; propositional rule learning, classical machine learning, inductive logic programming, and statistical relational learning sit on the learning side.)
Costs and Benefits of the SRL Soup
Benefits:
• Rich pool of different languages
• Very likely there is a language that fits your task at hand well
• A lot of research remains to be done ;-)
Costs:
• "Learning" SRL is much harder
• Not all frameworks support all kinds of inference and learning settings
How do we actually learn relational models from data?
Why is this problem hard?
• Non-convex problem
• Repeated search over parameters for every step in the induction of the model
• First-order logic allows for different levels of generalization
• Repeated inference for every step of parameter learning; inference is #P-complete
How can we scale this?
Relational Probability Trees
• Each conditional probability distribution can be learned as a tree
• Leaves are probabilities
• The final model is the set of relational regression trees (RRTs)
(Figure: a relational probability tree for predicting heartAttack(X). The root tests male(X); inner nodes test chol(X,Y,L) with Y > 40 and L > 200, diag(X, Hypertension, Z) with Z > 55, and bmi(X,W,55) with W > 30; the leaves hold the probabilities 0.8, 0.77, 0.3, and 0.05.) [Blockeel & De Raedt '98]
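A tree like the one above can be evaluated by walking it from the root and checking each relational test, i.e. an existentially quantified conjunction, against a fact base. A minimal sketch follows; the fact base, the patient identifiers, and in particular the attachment of leaves to branches are my illustrative reading of the slide, not the exact learned tree:

```python
# Ground atoms stored as tuples; the tree's inner nodes are existential
# tests over these facts, in the style of Blockeel & De Raedt's trees.
facts = {
    ("male", "p1"),
    ("chol", "p1", 45, 240),              # chol(X, Y, L): age Y, cholesterol level L
    ("diag", "p1", "Hypertension", 60),   # diag(X, D, Z): diagnosis D at age Z
}

def p_heart_attack(x):
    """Walk one possible reading of the slide's tree for heartAttack(X);
    leaf probabilities are the values shown on the slide."""
    if ("male", x) in facts:
        # test: exists Y, L with chol(X, Y, L), Y > 40, L > 200
        if any(f[0] == "chol" and f[1] == x and f[2] > 40 and f[3] > 200
               for f in facts):
            return 0.8
        return 0.3
    # test: exists Z with diag(X, Hypertension, Z), Z > 55
    if any(f[0] == "diag" and f[1] == x and f[2] == "Hypertension" and f[3] > 55
           for f in facts):
        return 0.77
    return 0.05
```

For patient p1 (male, with a qualifying cholesterol record) this reaches the 0.8 leaf, while an unknown patient falls through to the 0.05 leaf.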
Gradient (Tree) Boosting [Friedman Annals of Statistics 29(5):1189-1232, 2001]
• Model = weighted combination of a large number of small trees (models)
• Intuition: generate an additive model by sequentially fitting small trees to the pseudo-residuals of a regression at each iteration
(Figure: at each iteration, the current model's predictions are compared to the data under a loss function to give residuals; a new tree is induced on those residuals and added to the model; iterating yields a final model that is the sum of the initial model and all induced trees.)
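The iterate-on-residuals loop above can be sketched in a few lines. This is a minimal squared-error version with depth-1 trees (stumps) on a single feature, so the pseudo-residuals are simply data minus current prediction; names like `n_rounds` and `lr` are illustrative, not from the talk:

```python
import numpy as np

def fit_stump(x, residual):
    """Fit a depth-1 regression tree on one feature by minimizing
    squared error over candidate split points."""
    best = None
    for t in np.unique(x):
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lval, rval = best
    return lambda q: np.where(q <= t, lval, rval)

def boost(x, y, n_rounds=50, lr=0.1):
    """Additive model: start from the mean, then sequentially fit small
    trees to the pseudo-residuals (y - prediction, for squared loss)."""
    f0 = y.mean()
    pred = np.full_like(y, f0, dtype=float)
    trees = []
    for _ in range(n_rounds):
        tree = fit_stump(x, y - pred)    # induce a tree on the residuals
        pred = pred + lr * tree(x)       # add it to the current model
        trees.append(tree)
    return lambda q: f0 + lr * sum(t(q) for t in trees)

# toy 1-D step function
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
model = boost(x, y)
```

The relational version in the talk replaces the stump with a relational regression tree and the numeric residuals with functional gradients of the log-likelihood, but the additive loop is the same.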
Boosting Results (MLJ '11)

  Algorithm  Likelihood  AUC-ROC  AUC-PR  Time
  Boosting   0.810       0.961    0.930   9 s
  MLN        0.730       0.535    0.621   93 hrs
Tasks: predicting the advisor for a student, movie recommendation, citation analysis, machine reading
Other Applications
• Similar results on several other problems
• Imitation learning: learning how to act from demonstrations (Natarajan et al., IJCAI '11), on RoboCup, a grid-world domain, a traffic-signal domain, and blocks world
• Prediction of CAC levels: predicting cardiovascular risk in young adults (Natarajan et al., IAAI '13)
• Prediction of heart attacks (Weiss et al., IAAI '12; AI Magazine '12)
• Prediction of the onset of Alzheimer's (Natarajan et al., ICMLA '12; Natarajan et al., IJMLC 2013)
Parallel Lifted Learning
• Stochastic ML: scales well, stochastic gradients, online learning, …
• Statistical relational: symmetries, compact models, lifted inference, …
Symmetry-Based Inference
(Figure: several isomorphic five-node graphs, illustrating the symmetries that lifted inference exploits.)
(Figure: from ground clauses to a variabilized tree.)
Root clause: P(Anna), !P(Bob)
Neighboring clauses:
  P(Anna) => !HI(Bob)
  P(Anna) => HI(Anna)
  P(Bob) => HI(Bob)
  P(Bob) => !HI(Anna)
Tree (set of clauses): P(Anna), !P(Bob), P(Bob) => HI(Bob), P(Bob) => !HI(Anna)
Variabilized tree: P(X), !P(Y), P(Y) => HI(Y), P(Y) => !HI(X)
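The variabilization step, turning the ground tree over Anna and Bob into the first-order template over X and Y, amounts to replacing each distinct constant with a fresh logical variable. A minimal sketch (clause representation and variable names are illustrative):

```python
def variabilize(clauses):
    """Replace each distinct constant with a fresh variable, so a tree
    learned on ground atoms becomes a first-order template."""
    mapping, names = {}, iter("XYZUVW")
    def var(c):
        if c not in mapping:
            mapping[c] = next(names)   # first unseen constant -> X, next -> Y, ...
        return mapping[c]
    return [(pred, tuple(var(a) for a in args)) for pred, args in clauses]

# the slide's example: P(Anna), HI(Bob) becomes P(X), HI(Y)
ground = [("P", ("Anna",)), ("HI", ("Bob",))]
lifted = variabilize(ground)
```

Repeated constants map to the same variable, which is what lets the lifted template match any symmetric pair of individuals.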
Lifted Training (a cycle)
• Generate initial tree pieces and variabilize their arguments
• Randomly draw mini-batches
• Generate tree pieces from the corresponding patterns
• Compute the gradient using lifted BP
• Update the covariance matrix C, or some low-rank variant
• Update the parameter vector and the corresponding equations
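The cycle above can be sketched as a training-loop skeleton. Everything concrete here is an illustrative assumption: `lifted_bp_gradient` is a stub standing in for the gradient that lifted belief propagation would compute over the variabilized tree pieces, and the toy objective is a simple quadratic so the loop has something to converge on:

```python
import numpy as np

# Hypothetical target parameters for the toy objective 0.5 * ||theta - target||^2.
target = np.array([1.0, -2.0, 0.5])

def lifted_bp_gradient(theta, batch):
    """Stub: the real system would run lifted BP on the mini-batch's
    variabilized tree pieces; here it is the toy objective's gradient."""
    return theta - target

def lifted_train(data, n_iters=200, batch_size=4, lr=0.1):
    theta = np.zeros_like(target)        # parameter vector
    C = np.eye(len(target))              # covariance matrix (or a low-rank variant)
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        # randomly draw a mini-batch
        idx = rng.choice(len(data), size=min(batch_size, len(data)), replace=False)
        grad = lifted_bp_gradient(theta, [data[i] for i in idx])
        # update the covariance estimate (here: a running second-moment stub)
        C = 0.9 * C + 0.1 * np.outer(grad, grad)
        # update the parameter vector
        theta = theta - lr * grad
    return theta

theta = lifted_train(list(range(20)))
```

Only the loop structure mirrors the slide; the gradient, objective, and covariance update rule are placeholders.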
Challenges
• Message schedules: iterative map-reduce?
• How do we take this idea to learning the models?
• How can we more efficiently parallelize symmetry identification?
• What are the compelling problems? Vision, NLP, …
Conclusion
• The world is inherently relational and uncertain
• SRL has developed into an exciting field in the past decade, with several previous SRL workshops
• Boosting relational models has promising initial results and has been applied to several different problems; it is a first scalable relational learning algorithm
• How can we parallelize/scale this algorithm? Can it benefit from an inference algorithm, like belief propagation, that can be parallelized easily?