Applications of Probabilistic Logic to Materials Discovery: Solving problems with hard and soft...
-
Upload
adolfo-neto -
Category
Education
-
view
210 -
download
0
description
Transcript of Applications of Probabilistic Logic to Materials Discovery: Solving problems with hard and soft...
Ignorance Motivation PSAT oPSAT Application Conclusion
Applications of Probabilistic Logic to
Materials Discovery
Solving problems with hard and soft constraints
Marcelo Finger
Department of Computer ScienceInstitute of Mathematics and Statistics
University of Sao Paulo, Brazil
Nov 2013
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Topics
1 What You Probably Don’t Know
2 Pragmatic Motivation
3 Probabilistic Satisfiability
4 Optimizing Probability Distributions with oPSAT
5 oPSAT and Combinatorial Materials Discovery
6 Conclusions
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Next Topic
1 What You Probably Don’t Know
2 Pragmatic Motivation
3 Probabilistic Satisfiability
4 Optimizing Probability Distributions with oPSAT
5 oPSAT and Combinatorial Materials Discovery
6 Conclusions
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
My Anguish
How to Lead My Career?
Basic Research
Application Oriented Research
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
My Anguish
How to Lead My Career?
Basic Research
Advantages: Does not require a lot of resources; there is acommunity to talk to; can be beautiful
Application Oriented Research
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
My Anguish
How to Lead My Career?
Basic Research
Advantages: Does not require a lot of resources; there is acommunity to talk to; can be beautifulProblems: Beauty is hard to achieve; boring, not interestingto most; attracts little money
Application Oriented Research
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
My Anguish
How to Lead My Career?
Basic Research
Advantages: Does not require a lot of resources; there is acommunity to talk to; can be beautifulProblems: Beauty is hard to achieve; boring, not interestingto most; attracts little money
Application Oriented Research
Advantages: Sexy, attracts visibility and money
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
My Anguish
How to Lead My Career?
Basic Research
Advantages: Does not require a lot of resources; there is acommunity to talk to; can be beautifulProblems: Beauty is hard to achieve; boring, not interestingto most; attracts little money
Application Oriented Research
Advantages: Sexy, attracts visibility and moneyProblems: Very hard to obtain data (and money is a must!),easily obsolescent
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
My Anguish
How to Lead My Career?
Basic Research
Advantages: Does not require a lot of resources; there is acommunity to talk to; can be beautifulProblems: Beauty is hard to achieve; boring, not interestingto most; attracts little money
Application Oriented Research
Advantages: Sexy, attracts visibility and moneyProblems: Very hard to obtain data (and money is a must!),easily obsolescent
No answer, but keep this in mind
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
What you most certainly don’t know
Combinatorial Chemistry: discovery of new products
Lots of experimentsEach has a different settingsSearch for the right setting that produces an interestingproductVery useful to find new medicine, paint, adhesives, etc.
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Combinatorial Materials Science
Search for new materials.
Aim: new catalysts for the purification of water and air
Buzz word: Sustainability
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Combinatorial Materials Science
Search for new materials.
Aim: new catalysts for the purification of water and air
Buzz word: Sustainability
This work was developed within the
Institute of Computational Sustainability
at the Department of Computer Science, Cornell University
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
The Materials Discovery Problem
In a reaction, where are the materials formed?
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
The Crazy Idea
Use Probabilistic Logic to solve this problem
Consider only peaks
Each peak is in a layer
Connect all peaks in the same layer with a graph
Model graphs with propositional logic
Each disconnected peak is a defect
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
The Crazy Idea
Use Probabilistic Logic to solve this problem
Consider only peaks
Each peak is in a layer
Connect all peaks in the same layer with a graph
Model graphs with propositional logic
Each disconnected peak is a defect
Different models, different graphs, different defects
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
The Crazy Idea
Use Probabilistic Logic to solve this problem
Consider only peaks
Each peak is in a layer
Connect all peaks in the same layer with a graph
Model graphs with propositional logic
Each disconnected peak is a defect
Different models, different graphs, different defects
Limit the probability of the occurrence of defects
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Challenges: Only for the tough
Logic
Probability
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Challenges: Only for the tough
Logic
Probability
Linear Algebra
Linear Programming
Optimization
Algorithms
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Student Profile
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Next Topic
1 What You Probably Don’t Know
2 Pragmatic Motivation
3 Probabilistic Satisfiability
4 Optimizing Probability Distributions with oPSAT
5 oPSAT and Combinatorial Materials Discovery
6 Conclusions
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Pragmatic Motivation
Practical problems combine real-world hard constraints withsoft constraints
Soft constraints: preferences, uncertainties, flexiblerequirements
We explore probabilistic logic as a mean of dealing withcombined soft and hard constraints
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Goals
Aim: Combine Logic and Probabilistic reasoning to dealwith Hard (L) and Soft (P) constraints
Method: develop optimized Probabilistic Satisfiability(oPSAT)
Application: Demonstrate effectiveness on a real-worldreasoning task in the domain of Materials Discovery.
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
An Example
Summer course enrollment
m students and k summer courses.Potential team mates, to develop coursework. Constraints:
Hard Coursework to be done alone or in pairs.Students must enroll in at least one and at mostthree courses.There is a limit of ℓ students per course.
Soft Avoid having students with no teammate.
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
An Example
Summer course enrollment
m students and k summer courses.Potential team mates, to develop coursework. Constraints:
Hard Coursework to be done alone or in pairs.Students must enroll in at least one and at mostthree courses.There is a limit of ℓ students per course.
Soft Avoid having students with no teammate.
In our framework: P(student with no team mate) “minimal” orbounded
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Combining Logic and Probability
Many proposals in the literature
Markov Logic Networks [Richardson & Domingos 2006]Probabilistic Inductive Logic Prog [De Raedt et. al 2008]Relational Models [Friedman et al 1999], etc
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Combining Logic and Probability
Many proposals in the literature
Markov Logic Networks [Richardson & Domingos 2006]Probabilistic Inductive Logic Prog [De Raedt et. al 2008]Relational Models [Friedman et al 1999], etc
Our choice: Probabilistic Satisfiability (PSAT)
Natural extension of Boolean LogicDesirable properties, e.g. respects Kolmogorov axiomsProbabilistic reasoning free of independence presuppositions
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Combining Logic and Probability
Many proposals in the literature
Markov Logic Networks [Richardson & Domingos 2006]Probabilistic Inductive Logic Prog [De Raedt et. al 2008]Relational Models [Friedman et al 1999], etc
Our choice: Probabilistic Satisfiability (PSAT)
Natural extension of Boolean LogicDesirable properties, e.g. respects Kolmogorov axiomsProbabilistic reasoning free of independence presuppositions
What is PSAT?
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Next Topic
1 What You Probably Don’t Know
2 Pragmatic Motivation
3 Probabilistic Satisfiability
4 Optimizing Probability Distributions with oPSAT
5 oPSAT and Combinatorial Materials Discovery
6 Conclusions
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Is PSAT a Zombie Idea?
An idea that refuses to die!
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
A Brief History of PSAT
Proposed by [Boole 1854],On the Laws of Thought
Rediscovered several times since Boole
De Finetti [1937, 1974], Good [1950], Smith [1961]Studied by Hailperin [1965]Nilsson [1986] (re)introduces PSAT to AIPSAT is NP-complete [Georgakopoulos et. al 1988]Nilsson [1993]: “complete impracticability” of PSATcomputationMany other works; see Hansen & Jaumard [2000]
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Or a Wild Amazonian Flower?
Awaits special conditions to bloom!
(Linear programming + SAT-based techniques)
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
The Setting
Formulas α1, . . . , αℓ over logical variables P = {x1, . . . , xn}
Propositional valuation v : P → {0, 1}
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
The Setting
Formulas α1, . . . , αℓ over logical variables P = {x1, . . . , xn}
Propositional valuation v : P → {0, 1}
A probability distribution over propositional valuations
π : V → [0, 1]
2n∑
i=1
π(vi ) = 1
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
The Setting
Formulas α1, . . . , αℓ over logical variables P = {x1, . . . , xn}
Propositional valuation v : P → {0, 1}
A probability distribution over propositional valuations
π : V → [0, 1]
2n∑
i=1
π(vi ) = 1
Probability of a formula α according to π
Pπ(α) =∑
{π(vi )|vi (α) = 1}
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
The PSAT Problem
Consider ℓ formulas α1, . . . , αℓ defined on n atoms{x1, . . . , xn}
A PSAT problem Σ is a set of ℓ restrictions
Σ = {P(αi ) S pi |1 ≤ i ≤ ℓ}
Probabilistic Satisfiability: is there a π that satisfies Σ?
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
The PSAT Problem
Consider ℓ formulas α1, . . . , αℓ defined on n atoms{x1, . . . , xn}
A PSAT problem Σ is a set of ℓ restrictions
Σ = {P(αi ) S pi |1 ≤ i ≤ ℓ}
Probabilistic Satisfiability: is there a π that satisfies Σ?
In our framework, ℓ = m + k , Σ = Γ ∪Ψ:
Hard Γ = {α1, . . . , αm}, P(αi ) = 1 (clauses)Soft Ψ = {P(si ) ≤ pi |1 ≤ i ≤ k}
si atomic; pi given, learned or minimized
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Example continued
Only one course, three student enrollments: x , y and z
Potential partnerships: pxy and pxz , mutually exclusive.Hard constraint
P(x ∧ y ∧ z ∧ ¬(pxy ∧ pxz)) = 1
Soft constraints
P(x ∧ ¬pxy ∧ ¬pxz) ≤ 0.25P(y ∧ ¬pxy ) ≤ 0.60P(z ∧ ¬pxz) ≤ 0.60
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Example continued
Only one course, three student enrollments: x , y and z
Potential partnerships: pxy and pxz , mutually exclusive.Hard constraint
P(x ∧ y ∧ z ∧ ¬(pxy ∧ pxz)) = 1
Soft constraints
P(x ∧ ¬pxy ∧ ¬pxz) ≤ 0.25P(y ∧ ¬pxy ) ≤ 0.60P(z ∧ ¬pxz) ≤ 0.60
(Small) solution: distribution π
π(x , y , z ,¬pxy ,¬pxz) = 0.1 π(x , y , z , pxy ,¬pxz) = 0.4π(x , y , z ,¬pxy , pxz) = 0.5 π(v) = 0 for other 29 valuations
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Solving PSAT
Linear algebraic formulation
Optimization problem with exponential columns
Columns are not explicitly represented, but provided by acolumn generation process
Goal: minimize the probability of inconsistent columns
A SAT solver is employed for column generation and to forcea decrease in cost
Interface between logic and algebra is an algebraic constraintthat can be seen as a logic formula
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Example of PSAT solution
Add variables for each soft violation: sx , sy , sz .
Γ =
{
x , y , z , ¬pxy ∨ ¬pxz ,(x ∧ ¬pxy ∧ ¬pxz) → sx , (y ∧ ¬pxy ) → sy , (z ∧ ¬pxz) → sz
}
Ψ = { P(sx) = 0.25, P(sy ) = 0.6, P(sz) = 0.6 }
Iteration 0:
sxsysz
1 1 1 10 0 0 10 0 1 10 1 1 1
·
0.40
0.350.25
=
10.250.600.60
cost(0) = 0.4
b(0) = [1 0 1 0]′ : col 3
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Example of PSAT solution
Add variables for each soft violation: sx , sy , sz .
Γ =
{
x , y , z , ¬pxy ∨ ¬pxz ,(x ∧ ¬pxy ∧ ¬pxz) → sx , (y ∧ ¬pxy ) → sy , (z ∧ ¬pxz) → sz
}
Ψ = { P(sx) = 0.25, P(sy ) = 0.6, P(sz) = 0.6 }
Iteration 1:
sxsysz
1 1 1 10 0 0 10 0 1 10 1 0 1
·
0.050.350.350.25
=
10.250.600.60
cost(1) = 0.05
b(1) = [1 1 0 1]′ : col 1
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Example of PSAT solution
Add variables for each soft violation: sx , sy , sz .
Γ =
{
x , y , z , ¬pxy ∨ ¬pxz ,(x ∧ ¬pxy ∧ ¬pxz) → sx , (y ∧ ¬pxy ) → sy , (z ∧ ¬pxz) → sz
}
Ψ = { P(sx) = 0.25, P(sy ) = 0.6, P(sz) = 0.6 }
Iteration 2:
sxsysz
1 1 1 11 0 0 10 0 1 11 1 0 1
·
0.050.350.400.20
=
10.250.600.60
cost(2) = 0
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Next Topic
1 What You Probably Don’t Know
2 Pragmatic Motivation
3 Probabilistic Satisfiability
4 Optimizing Probability Distributions with oPSAT
5 oPSAT and Combinatorial Materials Discovery
6 Conclusions
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Optimizing PSAT solutions
Solutions to PSAT are not unique
First optimization phase: determines if constraints are solvable
Second optimization phase to obtain a distribution withdesirable properties.
A different objective (cost) function
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Minimizing Expected Violations
Idea: minimize the expected number of soft constraintsviolated by each valuation, S(v)
E (S) =∑
vi |vi (Γ)=1
S(vi )π(vi )
Theorem: Every linear function of a model (valuation) hasconstant expected value for any PSAT solution
In particular, E (S) is constant, no point in minimizing it
Any other linear expectation function not a candidate forminimization
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Minimizing Variance
Idea: penalize high S , minimize E (S2)
Lemma: The distribution that minimizes E (S2) alsominimizes variance, Var(S) = E ((S − E (S))2)
oPSAT is a second phase minimization whose objectivefunction is E (S2)
Problem: computing a SAT formula that decreases cost isharder than in PSAT
oPSAT needs a more elaborate interface logic/linear algebra
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Next Topic
1 What You Probably Don’t Know
2 Pragmatic Motivation
3 Probabilistic Satisfiability
4 Optimizing Probability Distributions with oPSAT
5 oPSAT and Combinatorial Materials Discovery
6 Conclusions
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Problem Definition
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Problem Modeling
Find an association of peaks to phases, respecting structuralconstraints, such that the probability of defect is limited.
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Modeling Process
Logic modeling
Identify the structural elements of the problem
Identify the {0,1}-variables of the problem
Formulas encode the constraints on relationships betweenvariables
Identify the possible defects
Limit defects with “acceptable” upper probabilities. Mayrequire iteration and/or learning
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Implementation
Implementated in C++
Linear Solver uses blas and lapack
SAT-solver: minisat
PSAT formula (DIMACS extension) generated by C++formula generator
P(dp) ≤ 2% =⇒ P(di ) ≤ 1− (1− P(dp))Li
Input: Peaks at sample point
Output: oPSAT most probable model
Compare with SMT implementation in SAT 2012
psat.sourceforge.net
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Experimental Results
Dataset SMT oPSAT
System P L∗ K #Peaks Time(s) Time(s) Accuracy
Al/Li/Fe 28 6 6 170 346 5.3 84.7%Al/Li/Fe 28 8 6 424 10076 8.8 90.5%Al/Li/Fe 28 10 6 530 28170 12.6 83.0%Al/Li/Fe 45 7 6 651 18882 121.1 82.0%Al/Li/Fe 45 8 6 744 46816 128.0 80.3%
The accuracy of SMT is 100%
P : n. of sample points; L∗: the average n, of peaks per phase
K : n. of basis patterns; #Peaks: overall n. of peaks
#Aux variables > 10 000
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Next Topic
1 What You Probably Don’t Know
2 Pragmatic Motivation
3 Probabilistic Satisfiability
4 Optimizing Probability Distributions with oPSAT
5 oPSAT and Combinatorial Materials Discovery
6 Conclusions
Marcelo Finger
HardSoft & PSAT
Ignorance Motivation PSAT oPSAT Application Conclusion
Conclusions and the Future
oPSAT can be effectively implemented to deal with hard andsoft constraints
Can be successfully applied to non-trivial problems ofmaterials discovery with acceptable precision and superior runtimes than existing methods
Other forms of logic-probabilistic inference are underinvestigation
Marcelo Finger
HardSoft & PSAT