Multiple Instance Learning for Sparse Positive Bags

Razvan C. Bunescu, Machine Learning Group, Department of Computer Sciences, University of Texas at Austin ([email protected])
Raymond J. Mooney, Machine Learning Group, Department of Computer Sciences, University of Texas at Austin ([email protected])
Two Types of Supervision
• Single Instance Learning (SIL):
  – the traditional type of supervision in machine learning;
  – a dataset of positive and negative training instances.
• Multiple Instance Learning (MIL):
  – a dataset of positive and negative training bags of instances;
  – a bag is positive if at least one instance in the bag is positive;
  – a bag is negative if all instances in the bag are negative;
  – the instance labels inside each bag are hidden.
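The bag-labeling rule above can be sketched in a few lines of Python (a toy illustration, not code from the talk):

```python
# MIL semantics: a bag is positive iff it contains at least one positive
# instance; during training only the bag label is observed, not the
# per-instance labels.

def bag_label(instance_labels):
    """Return +1 if any instance in the bag is positive, else -1."""
    return 1 if any(y == 1 for y in instance_labels) else -1

print(bag_label([-1, -1, 1]))   # one positive instance -> positive bag
print(bag_label([-1, -1, -1]))  # all instances negative -> negative bag
```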
MIL Background: Domains
• Originally introduced to solve a Drug Activity prediction problem in biochemistry [Dietterich et al., 1997].
• Content-Based Image Retrieval [Zhang et al., 2002].
• Text categorization [Andrews et al., 2003], [Ray et al., 2005].
MIL Background: Algorithms
• Axis-Parallel Rectangles [Dietterich et al., 1997]
• Diverse Density [Maron, 1998]
• Multiple Instance Logistic Regression [Ray & Craven, 2005]
• Multi-instance SVM kernels of [Gartner et al., 2002]:
  – the Normalized Set Kernel;
  – the Statistic Kernel.
Outline
• Introduction
• MIL as SIL with one-sided noise
• The Normalized Set Kernel (NSK)
• Three SVM approaches to MIL:
  – an SVM approach to sparse MIL (sMIL);
  – a transductive SVM approach to sparse MIL (stMIL);
  – a balanced SVM approach to MIL (sbMIL).
• Experimental Results
• Future Work & Conclusion
SIL Approach to MIL
• Apply the bag label to all bag instances.
• Formulate as an SVM problem.

minimize:
  J(w, b, \xi) = \frac{1}{2}\|w\|^2 + \frac{C}{L} \sum_{x \in X_n \cup X_p} \xi_x
subject to:
  -(w\phi(x) + b) \ge 1 - \xi_x,  \forall x \in X_n   (error on negative bags)
  w\phi(x) + b \ge 1 - \xi_x,   \forall x \in X_p   (error on positive bags)
  \xi_x \ge 0

where X_n and X_p collect the instances from the negative and positive bags, L = |X_n| + |X_p|, and \frac{1}{2}\|w\|^2 is the regularization term.
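The SIL reduction can be sketched as a dataset transformation (a minimal illustration; the helper name `bags_to_sil` is mine, not the authors'):

```python
# SIL reduction: copy each bag's label onto all of its instances, turning
# the MIL dataset into an ordinary instance-level dataset. Instances of
# positive bags that are actually negative become one-sided label noise.

def bags_to_sil(bags, bag_labels):
    """bags: list of lists of feature vectors; bag_labels: +1 / -1 per bag."""
    instances, labels = [], []
    for bag, y in zip(bags, bag_labels):
        for x in bag:
            instances.append(x)
            labels.append(y)  # every instance inherits its bag's label
    return instances, labels

X, y = bags_to_sil([[[0.1], [0.9]], [[0.2]]], [1, -1])
print(y)  # [1, 1, -1]
```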
From SIL to the Normalized Set Kernel
• Start from the SIL formulation: apply the bag label to all bag instances and formulate as an SVM problem.
• Average each bag's instance constraints over the instances of the bag:
  w \cdot \frac{1}{|X|}\sum_{x \in X}\phi(x) + b \ge 1 - \frac{1}{|X|}\sum_{x \in X}\xi_x,  \forall X \in \mathcal{X}_p
  -(w \cdot \frac{1}{|X|}\sum_{x \in X}\phi(x) + b) \ge 1 - \frac{1}{|X|}\sum_{x \in X}\xi_x,  \forall X \in \mathcal{X}_n
• Write \phi(X) = \sum_{x \in X}\phi(x) and replace the averaged instance slacks with a single slack \xi_X per bag:
  \frac{w\phi(X)}{|X|} + b \ge 1 - \xi_X,  \forall X \in \mathcal{X}_p
  -(\frac{w\phi(X)}{|X|} + b) \ge 1 - \xi_X,  \forall X \in \mathcal{X}_n
• These are exactly the constraints of the Normalized Set Kernel.
The Normalized Set Kernel  [Gartner et al., 2002]
• A bag is represented as the normalized sum of its instances.
• Use bags as examples in an SVM formulation.

minimize:
  J(w, b, \xi) = \frac{1}{2}\|w\|^2 + \frac{C}{|\mathcal{X}_n| + |\mathcal{X}_p|} \sum_{X \in \mathcal{X}_n \cup \mathcal{X}_p} \xi_X
subject to:
  -(\frac{w\phi(X)}{|X|} + b) \ge 1 - \xi_X,  \forall X \in \mathcal{X}_n
  \frac{w\phi(X)}{|X|} + b \ge 1 - \xi_X,   \forall X \in \mathcal{X}_p
  \xi_X \ge 0
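The bag representation behind the NSK can be sketched with NumPy (an illustrative snippet; the slide's constraints use normalization by bag size |X|, which is what is implemented here):

```python
import numpy as np

# NSK bag representation: phi(X) = sum of instance feature vectors,
# normalized by the bag size |X|. The kernel between two bags is then
# just the dot product of their normalized representations.

def nsk_feature(bag):
    """Represent a bag by the size-normalized sum of its instance features."""
    X = np.asarray(bag, dtype=float)
    return X.sum(axis=0) / len(X)

def nsk_kernel(bag_a, bag_b):
    """Normalized set kernel between two bags (linear instance kernel)."""
    return float(nsk_feature(bag_a) @ nsk_feature(bag_b))

print(nsk_feature([[1.0, 0.0], [0.0, 1.0]]))  # [0.5 0.5]
```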
The Normalized Set Kernel (NSK)
• A positive bag is the normalized sum of its instances.
• Use positive bags and negative instances as examples.

minimize:
  J(w, b, \xi) = \frac{1}{2}\|w\|^2 + \frac{C}{|X_n|} \sum_{x \in X_n} \xi_x + \frac{C}{|\mathcal{X}_p|} \sum_{X \in \mathcal{X}_p} \xi_X
subject to:
  -(w\phi(x) + b) \ge 1 - \xi_x,  \forall x \in X_n
  \frac{w\phi(X)}{|X|} + b \ge 1 - \xi_X,  \forall X \in \mathcal{X}_p
  \xi \ge 0
The Normalized Set Kernel (NSK)
• A positive bag is the normalized sum of its instances; positive bags and negative instances are used as examples.
• The positive-bag constraint \frac{w\phi(X)}{|X|} + b \ge 1 - \xi_X is too strong, especially when positive bags are sparse in positive instances.
Inequality Constraints for Positive Bags
• NSK constraint (slack omitted):
  \frac{w\phi(X)}{|X|} + b \ge 1
• Balancing constraint: averaging the instance constraints w\phi(x) + b \ge y_x over x \in X gives
  \frac{w\phi(X)}{|X|} + b \ge \frac{1}{|X|}\sum_{x \in X} y_x
• The NSK constraint therefore implicitly assumes that all instances inside the bag X are positive (y_x = +1 for every x \in X).

Inequality Constraints for Positive Bags
• We want the balancing constraint to express only that at least one instance in the bag X is positive: \exists \hat{x} \in X with y_{\hat{x}} = +1.
• In the worst case, \sum_{x \in X} y_x = (+1) + (|X| - 1)(-1) = 2 - |X|, which gives the sparse MIL constraint:
  \frac{w\phi(X)}{|X|} + b \ge \frac{2 - |X|}{|X|}
The Sparse MIL (sMIL)

minimize:
  J(w, b, \xi) = \frac{1}{2}\|w\|^2 + \frac{C}{|X_n|} \sum_{x \in X_n} \xi_x + \frac{C}{|\mathcal{X}_p|} \sum_{X \in \mathcal{X}_p} \xi_X
subject to:
  -(w\phi(x) + b) \ge 1 - \xi_x,  \forall x \in X_n
  \frac{w\phi(X)}{|X|} + b \ge \frac{2 - |X|}{|X|} - \xi_X,  \forall X \in \mathcal{X}_p
  \xi \ge 0

• The right-hand side \frac{2 - |X|}{|X|} is larger for smaller bags: small positive bags are more informative than large positive bags.
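The bag-size dependence of the sMIL constraint is easy to verify numerically (a tiny sketch; `smil_target` is a hypothetical helper name):

```python
# Right-hand side of the sparse MIL constraint: (2 - |X|) / |X|.
# Derived from "at least one of |X| instances is positive", i.e. the
# instance labels sum to at least 2 - |X|. Smaller bags get a larger
# target, so they constrain the classifier more.

def smil_target(bag_size):
    """Margin target of the sparse MIL constraint for a bag of given size."""
    return (2.0 - bag_size) / bag_size

for n in (1, 2, 4, 10):
    print(n, smil_target(n))
# size 1 -> 1.0: the constraint reduces to the usual SVM margin constraint
```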
Inequality Constraints for Positive Bags
• sMIL comes closer than the NSK to expressing the constraint that at least one instance in a positive bag is positive.
• However, sMIL does not guarantee that at least one instance is positive.
  – Problem: the sparse MIL constraint \frac{w\phi(X)}{|X|} + b \ge \frac{2 - |X|}{|X|} - \xi_X may be satisfied even when all instances have negative scores that are very close to zero.
  – Solution: force every negative instance to have a score of at most -1 + \xi using the transductive constraint |w\phi(x) + b| \ge 1 - \xi.
Inequality Constraints for Positive Bags
• Sparse MIL constraint:
  \frac{w\phi(X)}{|X|} + b \ge \frac{2 - |X|}{|X|} - \xi_X
• Transductive constraint:
  |w\phi(x) + b| \ge 1 - \xi, such that \exists \hat{x} \in X with w\phi(\hat{x}) + b \ge 1 - \xi  (at least one instance is positive).
• With slacks shared across the instances of a bag, this is a mixed integer programming problem.
• With independent slacks \xi_x per instance, it becomes an easier problem that can be solved with CCCP [Yuille et al., 2002].
The Sparse Transductive MIL (stMIL)

minimize:
  J(w, b, \xi) = \frac{1}{2}\|w\|^2 + \frac{C}{|X_n|} \sum_{x \in X_n} \xi_x + \frac{C}{|X_p|} \sum_{x \in X_p} \xi^*_x + \frac{C}{|\mathcal{X}_p|} \sum_{X \in \mathcal{X}_p} \xi_X
subject to:
  -(w\phi(x) + b) \ge 1 - \xi_x,  \forall x \in X_n
  |w\phi(x) + b| \ge 1 - \xi^*_x,  \forall x \in X_p
  \frac{w\phi(X)}{|X|} + b \ge \frac{2 - |X|}{|X|} - \xi_X,  \forall X \in \mathcal{X}_p
  \xi, \xi^* \ge 0

• Solve with CCCP, as in [Collobert et al., 2006].
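The transductive constraint |wφ(x) + b| ≥ 1 − ξ* corresponds, as in transductive SVMs, to a symmetric hinge loss on the instances of positive bags; a minimal sketch of that loss (an interpretation of the constraint, not the authors' code):

```python
# Symmetric hinge loss max(0, 1 - |f(x)|): an instance from a positive bag
# incurs loss when its decision value falls inside the margin band (-1, 1),
# regardless of sign. It pushes scores away from zero on whichever side
# they already fall. The loss is non-convex in f, hence the CCCP solver.

def symmetric_hinge(score):
    """Loss for an unlabeled instance with decision value `score`."""
    return max(0.0, 1.0 - abs(score))

print(symmetric_hinge(0.05))  # near-zero scores are penalized heavily
print(symmetric_hinge(-1.2))  # confidently negative: no penalty
```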
A Balanced SVM Approach to MIL
• SIL is ideal when bags are dense in positive instances.
• sMIL is ideal when bags are sparse in positive instances.
• If the expected density \eta of positive instances is known, design a method that:
  – converges to SIL when \eta \to 1;
  – converges to sMIL when \eta \to 0.
• If \eta is unknown, it can be set using cross-validation.
The Balanced MIL (sbMIL)
• Input:
  – training negative bags \mathcal{X}_n; define X_n = \{x \mid x \in X, X \in \mathcal{X}_n\};
  – training positive bags \mathcal{X}_p; define X_p = \{x \mid x \in X, X \in \mathcal{X}_p\};
  – features \phi(x), or kernel K(x, y);
  – capacity parameter C > 0 and balance parameter \eta \in [0, 1].
• Output:
  – decision function f(x) = w\phi(x) + b.
• Algorithm:
  1. (w, b) \leftarrow solve_sMIL(X_n, \mathcal{X}_p, \phi, C).
  2. Order all instances x \in X_p by f(x).
  3. Label the instances x \in X_p: the top \eta|X_p| as positive, the remaining (1 - \eta)|X_p| as negative.
  4. (w, b) \leftarrow solve_SIL(X_n, X_p, \phi, C).
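The relabeling step (step 3 above) can be sketched as follows; `scores` stands in for the decision values f(x) from the sMIL solution, and the helper name is hypothetical:

```python
import numpy as np

# Stage 2 of sbMIL (sketch): rank the instances from positive bags by the
# sMIL decision function and label the top eta fraction as positive, the
# rest as negative; the resulting instance labels are then fed to SIL.

def sbmil_relabel(scores, eta):
    """Return +/-1 labels for positive-bag instances given balance eta."""
    scores = np.asarray(scores, dtype=float)
    n_pos = int(round(eta * len(scores)))
    order = np.argsort(-scores)             # indices sorted by descending f(x)
    labels = -np.ones(len(scores), dtype=int)
    labels[order[:n_pos]] = 1               # top eta * |Xp| -> positive
    return labels

print(sbmil_relabel([0.9, -0.3, 0.1, -0.8], eta=0.5).tolist())  # [1, -1, 1, -1]
```

With η = 1 every positive-bag instance keeps its bag label (the SIL behavior); with η → 0 almost none do, matching the convergence behavior described on the previous slide.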
Experimental Results: Datasets
• [AIMed] An artificial, maximally sparse dataset:
  – created from AIMed [Bunescu et al., 2005]:
    • a dataset of documents annotated for protein interactions;
    • a sentence example contains a pair of proteins; the sentence is positive iff it asserts an interaction between the two proteins.
  – Create positive bags of sentences:
    • choose the bag size randomly between Smin and Smax;
    • start with exactly one positive instance;
    • randomly add negative instances.
  – Create negative bags of sentences:
    • choose the bag size randomly between Smin and Smax;
    • randomly add negative instances.
• Use the subsequence kernel from [Bunescu & Mooney, 2005].
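The bag-construction recipe above can be sketched in Python (a hypothetical helper, not the authors' code; `Smin`/`Smax` become `s_min`/`s_max`):

```python
import random

# Sketch of the artificial bag construction: a positive bag contains
# exactly one positive instance plus randomly chosen negatives; a
# negative bag contains only negatives. Bag sizes are drawn uniformly
# between s_min and s_max (inclusive).

def make_bag(positives, negatives, s_min, s_max, positive_bag, rng):
    size = rng.randint(s_min, s_max)
    if positive_bag:
        bag = [rng.choice(positives)] + [rng.choice(negatives) for _ in range(size - 1)]
        rng.shuffle(bag)  # hide the position of the positive instance
    else:
        bag = [rng.choice(negatives) for _ in range(size)]
    return bag

rng = random.Random(0)
bag = make_bag(["pos"], ["neg"], 3, 6, positive_bag=True, rng=rng)
print(bag.count("pos"))  # exactly one positive instance per positive bag
```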
Experimental Results: Datasets
• [CBIR] Content-Based Image Retrieval:
  – categorize images as to whether they contain an object of interest;
  – an image is a bag of image regions;
  – the number of regions varies widely between images;
  – for every image, relatively few regions are expected to contain the object of interest, so the positive bags are naturally sparse;
  – evaluate on the [Tiger], [Elephant], and [Fox] datasets from [Andrews et al., 2003].
• Use a quadratic kernel with the original feature vectors.
Experimental Results: Datasets
• [TST] Text categorization datasets:
  – Medline articles are bags of overlapping text passages;
  – articles are annotated with MeSH terms, which are used as classes;
  – use [TST1] and [TST2] from [Andrews et al., 2003].
• [MUSK] Drug Activity prediction:
  – bags of 3D low-energy conformations for every molecule;
  – a bag is positive if the molecule smells "musky", i.e. if at least one conformation binds to the target;
  – [MUSK1] and [MUSK2] datasets from [Dietterich et al., 1997].
• Use a quadratic kernel with the original feature vectors.
Experimental Results: Systems
• [SIL] MIL as SIL with one-sided noise.
• [NSK] The Normalized Set Kernel.
• [STK] The Statistic Kernel.
• [sMIL] The SVM approach to sparse MIL.
• [stMIL] The transductive SVM approach to sparse MIL.
• [sbMIL] The balanced SVM approach to MIL.
Experimental Results

Dataset   | SIL   | NSK   | STK   | sMIL  | sbMIL | stMIL
----------|-------|-------|-------|-------|-------|------
AIMed     | 57.44 | 87.11 | N/A   | 87.19 | 87.99 | 92.11
AIMed½    | 45.86 | 54.06 | N/A   | 54.08 | 67.66 | 72.94
Tiger     | 76.65 | 79.07 | 80.80 | 81.12 | 82.95 | 74.48
Elephant  | 85.08 | 82.94 | 85.22 | 87.98 | 88.58 | 81.64
Fox       | 52.72 | 64.01 | 62.14 | 66.13 | 69.78 | 60.67
MUSK1     | 87.82 | 85.61 | 69.44 | 86.91 | 91.78 | 79.46
MUSK2     | 87.33 | 90.78 | 61.01 | 81.19 | 87.74 | 68.41
TST1      | 96.25 | 97.16 | 96.19 | 97.29 | 97.41 | 96.81
TST2      | 85.37 | 90.60 | 86.87 | 87.97 | 90.57 | 88.55
Future Work
• Capture distribution imbalance in the MIL model:
  – instances belonging to the same bag are, in general, more similar than instances belonging to different bags.
• Incorporate estimates of bag-level density in the MIL model:
  – in some applications, estimates of the density of positive instances are available for every bag.
Conclusion
• Proposed an SVM approach to MIL that is particularly effective when bags are sparse in positive instances.
• Modeling a global density of positive instances in positive bags further improves accuracy.
• Treating instances from positive bags as unlabeled data in a transductive setting is useful when the negative instances in positive and negative bags come from the same distribution.
Questions?