StreamSVM - Linear SVMs and Logistic Regression (…vishy/talks/streamsvm.pdf)
StreamSVM
Linear SVMs and Logistic Regression When Data Does Not Fit In Memory
S.V.N. (vishy) Vishwanathan
Purdue University and Microsoft
October 9, 2012
S.V. N. Vishwanathan (Purdue, Microsoft) StreamSVM 1 / 41
Linear Support Vector Machines
Outline
1 Linear Support Vector Machines
2 StreamSVM
3 Experiments - SVM
4 Logistic Regression
5 Experiments - Logistic Regression
6 Conclusion
Binary Classification
[Figure: points from the two classes $y_i = -1$ and $y_i = +1$, separated by the hyperplane $\{x \mid \langle w, x\rangle = 0\}$.]
Linear Support Vector Machines
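[Figure: the classes $y_i = -1$ and $y_i = +1$ with the hyperplanes $\{x \mid \langle w, x\rangle = -1\}$, $\{x \mid \langle w, x\rangle = 0\}$, and $\{x \mid \langle w, x\rangle = 1\}$; the points $x_1$ and $x_2$ are marked.]
$$\langle w, x_1 - x_2\rangle = 2 \quad\Longrightarrow\quad \left\langle \frac{w}{\|w\|},\, x_1 - x_2 \right\rangle = \frac{2}{\|w\|}$$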
Optimization Problem
$$\min_{w} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \max\left(0,\, 1 - y_i \langle w, x_i\rangle\right)$$
[Figure: hinge loss as a function of the margin; x-axis: margin, y-axis: loss.]
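As a minimal sketch of the objective above, the following Python evaluates the regularizer plus the summed hinge loss on a toy dataset. The helper names (`dot`, `hinge_loss`, `svm_primal_objective`) and the three data points are illustrative assumptions, not part of the talk.

```python
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product <u, v> of two dense vectors."""
    return sum(a * b for a, b in zip(u, v))


def hinge_loss(margin: float) -> float:
    """max(0, 1 - margin), where margin = y_i * <w, x_i>."""
    return max(0.0, 1.0 - margin)


def svm_primal_objective(w, X, y, C: float) -> float:
    """0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i <w, x_i>)."""
    regularizer = 0.5 * dot(w, w)
    total_loss = sum(hinge_loss(y_i * dot(w, x_i)) for x_i, y_i in zip(X, y))
    return regularizer + C * total_loss


# Toy data (hypothetical): margins are 2.0, 0.5 and 1.0, so only the
# second point contributes a nonzero hinge loss.
X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
y = [1, 1, -1]
w = [1.0, 0.0]
value = svm_primal_objective(w, X, y, C=1.0)
```

Only points with margin below 1 contribute to the sum, which is what later makes it possible to call some points "uninformative".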
The Dual
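Objective
$$\min_{\alpha} \; D(\alpha) := \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j\rangle - \sum_{i} \alpha_i \quad \text{s.t. } 0 \le \alpha_i \le C$$
- Convex and quadratic objective
- Linear (box) constraints
- $w = \sum_i \alpha_i y_i x_i$
- Each $\alpha_i$ corresponds to an $x_i$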
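The dual objective and the relation $w = \sum_i \alpha_i y_i x_i$ can be sketched in a few lines of Python. The function names and the two-point dataset are assumptions made for illustration; the double loop is written for clarity, not speed.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def svm_dual_objective(alpha, X, y) -> float:
    """D(alpha) = 0.5 * sum_ij a_i a_j y_i y_j <x_i, x_j> - sum_i a_i."""
    n = len(alpha)
    quad = sum(
        alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
        for i in range(n)
        for j in range(n)
    )
    return 0.5 * quad - sum(alpha)


def primal_from_dual(alpha, X, y) -> List[float]:
    """Recover w = sum_i alpha_i y_i x_i."""
    w = [0.0] * len(X[0])
    for a_i, x_i, y_i in zip(alpha, X, y):
        for k, v in enumerate(x_i):
            w[k] += a_i * y_i * v
    return w


# Toy data (hypothetical): two points on opposite sides of the origin.
X = [[1.0, 0.0], [-1.0, 0.0]]
y = [1, -1]
alpha = [0.5, 0.5]
w = primal_from_dual(alpha, X, y)
d_value = svm_dual_objective(alpha, X, y)
```

Because the quadratic term equals $\|w\|^2$, the dual value can also be computed as `0.5 * dot(w, w) - sum(alpha)` once `w` is known.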
Coordinate Descent
[Figure sequence: surface plot of a function of two variables x and y, alternating with one-dimensional plots along x and along y, illustrating coordinate descent.]
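As a minimal sketch of coordinate descent, the Python below minimizes a small two-variable quadratic by solving exactly for one variable at a time. The function `f` and the names `coordinate_descent`, `x_star`, and `y_star` are illustrative assumptions; the function plotted on the slides is not given in the transcript.

```python
from typing import Tuple


def f(x: float, y: float) -> float:
    """Hypothetical convex quadratic standing in for the plotted surface."""
    return x * x + x * y + y * y - 3.0 * x


def coordinate_descent(x: float, y: float, sweeps: int = 50) -> Tuple[float, float]:
    """Alternately minimize f exactly along x (y fixed) and along y (x fixed)."""
    for _ in range(sweeps):
        x = (3.0 - y) / 2.0  # solves df/dx = 2x + y - 3 = 0 for x
        y = -x / 2.0         # solves df/dy = x + 2y = 0 for y
    return x, y


x_star, y_star = coordinate_descent(x=-2.0, y=2.0)
```

Each step only needs a one-dimensional minimization, which is the property the dual SVM solver exploits.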
Coordinate Descent in the Dual
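One-dimensional function
$$D(\alpha_t) = \frac{\alpha_t^2}{2}\langle x_t, x_t\rangle + \sum_{i \ne t} \alpha_t \alpha_i y_i y_t \langle x_i, x_t\rangle - \alpha_t + \text{const.} \quad \text{s.t. } 0 \le \alpha_t \le C$$
Solve
$$\nabla D(\alpha_t) = \alpha_t \langle x_t, x_t\rangle + \sum_{i \ne t} \alpha_i y_i y_t \langle x_i, x_t\rangle - 1 = 0$$
$$\alpha_t = \mathrm{median}\left(0,\; C,\; \frac{1 - \sum_{i \ne t} \alpha_i y_i y_t \langle x_i, x_t\rangle}{\langle x_t, x_t\rangle}\right)$$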
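The closed-form update above can be sketched directly in Python. This is an illustrative version under stated assumptions: the names `median3` and `update_coordinate` are hypothetical, every $x_t$ is assumed nonzero, and the sum over $i \ne t$ is recomputed from scratch for clarity rather than maintained incrementally through $w$.

```python
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def median3(a: float, b: float, c: float) -> float:
    """Median of three numbers; here it clips the free minimizer to [0, C]."""
    return sorted((a, b, c))[1]


def update_coordinate(t: int, alpha, X, y, C: float) -> float:
    """Exact minimizer of D over alpha[t], all other coordinates fixed."""
    q_tt = dot(X[t], X[t])  # assumed nonzero
    cross = sum(
        alpha[i] * y[i] * y[t] * dot(X[i], X[t])
        for i in range(len(alpha))
        if i != t
    )
    return median3(0.0, C, (1.0 - cross) / q_tt)


# Toy data (hypothetical): one sweep over both coordinates.
X = [[1.0, 0.0], [-1.0, 0.0]]
y = [1, -1]
alpha = [0.0, 0.0]
for t in range(len(alpha)):
    alpha[t] = update_coordinate(t, alpha, X, y, C=1.0)
```

On this toy problem the first update moves `alpha[0]` to the upper bound `C`, after which the second coordinate is already optimal at zero.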
StreamSVM
We are Collecting Lots of Data . . .
[Figure sequence: points from the two classes $y_i = -1$ and $y_i = +1$.]
What if Data Does not Fit in Memory?
Idea 1: Block Minimization [Yu et al., KDD 2010]
- Split data into blocks B1, B2, ... such that each Bj fits in memory
- Compress and store each block separately
- Load one block of data at a time and optimize only those αi's

Idea 2: Selective Block Minimization [Chang and Roth, KDD 2011]
- Split data into blocks B1, B2, ... such that each Bj fits in memory
- Compress and store each block separately
- Load one block of data at a time and optimize only those αi's
- Retain informative samples from each block in main memory
What are Informative Samples?
Some Observations
SBM and BM are wasteful
- Both split data into blocks and compress the blocks
  - This requires reading the entire data at least once (expensive)
- Both pause optimization while a block is loaded into memory

Hardware 101
- Disk I/O is slower than CPU (sometimes by a factor of 100)
- Random access on HDD is terrible
  - Sequential access is reasonably fast (factor of 10)
- Multi-core processors are becoming commonplace

How can we exploit this?
Our Architecture
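[Figure: architecture. The Reader streams Data from the HDD into the Working Set in RAM; the Trainer reads the Working Set and updates the Weight Vec in RAM.]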
Our Philosophy
Iterate over the data in main memory while streaming data from disk. Evict primarily those examples from main memory that are “uninformative”.
Reader
for k = 1, ..., max_iter do
    for i = 1, ..., n do
        if |A| = Ω then
            randomly select i′ ∈ A
            A = A \ {i′}
            delete y_i′, Q_i′i′, x_i′ from RAM
        end if
        read y_i, x_i from Disk
        calculate Q_ii = ⟨x_i, x_i⟩
        store y_i, Q_ii, x_i in RAM
        A = A ∪ {i}
    end for
    if stopping criterion is met then
        exit
    end if
end for
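The reader loop can be sketched in Python as a single pass over a data stream. This is a simplified, single-threaded illustration under several assumptions: the names `reader_pass`, `cache`, and `capacity` are hypothetical, `capacity` plays the role of Ω, an in-memory iterable stands in for the disk, and the synchronization a real reading thread would need is omitted.

```python
import random
from typing import Dict, Iterable, List, Tuple

Example = Tuple[int, int, List[float]]            # (index i, label y_i, features x_i)
Cache = Dict[int, Tuple[int, float, List[float]]]  # i -> (y_i, Q_ii, x_i)


def reader_pass(stream: Iterable[Example], cache: Cache,
                capacity: int, rng: random.Random) -> None:
    """Stream examples into the cache, evicting a random entry when it is full."""
    for i, y_i, x_i in stream:                  # "read y_i, x_i from Disk"
        if len(cache) >= capacity:              # |A| = Omega
            victim = rng.choice(list(cache))    # randomly select i' in A
            del cache[victim]                   # delete y_i', Q_i'i', x_i'
        q_ii = sum(v * v for v in x_i)          # Q_ii = <x_i, x_i>
        cache[i] = (y_i, q_ii, x_i)             # store in RAM, A = A ∪ {i}


# Toy stream (hypothetical): five examples, room for three in the cache.
data = [(i, 1 if i % 2 == 0 else -1, [float(i), 1.0]) for i in range(5)]
cache: Cache = {}
reader_pass(iter(data), cache, capacity=3, rng=random.Random(0))
```

Caching `Q_ii` alongside each example means the trainer never has to recompute the squared norm during its coordinate updates.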
Trainer
α^1 = 0, w^1 = 0, ε = 9, ε_new = 0, β = 0.9
while stopping criterion is not met do
    for t = 1, ..., n do
        if |A| > 0.9 × Ω then ε = βε
        randomly select i ∈ A and read y_i, Q_ii, x_i from RAM
        compute ∇_i D := y_i ⟨w^t, x_i⟩ − 1
        if (α_i^t = 0 and ∇_i D > ε) or (α_i^t = C and ∇_i D < −ε) then
            A = A \ {i} and delete y_i, Q_ii, x_i from RAM
            continue
        end if
        α_i^{t+1} = median(0, C, α_i^t − ∇_i D / Q_ii)
        w^{t+1} = w^t + (α_i^{t+1} − α_i^t) y_i x_i
        ε_new = max(ε_new, |∇_i D|)
    end for
    update stopping criterion
    ε = ε_new
end while
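A single inner loop of the trainer can be sketched in Python as follows. This is an illustrative, single-threaded version under stated assumptions: the names `trainer_sweep`, `cache`, `capacity`, and `n_steps` are hypothetical, the cache layout `i -> (y_i, Q_ii, x_i)` is an assumed data structure, the starting threshold `eps=9.0` simply copies the value printed on the slide, and the outer stopping criterion is left out.

```python
import random
from typing import Dict, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def median3(a: float, b: float, c: float) -> float:
    """Median of three numbers (clips the update to [0, C])."""
    return sorted((a, b, c))[1]


def trainer_sweep(cache: Dict[int, Tuple[int, float, List[float]]],
                  alpha: Dict[int, float], w: List[float], C: float,
                  eps: float, capacity: int, n_steps: int,
                  rng: random.Random, beta: float = 0.9) -> float:
    """Run n_steps coordinate updates; return the largest |gradient| seen."""
    eps_new = 0.0
    for _ in range(n_steps):
        if not cache:
            break                               # nothing cached to train on
        if len(cache) > 0.9 * capacity:
            eps *= beta                         # cache nearly full: evict more eagerly
        i = rng.choice(list(cache))
        y_i, q_ii, x_i = cache[i]
        grad = y_i * dot(w, x_i) - 1.0          # gradient of D along coordinate i
        if (alpha[i] == 0.0 and grad > eps) or (alpha[i] == C and grad < -eps):
            del cache[i]                        # evict an "uninformative" point
            continue
        new_alpha = median3(0.0, C, alpha[i] - grad / q_ii)
        step = (new_alpha - alpha[i]) * y_i
        for k, v in enumerate(x_i):
            w[k] += step * v                    # w += (alpha_new - alpha_old) y_i x_i
        alpha[i] = new_alpha
        eps_new = max(eps_new, abs(grad))
    return eps_new


# Toy working set (hypothetical): two cached points, nothing gets evicted.
cache = {0: (1, 1.0, [1.0, 0.0]), 1: (-1, 1.0, [-1.0, 0.0])}
alpha = {0: 0.0, 1: 0.0}
w = [0.0, 0.0]
eps_new = trainer_sweep(cache, alpha, w, C=1.0, eps=9.0, capacity=10,
                        n_steps=20, rng=random.Random(0))
```

Maintaining `w` incrementally keeps each update at the cost of one inner product and one vector addition, independent of the number of cached points.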
Proof of Convergence (Sketch)
Definition (Luo and Tseng Problem)
$$\min_{\alpha} \; g(E\alpha) + b^\top \alpha \quad \text{subject to } L_i \le \alpha_i \le U_i$$
where $\alpha, b \in \mathbb{R}^n$, $E \in \mathbb{R}^{d \times n}$, $L_i \in [-\infty, \infty)$, $U_i \in (-\infty, \infty]$.
The above optimization problem is a Luo and Tseng problem if the following conditions hold:
1. $E$ has no zero columns
2. the set of optimal solutions $A$ is non-empty
3. the function $g$ is strictly convex and twice continuously differentiable
4. for all optimal solutions $\alpha^* \in A$, $\nabla^2 g(E\alpha^*)$ is positive definite.
Lemma (see Theorem 1 of Hsieh et al., ICML 2008)
If no data point satisfies $x_i = 0$, then the SVM dual is a Luo and Tseng problem.

Definition (Almost cyclic rule)
There exists an integer $B \ge n$ such that every coordinate is iterated upon at least once every $B$ successive iterations.

Theorem (Theorem 2.1 of Luo and Tseng, JOTA 1992)
Let $\{\alpha^t\}$ be a sequence of iterates generated by a coordinate descent method using the almost cyclic rule. Then $\alpha^t$ converges linearly to an element of $A$.
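One way to see why the SVM dual fits the Luo and Tseng template is to write out the pieces explicitly. The choice of $E$, $g$, and $b$ below is a worked identification added here for illustration, using $\mathbf{1}$ for the all-ones vector; it is a sketch, not a reproduction of the cited proof.

```latex
\begin{align*}
E &= [\, y_1 x_1, \; \dots, \; y_n x_n \,] \in \mathbb{R}^{d \times n}, \qquad
g(z) = \tfrac{1}{2}\|z\|^2, \qquad
b = -\mathbf{1}, \qquad
L_i = 0, \quad U_i = C, \\
g(E\alpha) + b^\top \alpha
  &= \tfrac{1}{2}\Big\| \sum_i \alpha_i y_i x_i \Big\|^2 - \sum_i \alpha_i
   = \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j\rangle - \sum_i \alpha_i
   = D(\alpha).
\end{align*}
```

Column $i$ of $E$ is $y_i x_i$, which is nonzero exactly when $x_i \ne 0$ (condition 1). The function $g$ is strictly convex and twice continuously differentiable with $\nabla^2 g = I$, which is positive definite (conditions 3 and 4). The feasible box $[0, C]^n$ is compact and $D$ is continuous, so a minimizer exists (condition 2).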
Lemma
If the trainer accesses the cached training data sequentially, and the following conditions hold:
- The trainer is at most κ ≥ 1 times faster than the reading thread, that is, the trainer performs at most κ coordinate updates in the time that it takes the reader to read one training data point from disk.
- A point is never evicted from RAM unless the αi corresponding to that point has been updated.

Then this conforms to the almost cyclic rule.
Experiments - SVM
Experiments
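| dataset | n | d | s (%) | n+ : n− | Data size |
|---|---|---|---|---|---|
| ocr | 3.5 M | 1156 | 100 | 0.96 | 45.28 GB |
| dna | 50 M | 800 | 25 | 3e-3 | 63.04 GB |
| webspam-t | 0.35 M | 16.61 M | 0.022 | 1.54 | 20.03 GB |
| kddb | 20.01 M | 29.89 M | 1e-4 | 6.18 | 4.75 GB |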
Does Active Eviction Work?
[Figure: Linear SVM, webspam-t, C = 1.0, comparing Random and Active eviction. x-axis: wall clock time (sec); y-axis: relative function value difference (log scale).]
Comparison with Block Minimization
[Figure: four panels (ocr, webspam-t, kddb, dna), each with C = 1.0, comparing StreamSVM, SBM, and BM. x-axis: wall clock time (sec); y-axis: relative function value difference (log scale; labeled "Relative Objective Function Value" on the dna panel).]
Effect of Varying C
[Figure: three panels for dna with C = 10.0, C = 100.0, and C = 1000.0, comparing StreamSVM, SBM, and BM. x-axis: wall clock time (sec); y-axis: relative function value difference (log scale; labeled "Relative Objective Function Value" on the C = 1000.0 panel).]
Varying Cache Size
[Figure: kddb, C = 1.0, with cache sizes 256 MB, 1 GB, 4 GB, and 16 GB. x-axis: wall clock time (sec); y-axis: relative function value difference (log scale).]
Expanding Features
[Figure: dna expanded, C = 1.0, with legend entries 16 GB and 32 GB. x-axis: wall clock time (sec); y-axis: relative function value difference (log scale).]
Logistic Regression
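Optimization Problem
$$\min_{w} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \log\left(1 + \exp\left(-y_i \langle w, x_i\rangle\right)\right)$$
[Figure: hinge loss and logistic loss as functions of the margin; x-axis: margin, y-axis: loss.]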
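The logistic regression objective above can be evaluated with a short Python sketch. The function names and toy data are illustrative assumptions; the loss is written in a numerically stable form so that large negative margins do not overflow `exp`.

```python
import math
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def logistic_loss(margin: float) -> float:
    """log(1 + exp(-margin)), rearranged to avoid overflow for margin << 0."""
    if margin >= 0.0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def lr_primal_objective(w, X, y, C: float) -> float:
    """0.5 * ||w||^2 + C * sum_i log(1 + exp(-y_i <w, x_i>))."""
    regularizer = 0.5 * dot(w, w)
    total_loss = sum(logistic_loss(y_i * dot(w, x_i)) for x_i, y_i in zip(X, y))
    return regularizer + C * total_loss


# Toy data (hypothetical): at w = 0 every margin is 0 and every loss is log 2.
X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
y = [1, 1, -1]
value_at_zero = lr_primal_objective([0.0, 0.0], X, y, C=2.0)
```

Unlike the hinge loss, the logistic loss is strictly positive for every margin, so no point has an exactly zero contribution to the objective.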
The Dual
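Objective
$$\min_{\alpha} \; D(\alpha) := \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j\rangle + \sum_{i} \left[\alpha_i \log \alpha_i + (C - \alpha_i)\log(C - \alpha_i)\right] \quad \text{s.t. } 0 \le \alpha_i \le C$$
- $w = \sum_i \alpha_i y_i x_i$
- Convex objective (but not quadratic)
- Box constraints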
Coordinate Descent
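One-dimensional function
$$D(\alpha_t) := \frac{\alpha_t^2}{2}\langle x_t, x_t\rangle + \sum_{i \ne t} \alpha_t \alpha_i y_i y_t \langle x_i, x_t\rangle + \alpha_t \log \alpha_t + (C - \alpha_t)\log(C - \alpha_t) \quad \text{s.t. } 0 \le \alpha_t \le C$$
Solve
- No closed form solution
- A modified Newton method does the job!
- See Yu, Huang, Lin 2011 for details.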
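To make the one-dimensional subproblem concrete, the sketch below finds the root of its derivative with a generic safeguarded Newton iteration (a Newton step that falls back to bisection when it leaves the current bracket). This is a stand-in chosen for illustration, not the modified Newton method of Yu, Huang, and Lin. The names `solve_lr_subproblem`, `q` (for $\langle x_t, x_t\rangle$), and `c` (for $\sum_{i \ne t} \alpha_i y_i y_t \langle x_i, x_t\rangle$) are assumptions.

```python
import math


def solve_lr_subproblem(q: float, c: float, C: float,
                        tol: float = 1e-12, max_iter: int = 100) -> float:
    """Minimize 0.5*q*a^2 + c*a + a*log(a) + (C-a)*log(C-a) over 0 < a < C.

    The derivative q*a + c + log(a) - log(C - a) increases from -inf to +inf
    on (0, C), so it has exactly one root there.
    """
    lo, hi = 0.0, C           # bracket that always contains the root
    a = 0.5 * C
    for _ in range(max_iter):
        grad = q * a + c + math.log(a) - math.log(C - a)
        if abs(grad) < tol:
            break
        if grad > 0.0:
            hi = a            # root lies to the left of a
        else:
            lo = a            # root lies to the right of a
        hess = q + 1.0 / a + 1.0 / (C - a)   # second derivative, always positive
        a_new = a - grad / hess              # Newton step
        if not (lo < a_new < hi):
            a_new = 0.5 * (lo + hi)          # safeguard: bisect the bracket
        a = a_new
    return a


# With q = 0 the root has the closed form C / (1 + exp(c)), a handy sanity check.
a_check = solve_lr_subproblem(q=0.0, c=1.0, C=2.0)
a_toy = solve_lr_subproblem(q=1.0, c=0.0, C=1.0)
```

The logarithmic terms act as a barrier: the derivative diverges at both ends of the interval, so the minimizer always lies strictly inside (0, C) and no clipping step is needed.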
Determining Importance: SVM case
Shrinking
- $\alpha_i^t = 0$ and $\nabla_i D(\alpha) > \epsilon$, or
- $\alpha_i^t = C$ and $\nabla_i D(\alpha) < -\epsilon$
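[Figure: loss gradient as a function of the margin (SVM case); x-axis: margin, y-axis: loss gradient.]
$\nabla_w \mathrm{loss}(w, x_i, y_i) = \alpha_i$ (KKT condition)
$\nabla^{\pi}_i D(\alpha) = 0$
$|\langle w, x_i\rangle - 1| \ge \epsilon$
Computing $\epsilon$
$$\epsilon = \max_{i \in A} \left|\nabla^{\pi}_i D(\alpha)\right|$$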
Determining Importance: Logistic Regression
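[Figure: loss gradient as a function of the margin (logistic regression case); x-axis: margin, y-axis: loss gradient.]
$\nabla_w \mathrm{loss}(w, x_i, y_i) \approx \alpha_i$ (KKT condition)
$\nabla_i D(\alpha) \approx 0$
$|\langle w, x_i\rangle| \ge \epsilon$
Computing $\epsilon$
$$\epsilon = \max_{i \in A} \left|\nabla_i D(\alpha)\right|$$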
Experiments - Logistic Regression
Does Active Eviction Work?
[Figure: Logistic Regression, webspam-t, C = 1.0, comparing Random and Active eviction. x-axis: wall clock time (sec); y-axis: relative function value difference (log scale).]
Comparison with Block Minimization
[Figure: webspam-t, C = 1.0, comparing StreamSVM and BM. x-axis: wall clock time (sec); y-axis: relative function value difference (log scale).]
Conclusion
Things I did not talk about/Work in Progress
StreamSVM
- Amortizing the cost of training different models
- Other storage hierarchies (e.g. Solid State Drives)
- Extensions to other problems
  - Multiclass problems
  - Structured Output Prediction
- Distributed version of the algorithm
References
Shin Matsushima, S. V. N. Vishwanathan, and Alexander J. Smola. Linear Support Vector Machines via Dual Cached Loops. KDD 2012.
Code for StreamSVM available from http://www.r.dl.itc.u-tokyo.ac.jp/~masin/streamsvm.html
Joint work with
[Figure: architecture. The reading thread reads the dataset from disk (sequential access) and loads it into the cached data (working set) in RAM (random access); the training thread reads the cached data (random access) and updates the weight vector in RAM.]