Pfizer HTS Machine Learning Algorithms: November 2002


Paul Hsiung (hsiung+@cs.cmu.edu)
Paul Komarek (komarek@cs.cmu.edu)
Ting Liu (tingliu@cs.cmu.edu)
Andrew W. Moore (awm@cs.cmu.edu)

Auton Lab, Carnegie Mellon University
School of Computer Science
www.autonlab.org

Auton Lab, www.autonlab.org HTS Results, 11/25/2: Slide 2

Datasets

Our Name | Num. Records | Num. Attributes | Num. non-zero input cells | Num. positive outputs | Description
train1 | 26,733 | 6,348 | 3.7M | 804 | The original dataset sent to CMU in Feb 2002.
test1 | 1,456 | 6,121 | 0.2M | 878 | The test set associated with the above training set.
jun-3-1 | 88,358 | 1,143,054 | 30M | 423 | The large “TEST3” dataset sent to us in May 2002. The “-1” at the end denotes that we were using the first of the four activation columns.
combined | 88,358 | 1,143,054 | 30M | 211 | Combining the “TEST3” datasets. The activation in Combined is positive if and only if at least two of the four original activations were positive.
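The rule for the combined activation (positive if and only if at least two of the four original activations are positive) can be sketched in a few lines of Python. The function name `combine_activations` and the four example records are hypothetical, chosen only to illustrate the rule; they are not taken from the Pfizer data.

```python
def combine_activations(activation_rows, min_positive=2):
    """Return 1 for each record with at least `min_positive` positive activations."""
    return [
        1 if sum(1 for a in row if a) >= min_positive else 0
        for row in activation_rows
    ]

# Hypothetical records, each with four 0/1 activation columns.
rows = [
    [1, 1, 0, 0],  # two positives   -> combined positive
    [1, 0, 0, 0],  # one positive    -> combined negative
    [0, 0, 0, 0],  # no positives    -> combined negative
    [1, 1, 1, 1],  # four positives  -> combined positive
]
combined = combine_activations(rows)
print(combined)
```

A stricter threshold than any single activation column is consistent with the table, where combined has fewer positive outputs (211) than jun-3-1 (423).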


Projections

Our Name | Name given to original | Name given to 100-dimensional projection | Name given to 10-dimensional projection
train1 | train1 | train100 | train10
test1 | test1 | test100 | test10
train1 | train1 | train-pls-100 | train-pls-10
test1 | test1 | test-pls-100 | test-pls-10
jun-3-1 | n/a | jun-3-1 | n/a
combined | n/a | combined | n/a


Previous Algorithms

BC | Bayes Classifier | On original data, a naïve categorical classifier was used. On real-valued projected data, a naïve Gaussian classifier was used.
Dtree | Decision Tree | This technique is also known as Recursive Partitioning and CART. It was implemented only for the original data.
SVM | Support Vector Machine | Except where stated otherwise, a linear SVM was used. We could not find a significant performance difference between the linear SVM and the Radial Basis Function (RBF) SVM across a variety of RBF parameters.
k-NN | k-nearest neighbor | Except where stated otherwise, k=9 neighbors were used. Implemented only for projected data.
LR | Logistic Regression | Except where stated otherwise, Conjugate Gradient was used to perform the intermediate weighted regressions, using a newly developed technique.


New Algorithms

new-KNN | Tractable high-dimensional k-nearest neighbor | Can work on the 1,000,000-dimensional “June” data.
EFP | Explicit False Positive Logistic Regression | Logistic regression that accounts for the high false positive rate.
SMod | Super Model | Automatically combines the predictions from multiple algorithms with a “meta-level” of logistic regression.
PLS-proj | Partial Least Squares Projection | Uses PLS instead of PCA to project the data down.
PLS | Partial Least Squares Prediction | Uses the PLS algorithm as a predictor.


Explicit False Positive Model
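The transcript does not preserve the model's equations, so the following is only one plausible reading of "logistic regression that accounts for the high false positive rate", not the authors' formulation. It assumes that every true negative is reported as positive with a fixed, known probability `fp_rate`, so the probability of observing a positive label is `fp_rate + (1 - fp_rate) * sigmoid(w·x + b)`. The names `fit_efp`, `observed_positive_prob`, and `fp_rate`, and the toy data, are all hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def observed_positive_prob(w, b, x, fp_rate):
    """P(observed label is positive | x) when true negatives flip at fp_rate (assumed model)."""
    s = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return fp_rate + (1.0 - fp_rate) * s

def fit_efp(X, y, fp_rate, lr=0.1, epochs=500):
    """Batch gradient ascent on the log-likelihood of the observed (noisy) labels."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, label in zip(X, y):
            s = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            p = max(fp_rate + (1.0 - fp_rate) * s, 1e-12)
            # Derivative of this record's log-likelihood w.r.t. the linear score.
            # With fp_rate = 0 this reduces to the usual (label - s).
            g = (1.0 - fp_rate) * s * (1.0 - s) / p if label == 1 else -s
            for j in range(d):
                grad_w[j] += g * x[j]
            grad_b += g
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b += lr * grad_b / len(X)
    return w, b

# Hypothetical 1-D data: the last record is a false positive deep in negative territory.
X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0], [-1.8]]
y = [0, 0, 0, 1, 1, 1, 1]
w, b = fit_efp(X, y, fp_rate=0.1)
print(w, b)
```

Under this assumption a positive label far on the negative side can be explained by the noise term rather than by moving the decision boundary, which is the behavior the two-dimensional examples contrast with regular logistic regression.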


Example in 2 dimensions: Decision Boundary

Example in 2 dimensions: 100 true positives

100 true positives and 100 true negatives

100 TP, 100 TN, 10 FP

Using regular logistic regression

Using EFP Model

Example: 10000 true positives

10000 true positives, 10000 true negatives

10000 TP, 10000 TN, 1000 FP

Using regular logistic regression

Using EFP Model

EFP Model Real Data Results

K-fold

EFP Effect

…Very impressive on Train1 / Test1

Log X-axis

EFP Effect

…Unimpressive on jun31 / jun32

Super Model

• Divide the training set into Compartment A and Compartment B.
• Learn each of the N models on Compartment A.
• Predict with each of the N models on Compartment B.
• Learn the best weighting of opinions with a logistic regression on the predictions for Compartment B.
• Apply the models and their weights to the test data.
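The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the N base "models" are stand-ins (plain logistic regressions that each see a different feature subset), the split is a simple half/half cut, and the names `fit_super_model`, `fit_logistic`, and `feature_subsets` are hypothetical.

```python
import math
import random

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Plain logistic regression by batch gradient ascent; returns (weights, bias)."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, label in zip(X, y):
            err = label - sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b += lr * grad_b / len(X)
    return w, b

def predict(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_super_model(X, y, feature_subsets):
    # Step 1: divide the training set into Compartment A and Compartment B.
    half = len(X) // 2
    XA, yA, XB, yB = X[:half], y[:half], X[half:], y[half:]
    # Step 2: learn each base model on A (stand-ins for N different algorithms).
    base = [fit_logistic([[x[j] for j in cols] for x in XA], yA)
            for cols in feature_subsets]

    def opinions(x):
        return [predict(m, [x[j] for j in cols])
                for m, cols in zip(base, feature_subsets)]

    # Steps 3-4: predict on B, then fit the meta-level logistic regression.
    meta = fit_logistic([opinions(x) for x in XB], yB)
    # Step 5: the returned function applies the models and their weights to new data.
    return lambda x: predict(meta, opinions(x))

# Hypothetical data: the label depends on both features, each base model sees only one.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [1 if x[0] + x[1] > 0 else 0 for x in X]
score = fit_super_model(X, y, feature_subsets=[[0], [1]])
print(score([2.0, 2.0]), score([-2.0, -2.0]))
```

Fitting the meta-level weights on Compartment B rather than A matters: the base models' predictions on their own training data would look overconfident, and the weights learned from them would favor whichever model overfits most.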

Comparison

Log X-Axis Scale

Comparison on 100-dims

Log X-axis

Comparison on 10 dims

Log X-axis

NewKNN summary of results and timings


PLS summary of results

• PLS projections did not do so well.
• However, PLS as a predictor performed well, especially under train100/test100.
• PLS is fast. The runtime varies from 1 to 10 minutes.
• But PLS takes large amounts of memory. It is impossible to use with a sparse representation. (This is due to the update on each iteration.)
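The slide does not say which update it means. Assuming it is the rank-one deflation used in NIPALS-style PLS (subtracting the outer product of a score vector and a loading vector from the data matrix after each component), the sketch below shows why sparsity is lost: a single deflation step fills in entries that were zero. The function `pls1_deflate` and the small matrix are hypothetical illustrations.

```python
import math

def pls1_deflate(X, y):
    """One PLS1 component (NIPALS-style), returning the deflated data matrix."""
    n, d = len(X), len(X[0])
    # Weight vector: direction of maximal covariance with y, normalized.
    w = [sum(X[i][j] * y[i] for i in range(n)) for j in range(d)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores t = X w and loadings p = X^T t / (t^T t).
    t = [sum(X[i][j] * w[j] for j in range(d)) for i in range(n)]
    tt = sum(v * v for v in t)
    p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(d)]
    # Deflation: X <- X - t p^T. This rank-one update is dense in general.
    return [[X[i][j] - t[i] * p[j] for j in range(d)] for i in range(n)]

def count_nonzero(M, tol=1e-12):
    return sum(1 for row in M for v in row if abs(v) > tol)

# Hypothetical sparse binary data: 5 records, 6 attributes.
X = [
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1],
]
y = [1, 0, 0, 1, 0]
before = count_nonzero(X)
after = count_nonzero(pls1_deflate(X, y))
print(before, after)
```

On a matrix the size of jun-3-1 (88,358 records by 1,143,054 attributes, with only about 30M non-zero cells), a dense deflated copy would need on the order of 10^11 entries, which is consistent with the memory problem the slide reports.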


Summary of results

• SVM best early on in Train1, LR better in the long haul.
• Projecting to 10-d always a disaster.
• Projecting to 100-d often indistinguishable from behavior with original data (and much cheaper).
• Naïve Gaussian Bayes Classifier best on JUN-3-1 (k-NN better for the long haul).
• Naïve Gaussian Bayes Classifier best on combined.
• Non-linear SVM never seems distinguishable from Linear SVM.
• All methods have won in at least one context, except Dtree.


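Some AUC Results

Experiment | Algorithm | AUC
Train on Train1 then test on Test1 | Linear SVM | 0.876*
Train on Train1 then test on Test1 | Best non-Linear SVM | 0.875*
Train on Train1 then test on Test1 | BC | 0.867*
Train on Train1 then test on Test1 | LR | 0.71
Train on Train1 then test on Test1 | KNN | 0.872*
Train on Train1 then test on Test1 | DTree | 0.70
Combined | SVM | 0.638
Combined | BC | 0.700
Combined | LR | 0.606
Combined | KNN | 0.603

* = Not statistically significantly different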
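For readers unfamiliar with the metric: assuming AUC here means the area under the ROC curve, it equals the probability that a randomly chosen positive record is scored above a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition. The function `auc` and the example scores are hypothetical; the slides do not say how the reported values were computed.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores for three positives and two negatives.
scores = [0.9, 0.8, 0.35, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]
print(auc(scores, labels))  # 5 of 6 pairs ranked correctly -> 0.8333...
```

The pairwise loop is quadratic in the number of records, which is fine for a test set the size of test1 but would call for a sort-based rank computation on larger data.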


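Some AUC Results

Experiment | Algorithm | AUC
10-fold cross-validation on Train1 | Linear SVM | 0.919
10-fold cross-validation on Train1 | BC | 0.885
10-fold cross-validation on Train1 | LR | 0.933
10-fold cross-validation on Train1 | DTree | 0.894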
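A 10-fold cross-validation of the kind named in the table partitions the records into ten disjoint folds and, for each fold in turn, trains on the other nine and tests on the held-out one. The sketch below only generates the index splits. The function `k_fold_indices`, the shuffling, and the seed are assumptions; the slides do not describe how the folds were drawn. The record count 26,733 is the size of train1 from the Datasets table.

```python
import random

def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k disjoint folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test

splits = list(k_fold_indices(26733, k=10))
for train, test in splits[:2]:
    print(len(train), len(test))
```

Each record is tested exactly once, so a per-fold metric such as AUC can be averaged over the ten folds to give a single figure per algorithm.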