Pfizer HTS Machine Learning Algorithms: November 2002

Paul Hsiung (hsiung+@cs.cmu.edu)
Paul Komarek (komarek@cs.cmu.edu)
Ting Liu (tingliu@cs.cmu.edu)
Andrew W. Moore (awm@cs.cmu.edu)

Auton Lab, Carnegie Mellon University
School of Computer Science
www.autonlab.org

Auton Lab, www.autonlab.org HTS Results, 11/25/2: Slide 2

Datasets

Our Name | Num. Records | Num. Attributes | Num. Non-zero Input Cells | Num. Positive Outputs | Description
train1 | 26,733 | 6,348 | 3.7M | 804 | The original dataset sent to CMU in Feb 2002.
test1 | 1,456 | 6,121 | 0.2M | 878 | The test set associated with the above training set.
jun-3-1 | 88,358 | 1,143,054 | 30M | 423 | The large “TEST3” dataset sent to us in May 2002. The “-1” at the end denotes that we were using the first of the four activation columns.
combined | 88,358 | 1,143,054 | 30M | 211 | Combines the four activation columns of the “TEST3” dataset. The activation in combined is positive if and only if at least two of the four original activations were positive.
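The labeling rule for combined amounts to a vote over the four activation columns. A minimal sketch, assuming each record's four activations are available as 0/1 flags; the function name `combined_label` is hypothetical.

```python
def combined_label(activations):
    """Return 1 if at least two of the four activation flags are positive, else 0.

    `activations` is assumed to hold four 0/1 values, one per original
    activation column of the TEST3 data (hypothetical layout).
    """
    if len(activations) != 4:
        raise ValueError("expected exactly four activation columns")
    return 1 if sum(1 for a in activations if a) >= 2 else 0


print(combined_label([1, 0, 0, 0]))  # 0: only one column is positive
print(combined_label([1, 0, 1, 0]))  # 1: two columns are positive
```

Requiring two votes is stricter than using any single column, which fits combined having fewer positives (211) than jun-3-1 (423).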

Projections

Our Name | Name Given to Original | Name Given to 100-Dimensional Projection | Name Given to 10-Dimensional Projection
train1 | train1 | train100 | train10
test1 | test1 | test100 | test10
train1 | train1 | train-pls-100 | train-pls-10
test1 | test1 | test-pls-100 | test-pls-10
jun-3-1 | n/a | jun-3-1 | n/a
combined | n/a | combined | n/a

Previous Algorithms

BC (Bayes Classifier): On the original data, a naïve categorical classifier was used. On real-valued projected data, a naïve Gaussian classifier was used.
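For real-valued projected data, a naïve Gaussian classifier treats each attribute as an independent normal distribution within each class. The sketch below is a minimal pure-Python illustration, not the implementation used here: it assumes dense float rows, 0/1 labels with both classes present, and an illustrative variance floor, and it returns the posterior probability of activity so that records can be ranked.

```python
import math


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Per-class prior, attribute means and variances for a naive Gaussian classifier.

    Assumes 0/1 labels with at least one record of each class.
    """
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, var_floor)  # floor avoids zero variance
            for col, m in zip(zip(*rows), means)
        ]
        model[c] = (n / len(X), means, variances)
    return model


def predict_proba_nb(model, x):
    """P(active | x), treating attributes as independent given the class."""
    log_joint = {}
    for c, (prior, means, variances) in model.items():
        lp = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        log_joint[c] = lp
    top = max(log_joint.values())  # subtract the max before exponentiating
    weights = {c: math.exp(lp - top) for c, lp in log_joint.items()}
    return weights[1] / (weights[0] + weights[1])


X = [[0.0, 1.0], [0.2, 0.8], [3.0, -1.0], [2.8, -1.2]]
y = [0, 0, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_proba_nb(model, [2.9, -1.1]) > 0.5)  # True
```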

Decision Tree: This technique is also known as Recursive Partitioning and CART. It was only implemented for the original data.
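The core step of CART-style recursive partitioning is choosing the split that most reduces node impurity. The slide does not name the criterion, so this sketch assumes Gini impurity and 0/1 attributes (as in the sparse original data); `best_binary_split` is a hypothetical helper that evaluates a single level of the tree only.

```python
def gini(pos, total):
    """Gini impurity of a node containing `pos` positives out of `total` records."""
    if total == 0:
        return 0.0
    p = pos / total
    return 2 * p * (1 - p)


def best_binary_split(X, y):
    """Return (attribute index, weighted impurity) of the best single-attribute split.

    X is a list of 0/1 attribute rows and y a list of 0/1 labels (assumed layout).
    Index None means no split beats the unsplit parent node.
    """
    n = len(y)
    best = (None, gini(sum(y), n))  # impurity of the parent node
    for j in range(len(X[0])):
        on = [label for row, label in zip(X, y) if row[j]]
        off = [label for row, label in zip(X, y) if not row[j]]
        if not on or not off:
            continue  # the split would leave one side empty
        impurity = (len(on) * gini(sum(on), len(on))
                    + len(off) * gini(sum(off), len(off))) / n
        if impurity < best[1]:
            best = (j, impurity)
    return best


X = [[1, 0], [1, 1], [0, 1], [0, 0]]
y = [1, 1, 0, 0]
print(best_binary_split(X, y))  # (0, 0.0): attribute 0 separates the classes perfectly
```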

SVM (Support Vector Machine): Except where stated otherwise, a linear SVM was used. We could not find a significant performance difference between a linear SVM and a Radial Basis Function (RBF) SVM with a variety of RBF parameters.
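The slides do not say which solver trained the linear SVM. As an assumed stand-in, the sketch below uses Pegasos-style stochastic subgradient descent on the regularized hinge loss; `lam`, `epochs`, and the appended constant bias feature are illustrative choices, not details from the original work.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient descent for a linear SVM.

    X: list of float rows; y: list of 0/1 labels (mapped to -1/+1 internally).
    A constant 1.0 feature is appended so a bias is learned (and regularised) too.
    """
    rng = random.Random(seed)
    data = [(row + [1.0], 1 if label else -1) for row, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, s in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = s * sum(wi * xi for wi, xi in zip(w, x))
            # Subgradient step on lam/2 * ||w||^2 + hinge loss at this example.
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                w = [wi + eta * s * xi for wi, xi in zip(w, x)]
    return w


def svm_score(w, x):
    """Linear decision value; higher means more likely active."""
    return sum(wi * xi for wi, xi in zip(w, x + [1.0]))


X = [[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.5]]
y = [1, 1, 0, 0]
w = train_linear_svm(X, y)
print(svm_score(w, [2.0, 1.0]) > 0, svm_score(w, [-2.0, -2.0]) > 0)  # True False
```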

k-NN (k-nearest neighbor): Except where stated otherwise, k = 9 neighbors were used. Only implemented for projected data.
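With k = 9, one natural way to turn neighbors into a ranking score is the fraction of the 9 nearest training records that are active. That scoring rule and the use of Euclidean distance on the projected vectors are assumptions; the slide only fixes k.

```python
import heapq
import math


def knn_score(train_X, train_y, query, k=9):
    """Fraction of positive labels among the k nearest training records.

    Uses Euclidean distance on dense (e.g. projected) vectors.
    """
    nearest = heapq.nsmallest(
        k,
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    return sum(label for _, label in nearest) / len(nearest)


train_X = [[float(i), 0.0] for i in range(20)]
train_y = [1 if i < 10 else 0 for i in range(20)]
print(knn_score(train_X, train_y, [0.0, 0.0]))   # 1.0: the 9 nearest are all active
print(knn_score(train_X, train_y, [19.0, 0.0]))  # 0.0: the 9 nearest are all inactive
```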

LR (Logistic Regression): Except where stated otherwise, Conjugate Gradient was used to perform the intermediate weighted regressions, using a newly developed technique.
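IRLS-style logistic regression repeatedly solves a weighted least-squares system, and conjugate gradient can solve each one using only matrix-vector products. The newly developed technique mentioned on the slide is not described, so this is a generic sketch: each Newton step solves (XᵀWX + λI)δ = Xᵀ(y − p) − λβ with W = diag(p(1 − p)), and the ridge term `lam`, the iteration counts, and the helper names are assumptions.

```python
import math


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def _sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def conjugate_gradient(matvec, b, iters=20, tol=1e-12):
    """Approximately solve A x = b for symmetric positive definite A, given v -> A v."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the starting point x = 0
    p = list(r)
    rs = _dot(r, r)
    for _ in range(iters):
        if rs < tol:
            break
        Ap = matvec(p)
        alpha = rs / _dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = _dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def fit_logistic_irls_cg(X, y, lam=1e-3, newton_iters=15, cg_iters=20):
    """Ridge-penalised logistic regression; each IRLS (Newton) step is solved by CG.

    X rows should include a constant 1.0 for the intercept; y holds 0/1 labels.
    """
    d = len(X[0])
    beta = [0.0] * d
    for _ in range(newton_iters):
        p = [_sigmoid(_dot(beta, x)) for x in X]
        w = [pi * (1.0 - pi) for pi in p]  # IRLS weights
        grad = [
            sum((yi - pi) * x[j] for x, yi, pi in zip(X, y, p)) - lam * beta[j]
            for j in range(d)
        ]

        def hess_vec(v, w=w):
            # (X^T W X + lam I) v without ever forming the d x d matrix.
            Xv = [_dot(x, v) for x in X]
            return [
                sum(wi * xvi * x[j] for x, wi, xvi in zip(X, w, Xv)) + lam * v[j]
                for j in range(d)
            ]

        step = conjugate_gradient(hess_vec, grad, iters=cg_iters)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


X = [[1.0, v] for v in (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)]
y = [0, 0, 1, 0, 1, 1]
beta = fit_logistic_irls_cg(X, y)
print(beta[1] > 0)  # True: higher attribute values raise the predicted activity
```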

New Algorithms

new-KNN (tractable high-dimensional k-nearest neighbor): Can work on the 1,000,000-dimensional “June” data.
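The slide does not explain how new-KNN stays tractable at 1,000,000 dimensions. One standard approach for sparse binary data, shown here only as an illustration and not as the authors' method, uses the identity ||a − b||² = |a| + |b| − 2|a ∩ b| for binary vectors and accumulates the overlaps through an inverted index, so no step ever loops over all 1,000,000 attributes. Records are assumed to be stored as sets of active attribute ids.

```python
from collections import defaultdict
import heapq


def build_inverted_index(train_sets):
    """Map each active attribute id to the list of training records that contain it."""
    index = defaultdict(list)
    for i, features in enumerate(train_sets):
        for f in features:
            index[f].append(i)
    return index


def sparse_knn(train_sets, index, query, k=9):
    """Exact k nearest neighbours for sparse binary records stored as sets of ids.

    Returns (squared distance, record index) pairs, closest first.
    """
    overlap = defaultdict(int)
    for f in query:
        for i in index.get(f, ()):
            overlap[i] += 1
    # Records sharing no attribute with the query simply have overlap 0.
    dists = ((len(query) + len(s) - 2 * overlap[i], i) for i, s in enumerate(train_sets))
    return heapq.nsmallest(k, dists)


train_sets = [{1, 2, 3}, {1, 2}, {100, 200}, {3, 4, 5}]
index = build_inverted_index(train_sets)
print(sparse_knn(train_sets, index, {1, 2}, k=2))  # [(0, 1), (1, 0)]
```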

EFP (Explicit False Positive Logistic Regression): Logistic regression that accounts for the high false positive rate.
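The slide only says that EFP accounts for the high false positive rate. One way to make that explicit, assumed here purely for illustration, is to let an observed positive come either from true activity, with probability σ(w·x), or from an assay false positive at a fixed rate ε, so that P(y = 1 | x) = ε + (1 − ε)σ(w·x). The sketch fits w by batch gradient ascent on that likelihood; ε, the learning rate, and the iteration count are hypothetical values.

```python
import math


def _sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_efp_logistic(X, y, eps=0.05, lr=0.1, iters=2000):
    """Gradient ascent for a logistic model with an explicit false-positive rate.

    Assumed model: P(y=1 | x) = eps + (1 - eps) * sigmoid(w . x).
    X rows should include a constant 1.0 for the intercept; y holds 0/1 labels.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    d = len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for x, yi in zip(X, y):
            s = _sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            q = eps + (1.0 - eps) * s
            # d(log-likelihood)/dz simplifies to (y - q) * s / q for this model.
            g = (yi - q) * s / q
            for j in range(d):
                grad[j] += g * x[j]
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def efp_true_activity(w, x):
    """Estimated probability that x is truly active (false-positive term removed)."""
    return _sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


X = [[1.0, v] for v in (-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0, -1.75)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]  # the last record plays the role of a false positive
w = fit_efp_logistic(X, y)
print(efp_true_activity(w, [1.0, 2.0]) > 0.5, efp_true_activity(w, [1.0, -2.0]) < 0.5)
```

Under this assumed model a mislabeled active costs at most log ε in the likelihood, so a single false positive cannot pull the boundary as far as in ordinary logistic regression, where its cost grows without bound.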

Super Model: Automatically combines the predictions from multiple algorithms with a “meta-level” of logistic regression.

PLS-proj (Partial Least Squares Projection): Uses PLS instead of PCA to project the data down.

PLS (Partial Least Squares Prediction): Uses the PLS algorithm as a predictor.
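Both PLS uses can be sketched with the NIPALS algorithm for a single response (PLS1): the per-component scores give the projection, and the accumulated regression gives the prediction. This is a minimal pure-Python illustration with hypothetical function names, not the implementation behind these results. Its deflation step rewrites every cell of the data matrix on each iteration, which is consistent with the memory problem reported for sparse data in the PLS summary.

```python
def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def fit_pls1(X, y, n_components):
    """NIPALS PLS1 for one response. Returns centring terms and per-component (w, p, c)."""
    n, d = len(X), len(X[0])
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # centred X, deflated below
    f = [v - y_mean for v in y]
    comps = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(d)]  # X^T y direction
        norm = _dot(w, w) ** 0.5
        if norm == 0:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]  # scores of the training records
        tt = _dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(d)]
        c = _dot(f, t) / tt
        # Deflation: a dense update of every cell, so sparse inputs do not stay sparse.
        E = [[E[i][j] - t[i] * p[j] for j in range(d)] for i in range(n)]
        f = [f[i] - c * t[i] for i in range(n)]
        comps.append((w, p, c))
    return x_mean, y_mean, comps


def pls1_transform_predict(model, x):
    """Return (scores, prediction): the scores are the PLS projection of x."""
    x_mean, y_mean, comps = model
    e = [v - m for v, m in zip(x, x_mean)]
    scores, y_hat = [], y_mean
    for w, p, c in comps:
        t = _dot(e, w)
        scores.append(t)
        y_hat += c * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return scores, y_hat


X = [[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 1.0], [4.0, 4.0]]
y = [2 * a - b for a, b in X]  # an exactly linear response
model = fit_pls1(X, y, n_components=2)
scores, y_hat = pls1_transform_predict(model, [5.0, 2.0])
print(len(scores), round(y_hat, 6))  # 2 8.0
```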

Explicit False Positive Model

[Figures: two-dimensional examples showing the decision boundary fitted by regular logistic regression and by the EFP model, first with 100 true positives (TP), 100 true negatives (TN), and 10 false positives (FP), then with 10,000 TP, 10,000 TN, and 1,000 FP.]

EFP Model Real Data Results

[Figure: k-fold results.]

EFP effect: very impressive on Train1 / Test1. [Figure, log x-axis.]

EFP effect: unimpressive on jun31 / jun32. [Figure.]

Super Model
• Divide the training set into Compartment A and Compartment B.
• Learn each of the N models on Compartment A.
• Use each of the N models to predict on Compartment B.
• Learn the best weighting of their opinions with a logistic regression of the predictions on Compartment B.
• Apply the models and their weights to the test data.
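The Super Model steps can be sketched as stacking with a logistic-regression meta-level. In this illustration the base learners are hypothetical functions `fit(X, y) -> predict(x)`, the compartments are formed by alternating records (a random split works the same way), and the meta-level is plain gradient-ascent logistic regression. None of these details come from the slides.

```python
import math


def _sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_meta_logistic(F, y, lr=0.5, iters=500):
    """Gradient-ascent logistic regression on base-model predictions F (rows), 0/1 labels."""
    d = len(F[0]) + 1  # weight 0 is the intercept
    w = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for f, yi in zip(F, y):
            r = yi - _sigmoid(w[0] + sum(wj * fj for wj, fj in zip(w[1:], f)))
            grad[0] += r
            for j, fj in enumerate(f):
                grad[j + 1] += r * fj
        w = [wj + lr * gj / len(F) for wj, gj in zip(w, grad)]
    return w


def fit_super_model(X, y, learners):
    """Stack base learners under a logistic-regression meta-level."""
    comp_a = list(range(0, len(X), 2))  # Compartment A: even positions (assumed split)
    comp_b = list(range(1, len(X), 2))  # Compartment B: odd positions
    # Learn each of the N models on Compartment A.
    models = [fit([X[i] for i in comp_a], [y[i] for i in comp_a]) for fit in learners]
    # Predict each model on Compartment B.
    F = [[m(X[i]) for m in models] for i in comp_b]
    # Learn the weighting of the models' opinions on Compartment B.
    meta_w = fit_meta_logistic(F, [y[i] for i in comp_b])

    def predict(x):
        # Apply the models and their learned weights to new (test) data.
        f = [m(x) for m in models]
        return _sigmoid(meta_w[0] + sum(wj * fj for wj, fj in zip(meta_w[1:], f)))

    return predict


def fit_first_attribute(X, y):
    """Hypothetical base learner: scores a record by its first attribute."""
    return lambda x: x[0]


def fit_constant(X, y):
    """Hypothetical base learner carrying no signal."""
    return lambda x: 0.0


X = [[float(v)] for v in range(-10, 10)]
y = [1 if v >= 0 else 0 for v in range(-10, 10)]
super_model = fit_super_model(X, y, [fit_first_attribute, fit_constant])
print(super_model([5.0]) > 0.5, super_model([-5.0]) < 0.5)  # True True
```

Fitting the weights on Compartment B, which the base models never saw, keeps the meta-level from simply rewarding whichever model overfits Compartment A most.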

Comparison

[Figures: algorithm comparison overall, on 100 dims, and on 10 dims; each plotted with a log x-axis scale.]

[Table: new-KNN summary of results and timings.]

PLS summary of results
• PLS projections did not do so well.
• However, PLS as a predictor performed well, especially on train100/test100.
• PLS is fast: the runtime varies from 1 to 10 minutes.
• But PLS takes large amounts of memory and is impossible to use with a sparse representation. (This is due to the update on each iteration.)

Summary of results
• SVM is best early on in Train1; LR is better in the long haul.
• Projecting to 10-d is always a disaster.
• Projecting to 100-d is often indistinguishable from the behavior with the original data (and much cheaper).
• The naïve Gaussian Bayes Classifier is best on JUN-3-1 (k-NN is better for the long haul).
• The naïve Gaussian Bayes Classifier is best on combined.
• Non-linear SVM never seems distinguishable from linear SVM.
• All methods have won in at least one context, except DTree.

Some AUC Results

Experiment: Train on Train1, then test on Test1
Algorithm | AUC
Linear SVM | 0.876*
Best non-linear SVM | 0.875*
BC | 0.867*
LR | 0.71
KNN | 0.872*
DTree | 0.70

Experiment: Combined
Algorithm | AUC
SVM | 0.638
BC | 0.700
LR | 0.606
KNN | 0.603

* = Not statistically significantly different from one another
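AUC can be computed directly from ranked scores: it equals the probability that a randomly chosen active compound scores higher than a randomly chosen inactive one. The sketch uses the rank-sum (Mann-Whitney) form with ties counted as one half; the slides do not say how AUC or its significance test was computed, and the function name `auc` is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank (Mann-Whitney U) statistic.

    Tied scores receive the average of their ranks, so ties count as one half.
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        rank_sum += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```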

Some AUC Results

Experiment: 10-fold cross-validation on Train1
Algorithm | AUC
Linear SVM | 0.919
BC | 0.885
LR | 0.933
DTree | 0.894
