A data driven nomogram for breast cancer survival

Posted on 17-Jan-2017

A Data-Driven Nomogram for Breast Cancer Survival

Capstone by Lisa Federer, Bret Gaulin, Michal Haskell, Andy Pollack, and Carmen Smiley

1 in 8 US women will be diagnosed with breast cancer during their lifetime.

Our hypothesis

Certain demographic, diagnostic, and treatment parameters can reliably predict survival time for women with breast cancer. Using this knowledge, we could build a “calculator” to estimate survival time for individuals.

The data

http://seer.cancer.gov/

Data exploration and variable selection

• Survival time (months)
• Age at diagnosis
• Year of birth
• Race
• Origin (Hispanic recode)
• Stage
• Histology
• Tumor extent
• Number of primary tumors
• Laterality
• ER status
• PR status
• Radiation therapy

146 variables in the SEER database

13 variables of interest

Classification by survival time
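A minimal sketch of how a continuous survival time could be turned into class labels for a classifier. The results in this deck use three classes (0, 1, 2), but the slides do not state the boundaries; the cut points below (24 and 60 months) and the name `survival_class` are assumptions for illustration only.

```python
from bisect import bisect_right

# Hypothetical cut points in months; the capstone's actual boundaries are not given.
CUT_POINTS = (24, 60)

def survival_class(months, cut_points=CUT_POINTS):
    """Return 0, 1, or 2 depending on which interval `months` falls in."""
    if months < 0:
        raise ValueError("survival time cannot be negative")
    # bisect_right counts how many cut points are <= months,
    # so a value equal to a cut point goes into the higher class.
    return bisect_right(cut_points, months)

# Toy survival times in months, invented for illustration
labels = [survival_class(m) for m in (3, 24, 59, 60, 130)]
print(labels)
```

Where the boundaries are placed determines how many patients fall in each class, which in turn affects the per-class support and the precision and recall a model can reach.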

Classification model results

Model                Class    Precision  Recall  f-score  Support

Logistic Regression  0        0.65       0.43    0.23     146
                     1        0.55       0.37    0.44     416
                     2        0.55       0.76    0.63     471
                     Average  0.56       0.56    0.54     1033

Naïve Bayes          0        0.39       0.16    0.23     162
                     1        0.63       0.3     0.4      402
                     2        0.5        0.84    0.63     466
                     Average  0.54       0.84    0.63     1030

Decision Tree        0        0.3        0.39    0.34     131
                     1        0.76       0.75    0.75     416
                     2        0.8        0.75    0.78     485
                     Average  0.72       0.7     0.71     1032

Random Forest        0        0.58       0.43    0.5      183
                     1        0.73       0.75    0.74     414
                     2        0.71       0.78    0.74     440
                     Average  0.7        0.7     0.7      1037

K Neighbors          0        0.53       0.33    0.41     150
                     1        0.83       0.82    0.82     425
                     2        0.77       0.88    0.82     472
                     Average  0.76       0.77    0.76     1047
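To make the columns concrete, here is a small sketch of how per-class precision, recall, f-score, and support can be computed from true and predicted labels. It assumes the f-score is the usual F1 (the harmonic mean of precision and recall); the slides do not say which variant was used. The function name `per_class_metrics` and the toy labels are invented for illustration.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f_score, support)} for each true label."""
    true_pos = Counter()
    predicted = Counter(y_pred)   # how often each label was predicted
    support = Counter(y_true)     # how often each label actually occurs
    for t, p in zip(y_true, y_pred):
        if t == p:
            true_pos[t] += 1
    metrics = {}
    for label in sorted(support):
        tp = true_pos[label]
        precision = tp / predicted[label] if predicted[label] else 0.0
        recall = tp / support[label]
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f_score, support[label])
    return metrics

# Toy labels, invented for illustration only (not SEER data)
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 2, 2, 0]
for label, (p, r, f, n) in per_class_metrics(y_true, y_pred).items():
    print(f"{label}  {p:.2f}  {r:.2f}  {f:.2f}  {n}")
```

Support is simply the number of test cases whose true label is that class, which is why the three per-class supports for each model add up to the figure in its Average row.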

Regression and survival models

Cox proportional hazards model
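For reference, the Cox model in its standard textbook form relates the hazard for a patient with covariates x_1, ..., x_p to an unspecified baseline hazard h_0(t). The symbols are the conventional ones, not notation taken from the slides.

```latex
h(t \mid x) = h_0(t)\, \exp\!\left(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p\right)
```

Each exp(β_j) is the hazard ratio for a one-unit increase in x_j, and the model assumes that ratio stays constant over time.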

Aalen’s additive model
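For comparison, Aalen's additive model in its standard form lets each coefficient vary with time and adds covariate effects to the hazard instead of multiplying them. Again, the symbols are conventional and not taken from the slides.

```latex
h(t \mid x) = \beta_0(t) + \beta_1(t)\, x_1 + \cdots + \beta_p(t)\, x_p
```

Here β_0(t) plays the role of a baseline hazard, and each β_j(t) describes how the effect of x_j changes over follow-up time.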

Nomogram demonstration
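A rough sketch of the arithmetic a survival "calculator" might perform once a Cox-style model has been fitted. Everything numeric here is made up: the coefficients, the covariate names, and the baseline survival are placeholders, not estimates from SEER or from the capstone's model.

```python
import math

# Hypothetical coefficients and baseline survival, invented for illustration;
# they are NOT estimates from the SEER data or from the capstone's model.
COEFS = {"age_decades": 0.30, "stage": 0.80, "er_negative": 0.40}
BASELINE_SURVIVAL_60M = 0.95  # assumed S0(60 months) when all covariates are 0

def linear_predictor(patient, coefs=COEFS):
    """Sum of coefficient * value over the model's covariates."""
    return sum(coefs[name] * patient[name] for name in coefs)

def survival_probability(patient, baseline=BASELINE_SURVIVAL_60M):
    """Cox-style estimate: S(t | x) = S0(t) ** exp(linear predictor)."""
    return baseline ** math.exp(linear_predictor(patient))

# Hypothetical patient, covariates measured relative to the reference levels
patient = {"age_decades": 1.0, "stage": 2.0, "er_negative": 1.0}
print(f"Estimated 60-month survival: {survival_probability(patient):.2f}")
```

A printed nomogram does the same calculation graphically: each variable's contribution to the linear predictor is drawn as a points scale, and the total points are read off against a survival axis.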

Limitations

Not a substitute for medical advice

Missing or incomplete data in SEER database

Questions?