Introduction of Thomas H. Taylor, Jr., PE Georgia Institute of Technology, BS Applied Mathematics,...

Date posted: 20-Dec-2015

Page 1: Introduction of Thomas H. Taylor, Jr., PE  Georgia Institute of Technology, BS Applied Mathematics, 1975  Georgia State University, MS Decision Sciences,

Introduction of Thomas H. Taylor, Jr., PE

  • Georgia Institute of Technology, BS Applied Mathematics, 1975
  • Georgia State University, MS Decision Sciences, Statistics Concentration, 1985
  • Registered Professional Engineer, Industrial
  • 25 years in private-sector energy industry + 8 years in microbiology and public health, in federal government
  • Senior Executive in utility consulting industry
  • Senior federal employee, well published in scientific journals
  • Holder of Methods Patent for new computational approach and associated SAS™-based software for series-dilution bioassays
  • Career conclusions:
      • Modeling (and much of statistics in general) is transferable across sectors, industries, and disciplines.
      • The jargon varies across sectors, industries, and disciplines.

Page 2:

Presentation Outline

  • Introduction of T. Taylor
  • Regression Modeling Motivation
      • Implicit in the development of a real-world model is the expectation that it be used for decision making.
      • The decision-making is the guiding principle for model development.
  • Modeling Examples
      • Course of Disease – response decisions
      • Epidemiological, Chronic – policy and treatment decisions
      • Epidemiological, Outbreak – announcements & recalls
  • Software for modeling – SAS™ is superior to Excel™ in modeling situations, due to documentation, reproducibility, and audit-worthiness.
  • Regression modeling in the real world is not as clean as it is in many textbooks.

Page 3:

Decision-making and Risk

  • Implicit in decision making is the minimization of risk
      • Risk = probability (event) × loss function (event)
      • Loss functions are different in different industries and sectors
      • “Risk” is used incorrectly in some sectors and industries.
  • Government decision criteria are considerably different from private sector
      • Public welfare is not expected to be cost-effective
  • Epidemiology
      • Objective: Reduce burden of disease or rate of mortality
      • Intervention: Vaccine introduction; educational campaigns, e.g. hand-washing; avoidance of specific behaviors; food and drug recalls
  • Energy
      • Objective: reduce energy use, or re-arrange energy use
      • Actions: green marketing; efficiency mandates; development of alternatives
  • Classic Marketing
      • Objective: increase sales; maximize profit; minimize risk
      • Decisions: pricing, product/service choice; R&D
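The definition of risk as probability times loss can be illustrated with a short calculation. The following is a minimal sketch, not taken from the talk: the decision names, probabilities, and loss values are all hypothetical, and the helper names `risk` and `total_risk` are introduced here only for illustration.

```python
# Hypothetical illustration of "risk = probability(event) x loss(event)".
# Every probability and loss below is a made-up number, not from the talk.

def risk(probability, loss):
    """Expected loss contributed by a single event."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return probability * loss


def total_risk(events):
    """Sum expected losses over (probability, loss) pairs for one decision."""
    return sum(risk(p, loss) for p, loss in events)


# Two candidate decisions, each with its own possible outcomes.
decisions = {
    "recall product": [(0.9, 2.0), (0.1, 5.0)],
    "no recall": [(0.8, 0.0), (0.2, 40.0)],
}

risks = {name: total_risk(events) for name, events in decisions.items()}
best = min(risks, key=risks.get)  # decision with the smallest expected loss
print(risks, best)
```

Changing the loss values is enough to flip the preferred decision, which is one way to see why the same probabilities can lead to different decisions in sectors with different loss functions.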

Page 4:

Decision/Outcome Criterion

[Figure: individual tolerance (spore equivalent of toxin level) plotted against exposure (spores), with the line y=x and regions labeled “sick” and “not sick”.]

Page 5:

Anthrax Course of Disease

[Figure: curves over Days (0 to 30) on a vertical scale of 0 to 24, labeled “Exposure just OVER threshold”, “Exposure just UNDER threshold”, and “Exposure=Personal Tolerance”, with the Prodromal Stage and Fulminant Stage marked.]

Page 6:

Anthrax Course of Disease

[Figure: curve over Days (0 to 30) on a vertical scale of 0 to 24, labeled “Exposure >> Personal Tolerance”, with the Fulminant Stage marked.]

Page 7:

Decision Timepoints (from Model!)

[Figure: individual tolerance plotted against exposure, with both axes marked at 600, 50,000, and 100,000. Region labels: “10-11 days to peak toxin level (asymptomatic), not sick”; “10-11 days to prodromal disease”; “6-7 days till prodromal”; “4-5 days till prodromal”; “2-3 days”; “3 hrs.”]

Page 8:

Popular Regression Models

  • Time series
      • Simple Trends, e.g. energy increase per year
      • Application-specific functions, e.g. sigmoidal
      • ARIMA et al.
  • “Causal” – not really: association ≠ cause
      • Energy
          • End-use: BTU=f(appliance stock, efficiency)
          • Econometric: BTU=f(cost of energy, income, inflation)
      • Epidemiological
          • Case-status=f(age, sex, race, genetic factors)
          • Case-status=f(exposure1, exposure2,…)
  • “Survival” (Time-to-Event) models

Page 9:

SAS™ Regression Procedures

  • General Regression: The REG Procedure
  • Nonlinear Regression: The NLIN Procedure
  • Response Surface Regression: The RSREG Procedure
  • Partial Least Squares Regression: The PLS Procedure
  • Regression for Ill-conditioned Data: The ORTHOREG Procedure
  • Local Regression: The LOESS Procedure
  • Robust Regression: The ROBUSTREG Procedure
  • Logistic Regression: The LOGISTIC Procedure
  • Regression with Transformations: The TRANSREG Procedure
  • Regression Using the GLM, CATMOD, LOGISTIC, PROBIT, and LIFEREG Procedures
  • Interactive Features in the CATMOD, GLM, and REG Procedures

http://support.sas.com/onlinedoc/913/docMainpage.jsp

Page 10:

SAS™ Regression Help (1)

CATMOD – analyzes data that can be represented by a contingency table. PROC CATMOD fits linear models to functions of response frequencies, and it can be used for linear and logistic regression. The CATMOD procedure is discussed in detail in Chapter 5, "Introduction to Categorical Data Analysis Procedures."

GENMOD – fits generalized linear models. PROC GENMOD is especially suited for responses with discrete outcomes, and it performs logistic regression and Poisson regression as well as fitting Generalized Estimating Equations for repeated measures data. See Chapter 5, "Introduction to Categorical Data Analysis Procedures," and Chapter 29, "The GENMOD Procedure," for more information.

GLM – uses the method of least squares to fit general linear models. In addition to many other analyses, PROC GLM can perform simple, multiple, polynomial, and weighted regression. PROC GLM has many of the same input/output capabilities as PROC REG, but it does not provide as many diagnostic tools or allow interactive changes in the model or data. See Chapter 4, "Introduction to Analysis-of-Variance Procedures," for a more detailed overview of the GLM procedure.

LIFEREG – fits parametric models to failure-time data that may be right censored. These types of models are commonly used in survival analysis. See Chapter 10, "Introduction to Survival Analysis Procedures," for a more detailed overview of the LIFEREG procedure.

http://v8doc.sas.com/sashtml/

Page 11:

SAS™ Regression Help (2)

LOGISTIC – fits logistic models for binomial and ordinal outcomes. PROC LOGISTIC provides a wide variety of model-building methods and computes numerous regression diagnostics. See Chapter 5, "Introduction to Categorical Data Analysis Procedures," for a brief comparison of PROC LOGISTIC with other procedures.

NLIN – builds nonlinear regression models. Several different iterative methods are available.

ORTHOREG – performs regression using the Gentleman-Givens computational method. For ill-conditioned data, PROC ORTHOREG can produce more accurate parameter estimates than other procedures such as PROC GLM and PROC REG.

PLS – performs partial least squares regression, principal components regression, and reduced rank regression, with cross validation for the number of components.

http://v8doc.sas.com/sashtml/

Page 12:

SAS™ Regression Help (3)

PROBIT – performs probit regression as well as logistic regression and ordinal logistic regression. The PROBIT procedure is useful when the dependent variable is either dichotomous or polychotomous and the independent variables are continuous.

REG – performs linear regression with many diagnostic capabilities, selects models using one of nine methods, produces scatter plots of raw data and statistics, highlights scatter plots to identify particular observations, and allows interactive changes in both the regression model and the data used to fit the model.

RSREG – builds quadratic response-surface regression models. PROC RSREG analyzes the fitted response surface to determine the factor levels of optimum response and performs a ridge analysis to search for the region of optimum response.

TRANSREG – fits univariate and multivariate linear models, optionally with spline and other nonlinear transformations. Models include ordinary regression and ANOVA, multiple and multivariate regression, metric and nonmetric conjoint analysis, metric and nonmetric vector and ideal point preference mapping, redundancy analysis, canonical correlation, and response surface regression.

http://v8doc.sas.com/sashtml/

Page 13:

SAS™ Regression Help (4)

Several SAS/ETS procedures also perform regression. The following procedures are documented in the SAS/ETS User's Guide.

AUTOREG – implements regression models using time-series data where the errors are autocorrelated.

PDLREG – performs regression analysis with polynomial distributed lags.

SYSLIN – handles linear simultaneous systems of equations, such as econometric models.

MODEL – handles nonlinear simultaneous systems of equations, such as econometric models.

http://v8doc.sas.com/sashtml/

Page 14:

Point-and-click vs. SAS™ code

  • SAS™ has tremendously more capability
  • Use of SAS™ procedures provides documentation, formally and operationally
  • Spreadsheets and point-and-click environments cannot withstand audits
      • Regulatory agencies: FERC, FDA, NRC, USDA (FDA: 21 CFR Part 11)
  • Labor-intensive point-and-click can be replaced with SAS™ code to save time and, therefore, focus on analysis, not mechanics.

Page 15:

Specific Models

  • Disease A (used as decision/outcome example above)
      • Course of disease - NOT regression
  • Disease P
      • Time series
      • Simple periodic with exception!

Page 16:

Seasonal Data with Aberrations

[Figure: seasonal time series, 1996 to 1999, captioned “Spikes removed from seasonal curve fit”.]

Page 17:

Sinusoidal Piecewise Regression with Trend
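A seasonal curve with a trend can be fitted by ordinary least squares once the sinusoid is written as a sine term plus a cosine term, because the model is then linear in its coefficients. The sketch below covers only that sinusoid-plus-trend part. The piecewise structure and the spike removal are not reproduced, the 12-month period is an assumption, the data are synthetic, and the function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def design_row(t, period):
    """Regressors: intercept, linear trend, and one sine/cosine pair."""
    w = 2.0 * math.pi * t / period
    return [1.0, float(t), math.sin(w), math.cos(w)]


def fit_seasonal_trend(times, values, period=12.0):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    rows = [design_row(t, period) for t in times]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic monthly series (48 points) built from known coefficients, no noise.
true_beta = [10.0, 0.05, 3.0, -1.5]
times = list(range(48))
values = [sum(b * x for b, x in zip(true_beta, design_row(t, 12.0))) for t in times]

beta = fit_seasonal_trend(times, values)
print([round(b, 6) for b in beta])
```

With real surveillance data, aberrant points would typically be excluded or down-weighted before the fit so that they do not distort the seasonal curve.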

Page 18:

Specific Models

  • Disease A
      • Course of disease - NOT regression
  • Disease P
      • Time series
      • Simple periodic with exception!
  • Sigmoid
      • Laboratory applications

Page 19:

Plot of Measured Response* by Dilution: “Well-behaved” Specimen

[Figure: sigmoid curve of Measured Response (0% to 100%) against Dilution, marking the True Midpoint (LD50, ED50, etc), the True 50% Titer, and the Observed 50% Titer.]

*Measured response can be cell counts, optical density, luminescence, or other lab-measured quantity.

$y = B + \dfrac{A - B}{1 + (x/C)^{D}}$
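A sigmoid dilution curve of this kind is commonly written as a four-parameter logistic function. The sketch below assumes the form y = B + (A − B) / (1 + (x/C)^D), where A and B are the two asymptotes, C is the midpoint dilution, and D controls steepness. That parameterization, the function names, and the parameter values are assumptions made for illustration. They are not the patented method mentioned in the introduction.

```python
def four_pl(x, a, b, c, d):
    """Four-parameter logistic: near a for small x, approaching b for large x (d > 0)."""
    return b + (a - b) / (1.0 + (x / c) ** d)


def dilution_at(y, a, b, c, d):
    """Invert the curve: the dilution x at which the response equals y."""
    if y == b:
        raise ValueError("response equals an asymptote; no finite dilution")
    ratio = (a - b) / (y - b) - 1.0
    if ratio <= 0:
        raise ValueError("response must lie strictly between the asymptotes")
    return c * ratio ** (1.0 / d)


# Hypothetical parameters: response falls from 100% toward 0% as dilution grows.
params = dict(a=100.0, b=0.0, c=640.0, d=1.2)

midpoint_response = four_pl(640.0, **params)  # response at x = C
titer_50 = dilution_at(50.0, **params)        # dilution giving a 50% response
print(midpoint_response, titer_50)
```

At x = C the denominator equals 2, so the response is exactly halfway between the asymptotes. Under this form the fitted C can therefore be read as the 50% endpoint instead of interpolating between two observed dilutions.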

Page 20:

What about?… High-Variance Specimens: Robustness of True 50% Endpoint

[Figure: Observed Response against Dilution for a high-variance specimen, marking the Midpoint (50%).]

$y = B + \dfrac{A - B}{1 + (x/C)^{D}}$

Page 21:

Specific Models

  • Disease A
      • Course of disease - NOT regression
  • Disease P
      • Time series
      • Simple periodic with exception!
  • Sigmoid
      • Laboratory applications
  • Investigation of foodborne disease outbreak
      • Not a laboratory
      • Not a controlled experiment
      • Not even a designed experiment
      • Observational data

Page 22:

Foodborne Disease Outbreak

  • Associative (not causal) models
  • Epidemiological
      • Case-status=f(exposure1, exposure2,…)

[Figure: bar chart on a scale of 0 to 100 comparing “Sick People” and “Not Sick”, split by “Tomatoes” and “No tomatoes”.]
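For a single yes/no exposure, the association between exposure and case status in observational data is often summarized with an odds ratio from a 2×2 table. The sketch below uses that summary with a Woolf (log-based) 95% confidence interval. The counts are hypothetical and are not the values behind the slide's chart, and the function name is introduced here for illustration.

```python
import math


def odds_ratio(exposed_sick, exposed_well, unexposed_sick, unexposed_well):
    """Odds ratio with a Woolf (log-based) 95% confidence interval."""
    cells = (exposed_sick, exposed_well, unexposed_sick, unexposed_well)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for this formula")
    estimate = (exposed_sick * unexposed_well) / (exposed_well * unexposed_sick)
    se_log = math.sqrt(sum(1.0 / n for n in cells))  # std. error of log(OR)
    z = 1.96  # normal quantile for a two-sided 95% interval
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts: ate tomatoes (60 sick, 20 not), did not (15 sick, 55 not).
estimate, lower, upper = odds_ratio(60, 20, 15, 55)
print(round(estimate, 2), round(lower, 2), round(upper, 2))
```

An interval that excludes 1 indicates an association between the exposure and illness. As the slide stresses, this is associative evidence from observational data, not proof of cause.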

Page 23:

George Box: “…all models are wrong, but some are useful.”

George Edward Pelham Box (18 October 1919 – 28 March 2013) was one of the most influential statisticians of the 20th century and a pioneer in the areas of quality control, time series analysis, design of experiments and Bayesian inference.

He served as President of the American Statistical Association in 1978 and of the Institute of Mathematical Statistics in 1979. He received the Shewhart Medal from the American Society for Quality Control in 1968, the Wilks Memorial Award from the American Statistical Association in 1972, the R. A. Fisher Lectureship in 1974, and the Guy Medal in Gold from the Royal Statistical Society in 1993. He was elected a member of the American Academy of Arts and Sciences in 1974 and a Fellow of the Royal Society in 1979.