20130308 Preparing data for modeling in R

Preparing data for modeling in R

2013-03-08 @HSPH

Kazuki Yoshida, M.D., MPH-CLE student

FREEDOM TO KNOW

Group Website is at:

http://rpubs.com/kaz_yos/useR_at_HSPH

Open R Studio

Create a new script and save it.

http://www.umass.edu/statdata/statdata/data/

lowbwt.dat

http://www.umass.edu/statdata/statdata/data/lowbwt.txt

http://www.umass.edu/statdata/statdata/data/lowbwt.dat

We will use the lowbwt dataset used in BIO213 Applied Regression for Clinical Research

NAME: LOW BIRTH WEIGHT DATA (LOWBWT.DAT)
KEYWORDS: Logistic Regression
SIZE: 189 observations, 11 variables

SOURCE: Hosmer and Lemeshow (2000) Applied Logistic Regression: Second Edition. These data are copyrighted by John Wiley & Sons Inc. and must be acknowledged and used accordingly. Data were collected at Baystate Medical Center, Springfield, Massachusetts during 1986.

DESCRIPTIVE ABSTRACT:

The goal of this study was to identify risk factors associated with giving birth to a low birth weight baby (weighing less than 2500 grams). Data were collected on 189 women, 59 of which had low birth weight babies and 130 of which had normal birth weight babies. Four variables which were thought to be of importance were age, weight of the subject at her last menstrual period, race, and the number of physician visits during the first trimester of pregnancy.

This data set consists of the complete data. A paired data set created from this low birth weight data may be found in lowbwtm11.dat and a 3 to 1 matched data set created from the low birth weight data may be found in mlowbwt.dat.


LIST OF VARIABLES:

Columns  Variable                                               Abbreviation
-----------------------------------------------------------------------------
2-4      Identification Code                                    ID
10       Low Birth Weight (0 = Birth Weight >= 2500g,           LOW
         1 = Birth Weight < 2500g)
17-18    Age of the Mother in Years                             AGE
23-25    Weight in Pounds at the Last Menstrual Period          LWT
32       Race (1 = White, 2 = Black, 3 = Other)                 RACE
40       Smoking Status During Pregnancy (1 = Yes, 0 = No)      SMOKE
48       History of Premature Labor (0 = None, 1 = One, etc.)   PTL
55       History of Hypertension (1 = Yes, 0 = No)              HT
61       Presence of Uterine Irritability (1 = Yes, 0 = No)     UI
67       Number of Physician Visits During the First Trimester  FTV
         (0 = None, 1 = One, 2 = Two, etc.)
73-76    Birth Weight in Grams                                  BWT
-----------------------------------------------------------------------------


PEDAGOGICAL NOTES: These data have been used as an example of fitting a multiple logistic regression model.

STORY BEHIND THE DATA: Low birth weight is an outcome that has been of concern to physicians for years. This is due to the fact that infant mortality rates and birth defect rates are very high for low birth weight babies. A woman's behavior during pregnancy (including diet, smoking habits, and receiving prenatal care) can greatly alter the chances of carrying the baby to term and, consequently, of delivering a baby of normal birth weight. The variables identified in the code sheet given in the table have been shown to be associated with low birth weight in the obstetrical literature. The goal of the current study was to ascertain if these variables were important in the population being served by the medical center where the data were collected.

References:

1. Hosmer and Lemeshow, Applied Logistic Regression, Wiley, (1989).

lbw <- read.table("http://www.umass.edu/statdata/statdata/data/lowbwt.dat", header = TRUE, skip = 4)

Load dataset from web

header = TRUE to pick up variable names

skip = 4 to skip the first 4 rows
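To see what header = TRUE and skip = 4 do without downloading anything, here is a minimal sketch that reads a small made-up text block shaped like the real file: description lines first, then a header row, then data. The object names txt and toy and all the values are invented for illustration; they are not the real lowbwt.dat records.

```r
## Hypothetical file contents: 4 description lines, a header row, then data.
## All values below are made up for illustration.
txt <- c("NAME: toy data",
         "KEYWORDS: none",
         "SIZE: 3 observations",
         "SOURCE: made up",
         "ID LOW AGE BWT",
         "1 0 19 2523",
         "2 0 33 2551",
         "3 1 20 2400")

## skip = 4 drops the description lines;
## header = TRUE uses the next row as variable names
toy <- read.table(text = txt, header = TRUE, skip = 4)

names(toy)   # variable names picked up from the header row
str(toy)     # three rows, all columns read as integers
```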

lbw[c(10,39), "BWT"] <- c(2655, 3035)

“Fix” dataset

Replace data points to make the dataset identical to the BIO213 dataset

10th and 39th rows

BWT column

Lower case variable names

names(lbw) <- tolower(names(lbw))

Convert variable names to lower case

Put them back into variable names
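The two steps above (convert, then put back) can be tried on a toy data frame. This is only a sketch; the data frame toy and its values are hypothetical.

```r
## Toy data frame with upper-case names (hypothetical values)
toy <- data.frame(ID = 1:2, BWT = c(2523, 2551))

names(toy)                          # read the current names
names(toy) <- tolower(names(toy))   # convert to lower case and assign back
names(toy)                          # names are now lower case
```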

See overview

library(gpairs)
gpairs(lbw)

Recoding

Changing and creating variables

Different variable forms mean different modeling assumptions!

Variable form and assumption

• Continuous variables:
    • Linearity assumption
• Categorical variables:
    • No residual confounding assumption

Relabel race: 1, 2, 3 to White, Black, Other

lbw$race.cat <- factor(lbw$race, levels = 1:3, labels = c("White","Black","Other"))

Using this variable as continuous is meaningless!!

Take race variable

Order levels 1, 2, 3

Make 1 the reference level

Label levels 1, 2, 3 as White, Black, Other

Create new variable named

race.cat
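A small sketch of the same relabeling on a hypothetical vector of race codes (the vector race below is made up). It shows that the order given in levels decides the order of the labels, and that the first level is the one R treats as the reference.

```r
## Hypothetical race codes, same coding as the dataset (1, 2, 3)
race <- c(1, 3, 2, 1, 3)

## levels = 1:3 fixes the order; labels are matched to levels by position
race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other"))

levels(race.cat)        # the first level ("White") is the reference level
table(race, race.cat)   # cross-check: each code maps to exactly one label
```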

Dichotomize ptl

lbw$preterm <- factor(ifelse(lbw$ptl >= 1, "1+", "0"))

Change to categorical

If condition is true, then “1+”

if not (else), “0”

The ifelse function gives one of two values depending on the condition

Condition: lbw$ptl >= 1
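Here is a minimal sketch of the dichotomization on a made-up ptl vector, with the condition stored separately so that each step can be inspected. The names ptl, cond and preterm are local to this example.

```r
## Hypothetical counts of previous premature labors
ptl <- c(0, 1, 0, 3, 2)

## The condition is evaluated element by element and gives TRUE/FALSE
cond <- ptl >= 1
cond

## ifelse picks "1+" where the condition is TRUE and "0" where it is FALSE;
## factor() then turns the character result into a categorical variable
preterm <- factor(ifelse(cond, "1+", "0"))

table(ptl, preterm)
```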

Change 0,1 binary to No,Yes binary

lbw$smoke <- factor(ifelse(lbw$smoke == 1, "Yes", "No"))
lbw$ht <- factor(ifelse(lbw$ht == 1, "Yes", "No"))
lbw$ui <- factor(ifelse(lbw$ui == 1, "Yes", "No"))
lbw$low <- factor(ifelse(lbw$low == 1, "Yes", "No"))

equality is tested by ==, not =

if 1, return “Yes”

if not, return “No”
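The difference between == and = is easy to check on a toy vector. In this sketch the vector smoke and the other names are hypothetical; == compares each element and returns TRUE or FALSE, while a single = assigns a value.

```r
## Hypothetical 0/1 smoking indicator
smoke <- c(0, 1, 1, 0)

## == tests equality element by element
is.smoker <- smoke == 1
is.smoker

## A single = is assignment (same effect as <- at the top level)
x = 5
x

## Using the comparison inside ifelse to build a No/Yes factor
smoke.cat <- factor(ifelse(smoke == 1, "Yes", "No"))
table(smoke, smoke.cat)
```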

Cutting a continuous variable into categories

lbw$ftv.cat <- cut(lbw$ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))

[Figure: number line from -Inf to Inf with ticks at 0 1 2 3 4 5 6. Intervals: (-Inf, 0] None, (0, 2] Normal, (2, Inf] Many]

breaks = c(-Inf, 0, 2, Inf)

labels = c("None","Normal","Many")

Breaks at

Label them as

4 bounds for 3 categories
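To check which values fall into which category, this sketch applies the same breaks and labels to the values 0 through 6. The vector ftv here is made up for illustration. By default cut() uses intervals that are closed on the right, so 0 falls in "None" and 2 falls in "Normal".

```r
## Hypothetical visit counts covering every value on the number line
ftv <- 0:6

## 4 bounds define 3 intervals: (-Inf, 0], (0, 2], (2, Inf]
ftv.cat <- cut(ftv,
               breaks = c(-Inf, 0, 2, Inf),
               labels = c("None", "Normal", "Many"))

data.frame(ftv, ftv.cat)   # side-by-side view of value and category
table(ftv.cat)
```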

Make “Normal” the reference level

lbw$ftv.cat <- relevel(lbw$ftv.cat, ref = "Normal")

“Normal” as reference level
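A short sketch of what relevel() changes, using a made-up factor. Only the order of the levels changes (the chosen level moves to the front); the data values themselves stay the same. The name ftv.cat2 is introduced here just to keep both versions for comparison.

```r
## Hypothetical factor with the original level order
ftv.cat <- factor(c("None", "Normal", "Many", "Normal"),
                  levels = c("None", "Normal", "Many"))
levels(ftv.cat)

## relevel() moves the reference level to the first position
ftv.cat2 <- relevel(ftv.cat, ref = "Normal")
levels(ftv.cat2)

## The values are unchanged
as.character(ftv.cat2)
```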

within() allows direct use of variable names

lbw <- within(lbw, {

    ## Relabel race
    race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

    ## Categorize ftv (frequency of visit)
    ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
    ftv.cat <- relevel(ftv.cat, ref = "Normal")

    ## Dichotomize ptl
    preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))

    ## Categorize smoke ht ui
    smoke <- factor(smoke, levels = 0:1, labels = c("No","Yes"))
    ht    <- factor(ht, levels = 0:1, labels = c("No","Yes"))
    ui    <- factor(ui, levels = 0:1, labels = c("No","Yes"))
})

You can specify variables with variable name only. No need for lbw$

within() method
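As a sanity check, the sketch below recodes a tiny made-up data frame twice, once with the $ notation and once with within(), and compares the results. The data frame toy and the copies a and b are hypothetical names for this example.

```r
## Tiny hypothetical data frame
toy <- data.frame(smoke = c(0, 1, 1), ptl = c(0, 2, 0))

## Version 1: explicit $ notation
a <- toy
a$smoke   <- factor(a$smoke, levels = 0:1, labels = c("No", "Yes"))
a$preterm <- factor(a$ptl >= 1, levels = c(FALSE, TRUE), labels = c("0", "1+"))

## Version 2: within(), variable names only
b <- within(toy, {
    smoke   <- factor(smoke, levels = 0:1, labels = c("No", "Yes"))
    preterm <- factor(ptl >= 1, levels = c(FALSE, TRUE), labels = c("0", "1+"))
})

## Both approaches should give the same recoded columns
identical(a$smoke, b$smoke)
identical(a$preterm, b$preterm)
```

Note that within() returns a modified copy, so the result has to be assigned back (lbw <- within(lbw, {...})) for the changes to be kept.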

model formula

outcome ~ predictor1 + predictor2 + predictor3

formula

SAS equivalent: model outcome = predictor1 predictor2 predictor3;

age ~ zyg

In the case of t-test

continuous variable to be compared

grouping variable to separate groups

Variable to be explained

Variable used to explain

Y ~ X1 + X2

linear sum

• .        All variables except for the outcome
• + X2     Add X2 term
• - 1      Remove intercept
• X1:X2    Interaction term between X1 and X2
• X1*X2    Main effects and interaction term

Y ~ X1 + X2 + X1:X2

Interaction term

Main effects Interaction

Y ~ X1 * X2

Interaction term

Main effects & interaction
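The formula operators can be inspected without fitting anything. This sketch uses terms() from base R to list the terms each formula expands to; Y, X1 and X2 are placeholder names, as on the slides, and no data are needed for this check.

```r
## X1 * X2 is shorthand for main effects plus the interaction
f1 <- Y ~ X1 * X2
f2 <- Y ~ X1 + X2 + X1:X2

labels1 <- attr(terms(f1), "term.labels")
labels2 <- attr(terms(f2), "term.labels")
labels1
labels2
identical(labels1, labels2)   # both formulas expand to the same terms

## "- 1" removes the intercept: the intercept attribute becomes 0
f3 <- Y ~ X1 + X2 - 1
attr(terms(f3), "intercept")
```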

Y ~ X1 + I(X2 * X3)

On-the-fly variable manipulation

New variable (X2 times X3) created on-the-fly and used

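I() inhibits formula interpretation, so the expression inside is evaluated as ordinary math (here, * means multiplication, not "main effects and interaction").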
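To make the effect of I() concrete, the sketch below fits the same simulated data with and without it. The data frame d and everything in it are simulated for illustration only. With I(), X2 * X3 is a single product variable; without it, the formula expands to two main effects and an interaction.

```r
## Simulated data, for illustration only
set.seed(1)
d <- data.frame(X1 = rnorm(20), X2 = rnorm(20), X3 = rnorm(20))
d$Y <- 1 + 2 * d$X1 + 3 * d$X2 * d$X3 + rnorm(20)

## With I(): one new variable (X2 times X3) created on the fly
fit.I <- lm(Y ~ X1 + I(X2 * X3), data = d)
names(coef(fit.I))

## Without I(): * is read as formula syntax (main effects and interaction)
fit.noI <- lm(Y ~ X1 + X2 * X3, data = d)
names(coef(fit.noI))

## Same result as creating the product variable by hand
d$X23 <- d$X2 * d$X3
fit.manual <- lm(Y ~ X1 + X23, data = d)
all.equal(unname(coef(fit.I)), unname(coef(fit.manual)))
```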

lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, data = lbw)

Fit a model

lm.full

See model object

Call: command repeated

Coefficient for each variable

summary(lm.full)

See summary

Call: command repeated

Model F-test

Residual distribution

Dummy variables created

R^2 and adjusted R^2

Coef/SE = t

ftv.catNone: people with no 1st trimester visits compared to people with a normal number of 1st trimester visits (reference level)

ftv.catMany: people with many 1st trimester visits compared to people with a normal number of 1st trimester visits (reference level)

race.catBlack: Black people compared to White people (reference level)

race.catOther: Other people compared to White people (reference level)
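The dummy variables that appear in the summary output can be viewed directly with model.matrix() from base R. This sketch uses a made-up race.cat factor; the reference level (White) gets no column of its own and is absorbed into the intercept.

```r
## Hypothetical factor with White as the first (reference) level
race.cat <- factor(c("White", "Black", "Other", "White"),
                   levels = c("White", "Black", "Other"))

## One 0/1 column per non-reference level; names are variable name + level
mm <- model.matrix(~ race.cat)
mm
colnames(mm)
```

A row for a White person has 0 in both dummy columns, which is why each coefficient is read as a comparison against White.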

confint(lm.full)

Confidence intervals

Lower boundary

Upper boundary
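For a linear model, the lower and upper boundaries from confint() are the estimate minus and plus a t quantile times the standard error. The sketch below checks this on simulated data; the data frame d, the model fit and the matrix manual are hypothetical names for this example.

```r
## Simulated data, for illustration only
set.seed(2)
d <- data.frame(x = rnorm(30))
d$y <- 2 + 0.5 * d$x + rnorm(30)

fit <- lm(y ~ x, data = d)
ci <- confint(fit)   # default is a 95% interval
ci

## Manual calculation: estimate -/+ t quantile * standard error
est   <- coef(summary(fit))[, "Estimate"]
se    <- coef(summary(fit))[, "Std. Error"]
tcrit <- qt(0.975, df = df.residual(fit))
manual <- cbind(lower = est - tcrit * se,
                upper = est + tcrit * se)

all.equal(unname(ci), unname(manual))
```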

