Basic Statistics for Research: Choosing Appropriate Analyses and Using SPSS

Dr. Beth A. Bailey
Dr. Tiejian Wu
Department of Family Medicine

Overview

• Two primary objectives of this workshop:
  • Learn how to choose the appropriate statistical tests to analyze data
  • Learn the basics of conducting these tests in SPSS

• First objective – learn what is important in choosing analyses, and information about some of the more common statistical analyses

• Second objective – will get a data set and walk through how to conduct some specific analyses

Levels of Measurement

• Which statistics you can use to analyze your data are determined by the level of measurement of each variable

• Four levels of measurement:
  • Nominal
  • Ordinal
  • Interval
  • Ratio

Levels of Measurement - Nominal

• The term “measurement” applied to nominal data collection is actually incorrect

• Nominal refers to differences in quality not quantity

• Data is nominal when the best you can do is group it into classes with no particular order

• Numbers can be used to represent the categories, but have no inherent meaning

• Example: religion (Christian, Jewish, Muslim) – can’t say one has “more”

Levels of Measurement - Ordinal

• Data has an inherent order, but the distance between the groups is unknown

• So, categories represent a higher or lower level of the variable, but you don’t know how much higher or lower

• Example: weight categories (underweight, normal, overweight, obese)

• A variable with only 2 categories can be ordinal if there is an inherent order (ex: course grade recorded as “pass” or “fail”)

Levels of Measurement - Interval

• Data categories have an inherent order and the distance between the categories is known

• The data represent a true magnitude, so that equal differences between values represent the same amount anywhere on the scale

• Example: patient temperature – 99, 101, 103 – a difference of 2 degrees means the same thing anywhere on the scale

• In theory, data could be decimalized when a variable is interval (not always measured with that level of precision, though)

Levels of Measurement - Ratio

• Data categories have an inherent order and the distance between the categories is known

• In addition – there is a true 0 point that represents the absence of something

• Example: drug dosages – 10mg, 20mg, 30mg, 40mg – 40mg is twice as much as 20mg – 0mg would represent the absence of any drug

• Why is temperature not a ratio level variable?

• For purposes of choosing statistical analyses, the distinction between interval and ratio is unimportant
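One way to approach the temperature question is to check whether a ratio survives a change of units. The sketch below is only an illustration, assuming temperature is recorded in Fahrenheit or Celsius (scales whose zero point is arbitrary). The readings and the helper names are made up for the example.

```python
def fahrenheit_to_celsius(deg_f: float) -> float:
    """Convert a Fahrenheit reading to Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def mg_to_g(dose_mg: float) -> float:
    """Convert a dose in milligrams to grams."""
    return dose_mg / 1000.0


# Illustrative readings (not from the workshop data set).
low_f, high_f = 50.0, 100.0
temp_ratio_f = high_f / low_f
temp_ratio_c = fahrenheit_to_celsius(high_f) / fahrenheit_to_celsius(low_f)

# Dosage has a true zero, so the ratio is the same in any unit.
dose_ratio_mg = 40.0 / 20.0
dose_ratio_g = mg_to_g(40.0) / mg_to_g(20.0)

print(f"temperature ratio in F: {temp_ratio_f:.2f}, in C: {temp_ratio_c:.2f}")
print(f"dose ratio in mg: {dose_ratio_mg:.2f}, in g: {dose_ratio_g:.2f}")
```

The temperature ratio changes with the unit, so "twice as hot" has no fixed meaning on these scales, while "twice the dose" holds in any unit.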

Levels of Measurement

• A single variable can have more than one level of measurement, depending on how the data are collected.

• Example: weight
  • assessed as either overweight or not overweight
  • assessed as an actual value in pounds

• Since more statistical analysis options are typically available at higher levels of measurement, and you have more power to find effects, data collection should be done at the highest level of measurement possible – it can always be reduced later (ex: weight)
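As a rough sketch of "reducing later", a weight recorded in pounds can be collapsed into two categories after data collection, but the reverse is not possible. The cutoff below is a hypothetical placeholder chosen only for illustration (real overweight classifications usually depend on height as well), and the weights are made up.

```python
OVERWEIGHT_CUTOFF_LB = 180.0  # hypothetical cutoff, for illustration only


def to_weight_category(weight_lb: float, cutoff_lb: float = OVERWEIGHT_CUTOFF_LB) -> str:
    """Reduce an interval/ratio weight in pounds to a two-category variable."""
    return "overweight" if weight_lb >= cutoff_lb else "not overweight"


# Made-up weights in pounds.
weights_lb = [132.5, 178.0, 181.2, 240.0]
categories = [to_weight_category(w) for w in weights_lb]
print(categories)
```

Once only the category is stored, the original pound values cannot be recovered, which is why the slide recommends collecting at the highest level of measurement.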

Levels of Measurement - Quiz

1. IQ scores
2. Gender
3. Income (as a dollar amount)
4. Income (in 6 categories)
5. Likert scale scores (1=strongly disagree, 2=slightly disagree, 3=neutral, 4=slightly agree, 5=strongly agree)
6. Cancer status (has cancer, does not have cancer)
7. Practice location (rural, urban)
8. Cigarette smoking (# cig/day)
9. Cigarette smoking (none, up to ½ ppd, ½ ppd to <1 ppd, 1 ppd+; ppd = packs per day)

Measures of Central Tendency

• Measures of central tendency allow us to summarize the data collected for a particular variable

• Specifically, a measure of central tendency tells you the “typical” level or score of a variable

• Three primary measures of central tendency:
  • Mean – average – interval & ratio only
  • Median – score in the middle – ordinal or higher
  • Mode – most common score – all levels of measurement
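The three measures can be sketched with Python's standard `statistics` module. This is only an illustration alongside the SPSS workflow the workshop uses; the values and variable names are made up.

```python
from statistics import mean, median, mode

# Ratio-level variable: cigarettes per day (made-up values).
cigs_per_day = [0, 0, 0, 5, 10, 20, 40]
cigs_mean = mean(cigs_per_day)      # meaningful for interval/ratio data
cigs_median = median(cigs_per_day)  # meaningful for ordinal data or higher
cigs_mode = mode(cigs_per_day)      # meaningful at every level

# Ordinal variable coded 1=underweight, 2=normal, 3=overweight, 4=obese.
weight_category_codes = [1, 2, 2, 3, 4]
weight_median = median(weight_category_codes)  # a mean of these codes would not be meaningful

# Nominal variable: only the mode applies.
practice_location = ["rural", "urban", "rural", "rural", "urban"]
location_mode = mode(practice_location)

print(cigs_mean, cigs_median, cigs_mode, weight_median, location_mode)
```

In the cigarette data the mean is pulled upward by the heaviest smoker while the median and mode are not, which is one reason to look at more than one measure.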

Measures of Association - Intro

• Measures of association allow us to determine whether 2 or more variables are related

• The measure of association that is appropriate depends on
  • the level of measurement of the variables (and number of categories if nominal or ordinal; normality of distribution if interval/ratio)
  • whether the variables represent independent observations

• Independence of observations
  • Results from the same individual or matched individuals are not independent
  • Depends on the study design – the same questions can often be answered with both independent and dependent observations

Measures of Association - Intro

• Example: want to know if Drug X significantly lowers cholesterol

• There are at least 3 potential study designs:
  • Take a group of people, measure cholesterol initially and then again after they have taken Drug X for 8 weeks (same people – repeated measures)
  • Take a group of people, divide into two groups matched on age and gender, give one group the drug, one group a placebo, then measure cholesterol after 8 weeks (matched groups)
  • Take a group of people, randomly divide into two groups, give one group the drug, one group the placebo, then measure cholesterol after 8 weeks (independent groups)

• What are the variables and levels of measurement?
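To see why the design matters, the sketch below computes a paired t statistic and a pooled-variance independent-groups t statistic by hand on the same made-up cholesterol values. Treating repeated measures as independent groups is done here only to show the contrast; the data and function names are assumptions for illustration, not workshop results.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(before: list[float], after: list[float]) -> float:
    """t statistic for dependent observations: mean difference over its standard error."""
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def independent_t(group_a: list[float], group_b: list[float]) -> float:
    """Student's t statistic for independent groups, using the pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))


# Made-up cholesterol values for five people, before and after 8 weeks on Drug X.
before = [220.0, 240.0, 200.0, 260.0, 230.0]
after = [210.0, 225.0, 195.0, 240.0, 220.0]

t_paired = paired_t(before, after)
t_independent = independent_t(after, before)
print(f"paired t = {t_paired:.2f}, independent-groups t = {t_independent:.2f}")
```

With these values the paired statistic is much larger in magnitude, because each person serves as their own control and the between-person variation drops out.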

Measures of Association
Independent Observations – 2 Variables

Rows give the level of measurement of the predictor; columns give the level of measurement of the outcome.

| Predictor | Outcome: Nominal | Outcome: Ordinal* | Outcome: Interval** |
|---|---|---|---|
| Nominal (2 levels) | χ²; Fisher’s Exact Test | Wilcoxon Mann-Whitney Test | t-test (indep groups); point biserial |
| Nominal (3+ levels) | χ² | Kruskal Wallis | one-way ANOVA |
| Ordinal* | χ²; Mann-Whitney Test | Spearman rank test | one-way ANOVA; Spearman rank; linear regression |
| Interval** | logistic regression; point biserial | Spearman rank test | Pearson correlation; linear regression |
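The independent-observations table can be treated as a simple lookup keyed by predictor and outcome level. The sketch below only transcribes the table's cells; the key labels and the function name are hypothetical, and "ordinal" here also stands for interval data that are not normally distributed, following the table's footnote.

```python
# Keys are (predictor level, outcome level); values are the tests listed in the table.
INDEPENDENT_TESTS = {
    ("nominal (2 levels)", "nominal"): ["chi-square", "Fisher's exact test"],
    ("nominal (2 levels)", "ordinal"): ["Wilcoxon Mann-Whitney test"],
    ("nominal (2 levels)", "interval"): ["t-test (independent groups)", "point biserial"],
    ("nominal (3+ levels)", "nominal"): ["chi-square"],
    ("nominal (3+ levels)", "ordinal"): ["Kruskal-Wallis"],
    ("nominal (3+ levels)", "interval"): ["one-way ANOVA"],
    ("ordinal", "nominal"): ["chi-square", "Mann-Whitney test"],
    ("ordinal", "ordinal"): ["Spearman rank test"],
    ("ordinal", "interval"): ["one-way ANOVA", "Spearman rank", "linear regression"],
    ("interval", "nominal"): ["logistic regression", "point biserial"],
    ("interval", "ordinal"): ["Spearman rank test"],
    ("interval", "interval"): ["Pearson correlation", "linear regression"],
}


def suggest_tests(predictor_level: str, outcome_level: str) -> list[str]:
    """Return the candidate tests for two variables with independent observations."""
    key = (predictor_level.lower(), outcome_level.lower())
    if key not in INDEPENDENT_TESTS:
        raise ValueError(f"no table entry for predictor={predictor_level!r}, outcome={outcome_level!r}")
    return INDEPENDENT_TESTS[key]


print(suggest_tests("nominal (2 levels)", "interval"))
```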

Measures of Association
Dependent Observations – 2 Variables

Rows give the level of measurement of the predictor; columns give the level of measurement of the outcome.

| Predictor | Outcome: Nominal | Outcome: Ordinal* | Outcome: Interval** |
|---|---|---|---|
| Nominal (2 levels) | McNemar test | Wilcoxon signed ranks test | paired t-test |
| Nominal (3+ levels) | repeated measures logistic regression | Friedman test | one-way repeated measures ANOVA |

* Ordinal or interval but not normally distributed
** Interval and normally distributed
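As an example of a dependent-observations test, the McNemar statistic uses only the discordant pairs, that is, the people whose outcome differs between the two conditions. The sketch below uses the standard formula without a continuity correction; the counts and names are made up for illustration.

```python
def mcnemar_statistic(b: int, c: int) -> float:
    """McNemar chi-square (1 df, no continuity correction) from the two discordant counts."""
    if b + c == 0:
        raise ValueError("McNemar's test needs at least one discordant pair")
    return (b - c) ** 2 / (b + c)


# Hypothetical paired outcomes for the same people under two conditions.
positive_only_under_condition_1 = 30
positive_only_under_condition_2 = 12

statistic = mcnemar_statistic(positive_only_under_condition_1, positive_only_under_condition_2)

# 3.841 is the chi-square critical value for 1 degree of freedom at alpha = 0.05.
exceeds_critical_value = statistic > 3.841
print(f"McNemar chi-square = {statistic:.2f}, exceeds 3.841: {exceeds_critical_value}")
```

Pairs with the same outcome under both conditions do not enter the statistic at all, which is what distinguishes this test from an ordinary chi-square on the same counts.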

Measures of Association - Exercises

1. Study designed to look at the association between smoking during pregnancy and newborn birth weight. Smoking status is assessed for 200 women during pregnancy and then delivery charts are reviewed and infant birth weight is recorded.

a. Smoking – defined as positive or negative
   Birth weight – defined as LBW (low birth weight) or normal

   Independent or dependent groups?
   Smoking – level of measurement?
   Birth weight – level of measurement?
   Appropriate statistical test?

Measures of Association - Exercises

1. Study designed to look at the association between smoking during pregnancy and newborn birth weight. Smoking status is assessed for 200 women during pregnancy and then delivery charts are reviewed and infant birth weight is recorded.

b. Smoking – defined as positive or negative
   Birth weight – defined as weight in g

   Independent or dependent groups?
   Smoking – level of measurement?
   Birth weight – level of measurement?
   Appropriate statistical test?

Measures of Association - Exercises

1. Study designed to look at the association between smoking during pregnancy and newborn birth weight. Smoking status is assessed for 200 women during pregnancy and then delivery charts are reviewed and infant birth weight is recorded.

c. Smoking – # cigarettes/day
   Birth weight – LBW or normal

   Independent or dependent groups?
   Smoking – level of measurement?
   Birth weight – level of measurement?
   Appropriate statistical test?

Measures of Association - Exercises

1. Study designed to look at the association between smoking during pregnancy and newborn birth weight. Smoking status is assessed for 200 women during pregnancy and then delivery charts are reviewed and infant birth weight is recorded.

d. Smoking – # cigarettes/day
   Birth weight – weight in g

   Independent or dependent groups?
   Smoking – level of measurement?
   Birth weight – level of measurement?
   Appropriate statistical test?

Measures of Association - Exercises

1. Study designed to look at the association between smoking during pregnancy and newborn birth weight. Smoking status is assessed for 200 women during pregnancy and then delivery charts are reviewed and infant birth weight is recorded.

e. Smoking – none, < ppd, ppd+
   Birth weight – LBW or normal

   Independent or dependent groups?
   Smoking – level of measurement?
   Birth weight – level of measurement?
   Appropriate statistical test?

Measures of Association - Exercises

1. Study designed to look at the association between smoking during pregnancy and newborn birth weight. Smoking status is assessed for 200 women during pregnancy and then delivery charts are reviewed and infant birth weight is recorded.

f. Smoking – none, < ppd, ppd+
   Birth weight – weight in g

   Independent or dependent groups?
   Smoking – level of measurement?
   Birth weight – level of measurement?
   Appropriate statistical test?

Measures of Association - Exercises

2. Study designed to look at the association between smoking during pregnancy and newborn birth weight. Smoking status is assessed for 200 women who smoked during their first pregnancy but not during their second pregnancy, and then delivery charts are reviewed and infant birth weight is recorded.

a. Smoking – defined as positive or negative
   Birth weight – defined as LBW or normal

   Independent or dependent groups?
   Smoking – level of measurement?
   Birth weight – level of measurement?
   Appropriate statistical test?

Measures of Association - Exercises

2. Study designed to look at the association between smoking during pregnancy and newborn birth weight. Smoking status is assessed for 200 women who smoked during their first pregnancy but not during their second pregnancy, and then delivery charts are reviewed and infant birth weight is recorded.

b. Smoking – defined as positive or negative
   Birth weight – weight in g

   Independent or dependent groups?
   Smoking – level of measurement?
   Birth weight – level of measurement?
   Appropriate statistical test?

Measures of Association
3 or more Variables

• The analysis you choose here will depend on the nature of the question. Generally:
  • Regression (multiple or logistic) – 1 outcome (any level), multiple predictors (all interval level)
  • ANOVA (factorial ANOVA, ANCOVA) – 1 outcome (interval), multiple predictors (1 or more not interval)
  • MANOVA/MANCOVA – 2+ outcomes (interval), 1+ predictor (1 or more not interval)
  • Multivariate multiple linear regression/canonical correlation – 2+ outcomes, 1+ predictor, all interval level

• Repeated measures versions exist for dependent observations
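The general rules above can be written as a small decision helper. This is a rough sketch that encodes only the four bullet points for independent observations; the function name and arguments are hypothetical, and combinations the slide does not cover return None rather than a guess.

```python
from typing import Optional


def suggest_multivariable_analysis(
    n_outcomes: int,
    outcomes_interval: bool,
    predictors_all_interval: bool,
) -> Optional[str]:
    """Map the number and level of outcomes and predictors to an analysis family."""
    if n_outcomes < 1:
        raise ValueError("at least one outcome is required")
    if n_outcomes == 1:
        if predictors_all_interval:
            # The slide allows an outcome at any level here.
            return "regression (multiple or logistic)"
        if outcomes_interval:
            return "ANOVA (factorial ANOVA, ANCOVA)"
        return None  # combination not covered by the slide
    if outcomes_interval:
        if predictors_all_interval:
            return "multivariate multiple linear regression / canonical correlation"
        return "MANOVA/MANCOVA"
    return None  # combination not covered by the slide


# One interval outcome, one interval predictor plus one two-category predictor.
print(suggest_multivariable_analysis(1, outcomes_interval=True, predictors_all_interval=False))
```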

Measures of Association - Exercises

3. Same as study 1, but we are interested in looking at how both cigarette smoking and alcohol consumption impact birth weight

a. Smoking – defined as # cigarettes/day
   Alcohol – defined as # oz/day
   Birth weight – weight in g

   Smoking – level of measurement?
   Alcohol – level of measurement?
   Birth weight – level of measurement?
   Appropriate statistical test?

Measures of Association - Exercises

3. Same as study 1, but we are interested in looking at how both cigarette smoking and alcohol consumption impact birth weight

b. Smoking – defined as # cigarettes/day
   Alcohol – defined as none or any
   Birth weight – weight in g

   Smoking – level of measurement?
   Alcohol – level of measurement?
   Birth weight – level of measurement?
   Appropriate statistical test?

Measures of Association - Exercises

3. Same as study 1, but we are interested in looking at how both cigarette smoking and alcohol consumption impact birth weight

c. Smoking – defined as # cigarettes/day
   Alcohol – defined as # oz/day
   Birth weight – LBW or normal

   Smoking – level of measurement?
   Alcohol – level of measurement?
   Birth weight – level of measurement?
   Appropriate statistical test?

Measures of Association - Exercises

4. Same as study 3, but we are interested in looking at how smoking and alcohol impact both birth weight and prematurity

   Smoking – defined as positive or negative
   Alcohol – defined as positive or negative
   Birth weight – weight in g
   Prematurity – gestational age at birth in wks

   Smoking – level of measurement?
   Alcohol – level of measurement?
   Birth weight – level of measurement?
   Prematurity – level of measurement?
   Appropriate statistical test?
