
Transcript of Relative Resource Manager

Page 1

Topic 8: Descriptive Statistics, Analysis & Interpretation. Mixed Methods in Health Research

Section – Title of section – Description – Page

8.1 Introduction and Learning Objectives – This is a short introduction to Topic 8, including the learning objectives. (Page 3)

Note: There is no Challenge in Topic 8.

8.2 Descriptive Statistics – a way to summarise data – This section discusses descriptive statistics that are used to summarise and present data in a simple way. Descriptive statistics include frequency distributions, measures of central tendency, measures of dispersion and distribution patterns including normal and skewed distributions. (Page 5)

Prescribed reading (online) 8A: Bonita R, Beaglehole R & Kjellström T (2006). 'Chapter 4 – Basic biostatistics: concepts and tools', in: Basic Epidemiology, 2nd Edition, World Health Organization, pp. 63-8. Available at URL: http://whqlibdoc.who.int/publications/2006/9241547073_eng.pdf. Date accessed 25 February 2010.

8.3 Measures of association: Correlation, Chi-Square, Relative Risk and Odds Ratio – This section explains commonly used measures of association, such as the correlation coefficient (r), chi-square (χ2), Relative Risk (RR) and Odds Ratio (OR). (Page 23)

8.4 Statistical inference and hypothesis testing: Confidence Intervals and P-Values – This section discusses the basic concepts of statistical inference and hypothesis testing, where we analyse the sample data to make inferences about the population using confidence intervals (CI) and p-values. (Page 44)

Prescribed reading (text) 8B: Pierson J (2010). 'Chapter 24: Data Analysis in Quantitative Research', in: Liamputtong P (ed.) Research Methods in Health: Foundations for evidence-based practice, South Melbourne: Oxford University Press, pp. 412-3.

Page 2

Optional Readings (online) 8C:

Easton VJ & McColl JH (2004). 'Statistics Glossary', STEPS (STatistical Education through Problem Solving), University of Glasgow (Dept. of Statistics). Available from URL: http://www.stats.gla.ac.uk/steps/glossary/paired_data.html#corrcoeff. Date accessed 23 February 2010.

Note: scroll down and check out the following sections: “Correlation Coefficient”, “Pearson's Product Moment Correlation Coefficient”, and “Spearman Rank Correlation Coefficient”.

Hobart and William Smith Colleges (Geneva, New York, USA) (Department of Mathematics and Computer Science), Mathbeans Project (November 2005). “The Chi Square Statistic”. Available at URL: http://math.hws.edu/javamath/ryan/ChiSquare.html. Date accessed 23 January 2010.

Statistics Tutorials (2008). Available at URL: http://www.stattutorials.com/p-value-interpreting.html. Date accessed 2 February 2009.

8.5 Mixed Methods Research Studies – This section provides a brief introduction to Mixed Methods in health research. (Page 76)

Prescribed Reading (text) 8D: Taket A (2010). 'Chapter 20 – The Use of Mixed Methods in Health Research', in: Liamputtong P (ed.), Research Methods in Health: Foundations for evidence-based practice, South Melbourne: Oxford University Press, pp. 332-340.

8.6 Topic 8 Movie I; Topic 8 Movie II – You will find a set of two labelled movie presentations designed to guide you through the Topic 8 content.

8.7 Topic 8 Test – To access the Topic 8 Test, go to the Assessments folder on the course content page and select the Topic 8 Test folder. Follow the instructions. Find out here about the test for Topic 8, to be posted on DSO as follows: Opens: 30 August 2010; Due: 5pm, 1 October 2010. (Page 81)

Page 3

Topic 8.1: Introduction and Learning Objectives

Introduction

The aim of Topic 8 is for you to have a basic understanding of the principles that underpin statistical analysis and interpretation. In this unit, we do not expect you to be able to calculate the statistical formulae, but to be able to understand the information in health research journals that publish research results.

Topic 8 introduces you to some important methods of data analysis and interpretation including:

a. Basic methods for summarising data through the use of descriptive statistics;

b. How to use good quality tables and graphs to effectively communicate with readers about the data;

c. The common measures of association including Relative Risk (RR) and Odds Ratio (OR), Correlation Coefficient (r), and Chi

Square (χ2);

d. The basic concepts of statistical inference and hypothesis testing, where we analyse the sample data to make inferences about the population using confidence intervals (CI) and p-values;

e. The issue of causation, which assists us to interpret the results from research studies; and

f. The applicability of mixed methods in health research.

Page 4

Learning Objectives

By the end of this topic you should –

1. Know and understand methods to summarise and present data from samples including:

a. Commonly used frequency distributions;

b. Measures of central tendency (mean, median and mode);

c. Measures of dispersion (range, percentiles, inter-quartile range (IQR) and standard deviation (SD)); and

d. Commonly used distribution patterns, including normal and skewed distributions;

2. Understand and interpret some commonly used measures of association including: Relative Risk (RR) and Odds Ratio (OR),

Correlation Coefficient (r), and Chi Square (χ2);

3. Understand the rationale and logic of statistical inference and hypothesis testing through the use of confidence intervals (CI) and p-values;

4. Be aware of the principles of causation that underpin the interpretation of results from quantitative research studies; and

5. Identify the strengths of Mixed Methods in health research.

Page 5

Topic 8.2: Descriptive Statistics – a way to summarise data

Key Terms (Check the Glossary)

Descriptive statistics

Frequency distributions

Inference

Inter-quartile range (IQR)

Mean

Measures of Central Tendency

Measures of Dispersion

Median

Mode

Normal distribution (or the bell-shaped curve)

Percentiles

Probability distributions (also called sampling distribution or

distribution patterns)

Quartiles

Range

Skewed distribution

Standard deviation (SD)

Variable

Page 6

8.2(1) Introduction

The aim of Topic 8 is for you to have a basic understanding of the principles that underpin statistical analysis and interpretation. In this unit, we do not expect you to be able to calculate the statistical formulae, but to be able to understand the information in health research journals that publish research results. In this section of Topic 8, we will build on the material we presented in Topic 7, especially the section on Measurement.

Where are we in this course?

In this section of Topic 8 we will introduce you to common methods to summarise and present data from samples. We use descriptive

statistics as the first step in analysis and to communicate parts of the study results to the reader. The use of descriptive statistics and

good quality tables are an aspect of the final step of the research process, which we introduced to you in Topic 6.

How do you establish that there is a relationship between two variables? There are a number of ways, depending on the level of

measurement of the variables (i.e. the scales used to measure observations from the sample data) and whether the goal of the study is

descriptive or inferential (i.e., to make suppositions about the population from sample data).

If the goal is to be descriptive, we can use graphs or tables, depending on the level of measurement of the variables, as well as

many different summary statistics designed to quantify the size and direction of a relationship observed in the sample studied. This is the

focus of this section of Topic 8.

If the goal is to be inferential, then a hypothesis test (i.e. p-value) or confidence interval (CI) is used to estimate the likelihood that

the observed relationship in the sample can be generalised to the population from which the sample was drawn. We will discuss inference

and hypothesis testing in more detail in Section 8.4 of this topic.

Page 7

Descriptive statistics are indispensable in quantitative research. They are basic methods to summarise and present sample

data. Analysis of possible independent/dependent variable relationships is more meaningful if there is a firm understanding of how the

measurements of independent variables (IV) and dependent variables (DV) are distributed. A common practice in the first stages after

data collection is to describe the values encountered in the data. Descriptive statistics allow you to check for unusual, incorrect, or unexpected values that caution against, or encourage, certain analyses. Descriptive statistics are used to describe the shape, central tendency and

variability in a set of data. Common descriptive statistics used to summarise sample data include:

Frequency distributions;

Measures of Central Tendency;

Measures of Dispersion; and

Probability distributions (also called sampling distribution or distribution patterns)

Reading:

Bonita R, Beaglehole R & Kjellström T (2006). 'Chapter 4 – Basic biostatistics: concepts and tools', in: Basic Epidemiology, 2nd Edition,

World Health Organization, pp. 63-8. Available at URL: http://whqlibdoc.who.int/publications/2006/9241547073_eng.pdf. Date accessed

25 February 2010.

Page 8

8.2(2) Frequency distributions and presenting data in tables and graphs

Data that is not organised is overwhelming, and you cannot work out general trends until some order or structure is imposed.

Frequency distributions are the first and most basic descriptive statistic, and are a method of imposing some structure on numerical data.

More specifically, a frequency distribution is a systematic way of arranging numeric values from the lowest to the highest, together with a

count or percentage of the number of times each value occurred. So the distribution reflects a simple count of how frequently each value

of the variable occurred.

For example, we can present this data in a frequency table that shows the marks a cohort of university students received for their

exam, and the number or frequency that scored this mark. As you can see the marks are ranked from highest to lowest with the frequency

of students who received this mark.

Mark     Frequency   Percent (%)
90-99    0           0
80-89    2           4
70-79    6           12
60-69    9           18
50-59    12          24
40-49    10          20
30-39    7           14
20-29    3           6
10-19    1           2
0-9      0           0
Total    50          100
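If you are curious how such a table can be produced, here is a short sketch in the Python programming language (not part of the course materials, and not assessable). The raw marks below are made up for illustration.

# A minimal sketch: build a frequency table like the one above from raw exam
# marks, using only the Python standard library.
from collections import Counter

marks = [55, 62, 71, 48, 83, 55, 39, 24, 67, 51]  # hypothetical raw scores

# Group each mark into a 10-point band, e.g. 55 -> "50-59".
bands = Counter(f"{(m // 10) * 10}-{(m // 10) * 10 + 9}" for m in marks)

total = len(marks)
# Print bands from highest to lowest, as in the table above.
for band, freq in sorted(bands.items(), key=lambda kv: int(kv[0].split("-")[0]), reverse=True):
    print(f"{band:>5}  frequency={freq}  percent={100 * freq / total:.0f}%")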

Page 9

We can also present this data in a graph called a histogram. As you can see it shows us the same information, but in a different way. Graphs have the advantage of being able to communicate a lot of information at once. At a glance you can see that most students scored 50-59 and few students scored 80-89.

Another way to present this data in a visual way is using a frequency polygon.

Page 10

Again, both of these graphs show us the same information, but in different ways.

Distributions of data are sometimes described by their shape. For example, we might describe the shape of the data presented in a

graph or histogram as either symmetric or skewed.

Symmetric in shape indicates that if we were to fold the graph in half, the two halves of the frequency polygon would be superimposed.

If the peak of the graph is off-centre, with one tail longer than the other, the distribution is said to be skewed. If the longer tail is pointed

towards the right, the distribution is said to be positively skewed. If the longer tail points towards the left, the distribution is said to be

negatively skewed. (Go to p. 12 to see what these distributions look like.) We will talk more about distributions later in this topic.

8.2(3) Measures of Central Tendency

A measure of central tendency is another type of descriptive statistic for presenting interval and ratio scale data from the sample under study. Measures of central tendency are summary numbers that describe the middle of a distribution of sample data; the most commonly used are the mean, median and mode.

For variables measured using interval and ratio scales, the distribution of the data might be of less interest to the researcher than

an overall summary of the scores. For example, we might be interested in the average blood pressure of patients with heart disease. We

can use the mean for this data.

Page 11

The mean gives the researcher an impression of the “average” or “typical” score or measurement in a distribution. The mean is the arithmetic average: it is equal to the sum of all values divided by the number of participants. The mean is sensitive to extreme values, also known as outliers, so it is not appropriate for data that differ greatly from a normal (bell-shaped) distribution. If the mean differs meaningfully from the median, this suggests that the distribution is skewed (not normal).

Let's say we weighed 50 ten-year-old primary school students from one primary school in Warrnambool. The following table shows the students' weights and their frequencies.

Using this example we can calculate the mean or average weight for ten-year-old primary school students at this particular school. To do this, we add up all of the children's weights, which equals 1,480.5kg. We then divide this by the number of children we weighed (50). So 1,480.5kg divided by 50 students = 29.61kg. Therefore, the mean weight of the ten-year-old primary school students is 29.61kg.

Weight (kg)   Frequency   Percent (%)
25.6          1           2
26.9          3           6
27.8          4           8
28.3          6           12
29.5          11          22
30.1          9           18
30.9          8           16
31.2          6           12
31.7          2           4
Total         50          100
(total weight = sum of weight × frequency = 1,480.5kg)
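As an optional illustration (not assessable), the following Python sketch reproduces the mean calculation above from the weight and frequency columns of the table; it uses no data beyond what appears there.

# The weights appear with their frequencies, so the mean is
# sum(weight * frequency) divided by the total frequency.
weights = [25.6, 26.9, 27.8, 28.3, 29.5, 30.1, 30.9, 31.2, 31.7]
freqs = [1, 3, 4, 6, 11, 9, 8, 6, 2]

total_weight = sum(w * f for w, f in zip(weights, freqs))  # 1,480.5 kg
n = sum(freqs)                                             # 50 students
print(f"{total_weight:.1f} kg / {n} students = {total_weight / n:.2f} kg")  # 29.61 kg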

Page 12

Another measure of central tendency is the mode. The mode is the value that occurs most frequently. The mode is useful for

numerical and categorical variables or data. Using the above example of the ten-year-old students' weights, we can see that 29.5kg was the weight that occurred most frequently. Therefore, the mode is 29.5kg, because it occurred 11 times.

So the mode is the value that has the highest frequency, i.e., it occurs the most often in the sample dataset. The mode is mainly used for categorical variables, particularly when categories are not naturally ranked (nominal).

For example, from these numbers we can count how many times each of them occurs.

1, 3, 2, 3, 1, 1, 1, 3, 2, 3, 3, 1, 2, 2, 3, 1, 2, 3

We can see that the number 1 occurs six times, the number 2 occurs five times and the number 3 occurs seven times. So the

number that occurs most frequently in this set of numbers is 3, as it occurs seven times. Some datasets can have multiple modes, that is, several values occur equally frequently.
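As an optional illustration (not assessable), the following Python sketch counts the same eighteen numbers and reports the mode.

# statistics.multimode returns every value with the highest count, so it also
# handles datasets with several modes.
from statistics import multimode

data = [1, 3, 2, 3, 1, 1, 1, 3, 2, 3, 3, 1, 2, 2, 3, 1, 2, 3]
print(multimode(data))  # [3] -- the value 3 occurs seven times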

The last measure of central tendency is the median. The median is the middle value in a ranked list. That is, the median is the

point in the distribution above which and below which 50% of the cases fall. The median is often paired with other information about the

distribution, such as percentile and inter-quartile range. We will talk more about these when we look at measures of dispersion.

The median is not sensitive to extreme values or outliers. When the distribution is highly skewed the median provides a better

measure of central tendency than the mean as the mean is influenced by extreme values in the dataset. So the median is used to indicate

the “middle” value for a variable. With observations arranged in increasing or decreasing order, the median is defined as the middle

observation. If the number of observations is even, so that there is no middle observation, the median is defined as the average of the

two middle observations.

Page 13

For example, if we were to rank first-year students' ages in order from lowest to highest, we could work out the median age:

19, 20, 20, 21, 22, 24, 27, 27, 27, 34

Since there are 10 ages included, there is no single middle number; therefore the median is the fifth observation plus the sixth observation, divided by two. So in this example, the median age would be (22 + 24) / 2 = 23. So the median age in this example is 23.
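As an optional illustration (not assessable), this short Python sketch computes the same median from the ten ages above.

# With an even number of observations, statistics.median averages the two
# middle values, exactly as described in the text.
from statistics import median

ages = [19, 20, 20, 21, 22, 24, 27, 27, 27, 34]
print(median(ages))  # (22 + 24) / 2 = 23.0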

8.2(4) Measures of Dispersion

A measure of dispersion is a type of descriptive statistic that describes the spread of observations/data about the mean. We use measures of dispersion because, for some sample data, measures of central tendency do not give a total picture of the data spread. For example, two sets of data with identical means could be quite different from one another. So we use measures of dispersion to tell us how spread out or dispersed the data are: they indicate the spread or variety of people's measurements in a distribution. If all the values are similar, the dispersion is low, but if there is a wide range of different values the dispersion is high. Measures of dispersion are chiefly used for numerical variables. The most commonly used measures of dispersion are:

Range

Percentile

Inter-quartile range (IQR) and

Standard deviation.

Page 14

Range is defined as the difference between the largest and smallest observations: the highest score minus the lowest score in a given distribution. If we go back and look at our example (above) of the weights of primary school children aged ten years, we see that the highest weight is 31.7kg and the lowest weight is 25.6kg. So the range is 31.7 - 25.6 = 6.1kg.

As range is determined by only two extreme observations in the dataset, the use of range is limited, because it tells us nothing

about how the data between these two extremes are spread.

Another measure of dispersion is the percentile. Percentiles divide the data into 100 equal parts, with equal numbers of observations in each part. Percentiles provide information about the location of a score with respect to other scores. So a percentile is the score at or below which a certain percentage of scores in a distribution lie. For example, let's say we weighed four children aged 10 as part of a school health check, as follows:

Amy 21kg

Rebecca 19.5kg

Anne 17kg and

Dawn 14kg

The following table shows the percentile weights for girls aged 10:

Percentile    3    10    30    50     70    90    97
Weight (kg)   16   17    19    20.5   22    24    27

So we can see that Amy's weight is very close to the 50th percentile. This is also the median weight. So from the definition of the median, it follows that half the population are lighter and half the population are heavier. Equally, from the definition of a percentile, 50% of the population are heavier than the 50th percentile and 50% are lighter. Rebecca's weight falls just above the 30th percentile, so about 30% of children her age are lighter than her and about 70% are heavier than her. What about Dawn?

Page 15

Her weight falls just below the 3rd percentile, indicating that fewer than 3% of the population are lighter than her, and that most (about 97%) of children are heavier than her.

We can then break this down further into quartiles. Quartiles are sets of values that divide the distribution into four parts such that there are an equal number of observations in each part. The 25th percentile is the first quartile, the 50th percentile is the median (the second quartile), and the 75th percentile is the third quartile.

The inter-quartile range (IQR) is the difference between the first quartile and the third quartile. It is the companion to the median. The IQR indicates the range of values spanned by the middle 50% of the observations, that is, 25% of observations to each side of the median. The figure below shows a normal distribution or bell-shaped curve of a dataset. In this example we have ranked all of the observations according to their score and:

Count 25% of the way through the list… report the “lower quartile” score; then

Count 75% of the way through the list… report the “upper quartile” score.

Page 16

You can then calculate the difference between the “upper quartile” score and the “lower quartile” score:

i.e., IQR = Upper quartile score (Q3) – lower quartile score (Q1)
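As an optional illustration (not assessable), the following Python sketch computes the quartiles and the IQR for the ten ages used in the median example above. Note that different quartile methods can give slightly different values, so treat this as a sketch of the idea rather than the only correct answer.

# statistics.quantiles with n=4 returns the three quartile cut points
# (using its default "exclusive" method); Q3 - Q1 spans the middle 50%.
from statistics import quantiles

ages = [19, 20, 20, 21, 22, 24, 27, 27, 27, 34]
q1, q2, q3 = quantiles(ages, n=4)
print(f"IQR = Q3 - Q1 = {q3} - {q1} = {q3 - q1}")  # 27.0 - 20.0 = 7.0 here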

Example: Let us now look at a real example of how researchers use descriptive statistics. In Topic 6, we discussed different study designs. We used an example of a randomised controlled trial that examined different vaccine injection techniques by Ipp et al. (2007). In that study, the authors used the median and the inter-quartile range (IQR) to describe the baseline characteristics of the participants in their study.

[Reference: Ipp M, Taddio A, Sam J, Gladbach M & Parkin PC (2007). 'Vaccine-related pain: randomised controlled trial of two injection techniques', Archives of Disease in Childhood, 92: 1105-1108.]

Page 17

The standard deviation (SD) is another measure of dispersion; it indicates the average distance, or amount, by which each observation differs from the mean. The SD is the most commonly used measure of dispersion or spread, denoted by σ in the population and by s (or SD) in the sample. It can be used with the mean to describe the distribution of observations. The statistical formula (in English) is the “square root of the average of the squared deviations of the observations from the mean”.

The SD is the most widely used measure of dispersion, particularly with interval and ratio level data. The SD is the companion to the mean, and is an indicator of the average deviation of scores around the mean. So how do you calculate the SD? You do this by calculating how different each score is from the mean, squaring each difference (which removes the distinction between positive and negative differences), averaging the squared differences, and then taking the square root. For sample data the formula for the SD looks like this (you are not expected to learn or use this formula in this course):

s = √( Σ(x − x̄)² / (n − 1) )
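For the curious, the optional Python sketch below (not assessable) follows exactly these steps and checks the result against the standard library's stdev function; it reuses the nine distinct weights from the earlier table purely as example numbers.

# Square each deviation from the mean, average the squares (over n - 1, the
# usual choice for sample data, which is what statistics.stdev also uses),
# then take the square root.
from math import sqrt
from statistics import mean, stdev

data = [25.6, 26.9, 27.8, 28.3, 29.5, 30.1, 30.9, 31.2, 31.7]
m = mean(data)
by_hand = sqrt(sum((x - m) ** 2 for x in data) / (len(data) - 1))
print(by_hand, stdev(data))  # the two values agree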

The standard deviation has a close relationship with the normal distribution or bell-shaped curve, which we will now discuss.

Page 18

8.2(5) Probability Distributions (also called sampling distributions or distribution patterns)

In Topic 7 (Sampling in Quantitative Research) we introduced you to sampling distributions and sampling errors. You might like to

review that section again before reading the following information. Sampling distributions are also called probability distributions.

What is a probability distribution? Imagine that we repeatedly draw a number of random samples from the same population. This process will reveal an important property: as we pool the means of each of the separate samples from the population, we may observe their average and spread – collectively called the sampling distribution or probability distribution. This distribution is centred on (or approximates) the population's true mean. The sampling distribution of a statistic is defined as the distribution of all possible values that can be taken by that statistic. It is formed by calculating that statistic for each of a large number of samples of the same size that are drawn randomly from the population. So probability distributions are the final method for presenting the data from a sample.

A number of commonly used probability distributions are important in statistics, including the binomial distribution, the Poisson distribution and the Normal (Gaussian) distribution (or the bell-shaped curve).

The binomial and Poisson distributions display discrete or binomial variable data (i.e. scales that measure “Yes” or “No” type data or integer data [1]). They can only take integer values. Binomial distributions are used in research into certain medical conditions, like the

probability of inheriting certain genetic conditions or the reactions to medications. Poisson distributions are used in relation to research

that examines the probability of the occurrence of very rare medical conditions. We will not be discussing these two distributions in this

course. We will now look at the most commonly used distribution in statistics called the normal or bell-shaped curve.

The normal curve is continuous and can display any value from a dataset; it is a smooth, bell-shaped curve that is symmetric about the mean of the distribution. Note: the symbol for the sample mean is x̄. The normal distribution with a mean of 0 and an SD of 1 is called the standard normal distribution. The normal distribution is defined by a formula in which there are only two sample variables: the mean and standard deviation (SD).

[1] Integers are whole-valued positive or negative numbers or 0: the set of positive whole numbers {1, 2, 3, ...}, negative whole numbers {−1, −2, −3, ...}, and zero {0}.

Page 19

The normal distribution curve is a useful way of “seeing” how spread out your data are and where most of the cases lie. The normal

distribution is central to the analysis of data because many commonly employed statistical methods are based on this probability

distribution. The normal curve assumes that the population data, from which the sample data are drawn, are normally distributed.

This is what a normal distribution looks like. As you can see it does indeed look like a bell and that if you were to fold it in half it

would superimpose.

The next graph is a histogram with a normal distribution curve overlaid. In this example, this histogram shows us the amount of

social support experienced by middle-aged women. You can see that the social support scores are somewhat normally distributed, as the normal distribution curve is symmetrical. You can also see that the mean score was 36.21, with an SD of 5.941.

Page 20

If the data were not normally distributed, what might the curve look like? You will remember early on that we briefly mentioned

skewed distributions. The figures (below) present both a symmetrical or normal distribution, and a skewed distribution.

You will notice that with the normal distribution, the mean, mode and median are all in the same place, as follows:

Page 21

But this is not the case with the skewed distribution. If the peak of the graph is off-centre and one tail is longer than the other, the

distribution is said to be skewed.

If the longer tail is pointed towards the right, the distribution is said to be positively skewed.

Page 22

But if the longer tail points towards the left, the distribution is said to be negatively skewed.

Because the normal distribution is a probability distribution, the area under the curve is equal to 1. Because it is a symmetrical distribution, 50% of the area is on the left of the mean and 50% is on the right. Because the area under the curve is equal to 1, we can use

the curve for calculating probabilities. It is beyond the scope of this course to teach you all about probability theory but you should see

the importance of the normal curve when we explain hypothesis testing later in this topic.

8.2(6) Summary

This concludes the section of Topic 8 on descriptive statistics. As you can see, descriptive statistics help us to describe our data

using measures of central tendency and dispersion. We can use tables and graphs to help us to communicate the results from research.

We also discussed the important statistical theory of probability distributions. As you will see in the last two sections of Topic 8, the

normal distribution plays an important role in helping us to decide which statistical tests we should use to analyse our sample data and

how we interpret the results from the sample back to the study population.

The next section in this topic looks at measuring associations between variables.

Page 23

Topic 8.3: Measures of Association

Key Terms (Check the Glossary)

Binary scales (and binary data)

Categorical variables

Continuous variables

Correlation Coefficient (r)

Inference

Interval scale (and interval data)

Measures of association

Nominal scale (and nominal data)

Null hypothesis

Numerical scale (and numerical data)

Odds Ratio (OR)

Ordinal scale (and ordinal data)

Relative Risk (RR)

Statistic

Variables

Variables (independent, dependent and confounding)

Page 24

8.3(1) Introduction

The aim of Topic 8 is for you to have a basic understanding of the principles that underpin statistical analysis and interpretation. In this unit, we do not expect you to be able to calculate the statistical formulae, but to be able to understand the information in health research journals that publish research results.

Where are we in this course?

In this section of Topic 8 we will introduce you to the common measures of association, including Relative Risk (RR), Odds Ratio (OR) and the Correlation Coefficient (r). These are statistical tests that allow us to make inferences from a sample to the population from which the sample was drawn. Statistical tests like these are examples of quantitative methods of analysis and interpretation, the final step of the research process, which we introduced to you in Topic 6.

If you review the Glossary, you will notice many terms that you have already encountered in Topic 7. In this section of Topic 8, we will be

building on the information presented in Topics 5, 6 and 7. It is especially important that you understand measurement and descriptive

statistics in quantitative studies because we will be using many of these concepts to explain measures of association.

What are “Measures of association”? Measures of association are statistical techniques conducted on sample data to determine the strength of the relationships between exposures and outcomes. The measure of association used depends on the study design and the

type of variables analysed in the sample data. Measures of association include the relative risk (RR), odds ratio (OR) and Correlation

Coefficient (r).

An association between variables exists when changes in the values of one variable are linked to changes in the values of another variable. In most quantitative research studies the association between the independent variable (or study variable) and the dependent variable (or outcome variable) is assessed. We usually want to measure this to help us assess causation. We discussed this in Topic 7.

One of the common methods of evaluating an association between two variables is through correlation procedures. For example,

is there an association between peoples‟ height and weight? As children grow and get taller, does their weight increase? If we looked at a

range of different aged children we would find that generally, yes, height and weight change together in the same direction.

Page 25

In a population of adults is there an association between height and weight? Broadly speaking, yes, but there are many exceptions.

So you would say that the association is not as strong as it is in growing children.

How do you establish that there is an association between two variables? There are a number of ways, depending on the level of measurement of the variables and the type of study design.

If the goal is to be descriptive you can use graphs or tables, depending on the level of measurement of the variables, as well as

many different summary statistics designed to quantify the size and direction of an association observed in the sample studied.

Descriptive analysis does not compare data between samples. We discussed this in the previous section of this topic.

If the goal is to establish causation, then a hypothesis test or confidence interval is used to estimate the likelihood that the observed association is also generalisable to the population. This is the focus of this section of Topic 8.

The most important point to be aware of at this stage is that showing the existence of an association tells you nothing about how or why there is an association between the variables being studied. In other words, finding that there is an association between two variables does not give you any information about the link between the two variables. Correlation or association does not equal causation. It is really tempting to infer the “why” that links two variables, but correlation only tells you that they are associated, not why. That needs further research.

Correlation is not the only method of testing the relationship between an independent and dependent variable in quantitative

studies. Different methods are used in epidemiological studies. In Topic 5 we discussed how epidemiology studies disease and health

rates in populations (e.g., measuring incidence and prevalence). This is often the first step in investigating the causes of specific health

outcomes in populations.

Epidemiological data often throw up research questions that can then be tested using more rigorous quantitative study designs

that investigate disease causation (e.g. cohort and case-control studies). These types of studies compare the occurrence of disease in two

or more groups of people whose exposures to the disease risk have differed. An unexposed group is often used as the control group and

Page 26

compared to the exposed group. We examined this in detail in Topic 6. In these epidemiological studies the Relative Risk (RR) is used to test the existence and strength of the risk of the occurrence of the disease among the exposed group compared to the unexposed group. RR is used mainly on data from cohort studies. Odds Ratio (OR) is a similar technique to RR but is used mainly to analyse data from case-control studies. The OR gives an approximation to the RR, and is used when RR is inappropriate (e.g., when examining rare diseases in the population). Both OR and RR are also accompanied by Confidence Intervals (CI), which allow us to test the research hypothesis in epidemiological studies. We will be discussing CI in more detail in the final section of this topic.

How do we decide which measure of association we should use in our research studies? Consider the link between smoking and

illness. It seems a plausible idea that there is a link. How would we go about assessing it quantitatively? It depends on how we measured

the variables. Are they measured as categorical or continuous variables? Let's look at an example:

Independent Variable - Smoking could be measured as:

(a) “yes/no”, which is categorical on a nominal scale;

(b) None, light, heavy which is also categorical, but on an ordinal scale; or

(c) Number of cigarettes per day is continuous on a ratio scale.

Dependent Variable - Illness could be measured as:

(a) “lung cancer/no lung cancer” which is categorical on a nominal scale;

(b) Normal, mild problems, serious problems is also categorical but on an ordinal scale; or

(c) Lung capacity or oxygen saturation in the blood would be continuous on a ratio scale.

How we go about assessing the existence and strength of an association between smoking and illness would depend on the

research question, how the variables were measured in the study and the chosen study design. As you can imagine there are a number of

combinations of scales of measurement possible and as a result different measures of association.

We are now going to discuss the commonly used measures of association in quantitative research studies including:

Relative Risk (RR);

Page 27

Odds Ratio (OR); and

Correlation Coefficient (r).

8.3(2) Relative Risk (RR)

The Relative Risk (RR) is a measure of association between variables that is often calculated from data in cohort studies and RCTs. RR is sometimes also called the risk ratio. RR is the ratio of the risk of the condition of interest (e.g. disease) in the treatment (or exposed) group to the risk in the control group. In cohort studies it is defined as the ratio of the risk of a given disease in the exposed or at-risk group compared to the risk of the disease in the unexposed group. In cohort studies examining disease risk, the incidence of the disease is used to describe the disease patterns and then the RR is calculated in further analysis. {You will remember we looked at incidence and prevalence in Topic 5. Incidence is the number of newly diagnosed cases of a disease during a specific time period.}

Example of RR calculation:

Let us look at a worked example of how RR is calculated from data from a cohort study that examined the risk of lung cancer from smoking [2]. You are not expected to learn how to complete this calculation in any of your assessments for HBS108; however, this example will hopefully help you gain an understanding of the concept of RR as an important measure of association that is often used in cohort studies.

We will calculate the risk of getting lung cancer (disease) from smoking (risk factor) by comparing a sample of 10,000 people who smoke (exposed) with a sample of 10,000 people who do not smoke. In this cohort study, all 20,000 people in the study are followed for 20 years, the incidence of lung cancer is measured in both groups, and a ratio is calculated.

[2] This example is adapted from The University of Cincinnati (The Ohio State University) (2010). NetWellness Consumer Health Information: “What's the Risk? It's all Relative”. Available from URL: http://www.netwellness.org/healthtopics/help/risk.cfm. Date accessed 26 February 2010.

Page 28

The study results show:

At the end of 20 years, let's say 1,500 of the smokers developed lung cancer.

Of the 10,000 non-smokers, 100 got cancer in the 20-year follow-up.

The formula for RR is:

RR = [a / (a + b)] / [c / (c + d)]

where a / (a + b) is the risk of disease in the exposed group and c / (c + d) is the risk of disease in the unexposed group.

We can now use a 2X2 table to help us to calculate RR with this formula by using the study data:

2 x 2 table to calculate RR from a cohort study

Exposure to risk factor | Disease present | Disease absent | Totals
Smoking                 | a               | b              | Total exposed (a + b)
Non-smoking             | c               | d              | Total not exposed (c + d)
Totals                  | Total with disease (a + c) | Total without disease (b + d) | Total in sample (a + b + c + d)

Page 29

Now we insert the data from the study into the table as follows:

Exposure to risk factor | Disease present | Disease absent | Totals
Smoking                 | 1,500 (a)       | 8,500 (b)      | 10,000 (a + b)
Non-smoking             | 100 (c)         | 9,900 (d)      | 10,000 (c + d)
Totals                  | 1,600 (a + c)   | 18,400 (b + d) | 20,000 (a + b + c + d)

To calculate the 20-year risk of lung cancer among smokers, we divide 1,500 lung cancer cases by the total of 10,000 smokers,

and get 0.15, or 15 per cent. So the 20-year risk of lung cancer among smokers would be 15 per cent.

To calculate the risk of lung cancer among non-smokers over the 20 years, we divide 100 lung cancer cases by the total of

10,000 non-smokers and get 0.01 or 1 per cent. This gives a "20-year risk" of lung cancer among non-smokers of 1 per cent.

We then relate these two risks to each other to give an indication of the effect of smoking on the risk of lung cancer.

Page 30

These are the steps in the RR calculation based on our original formula:

RR = [a / (a + b)] / [c / (c + d)] = (1,500 / 10,000) / (100 / 10,000) = 0.15 / 0.01 = 15

The conclusion is that the smokers were 15 times more likely to develop lung cancer than the non-smokers in the 20-year follow-up study [3].

This is how the RR is calculated from cohort study data.
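As an optional illustration (not assessable), here is the same RR calculation written as a short Python sketch using the counts from the 2 x 2 table.

# a, b: smokers with and without lung cancer; c, d: non-smokers likewise.
a, b = 1_500, 8_500
c, d = 100, 9_900

risk_exposed = a / (a + b)    # 1,500 / 10,000 = 0.15
risk_unexposed = c / (c + d)  # 100 / 10,000 = 0.01
rr = risk_exposed / risk_unexposed
print(round(rr, 2))  # 15.0 -- smokers were 15 times more likely to develop lung cancer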

Interpretation of RR

The size of the RR is an effective index of the strength of a causal relationship between the exposure and the disease being investigated (given that all other aspects of the study have been implemented to limit bias etc.). The rule is that an RR of 1 indicates no association; the higher the RR rises above 1, the greater the risk of disease from the exposure; and when the RR is less than 1, the exposure is said to be protective against the disease. This is best illustrated by the table below, where there are some examples of RRs of different sizes and the interpretation of the result:

RR – Interpretation using an example

RR = 1: Exposure does not affect the outcome of disease (i.e. no association). An example is a study examining autism and measles vaccine. The study concluded there was no risk of autism from measles vaccination (RR was 1).

RR = 0.5: Whooping cough (pertussis) vaccine is half as likely to be associated with the disease. In other words, children vaccinated with pertussis vaccine are 50% LESS likely to be at risk from pertussis disease.

[3] We have used hypothetical data in our example, but the results are adapted from a famous cohort study called the British Doctors Study, conducted between 1951 and 2001. In 1956 Richard Doll and Austin Bradford Hill published rigorous evidence from the study demonstrating that tobacco smoking significantly increased the risk of lung cancer.

Page 31

RR = 0.05: Children vaccinated with measles vaccine are 20 times LESS likely to be infected with measles.

RR = 1.8: People exposed to blue asbestos have an 80% increased risk of respiratory cancer compared with those not exposed (i.e., nearly twice as high).

RR = 15: People who smoke are 15 times more at risk of lung cancer than those who do not smoke.

In summary: you can see from these examples that an RR of 1 is interpreted as no risk; if the RR is >1, there is an increased risk; but if the RR is <1, the exposure is protective (as in the two vaccine examples). So the RR is a very useful measure of association because it can provide information on “no risk”, “increased risk”, or that the exposure is protective (less than 1). The figure below is a visual representation of the RR examples from the table (note the figure is not to scale):

The next step in the analysis of any study data is to ask if the RR is a statistically significant [4] result. In other words, could this result have happened by chance, or is it a “real” effect? To determine whether the RR of 15 is “real” or not, we can use confidence intervals (CI) to examine the statistical significance. The CI will not only tell us whether the results might be a chance effect or not, but

[4] The term “significance” has a specific statistical meaning in this context. We mean “level of significance”, which is the probability of incorrectly rejecting the null hypothesis in the test of hypothesis within a study.

Page 32

also illustrate how small or how large the true effect size might be. We will return to this example in Section 8.4 of this topic when we explain statistical inference and hypothesis testing (i.e. confidence intervals and p-values).

At this stage you need to note that in all study results the researchers use a measure of association (like RR), to illustrate the

association between the independent variable (i.e. smoking in this example) and the dependent variable (i.e. lung cancer in this example)

and also include the information that determines the significance of that association (i.e. confidence intervals or p-values).

We will now examine Odds Ratio. The OR is very similar to RR and is used in studies where the use of RR is statistically

inappropriate.

8.3(3) Odds Ratio (OR)

The OR is the ratio of the odds of exposure among the cases to the odds of exposure among the controls. OR is a measure of association between variables that is often calculated from data from case-control studies. It is an estimate or approximation of the relative risk, usually calculated from data collected during a case-control study. In other words, it is a simple calculation that yields an approximate value for the relative risk of the exposure that has been examined. Why use OR rather than RR? There are a number of reasons:

RR cannot be calculated in case-control studies, because these studies usually examine rare and uncommon diseases;

Unlike a cohort study, we do not know the risk for exposure in the population. This is because the starting point of a case-

control study is a set of cases (i.e. with the condition or disease of interest) and we compare these cases with a set of controls (a

sample of people without the disease but with similar characteristics), and look back in time to find the exposure.

Another reason why we need to understand the OR is that this measure of association is most often calculated in case-control studies. It can also be used in cohort studies, but is then calculated using multiple logistic regression analysis [5] rather than the steps we used to calculate RR (above). We will not be examining multiple logistic regression models in this course, however.

[5] Multiple Logistic Regression Analysis is a procedure used when there are three or more variables (measures) and the level of measurement is interval/ratio. It assesses the degree to which scores for a subset of these variables predict scores for another variable in the set (e.g. a variable that might be a confounder). This method is also referred to as modelling.

Page 33

Example of an OR calculation:

Let us look at an example of how OR is calculated in a case-control study. We will use the same data we used in the above example of an RR calculation.

In this smoking example, we had to wait for 20 years before we could analyse the study results and calculate the RR. However if a

disease was very rare, it would take even longer than 20 years to accumulate enough disease cases to calculate a statistically valid RR.

Case-control studies are used to assemble the required number of cases more quickly. We start with a group of people with the

condition of interest (e.g. cases of lung cancer) and then measure the various risk factors that might be associated with the disease

(smoking and other risk factors). We then calculate the odds of smoking among the cases (i.e. those with lung cancer) and the odds of

smoking among the controls (i.e. those who do not have lung cancer). This will give us the OR, which is not the same as an RR, but is a statistically valid estimate.

The formula for OR is:

OR = (odds of exposure among the cases) / (odds of exposure among the controls)

Another way to state this is:

OR = (a / b) / (c / d)

Where:

a = The number of individuals who have the disease (cases with lung cancer) and are exposed to the risk factor (smoking);

b = The number of individuals who do not have the disease (controls) but are exposed to the risk factor (smoking);

c = The number of individuals who are not exposed to the risk factor (smoking) and have the disease (cases of lung cancer);

and

d = The number of individuals who do not have the disease (lung cancer) and are also not exposed to the risk (smoking).

Page 34

We go through the same steps to calculate OR, by using a 2X2 table:

2 x 2 table to calculate OR for smoking and lung cancer

Exposure to risk factor | Lung cancer cases | Healthy controls | Totals
Smokers                 | a                 | b                | Total exposed (a + b)
Non-smokers             | c                 | d                | Total not exposed (c + d)
Totals                  | Total with disease (a + c) | Total without disease (b + d) | Total in sample (a + b + c + d)

Now we insert the data from the study into the table as follows:

Page 35

Exposure to risk factor | Lung cancer cases | Healthy controls (no lung cancer) | Totals
Smokers                 | 1,500 (a)         | 8,500 (b)                         | 10,000 (a + b)
Non-smokers             | 100 (c)           | 9,900 (d)                         | 10,000 (c + d)
Totals                  | 1,600 (a + c)     | 18,400 (b + d)                    | 20,000 (a + b + c + d)

There are 1,600 cases of lung cancer and 18,400 healthy controls (no lung cancer). Note that it is normally impossible to recruit so many controls in case-control studies, but let's not worry about that for the purpose of this example.

Using a questionnaire, we ask all participants (cases and controls) about their smoking status and find that 1,500 of the 1,600 lung cancer cases had smoked, while 8,500 of the 18,400 healthy controls (no lung cancer) had smoked.

We also found that 100 of the 1,600 lung cancer cases were non-smokers.

Now let us use the data from the 2 x 2 table to complete the calculation for OR as follows:

Page 36

OR = (a / b) / (c / d) = (1,500 / 8,500) / (100 / 9,900) = 0.176 / 0.0101 ≈ 17.5
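As an optional illustration (not assessable), here is the same OR calculation as a short Python sketch, using the counts from the 2 x 2 table above.

# a, b: smokers among the cases and among the controls;
# c, d: non-smokers among the cases and among the controls.
a, b = 1_500, 8_500
c, d = 100, 9_900

odds_ratio = (a / b) / (c / d)  # equivalently (a * d) / (b * c)
print(round(odds_ratio, 1))     # 17.5 -- close to the RR of 15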

Interpretation of OR result:

What does this result mean? The interpretation is that the odds of smoking among the lung cancer group (cases) are more than 17 times higher than the odds of smoking among the group with no lung cancer (controls). We said at the beginning of this section that the OR is an approximation to the relative risk (i.e. the RR). Now look back at our RR calculation for this same study. The RR was 15, compared with an OR of 17.5 for the same study data. This is very close. Remember that the OR is used to calculate risk from case-control studies, but if the study is done well (bias is limited as much as possible and the disease is rare), the OR is a valid measure of association that approximates the RR from a cohort study.

We interpret the OR in the same way as we did the RR: the larger the OR, the stronger the association between the disease and the

exposure. Confidence Intervals are also used in the same way with OR to provide information about the precision of the association. We

will discuss this again in the final section of this topic.

We will now discuss the next measure of association used in quantitative research studies: the correlation coefficient (r).

Page 37

8.3(4) Correlation Coefficient (r) to measure the association between two continuous variables

Readings

Read the section on Correlation Coefficients in Pierson J (2010). 'Chapter 24: Data Analysis in Quantitative Research', in: Liamputtong P (ed.) Research Methods in Health: Foundations for evidence-based practice, South Melbourne: Oxford University Press, pp. 412-3.

Optional online resource: Easton VJ & McColl JH (2004); 'Statistics Glossary', STEPS (STatistical Education through Problem Solving), University of Glasgow (Dept. of Statistics). Available from URL: http://www.stats.gla.ac.uk/steps/glossary/paired_data.html#corrcoeff. Date accessed 23 February 2010.

What you need to know: You should understand the principle of correlation coefficients and how to interpret this statistic if you read it in

the results section of a journal article. You do not need to know the difference between the three methods of correlation coefficients or

how to calculate the formula.

Correlation is a statistical measurement of the association between two variables. The correlation coefficient measures the degree, or strength, of this association and is symbolised by the letter "r".

Possible correlations range from +1 to –1:

A zero correlation indicates that there is no association between the variables.

A correlation of –1 indicates a perfect negative correlation, meaning that as one variable goes up, the other goes down.

A correlation of +1 indicates a perfect positive correlation, meaning that both variables move in the same direction together.


There are a number of different correlation coefficients that might be appropriate, depending on the kinds of variables being studied (e.g. Pearson's Product Moment Correlation Coefficient and Spearman's Rank Correlation Coefficient). However, in this course we will just be introducing the general concepts that underpin correlation. Let us now look at this in more detail.

Correlation measures the degree to which values for two measures move together in a synchronised manner. As mentioned above, the farther "r" is from zero, and the closer it is to one (moving together) or to minus 1 (moving opposite each other), the more synchronised the two variables are; the closer "r" is to zero, the more independent they are.

Correlations have two aspects: strength and direction. Most of the time we are interested in knowing both. It is important to

remember that correlation by itself does not imply causation. Rather, correlations assess the strength of associations.
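For interest only, here is a small Python sketch showing how a correlation coefficient can be computed in practice. The age and body-fat figures are made up for illustration, and you are not expected to write or run code in this unit.

```python
# Illustrative only: Pearson's r for made-up age / percent body fat data.
import numpy as np

age = np.array([23, 27, 32, 38, 45, 51, 56, 60])
body_fat = np.array([18.0, 20.5, 22.0, 25.5, 27.0, 30.5, 32.0, 34.5])

r = np.corrcoef(age, body_fat)[0, 1]   # off-diagonal entry of the correlation matrix
print(round(r, 2))                     # close to +1: a strong positive correlation
```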

Example 1:

This scatterplot below shows us the relationship between per cent body fat and age among 18 normal adults aged 23 to 60 years.

We can see that there appears to be some association between the two variables: There is a tendency for the older people to have a

higher percentage of body fat.


The correlation coefficient (r) measures the degree of linearity or “straight-line” association between the values of the two

variables. The correlation coefficient is a value from minus one to positive one.

The correlation between two variables is positive if higher values of one variable are associated with higher values of the other variable. The correlation is negative if higher values of one variable are associated with lower values of the other variable.

Here are two hypothetical examples of correlation coefficients between age and percent body fat: 1 and -1. The scatterplot on the left is a case of the correlation coefficient equalling one, which means that as age increases so does the percentage of body fat. This is a perfect, positive association.

The scatterplot on the right, however, is a case of the correlation coefficient equalling minus one, which means that as age increases the percentage of body fat decreases. This is a perfect, negative association.


Similarly, these two scatterplots show correlations of varying strength and direction. In this example, the one on the left shows a correlation of 0.15, suggesting a very weak or low positive correlation, and the one on the right shows a correlation of -0.449, indicating a moderate negative correlation.


Rule about strength of correlation:

In general, a correlation of less than 0.4 (ignoring the sign) is considered to be a low or weak correlation, suggesting that the correlation exists but is not very strong. A correlation greater than 0.4 but less than 0.7 is considered to be a moderate correlation, which suggests there is a substantial relationship between the two variables. A correlation of greater than 0.7 is considered to be strong and indicates a very high correlation between the two variables of interest.
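Purely as an illustration, the rule of thumb above can be written as a small helper function. The 0.4 and 0.7 cut-offs are the ones used in this handout; other texts use slightly different ones.

```python
# The rule about strength of correlation as code (ignoring the sign of r).
def correlation_strength(r: float) -> str:
    magnitude = abs(r)
    if magnitude < 0.4:
        return "weak"
    elif magnitude < 0.7:
        return "moderate"
    return "strong"

print(correlation_strength(0.15))    # weak
print(correlation_strength(-0.449))  # moderate
print(correlation_strength(0.985))   # strong
```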

So what is “r”? “r” is a measure of the scatter of the points around an underlying linear trend: the greater the spread of points the

lower the correlation. So a correlation of zero (where r = 0) indicates that there is no linear relationship between the values of the two

variables. They are not correlated. As you can see in this scatterplot, there is no relationship between the two variables.

If we investigate the association between maternal weight and foetal weight of different species of mammals we find that r =

0.985. This is a very strong positive correlation.


The correlation coefficient can be calculated from most data sets of continuous variables. However, there may be restrictions on the validity of the associations observed, depending on whether certain assumptions are met. These assumptions include whether or not the data are normally distributed and whether the observations were randomly selected.

Example 2:

Here is another example, comparing systolic blood pressure measured by a doctor against daytime ambulatory systolic pressure, showing, as expected, a moderate correlation (r = 0.46).


In summary, correlation is a measure of the scatter of the points around an underlying linear trend: the greater the spread of points, the lower the correlation. Correlation is denoted by the correlation coefficient "r", and r is a value ranging from -1 to +1.

8.3(5) Summary:

Now we have looked at how to measure the association between continuous variables and the measures of association used in

cohort and case-control studies (RR and OR).

The final step in the research process is to work out whether the strength of this association is statistically significant and how we can determine whether or not to reject the null hypothesis. In the next section of this topic we discuss the important concepts of statistical inference and hypothesis testing using p-values and confidence intervals.

Remember that in this course, we do not expect you to be able to calculate the statistical formulae, but to be able to understand

the information in health research journals that publish research results. You should understand the principles that underpin the measures

of association and be able to interpret the results presented in the journal articles related to your chosen profession.


Topic 8.4: Statistical inference and hypothesis testing: Confidence Intervals and P-Values

Key terms (see Glossary)

Alternative hypothesis (H1)

Causality

Chi square test (X2)

Clinical Significance

Confidence intervals (CI)

Inference

Generalisability (see external validity and study validity)

Hypothesis

Hypothesis testing

Null hypothesis (H0)

Level of significance (also see significance and statistical significance)

P-values

Significance

Sampling errors (or standard errors)

Statistical inference

Study validity

8.4(1) Introduction

The aim of Topic 8 is for you to have a basic understanding of the principles that underpin statistical analysis and interpretation. In

this unit, we do not expect you to be able to calculate the statistical formulae, but to be able to understand the information in health

research journals that publish research results.


This section of Topic 8 introduces you to the final aspects of data analysis and interpretation including:

a. The basic concepts of statistical inference and hypothesis testing where we analyse the sample data to make inferences about the

population using Chi-Square (X2), confidence intervals (CI) and p-values; and

b. The issue of causation, which assists us to interpret the results from research studies.

8.4(2) Inference and hypothesis testing

Inference is the process of logical reasoning that combines observed phenomena with accepted truths in order to formulate

generalisable statements. It is also called inductive reasoning. Statistical Inference makes use of information from a sample to draw

conclusions (inferences) about the population from which the sample was taken. Statistical inference applies this process to data sets with calculated degrees of uncertainty. We are now at this stage in our explanation of the quantitative research study design. Once we

have established that there is (or is not) an association between the variables we are interested in (independent and dependent), we have

to test this association to find out if this is a significant association. Remember the whole purpose of statistics is to use data from a

sample to make inferences about the whole population from which the sample was drawn because we are usually unable to conduct our

research on the whole population. Simply put, we want to know the answer to this question: “are the results from our analysis of this

sample data, true (or significant) for the whole population?”

Significance has a specific meaning in research statistics and it comes in two varieties:

i. Statistical significance: when the p-value is small enough to reject the null hypothesis of no effect; and

ii. Clinical importance: when the effect size is large enough to be potentially considered worthwhile by patients. We will mainly

concentrate on statistical significance in this topic.


The alternative hypothesis is a tentative theory or supposition provisionally adopted to explain certain facts, and to guide the practical

steps we will use to conduct the research study. It is symbolised by H1. The alternative hypothesis is stated at the beginning of the research

process. In research the hypothesis is statistically tested and potentially refuted in the analysis/interpretation step. In quantitative

research we never state the hypothesis as an affirmative statement but it is always stated in the negative and is called the Null

Hypothesis. This is a matter of some importance but it is beyond the scope of this course to provide a full explanation on why this

approach is taken.

The Null Hypothesis typically proposes a general or default position, such as that “there is no relationship between two quantities” or if

used in the various study designs we have looked at, the Null Hypothesis might look like these examples:

1. That “there is no difference between a treatment and the control group” (i.e. in an RCT); OR

2. That “there is no difference between the exposed and unexposed group” (i.e. in a cohort study); OR

3. That “there is no difference between the cases and control groups” (i.e. in a case control study).

In our earlier example of a cohort study looking at the link between smoking and lung cancer, the Null Hypothesis might be stated as

follows: "There is no difference, in terms of rates of lung cancer, between the group that smoked (the exposed group) and those that did not smoke (the non-exposed group)." Our cohort study is designed to REFUTE this Null Hypothesis and, if we could do this, it would mean we could accept the alternative hypothesis: that there IS an actual difference between the group that smoked and those that did not. Before we could make this claim, however, we must apply a test of significance.

Applying the tests of significance is a crucial step in hypothesis testing that is the final step in the research process. Remember the flow

chart called “Steps in the Research Process” which we introduced in section 6.3? We are now at that point where we need to decide

whether the hypothesis is correct. To do this we apply a test of significance (e.g. a p-value), which includes deciding on the level of significance: the probability of incorrectly rejecting the null hypothesis in our test of the hypothesis, against which the p-value is compared. We interpret a result (i.e. a measure of association) as statistically significant when it would occur by chance only rarely (say one time in 20) and therefore has a p-value less than or equal to 0.05. If this occurs, then we can reject the null hypothesis. This process of


applying a test of significance is also called hypothesis testing, and it results in a decision to reject, or not reject, the Null Hypothesis, which is symbolised as H0. Pictorially, this process looks like this:

Traditionally, researchers distinguish between not rejecting and accepting the null hypothesis, because there is an understanding that a better study may be designed in which the null hypothesis will be rejected. Therefore researchers do not formally state that they "accept the null hypothesis (H0)" from the current evidence; instead, they state that "the H0 cannot be rejected". In other words, when the test is not significant they would say "we fail to reject the null hypothesis" OR "we cannot reject that there is no difference between the IV and the DV". It is important to understand this so you can interpret the meaning of a significant or non-significant statistical test of a study hypothesis.

Now let us look at the three tests of significance that are sometimes called measures of precision: Chi Square, P values and Confidence Intervals (CI).


8.4(3) Chi Square (χ2) test

Readings

Read the section on Chi-Squared Tests in Pierson, J. (2010) in the textbook, pages 411-412.

Optional online resource: Hobart and William Smith Colleges (Geneva, New York, USA) (Department of Mathematics and

Computer Science). Mathbeans Project. (November 2005) “The Chi Square Statistic”. Accessed 23 January 2010 at:

http://math.hws.edu/javamath/ryan/ChiSquare.html

The Chi-square is a statistical test used to determine whether two or more sets of data or populations differ significantly from one another. The

Chi-square test is based on the comparison of observed (sample) data to see if that data significantly differs from the population from which it was

drawn. The chi-squared mathematical probability distribution is used to calculate the test. We use this test for analysing categorical data. It is a very

commonly used statistical technique in medical research, arising when data are categorised into mutually exclusive groups. For categorical variables

we need to use the Chi-square test before we can apply a p-value. Note that we examined categorical variables in Topic 7.

The simplest type of data collected from observations of individuals within a sample, is the allocation of the data to one of only two possible

categories (called binary data: “yes” or “no”). Often these relate to the presence or absence of some attribute. You might remember from Topic 7

some examples of binary data:

a. Male/female;

b. Pregnant/non pregnant;

c. Hypertensive/ non-hypertensive; and

d. Smoker/non smoker.


Example 1:

Consider an RCT comparing a drug with placebo for nausea caused by chemotherapy among cancer patients. In this case we have two independent

groups:

Group 1 (the study group): Those who received the drug

Group 2 (the control group): Those who did not receive the drug


The research question might be: Is there a significant difference in the proportion of people with nausea in each group?

If the drug made no difference, we could reasonably expect the proportion with nausea in each group would be the same. To see if this is the

case we use the “chi squared test” (pronounced ki-square). The chi squared statistical test is denoted as (χ2).

It tests whether there is a statistical difference between the values observed (in our sample) and those expected (in the population from which

the sample was drawn) if the drug made no difference.

We can present the results in tabular form. Such a table is called a "contingency table" or, more commonly, a 2 x 2 table, which we used for the RR and OR calculations earlier in Topic 8.

The usual way to pose the question is in the form of a hypothesis (or, alternatively, a "null" hypothesis). We will then use the χ2 statistic to test the (null) hypothesis.


From the example, we need to know whether: “the proportion of people who exhibit nausea when the drug is given (p1) is the same as the

proportion of people who exhibit nausea when the placebo is given (p2)”.

The "null" hypothesis can be expressed as: if p1 is the proportion of people who are given the drug and who exhibit nausea, and p2 is the proportion of people who are given the placebo and who exhibit nausea, then p1 = p2 (i.e. no difference in proportions).

If the drug was ineffective, we could reasonably assume that the proportion of people with nausea who were given the drug to be the same as

the proportion with nausea who were given the placebo. If the proportions were the same it is possible to work out how many should be in each cell

of the table i.e. the counts that would be expected if the proportions were the same.

For those given the drug…


Overall the proportion of patients given the drug was 50/100 (= 50%). Therefore, if the drug were ineffective, we would expect that 50% of

those getting nausea would be in the drug group. That is, of the 19 patients who got nausea, we would expect 50% (= 9.5) to be in the drug group.

Here we have observed that 4 people in the drug group developed nausea. The difference between observed and expected values in both the

drug (treatment or study group) and placebo (control) groups is how the chi-square is calculated. The formula for the chi-squared statistic is:

χ² = Σ [ (Observed frequency − Expected frequency)² / Expected frequency ], summed over all the cells of the table.

Note: you are not expected to remember or use this formula in this course. It is important however to understand the principle that underlies

the use of the Chi-squared test and how to interpret it in the results of a research article.

The chi square test is a statistic that compares the expected value if there is no difference in effect with what was actually observed.

Small values of the statistic indicate that all cells are close to what one would expect if the proportions were the same, suggesting that the

proportions are the same.

Large values of the statistic indicate that at least one of the cells is quite different to what was expected, suggesting that the proportions are not the

same.
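To make the principle concrete, here is a minimal Python sketch of the calculation for this example. The cell counts are inferred from the worked example above (50 patients per group, 19 nausea cases in total, 4 of them in the drug group); you are not expected to reproduce this.

```python
# Chi-square by hand: sum of (observed - expected)^2 / expected over all four cells.
observed = [4, 46, 15, 35]         # drug/nausea, drug/no nausea, placebo/nausea, placebo/no nausea
expected = [9.5, 40.5, 9.5, 40.5]  # counts expected if the proportions were the same

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_square, 2))        # ~7.86, matching (to rounding) the 7.854 reported below
```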

The table below is from the output of a statistical software program called SPSS and shows the observed versus the expected counts.


For this example, the χ² statistic is computed from the observed and expected counts shown above. The final result for this example is: χ² = 7.854.

Notice that the chi-square statistic would have been zero if the observed frequencies had been exactly equal to the expected frequencies.


Interpretation of Chi-square results:

But how do we know if this difference is statistically significant? In this example the chi square statistic is 7.854. How do we decide how large

the value for the statistic has to be to declare the proportions different? We look at the p value. For a χ2 of 7.854 the p value = 0.005 (This has been

derived from statistical tables. You do not need to look these up for this course!). A p-value = 0.005 means that the probability of getting χ2 as high

as 7.854 if the proportions were truly equal is 0.005 (5/1,000). This is very unlikely, so we conclude that the proportions cannot plausibly be equal. Our interpretation is that there is a "statistically significant difference" between the proportions. We will discuss p-values in more detail in the next section of this topic.
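For interest only: statistical software obtains this p-value from the chi-squared probability distribution rather than from printed tables. A one-line sketch in Python, assuming the SciPy library is available:

```python
# P(chi-square >= 7.854) with 1 degree of freedom, assuming the null hypothesis is true.
from scipy.stats import chi2

p_value = chi2.sf(7.854, df=1)
print(round(p_value, 3))   # ~0.005, as reported above
```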

Summary

So what have we learned about Chi-square? When we compare an adverse event between a drug group and a no-drug group, if the drug is ineffective we expect the proportion of adverse events to be the same in both groups. That is, the proportion (p1) of adverse events in one group will be the same as in the other group (p2). We calculate a test statistic called the chi-square from the difference between the values we actually observed and what is

expected. We then check the p-value for statistical significance. So for any cross-tabulation, such as the 2 x 2 tables shown in this presentation, it is

possible to test the null hypothesis that there is no relationship between the two variables.

8.4(4) P-values

P-values are used to test whether an association between two (or more) groups is statistically significant. The p-value is used for hypothesis testing and is critical in the interpretation of results from quantitative research studies. The p-value is associated with a statistical test and demonstrates the extent to which the test has "detected" a difference between the sample statistic and the population parameter (e.g. the mean), at a specified magnitude. It is the probability of obtaining the same or a more extreme result than the one actually observed from chance alone (i.e. assuming the null hypothesis is true); p-values are generally (but arbitrarily) considered significant if p <= 0.05 (i.e. the p-value is less than or equal to 0.05).

Articles in scientific journals routinely report the results of studies with p-values. If they do not include some test of significance, you cannot really interpret whether or not the results are actually "true" or of any use at all. For example, studies often test new drugs against standard drugs to


determine if the new drug is more effective, and the p-value will provide the answer. Researchers may also publish evidence to determine if there is a possible link between the effects of one factor (IV) upon an outcome (DV) (e.g. asbestos and lung cancer), as we have discussed earlier.

In quantitative studies, results are commonly summarized by a statistical test like p-values or confidence intervals, and a decision about the

significance of the result is based on either one of them. The reader of an article must decide, like a juror on a criminal case, if the evidence is strong

enough to believe. Assuming the study was designed according to good scientific practice, the strength of the evidence is contained in the p-value.

Therefore, it is important for the reader to know what the p-value is saying.

Example 1: testing the effectiveness of a new headache drug

To describe how the p-value works, imagine we are researchers, and for our study, subjects are randomly assigned to one of two groups. Some

treatment is performed on the subjects in one group, and the other group acts as a control. For this example, suppose the treatment group is given a

new headache medicine and the control group is given a standard headache medicine. The amount of time it takes for the headache to resolve in all

participants is measured for both groups.

Before we continue with this research scenario, you should consider the issues of the meaning of the null hypothesis and the level of

significance for the rejection of the null hypothesis:

A. It is important for us to understand the notion of the null hypothesis (H0), which assumes that any kind of difference or significance you see in a set of data is due to chance. It attempts to show that no variation exists between variables, or that a single variable is no different from zero. It

is presumed to be true, until statistical evidence (e.g., via χ2 testing) rejects the null hypothesis for the alternative hypothesis (H1), which states

that there is a significant difference, say between two groups being tested.

For the example above, it is reasonable for the null and alternative hypotheses to state:

H0 = the means of the two groups are not different

H1 = the means of the two groups are statistically significantly different


The p-value determines whether we reject, or fail to reject, the null hypothesis.

B. The researcher decides what significance level to use -- that is, what cut-off point will decide significance in the test they use (in this case the

cut off for the p-value). The most commonly used level of significance is 0.05. When the significance level is set at 0.05, any test resulting in a

p-value equal to, or less than 0.05 would be significant. Therefore, you would reject the null hypothesis in favour of the alternative hypothesis.

C. P-values equal to or less than 0.05 also suggest that the observed associations could be found by chance in 5 out of 100 samples. That is, the

results of 5 in 100 samples are due to chance occurrence. But if the p-value were 0.09 it would suggest that the observed association could be

found by chance in 9 out of 100 samples. Along the same lines, if p = 0.0008, this means that the observed association could be found by

chance in 8 out of 10,000 samples!

The GOLDEN RULE for interpreting p-values is:

If p <= 0.05, the result is statistically significant

If p > 0.05, the result is not statistically significant
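The golden rule is simple enough to be written as a one-line decision function (illustrative only; 0.05 is the conventional cut-off, not a law of nature):

```python
# Decide statistical significance from a p-value and a chosen significance level.
def is_statistically_significant(p: float, alpha: float = 0.05) -> bool:
    return p <= alpha

print(is_statistically_significant(0.005))  # True:  reject H0
print(is_statistically_significant(0.09))   # False: fail to reject H0
```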


The diagram below shows the relationship between the size of the p-value and the strength of the evidence against the null hypothesis [6].

[6] This diagram is taken directly from Bowling A and Ebrahim S (eds) (2005). Handbook of Health Research Methods: Investigation, Measurement and Analysis. Open University Press/McGraw-Hill Education, UK: page 500 (Figure 21.3).


P-values do not simply provide you with a “Yes” or “No” answer; they provide a sense of the strength of the evidence against the null

hypothesis. The lower the p-value, the stronger the evidence. Note that a p-value does not tell you how strong the effect is, only how strong the

statistical evidence for that effect is.

Once you know how to read p-values, you can more critically interpret journal articles, and decide for yourself if you agree with the

conclusions of the author [7].

Let us look at an example of the process of testing a null hypothesis using a p-value and the steps the researchers take to do this within the

study process:

Example 2: Lung cancer and smoking

The Null Hypothesis is always formulated in the negative (this is a scientific convention).

If the study is to test whether or not there is an association between lung cancer and smoking the Null Hypothesis (H0) would be: "There is no

association (no true statistical difference) between exposure to smoking and developing the disease of lung cancer". They then move through the

following steps to test this null hypothesis:

1. The researchers must determine what level of probability will be considered small enough to reject the Null Hypothesis (in most health

research studies, a 5% chance or less is considered to be the level at which they will reject the Null Hypothesis). This is usually stated in the

Methods section of the published article.

2. The next step in the process is to collect the data using the most appropriate study design for the research question (e.g. RCT, case-control or cohort etc.). In this example, a very big sample would be used for a cohort study, or you could use a case-control study.

3. Now the researchers analyse the data they have collected to measure the association between the independent and dependent variables in the

study sample and then apply the statistical significance test (i.e. you test the Null Hypothesis).

[7] The information for this presentation was adapted from Statistics Tutorials (2008): http://www.stattutorials.com/p-value-interpreting.html, date accessed 2 February 2009.


4. The researchers would calculate the probability (using the p-value) that the observed data would occur if the H0 of "no difference" were true.

To do this the researchers must choose the most appropriate statistical test depending on the type of data (we discussed this earlier in the course) and then calculate the probability of the observed data under the H0 (by calculating a p-value).

5. The next step is to decide to reject OR fail to reject, the Null Hypothesis. Having obtained the P-value, the researchers then compare this P-

value with the predetermined statistical cut-off level of significance (p of <= to 0.05).

6. If the p-value is 0.05 or less in the study results (i.e. the probability of the results occurring by chance is less than or equal to 0.05), then the researchers can reject the null hypothesis.

7. If the results give a probability that is greater than p = 0.05, the researchers are unable to reject the null hypothesis - if this happens, we say

that the researchers have failed to reject the null hypothesis.

Example 3: Pain and vaccine injection techniques

This example is taken from Topic 6 where we examined Randomised Controlled Trials. We used the case study by Ipp et al. (2007). Read the abstract below and concentrate on the results, including the presentation of those results from Table 2 in the article (in the table below the abstract):

Can you identify the p-values for each of the outcome measures?

Note that in the abstract the researchers report that the mean MBPS scores [8] (95% confidence interval (CI)) were higher (p<0.001) for the standard group compared to the pragmatic group: 5.6 (5 to 6.3) vs 3.3 (2.6 to 3.9). Can you state this result in plain English?

[8] The MBPS score was the scale the researchers used to record levels of pain in the subjects.


Table 2 from Ipp et al. (2007): page 1106.

The null hypothesis (H0) for this study would have been that “there is no difference in the pain response of infants who had vaccine injected at

a fast rate (pragmatic group), compared to a group who had the vaccine injected at a slower rate (standard group)”. After developing their study

design (RCT) and collecting the data and analysing it, they were able to finally test this null hypothesis.

Looking at the results in Table 2 from the article (above), let us now review the outcome for “MBPS”:

The infants in the “standard” group had a higher pain response (5.6) compared to the lower score of 3.3 in the “pragmatic” group.

Remember that the standard group had a slower injection of vaccine than the pragmatic group in which the vaccine was injected at a fast

rate.

The level of significance was set at p <= 0.05. This was included in the last line of the section "Outcome measures" in the article.

The results for MBPS demonstrated that the researchers had to reject the H0 (the p-value was <0.001 for a difference between the two groups), which is highly significant; and


So they had to accept the alternative hypothesis (H1), that there IS actually a truly significant difference between the fast and slow vaccine

injection and that the fast injection technique caused less pain than the slow injection technique.

Summary:

We have now described the second important test of significance (P-value) that is used in many research articles. The tests of significance

allow the researchers to formally test their null hypothesis and to demonstrate the strength of that evidence in any publication. In many cases, a

research finding is not always accepted from one published article based on a single study. Often many other researchers will retest the null

hypothesis using perhaps stronger or more powerful studies in order to improve the level of evidence until there are a number of published articles

from various types of studies before clinicians will change their practices and adopt the findings from the published research. This is why it is so

important that researchers published fully transparent results from their research. Subsequent researchers will re-test the null hypothesis using the

earlier studies to improve their own research into the same health problem. Now let us look at Confidence Intervals that are also commonly used by

researchers to test the null hypothesis in research studies.


8.4(5) Confidence Intervals (CI)

A Confidence Interval (CI) is a concept used for statistical inferences about the population using data from a sample or samples. It is used

to create reasonable bounds for the population mean or proportion, based on information from the sample. The CI is computed from the sample data

and has a given probability “that the unknown population parameter (e.g. the mean or proportion), is contained within that interval”. CI are usually

reported as 95% CI, which is the range of values within which we can be 95% sure that the true value for the whole population lies. However 90% and

99% CI can also be used.

Confidence intervals are based upon calculated standard errors [9], and they give the range of likely values for a population estimate, based on

the observed values from a sample. Standard errors can represent the "average" deviation between actual and predicted observations. Another way of

looking at standard error of the mean is that it “provides a statement of probability about the difference between the mean of the population and the

mean of the sample” (Swinscow 1997).

Confidence intervals are often more useful for the range of values they rule out than for the values they include! The level of confidence of

95% (±1.96 standard errors) is a convenient level for conducting scientific research, so it is used almost universally. Note that the 95% CI is statistically

the same as setting the p-value as p = 0.05.

CIs can be set at other levels, e.g. 99% (±2.57 standard errors), but the 95% CI is the standard one used by most researchers. Confidence intervals are used for hypothesis testing but are also useful because they show how small or large the true effect size might be.

[9] Sampling errors are also called standard errors: samples are taken from the population of interest so that inferences can be made that may be representative of the source population. All samples embody an element of chance. Sampling errors are caused by the sampling design. A sampling error is the discrepancy between the sample mean (usually) and the true population mean. It occurs solely by chance. Once the sample has been selected randomly, we can determine the probable difference between the sample and the population as a whole, as a range. We usually express our results, therefore, with a high degree of confidence (but not total) that our results apply to the entire population, plus or minus a little. It sounds more tentative than we might like, but it cannot be more accurate than that. It should be pointed out that stratification of a sample can reduce the standard error.


The following quantities make up and influence a confidence interval and are most important if you are to correctly interpret CI in research articles:

a. The sample mean (or proportion) determines the location or middle of the confidence interval;

b. The sample size (n). That is, as the number (n) in the sample increases, the width of the CI gets narrower. This is often described as the "power" of the study and it reflects the importance of large numbers of participants in a study sample. The narrower the CI, the more certain one can be about the size of the true effect. If a study reports a 95% CI, this means that there is a 95% chance that the true result lies within the CI. As the sample gets larger, the width of the CI gets narrower and your confidence in the results increases.

c. The sample standard deviation (s). As s increases, the width of the CI gets wider. This is another factor that influences the width of the CI, alongside the sample size.
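As an aside, here is a small Python sketch (with made-up numbers) of how the sample size drives the width of a 95% CI for a mean, using 1.96 standard errors:

```python
# 95% CI for a mean: mean +/- 1.96 standard errors, where SE = s / sqrt(n).
import math

def ci_95(mean: float, s: float, n: int) -> tuple[float, float]:
    se = s / math.sqrt(n)   # standard error of the mean
    return (mean - 1.96 * se, mean + 1.96 * se)

print(ci_95(88, 4.5, 25))    # ~ (86.2, 89.8): wider, because n is small
print(ci_95(88, 4.5, 100))   # ~ (87.1, 88.9): narrower, because n is larger
```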

There is another important factor that will help you interpret confidence intervals:

I. CI for RR and OR: if the analysis included using relative risk (RR) or odds ratio (OR) as a measure of association and the RR or OR was 1, this means there was no difference between the study and control groups. If the CI around an RR or OR includes one, the result is NOT statistically significant. Again, this is because RR = 1 (or OR = 1) is the null value for the relative risk and odds ratio.
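For interest only, here is a hedged Python sketch of one standard way (the log-odds, or Woolf, method, which is not covered in this course) to attach an approximate 95% CI to the lung cancer OR calculated earlier in this topic, and to check whether that CI includes the null value of 1:

```python
# Approximate 95% CI for an OR: exp(ln(OR) +/- 1.96 * SE), with SE = sqrt(1/a + 1/b + 1/c + 1/d).
import math

a, b, c, d = 1500, 8500, 100, 9900       # cells of the smoking / lung cancer 2 x 2 table
or_estimate = (a * d) / (b * c)          # ~17.5
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
lower = math.exp(math.log(or_estimate) - 1.96 * se_log_or)
upper = math.exp(math.log(or_estimate) + 1.96 * se_log_or)

print(round(lower, 1), round(upper, 1))  # roughly 14.2 21.4
print(lower > 1)                         # True: the CI excludes 1, so the result is significant
```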

The figure below demonstrates some of these issues pictorially [10]:

[10] Taken from: School of Population Health, University of Melbourne (Evidence-Based Medicine Working Group). Evidence-Based Clinical Practice (Theme II), Teaching Modules (Tutor Notes 2010), unpublished document.


This figure displays the range of possible results from a hypothetical study. The study is a randomised controlled trial with an intervention group and a control group. Subjects in the study are at risk of ischaemic heart disease and the intervention group receives a new therapy aimed at reducing death from ischaemic heart disease. The control group receives standard treatment. A relative risk of less than 1 suggests that the new treatment has a protective effect, but for this study, a relative risk of less than 0.6 is considered clinically significant.


We can also use CI to state how confident we are that the true population mean (or proportion) will fall between the lower and upper limits

expressed by the confidence intervals. This is important, because as researchers, we usually deal with samples from a population. So confidence

intervals show the extent to which statistical estimates (from the sample) could be accurate (or generalisable to the total population of interest).

Now let us look at some practical examples to demonstrate how CIs are used in research studies.

Example 1 – Diastolic blood pressure of chefs in France

Let's look at a research example of a confidence interval in action! You will not be required to calculate confidence intervals (or p-values) for HBS108. However, you need to understand how to interpret them if you read the results in a research article.

The mean diastolic blood pressure of a sample of head chefs in Marseille was found to be 88 mmHg and the standard deviation 4.5 mmHg.

The mean ±1.96 times its standard deviation gives the following two figures:

88 + (1.96 x 4.5) = 96.8 mmHg

88 - (1.96 x 4.5) = 79.2 mmHg.

In other words, the 95% CI for this sample was 79.2, 96.8. Note that the scientific convention is that a CI is presented with a comma (,) separating the two limits. Do not use a "dash" (-) to present CIs.

We can now conclude that only 1 in 20 (or 5%) of head chefs in the population from which the sample is drawn would be expected to have a

diastolic blood pressure below 79 or above about 97 mmHg. These are the 95% limits of this confidence interval.
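The same arithmetic as code (illustrative only):

```python
# The chefs example: mean +/- 1.96 standard deviations, exactly as calculated above.
mean, sd = 88, 4.5
lower, upper = mean - 1.96 * sd, mean + 1.96 * sd
print(f"95% limits: {lower:.1f}, {upper:.1f}")   # 95% limits: 79.2, 96.8
```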


Example 2: Maternal Smoking in Pregnancy and childhood Asthma

This example is taken from Topic 6 when we examined cohort studies. In our case study by Jaakkola & Gissler (2004) the researchers followed a

group of mothers who smoked during pregnancy and a group of mothers who did not. The groups were then followed up over an extended period of time to see if the exposed group was at a greater risk of disease compared to the unexposed group. The researchers looked at outcomes for the

children born to smokers and non-smokers. They studied the effects of exposure to maternal smoking on the risk of their children developing childhood asthma, although the study also looked at foetal growth and pre-term delivery.

Read the abstract below and concentrate on the results including the presentation of those results from Table 5 from page 139 of the article

(in the table below the abstract):


The null hypothesis (H0) for this study would have been that "there is no difference in the risk of asthma in children born to mothers who had smoked in pregnancy, compared to those infants born to mothers who did not smoke". After developing their study design (a cohort study) and collecting the data and analysing it, they were able to finally test this null hypothesis.

Looking at the results in Table 5 from the article (above), let us review the outcome for the exposure category smoked in pregnancy ">10 cigarettes per day". Let us look at one result in that row (i.e. the crude OR in the 3rd column):

The OR was calculated at 1.36, with a 95% CI of (1.13, 1.62).

Note that the sample size was 58,842 births – this is a very big sample, meaning that the study had a lot of statistical "power".

Because this 95% CI does not include 1 (the null value for an OR), the researchers could reject the null hypothesis and accept the alternative (H1) hypothesis. The researchers are 95% confident that there is a statistically significant association between maternal smoking and the risk of childhood asthma.


The figure below gives a pictorial description of this result.


Summary

We have now looked at the third test of significance that is used by researchers to test the null hypothesis. While the tests of significance are a very important last step in the research process used in quantitative studies, there is another important step in the process before we can actually use research findings to change clinical health policy or decision making. We need to look at the issues of generalisability and causation.

8.4(6) Generalisability and causation

Generalisability

Generalisability is the extent to which research findings and conclusions from a study conducted on a sample population can be applied to the population at large. The first step to check generalisability is to ensure that the result was statistically significant, and we have discussed this in the earlier parts of this section. But let us look more broadly at the logic of generalisability. Generalisability is also called "external validity" in scientific language.

Study validity is the degree to which an inference from a scientific study is warranted, taking into account the strengths and weaknesses of the study design. There are two aspects of study validity: internal and external study validity.

Internal validity is the degree to which observations taken during the study may be attributed solely to the hypothesized effect that is being

studied. We have discussed this earlier in Topics 6 and 7. If the study has been done effectively, for example the researchers have controlled for all the

potential confounding variables, we say that the study has strong internal validity.

External validity is the extent to which the finding of a study can be generalised from the results from the sample being studied, to the population

from which the sample was taken.

The results of any study may not be generalisable to all patients outside the study setting.


For example, RCTs usually address the question of whether a treatment can work (efficacy), but may not tell us whether the treatment will be effective when offered to the broad range of patients seen in day-to-day clinical practice.

Randomised controlled trials tend to enrol only a small proportion of the potential population of patients with the disease of interest. Often

extensive inclusion and exclusion criteria are used in order to limit sources of heterogeneity. Heterogeneity is the quality of being diverse and not comparable in kind. If the sample is not heterogeneous, this may influence how far the study results can be generalised.

The factors that may affect the applicability of RCT results to individual patient populations include:

Differences in the disease processes experienced by individual people in different populations

Between population differences in the response to treatment (e.g. differences in drug metabolism or immune responses)

Differences in patient compliance (e.g. did the study participants use the drugs in the same way as the general population of patients that might use this drug?);

Differences in baseline risk for adverse events being targeted by the treatment (e.g. were the risks for the disease being studied different in the study participants when compared to the general population that might use this drug?);

Presence of other medical conditions that may alter the potential benefits and risks of treatment (e.g. the presence of confounding

variables); and finally

The feasibility of implementing treatments in different clinical settings (e.g. factors such as technical requirements for safe and effective

administration of therapy, availability of trained staff and cost may influence decisions to offer therapies in different settings.)

If a study was conducted using a sample of young single mothers at a university, could this be generalised to the rest of the population? The

findings of research based on random sampling of the population can be fairly applied to the population as a whole, but only to that population. This means that we must be very clear about the nature of the population we wish to study before drawing the sample and applying the results to the whole population. There may be superficial resemblances between various populations and the sample we have used, but there may be substantial

differences as well. We simply don't know until we do the research. It is better to claim for your findings only that which can be defended, because this

will earn greater respect for you and your work.


Apart from ensuring that the study results are generalisable from the sample to the general population, you need to check to see whether the researchers have discussed causality.

A note on study limitations

All researchers should discuss the limitations of their research in the journal article. You will need to look for this discussion that is

usually found towards the end of the "discussion" section of a research article. The researchers should always point out the shortcomings

of the study, especially those that affect the conclusions drawn. They will then "defend" the results and conclusions drawn in light of these

limitations. You need to assess whether or not these limitations damage the overall value of the study's conclusions. The limitations might relate to the overall study design, the response rate, the characteristics of the sample, etc. The limitations might affect the degree to which the results can really

be generalised to the target populations from which the study sample was drawn.

Causality

Before you can finally accept the evidence from a research article you need to check if the researchers have discussed their significant results in

terms of causality. Austin Bradford-Hill originally developed this concept in 1965 [11]. Just because three brown-haired women were run over by Melbourne trams in the last week, it does not necessarily mean that brown-haired women are more likely to be run over by trams than the rest of the population. What Bradford-Hill was saying is that the presence of an association between A and B does not tell you anything at all about:

a) The presence of causality; or

b) The direction of causality.

"To show that A has caused B (rather than B causing A, or A and B both being caused by C [a confounder]), you need more than a correlation coefficient" (…or an RR) [12]. The Bradford-Hill "tests for causation" are outlined in the box below (Greenhalgh (1997): 423).

[11] Bradford-Hill A. The environment and disease: association or causation? Proc R Soc Med, 1965; 58: 295-300.

[12] Greenhalgh T. (1997) How to read a paper: Statistics for the non-statistician. II: "Significant" relations and their pitfalls. BMJ; 315: 422-5.


Example: Does the pill cause life-threatening clots in the arteries (thromboembolism)?

Austin Bradford-Hill set out guidelines that have been used to clarify the issue of causality from that of association. The third-generation pill

story was one of association; so asking questions about causality is useful in determining the underlying credibility of the association between the pill

and clots in the arteries. As the table shows, applying the Bradford-Hill criteria gives an underwhelming conclusion as to causality.


Reference: Bandolier website: Third Generation Pills – Causality, Bandolier Journal (2007). Accessed at: http://www.medicine.ox.ac.uk/bandolier/band64/b64-5.html#Heading4

8.4(7) Summary

We have now explored every aspect of the quantitative research process. In this section we have looked at the final step in the research process (hypothesis testing). We have now completed the section of this course that deals with quantitative research studies. In the final part of the course we will return to the practical aspects of evidence-based practice.


Topic 8.5: An Introduction to Mixed Methods in Health Research

Key terms (see Glossary)

Mixed methods research

Triangulation

8.5(1) Introduction

Up to this point, our main focus has been on two distinct research methods – quantitative and qualitative research. At this point of the

trimester, it is valuable for you to be aware that there is another paradigm that is becoming increasingly popular across all health, medical, nursing

and behavioural science disciplines and their associated literatures.

According to Giddings & Grant (2006), the shifting complexities of health care and social issues have demanded more creative ways of using research to find solutions to a diversity of challenges. An effective research approach you may have heard of is to mix aspects of the major research methodologies to determine such solutions.

Reading

Read the following sections of your text by Taket (2010) in Chapter 20 on Mixed Methods:

- “Introduction”, page 332;

- "Different types of mixed method", pages 332-334;

- “Why use mixed methods”, page 334;

- “A case-study – exploring reasons why women do not use breast screening”, pp. 334-340


8.5(2) What is mixed methods research?

According to Taket (2010, p. 332, text), mixed methods research "combines research methods from qualitative and/or quantitative research approaches within a single study". Another useful general definition of mixed methods research is taken from Walker (2009, p. 270) in his chapter "Mixed Methods Research: Quantity Plus Quality":

“Research in which the investigator collects and analyses data, integrates the findings, and draws inferences using both qualitative and quantitative

approaches or methods in a single study or a program of inquiry.”

From time to time, your research idea or question may require the use of multiple or mixed methods, rather than relying solely on quantitative

or qualitative methods. It is useful when you wish to employ more than one "world view", that is, when gathering both quantifiable information and the in-depth insights offered by qualitative research is important to your research.

8.5(3) Why use Mixed Methods?

Careful consideration must be given to the choice of research methods for a research project to succeed, so there must be an appropriate fit between the chosen methods and the research question. According to Giddings & Grant (2006), mixed methods research has the following strengths:

1. It is useful in survey, evaluation and field research;

2. It has a broader focus than single-method designs and can therefore gather more information about the phenomenon of interest. Taket (2010, text) gives the example of using an RCT to test a hypothesis about a particular intervention while also using qualitative methods to understand the processes behind the RCT’s outcome. A real-life example can be seen in a study by Ammerman et al (2003), which examined the expectations and satisfaction of pastors and lay leaders regarding a university research partnership in a randomized controlled trial (quantitative) guided by community-based participatory research methods (qualitative);

3. It can give greater insight into complex social or health phenomena, such as family violence or eating disorders. As an example, Collins and

Dressler (2008) conducted a “mixed methods investigation of human service providers’ models of domestic violence”.

4. Mixed methods can also, to some extent, compensate for the shortcomings of each individual method by combining aspects of both.


5. It is useful in clinical settings as its outcomes can guide decisions about best practice. A “concurrent mixed methods approach” was conducted

by Kennett, O’Hagan & Cezer (2008) to show how learned resourcefulness empowers individuals undergoing a chronic pain management

program.

6. It can help researchers reach often difficult-to-reach subgroups of the population. Strolla, Gans & Risica (2005) used mixed methods to develop tailored nutrition intervention materials for low-income, ethnic minority populations; Taket (2010, text) elaborates on this theme, noting that mixed methods can be particularly appropriate for such groups, in this case low-income Hispanics and non-Hispanics in the US.

8.5(4) Different Types of Mixed Method

There are several ways in which the different types of mixed method can be used. The following is adapted from Taket (2010, p. 332, text) and Cresswell (2009):

1. A number of different methods are used in a single study (e.g., mixing qualitative and quantitative methods). This is known as “triangulation”. Mixing can also occur at different stages of the research process (e.g., in data collection, in data analysis, or in both of these stages).

2. Different modes or ways of mixing methods within a single study can be used (e.g., when the research project is divided into different phases and carried out sequentially, or in parallel with each other). Cresswell (2009, p. 209) illustrates this with a sequential explanatory design (where quantitative methods predominate): quantitative data are collected and analysed first, then qualitative data are collected and analysed to help explain the quantitative results, and the two sets of findings are interpreted together. A small sketch of this design appears below.
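To make the sequential explanatory design concrete, here is a minimal sketch in Python. It is not part of the course materials: the participant IDs, the scores and the one-SD selection rule are all invented purely for illustration.

```python
# A minimal sketch of a sequential explanatory design (quantitative phase
# first, qualitative follow-up second). All names and numbers are invented
# for illustration; nothing here comes from Taket (2010) or Cresswell (2009).
from statistics import mean, stdev

# Phase 1 (quantitative): satisfaction scores from a hypothetical survey.
scores = {
    "P01": 82, "P02": 79, "P03": 35, "P04": 88,
    "P05": 91, "P06": 41, "P07": 77, "P08": 85,
}

avg = mean(scores.values())
sd = stdev(scores.values())
print(f"Quantitative phase: mean = {avg:.1f}, SD = {sd:.1f}")

# Phase 2 (qualitative): select atypical respondents (here, more than one
# SD below the mean) for follow-up interviews exploring why their
# experience differed: the explanatory step of the design.
follow_up = [pid for pid, s in scores.items() if s < avg - sd]
print("Invite for qualitative interviews:", follow_up)
```

The point of the sketch is the ordering: the qualitative phase is designed only after the quantitative results are in, which is what makes the design both sequential and explanatory.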


8.5(5) Examples of the different purposes for using mixed methods

In your textbook by Liamputtong (ed.) (2010, p. 333), Taket has developed a table illustrating “Examples of different purposes for use of mixed methods” (see Table 20.1). These include:

- Understanding health and quality of life;
- Understanding health-related behaviour and its relationships with health and health-related outcomes;
- Designing and developing interventions;
- Evaluating interventions; and
- Improving research design.

It is beyond the scope of this course to discuss all these aspects of mixed methods in detail, but if you are interested there is more information in your textbook (see Chapter 20).

Mixed methods do have limitations, notably that they take more time throughout the whole research process; on the positive side, however, a mixed design may be the best way to answer the research question.

For a more in-depth exploration of mixed methods in action, please read the case-study in your textbook (pages 334-340), entitled “A Case Study – Exploring Reasons Why Women do not use Breast Screening”.


8.5(6) Summary

We have briefly examined mixed methods research, in which the research design uses both quantitative and qualitative methods within a single research study. We summarised the different types of mixed methods research and the reasons this approach is used. We also provided a case-study to demonstrate the practical application of a mixed methods approach to health research.


Section 8.7: Information about Topic 8 Test

It is now time to see how you went with Topic 8. The test involves 8 multiple-choice questions and is available for one hour.

You should now:

A. Know and understand methods to summarise and present data from samples including:

1. Commonly used frequency distributions;

2. Measures of central tendency (mean, median and mode);

3. Measures of dispersion (range, percentiles, interquartile range (IQR) and standard deviation (SD)); and

4. Commonly used distribution patterns (also called probability or sampling distributions) including normal and skewed.

B. Understand and interpret some commonly used measures of association including: Relative Risk (RR) and Odds Ratio (OR), and

Correlation Coefficient (r).

C. Understand the rationale and logic of statistical inference and hypothesis testing through the use of Chi-Square (χ²), p-values and Confidence Intervals (CI).

D. Be aware of the principles of generalisability and causation that underpin the interpretation of results from quantitative research studies.
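As a quick self-check before attempting the test, the short Python sketch below works through the calculations listed in points A to C. It is not part of the assessment, and all of the numbers (the sample and the 2x2 table) are invented for illustration.

```python
# A self-check sketch: the data below are invented purely to illustrate
# the calculations in points A to C; they come from no actual study.
from statistics import mean, median, mode, stdev, quantiles
from math import sqrt, log, exp

# A. Descriptive statistics for a small hypothetical sample.
sample = [4, 7, 7, 8, 10, 12, 15, 18, 21]
q1, q2, q3 = quantiles(sample, n=4)  # quartile cut points
print(f"mean={mean(sample):.1f} median={median(sample)} mode={mode(sample)}")
print(f"range={max(sample) - min(sample)} IQR={q3 - q1:.1f} SD={stdev(sample):.2f}")

# B. Relative Risk and Odds Ratio from a hypothetical 2x2 table:
#                 disease   no disease
#   exposed          a=30        b=70
#   unexposed        c=10        d=90
a, b, c, d = 30, 70, 10, 90
rr = (a / (a + b)) / (c / (c + d))   # risk in exposed / risk in unexposed
or_ = (a * d) / (b * c)              # odds in exposed / odds in unexposed
print(f"RR={rr:.2f} OR={or_:.2f}")

# C. Chi-square statistic: compare observed counts with those expected
# if there were no association between exposure and disease.
n = a + b + c + d
expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
            (c + d) * (a + c) / n, (c + d) * (b + d) / n]
chi2 = sum((o - e) ** 2 / e for o, e in zip([a, b, c, d], expected))
print(f"chi-square={chi2:.2f} (df=1; compare with 3.84 for p<0.05)")

# 95% confidence interval for RR, computed on the log scale.
se_log_rr = sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))
low, high = exp(log(rr) - 1.96 * se_log_rr), exp(log(rr) + 1.96 * se_log_rr)
print(f"95% CI for RR: ({low:.2f}, {high:.2f})")
```

If you can predict each printed value before running the sketch (for these invented figures, RR = 3.00, OR = 3.86 and χ² = 12.5 on 1 degree of freedom), you are in good shape for the test.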

Please do NOT attempt this test until you feel confident that you have covered the course material for Topic 8 thoroughly.

-- Congratulations on completing the Quantitative Research trilogy! --