Dr Kelvin Ng Kuan Huei MBBS MRCP Specialist Registrar in CPT/GIM Crash Course in Statistics.

Dr Kelvin Ng Kuan Huei MBBS MRCP

Specialist Registrar in CPT/GIM

Crash Course in Statistics

‘There are three kinds of lies: lies, damned lies, and statistics.’

-- Benjamin Disraeli

Why understand statistics?

• Statistics help us to see patterns
• Bad statistics = bad decisions
• If you don’t understand statistics, you can’t spot bad statistics

Quantitative vs Qualitative

Qualitative | Quantitative
Complete, detailed description | Classify, count and analyse statistically
Researcher may only roughly know the endpoint | Researcher knows what the endpoint is
Researcher is the data-gathering instrument | Researcher uses tools
Data in the form of pictures, words or objects | Data in the form of numbers
Subjective | Objective
‘Rich’, but more time-consuming and not generalizable | Efficient, allows hypothesis testing, but loses detail

‘Red apple was the favourite as it was sweeter, crunchier and tastier, but on the other hand green apple was more refreshing!’

‘The red apple was the favourite compared with the green apple with P<0.05’

Observational studies vs RCT

• Experimental and quasi-experimental

• Observational studies
– Easy, fast and relatively cheap
– Dependent on stratification to deal with e.g. selection bias and covariates
• RCT
– Balancing of confounding factors
– Lack of generalization, not always applicable, slow

Statistics

• Descriptive statistics
– Describe or summarise data
• Inferential statistics
– Make statistical inferences and draw conclusions
• Estimation
– Confidence interval
– Parameter estimation
• Hypothesis testing
– Null hypothesis

Descriptive statistics

• Measures of central tendency
– Mean, mode, median
• Measures of dispersion and variability
– Standard deviation, variance
• Diagrams, e.g. stem and leaf, box plots

Descriptive statistics

Sample
– 9, 4, 5, 4, 7, 4, 2, 5
– Sorted: 2, 4, 4, 4, 5, 5, 7, 9
– Mean = 5
– Median = 4.5
– Mode = 4
– Standard deviation = 2 (population formula, dividing by n)
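The summary figures for this sample can be checked with Python's standard `statistics` module. This is a minimal sketch: it shows that the standard deviation of 2 quoted on the slide corresponds to the population formula (`pstdev`, dividing by n), while the sample formula (`stdev`, dividing by n - 1) gives a slightly larger value.

```python
import statistics

sample = [9, 4, 5, 4, 7, 4, 2, 5]

mean = statistics.mean(sample)        # arithmetic mean
median = statistics.median(sample)    # middle of the sorted values
mode = statistics.mode(sample)        # most frequent value
pop_sd = statistics.pstdev(sample)    # population SD, divides by n
sample_sd = statistics.stdev(sample)  # sample SD, divides by n - 1

print(sorted(sample))
print(mean, median, mode, pop_sd, round(sample_sd, 2))
```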

Inferential Statistics

• Reach conclusions beyond the immediate data alone, i.e. make inferences about the population based on a sample
• True state of affairs + chance = sample
– Sampling error
– Central limit theorem, i.e. sample means are approximately normally distributed
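The central limit theorem can be illustrated with a small simulation. The sketch below is an assumption-laden toy example, not part of the original slides: it draws repeated samples from a skewed exponential distribution (true mean 1, true SD 1) and looks at how the sample means behave. The sample size of 50, the 2000 repetitions and the seed are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable


def sample_mean(n):
    """Mean of n draws from a skewed exponential distribution (true mean 1)."""
    return statistics.mean([rng.expovariate(1.0) for _ in range(n)])


n = 50  # size of each sample (arbitrary choice)
sample_means = [sample_mean(n) for _ in range(2000)]

centre = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)

# Theory: the means cluster around 1 with spread close to 1 / sqrt(n)
print(round(centre, 3), round(spread, 3), round(1 / math.sqrt(n), 3))
```

Although individual observations are far from normal, the sample means cluster around the true mean with a spread close to the standard error, 1/√n.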

Inferential Statistics

• Comparisons analysis
– Either compares means or medians between groups
• Correlation analysis
– Correlation does not imply causation
• Regression analysis
– Incorporates multiple covariates into equation

Comparisons Analysis

• T-test
– Comparison of means
• Mann-Whitney U and Wilcoxon matched-pairs test
– Comparison of medians
• ANOVA and Kruskal-Wallis test
– Comparison of means between unrelated groups (ANOVA)
– Comparison of medians between unrelated groups (Kruskal-Wallis test)
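As an illustration of a comparison of means, the sketch below computes the test statistic for a two-sample t-test. The slides do not say which variant is meant; Welch's unequal-variance form is assumed here, and the two groups are made-up numbers. Only the t statistic and its degrees of freedom are computed, because turning them into a P value needs the t distribution, which is not in Python's standard library.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two groups."""
    va = statistics.variance(a) / len(a)  # squared standard error, group a
    vb = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical measurements for two unrelated groups
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3, 4, 5]

t, df = welch_t(group_a, group_b)
print(t, df)
```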

Correlations analysis

• Linear datasets?

• Spearman rank correlation
– Ordinal data, but no need for normal distribution
• Pearson’s product moment
– Interval data

Correlation does not imply cause and effect!
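The difference between the two coefficients can be seen on a small made-up dataset. In this sketch (hypothetical data, hand-written helper functions), y rises with x along a curve rather than a straight line: Spearman's coefficient, which is simply Pearson's coefficient applied to ranks, is exactly 1, while Pearson's coefficient is slightly below 1 because the relationship is not linear.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: monotonic but curved relationship
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```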

Regression analysis

• Does not assume normal sampling. Allows modelling the dependence of one variable on one or more others

• Binomial dataset
– Chi-squared (χ²) test

• Linear regression

• Multiple regression

Linear regression
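A simple linear regression fits a straight line y = intercept + slope × x by least squares. The sketch below uses made-up data and a hand-written helper to show the calculation; it is an illustration of the idea rather than the analysis shown on the original slide.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical data: y rises by roughly 2 for each unit of x
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
print(round(slope, 2), round(intercept, 2))
```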

Multiple regression

Correlation vs regression

• Correlation
– Makes no assumption about association
– Test for interdependence
• Regression
– Assumes variable is dependent on covariates
– One-way causal relationship (in linear regression)

Correlation or regression analysis?

The P value

• It is not a measure of the hypothesis
• It is the probability of obtaining a result at least this extreme by chance, assuming the null hypothesis is true…
• But the null hypothesis is not a random event!
• A P value of <0.05 means a less than 5% chance of obtaining the result by chance
• Pre-test probability
– Bayesian probability

The P value

• High P value
– Underpowered
– Limited clinical difference
• Low P value
– A large enough sample size will find that even trivial differences are statistically significant
– Statistical significance does not equate to clinical significance

P value is no replacement for common sense!
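The effect of sample size on the P value can be sketched with a simple two-sample z-test. All numbers here are hypothetical: a 0.5-unit difference between group means with a known standard deviation of 10 is clinically trivial, yet whether it reaches significance depends only on how many subjects are enrolled.

```python
import math
from statistics import NormalDist


def two_sided_p(diff, sd, n_per_group):
    """Two-sided P value for a difference in means (z-test, known common SD)."""
    se = sd * math.sqrt(2 / n_per_group)  # standard error of the difference
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same trivial difference, two very different sample sizes (hypothetical)
p_small = two_sided_p(diff=0.5, sd=10, n_per_group=100)
p_large = two_sided_p(diff=0.5, sd=10, n_per_group=10_000)

print(round(p_small, 3), round(p_large, 5))
```

With 100 subjects per group the difference is nowhere near significant; with 10,000 per group the same difference gives a P value well below 0.05.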

Type I and II Errors

• Type 1 error (α error)
– False positive, i.e. reject the null hypothesis when it is true
• Type 2 error (β error)
– False negative, i.e. fail to reject the null hypothesis when it is false

Type 1 error

Type 2 error

Subgroup analysis

• Not statistically powered

• Multiple testing

• Usually not adjusted for covariates

• Predetermined endpoints

• ISIS-2 and star signs
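The multiple-testing problem can be put into numbers. As a rough sketch, assume each subgroup test is independent and uses a significance level of 0.05; the chance of at least one false positive then grows quickly with the number of subgroups. Twelve subgroups, one per star sign, is used here as the illustrative case, and the Bonferroni threshold is shown as one common correction, not as something taken from the slides.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


alpha = 0.05
n_subgroups = 12  # e.g. one subgroup per star sign

fwer = familywise_error(alpha, n_subgroups)
bonferroni_alpha = alpha / n_subgroups  # stricter per-test threshold

print(round(fwer, 3), round(bonferroni_alpha, 4))
```

Under these assumptions there is roughly a 46% chance that at least one of the twelve subgroups shows a "significant" result by chance alone.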

Hazard ratio

• Hazard ratio
– The risk of an event (e.g. death, composite endpoint) relative to another group
– A value of 1 suggests no difference between comparator groups
– Often reported with a 95% confidence interval

Relative vs absolute risk reduction

• Beware of headline-grabbing statements!
– If I buy two lottery tickets, I increase my chances of winning by 100%
– If I buy two lottery tickets, I increase my chance of winning to 0.0001%

• Significance of effect is dependent on incidence

• Important in health economics assessments.
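The distinction can be worked through with a short calculation. The event rates below are invented for illustration: both scenarios have the same relative risk reduction of 50%, but the absolute risk reduction differs a hundredfold because the baseline incidence differs.

```python
def risk_reduction(control_risk, treated_risk):
    """Absolute and relative risk reduction for two event rates."""
    arr = control_risk - treated_risk  # absolute risk reduction
    rrr = arr / control_risk           # relative risk reduction
    return arr, rrr


# Hypothetical common event: 2% of controls vs 1% of treated patients
arr_common, rrr_common = risk_reduction(0.02, 0.01)

# Hypothetical rare event: 0.02% of controls vs 0.01% of treated patients
arr_rare, rrr_rare = risk_reduction(0.0002, 0.0001)

print(f"common: ARR {arr_common:.2%}, RRR {rrr_common:.0%}")
print(f"rare:   ARR {arr_rare:.2%}, RRR {rrr_rare:.0%}")
```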

Summary

Questions?

‘There are three kinds of lies: lies, damned lies, and statistics.’

-- Benjamin Disraeli
