MD 5108 Biostatistics for Basic Research

MD 5108 Biostatistics for Basic Research Lecturer: Dr K. Mukherjee Office: S16-06-100 Tel: 874 2764 Email: [email protected]


Page 1: MD 5108 Biostatistics for Basic Research

MD 5108 Biostatistics for Basic Research

Lecturer: Dr K. Mukherjee
Office: S16-06-100
Tel: 874 2764
Email: [email protected]

Page 2: MD 5108 Biostatistics for Basic Research

Objectives

To train practitioners of the biomedical sciences in the use and interpretation of statistical data analysis.

• explore and present data using tables, charts and graphs
• do simple statistical calculations with a calculator
• carry out data analysis using a statistical package such as SPSS
• pick the right procedure for analysing a set of data
• interpret results correctly and report findings
• avoid misuse and abuse of statistics
• understand statistical contents of papers in medical journals
• judge claims and statements critically
• discuss and communicate ideas in a quantitative manner

Page 3: MD 5108 Biostatistics for Basic Research

Teaching approach
• nonmathematical introduction
• explanation of concepts rather than proofs
• emphasis on methodology and procedures
• emphasis on use of a statistical package rather than manual calculation
• emphasis on choosing the right procedure
• emphasis on correct interpretation of results
• examples from clinical research literature

Page 4: MD 5108 Biostatistics for Basic Research

Topic 1: What is statistics?

“A branch of mathematics dealing with the analysis and interpretation of masses of numerical data” Merriam-Webster Dictionary

“The field of study that involves the collection and analysis of numerical facts or data of any kind” Oxford Dictionary

“The study of how information should be employed to reflect on, and give guidance for action, in a practical situation involving uncertainty” Vic Barnett

Biostatistics: application of statistical methods to the biological, medical and health sciences

Page 5: MD 5108 Biostatistics for Basic Research

Why the need for statistics in biomedicine?

Two main reasons:
• Variation
– attributes differ not only among individuals but also within the same individual over time
• Sampling
– biomedical research projects are mostly carried out on small numbers of study subjects
– it is challenging to project results from small-sample studies to individuals at large

Page 6: MD 5108 Biostatistics for Basic Research

Biological Variation

Necessitates the use of statistical methods in biomedicine to put numerical data into a context by which we can better judge their meaning

Page 7: MD 5108 Biostatistics for Basic Research

From sample to population

Statistical methods are used to produce statistical inferences about a population based on information from a sample derived from that population.

[Diagram: a sample is drawn from the population; inductive statistical methods lead from the sample back to the population]

Page 8: MD 5108 Biostatistics for Basic Research

Altman (1991) Practical Statistics for Medical Research, Chapman and Hall.

Page 9: MD 5108 Biostatistics for Basic Research

Bailar & Mosteller (1986) Medical Uses of Statistics, NEJM Books.

Page 10: MD 5108 Biostatistics for Basic Research

Many studies have been done on misuse of statistics in medicine

Page 11: MD 5108 Biostatistics for Basic Research

From Altman (1991)

Page 12: MD 5108 Biostatistics for Basic Research

Schor and Karten (1966, J. Am. Med. Assoc.):

• 149 papers classed as “analytical studies” in 3 issues of 11 most frequently read medical journals
• assessment criteria: validity with respect to
– design of experiment
– type of analysis performed
– applicability of statistical test used

Page 13: MD 5108 Biostatistics for Basic Research

Findings of Schor and Karten:
• 28% of papers acceptable
• 68% deficient but acceptable if reviewed
• 4% unsalvageable

Lesson: CARE must be exercised when reading scientific papers in biomedical journals! Knowledge of basic biostatistics is required.

Page 14: MD 5108 Biostatistics for Basic Research

“There are three kinds of lies: lies, damned lies and statistics.” Benjamin Disraeli

“It is easy to lie with statistics, but it is easier to lie without them.” Frederick Mosteller

“Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.” H.G. Wells

Page 15: MD 5108 Biostatistics for Basic Research

Types of statistical methods

1. Descriptive statistical methods
• data collection and organization
• summarizing data and describing its characteristics
• presentation and publication

2. Exploratory data analysis
• play around and get a feel of the data
• preliminary analysis, often graphical
• looking for patterns and possible relationships
• are assumptions satisfied?
• which model and procedure to use?

Page 16: MD 5108 Biostatistics for Basic Research

3. Inductive (inferential) statistical methods
• estimation, confidence intervals
• hypothesis testing
• prediction, forecasting
• classification

Statistical inferences about a population are based on information from a sample derived from that population.

[Diagram: population, sample, inductive statistical methods]

Page 17: MD 5108 Biostatistics for Basic Research

Topic 2: Types of data

Sources of data, the raw materials of statistics:
• routinely kept records, e.g., hospital medical records
• surveys
• experiments
• clinical trials
• databases
• published reports

Any characteristic that can be measured or classified into categories is called a variable.

Page 18: MD 5108 Biostatistics for Basic Research

Types of variables

(1) Qualitative variables
• cannot be measured numerically
• categorical in nature, e.g., gender
• categories must not overlap and must cover all possibilities

Nominal variables (no inherent ordering of categories)
• M/F, Yes/No
• blood group (A, B, AB, O)
• ethnic group (Chinese, Malay, Indian, Others)

Ordinal variables (categories are ordered in some sense)
• response to treatment: unimproved, improved, much improved
• pain severity: no pain, slight pain, moderate pain, severe pain

Page 19: MD 5108 Biostatistics for Basic Research

(2) Quantitative variables
• can be measured numerically, e.g., weight, height, concentration
• can be continuous or discrete
– a continuous variable can take on any value (subject to the precision of the measuring instrument) within some range or interval, e.g., weight, height, blood pressure, cholesterol level
– a discrete variable is usually a count of something and hence takes on integer values only, e.g., number of admissions to NUH

Variable types and measurement types
• have implications for how data should be displayed or summarized
• determine the kind of statistical procedures that should be used

Page 20: MD 5108 Biostatistics for Basic Research

SUMMARY: Types of variables and measurement scales

Variable
• Qualitative or categorical
– Nominal (not ordered), e.g. ethnic group
– Ordinal (ordered), e.g. response to treatment
• Quantitative (measurement)
– Discrete (count data), e.g. number of admissions
– Continuous (real-valued), e.g. height

Page 21: MD 5108 Biostatistics for Basic Research

Topic 3: Presenting data graphically

Advantages of graphical data display
• Let data speak for itself
• Get a good feel of the data before formal analysis
• Graphs and plots are easier to understand and interpret
• Reveal patterns in data which may shed light on the appropriate model/analysis to use, e.g.,
– skewed or symmetric distribution
– multiple peaks / modes
– are there any outliers?
– relationship between variables

Page 22: MD 5108 Biostatistics for Basic Research

Graphs for categorical data

[Figure: Bar chart for world pharmaceutical spending, 1997. x-axis: Region (Africa, Australasia, Canada, Europe, Japan, Latin America, Middle East, SE Asia & China, USA); y-axis: % of world spending, 0 to 35]

Page 23: MD 5108 Biostatistics for Basic Research

[Figure: Pie chart for world pharmaceutical spending, 1997, with these slice labels]

USA 34.0%
Europe 29.0%
Japan 16.0%
Latin America 8.0%
SE Asia & China 7.0%
Canada 2.0%
Middle East 2.0%
Africa 1.0%
Australasia 1.0%
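As a rough illustration of how the same percentages read as a bar chart, the sketch below prints a text-only bar chart in Python. The percentages are the ones shown on the pie chart labels; the function name `text_bar_chart` and the `#` symbol are illustrative choices, not part of the lecture.

```python
# Percentages of world pharmaceutical spending, 1997, as read off the pie chart labels.
spending_pct = {
    "USA": 34,
    "Europe": 29,
    "Japan": 16,
    "Latin America": 8,
    "SE Asia & China": 7,
    "Canada": 2,
    "Middle East": 2,
    "Africa": 1,
    "Australasia": 1,
}


def text_bar_chart(data, symbol="#"):
    """Return one text line per category, longest bar first (hypothetical helper)."""
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in sorted(data.items(), key=lambda kv: kv[1], reverse=True):
        # One symbol per percentage point, so bar lengths are directly comparable.
        lines.append(f"{label:<{label_width}} | {symbol * value} {value}%")
    return lines


chart_lines = text_bar_chart(spending_pct)
for line in chart_lines:
    print(line)
```

Because every bar starts from the same baseline, close values such as 7% and 8% are easier to tell apart here than as adjacent pie slices.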

Page 24: MD 5108 Biostatistics for Basic Research

[Figure: Segmented bar chart for world pharmaceutical spending, 1997. Segments: Africa, Australasia, Canada, Europe, Japan, Latin America, Middle East, SE Asia & China, USA; y-axis: % of world spending, 0 to 100]

Page 25: MD 5108 Biostatistics for Basic Research

World pharmaceutical spending, 1997

[Figure: the segmented bar chart, pie chart and bar chart of world pharmaceutical spending, 1997, shown together for comparison]

Page 26: MD 5108 Biostatistics for Basic Research

Comparison of methods
• Bar charts can be read more accurately and offer better distinction between close-together values
• Pie charts are especially useful for showing percentage distribution
• Pie charts can display large and small % simultaneously without scale break
• A single bar chart is preferable to a single segmented bar chart
• A series of segmented bar charts is easier to read than a series of pie charts or ordinary bar charts

Page 27: MD 5108 Biostatistics for Basic Research

[Figure: Bar chart for number of health professionals. x-axis: Profession (Dentists, Doctors, Nurses, Pharmacists); y-axis: Number of workers, 0 to 6000]

Page 28: MD 5108 Biostatistics for Basic Research

Variation of the basic bar chart

[Figure: Stacked bar chart for number of health professionals. x-axis: Profession (Dentists, Doctors, Nurses, Pharmacists); y-axis: Number of workers, 0 to 6000; legend: Private, Public]

Page 29: MD 5108 Biostatistics for Basic Research

[Figure: Clustered bar chart for number of health professionals. x-axis: Profession (Dentists, Doctors, Nurses, Pharmacists); y-axis: Number of workers, 0 to 4000; legend: Private, Public]

Page 30: MD 5108 Biostatistics for Basic Research

[Figure: Segmented bar charts by profession. x-axis: Profession (Dentists, Doctors, Nurses, Pharmacists); y-axis: Percent by sector, 0 to 100; legend: Private, Public]

Page 31: MD 5108 Biostatistics for Basic Research

[Figure: the clustered bar chart, stacked bar chart and segmented bar charts for number of health professionals by profession, shown together for comparison]

Page 32: MD 5108 Biostatistics for Basic Research

[Figure: Clustered bar charts of number of health professionals. x-axis: Sector (Private, Public); y-axis: Number of workers, 0 to 4000; legend: Dentists, Doctors, Nurses, Pharmacists]

Plotting by sector rather than by profession
• Look at the data from a different angle
• Highlight different aspects of the data

Page 33: MD 5108 Biostatistics for Basic Research

[Figure: Stacked bar charts by sector. x-axis: Sector (Private, Public); y-axis: Number of workers, 0 to 6000; legend: Dentists, Doctors, Nurses, Pharmacists]

Page 34: MD 5108 Biostatistics for Basic Research

[Figure: Percentage bar charts by sector. x-axis: Sector (Private, Public); y-axis: Percent within sector, 0 to 100; legend: Dentists, Doctors, Nurses, Pharmacists]

Page 35: MD 5108 Biostatistics for Basic Research

[Figure: Segmented bar charts by sector. x-axis: Sector (Private, Public); y-axis: Percent within sector, 0 to 100; legend: Dentists, Doctors, Nurses, Pharmacists]

Page 36: MD 5108 Biostatistics for Basic Research

[Figure: the clustered bar chart, stacked bar charts, percentage bar charts and segmented bar charts by sector, shown together for comparison]

Page 37: MD 5108 Biostatistics for Basic Research

A back to back bar chart

Source: JAMA, 1978, vol 239, no 21

Page 38: MD 5108 Biostatistics for Basic Research

Comparison of methods
• A stacked bar chart is also a bar chart for the combined data
• Some of the bars in a stacked bar chart are not aligned
• Bars in clustered bar charts are aligned, but it is harder to visualize how the component bars would stack up
• Back-to-back bar charts are applicable only when there are 2 groups; the aggregated bars are not aligned
• A series of stacked or segmented bar charts is useful in showing time trend

Page 39: MD 5108 Biostatistics for Basic Research

Time Trend

Starting the vertical axis at 8 rather than 0 visually exaggerates the increase in the number of prescriptions written per person.
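A small Python sketch can make the size of this distortion concrete. The two yearly figures below are hypothetical (the slide's chart values are not reproduced in this transcript), and `bar_height_ratio` is an illustrative helper name; only the baseline of 8 comes from the slide.

```python
def bar_height_ratio(first, last, baseline=0.0):
    """Ratio of the drawn bar heights when the vertical axis starts at `baseline`."""
    if first <= baseline:
        raise ValueError("first value must lie above the axis baseline")
    return (last - baseline) / (first - baseline)


# Hypothetical prescriptions-per-person figures, NOT taken from the slide.
first_year, last_year = 9.0, 11.0

true_ratio = bar_height_ratio(first_year, last_year)        # axis starts at 0
drawn_ratio = bar_height_ratio(first_year, last_year, 8.0)  # axis starts at 8

print(f"actual increase: x{true_ratio:.2f}, drawn increase: x{drawn_ratio:.2f}")
```

With these made-up numbers the real increase is about 22%, yet the last bar is drawn three times as tall as the first.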

Page 40: MD 5108 Biostatistics for Basic Research

Stacked bar chart of yearly mortality rate per 1000 births

Pagano & Gauvreau (1999) Principles of Biostatistics, Duxbury.

Page 41: MD 5108 Biostatistics for Basic Research

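Response under two treatments

                 Response to treatment
Treatment   None   Partial   Complete   Total
A              3        15          9      27
B              *         *         30      54

* The None and Partial counts for treatment B are fused in the source as "222": they are 2 and 22 in one order or the other (both orders give the stated total of 54).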

Page 42: MD 5108 Biostatistics for Basic Research

A misleading bar chart

[Figure: Clustered bar chart of raw frequencies. x-axis: Response to treatment (None, Partial, Complete); y-axis: Frequency, 0 to 30; legend: A, B]

By design, there are twice as many patients receiving treatment B

Page 43: MD 5108 Biostatistics for Basic Research

[Figure: Bar chart of within-treatment percentages. x-axis: Treatment (A, B); y-axis: Within treatment percentage, 0 to 100; legend: Response to treatment (None, Partial, Complete)]

Can compare the response type percentages for the two treatments
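The conversion from raw frequencies to within-treatment percentages can be sketched in Python as follows. The counts are hypothetical and chosen only so that one group is twice the size of the other; they are not the slide's data, and `within_group_percentages` is an illustrative name.

```python
def within_group_percentages(counts):
    """Convert the raw counts of one group into percentages of that group's total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("group has no observations")
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical counts, NOT the slide's data: group Y is twice the size of group X.
group_x = {"None": 5, "Partial": 10, "Complete": 5}
group_y = {"None": 10, "Partial": 20, "Complete": 10}

pct_x = within_group_percentages(group_x)
pct_y = within_group_percentages(group_y)

for category in group_x:
    print(f"{category:<8} X: {group_x[category]:>2} ({pct_x[category]:.0f}%)"
          f"   Y: {group_y[category]:>2} ({pct_y[category]:.0f}%)")
```

Every raw frequency in group Y is double that of group X, which a frequency bar chart would show as a large difference, yet the percentage profiles of the two groups are identical.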

Page 44: MD 5108 Biostatistics for Basic Research

[Figure: Stacked bar charts for percentage figures. x-axis: Treatment (A, B); y-axis: Within treatment percentage, 0 to 100; legend: Response to treatment (None, Partial, Complete)]

Page 45: MD 5108 Biostatistics for Basic Research

Graphs for quantitative data
• Histogram
• Frequency polygon
• Box plot

Page 46: MD 5108 Biostatistics for Basic Research

Histogram
• Divide the range of the data into a suitably chosen number of intervals/bins, all of the same width
• The number of observations that fall within each interval is plotted

Relative frequency histogram
• Plot the proportions of observations that fall within the class intervals
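The binning behind a histogram and a relative frequency histogram can be sketched in a few lines of Python. The data values are hypothetical, and the function names and the convention of placing the maximum in the last bin are assumptions of this sketch; statistical packages differ in how they treat values on a bin edge.

```python
def histogram_counts(values, n_bins):
    """Split the data range into n_bins equal-width bins and count observations per bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot form bins")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # A value on an interior edge goes to the bin on its right;
        # the maximum is placed in the last bin.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


def relative_frequencies(counts):
    """Proportion of all observations falling in each bin."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical measurements, NOT the SysVol data from the slides.
sample = [40, 55, 60, 72, 75, 80, 95, 110, 120, 140]
edges, counts = histogram_counts(sample, n_bins=5)
proportions = relative_frequencies(counts)

for left, right, c, p in zip(edges, edges[1:], counts, proportions):
    print(f"{left:>5.0f} to {right:<5.0f} {'#' * c:<4} n={c}  {p:.0%}")
```

The relative frequencies always sum to 1, which is what lets distributions from samples of different sizes be compared on one scale.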

Page 47: MD 5108 Biostatistics for Basic Research

Wild & Seber (2000) Chance Encounters, Wiley.

Page 48: MD 5108 Biostatistics for Basic Research

[Figure: Histogram of End-Systolic Volume for 45 Male Heart Attack Patients. x-axis: SysVol, 40 to 220; y-axis: Frequency, 0 to 20]

Page 49: MD 5108 Biostatistics for Basic Research

[Figure: Relative frequency polygon for SysVol. x-axis: SysVol, 40 to 220; y-axis: Percent, 0 to 40]

Page 50: MD 5108 Biostatistics for Basic Research

Comparison of methods

Histogram
• good at revealing distributional shape such as symmetry, skewness, number of peaks, etc.
• difficult to superimpose or draw side by side

Frequency polygon
• can be superimposed for easy comparison

Page 51: MD 5108 Biostatistics for Basic Research

Wild & Seber (2000, p.59)

Page 52: MD 5108 Biostatistics for Basic Research

Can be superimposed

Pagano & Gauvreau (1999)

Page 53: MD 5108 Biostatistics for Basic Research

Wild & Seber (2000)

Page 54: MD 5108 Biostatistics for Basic Research

Median and quartiles
• Sort the data in increasing order
• The median is the middle value (if n is odd) or the average of the two middle values (if n is even); it is a measure of the “center” of the data
• Quartiles: divide the set of ordered values into 4 equal parts

first 25% | Q1 | second 25% | Q2 | third 25% | Q3 | fourth 25%

Q2 = second quartile = median
IQR = interquartile range = Q3 − Q1
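The definitions above can be checked with a short Python sketch. The `median` function follows the odd/even rule stated on the slide. For the quartiles the sketch relies on `statistics.quantiles` with `method="inclusive"`, which is one of several conventions in use; SPSS and other packages may give slightly different quartiles on small samples. The data values are hypothetical.

```python
from statistics import quantiles


def median(values):
    """Middle value if n is odd, average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("no data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles_and_iqr(values):
    """Return (Q1, Q2, Q3, IQR) using the 'inclusive' quantile convention."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q2, q3, q3 - q1


# Hypothetical data, NOT taken from the slides.
odd_sample = [2, 4, 4, 5, 7, 9, 10, 12, 15]
even_sample = [1, 3, 5, 7]

print(median(odd_sample))              # middle (5th) of 9 ordered values
print(median(even_sample))             # average of 3 and 5
print(quartiles_and_iqr(odd_sample))   # Q1, Q2, Q3, IQR
```

For the nine-value sample this gives Q1 = 4, Q2 = 7 (equal to the median) and Q3 = 10, so the IQR is 6.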

Page 55: MD 5108 Biostatistics for Basic Research

Box plot
• Draw a box from the lower quartile to the upper quartile and a line to mark the position of the median
• Extend lines from both edges of the box by 1.5 IQR, then pull back the lines until they hit observations
• Observations more than 1.5 IQR away from the lower or upper quartile are marked as outside values for further investigation and checking
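The construction rule can be expressed as a short Python sketch that computes the numbers a box plot would draw. The quartile convention (`statistics.quantiles` with `method="inclusive"`) is an assumption of this sketch, as are the function name `boxplot_stats` and the hypothetical data; a statistical package may place the quartiles slightly differently.

```python
from statistics import quantiles


def boxplot_stats(values):
    """Box edges, whisker ends and outside values under the 1.5 IQR rule."""
    ordered = sorted(values)
    q1, med, q3 = quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr   # furthest a whisker may reach downwards
    upper_limit = q3 + 1.5 * iqr   # furthest a whisker may reach upwards
    inside = [v for v in ordered if lower_limit <= v <= upper_limit]
    outside = [v for v in ordered if v < lower_limit or v > upper_limit]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        # Whiskers are "pulled back" to the most extreme observations within the limits.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outside": outside,
    }


# Hypothetical data, NOT taken from the slides; 30 is deliberately extreme.
sample = [2, 4, 4, 5, 7, 9, 10, 12, 30]
stats = boxplot_stats(sample)
print(stats)
```

Here the box runs from 4 to 10, the upper limit is 10 + 1.5 × 6 = 19, so the upper whisker is pulled back to 12 and the value 30 is flagged as an outside value.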

Page 56: MD 5108 Biostatistics for Basic Research

How a boxplot is constructed (Wild & Seber, 2000, p.73)

5-Number Summary: min, lower quartile, median, upper quartile, max

Page 57: MD 5108 Biostatistics for Basic Research

[Figure: Dotplot for SysVol = End-systolic volume, a measure of the size of the heart]

[Figure: Boxplot for SysVol]

Page 58: MD 5108 Biostatistics for Basic Research

Advantages of box plot
• quick visual summary of a data set
• captures prominent features like location, spread, skewness and outliers
• can easily draw a series of box plots side by side; not so for histograms

Page 59: MD 5108 Biostatistics for Basic Research

Dataset: Hotdogs

Brand name                  Type     Taste     $/oz  $/lb Prot  Cal  Sod  Prot/Fat
Happy Hill Supers           Beef     Bland     0.11      14.23  186  495         1
Georgies Skinless Beef      Beef     Bland     0.17       21.7  181  477         2
Special Market's Premium B  Beef     Bland     0.11      14.49  176  425         1
Spike's Beef                Beef     Medium    0.15      20.49  149  322         1
Hungry Hugh's Jumbo Beef    Beef     Medium     0.1      14.47  184  482         1
Great Dinner Beef           Beef     Medium    0.11      15.45  190  587         1
RJB Kosher Beef             Beef     Medium    0.21      25.25  158  370         2
Wonder Kosher Skinless Bee  Beef     Medium     0.2      24.02  139  322         2
Happy Fats Jumbo Beef       Beef     Medium    0.14      18.86  175  479         1
Midwest Beef                Beef     Medium    0.14      18.86  148  375         1
General Kosher Beef         Beef     Medium    0.23      30.65  152  330         1
Wall's Kosher Beef Lower F  Beef     Medium    0.25      25.62  111  300         3
Hickory Natural Smoke       Beef     Medium    0.07       8.12  141  386         2
Smith Beef                  Beef     Medium    0.09      12.74  153  401         1
Premium Beef                Beef     Medium     0.1      14.21  190  645         1
Family StoreSkinless Beef   Beef     Medium     0.1      13.39  157  440         1
Sam's Kosher Beef           Beef     Medium    0.19      22.31  131  317         2
Hammer Beef                 Beef     Medium    0.11      19.95  149  319         1
Athens Beef                 Beef     Medium    0.19       22.9  135  298         2
Regents Kosher Beef         Beef     Scrumpt.  0.17      19.78  132  253         2
Really Big                  Meat     Bland     0.12      14.86  173  458         2
Biggest Jumbo               Meat     Bland     0.12      17.32  191  506         1
Home Made                   Meat     Bland     0.12       15.2  182  473         1
Martha's Jumbo Dinner       Meat     Bland      0.1      14.01  190  545         1
Hammer Premium              Meat     Bland     0.11      13.92  172  496         2
Willie's Wieners            Meat     Bland     0.13      18.24  147  360         1
Premium Hot Dogs            Meat     Medium     0.1      14.12  146  387         1
Airport Wieners             Meat     Medium    0.09      11.83  139  386         2
Judy's Favorite Jumbos      Meat     Medium    0.11      15.41  175  507         1
Stick Lean Supreme Jumbo    Meat     Medium    0.15       17.4  136  393         3
Stick Jumbo                 Meat     Medium    0.13      17.32  179  405         1
Fat Jack Jumbo              Meat     Medium     0.1      15.61  153  372         1
Thin Jack Veal              Meat     Medium    0.18       20.4  107  144         3
Top Grade Hot Dogs          Meat     Medium    0.09      12.65  195  511         1
Blended w/Chicken&Beef      Meat     Scrumpt.  0.07      11.17  135  405         1
Heaven Made                 Meat     Scrumpt.  0.08      11.75  140  428         1
Baked and Smoked            Meat     Scrumpt.  0.06       9.49  138  339         1
Smart Person Chicken        Poultry  Bland     0.08      10.21  129  430         2
Woods Park Chicken          Poultry  Medium    0.05       6.37  132  375         2
Tony Turkey                 Poultry  Medium    0.07       8.42  102  396         3
Rose Garden Turkey          Poultry  Medium    0.08       9.37  106  383         3
Low Fat Turkey              Poultry  Medium    0.08          9   94  387         4
Special Market's Turkey     Poultry  Medium    0.07       8.07  102  542         5
Caloryless Turkey           Poultry  Medium    0.09       9.39   90  359         5
Heaven Made Lower Fat       Poultry  Medium    0.06       6.59   99  357         4
McDowell's Jumbo Chicken    Poultry  Medium    0.07       8.43  107  528         2

Page 60: MD 5108 Biostatistics for Basic Research

Graphical Analysis of the “Hotdogs” data.

Page 61: MD 5108 Biostatistics for Basic Research

Parallel box plots can be quite revealing

[Figure: parallel box plots of concentration over time, axis labelled 1969 to 1972]

• Reduction in concentration through time
• Higher during winter months
• Skewed toward higher values
• Spread increases with level

(Parallel histograms are much harder to visualise)

Rice (1995) Mathematical Statistics & Data Analysis, Duxbury Press.