Intro biostat1&2

Introduction to Biostatistics 1 By Dr Babatunde, OA MBBS, PgCertDPMIS, MPH, FWACP Department of Community Medicine, FMC, Ido-Ekiti

Transcript of Intro biostat1&2

Page 1: Intro biostat1&2

Introduction to Biostatistics 1

By Dr Babatunde, OA

MBBS, PgCertDPMIS, MPH, FWACP

Department of Community Medicine, FMC, Ido-Ekiti

Page 2: Intro biostat1&2

04/12/2023Dr Babatunde OA MBBS, PGCertDPMIS, MPH, FWACP

Outline

• Definition (C-O-S-A-I-P)
  ◦ Collection
  ◦ Organization
  ◦ Summarizing
  ◦ Analyzing
  ◦ Interpreting
  ◦ Presenting
• Applications of biostatistics

Page 3: Intro biostat1&2

Introduction

• A variable is any characteristic that can be observed or measured
• Information collected on a variable is usually unrefined and is called data
• The collection, analysis, interpretation and use of data is called statistics
• The application of statistics to health-related fields is known as biostatistics [1]

Page 4: Intro biostat1&2

Definition

• Biostatistics = Medical statistics
• Medical statistics is the scientific method of collecting, organizing, summarizing, analyzing, interpreting, and presenting medical data [1]
• Biostatistics is statistics applied to the biological sciences and to medicine [2]

Page 5: Intro biostat1&2

‘Curiosity killed the “cat”’

• Biostatistics is all about ‘curiosity’ [3]
• Biostatistics is about asking medically relevant questions and getting answers using statistical methods
  ◦ Which age group dies most? (Mortality rate)
  ◦ What proportion of university students use condoms during sexual intercourse?
• Assignment 1: Each student should ask a medically related question of personal interest and submit it in the format below

Page 6: Intro biostat1&2

Assignment 1 format (5 minutes)

• Name:
• Matriculation Number:
• Medical question of personal interest:
• Submit it at the end of the lecture
• Also document it in your notebook because we will refer to this question throughout this class

Page 7: Intro biostat1&2

Research

• Research is the scientific investigation of facts and relationships to establish dependable solutions to problems through systematic collection, analysis, and interpretation of data
• Research is described as systematic in that it involves an organized, formally structured methodology to obtain new knowledge
• Biostatistics is the basis for research [4]

Page 8: Intro biostat1&2

Biostatistics is simple

• It is a general phenomenon that many students have little interest in statistics
• Many see it as too abstract to conceptualize
• However, it is the simplest of all sciences, practised by both literate and illiterate people
• Grandmother statistics: a big stroke drawn by a grandmother represents a birth while a small stroke represents a death (the origin of the tally sheet in immunization)

Page 9: Intro biostat1&2

What is data?

• Biostatistics centres on data. Hence, what is data?
• Data is information collected on an individual or a group of individuals
• When entered into a computer, it is called a dataset
• Assignment 2: List 5 examples of data you can collect to answer your question in Assignment 1

Page 10: Intro biostat1&2

Assignment 2: List 5 examples of data to answer your question

Example: How many students in this class use condoms during sexual intercourse?

5 data items:
1. Ever had sex
2. Age at first sexual intercourse
3. Number of acts of sexual intercourse in the last 3 months
4. Number of times a condom was used
5. Number of sexual partners since sexual initiation

Page 11: Intro biostat1&2

Collecting data

• Questionnaires
• Observations (checklist)
• Focus Group Discussion
• Proforma
• Records
• Census
• List other ways you can collect data

Page 12: Intro biostat1&2

Collecting data requires measurement

• 4 levels of measurement are involved in data collection (N-O-I-R)
  ◦ 1. Nominal
  ◦ 2. Ordinal
  ◦ 3. Interval
  ◦ 4. Ratio

Page 13: Intro biostat1&2

Nominal scale/level of measurement of data

• Lowest level
• Mutually exclusive, unordered categories
• No notion of numerical magnitude
• Any number assigned has no numerical value other than to distinguish one category from another
• Examples: Gender, Blood Group, Marital status
• Assignment 3: List 5 more examples of the nominal scale

Page 14: Intro biostat1&2

Ordinal scale/level of measurement of data

• Ability to rank or order phenomena, in addition to the nominal property
• It is defined by related categories
• Examples: Patients' pain conditions described as Mild, Moderate, Severe
• Assignment 4: List 5 more examples of the ordinal scale of measurement

Page 15: Intro biostat1&2

Interval scale

• Measurements are expressed in numbers
• The starting point (zero) is arbitrary, depending largely on the units of measurement
• It is possible to attach physical meaning to the difference between 2 measurements (an interval) but not to their ratio
• Examples: Temperature in Centigrade or Fahrenheit

Page 16: Intro biostat1&2

Ratio scale

• Measurement on this scale has the 3 previously mentioned properties (categories, order, meaningful differences) and, in addition, a true zero point
• The ratio of any 2 measurements on the scale is physically meaningful
• Examples: Height in cm, Weight in kg, Age in years
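The contrast between the interval and ratio scales can be illustrated with a small sketch. The readings (20 and 40 degrees Celsius, 40 and 80 kg) and the helper `celsius_to_kelvin` are assumptions made for illustration only.

```python
def celsius_to_kelvin(c):
    """Convert a Celsius reading (interval scale) to Kelvin (true zero point)."""
    return c + 273.15

t1_c, t2_c = 20.0, 40.0  # hypothetical temperature readings

# Differences are meaningful on an interval scale.
diff_c = t2_c - t1_c

# The naive ratio of the Celsius readings suggests "twice as hot" ...
naive_ratio = t2_c / t1_c

# ... but on a scale with a true zero the ratio is only about 1.07.
true_ratio = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

# Weight in kg has a true zero, so its ratio is directly meaningful.
weight_ratio = 80.0 / 40.0

print(diff_c, naive_ratio, round(true_ratio, 3), weight_ratio)
```

An 80 kg patient really is twice as heavy as a 40 kg patient, whereas 40 degrees Celsius is not "twice as hot" as 20 degrees Celsius.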

Page 17: Intro biostat1&2

Basic Definitions

Level | Summary | Example
Nominal | Categories only. Data cannot be arranged in an ordering scheme | Student’s car: 1 Ford, 2 Toyota, 3 BMW
Ordinal | Categories are ordered, but differences cannot be determined or they are meaningless | Student’s car: 1 Compact, 2 Mid-size, 3 Full size
Interval | Differences between values can be found, but there may be no inherent starting point. Ratios are not meaningful | Temperature: 45°, 80°, 90°
Ratio | Like interval scale, but with an inherent starting point. Ratios are meaningful | Weights of football players: 200 lbs, 300 lbs, 400 lbs

Page 18: Intro biostat1&2

Why does level of measurement matter?

• Theoretical interest is not the primary reason why researchers and statisticians consider the level of measurement of a variable.
• Level of measurement is important because the kinds of statistical procedures that can be appropriately used depend on the level of measurement of the variable studied.
• Calculating the mean of a group of people’s telephone numbers would be possible but meaningless, since a telephone number is a nominal-level variable.
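One way to picture this dependence is a lookup of which summaries are usually considered appropriate at each level. The mapping `ALLOWED`, the function `summarize` and all data values below are hypothetical, a sketch of one common convention rather than a fixed rule.

```python
import statistics

# Hypothetical mapping: summaries usually considered appropriate per level.
# Each level keeps the summaries of the levels below it.
ALLOWED = {
    "nominal": {"mode"},
    "ordinal": {"mode", "median"},
    "interval": {"mode", "median", "mean"},
    "ratio": {"mode", "median", "mean"},
}

def summarize(values, level):
    """Return only the summaries appropriate for the given level."""
    allowed = ALLOWED[level]
    result = {}
    if "mode" in allowed:
        result["mode"] = statistics.mode(values)
    if "median" in allowed:
        result["median"] = statistics.median(values)
    if "mean" in allowed:
        result["mean"] = statistics.mean(values)
    return result

blood_groups = ["O", "A", "O", "B", "O", "AB"]  # nominal
pain_scores = [1, 2, 2, 3, 1, 2, 2]             # ordinal: 1 Mild, 2 Moderate, 3 Severe
weights_kg = [60, 72, 65, 80, 72]               # ratio

print(summarize(blood_groups, "nominal"))
print(summarize(pain_scores, "ordinal"))
print(summarize(weights_kg, "ratio"))
```

Only the mode is reported for blood group, while the mean appears only for the weight data.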

Page 19: Intro biostat1&2

Organization of data

• Raw data is usually not very useful
• It has to be organized to make sense of it
• This brings us to the types of statistics:
  ◦ Descriptive: Frequency tables, Diagrams
  ◦ Inferential: Use of statistical tests
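As a minimal sketch of organizing raw data into a frequency table, the snippet below tallies a hypothetical list of blood groups, in the spirit of the tally sheet mentioned earlier. The data are invented for illustration.

```python
from collections import Counter

# Hypothetical raw data: blood groups recorded for 10 patients.
raw = ["O", "A", "O", "B", "O", "AB", "A", "O", "B", "A"]

counts = Counter(raw)
total = sum(counts.values())

# Frequency table rows: category, frequency, relative frequency (%).
table = [(group, n, 100 * n / total) for group, n in counts.most_common()]

for group, n, pct in table:
    print(f"{group:<3} {n:>3} {pct:>6.1f}%")
```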

Page 20: Intro biostat1&2

Types of data

• Primary data: data that is obtained directly from an individual, e.g. the 2006 Census
• Secondary data: data that is obtained from an outside source, e.g. a study of hospital records [5]

Page 21: Intro biostat1&2

Types of Data

• A special type of discrete variable is the binary variable, which takes on exactly 2 possible values
  ◦ Gender (M/F)
  ◦ Pregnant? (Y/N)
  ◦ Hypertensive? (Y/N)

Page 22: Intro biostat1&2

Types of Data

• Sometimes, discrete variables have a “natural ordering” to them
  ◦ For example, names of consecutive days in a week (Mon, Tue, Wed, Thu, Fri, Sat, Sun)
• Other types of discrete variables do not have a natural order and are called nominal variables
  ◦ Race (African American, Caucasian, Asian, Hispanic etc.)

Page 23: Intro biostat1&2

Types of Data

• If in an experiment you measure a single variable, it is called a univariate experiment
• If you measure 2 variables, it is called a bivariate experiment
• If you measure more than 2 variables, it is called a multivariate experiment

Page 24: Intro biostat1&2

DESCRIPTIVE STATISTICS

Concerned with summarizing series of measurements or observations

A] Measures of Central tendency
B] Measures of Variability/Dispersion
C] Measures of Relative standing

Page 25: Intro biostat1&2

Summarizing data: Descriptive Measures

• Now that we have displayed our data, we want to be able to characterize it quantitatively
  ◦ Measures of Central Tendency: Mean, Median, Mode
  ◦ Measures of Variability: Range, Variance, Standard Deviation
  ◦ Measures of Relative Standing: Z-Scores, Percentiles, Quartiles
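Measures of relative standing are named on this slide but not worked through, so here is a hedged sketch using Python's standard `statistics` module. The scores are hypothetical, and the quartile cut points follow that module's default interpolation rule, which can differ slightly from other software.

```python
import statistics

scores = [52, 60, 61, 65, 70, 72, 75, 80, 95]  # hypothetical test scores

mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation

def z_score(x):
    """Number of standard deviations that x lies above or below the mean."""
    return (x - mean) / sd

# Quartiles: three cut points that split the ordered data into four parts.
q1, q2, q3 = statistics.quantiles(scores, n=4)

print(round(z_score(95), 2), q1, q2, q3)
```

A positive z-score means the value lies above the mean, and the second quartile coincides with the median.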

Page 26: Intro biostat1&2

Measures of Central Tendency

• Mean
  ◦ Arithmetic average of a sample of data
• Median
  ◦ If you order the data from smallest to largest, the median is the middle value, assuming an odd number of data elements
  ◦ If you have an even number of elements, it is the average of the 2 middle numbers
• Mode
  ◦ The most common value in a set of values
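These three definitions can be checked with a short sketch based on Python's standard `statistics` module. The small data sets are made up for illustration.

```python
import statistics

odd = [2, 4, 5, 9, 10]        # 5 values: the median is the middle one
even = [2, 4, 5, 9, 10, 12]   # 6 values: the median averages the middle two
repeated = [3, 7, 7, 8, 9]    # 7 occurs most often

print(statistics.mean(odd))       # arithmetic average: 30 / 5 = 6
print(statistics.median(odd))     # middle value: 5
print(statistics.median(even))    # (5 + 9) / 2 = 7.0
print(statistics.mode(repeated))  # most common value: 7
```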

Page 27: Intro biostat1&2

Arithmetic mean

• Arithmetic mean: this is different from other types of mean such as the geometric mean and the harmonic mean
• The arithmetic mean is simply the average, denoted by the symbols μ (mu) and x̄ (x-bar)
• These symbols represent the arithmetic mean of a population [size N] and of a sample [size n] respectively

Page 28: Intro biostat1&2

Median

• Median: here the distribution is first arrayed, i.e. arranged in order of magnitude
• Then look for the value which cuts this distribution into two equal parts
• That value in the array is called the median

Page 29: Intro biostat1&2

Mode

• Mode: this is the most frequently occurring value in a distribution
• Some distributions are described as amodal because they have no mode
• A distribution with one mode is unimodal and one with two modes is bimodal
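The terms amodal, unimodal and bimodal can be illustrated with the sketch below. The function `describe_modes` is hypothetical, and it assumes the convention that a distribution has no mode when every distinct value occurs equally often.

```python
import statistics

def describe_modes(values):
    """Classify a distribution by its number of modes."""
    modes = statistics.multimode(values)
    # Convention assumed here: if every distinct value is equally
    # frequent, the distribution is treated as having no mode.
    if len(modes) > 1 and len(modes) == len(set(values)):
        return "amodal", []
    labels = {1: "unimodal", 2: "bimodal"}
    return labels.get(len(modes), "multimodal"), modes

print(describe_modes([1, 2, 3, 4]))      # no value repeats
print(describe_modes([1, 2, 2, 3]))      # 2 is the single mode
print(describe_modes([1, 1, 2, 3, 3]))   # 1 and 3 tie as modes
```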

Page 30: Intro biostat1&2

A word for the wise…

If you stop learning you are old, whether you are 20 or 80 years

Thank you

Page 31: Intro biostat1&2

Introduction to Biostatistics 2

By Dr Babatunde, OA

MBBS, PgCertDPMIS, MPH, FWACP

Department of Community Medicine, FMC, Ido-Ekiti

Page 32: Intro biostat1&2

Measures of variability: Range

• The range is one of the simplest measures of variability
• It is simply the difference between the highest and the lowest values: R = X_H - X_L
• The range has the problem of looking at the two extremes alone and ignoring all other values
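The weakness of the range can be seen in a short sketch with two made-up data sets that share the same extremes but differ in how the remaining values are spread.

```python
def value_range(values):
    """Range R = highest value minus lowest value."""
    return max(values) - min(values)

clustered = [1, 5, 5, 5, 5, 5, 9]  # most values sit at 5
spread = [1, 2, 4, 5, 6, 8, 9]     # values spread across the scale

print(value_range(clustered), value_range(spread))
```

Both ranges are 8, even though the two distributions look quite different in between.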

Page 33: Intro biostat1&2

Variance and Standard Deviation

• In the distribution 9, 4, 2, 5, 10 [which has a mean of 6], the total deviation from the mean is zero, as it is for any distribution
• Since the total (or average) deviation from the mean is therefore useless as a measure of spread, something is done to get around the problem
• Thus we square the deviations and sum them, and we get 46
• The average of the squared deviations is obtained by dividing by the number of observations N for a population, or by n - 1 for a sample
• This is called the variance [S², σ²: sample and population variance respectively]
• The standard deviation is the square root of the variance
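The worked example on this slide (9, 4, 2, 5, 10) can be reproduced step by step. The sketch assumes the usual divisors: N for the population variance and n - 1 for the sample variance.

```python
import math
import statistics

data = [9, 4, 2, 5, 10]
mean = statistics.mean(data)                # 6

deviations = [x - mean for x in data]       # 3, -2, -4, -1, 4
total_deviation = sum(deviations)           # always 0

sum_sq = sum(d ** 2 for d in deviations)    # 9 + 4 + 16 + 1 + 16 = 46

pop_var = sum_sq / len(data)                # 46 / 5 = 9.2
samp_var = sum_sq / (len(data) - 1)         # 46 / 4 = 11.5

pop_sd = math.sqrt(pop_var)                 # about 3.03
samp_sd = math.sqrt(samp_var)               # about 3.39

print(total_deviation, sum_sq, pop_var, samp_var)
```

The standard library gives the same results through `statistics.pvariance` and `statistics.variance`.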

Page 34: Intro biostat1&2

PRESENTATION OF DATA

• Tables
• Charts
• Diagrams
• Graphs
• Pictures
• Special curves

Page 35: Intro biostat1&2

Characteristics of a good table

• Numbering, e.g. Table 1, Table 2, etc.
• A title, which must be brief and self-explanatory
• Headings of columns and rows should be clear and concise
• Data must be presented according to size or importance, chronologically, alphabetically or geographically
• If percentages or averages are to be compared, they must be placed as close together as possible
• No table should be too large
• Footnotes may be given where necessary

Page 36: Intro biostat1&2

Presentation of data (contd)

• Charts and diagrams: these methods of presentation have a powerful impact on the imagination of people, so they are popular media for presenting statistical data
• a. Bar charts: these present a set of numbers by the length of a bar, the length of each bar being proportional to the magnitude represented

Page 37: Intro biostat1&2

Presentation of data (contd)

• Simple bar chart: bars may be vertical or horizontal and are usually separated by appropriate spaces, with an eye on neatness and clear presentation
• Multiple bar chart: here two or more bars are grouped together
• Component bar chart: here each bar is divided into two or more parts. Each part represents a certain item and is proportional to the magnitude of that item

Page 38: Intro biostat1&2

Presentation of data (contd)

• b. Histogram: this is a pictorial diagram of a frequency distribution
• It consists of a series of blocks
• The class intervals are given along the horizontal axis and the frequencies on the vertical axis
• The area of each block or rectangle is proportional to the frequency
• The histogram is apt for representing continuous variables

Page 39: Intro biostat1&2

Characteristics of histogram

i. It is like the simple bar chart except that the bars of a histogram touch each other
ii. The height of each box equals the frequency of the class it represents (i.e. for equal class intervals)
iii. The interval with the highest box is called the modal interval, i.e. the interval that contains the mode
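The characteristics above can be sketched by grouping a continuous variable into equal class intervals and finding the modal interval. The ages, the class width of 10 and the text rendering are assumptions chosen for illustration.

```python
from collections import Counter

ages = [21, 23, 25, 28, 31, 32, 33, 34, 35, 38, 41, 44, 47, 52, 58]  # hypothetical
width = 10  # equal class intervals: 20 to <30, 30 to <40, ...

# Count how many values fall in each class, keyed by the class lower bound.
freq = Counter((age // width) * width for age in ages)

# With equal intervals, the bar height is simply the class frequency.
for lower in sorted(freq):
    print(f"{lower} to <{lower + width}: {'#' * freq[lower]} ({freq[lower]})")

# The modal interval is the class with the highest bar.
modal_lower = max(freq, key=freq.get)
print(f"Modal interval: {modal_lower} to <{modal_lower + width}")
```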

Page 40: Intro biostat1&2

PRESENTATION OF DATA (contd)

• c. Frequency polygon: a frequency distribution may also be represented diagrammatically by the frequency polygon
• It is obtained by joining the midpoints of the tops of the histogram blocks

Page 41: Intro biostat1&2

d. Pie charts: instead of comparing the lengths of bars, the areas of segments of a circle are compared. The area of each segment depends upon its angle. A circle of any reasonably large size is divided into as many sectors as there are components making up the total, such that the area of each sector is proportional to the component it represents.
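Because the area of a sector depends on its angle, each component's angle is its share of the total multiplied by 360 degrees. The diagnosis counts below are hypothetical.

```python
# Hypothetical counts of diagnoses among 100 clinic attendances.
counts = {"Malaria": 50, "Typhoid": 25, "Pneumonia": 15, "Other": 10}

total = sum(counts.values())

# Angle of each sector = (component / total) * 360 degrees.
angles = {name: 360 * n / total for name, n in counts.items()}

for name, angle in angles.items():
    print(f"{name}: {angle:.0f} degrees")
```

The angles always add up to 360 degrees, whatever the counts.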

Page 42: Intro biostat1&2

PRESENTATION OF DATA (contd)

e. Graphs / scatter diagrams: these come in when there are two different factors involved, e.g. age and height. If, after plotting, the points cannot be joined by any line, then a graph does not apply and we have a scatter diagram.

Page 43: Intro biostat1&2

Simple bar chart

[Figure: simple bar chart. Horizontal axis: 1st Qtr, 2nd Qtr, 3rd Qtr, 4th Qtr. Vertical axis: 42 to 47. Legend: East, West, North.]

Page 44: Intro biostat1&2

Multiple bar chart

[Figure: multiple bar chart. Horizontal axis: 1st Qtr, 2nd Qtr, 3rd Qtr, 4th Qtr. Vertical axis: 0 to 90. Legend: East, West, North.]

Page 45: Intro biostat1&2

Component bar chart

[Figure: component bar chart. Horizontal axis: categories 1 to 5. Vertical axis: 0% to 100%. Legend: Series1, Series2.]

Page 46: Intro biostat1&2

Pie chart

[Figure: pie chart with four segments labelled 1 to 4.]

Page 47: Intro biostat1&2

Scattergram

[Figure: scatter diagram. Horizontal axis: 0 to 15. Vertical axis: 0 to 60. Legend: Series1.]

Page 48: Intro biostat1&2

Graph

[Figure: line graph. Horizontal axis: 1 to 5. Vertical axis: 0 to 50. Legend: Series1, Series2.]

Page 49: Intro biostat1&2

Statistical testing

• This refers to the application of statistical tests to study results with a view to ascertaining the presence of statistical significance
• Suppose we find, in a study on level of physical activity, that 40% of the men in the sample are physically active whereas only 30% of the women qualify as active. How should one interpret this result?

Page 50: Intro biostat1&2

Statistical testing-2

1. The observed difference of 10 percentage points might be a TRUE DIFFERENCE, which also exists in the total population from which the sample was drawn
2. This difference might also be DUE TO CHANCE, i.e. in reality there is no difference between men and women, but the sample of men just happened to differ from the sample of women, probably due to sampling variation
3. The observed difference might be due to a defect in the study design (bias), i.e. with an appropriate study design no such difference would have occurred

Page 51: Intro biostat1&2

Statistical testing-3

• Statistical tests estimate the likelihood that such a result would occur by chance alone
• If this likelihood or probability is less than 5%, chance is considered an unlikely explanation: the notion of chance occurrence is rejected and the difference is taken to be a true one
• This level of 5% is known as the alpha level, while the actual likelihood or probability calculated is known as the P-value
• In statistical terms, the assumption that no real difference exists between the groups in the total population is called the NULL HYPOTHESIS

Page 52: Intro biostat1&2

Statistical testing-4

• Once the alpha level has been set and the statistical test applied to the results, the P-value is obtained
• If the P-value is lower than the alpha level, the Null Hypothesis is rejected, the difference is taken to be a true one and the result is said to be statistically significant
• If the P-value is higher than the alpha level, the Null Hypothesis is not rejected (loosely, "accepted"); the result is taken as compatible with chance and considered not significant
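The decision rule can be sketched for the physical activity example (40% of men versus 30% of women). The slides do not name a test or give sample sizes, so the two-proportion z-test, the function `two_proportion_p_value` and the group sizes of 100 and 500 are all assumptions made for illustration.

```python
import math

def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided P-value for H0: both population proportions are equal.

    Uses the pooled two-proportion z-test (normal approximation).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

ALPHA = 0.05  # alpha level set before looking at the data

# Same 40% vs 30% split, with two hypothetical sample sizes per group.
p_small = two_proportion_p_value(40, 100, 30, 100)
p_large = two_proportion_p_value(200, 500, 150, 500)

for label, p in [("n = 100 per group", p_small), ("n = 500 per group", p_large)]:
    verdict = "reject H0 (significant)" if p < ALPHA else "do not reject H0"
    print(f"{label}: P = {p:.4f} -> {verdict}")
```

Under these assumptions the same 10-point difference is not significant with 100 people per group but is significant with 500 per group, so the verdict depends on sample size as well as on the size of the difference.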

Page 53: Intro biostat1&2

Statistical testing-5

• If the Null Hypothesis is rejected (P-value < alpha level) when it is in fact true, i.e. no true difference exists, then a type I error is committed
• If the Null Hypothesis is accepted (P-value > alpha level) when a true difference exists, then a type II error is committed
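Both kinds of error can be made visible with a small simulation. Everything here is an assumption for illustration: the pooled two-proportion z-test, groups of 100, 2000 simulated studies, the fixed random seed and the helper names `p_value` and `rejection_rate`.

```python
import math
import random

def p_value(x1, x2, n):
    """Two-sided P-value of a pooled two-proportion z-test, equal group size n."""
    pooled = (x1 + x2) / (2 * n)
    if pooled in (0.0, 1.0):
        return 1.0  # no variation at all, so no evidence of a difference
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    z = abs(x1 - x2) / n / se
    return math.erfc(z / math.sqrt(2))

def rejection_rate(p_men, p_women, n=100, trials=2000, alpha=0.05, seed=1):
    """Share of simulated studies in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        x1 = sum(rng.random() < p_men for _ in range(n))
        x2 = sum(rng.random() < p_women for _ in range(n))
        if p_value(x1, x2, n) < alpha:
            rejections += 1
    return rejections / trials

# Null hypothesis true (no real difference): every rejection is a type I error.
type_1_rate = rejection_rate(0.35, 0.35)

# Real difference exists: every failure to reject is a type II error.
type_2_rate = 1 - rejection_rate(0.40, 0.30)

print(f"Type I error rate:  {type_1_rate:.3f}")
print(f"Type II error rate: {type_2_rate:.3f}")
```

The type I error rate should come out close to the alpha level, while the type II error rate depends on the sample size and on how large the true difference is.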

Page 54: Intro biostat1&2

Uses of Biostatistics in Medicine

• Clinicians often have to evaluate and use new information throughout their practice lives
• The most important reasons for learning biostatistics include the following:

1. Assessing medical literature: evidence-based information is often made available in journals, and clinicians must understand biostatistics to be able to make sense of such information
2. Patient care: results of research work are often meant for patient care, and clinicians want to know the best diagnostic procedure, the optimal care and how treatment regimens should be designed and implemented

Page 55: Intro biostat1&2

Uses of Biostatistics in Medicine

3. Use of vital statistics: effective diagnosis and treatment of patients requires an understanding of how to make sense of vital statistics, which result from the recording of vital events such as births and deaths
4. Deploying diagnostic procedures: knowing the appropriate diagnostic procedure to use in a given patient is essential for effective care. Clinicians should be conversant with the sensitivity, specificity, and positive and negative predictive values of a procedure
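The four measures named in item 4 can be computed from a 2x2 table of test result against true disease status. The slides do not define them, so the standard definitions are used here; the counts and the function `diagnostic_measures` are hypothetical.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Standard accuracy measures from a 2x2 table.

    tp: diseased, test positive      fp: not diseased, test positive
    fn: diseased, test negative      tn: not diseased, test negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # diseased people correctly detected
        "specificity": tn / (tn + fp),  # healthy people correctly cleared
        "ppv": tp / (tp + fp),          # positive results that are true
        "npv": tn / (tn + fn),          # negative results that are true
    }

# Hypothetical screening of 1000 people, 100 of whom have the disease.
measures = diagnostic_measures(tp=90, fp=40, fn=10, tn=860)

for name, value in measures.items():
    print(f"{name}: {value:.1%}")
```

In this made-up example the test detects 90% of cases, yet only about 69% of positive results are true positives, because the disease is uncommon in the screened group.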

Page 56: Intro biostat1&2

Uses of Biostatistics in Medicine

5. Assessing information on drugs and equipment: companies present information on their products in charts, graphs and clinical studies, and clinicians need a good knowledge of biostatistics to make sense of such presentations and information
6. Understanding epidemiologic problems: disease prevalence, variation by season and by location, and relationship to risk factors constitute epidemiological parameters of utmost importance to the clinician in practice

Page 57: Intro biostat1&2

Applications of Biostatistics

• Public health (Epidemiology, Nutrition etc)
• Clinical trials
• Population genetics
• Genomics analysis
• Ecology/Ecological forecasting
• Biological Sequence Analysis
• Systems biology for gene network inference

Page 58: Intro biostat1&2

References

1. Bamgboye EA. A Companion of Medical Statistics. 1st Edition. Ibipress & Publishing Company, Ibadan, Nigeria; 2006: 1-16.
2. Dunn OJ. Basic Statistics: A Primer for the Biomedical Sciences. 2nd Edition. John Wiley and Sons Publishers: 1-11.
3. Kolawole EB. Statistical Methods. 1st Edition. Bolabay Publications, Lagos, Nigeria; 2006: 1-12.
4. Taofeek I. Research Methodology and Dissertation Writing for Allied Professionals. 1st Edition. Cress Global Link Limited, Abuja; 2006: 1-24.
5. Park K. Park’s Textbook of Preventive and Social Medicine. 18th Edition. M/s Banarsidas Bhanot Publishers; 2004: 608-615.

Page 59: Intro biostat1&2

References (contd)

6. Dawson B, Trapp R. Introduction to Medical Research. In: Basic and Clinical Biostatistics. 4th Edition. McGraw-Hill Companies Inc, USA; 2004: 1-6.
7. Prabhakara GN. Basics of Statistics. In: Biostatistics. JAYPEE, New Delhi; 2006: 11-16.
8. Dawson B, Trapp R. Summarising Data and Presenting Data in Tables and Graphs. In: Basic and Clinical Biostatistics. 4th Edition. McGraw-Hill Companies Inc, USA; 2004: 23-60.

Page 60: Intro biostat1&2

A word for the wise…

What doesn’t kill us makes us stronger

So see challenges as opportunities for personal growth

Thank you