Bio Statistics Basics

Biostatistics - I Presented by : Kush Pathak

Transcript of Bio Statistics Basics

Page 1: Bio Statistics Basics

Biostatistics - I

Presented by :

Kush Pathak

Page 2: Bio Statistics Basics

Contents

• Introduction
• History
• Applications and uses of biostatistics in science
• Common statistical terms
• Common symbols used
• Data
(a) Collection and types
(b) Presentation
(c) Analysis
(d) Interpretation
• Limitations
• Conclusion
• References

Page 3: Bio Statistics Basics

Introduction

• There are three kinds of lies: lies, damn lies, and statistics. (Benjamin Disraeli / Mark Twain).

• The word statistics conveys a variety of meanings to people. It is best known for handling data, both in general and in the field of research. The word “statistics” comes from the Italian word ‘statista’, meaning ‘statesman’, or the German word “Statistik”, meaning ‘political state’.

• It comes from two main sources, that are (1) Government records (2) Mathematics

• John Graunt (1620 - 1674) was the father of health statistics.

Page 4: Bio Statistics Basics

Definitions

Statistics : The science of collecting, summarizing, presenting, analyzing and interpreting data is called statistics.

Biostatistics : The method of collecting, organizing, analyzing, tabulating and interpreting data related to living organisms and human beings is called biostatistics.

[Soben Peter. Essentials of preventive and community dentistry, 2nd edition. New Delhi : Arya; 2006. 824]

Page 5: Bio Statistics Basics

HISTORY

Page 6: Bio Statistics Basics

Father of Health Statistics

1620 - 1674

Page 7: Bio Statistics Basics

THE HISTORY OF STATISTICS HAS ITS ROOTS IN BIOLOGY

Page 8: Bio Statistics Basics

Sir Francis Galton

Pioneer of fingerprint classification,

Study of heredity of quantitative traits

Regression & correlation

Page 9: Bio Statistics Basics

Karl Pearson

Polymath

-Studied genetics

-Correlation coefficient

-χ² (chi square) test

-Standard deviation

Page 10: Bio Statistics Basics

Sir Ronald Fisher

The Genetical Theory of Natural Selection

Founder of population genetics.

Analysis of variance

Likelihood.

P-value

Page 11: Bio Statistics Basics

APPLICATIONS AND USES OF BIOSTATISTICS IN SCIENCE

• IN PHYSIOLOGY AND ANATOMY :

– To define the limits of normality for variables such as height, weight, Blood Pressure etc. in a population.

– Variation more than natural limits may be pathological i.e. abnormal due to play of certain external factors.

– To find correlation between two variables like height and weight.

Page 12: Bio Statistics Basics

• IN PHARMACOLOGY

– To find the action of drugs

– To compare the action of two drugs or two successive dosages of same drug

– To find the relative potency of a new drug with respect to a standard drug

Page 13: Bio Statistics Basics

• IN MEDICINE

– To compare the efficacy of a particular drug, operation or line of treatment

– To find association between two attributes such as cancer and smoking

– To identify signs and symptoms of disease

Page 14: Bio Statistics Basics

• IN COMMUNITY MEDICINE AND PUBLIC HEALTH

– To test usefulness of vaccine in the field

– In epidemiologic studies the role of causative factors is statistically tested

Page 15: Bio Statistics Basics

• FOR STUDENTS :

– By learning the methods in biostatistics a student learns to evaluate articles published in medical and dental journals or papers read in medical and dental conferences.

– He also understands the basic methods of observation in his clinical practice and research.

Page 16: Bio Statistics Basics

Common Statistical terms

• VARIABLES :

Characteristic that takes different values for different persons, place or things.

A quantity that varies between limits, e.g. height, weight, blood pressure, age etc.

Denoted as X and, for an ordered series, as X1, X2, X3, …, Xn

Σ (sigma) stands for summation of results or observations.

Page 17: Bio Statistics Basics

• CONSTANT :

Quantities that do not vary such as π = 3.141, e = 2.718

These do not require statistical study.

e.g. in biostatistics, mean, standard deviation are considered constant for a population.

Page 18: Bio Statistics Basics

• OBSERVATION :

An event and its measurement, such as B.P. and 120 mm of Hg

• OBSERVATIONAL UNIT :

Source that gives observations, such as object or person etc.

In medical statistics, the term individual or subject is used more often.

Page 19: Bio Statistics Basics

• DATA:

Set of values recorded on one or more observational units.

• POPULATION :

Population includes all persons, events and objects under study.

It may be finite or infinite.

Page 20: Bio Statistics Basics

• SAMPLE :

Defined as a part of a population generally selected so as to be representative of the population whose variables are under study.

• PARAMETER

It is a constant that describes a population e.g. in a college there are 40% girls. This describes the population, hence it is a parameter.

Page 21: Bio Statistics Basics

• STATISTIC

A statistic is a value that describes the sample, e.g. in a sample of 200 students of the same college, 45% are girls. This 45% is a statistic, as it describes the sample. Unlike a parameter, it can vary from sample to sample.

 • ATTRIBUTE

A characteristic based on which the population can be described into categories or classes e.g. gender, caste, religion.

Page 22: Bio Statistics Basics

Commonly used symbols

= Equal to

< Less than

> Greater than

Z No. of standard deviations

% Percentage

r Pearson’s correlation coefficient

ρ Spearman’s rank correlation coefficient

Page 23: Bio Statistics Basics

d.f. or f Degree of freedom

K Number of groups or classes

P Probability

O Observed number

E Expected number

Page 24: Bio Statistics Basics

DATA

Set of values recorded on one or more observational units is called data.

It is of two types :

QUALITATIVE (discrete) data

QUANTITATIVE (continuous) data

Page 25: Bio Statistics Basics

Collection of health information

A. Census :

The United Nations defines a census as “the total process of collecting, compiling and publishing demographic, economic and social data pertaining, at a specified time or times, to all persons in a country or delimited territory”.

It is an important source of health information.

First regular census in India was taken in 1881, and others took place at 10 year intervals.

The primary function of a census is to provide demographic information such as the total count of the population and its breakdown into groups and sub groups, such as age and sex distribution.

Page 26: Bio Statistics Basics

Population census provides basic data (by age and sex) needed to compute vital statistical rates, and other health, demographic and socio economic indicators.

B. Registration of vital events :

The United Nations defines a vital event registration system as including “legal registration, statistical recording and reporting of the occurrence of, and the collection, compilation, presentation, analysis and distribution of statistics pertaining to vital events i.e. live births, deaths, fetal deaths, marriages, divorces, adoptions, legitimations, recognitions, annulments and legal separations.”

It keeps a continuous check on demographic changes.

Page 27: Bio Statistics Basics

In 1873, the Govt. of India passed the Births, Deaths and Marriages Registration Act. Still, the registration system in India tended to be very unreliable, the data being grossly deficient in regard to accuracy, timeliness, completeness and coverage.

Due to this other actions were taken :

o The Central Births and Deaths Registration act, 1969 :

Central Births and Deaths Registration Act was promulgated in 1969, which came into force on 1st April 1970.

The time limit for registering births is 14 days and that for deaths is 7 days. In case of any default, a fine of Rs. 50 was imposed.

Page 28: Bio Statistics Basics

o Lay Reporting :

It is defined as, “Collection of information, its use and transmission to other levels of health system by non professional health workers.”

Some countries have attempted to employ first line health workers(e.g. village health guides) to record births and deaths in a community.

Page 29: Bio Statistics Basics

C. Sample Registration system (SRS) :

It’s a dual record system consisting of continuous enumeration of births and deaths by an enumerator and an independent survey every 6 months by an investigator- supervisor.

It was initiated in the mid 1960s to provide reliable estimates of birth and death rates at the national and state levels.

It is a major source of health information.

Page 30: Bio Statistics Basics

D. Notification of diseases :

Its primary purpose is to effect prevention and/or control of the diseases.

Also a valuable source of morbidity data.

Diseases which are considered to be serious menaces to public health are included in the list of notifiable diseases.

Limitations : (a) Covers only a small part of the total sickness in the community. (b) The system suffers from a good deal of under-reporting. (c) Many cases, especially atypical and subclinical cases, escape notification due to non-recognition.

Page 31: Bio Statistics Basics

E. Hospital records :

They constitute a basic and primary source of information about diseases prevalent in the community.

Drawbacks : (a) Provide information on only those patients who seek medical care. (b) Admission policy may vary from hospital to hospital. (c) Population served by a hospital cannot be defined.

F. Disease Registers :

Provides a permanent record of diseases and morbidity caused due to them.

If reporting system is effective and the coverage is on a national basis, register can provide useful data on morbidity and disease specific mortality.

Page 32: Bio Statistics Basics

G. Record Linkage :

Used to describe the process of bringing together records relating to one individual, the records originating at different times or places.

Medical record linkage implies the assembly and maintenance for each individual in a population, of a file of the more important records relating to his health.

Problem : Volume of data accumulated. Therefore, in practice, record linkage has been applied only on a limited scale, e.g. twin studies, measurement of morbidity, chronic diseases etc.

Page 33: Bio Statistics Basics

H. Environmental health data :

These statistics now provide data on various aspects of air, water and noise pollution; harmful food additives; industrial intoxicants etc.

I. Health manpower statistics :

Relates to physicians, dentists, pharmacists, veterinarians, nurses, technicians etc.

Their records are maintained by state medical/dental/nursing councils and directorates of medical education.

Page 34: Bio Statistics Basics

J. Population surveys :

Carried out for epidemiological studies by trained teams to find incidence or prevalence of health or disease in a community.

Provide useful information on :
• Changing trends in health status.
• Timely warning of public health hazards.
• Feedback expected to modify policy and system.

Health surveys can be classified as :

(a) Health interview (face to face) survey
(b) Health examination survey
(c) Health records survey
(d) Mailed questionnaire survey

Page 35: Bio Statistics Basics

K. Non- quantifiable information :

Health planners also need non quantifiable info. E.g. health policies, health legislations, public attitudes, programme costs, procedures and technologies.

Page 36: Bio Statistics Basics

Types of Data

• Qualitative or discrete data :

When the data is collected on the basis of attributes or qualities like sex, malocclusion and cavities etc., it is called as qualitative data.

The number of persons having the same attribute is variable and is counted.

Page 37: Bio Statistics Basics

e.g. – Out of 100 people, 75 have diabetes, 15 have T.B. and 10 have anemia.

Then diabetes, T.B. and anemia are attributes which cannot be measured in figures. Only the number of people having each can be determined.

Page 38: Bio Statistics Basics

• Quantitative or continuous data :

When the data is collected through measurement, using calipers etc., it is called quantitative data.

In such classification there are two variables :

– Characteristic – such as height

– Frequency – i.e. number of persons with the same characteristic and in the same range

Page 39: Bio Statistics Basics

• e.g. Height of one person is 150 cm and of another is 160 cm, and both are of the same age and sex.

Persons with a height of 150 cm, or in the range of 150 – 152 cm, may be 10 and those with 160 cm, or in the range of 160 – 162 cm, may be 20.

Thus we find out characteristic and frequency. Both vary from person to person as well as group to group.

Page 40: Bio Statistics Basics

Presentation

Tabulation

Drawings

Page 41: Bio Statistics Basics

Tabulation : • Is the most common method

• Data presentation is in the form of columns and rows

• It can be of the following types
– Simple tables
– Frequency distribution tables

Page 42: Bio Statistics Basics

• Simple tables :

Month and Year     Number of biopsies performed in Oral Pathology department
January 2010       15
June 2010          21
December 2010      26

Page 43: Bio Statistics Basics

• Frequency Distribution tables :

In a frequency distribution table, the data are first split into convenient groups (class intervals) and the number of items (frequency) occurring in each group is shown in the adjacent column.
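The grouping described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `frequency_table`, the class width of 5 cm and the list of heights are assumptions made for the example and do not come from the slides.

```python
from collections import Counter

def frequency_table(values, start, width):
    """Group whole-number values into equal class intervals and count each class."""
    # Index of the class interval each value falls into
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in sorted(counts):
        lower = start + k * width
        upper = lower + width - 1  # inclusive upper limit for whole-number data
        table.append((f"{lower}-{upper}", counts[k]))
    return table

# Hypothetical heights in cm (illustrative values only)
heights = [150, 151, 152, 153, 158, 160, 161, 162, 150, 163]
height_table = frequency_table(heights, start=150, width=5)
for interval, freq in height_table:
    print(interval, freq)
```

Each printed row is one class interval followed by its frequency, which is the layout of a frequency distribution table.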

Page 44: Bio Statistics Basics

No. of biopsies sent from different departments to Oral Pathology department.

Year and month   Oral surgery   Oral Medicine   Cons and Endo   Pediatric Dept.   Perio.   Private Clinics
January 2010     6              2               3               1                 1        2
June 2010        11             NIL             2               2                 2        4
Dec 2010         19             NIL             1               2                 1        3

Page 45: Bio Statistics Basics

• Charts and Drawings :

Useful method of presenting statistical data

Powerful impact on imagination of the people

Page 46: Bio Statistics Basics

• Presentation of quantitative data is done through graphs. They are :
– Histograms
– Frequency polygons
– Frequency curve
– Line chart or graph
– Cumulative frequency diagram
– Scatter or dot diagram

Page 47: Bio Statistics Basics

• Presentation of qualitative data is done through diagrams. They are :
– Bar
– Pie or sector
– Pictogram or picture diagram
– Map diagram or spot map

Page 48: Bio Statistics Basics

Histograms

Pictorial presentation of frequency distribution.

Consists of series of rectangles.

Class intervals are given on the horizontal axis and frequencies on the vertical axis

Area of rectangle is proportional to the frequency

Page 49: Bio Statistics Basics

[Figure: chart of monthly biopsy counts, Jan/10 to Dec/10; vertical axis 0 to 20; series: O.S, O.M, Cons, Pedo, Perio, Private]

Page 50: Bio Statistics Basics

Frequency Polygon

Obtained by joining midpoints of histogram blocks at the height of frequency by straight lines usually forming a polygon.

Page 51: Bio Statistics Basics
Page 52: Bio Statistics Basics

Frequency curve :

When the number of observations is very large and the class interval is reduced, the frequency polygon loses its angulations, becoming a smooth curve known as a frequency curve.

Page 53: Bio Statistics Basics
Page 54: Bio Statistics Basics

Line Chart

Line diagrams are used to show the trends of events with the passage of time.

[Figure: line chart of monthly biopsy counts, Jan/10 to Dec/10; vertical axis 0 to 20; series: O.S, O.M, CONS, PEDO, PERIO, PRIVATE]

Page 55: Bio Statistics Basics

Cumulative frequency diagram

Graphical representation of cumulative frequency .

It is obtained by adding the frequency of each class to the sum of the frequencies of the previous classes.
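As a small illustration, the running total can be computed with `itertools.accumulate`. The three counts are the biopsy figures from the simple table (January, June and December 2010). Treating these months as successive classes is an assumption made only for the sake of the example.

```python
from itertools import accumulate

# Biopsy counts from the simple table: January, June, December 2010
frequencies = [15, 21, 26]

# Each entry is the frequency of its class plus all previous classes
cumulative = list(accumulate(frequencies))
print(cumulative)  # [15, 36, 62]
```

The last cumulative value always equals the total number of observations.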

Page 56: Bio Statistics Basics
Page 57: Bio Statistics Basics

Scatter or Dot diagram

Shows relationship between two variables.

If the dots are clustered showing a straight line, it shows a relationship of linear nature.

 

[Figure: scatter diagram; horizontal axis 3.5 to 7, vertical axis 0 to 8 (Y-Values)]

Page 58: Bio Statistics Basics

Bar Chart

• Length of bars drawn vertical or horizontal is proportional to frequency of variable.

• Suitable scale is chosen.

• Bars are usually equally spaced.

• They are of three types :

-Simple bar chart

-Multiple bar chart

-Component bar chart

Page 59: Bio Statistics Basics

Simple bar chart

[Figure: simple bar chart of biopsies per month, Jan/10 to Dec/10; scale 0 to 30]

Page 60: Bio Statistics Basics

Multiple bar chart :

Two or more variables are grouped together

[Figure: multiple bar chart of biopsies per month, Jan/10 to Dec/10; scale 0 to 20; series: O.S, O.M, CONS, PEDO, PERIO, PRIVATE]

Page 61: Bio Statistics Basics

Component bar chart :

Bars are divided into two or more parts.

Each part representing certain item and proportional to magnitude of that item.

Page 62: Bio Statistics Basics

[Figure: component bar chart of biopsies per month, Jan/10 to Dec/10; scale 0 to 30; components: O.S, O.M, CONS, PEDO, PERIO, PRIVATE]

Page 63: Bio Statistics Basics

Pie chart

• In this, the frequencies of the groups are shown as segments of a circle.

• The angle of each segment denotes the frequency.

• Angle is calculated as (class frequency / total observations) × 360°

[Figure: pie chart of biopsies]
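A minimal sketch of the angle calculation, using the three biopsy counts from the simple table as the classes. The helper name `pie_angles` is hypothetical and the choice of data is only for illustration.

```python
def pie_angles(frequencies):
    """Return the sector angle in degrees for each class."""
    total = sum(frequencies.values())
    return {label: f / total * 360 for label, f in frequencies.items()}

# Counts taken from the simple table of biopsies
biopsies = {"January 2010": 15, "June 2010": 21, "December 2010": 26}
angles = pie_angles(biopsies)
for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
```

The angles add up to 360 degrees, so the sectors together fill the whole circle.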

Page 64: Bio Statistics Basics

Pictogram

• Popular method of presenting data to the common man.

Page 65: Bio Statistics Basics
Page 66: Bio Statistics Basics

Spot map or Map diagram

• These maps are prepared to show geographic distribution of frequencies of characteristics.

Page 67: Bio Statistics Basics


Page 68: Bio Statistics Basics

Analysis

• Average value in a distribution is the one central value around which all the other observations are concentrated.

• Average value helps : To find most characteristic value of a set of measurements.

To find which group is better off by comparing the average of one group with that of another.

[K. Park. Preventive and social medicine, 20th edition. McGraw-Hill Medical; 2009. 749]

Page 69: Bio Statistics Basics

• Most commonly used averages are
– Mean
– Median
– Mode

Page 70: Bio Statistics Basics

Mean

• Refers to arithmetic mean.

• Individual observations are first added together, and then divided by the number of observations.

• Addition of the observations is called ‘summation’ and is denoted by ∑ (or S).

• Individual observations are denoted by x, their number by n, and the mean is denoted by x̄ (‘x bar’).

 

Page 71: Bio Statistics Basics

• x̄ = (x1 + x2 + x3 + … + xn) / n

• e.g. The diastolic blood pressure of 10 individuals was 83, 75, 81, 79, 71, 95, 75, 77, 84, 90. The total was 810, which was then divided by 10, giving a mean of 81.0

• Advantages – It is easy to calculate.

• Disadvantages – Influenced by extreme values.
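The worked example can be checked with a short Python sketch. The readings are the ten diastolic pressures from the slide; the second list, in which 95 is replaced by 195, is a hypothetical modification added only to illustrate how a single extreme value pulls the mean.

```python
# Diastolic blood pressures from the slide
diastolic = [83, 75, 81, 79, 71, 95, 75, 77, 84, 90]

total = sum(diastolic)            # summation of the observations
x_bar = total / len(diastolic)    # divide by the number of observations
print(total, x_bar)               # 810 81.0

# Hypothetical extreme value: replace 95 with 195 and recompute
with_outlier = [195 if x == 95 else x for x in diastolic]
mean_with_outlier = sum(with_outlier) / len(with_outlier)
print(mean_with_outlier)          # 91.0
```

One changed reading raises the mean by 10 units, which is the disadvantage noted above.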

Page 72: Bio Statistics Basics

Median

• When all the observations are arranged in either ascending or descending order, the middle observation is known as the median.

• In case of an even number of observations, the average of the two middle values is taken.

• Median is a better indicator of central value when extreme values are present, as it is not affected by them.

Page 73: Bio Statistics Basics

Diastolic Blood Pressure (unarranged): 83, 75, 81, 79, 71, 95, 75, 77, 84

Diastolic Blood Pressure (arranged): 71, 75, 75, 77, 79 (median), 81, 83, 84, 95

Page 74: Bio Statistics Basics

In case there are 10 values instead of 9 :

Diastolic Blood Pressure (unarranged): 83, 75, 81, 79, 71, 95, 75, 77, 84, 90

Diastolic Blood Pressure (arranged): 71, 75, 75, 77, 79, 81, 83, 84, 90, 95

Median = (79 + 81) / 2 = 80
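Both cases, 9 values and 10 values, can be reproduced with a small function. This is a sketch for illustration; the function name `median` is chosen here and the data are the readings from the two slides.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

nine_values = [83, 75, 81, 79, 71, 95, 75, 77, 84]
ten_values = nine_values + [90]

print(median(nine_values))  # 79
print(median(ten_values))   # 80.0
```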

Page 75: Bio Statistics Basics

Mode

• The most frequently occurring observation, or most ‘fashionable’ value, in a series of observations is called the mode.

• E.g. diastolic blood pressure of 20 individuals is 85, 75, 81, 79, 71, 95, 75, 77, 75, 90, 71, 75, 79, 95, 75, 77, 84, 75, 81, 75.

• Here the most frequently occurring value is 75.
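The count behind this example can be verified with `collections.Counter`, using the 20 readings listed above. This is only a sketch of one way to find the mode.

```python
from collections import Counter

# Diastolic blood pressures of the 20 individuals from the slide
pressures = [85, 75, 81, 79, 71, 95, 75, 77, 75, 90,
             71, 75, 79, 95, 75, 77, 84, 75, 81, 75]

counts = Counter(pressures)
mode_value, mode_count = counts.most_common(1)[0]
print(mode_value, mode_count)  # 75 7
```

The value 75 occurs 7 times, more often than any other reading.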

Page 76: Bio Statistics Basics

• Advantages :

It is easy to understand. Not affected by extreme items.

• Disadvantages : Exact location is often uncertain and not clearly defined.

[Therefore, mode is not often used in biological or medical statistics.]

Page 77: Bio Statistics Basics

Interpretation

• Test of Significance :

• Whatever be the sampling procedure or the care taken while selecting sample, the sample statistics will differ from the population parameters.

• Variations between 2 samples drawn from the same population may also occur.

• Similarly, differences in the results of two research workers carrying out the same investigation may be observed.

Page 78: Bio Statistics Basics

• So, it becomes important to find out the significance of this observed variation, i.e. whether it is due to

– chance or biological variation (statistically not significant), OR
– the influence of some external factors (statistically significant)

• To test whether the variation observed is of significance, various tests of significance are done.

• Tests of significance can be broadly classified as

Parametric tests

Non parametric tests

Page 79: Bio Statistics Basics

Parametric Tests

• Parametric tests are those tests in which certain assumptions are made about the population :

Population from which sample is drawn has normal distribution.

The variances of the samples do not differ significantly.

The observations are truly numerical, so arithmetic procedures such as addition, division and multiplication can be used.

• Since these tests make assumptions about the population parameters, they are called parametric tests.

Page 80: Bio Statistics Basics

• These are usually used to test the difference.

• They are:
– Student t test (paired or unpaired)
– ANOVA

Page 81: Bio Statistics Basics

ANOVA

Analysis of variance

• Investigations may not always be confined to comparison of 2 samples only.

• e.g. we might like to compare the difference in vertical dimension obtained using 2 or more methods like phonetics, swallowing.

• In such cases, where more than 2 samples are used, ANOVA can be used.

• Also, when measurements are influenced by several factors playing their role, e.g. factors affecting retention of a denture, ANOVA can be used.

• ANOVA helps to decide which factors are more important.

Page 82: Bio Statistics Basics

• Requirements

– Data for each group are assumed to be independent and normally distributed

– Sampling should be at random

• One way ANOVA :

– Where only one factor will affect the result between the groups

Page 83: Bio Statistics Basics

• Two way ANOVA

– Where we have 2 factors that affect the result or outcome.

• Multi way ANOVA

-Three or more factors affect the result or outcomes between groups


 

Page 84: Bio Statistics Basics

Student t test

• It was given by W. S. Gosset, whose pen name was ‘Student’.

• There are two types of student t Test.

1. Unpaired t test

2. Paired t test

Page 85: Bio Statistics Basics

Unpaired t test

• Applied to unpaired data of observation made on individuals of 2 separate groups to find the significance of difference between 2 means.

• Sample size is less than 30.

• e.g. difference in accuracy in an impression using two different impression materials

• Steps in unpaired t Test are :

Calculate the mean of two samples.

Calculate combined standard deviation

Page 86: Bio Statistics Basics

Calculate the standard error of the difference between the means, which is given by

SE = SD × √(1/n1 + 1/n2), where SD is the combined standard deviation

Calculate the observed difference between means, X̄1 – X̄2

Calculate t value = observed difference / standard error

Determine the degrees of freedom; for a single sample this is one less than the number of observations (n – 1)

Here the combined degrees of freedom will be (n1 – 1) + (n2 – 1) = n1 + n2 – 2

Page 87: Bio Statistics Basics

• Refer to table and find the probability of the t value corresponding to degree of freedom

• P< 0.05 states difference is significant

• P> 0.05 states difference is not significant
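The steps of the unpaired t test can be sketched as follows. Two assumptions are made here: the slides do not give the formula for the combined standard deviation, so the usual pooled estimate is used, and the two small samples are hypothetical values invented for the illustration. The final step, looking up the probability for the t value, still requires a t table or statistical software and is not shown.

```python
from math import sqrt

def unpaired_t(sample1, sample2):
    """Return (t, degrees of freedom) for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    # Sums of squared deviations from each sample mean
    ss1 = sum((x - mean1) ** 2 for x in sample1)
    ss2 = sum((x - mean2) ** 2 for x in sample2)
    df = (n1 - 1) + (n2 - 1)
    combined_sd = sqrt((ss1 + ss2) / df)       # pooled SD (assumed formula)
    se = combined_sd * sqrt(1 / n1 + 1 / n2)   # standard error of the difference
    t = (mean1 - mean2) / se                   # observed difference / SE
    return t, df

# Hypothetical measurements for two impression materials
material_a = [10, 12, 14]
material_b = [13, 15, 17]
t_value, df = unpaired_t(material_a, material_b)
print(round(t_value, 3), df)  # -1.837 4
```

The sign of t only reflects which sample mean is larger; the table lookup uses its absolute value.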

Page 88: Bio Statistics Basics

Paired t test

• It is applied to paired data of observation from one sample only.

• Used in sample less than 30

• The individual gives a pair of observation i.e. observation before and after taking a drug

• The steps involved are :

Calculate the difference in each paired observation, i.e. before and after: x1 – x2 = y

Calculate the mean of these differences = ȳ

Page 89: Bio Statistics Basics

• Calculate the SD of the differences

• Calculate SE = SD / √n

• Determine t = ȳ / SE, where ȳ is the mean of the differences

• Determine the degree of freedom.

• Since there is one sample df = n-1

• Refer to table and find the probability of the t value corresponding to degree of freedom

• P< 0.05 states difference is significant

• P> 0.05 states difference is not significant
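The paired t test steps can be sketched in the same way. The before and after readings below are hypothetical values made up for the illustration, and the function name `paired_t` is not from the slides. As before, the probability for the resulting t value has to be read from a t table.

```python
from math import sqrt

def paired_t(before, after):
    """Return (t, degrees of freedom) for paired observations."""
    diffs = [b - a for b, a in zip(before, after)]   # y = x1 - x2 for each pair
    n = len(diffs)
    mean_diff = sum(diffs) / n                       # mean of the differences
    sd = sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    se = sd / sqrt(n)                                # SE = SD / sqrt(n)
    return mean_diff / se, n - 1                     # t and df = n - 1

# Hypothetical readings before and after taking a drug
before = [140, 150, 160, 155]
after = [135, 148, 150, 150]
t_paired, df_paired = paired_t(before, after)
print(round(t_paired, 3), df_paired)  # 3.317 3
```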

Page 90: Bio Statistics Basics

Non Parametric tests

• In many biological investigations the research worker may not know the nature of the distribution or other required values of the population.

• Also, some biological measurements may not be true numerical values, hence arithmetic procedures are not possible in such cases.

• In such cases, distribution free or non parametric tests are used, in which no assumptions are made about the population parameters, e.g.
– Mann Whitney test
– Chi square test
– Phi coefficient test
– Fisher’s Exact test
– Sign Test
– Friedman's Test

Page 91: Bio Statistics Basics

Chi square test

• The chi square test, unlike the z and t tests, is a non parametric test.

• The test involves calculation of a quantity called chi square.

• Chi square is denoted by χ²

• It was developed by Karl Pearson

• The most important applications of the chi square test in medical statistics are
– Test of proportion
– Test of association
– Test of goodness of fit

Page 92: Bio Statistics Basics

• Test of proportion – Used as an alternate test to find the significance of difference in 2 or more than 2 proportions

• Test of association – To measure the probability of association between 2 discrete attributes, e.g. smoking and cancer

• Test of goodness of fit – Tests whether the observed values of a character differ from the expected value by chance or due to play of some external factor
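The slides do not spell out how the chi square quantity is computed. The standard formula is χ² = Σ (O – E)² / E, using the observed (O) and expected (E) numbers from the symbols list. The sketch below applies it as a test of association to a 2 × 2 table. The counts are hypothetical and the function name `chi_square` is chosen for this example.

```python
def chi_square(observed):
    """Chi square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected number if the two attributes were not associated
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = smokers / non smokers, columns = cancer / no cancer
observed = [[30, 20],
            [10, 40]]
chi2, chi_df = chi_square(observed)
print(round(chi2, 2), chi_df)  # 16.67 1
```

The probability for this χ² value at the given degrees of freedom is then read from a chi square table.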

Page 93: Bio Statistics Basics

Stages in performing Tests of Significance

• State the null hypothesis

• State the alternative hypothesis

• Accept or reject the null hypothesis

• Finally determine the p value

Page 94: Bio Statistics Basics

State the null hypothesis

• State the null hypothesis :

Null Hypothesis, is a hypothesis of no difference between statistics of a sample and parameter of the population or between statistics of two samples.

It nullifies the claim that the experimental result is different from or better than the one observed already

Page 95: Bio Statistics Basics

State the alternative hypothesis

• State the alternative hypothesis :

It states, that the sample result is different i.e. larger or smaller than the value of population or statistics of one sample is different from the other.

• Accept or reject the null hypothesis :

Null Hypothesis is accepted or rejected depending on whether the result falls in zone of acceptance or zone of rejection.

If the result of a sample falls in the area of mean ± 2SE the null hypothesis is accepted.

Page 96: Bio Statistics Basics

This area of normal curve is called zone of acceptance for null hypothesis.

If the result of the sample falls beyond the area of mean ± 2 SE, the null hypothesis of no difference is rejected and the alternate hypothesis accepted.

This area of normal curve is called zone of rejection for null hypothesis

• Finally determining the P value :

P value is determined using any of the previously mentioned methods.

If p > 0.05, the difference is attributed to chance and is not statistically significant, but if p < 0.05 the difference is attributed to some external factor and is statistically significant.

Page 97: Bio Statistics Basics

Probability or p value

Concept of probability is very important in statistics.

Probability is the chance of occurrence of any event or combination of events.

It is denoted by p for sample and P for population.

In various tests of significance we are often interested to know whether the observed difference between 2 samples is due to chance (sampling variation) or due to some external factor.

At this time, the probability or p value is used to decide.

Page 98: Bio Statistics Basics

P ranges from 0 to 1

0 = there is no chance that the observed difference could be due to sampling variation

1 = it is absolutely certain that observed difference between 2 samples is due to sampling variation

However such extreme values are rare.

P = 0.4 i.e. chances that the difference is due to sampling variation is 4 in 10

Page 99: Bio Statistics Basics

Obviously the chances that it is not due to sampling variation will be 6 in 10.

The essence of any test of significance is to find out p value and draw inference.

If p value is 0.05 or more

It is customary to accept that the difference is due to chance (sampling variation). The observed difference is said to be statistically not significant.

Page 100: Bio Statistics Basics

If p value is less than 0.05

Observed difference is not due to chance but due to the role of some external factors.

The observed difference here is said to be statistically significant.

Page 101: Bio Statistics Basics

Sampling

• When a large number of individuals is to be studied, it is impractical to include each and every member, as this would be time consuming, costly and laborious. So, sampling is done.

• Sampling is a process by which some units of a population are selected for study; by subjecting them to statistical computation, conclusions are drawn about the population from which these units were drawn.

Page 102: Bio Statistics Basics

• The sample taken will be a representative of entire population.

• It is sufficiently large.

• It is unbiased.

• Such sample will have its statistics almost equal to parameters of entire population.

Page 103: Bio Statistics Basics

• Two main characteristics of a representative sample are :

Precision

Unbiased character

Page 104: Bio Statistics Basics

Precision

• Precision depends on a sample size.

• Ordinarily sample size should not be less than 30.

• Precision = √n/s

• n = sample size , s = standard deviation

Page 105: Bio Statistics Basics

• Precision is directly proportional to the square root of the sample size. The greater the sample size, the greater the precision.

• Thus, to obtain precision, sample size needs to be increased
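The relationship Precision = √n / s can be illustrated numerically. The standard deviation of 5.0 and the sample sizes below are hypothetical values chosen to make the pattern easy to see.

```python
from math import sqrt

def precision(n, s):
    """Precision as defined on the slide: square root of sample size over SD."""
    return sqrt(n) / s

# Hypothetical standard deviation of 5.0, with increasing sample sizes
for n in (25, 100, 400):
    print(n, precision(n, s=5.0))
```

Because precision grows with the square root of n, the sample size has to be quadrupled to double the precision.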

Page 106: Bio Statistics Basics

Unbiased character

• The sample should be unbiased i.e. every individual should have an equal chance to be selected in the sample.

• Thus a standard random sampling method should be used.

• Non sampling errors can be taken care of by :
– Using standardized instruments and criteria
– Single, double or triple blind trials
– Use of a control group

Page 107: Bio Statistics Basics

Limitations

• Statistics has several limitations :

• It gives statistical and not substantive answers.

• The statistical conclusion refers to groups and not individuals.

• It only summarizes but does not interpret data.

 • Statistics can be misused by selective presentation of desired results.

• Computation is not an end in itself. It is a tool that can be used well or can be misused.

 

Page 108: Bio Statistics Basics

• A human must have a clear idea of what is required of the computer and must instruct it accordingly.

• The human must also be able to intelligently interpret the output from the computer. 

•  All who tinker with computers must remember the adage ‘rubbish in/rubbish out’.

Page 109: Bio Statistics Basics

Conclusion

• Health information systems are the best means of getting reliable, relevant, up to date, adequate and reasonably complete information for health managers at all levels.

• Although they are a very helpful source for collection of data, it has been very difficult to get information where it matters most, i.e. at community level.

• So, action should be taken in this direction and these systems should be used more frequently for better and clearer results, mainly in research involving large populations.

Page 110: Bio Statistics Basics

References

• K. Park. Preventive and social medicine, 20th edition. McGraw-Hill Medical; 2009. 743 – 756

• Soben Peter. Essentials of preventive and community dentistry, 2nd edition. New Delhi : Arya; 2006. 21 – 50

• B. K. Mahajan. Methods in Biostatistics for medical students and research workers, 6th edition. New Delhi : Jaypee Brothers; 2006. 1 – 39