Biostatistics 2015



Biostatistics (2015) by Woo Lendly Kitten Chui Shan

Unit 1: Introduction to Statistics

Definition

Statistics
- A body of mathematical techniques or processes for gathering, organizing, analyzing, and interpreting numerical data.
- It is also a basic tool for measurement, evaluation, and research.

Population
- The aggregate (entire group) of people, objects, or events under study.

Parameters
- Numerical values that describe a population.

Sample
- A collection of some elements of a population.

Statistics or Estimates
- Numerical values that describe a sample.

Purposes

1. Research relies on statistics to organize, summarize, and interpret the numerical data generated by researchers.

2. Statistics enables researchers to assign precise and universally accepted quantitative values to the properties of objects, people, and events.

3. Statistics provides measurement and evaluation of quantitative data.

Categories of Data

Qualitative data
- Attributes or characteristics, recorded as labels

Quantitative data
- Numerical information

Kinds of Quantitative Data

Discrete
- Whole numbers

Continuous
- Values that can include fractions and decimals

Scales or Levels of Measurement (Quantitative)

Nominal
- Labeling or categorizing

Ordinal
- Ranking

Interval
- Zero is arbitrary; it does not mean absence of the property measured

Ratio
- Zero means total absence of the property measured

Classification of Statistics

Descriptive Statistics
- Describes the characteristics of the data

Inferential Statistics
- Tests hypotheses and draws conclusions

I. Measures to condense data
- Tabular and graphical data presentation

II. Measures of central tendency
- Mean, median, mode

III. Measures of variability
- Standard deviation, variance, range, coefficient of variation

IV. Measures of location
- Percentile, quartile, decile

Parametric
- Interval, ratio

Non-parametric
- Nominal, ordinal

Measures to Condense Data

Data Presentation

A. Tabular method
- The process of presenting data in the form of a table

Guidelines

1. Every table must be self-explanatory.

2. The title should be clear and descriptive. In general, the title tells what, where, how, and when the data were taken. It is placed at the top of the table.

3. Each characteristic may be summarized and compared separately by using percentages (%) or any other appropriate procedure.

4. Each column should be properly labeled.

Parts of the table

Table heading (table number and title)

E.g. 1. UERM College of Nursing Enrollment for SY 2014-2015

Year      Male   %   Female   %   Total   %
First
Second
Third
Fourth
Total

* Box head - the column headings, which state what each column contains
* Stubs - the row labels, which give the classification
* Body - the substance (the data) of the table

B. Graphical presentation
- Presentation of data in the form of a graph or diagram

Line graph
- Description: Shows the relationship between 2 or more sets of quantities; used for time and trend
- Example: Annual population growth rate from 2005-2010

Bar graph
- Description: Compares data taken at a particular time; used for qualitative and discrete quantitative data
- Example: Top 10 morbidity/mortality in Brgy. R, 2010

Pie chart (circle graph) and component bar
- Description: Qualitative variables (at least 5 variables)
- Example: Educational attainment of the female population ages 25-60 in Brgy. R, 2010

Pictograph
- Description: Actual pictures represent values
- Example: Immunization of children ages 0-6 years in Regions I, II, III, 2010 (pictured as syringes, with 1 syringe representing 10 children immunized)

Histogram and frequency polygon
- Description: Continuous data
- Example: Age distribution in Brgy. R, 2010


Unit 2: Measures of Central Tendency, Dispersion, and Location

Measures of Central Tendency

1. Mean - The average of the data

(a) For ungrouped data

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$

where $x_1, x_2, x_3, \dots, x_n$ are a set of data and $n$ is the number of data.

(b) For grouped data

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n}$$

where $x_1, x_2, \dots, x_n$ are the class marks and $f_1, f_2, \dots, f_n$ are the corresponding frequencies.

2. Weighted Mean

$$\text{Weighted mean} = \frac{x_1 w_1 + x_2 w_2 + x_3 w_3 + \dots + x_n w_n}{w_1 + w_2 + w_3 + \dots + w_n}$$

where $w_1, w_2, w_3, \dots, w_n$ are the corresponding weights of the data.

3. Median - The middle value of the data

(a) For ungrouped data in ascending order

(i) Odd number of data
- Median = Value of the middle datum

(ii) Even number of data
- Median = The mean of the values of the two middle data

(b) For grouped data in ascending order
- Median is the value of the datum corresponding to a cumulative frequency of $\frac{n}{2}$, where $n$ is the total number of data.

4. Mode - The value that appears most often in a set of data

(a) For ungrouped data
- The datum that occurs most frequently

(b) For grouped data
- The modal class is the class with the highest frequency
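The measures of central tendency above can be checked with a short Python sketch. It uses the standard library `statistics` module for the ungrouped mean, median, and mode; the helper names `grouped_mean` and `weighted_mean` and all of the data values are assumptions made up for illustration, not part of the notes.

```python
from statistics import mean, median, multimode


def grouped_mean(class_marks, frequencies):
    """Mean of grouped data: sum(f_i * x_i) / sum(f_i)."""
    total = sum(f * x for f, x in zip(frequencies, class_marks))
    return total / sum(frequencies)


def weighted_mean(values, weights):
    """Weighted mean: sum(x_i * w_i) / sum(w_i)."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


# Hypothetical ungrouped data (odd number of data)
scores = [3, 5, 5, 7, 10]
print(mean(scores))             # 6
print(median(scores))           # 5, the middle datum
print(median([3, 5, 7, 10]))    # 6.0, the mean of the two middle data
print(multimode(scores))        # [5]

# Hypothetical grouped data: class marks with their frequencies
print(grouped_mean([5, 15, 25], [2, 5, 3]))   # 16.0

# Hypothetical weighted data: two marks with weights 3 and 1
print(weighted_mean([80, 90], [3, 1]))        # 82.5
```

`multimode` returns a list because a data set can have more than one mode.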

5. Effects on the central tendency with change in data

Add a common constant k to all data
- New Mean = Original Mean + k
- New Median = Original Median + k
- New Mode = Original Mode + k

Multiply all data by a common constant h
- New Mean = Original Mean x h
- New Median = Original Median x h
- New Mode = Original Mode x h

Insert (Remove) the largest datum
- Mean: Increase (Decrease)
- Median: Increase or unchanged (Decrease or unchanged)
- Mode: Unchanged (if it is not the largest datum)

Insert (Remove) the smallest datum
- Mean: Decrease (Increase)
- Median: Decrease or unchanged (Increase or unchanged)
- Mode: Unchanged (if it is not the smallest datum)
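A quick numerical check of these effects, as a minimal sketch. The data set and the constants `k` and `h` are hypothetical, and `central` is a helper name introduced only for this example.

```python
from statistics import mean, median, mode


def central(data):
    """Return (mean, median, mode) of a data set."""
    return mean(data), median(data), mode(data)


data = [2, 4, 4, 6, 9]   # hypothetical data
k, h = 10, 3             # hypothetical constants

print(central(data))                     # (5, 4, 4)
print(central([x + k for x in data]))    # (15, 14, 14): each measure + k
print(central([x * h for x in data]))    # (15, 12, 12): each measure x h
print(central(data + [20]))              # (7.5, 5.0, 4): new largest datum
```

In the last line the mean and median increase while the mode stays the same, because the inserted value 20 is not the mode.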

Measures of Dispersion

1. Range

(a) For ungrouped data
Range = Largest datum - Smallest datum

(b) For grouped data
Range = Highest class boundary - Lowest class boundary

2. Inter-quartile range (IQR)
IQR = Upper quartile (Q3) - Lower quartile (Q1)

3. Standard deviation (SD)

(a) For ungrouped data

$$\sigma = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n}}$$

(b) For grouped data

$$\sigma = \sqrt{\frac{f_1 (x_1 - \bar{x})^2 + f_2 (x_2 - \bar{x})^2 + \dots + f_n (x_n - \bar{x})^2}{f_1 + f_2 + \dots + f_n}}$$
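The two standard deviation formulas can be sketched in Python as follows. Both divide by the number of data, so they match `statistics.pstdev` (the population SD) rather than `statistics.stdev`, which divides by n - 1. The function names and data values are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import pstdev


def population_sd(data):
    """SD of ungrouped data: sqrt(sum((x_i - mean)^2) / n)."""
    m = sum(data) / len(data)
    return sqrt(sum((x - m) ** 2 for x in data) / len(data))


def grouped_sd(class_marks, frequencies):
    """SD of grouped data: sqrt(sum(f_i * (x_i - mean)^2) / sum(f_i))."""
    n = sum(frequencies)
    m = sum(f * x for f, x in zip(frequencies, class_marks)) / n
    squared = sum(f * (x - m) ** 2 for f, x in zip(frequencies, class_marks))
    return sqrt(squared / n)


data = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical data with mean 5
print(max(data) - min(data))        # 7, the range
print(population_sd(data))          # 2.0
print(pstdev(data))                 # 2.0, same result from the library

# Hypothetical grouped data: class marks 1, 2, 3 with frequencies 1, 2, 1
print(grouped_sd([1, 2, 3], [1, 2, 1]))   # square root of 0.5
```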

4. Effects on the dispersion with change in data

Add a common constant k to all data
- Range: Unchanged
- IQR: Unchanged
- SD: Unchanged

Multiply all data by a common constant h (h > 0; for a negative h, multiply by |h| instead)
- New Range = Original Range x h
- New IQR = Original IQR x h
- New SD = Original SD x h

Insert (Remove) the largest datum
- Range: Increase (Decrease)
- IQR: Undetermined
- SD: Undetermined

Insert (Remove) the smallest datum
- Range: Increase (Decrease)
- IQR: Undetermined
- SD: Undetermined

Insert a zero in the data set
- Range: Undetermined
- IQR: Undetermined
- SD: Undetermined
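The first two rules (adding k and multiplying by h) can be verified numerically with a small sketch. The data and constants are hypothetical, `spread` and `same` are helper names made up for this example, and the quartiles come from `statistics.quantiles` with its default method, which is only one of several quartile conventions.

```python
from math import isclose
from statistics import pstdev, quantiles


def spread(data):
    """Return (range, IQR, population SD) of a data set."""
    q1, _, q3 = quantiles(data, n=4)   # default "exclusive" method
    return max(data) - min(data), q3 - q1, pstdev(data)


def same(a, b):
    """True when two tuples of measures agree up to rounding error."""
    return all(isclose(x, y) for x, y in zip(a, b))


data = [2, 4, 4, 6, 9, 11, 15]   # hypothetical data
k, h = 10, 3                     # hypothetical constants, h > 0

base = spread(data)
shifted = spread([x + k for x in data])
scaled = spread([x * h for x in data])

print(same(shifted, base))                     # True: adding k changes nothing
print(same(scaled, [h * v for v in base]))     # True: every measure scales by h
```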

Measures of Location

- $P_i$ = one of the 99 values of a variable which divide the distribution into 100 equal parts (percentiles)
- $D_i$ = one of the 9 values of a variable which divide the distribution into 10 equal parts (deciles)
- $Q_i$ = one of the 3 values of a variable which divide the distribution into 4 equal parts (quartiles)

$$P_i = L_i + W_i \left( \frac{\frac{n \times i}{100} - cf_i}{f_i} \right)$$

where
- $P_i$ = the percentile to be computed
- $L_i$ = true lower limit, or lower boundary, of the percentile class
- $W_i$ = width of the percentile class
- $n$ = total number of observations
- $cf_i$ = cumulative frequency of the class before the percentile class
- $f_i$ = frequency of the percentile class
- Percentile class = the first class whose cumulative frequency is greater than or equal to $\frac{n \times i}{100}$

Unit 3: The Normal Distribution

1. Normal Curve
- If the frequency curve appears bell-shaped, it is a normal curve.
- The corresponding frequency distribution is called a normal distribution.

2. Properties of a normal curve
(a) The normal curve is symmetrical about the mean.
(b) The mean, median, and mode are the same.
(c) (i) About 68% of the data lie between $\bar{x} - \sigma$ and $\bar{x} + \sigma$.
(ii) About 95% of the data lie between $\bar{x} - 2\sigma$ and $\bar{x} + 2\sigma$.
(iii) About 99.7% of the data lie between $\bar{x} - 3\sigma$ and $\bar{x} + 3\sigma$.
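The percentages in property (c) can be reproduced from the standard normal distribution. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper name `share_within` is an assumption made for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1


def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

The rounded figures 68%, 95%, and 99.7% quoted in the notes are the usual approximations of these values.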

Statistical inference
- Obtaining information from a sample of data about the population from which the sample is drawn, and setting up a model to describe this population
- 2 types:
(a) Parameter estimation
- Point estimate (e.g. sample mean, median, variance, and standard deviation)
- Interval estimate (e.g. confidence interval; upper and lower limits of the range of values)
(b) Hypothesis testing

Random selection
- When a random sample is drawn from the population, every member has an equal chance of being drawn.
- If a sampling frame is available, a table of random numbers can be used to select a random sample of the needed size.
- Nonprobability samples are very unlikely to represent the population under study.

Standard Scores

The standard score $z$ of a given value $x$ from a set of data with mean $\bar{x}$ and standard deviation $\sigma$ is given by:

$$z = \frac{x - \bar{x}}{\sigma}$$

Note: The standard score can be used to compare one's performance with that of the whole class.

E.g.: The marks of Jack and Tony in an examination are 80 and 65 respectively. If the average mark and the standard deviation are 70 and 5 respectively, find the standard scores of Jack and Tony.

Sol.:

Jack's standard score $= \frac{80 - 70}{5} = 2$

Tony's standard score $= \frac{65 - 70}{5} = -1$

Application of the Normal Distribution

The following table shows Jack's marks in 3 subjects. In which subject did Jack perform best relative to the class?

                     Chinese   English   Mathematics
Jack's mark          86        65        78
Mean                 80        60        72
Standard deviation   10        5         12

Sol.:

For Chinese, standard score $= \frac{86 - 80}{10} = 0.6$

For English, standard score $= \frac{65 - 60}{5} = 1$

For Mathematics, standard score $= \frac{78 - 72}{12} = 0.5$

∴ Since 1 > 0.6 > 0.5, Jack performed best in English.
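The same comparison can be sketched in Python. The marks, means, and standard deviations are the ones given in the example; the function name `standard_score` and the dictionary layout are assumptions made for illustration.

```python
def standard_score(x, mean, sd):
    """Standard score z = (x - mean) / sd."""
    return (x - mean) / sd


# (Jack's mark, class mean, class standard deviation) for each subject
subjects = {
    "Chinese": (86, 80, 10),
    "English": (65, 60, 5),
    "Mathematics": (78, 72, 12),
}

z_scores = {name: standard_score(*values) for name, values in subjects.items()}
best = max(z_scores, key=z_scores.get)

print(z_scores)   # {'Chinese': 0.6, 'English': 1.0, 'Mathematics': 0.5}
print(best)       # English
```

The raw mark is highest in Chinese, but the standard score is highest in English, which is why standard scores are used for the comparison.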

Measures of Skewness or Symmetry
- When a variable is not symmetrical, it is skewed.
- Positive skewness means there is a pileup of cases to the left and the right tail of the distribution is too long.
- Negative skewness means there is a pileup of cases to the right and the left tail of the distribution is too long.
- Pearson's skewness coefficient:

$$\text{Skewness} = \frac{\text{Mean} - \text{Median}}{\text{SD}}$$

- If the distribution is positively skewed, the mean will be greater than the median and the coefficient will be positive.
- In general, skewness values fall between -1 and +1 SD units.
- A resulting value of 0.06 may indicate minor, not severe, skewness.
- A resulting value of -0.23 may indicate severe skewness.
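A minimal sketch of the skewness coefficient as stated above, (mean - median) / SD. The function name `pearson_skewness` and the three data sets are hypothetical, and the SD used here is the population SD (`statistics.pstdev`), which is an assumption based on the SD formula given earlier in these notes.

```python
from statistics import mean, median, pstdev


def pearson_skewness(data):
    """Skewness coefficient: (mean - median) / SD."""
    return (mean(data) - median(data)) / pstdev(data)


symmetric = [1, 2, 3, 4, 5]          # mean 3, median 3
right_tail = [1, 2, 2, 3, 12]        # mean 4, median 2: long right tail
left_tail = [1, 10, 11, 11, 12]      # mean 9, median 11: long left tail

print(pearson_skewness(symmetric))        # 0.0
print(pearson_skewness(right_tail) > 0)   # True, positively skewed
print(pearson_skewness(left_tail) < 0)    # True, negatively skewed
```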

Measures of Kurtosis or Peakedness
- Kurtosis measures whether the bell shape is too flat or too peaked.
- If the kurtosis value is a large positive number, the distribution is too peaked to be normal (leptokurtic).
- If the kurtosis value is negative, the curve is too flat to be normal (platykurtic).
- If a distribution is markedly skewed, there is no particular need to examine kurtosis because the distribution is not normal.

Central Limit Theorem

- When many samples are drawn from a population, the means of these samples tend to be normally distributed.
- If the sample means are charted along a baseline, they will follow a normal curve.
- The larger the sample, the more closely the distribution approximates the normal curve.
- The average of the sample means is very close to the actual mean of the population.
- Standard error of the mean: the new SD obtained by calculating the SD of the distribution of means, treating each mean as a raw score and applying the regular formula.
- "Error" means that, due to sampling error, each sample mean is likely to deviate from the true population mean.
- Formula for the standard error of the mean:

$$\text{Standard error of the mean} = \frac{\text{Standard deviation}}{\sqrt{n}}$$

- The formula indicates that the standard error is being estimated from the SD of a sample of size n.

E.g.: It is given that students' test scores are normally distributed. The mean and the standard deviation are 70 and 5 respectively. Find the percentage of students whose marks
(a) lie between 60 and 75
(b) are over 80

Sol.:

(a) Since $60 = 70 - 2(5) = \bar{x} - 2\sigma$ and $75 = 70 + 5 = \bar{x} + \sigma$, about 47.5% of the data (half of 95%) lie between 60 and the mean, and about 34% (half of 68%) lie between the mean and 75.

The required percentage = 47.5% + 34% = 81.5%

(b) Since $80 = 70 + 2(5) = \bar{x} + 2\sigma$, the required percentage = 50% - 47.5% = 2.5%
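As a cross-check, the standard error formula and the two percentages can be computed with the Python standard library. The function name `standard_error` and the sample size of 25 are assumptions made for illustration; the mean of 70 and SD of 5 come from the example.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(sd, n):
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / sqrt(n)


# Hypothetical sample of 25 scores with SD 5
print(standard_error(5, 25))          # 1.0

# Test scores from the example: normal with mean 70 and SD 5
scores = NormalDist(mu=70, sigma=5)
between_60_and_75 = scores.cdf(75) - scores.cdf(60)
over_80 = 1 - scores.cdf(80)

print(round(between_60_and_75 * 100, 1))   # 81.9
print(round(over_80 * 100, 1))             # 2.3
```

The computed values differ slightly from 81.5% and 2.5% because the solution uses the rounded 68% and 95% figures of the normal curve.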

Unit 4: Hypothesis Testing

Null and Alternative Hypothesis

1. Null hypothesis (Ho): the hypothesis of "no difference"
2. Alternative hypothesis: the hypothesis that the investigator believes in

Types of Error

Decision based on          True situation
statistical test           Ho is true       Ho is false
Accept Ho                  No error         Type II error
Reject Ho                  Type I error     No error

Sensitivity, Specificity, Predictive Value, and Efficiency

                   Disease state (number of persons with)
Test result        Disease            No disease
Positive           True positive      False positive
Negative           False negative     True negative

Sensitivity
- The probability that a person who has the disease tests positive

Specificity
- The probability that a person who is free of the disease tests negative

Positive predictive value
- The probability that a person who tested positive has the disease

Negative predictive value
- The probability that a person who tested negative is disease-free
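These four definitions translate directly into ratios of the cells in the table above. The sketch below is illustrative only: the function name `diagnostic_metrics` and the counts are hypothetical, not data from the notes.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute the four measures from the cells of the 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),   # positives among the diseased
        "specificity": tn / (tn + fp),   # negatives among the disease-free
        "ppv": tp / (tp + fp),           # diseased among test positives
        "npv": tn / (tn + fn),           # disease-free among test negatives
    }


# Hypothetical counts: 100 persons with the disease, 200 without
metrics = diagnostic_metrics(tp=90, fp=30, fn=10, tn=170)

print(metrics["sensitivity"])      # 0.9
print(metrics["specificity"])      # 0.85
print(metrics["ppv"])              # 0.75
print(round(metrics["npv"], 3))    # 0.944
```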

Steps in Hypothesis Testing

1. State the null and alternative hypotheses
2. Set the level of significance
3. Identify the test statistic to be used
4. Make the statistical decision
5. Draw the conclusion

One-Tailed and Two-Tailed Tests

One-tailed test
- A directional test with the region of rejection lying on either the left or the right tail of the normal curve

(i) Right directional test
- The region of rejection is on the right tail.
- Used when the alternative hypothesis uses comparatives such as greater than, higher than, better than, superior to, exceeds, etc.

(ii) Left directional test
- The region of rejection is on the left tail.
- Used when the alternative hypothesis uses comparatives such as less than, smaller than, inferior to, lower than, below, etc.

Two-tailed test
- A non-directional test with the region of rejection lying on both tails
- Used when the alternative hypothesis uses words such as not equal to, significantly different, etc.

Unit 5: Sample Size Determination

Factors in the Determination of Sample Size
- Precision of the parameter being studied
- Reliability of the estimate
- Standard error of the estimates of the parameter

Estimating the average value for a continuous variable:

$$n = \frac{z_{\alpha/2}^{2} \, S^{2}}{e^{2}}$$

where
- $n$ = required sample size
- $z_{\alpha/2}^{2}$ = square of the standard normal value for the chosen confidence level
- $S^{2}$ = population variance
- $e$ = absolute precision deemed acceptable by the researcher
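The sample size formula can be sketched in Python as follows. The function name `sample_size_for_mean`, the default 95% confidence level, and the example values are assumptions made for illustration. The result is rounded up because a sample size must be a whole number.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sd, precision, confidence=0.95):
    """n = z^2 * S^2 / e^2, rounded up to the next whole number."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95% confidence
    return ceil(z ** 2 * sd ** 2 / precision ** 2)


# Hypothetical study: population SD of 10 units, acceptable precision of 2 units
print(sample_size_for_mean(sd=10, precision=2))                    # 97
print(sample_size_for_mean(sd=10, precision=2, confidence=0.99))   # 166
```

A higher confidence level or a tighter precision both increase the required sample size.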