Estimation of authenticity of results of statistical research (part II)


Biostatistics

Commonly, the word statistics means arranging data into charts, tables and graphs, together with computing various descriptive numbers about the data. This is the part of statistics called descriptive statistics, but it is not the most important part.


Why Do Statistics?

• To extrapolate from the data collected and make general conclusions about the larger population from which the data sample was derived
• To allow general conclusions to be made from limited amounts of data

To do this, we must assume that all data are randomly sampled from an infinitely large population; we then analyse this sample and use the results to make inferences about the population.


The most important part

The most important part is concerned with reasoning in an environment where one doesn’t know, or can’t know, all of the facts needed to reach conclusions with complete certainty. One deals with judgments and decisions in situations of incomplete information. In this introduction we will give an overview of statistics along with an outline of the various topics in this course.


Basic criteria of authenticity (representation):

• Error of representativeness (m)
• Confidence limits
• Coefficient of authenticity (Student's criterion, t): the authenticity of a difference between mean or relative values


Basic criteria of authenticity (representation):

The error of representativeness (m) is a measure of the authenticity of an average or relative value: it shows how much the results of a sample study differ from the results that could be obtained from a complete (continuous) study of the general population.
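As a minimal sketch, the error of representativeness can be computed with the usual standard-error formulas: m = σ/√n for a mean, and m = √(P·q/n) for a relative value P given in percent, where q = 100 − P. The function names and the sample figures below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math


def error_of_mean(sd: float, n: int) -> float:
    """Error of representativeness of a mean: m = sigma / sqrt(n)."""
    return sd / math.sqrt(n)


def error_of_relative_value(p_percent: float, n: int) -> float:
    """Error of a relative value given in percent: m = sqrt(P * q / n), q = 100 - P."""
    q = 100.0 - p_percent
    return math.sqrt(p_percent * q / n)


# Hypothetical figures, for illustration only:
# a mean with standard deviation 12 measured in 144 patients,
# and a relative value of 20% observed in 400 patients.
print(error_of_mean(12.0, 144))            # 1.0
print(error_of_relative_value(20.0, 400))  # 2.0
```

Both errors shrink as n grows, which is why larger samples give more authentic estimates.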


Basic criteria of authenticity (representation):

Confidence limits: when the properties of a sample are extended to the general population, the confidence limits show the probable range of oscillation of the index in the general population, that is, the minimum and maximum values within which the population value is likely to lie.
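A sketch of how confidence limits are usually built from an estimate and its error of representativeness: estimate ± t·m. It assumes a large sample, so the normal-approximation multiplier t ≈ 1.96 gives about 95% confidence (small samples would need Student's t for the appropriate degrees of freedom). The function name and the figures are hypothetical.

```python
def confidence_limits(estimate: float, m: float, t: float = 1.96) -> tuple[float, float]:
    """Lower and upper confidence limits: estimate - t*m and estimate + t*m.

    t = 1.96 gives roughly 95% confidence for large samples (normal approximation);
    t = 2.58 gives roughly 99%.
    """
    return estimate - t * m, estimate + t * m


# Hypothetical sample mean of 120 with error of representativeness m = 1.0
low, high = confidence_limits(120.0, 1.0)
print(f"95% limits: {low:.2f} to {high:.2f}")  # 95% limits: 118.04 to 121.96
```

Asking for more confidence (a larger t) widens the limits; a larger sample (a smaller m) narrows them.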


Basic criteria of authenticity (representation):

The coefficient of authenticity (Student's criterion, t) evaluates the authenticity of a difference between mean or relative values. Student's criterion compares the corresponding indices in two separate samples.
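A minimal sketch of the large-sample form of the criterion for two independent samples, t = |M1 − M2| / √(m1² + m2²), where m1 and m2 are the errors of representativeness. The 1.96 cut-off used below is the normal-approximation threshold for P ≈ 0.05 and assumes large samples; the group figures are hypothetical.

```python
import math


def student_t(mean1: float, m1: float, mean2: float, m2: float) -> float:
    """t = |M1 - M2| / sqrt(m1^2 + m2^2) for two independent samples."""
    return abs(mean1 - mean2) / math.sqrt(m1 ** 2 + m2 ** 2)


# Hypothetical groups: means 128 and 120, errors of representativeness 3.0 and 4.0
t = student_t(128.0, 3.0, 120.0, 4.0)
print(t)  # 1.6
# For large samples, t >= 1.96 corresponds roughly to P <= 0.05; 1.6 falls short of it.
print("significant" if t >= 1.96 else "not significant")  # not significant
```

The same function works for relative values (percentages) when their errors are computed with √(P·q/n).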


Measuring the Occurrence of Disease

Counting

Comparisons

Inference

Action

Cases and populations

Measurement

Risk

Methods: descriptive and analytic

Association and causality

Generalisability

Clinical/health policy

Further research


Descriptive statistics: concerned with summarising or describing a sample, e.g. mean, median.

Inferential statistics: concerned with generalising from a sample to make estimates and inferences about a wider population, e.g. t-test, chi-square test.


Meaning of P

P value: the probability of observing a result as extreme as, or more extreme than, the one actually observed, from chance alone (that is, if the null hypothesis were true).

It lets us decide whether or not to reject the null hypothesis.

• P > 0.05: Not significant
• P = 0.01 to 0.05: Significant
• P = 0.001 to 0.01: Very significant
• P < 0.001: Extremely significant
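The cut-offs above can be written as a small lookup. This is a sketch: the slide's ranges overlap at 0.01 and 0.001, so which label a boundary value gets is an assumption made here (0.01 counts as "Significant", 0.001 as "Very significant"). The function name is hypothetical.

```python
def describe_p(p: float) -> str:
    """Verbal label for a P value using the cut-offs listed above.

    Boundary handling at 0.01 and 0.001 is an assumption; the slide's ranges overlap there.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("P must lie between 0 and 1")
    if p > 0.05:
        return "Not significant"
    if p >= 0.01:
        return "Significant"
    if p >= 0.001:
        return "Very significant"
    return "Extremely significant"


print(describe_p(0.39))    # Not significant
print(describe_p(0.03))    # Significant
print(describe_p(0.0004))  # Extremely significant
```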


Epidemiological Measurements

• Rates, ratios and proportions
• Incidence rates
• Prevalence rates
• Mortality rates
• Fatality rates
• Infection rates


Ratios

A ratio expresses the relationship between two numbers in the form x:y or x/y.


Ratios

1. The ratio of male to female births in the United States in 1979 was 1,791,000 : 1,703,000 or 1.052:1.

2. Sex ratio:

$$\text{sex ratio} = \frac{\text{number of live-born males}}{\text{number of live-born females}}$$
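The 1979 U.S. figures quoted in example 1 can be checked directly; this is just the sex-ratio formula applied to those two numbers.

```python
male_births = 1_791_000    # U.S. male live births, 1979 (from example 1)
female_births = 1_703_000  # U.S. female live births, 1979

sex_ratio = male_births / female_births
print(f"{sex_ratio:.3f}:1")  # 1.052:1
```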


Proportions

A proportion is a specific type of ratio in which the numerator is included in the denominator; the resulting value is often expressed as a percentage.

For example, the proportion of all births that were male is:

$$\frac{\text{male births}}{\text{male births} + \text{female births}} = \frac{179 \times 10^4}{(179 + 170) \times 10^4} = 51.3\%$$
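Using the unrounded 1979 birth counts from the ratio example gives the same 51.3%. Unlike the sex ratio, the numerator (male births) is part of the denominator here.

```python
male_births = 1_791_000
female_births = 1_703_000

# Proportion: the numerator is included in the denominator
proportion_male = male_births / (male_births + female_births) * 100
print(f"{proportion_male:.1f}%")  # 51.3%
```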


The proportion of male students in the current class is ____%.


[Figure: percentage of males and females, by ethnic group]

Group    Male, %   Female, %   n
Han      51.41     48.39       41640
Hui      51.33     48.67        4736
Uygur    51.07     48.93        6362
Kazak    48.70     51.30        2770


Proportion of overweight among children aged 7-18 years, Urumqi, 2003

[Figure: prevalence (%) of obesity and overweight among girls, by ethnic group (Han, Hui, Uygur, Kazak)]


Rates

A rate measures the occurrence of particular events in a population during a given time period.

Particular events: development of disease or occurrence of death.


Rates are defined as follows:

$$\text{rate} = \frac{\text{number of events in a specified period}}{\text{population at risk of these events in the specified period}} \times K$$

where K = 100 (%), 1000 (‰), …
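A minimal sketch of the rate formula, with the constant multiplier K as a parameter. The function name and the death and population counts are hypothetical.

```python
def rate(events: int, population_at_risk: int, k: int = 1000) -> float:
    """Rate = events in a specified period / population at risk in that period x K."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    return events / population_at_risk * k


# Hypothetical: 45 deaths during one year in a population of 30,000 at risk
print(f"{rate(45, 30_000):.1f} per 1,000")             # 1.5 per 1,000
print(f"{rate(45, 30_000, 100_000):.1f} per 100,000")  # 150.0 per 100,000
```

Changing K only rescales the result; it is chosen so the rate is a convenient whole-ish number.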


Five components of a rate:

1. Numerator: the number of events (people or episodes)
2. Denominator: the population at risk of the events
3. Place specification
4. Time specification (at a given time)
5. Constant multiplier K


What a rate is

The rate is the measure that most clearly expresses the probability or risk of disease in a defined population over a specified period of time.

In a rate, the numerator is part of the denominator.


What does a rate tell us?

Rates tell us how fast the disease is occurring in a population.

Proportions tell us what fraction of the population is affected.


For example, the death rate from cancer in the United States in 1980 was 186.3 per 100,000 population, calculated as:

$$\frac{\text{deaths from cancer among U.S. residents in 1980}}{\text{U.S. population in 1980}} \times 100{,}000$$


Incidence Rates

Incidence is defined as the number of new cases of a disease that occur during a specified period of time in a population at risk for developing the disease.


1. Time of onset and the numerator


3. Specification of the denominator

The denominator is the population at risk, usually taken as the average population. We can get this number in two ways:

• (population on 12.31 of last year + population on 12.31 of this year)/2
• mid-year population: the population at 24:00 on 7.31
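A sketch of the first way of obtaining the average population, fed into an incidence rate. It treats the average population as the population at risk, which is an approximation (existing cases are not subtracted). All counts and function names are hypothetical.

```python
def average_population(end_of_last_year: int, end_of_this_year: int) -> float:
    """Average population = (population on 12.31 last year + on 12.31 this year) / 2."""
    return (end_of_last_year + end_of_this_year) / 2


def incidence_rate(new_cases: int, population_at_risk: float, k: int = 100_000) -> float:
    """Incidence = new cases in the period / population at risk x K."""
    return new_cases / population_at_risk * k


# Hypothetical town: 98,000 people on 12.31 last year, 102,000 on 12.31 this year,
# and 60 new cases diagnosed during this year.
avg = average_population(98_000, 102_000)
print(avg)                                           # 100000.0
print(f"{incidence_rate(60, avg):.1f} per 100,000")  # 60.0 per 100,000
```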


Prevalence Rates

Prevalence measures the number of people in a population who have the disease at a given time.

Types: point prevalence and period prevalence.


Formula:

$$\text{prevalence} = \frac{\text{number of existing cases of a disease at a point in time}}{\text{total population}} \times K$$


5 points

1. Numerator: existing cases, i.e. everyone currently affected, including both new and old cases. No matter when a person got the disease, if he or she has it at the time of the study, that person is counted in the numerator.


2. Denominator: the total population, not the population at risk.


3. A point in time

In a prevalence survey, the time should be very short: generally no more than 1 month, such as 1 or 2 weeks (point prevalence).
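The points above can be contrasted with incidence in a small sketch. Point prevalence counts every existing case (new and old) on one day and divides by the total population; incidence, computed here as a cumulative proportion, counts only new cases and divides by those free of disease at the start. The `Person` record, the day numbering and the ten-person population are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    onset_day: Optional[int] = None     # day the disease began; None if never ill
    recovery_day: Optional[int] = None  # day of recovery; None if still ill


def is_case(person: Person, day: int) -> bool:
    """True if the person has the disease on `day`, however long ago it began."""
    if person.onset_day is None or person.onset_day > day:
        return False
    return person.recovery_day is None or person.recovery_day > day


def point_prevalence(people: list[Person], day: int, k: int = 100) -> float:
    """Existing cases (new and old) on `day` / total population x K."""
    return sum(is_case(p, day) for p in people) / len(people) * k


def incidence(people: list[Person], start: int, end: int, k: int = 100) -> float:
    """New cases with onset in (start, end] / population at risk on `start`."""
    at_risk = [p for p in people if not is_case(p, start)]
    new_cases = sum(1 for p in at_risk
                    if p.onset_day is not None and start < p.onset_day <= end)
    return new_cases / len(at_risk) * k


# Hypothetical population of 10 people, followed from day 0 to day 30
people = (
    [Person(onset_day=-100)] * 3                    # old cases, still ill
    + [Person(onset_day=10), Person(onset_day=20)]  # new cases, still ill
    + [Person(onset_day=5, recovery_day=15)]        # new case, recovered
    + [Person()] * 4                                # never ill
)
print(f"point prevalence on day 30: {point_prevalence(people, 30):.1f}%")  # 50.0%
print(f"incidence over days 1-30: {incidence(people, 0, 30):.1f}%")        # 42.9%
```

The recovered person counts towards incidence but not towards prevalence on day 30, while the three old cases count towards prevalence but not incidence.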


The coefficient of variation is a relative measure of variability: it is the standard deviation expressed as a percentage of the arithmetic mean.
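A minimal sketch of CV = standard deviation / mean × 100%. It uses the sample standard deviation (divisor n − 1) from Python's `statistics` module, which is one common choice; the height values are hypothetical.

```python
import statistics


def coefficient_of_variation(values: list[float]) -> float:
    """CV = standard deviation / arithmetic mean x 100%."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divisor n - 1)
    return sd / mean * 100


heights = [160, 165, 170, 175, 180]  # hypothetical heights, cm
print(f"{coefficient_of_variation(heights):.1f}%")  # 4.7%
```

Because it is unit-free, the CV lets you compare the spread of variables measured in different units (for example, height in cm and weight in kg).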


Terms Used To Describe The Quality Of Measurements

Reliability is the variability between subjects divided by the sum of the between-subject variability and the measurement error.

Validity refers to the extent to which a test or surrogate is measuring what we think it is measuring.


Measures Of Diagnostic Test Accuracy

Sensitivity is defined as the ability of the test to identify correctly those who have the disease.

Specificity is defined as the ability of the test to identify correctly those who do not have the disease.

Predictive values are important for assessing how useful a test will be in the clinical setting at the individual patient level. The positive predictive value is the probability of disease in a patient with a positive test. Conversely, the negative predictive value is the probability that the patient does not have disease if he has a negative test result.

The likelihood ratio indicates how much a given diagnostic test result will raise or lower the odds of having a disease relative to the pre-test odds of disease.
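The measures above can all be read off a 2×2 table of test result against true disease status. The sketch below uses hypothetical counts (100 diseased, 400 healthy) and also shows the likelihood ratio converting pre-test odds into post-test odds; the post-test probability after a positive result comes out equal to the positive predictive value, as it should.

```python
def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Accuracy measures from a 2x2 table of test result versus true disease status."""
    sensitivity = tp / (tp + fn)  # diseased people correctly testing positive
    specificity = tn / (tn + fp)  # healthy people correctly testing negative
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # P(disease | positive test)
        "npv": tn / (tn + fn),  # P(no disease | negative test)
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical table: 100 diseased (90 test positive), 400 healthy (360 test negative)
acc = diagnostic_accuracy(tp=90, fp=40, fn=10, tn=360)
for name, value in acc.items():
    print(f"{name}: {value:.3f}")

# The likelihood ratio turns pre-test odds into post-test odds.
pretest_prob = 100 / 500
pretest_odds = pretest_prob / (1 - pretest_prob)
posttest_odds = pretest_odds * acc["lr_positive"]
posttest_prob = posttest_odds / (1 + posttest_odds)
print(f"post-test probability after a positive result: {posttest_prob:.3f}")  # 0.692
```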



Expressions Used When Making Inferences About Data

Confidence intervals: the results of any study sample are an estimate of the true value in the entire population. The true value may actually be greater or less than what is observed.

Type I error (alpha) is the probability of incorrectly concluding there is a statistically significant difference in the population when none exists.

Type II error (beta) is the probability of incorrectly concluding that there is no statistically significant difference in a population when one exists.

Power (1 − β) is a measure of the ability of a study to detect a true difference.


Multivariable Regression Methods

Multiple linear regression is used when the outcome data is a continuous variable such as weight. For example, one could estimate the effect of a diet on weight after adjusting for the effect of confounders such as smoking status.

Logistic regression is used when the outcome data is binary such as cure or no cure. Logistic regression can be used to estimate the effect of an exposure on a binary outcome after adjusting for confounders.


Survival Analysis

Kaplan-Meier analysis estimates the proportion of subjects surviving (or remaining free of the event) among those still at risk. Every time a subject has an event, this proportion is recalculated and multiplied into the cumulative survival probability. The resulting estimates are used to draw a curve that graphically depicts the probability of survival over time.

Cox proportional hazards analysis is similar to the logistic regression method described above with the added advantage that it accounts for time to a binary event in the outcome variable. Thus, one can account for variation in follow-up time among subjects.
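A minimal sketch of the Kaplan-Meier (product-limit) estimator described above, in plain Python. Censored subjects (event flag 0) leave the risk set without lowering the curve; by the usual convention, a subject censored at an event time still counts as at risk at that time. The follow-up data are hypothetical.

```python
def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Product-limit estimate of the survival curve.

    times[i] is the follow-up time of subject i; events[i] is 1 if the subject had
    the event at that time and 0 if the subject was censored (left the study event-free).
    Returns (time, estimated survival probability) at each time with at least one event.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects with the same time; censored ones still count as at risk at t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= (at_risk - deaths) / at_risk  # recalculated at every event
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up (months) for six subjects; 0 marks a censored subject
for t, s in kaplan_meier([2, 3, 3, 5, 6, 8], [1, 1, 0, 1, 0, 1]):
    print(f"month {t}: S = {s:.3f}")
```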

[Figure: Kaplan-Meier survival curves]


Why Use Statistics?

[Figure: cardiovascular mortality in males: standardised mortality ratio (SMR) by age group, '35-'44 to '75-'84, Bangor vs Roseto]


Descriptive Statistics

• Identifies patterns in the data
• Identifies outliers
• Guides the choice of statistical test


Percentage of Specimens Testing Positive for RSV (respiratory syncytial virus)

Region      Jul Aug Sep Oct Nov Dec Jan Feb Mar Apr May Jun
South         2   2   5   7  20  30  15  20  15   8   4   3
North-east    2   3   5   3  12  28  22  28  22  20  10   9
West          2   2   3   3   5   8  25  27  25  22  15  12
Mid-west      2   2   3   2   4  12  12  12  10  19  15   8
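As a small descriptive-statistics exercise on the RSV table, the sketch below finds each region's peak month(s) and median percentage, which makes the regional difference in timing easy to see. Ties are kept, since the North-east peaks equally in December and February. The dictionary layout and function name are illustrative choices.

```python
import statistics

MONTHS = ["Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
          "Jan", "Feb", "Mar", "Apr", "May", "Jun"]

# Percentage of specimens testing positive for RSV, by region (from the RSV table)
RSV = {
    "South":      [2, 2, 5, 7, 20, 30, 15, 20, 15, 8, 4, 3],
    "North-east": [2, 3, 5, 3, 12, 28, 22, 28, 22, 20, 10, 9],
    "West":       [2, 2, 3, 3, 5, 8, 25, 27, 25, 22, 15, 12],
    "Mid-west":   [2, 2, 3, 2, 4, 12, 12, 12, 10, 19, 15, 8],
}


def peak_months(values: list[int]) -> list[str]:
    """All months in which the percentage reaches its maximum (ties kept)."""
    top = max(values)
    return [month for month, v in zip(MONTHS, values) if v == top]


for region, values in RSV.items():
    print(f"{region}: peak {max(values)}% in {', '.join(peak_months(values))}; "
          f"median {statistics.median(values)}%")
```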


Descriptive Statistics

Percentage of Specimens Testing Positive for RSV, 1998-99

[Figure: percentage of specimens testing positive for RSV, by region (South, Northeast, West, Midwest)]


Distribution of Course Grades

[Figure: number of students by grade (A, A-, B+, B, B-, C+, C, C-, D+, D, D-, F)]


SAMPLING AND ESTIMATION

Let us take Louis Harris and Associates as an example.

It conducts polls on various topics, either face-to-face, by telephone, or over the internet. In one survey on health trends of adult Americans, conducted in 1991, it contacted 1,256 randomly selected adults by phone and asked them questions about diet, stress management, seat belt use, etc.


One of the questions asked was “Do you try hard to avoid too much fat in your diet?” They reported that 57% of the people responded YES to this question, which was a 2% increase from a similar survey conducted in 1983. The article stated that the margin of error of the study was plus or minus 3%.
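The quoted ±3% can be roughly reproduced with the usual large-sample formula for a proportion, 1.96 × √(p(1 − p)/n). This assumes a simple random sample and 95% confidence; the survey's own method for the margin is not stated, so this is only a consistency check.

```python
import math

n = 1256  # adults contacted
p = 0.57  # sample proportion answering YES

se = math.sqrt(p * (1 - p) / n)  # standard error of a sample proportion
margin = 1.96 * se               # half-width of an approximate 95% interval
print(f"margin of error: ±{margin * 100:.1f} percentage points")  # about ±2.7
```

Rounded up, this matches the reported margin of about 3 percentage points.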


This is an example of an inference made from incomplete information. The group under study in this survey is the collection of adult Americans, which consists of more than 200 million people. This is called the population.


The people or things in a population are called units. If the units are people, they are sometimes called subjects. A characteristic of a unit (such as a person’s weight, eye color, or the response to a Harris Poll question) is called a variable.


If a variable has only two possible values (such as the response to a YES or NO question, or a person’s sex), it is called a dichotomous variable. If a variable assigns one of several categories to each individual (such as a person’s blood type or hair color), it is called a categorical variable. And if a variable assigns a number to each individual (such as a person’s age, family size, or weight), it is called a quantitative variable.


A number derived from a sample is called a statistic, whereas a number derived from the population is called a parameter.


Parameters are usually denoted by Greek letters, such as π for the population percentage of a dichotomous variable, or μ for the population mean of a quantitative variable. For the Harris study, the sample percentage p = 57% is a statistic. It is not the (unknown) population percentage π, which is the percentage we would obtain if it were possible to ask the same question of the entire population.


Inferences we make about a population based on facts derived from a sample are uncertain. The statistic p is not the same as the parameter π. In fact, if the study had been repeated, even if it had been done at about the same time and in the same way, it most likely would have produced a different value of p, whereas π would still be the same. The Harris study acknowledges this variability by mentioning a margin of error of ± 3%.


SIMULATION

Consider a box containing chips or cards, each of which is numbered either 0 or 1. We want to take a sample from this box in order to estimate the percentage of the cards that are numbered with a 1. The population in this case is the box of cards, which we will call the population box. The percentage of cards in the box that are numbered with a 1 is the parameter π.


In the Harris study the parameter π is unknown. Here, however, in order to see how samples behave, we will make our model with a known percentage of cards numbered with a 1, say π = 60%. At the same time we will estimate π, pretending that we don’t know its value, by examining 25 cards in the box.


We take a simple random sample with replacement of 25 cards from the box as follows. Mix the box of cards; choose one at random; record it; replace it; and then repeat the procedure until we have recorded the numbers on 25 cards. Although survey samples are not generally drawn with replacement, our simulation simplifies the analysis because the box remains unchanged between draws; so, after examining each card, the chance of drawing a card numbered 1 on the following draw is the same as it was for the previous draw, in this case a 60% chance.
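The card-drawing experiment can be simulated. The sketch below builds a box of 100 cards with 60 numbered 1 (π = 60%), draws 25 cards with replacement using `random.Random.choices`, and repeats this 10,000 times. The sample percentages centre on 60% and spread out by about √(60 × 40 / 25) ≈ 9.8 percentage points. The box size, seed and number of repetitions are arbitrary choices.

```python
import random
import statistics

# Population box: 100 cards, 60 numbered 1 and 40 numbered 0 (pi = 60%)
box = [1] * 60 + [0] * 40


def sample_percentage(rng: random.Random, n: int = 25) -> float:
    """Draw n cards with replacement and return the percentage numbered 1."""
    draws = rng.choices(box, k=n)  # choices() samples with replacement
    return 100 * sum(draws) / n


rng = random.Random(1991)  # fixed seed, arbitrary, for reproducibility
results = [sample_percentage(rng) for _ in range(10_000)]
print(f"average of p over 10,000 samples: {statistics.mean(results):.1f}%")
print(f"spread (SD) of p: {statistics.pstdev(results):.1f}%")
print(f"theory: sqrt(60 * 40 / 25) = {(60 * 40 / 25) ** 0.5:.1f}%")  # 9.8%
```

Each single sample can miss π by ten points or more, which is exactly the uncertainty that a margin of error describes.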


Reducing Sample Size

Same results but using a much smaller sample size (one tenth):

           ALIVE         DEAD         TOTAL        % DEAD
PLACEBO    58 (69.2%)    26 (30.8%)   84 (100%)    30.8
TREATMENT  64 (75.3%)    21 (24.7%)   85 (100%)    24.7
TOTAL      122 (72.2%)   47 (27.8%)   169 (100%)

Reduction in death rate = 6.1 percentage points (still the same). Performing a chi-square test gives P = 0.39: if there were no real difference between the groups, a difference in mortality at least this large would be expected by chance about 39 times in 100, so the result is not significant.
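A sketch of a Pearson chi-square test on the reduced-sample counts (placebo: 58 alive, 26 dead; treatment: 64 alive, 21 dead), using the 2×2 shortcut formula without continuity correction and `math.erfc` for the 1-degree-of-freedom tail. The exact P depends on the test variant (for example, with or without Yates's correction), so it need not match the slide's figure to the second decimal; either way P is far above 0.05.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi-square statistic, P value) with 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Rows: placebo, treatment; columns: alive, dead
chi2, p = chi_square_2x2(58, 26, 64, 21)
print(f"chi-square = {chi2:.2f}, P = {p:.2f}")
```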

Again, the power of a study to find a difference depends heavily on sample size, for binary data as well as for continuous data.
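To make the point about power concrete, the sketch below uses a common normal approximation for comparing two proportions: power ≈ Φ(|p1 − p2| / SE − z₁₋α/₂), ignoring the negligible opposite tail. It plugs in the observed death rates (30.8% and 24.7%) for the reduced sample and for a hypothetical trial ten times larger (840 and 850 patients). This is one textbook approximation, not the method used in the original study.

```python
from statistics import NormalDist


def approximate_power(p1: float, p2: float, n1: int, n2: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided test comparing two proportions."""
    se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p1 - p2) / se - z_crit)


small = approximate_power(0.308, 0.247, 84, 85)
large = approximate_power(0.308, 0.247, 840, 850)  # hypothetical: ten times the size
print(f"power with 84 vs 85 patients:   {small:.2f}")
print(f"power with 840 vs 850 patients: {large:.2f}")
```

With the small sample, a real 6-point difference would be detected only about one time in seven; with ten times as many patients, the chance rises to about 80%.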


ERROR ANALYSIS

On repetition of such an experiment one will typically obtain a different measurement or observation. So, if the Harris poll were to be repeated, the new statistic would very likely differ slightly from 57%. Each repetition is called an execution or trial of the experiment.


The root mean square (RMS) is a more conservative measure of the typical size of the random sampling errors than the mean absolute value (MA), in the sense that MA ≤ RMS.
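A short sketch comparing the two summaries on a hypothetical set of sampling errors. Squaring weights large errors more heavily, which is why RMS is never smaller than MA.

```python
import math


def mean_absolute(errors: list[float]) -> float:
    """MA: average size of the errors, ignoring sign."""
    return sum(abs(e) for e in errors) / len(errors)


def root_mean_square(errors: list[float]) -> float:
    """RMS: square root of the average squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


errors = [-3.0, 1.0, 2.0, -4.0, 0.0]  # hypothetical sampling errors, in percentage points
print(mean_absolute(errors))               # 2.0
print(round(root_mean_square(errors), 3))  # 2.449
```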


For a given experiment, the RMS of all possible random sampling errors is called the standard error (SE). For example, whenever we use a random sample of size n and its percentage p to estimate the population percentage π (sampling with replacement), we have

$$SE(p) = \sqrt{\frac{\pi(100 - \pi)}{n}}$$


Summary

• Size matters: BIGGER IS BETTER
• Spread matters: SMALLER IS BETTER
• Bigger difference: EASIER TO FIND
• Smaller difference: MORE DIFFICULT TO FIND
• To find a small difference you need a big study