Umeapresjr

Post on 29-May-2015


What is quantified in quantitative research, and how do you publish the findings?

Jonas Ranstam PhD

Question: What is science?

Answer: Science is generalizable knowledge.¹

generalizable = reproducible and predictive

1. US National Science Foundation

One major problem:

Sampling uncertainty

Plan

1. Medical research and uncertainty

2. The consequence of study design

3. Publishing uncertain results

1. Medical research and uncertainty

Anecdotal evidence (case reports)

Cohort study of smoking and lung cancer (1954)(Bradford Hill)

Case-control study of smoking and lung cancer (1950)(Bradford Hill)

Randomised clinical trial of streptomycin and tuberculosis (1948)(Bradford Hill)

Evaluation of sampling uncertainty

What is sampling uncertainty?

Observed sample: 354 consecutive patients with hip fracture treated at the Department of Orthopedics, Umeå University Hospital.

Unobserved population: all potential hip fracture patients all over the world, now, earlier and in the future.

Observed sample

Another observed sample

A third observed sample

Now consider a laboratory experiment.

To what population does Experiment A belong?

The mother of all possible repetitions of Experiment A:

Experiment A, Experiment A, Experiment A, Experiment A, Experiment A, ...

Sampling variability: the results of the repetitions vary around the population mean μ.

What is the sampling variability of these experiments?

Observed sampling variability after thousands of experiments: a distribution of results centered on μ.

Sampling uncertainty?

After thousands of repetitions of Experiment A, the spread of the results can be described by their standard deviation.

Do we need to repeat each experiment thousands of times?

No. From a single sample of n observations, the standard error of the mean, SEM = SD/√n, estimates this spread, and mean ±1.96 SEM gives an interval covering μ with 95% confidence.

Sampling uncertainty

Can we say anything about sampling uncertainty if only one experiment is performed?
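Yes, and a short simulation suggests why. The sketch below (illustrative values; `MU`, `SIGMA`, `N` and the number of repetitions are assumptions, not from the talk) compares the observed spread of thousands of repeated experiment means with the SEM estimated from one single experiment:

```python
import random
import statistics

random.seed(1)
MU, SIGMA, N = 50, 10, 25  # assumed population mean, SD, and sample size

# The "mother of all repetitions": thousands of experiments.
means = [statistics.mean(random.gauss(MU, SIGMA) for _ in range(N))
         for _ in range(5000)]
observed_spread = statistics.stdev(means)

# A single experiment: estimate the same spread as SEM = SD/sqrt(n).
sample = [random.gauss(MU, SIGMA) for _ in range(N)]
sem = statistics.stdev(sample) / N ** 0.5

print(round(observed_spread, 2), round(sem, 2))  # both close to 10/sqrt(25) = 2
```

Both numbers approximate σ/√n = 10/√25 = 2, which is why one experiment suffices to quantify sampling uncertainty.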

Now consider a ranking of hospitals

Hospital A Hospital B Hospital C Hospital D Hospital E

Do different ranks in league tables represent differences in “hospital quality”?

Or do the differences just reflect sampling variation?

Sampling variability: consider the mother of all possible repetitions of Hospital A (Hospital A, Hospital A, Hospital A, ...), with outcomes varying around a mean μ.

Sampling variability?

Hospital A, Hospital B, Hospital C, Hospital D, Hospital E with ICC ≈ 1.0: the between-hospital differences are large relative to the sampling variation.

Hospital A, Hospital B, Hospital C, Hospital D, Hospital E with ICC = 0: the apparent differences reflect sampling variation alone.

It depends on the degree of uncertainty!
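The contrast can be illustrated numerically. Below is a sketch of a one-way intraclass correlation calculation on invented hospital data (the standard one-way mean-squares estimator; all numbers are made up for illustration):

```python
import random
import statistics

random.seed(3)

def icc1(groups):
    """One-way intraclass correlation from equally sized groups."""
    n, k = len(groups), len(groups[0])
    grand = statistics.mean(v for g in groups for v in g)
    msb = k * sum((statistics.mean(g) - grand) ** 2 for g in groups) / (n - 1)
    msw = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Five hospitals, 50 patients each, true means far apart: ICC near 1,
# so ranks reflect real differences.
distinct = [[random.gauss(m, 3) for _ in range(50)] for m in (10, 20, 30, 40, 50)]
# Five hospitals drawn from one and the same distribution: ICC near 0,
# so rank differences are sampling noise.
alike = [[random.gauss(30, 3) for _ in range(50)] for _ in range(5)]

print(round(icc1(distinct), 2), round(icc1(alike), 2))
```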

What is the difference between quantitative and qualitative science?

(sampling uncertainty)

Qualitative research

Sampling uncertainty is irrelevant for the generalization

Quantitative research

Generalization requires quantification of sampling uncertainty

Qualitative research: 100% of all crows are black

One white crow is sufficient to refute the statement.

Quantitative research: 99% of all crows are black

All crows cannot be studied simultaneously, but the proportion of black crows can be estimated from a random sample of crows.

Samples are characterized by sampling uncertainty. This must be quantified to assess the empirical support of the findings.
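That estimation step can be sketched in a few lines, using the normal-approximation interval p̂ ± 1.96·√(p̂(1−p̂)/n) (the crow counts below are invented):

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Point estimate and normal-approximation 95% CI for a proportion."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return p, p - z * se, p + z * se

# Illustrative random sample: 990 black crows out of 1000 observed.
p, lo, hi = proportion_ci(990, 1000)
print(f"{p:.3f} ({lo:.3f}, {hi:.3f})")  # -> 0.990 (0.984, 0.996)
```

The interval quantifies the sampling uncertainty: the wider it is, the weaker the empirical support.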

Journal examples

Which statements describe sampling uncertainty?

Generalization

Generalizable knowledge

Observation → Generalization

P-values and confidence intervals are used to quantify the uncertainty.

They help us generalize.


How is the uncertainty assessed?

Statistical precision

Statistical precision depends on:

a) the variability (SD) between independent observations

b) the number (n) of independent observations

The standard error of an estimate (SE) = SD/√n

With the same variability, a greater sample size is needed to detect a smaller effect.

Example: Vaccine trial

Protection of pandemic vaccine: 30% ill without vaccine.

Sample size for a maximum 5% risk of a false positive and 20% risk of a false negative result:

Protection   Nr patients
90%          72
80%          94
70%          128
60%          180
50%          268
40%          428
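These numbers come from a standard two-proportion sample-size calculation. A sketch of that calculation follows; the exact formula and rounding behind the slide's table are not stated, so the counts produced below may differ somewhat from it:

```python
import math

def n_per_group(p0, p1):
    """Approximate per-group sample size for comparing two proportions
    (two-sided alpha = 0.05, power = 80%)."""
    z_a, z_b = 1.96, 0.8416  # z-values for alpha/2 = 0.025 and beta = 0.20
    pbar = (p0 + p1) / 2
    num = (z_a * math.sqrt(2 * pbar * (1 - pbar))
           + z_b * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return math.ceil(num / (p0 - p1) ** 2)

p0 = 0.30  # attack rate without vaccine
for protection in (0.9, 0.7, 0.5):
    p1 = p0 * (1 - protection)  # attack rate with vaccine
    print(protection, n_per_group(p0, p1))  # per-group n shrinks as protection grows
```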

Example: Observational safety study

Guillain-Barré syndrome: incidence = 1×10⁻⁵ per person-year.

Sample size for a maximum 5% risk of a false positive and 20% risk of a false negative result:

Relative risk   Nr patients   Nr affected
100             1 098         9 000
50              2 606         4 500
20              9 075         1 800
10              26 366        900
5               92 248        450
2               992 360       180

Statistical precision

The p-value

The probability of obtaining, by chance, a result at least as extreme as that observed when no effect exists.

If |Diff_mean / SE_Diff| > 1.96, then p < 0.05 and Diff_mean is considered statistically significant.

Statistical precision

Confidence interval

A range of values which, with a specified confidence, includes the estimated population parameter.

Diff_mean ± 1.96 SE_Diff gives a 95% confidence interval.
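Both rules can be applied in a few lines. A sketch (the difference and standard error below are illustrative, not from the talk) computing the z ratio, its two-sided p-value under the normal approximation, and the 95% confidence interval:

```python
import math

def diff_summary(diff, se):
    """z ratio, two-sided p-value (normal approximation), and 95% CI."""
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    ci = (diff - 1.96 * se, diff + 1.96 * se)
    return z, p, ci

z, p, (lo, hi) = diff_summary(diff=4.6, se=2.3)  # illustrative numbers
print(round(z, 2), round(p, 3), (round(lo, 1), round(hi, 1)))  # z = 2.0, p ~ 0.046
```

Note the consistency of the two rules: |z| > 1.96, p < 0.05, and a confidence interval excluding 0 all say the same thing, but only the interval shows the magnitude of the effect.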

P-values are usually misconstrued

They do not

- describe clinical relevance, because they depend on sample size

- show that a difference “does not exist”, because statistical insignificance indicates absence of evidence, not evidence of absence

- present the uncertainty in the magnitude of an effect or difference, because they relate only to the null effect (the null hypothesis)

Results

There was no difference in BMI (p = 0.09), see Table 1.

Table 1 BMI (mean ±SD)

Group 1: 29.2 ±6.9
Group 2: 33.8 ±7.1

Confidence intervals are better than p-values

In contrast to p-values, they facilitate

- assessment of clinical significance

- assessment of whether a difference “does not exist”, because they present lower and upper limits of potential clinical effects/differences

Results

There was a difference in BMI of 4.1 (-0.3, 9.0) kg/m², see Table 1.

Table 1 BMI (mean ±SD)

Group 1: 29.2 ±6.9
Group 2: 33.8 ±7.1

[Figure: effect scale with 0 marked. Information in p-values: 2 possibilities (p < 0.05 or n.s.). Information in confidence intervals: 2 possibilities (statistically significant effect or inconclusive).]

P-value and confidence interval

[Figure: effect scale with 0 and the region of clinically significant effects marked. Five confidence intervals illustrate:]

P-value    Conclusion from confidence interval
p < 0.05   Statistically and clinically significant effect
p < 0.05   Statistically, but not necessarily clinically, significant effect
n.s.       Inconclusive
n.s.       Neither statistically nor clinically significant effect
p < 0.05   Statistically significant reversed effect

Superiority vs. non-inferiority

[Figure: effect scale from “Control better” to “New agent better”, with 0 and a margin of non-inferiority or equivalence marked. Confidence intervals illustrate:]

- Superiority shown
- Superiority shown less strongly
- Superiority not shown; non-inferiority not shown
- Superiority not shown
- Non-inferiority shown, superiority not shown
- Equivalence shown

Science as “significant observations”

Data → P < 0.05 [“There is a difference”] or NS [“There is no difference”]

A p-value can be meaningfully interpreted only when the hypothesis is defined a priori and when multiplicity issues are considered.

NS does not show that there is no difference: statistical insignificance indicates absence of evidence, not evidence of absence.

What should not be asked: Is there a statistically significant difference in the studied group of patients?

What should be asked: Is there an indication of a clinically significant difference among patients in general?

CMAJ 1989;141:881–883.

2. The consequence of study design

Evidence based medicine

1. Strong evidence from at least one systematic review of multiple well-designed randomized controlled trials.

2. Strong evidence from at least one properly designed randomized controlled trial of appropriate size.

3. Evidence from well-designed trials such as pseudo-randomized or non-randomized trials, cohort studies, time series or matched case-control studies.

4. Evidence from well-designed non-experimental studies from more than one center or research group or from case reports.

5. Opinions of respected authorities, based on clinical evidence, descriptive studies or reports of expert committees.

Any claim coming from an observational study is most likely to be wrong

12 randomised trials have tested 52 observational claims (about the effects of vitamins B6, B12, C, D and E, beta carotene, hormone replacement therapy, folic acid and selenium).

“They all confirmed no claims in the direction of the observational claim. We repeat that figure: 0 out of 52. To put it in another way, 100% of the observational claims failed to replicate. In fact, five claims (9.6%) are statistically significant in the opposite direction to the observational claim.”

Stanley Young and Allan Karr, Significance, September 2011

Even good observational research...

A series of observational studies published in the Lancet and the NEJM generated and tested during the 1980s the hypothesis that AIDS was caused by the side effect of a drug (amyl nitrite).

The authors of these publications also claimed to have identified the biological mechanism and urged for preventive measures.

Then the virus was detected.

Vandenbroucke JP and Pardoel VP. An autopsy of epidemiologic methods: the case of “poppers” in the early epidemic of the acquired immunodeficiency syndrome (AIDS). Am J Epidemiol 1989;129:455-457.

What is the most important methodological difference between observational and experimental studies?

Experimental vs. observational studies

Experiments

Bias is eliminated by design (“Block what you can, randomize what you cannot”)

Statistical analysis: Focus on precision

Observation

Blocking and randomization are not possible. Bias must be taken into consideration in the statistical analysis.

Statistical analysis: Focus on validity

Experimental studies

- Randomized clinical trials

- Laboratory experiments

Tests for baseline imbalance

Baseline imbalance after randomization is often tested. This is not meaningful.

The purpose of randomization is to avoid systematic imbalance (bias), not random errors (reduced precision).

The method to avoid random baseline imbalance is stratified randomization.

Multiplicity

In contrast to many other forms of precision, statistical significance must be interpreted in the light of the number of measurements performed (the number of hypotheses tested).

The probability of a false positive finding increases with the number of performed tests.

Multiplicity

The risk of getting at least one false positive finding can be calculated as 1 - (1 - α)k

where k is the number of performed comparisons and α the significance level (usually 0.05).

Number of tests   Risk of at least one false positive
1                 0.05
2                 0.10
10                0.40
20                0.64
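A few lines of code compute these risks directly from the formula:

```python
alpha = 0.05  # significance level per test

for k in (1, 2, 10, 20):
    risk = 1 - (1 - alpha) ** k  # P(at least one false positive in k tests)
    print(k, round(risk, 2))
```

With 20 independent tests at α = 0.05, at least one false positive is more likely than not.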

Multiplicity

Adjustments of p-values can be made, but these reduce the type 1 error rate at the expense of the type 2 error rate, which means that a greater number of patients will be needed, which in turn means higher cost.

Recommendation: Avoid multiplicity adjustments.

Laboratory experimenters often use Bonferroni correction to address multiplicity issues within endpoints, but hardly ever to correct for the multiplicity of endpoints. The work is therefore hypothesis generating rather than confirmatory.

Statistical analyses

Type of test            Result
Confirmatory            Empirical support for a claim of superiority, equivalence or non-inferiority.
Hypothesis generating   A new hypothesis, which needs to be tested in a new hypothesis test.

How can I avoid multiplicity adjustments?

Most trials include more than one outcome.

Define a structure or hierarchy of endpoints: primary, secondary and safety. Define the primary endpoint(s) as confirmatory and the secondary ones as hypothesis generating.

No adjustment is necessary when statistical significance is required for all of the multiple endpoints, or for supporting or exploratory hypothesis tests.

Endpoints

Primary     The variable capable of providing the most clinically relevant evidence directly related to the primary objective of the trial.

Secondary   Effects related to secondary objectives, measurements supporting the primary endpoint(s), or hypothesis-generating tests.

Validity issues in randomized trials

External validity

Inclusion/exclusion criteria affect the representativity of the results (efficacy vs. effectiveness).

Internal validity

Some subjects withdraw from follow-up. The withdrawal may depend on the treatment and on the patient's characteristics. This can bias both efficacy and effectiveness.

Study populations

Intention-to-treat (ITT) principle   Analyze all randomized subjects according to randomized treatment.

Full analysis set (FAS)   The set of subjects that is as close as possible to the ideal implied by the ITT principle.

Per protocol (PP) set   The set of subjects who complied with the protocol sufficiently to ensure that they are likely to exhibit the effects of treatment according to the underlying scientific model.

FAS vs. PP-set

FAS: + no selection bias; - misclassification problem (effect dilution)

PP-set: + no contamination problem; - possible selection bias (confounding)

When the FAS and PP-set lead to essentially the same conclusions, confidence in the trial is supported.

Clinical trials: international regulatory guidelines

ICH Topic E9 – Statistical Principles for Clinical Trials

EMEA Points to consider: baseline covariates, missing data, multiplicity issues, etc.

and similar documents from the FDA. These guidelines can all be found on the internet.

Observational studies

Main types

- Cross-sectional studies

- Cohort studies (prospective or historic)

- Case-control studies (always retrospective)

Observational studies

Validity

Selection bias (systematic differences between comparison groups caused by non-random allocation of subjects)

Information bias (misclassification, measurement errors, etc.)

Confounding bias (inadequate analysis, flawed interpretation of results)

Testing for confounding

Screening for statistically significant effects, or stepwise regression, is often used to select covariates for inclusion in a regression model.

However, confounding is a property of the sample, not of the population. Hypothesis tests have no relevance.

The selection of covariates to adjust for must be based on clinical knowledge and considerations of cause and effect.

All study designs are (more or less) problematic

Observational studies
- Post hoc hypothesis tests, multiple testing
- Multiple modeling, protopathic bias, confounding
- Recycling of data

Experimental studies (laboratory experiments)
- Multiple testing (Bonferroni correction within endpoints)
- Small-sample problems (often n = 3)
- Pseudoreplication and pooling of samples

Experimental studies (randomized clinical trials)
- External validity
- No long-term effects
- No infrequent events

Independent observations and replicates

Two rats are sampled from a population with a mean (μ) of 50 and a standard deviation (σ) of 10, and ten measurements of an arbitrary outcome variable are made on each rat.
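A simulation makes the pseudoreplication problem visible. In the sketch below (the between-rat and within-rat SDs are assumptions for illustration), the 20 measurements are not 20 independent observations: a standard error computed from all 20 values understates the real uncertainty, which is governed by n = 2 rats:

```python
import random
import statistics

random.seed(2)
MU, SIGMA_RAT, SIGMA_MEAS = 50, 10, 2  # population mean, between-rat SD, within-rat SD

def experiment():
    """Two rats, ten measurements each; return naive and correct SEs of the mean."""
    rats = [random.gauss(MU, SIGMA_RAT) for _ in range(2)]
    values = [random.gauss(r, SIGMA_MEAS) for r in rats for _ in range(10)]
    naive_se = statistics.stdev(values) / len(values) ** 0.5  # pretends n = 20
    rat_means = [statistics.mean(values[:10]), statistics.mean(values[10:])]
    correct_se = statistics.stdev(rat_means) / 2 ** 0.5       # n = 2 rats
    return naive_se, correct_se

# Average the two standard errors over many simulated experiments.
naive, correct = (statistics.mean(x) for x in zip(*[experiment() for _ in range(2000)]))
print(round(naive, 2), round(correct, 2))  # the naive SE is far too small
```

The replicates mainly improve the precision of each rat's own mean; they cannot substitute for independent animals.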

3. Publishing uncertain results

A scientific report

The idea is to try and give all the information to help others to judge the value of your contributions, not just the information that leads to judgment in one particular direction or another.

Richard P. Feynman

It is impossible to do clinical research so badly that it cannot be published

“There seems to be no study too fragmented, no hypothesis too trivial, no literature citation too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print.”

Drummond Rennie 1986 (editor of NEJM and JAMA)

Changes in publication practice

1665 – first scientific journals
1858 – the IMRAD structure
1957 – the abstract
1978 – Vancouver convention (ICMJE)
1987 – the structured abstract

Randomized clinical trials
1997 – Reporting guidelines (CONSORT)
1998 – Analysis guidelines (ICH)
2005 – Trial registration (Clinicaltrials.gov)

Observational studies
2007 – Reporting guidelines (STROBE)
2011 – Analysis guidelines (NARA, ICRS, etc.)

Clinical Trial Registration

In this editorial, published simultaneously in all member journals, the International Committee of Medical Journal Editors (ICMJE) proposes comprehensive trials registration as a solution to the problem of selective awareness and announces that all 11 ICMJE member journals will adopt a trials-registration policy to promote this goal.

The ICMJE member journals will require, as a condition of consideration for publication, registration in a public trials registry. Trials must register at or before the onset of patient enrollment. This policy applies to any clinical trial starting enrollment after July 1, 2005. For trials that began enrollment prior to this date, the ICMJE member journals will require registration by September 13, 2005, before considering the trial for publication. We speak only for ourselves, but we encourage editors of other biomedical journals to adopt similar policies.

Thank you for your attention!