
Transcript of Umeapresjr

Page 1: Umeapresjr

What is quantified in quantitative research, and how do you publish the findings?

Jonas Ranstam PhD

Page 2: Umeapresjr

Question: What is science?

Page 3: Umeapresjr

Answer: Science is generalizable knowledge.¹

generalizable = reproducible and predictive

1. US National Science Foundation

Page 4: Umeapresjr

One major problem:

Sampling uncertainty

Page 5: Umeapresjr

Plan

1. Medical research and uncertainty

2. The consequence of study design

3. Publishing uncertain results

Page 6: Umeapresjr

1. Medical research and uncertainty

Page 7: Umeapresjr

Anecdotal evidence (case reports)

Page 8: Umeapresjr
Page 9: Umeapresjr

Cohort study of smoking and lung cancer (1954) (Bradford Hill)

Case-control study of smoking and lung cancer (1950) (Bradford Hill)

Randomised clinical trial of streptomycin and tuberculosis (1948) (Bradford Hill)

Evaluation of sampling uncertainty

Page 10: Umeapresjr

What is sampling uncertainty?

Page 11: Umeapresjr

Observed sample: 354 consecutive patients with hip fracture

treated at the Department of Orthopedics, Umeå University Hospital

Page 12: Umeapresjr

Unobserved population

Observed sample: 354 consecutive patients with hip fracture

treated at the Department of Orthopedics, Umeå University Hospital

All potential hip fracture patients all over the world, now, earlier and future.

Page 13: Umeapresjr

Unobserved population

Observed sample

Another observed sample

Another observed sample

A third observed sample

Page 14: Umeapresjr

Now consider a laboratory experiment

Page 15: Umeapresjr

To what population does experiment A belong?

Experiment A

Page 16: Umeapresjr

Experiment A Experiment A Experiment A Experiment A Experiment A

The mother of all possible realizations of

Experiment A

To what population does experiment A belong?

Page 17: Umeapresjr

The mother of all possible repetitions of

Experiment A

Experiment A Experiment A Experiment A Experiment A Experiment A

Sampling variability

To what population does experiment A belong?

Page 18: Umeapresjr

The mother of all possible repetitions of

Experiment A

Experiment A Experiment A Experiment A Experiment A Experiment A

Sampling variability

μ

To what population does experiment A belong?

Page 19: Umeapresjr

The mother of all possible repetitions of

Experiment A

Experiment A Experiment A Experiment A Experiment A Experiment A

What is the sampling variability of these experiments?

Observed sampling variability after thousands of experiments

μ

Page 20: Umeapresjr

Experiment A

Sampling uncertainty?

μ

SD, n

Do we need to repeat each experiment thousands of times?

Page 21: Umeapresjr

Experiment A: SD, n

SEM = SD/√n

μ ± 1.96 SEM

Sampling uncertainty

Can we say anything about sampling uncertainty if only one experiment is performed?
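Yes, via the SEM formula above. This can be checked by simulation; a minimal sketch, assuming an illustrative population with mean 50 and SD 10 (values not from the slides):

```python
import random
import statistics

random.seed(1)

# Illustrative population for "Experiment A": mean 50, SD 10 (assumed values).
def run_experiment(n=100, mu=50.0, sigma=10.0):
    return [random.gauss(mu, sigma) for _ in range(n)]

# One experiment: estimate the SEM from that single sample.
sample = run_experiment()
sem_from_one = statistics.stdev(sample) / len(sample) ** 0.5

# Thousands of repetitions: the empirical SD of the sample means.
means = [statistics.mean(run_experiment()) for _ in range(5000)]
empirical_sd = statistics.stdev(means)

# Both approximate sigma/sqrt(n) = 10/sqrt(100) = 1
print(round(sem_from_one, 2), round(empirical_sd, 2))
```

The SEM computed from one sample approximates the sampling variability that repeating the experiment thousands of times would reveal, so a single experiment suffices to quantify the sampling uncertainty.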

Page 22: Umeapresjr

Now consider a ranking of hospitals

Page 23: Umeapresjr

Hospital A Hospital B Hospital C Hospital D Hospital E

Do different ranks in league tables represent differences in “hospital quality”?

Sampling variability?

Page 24: Umeapresjr

The mother of all possible repetitions of

Hospital A

Hospital A Hospital A Hospital A Hospital A Hospital A

Sampling variability

μ

Or do the differences just reflect sampling variation?

Page 25: Umeapresjr

It depends on the degree of uncertainty!

Sampling variability?

Hospital A Hospital B Hospital C Hospital D Hospital E

ICC ≈ 1.0

Page 26: Umeapresjr

Sampling variability?

Hospital A Hospital B Hospital C Hospital D Hospital E

ICC = 0

It depends on the degree of uncertainty!

Page 27: Umeapresjr

What is the difference between quantitative and qualitative science?

(sampling uncertainty)

Page 28: Umeapresjr

Qualitative research

Sampling uncertainty is irrelevant for the generalization

Quantitative research

Generalization requires quantification of sampling uncertainty

Page 29: Umeapresjr

Qualitative research: 100% of all crows are black

One white crow is sufficient to refute the statement.

Page 30: Umeapresjr

Quantitative research: 99% of all crows are black

All crows cannot be studied simultaneously, but the proportion of black crows can be estimated from a random sample of crows.

Samples are characterized by sampling uncertainty. This must be quantified to assess the empirical support of the findings.
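The crow example can be made concrete. A sketch assuming a hypothetical random sample of 1 000 crows from a population in which 99% are black; the Wald interval used here is one common approximate choice:

```python
import math
import random

random.seed(7)

TRUE_P = 0.99  # assumed population proportion of black crows
n = 1000

# Hypothetical random sample of crows (True = black).
sample = [random.random() < TRUE_P for _ in range(n)]

p_hat = sum(sample) / n
se = math.sqrt(p_hat * (1 - p_hat) / n)              # standard error of a proportion
lower, upper = p_hat - 1.96 * se, p_hat + 1.96 * se  # approximate 95% CI

print(f"Estimated proportion: {p_hat:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

The interval quantifies the sampling uncertainty: it tells the reader which population proportions are compatible with the observed sample.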

Page 31: Umeapresjr

Journal examples

Page 32: Umeapresjr
Page 33: Umeapresjr
Page 34: Umeapresjr
Page 35: Umeapresjr

What statements describe sampling uncertainty?

Page 36: Umeapresjr
Page 37: Umeapresjr
Page 38: Umeapresjr
Page 39: Umeapresjr
Page 40: Umeapresjr

Generalization

Generalizable knowledge

Observation Generalization

P-values and confidence intervals are used to quantify the uncertainty.

They help us generalize.

Generalization

Generalization

Page 41: Umeapresjr
Page 42: Umeapresjr

How is the uncertainty assessed?

Page 43: Umeapresjr

Statistical precision

Statistical precision depends on:

a) the variability (SD) between independent observations

b) the number (n) of independent observations

The standard error of an estimate (SE) = SD/√n

With the same variability, a greater sample size is needed to detect a smaller effect.
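The relation between effect size and sample size can be sketched with the standard normal-approximation formula for comparing two means at 5% two-sided significance and 80% power; the effect sizes and SD below are illustrative, not from the slides:

```python
import math

def n_per_group(delta, sd):
    """Approximate sample size per group for a two-sample comparison of means
    at 5% two-sided significance and 80% power."""
    z_alpha = 1.96    # z for two-sided alpha = 0.05
    z_beta = 0.8416   # z for power = 0.80
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

# With the same variability (SD = 10), halving the effect roughly
# quadruples the required sample size.
print(n_per_group(delta=5, sd=10))    # about 63 per group
print(n_per_group(delta=2.5, sd=10))  # about 252 per group
```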

Page 44: Umeapresjr
Page 45: Umeapresjr

Example: Vaccine trial

Protection of pandemic vaccine: 30% ill without vaccine.

Sample size for max 5% risk of a false positive and max 20% risk of a false negative result.

Protection   Nr patients
90%          72
80%          94
70%          128
60%          180
50%          268
40%          428

Page 46: Umeapresjr

Example: Observational safety study

Guillain-Barré syndrome: Incidence = 1×10⁻⁵ per person-year

Sample size for max 5% risk of a false positive and max 20% risk of a false negative result.

Relative risk   Nr patients   Nr affected
100             1 098         9 000
50              2 606         4 500
20              9 075         1 800
10              26 366        900
5               92 248        450
2               992 360       180

Page 47: Umeapresjr

Statistical precision

The p-value

The probability of obtaining, by chance, a result at least as extreme as the one observed when no effect exists.

If |Diff_mean / SE_Diff| > 1.96, then p < 0.05 and Diff_mean is considered statistically significant.

Page 48: Umeapresjr

Statistical precision

Confidence interval

A range of values, which with specified confidence includes the estimated population parameter.

Diff_mean ± 1.96 SE_Diff gives a 95% confidence interval.
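Both the p-value and the confidence interval come from the same two numbers, the estimate and its standard error. A sketch using the BMI example's point estimate of 4.1 with an assumed SE of 2.4 (the slides do not report the SE):

```python
import math

def p_and_ci(diff, se):
    """Two-sided normal-approximation p-value and 95% CI for an estimated difference."""
    z = abs(diff / se)
    p = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))  # 2 * (1 - Phi(|z|))
    return p, (diff - 1.96 * se, diff + 1.96 * se)

p, (lower, upper) = p_and_ci(4.1, 2.4)  # SE of 2.4 is an assumed value
print(f"p = {p:.2f}, 95% CI {lower:.1f} to {upper:.1f}")
```

The p-value alone says only "not significant at the 5% level", while the interval also shows how large an effect remains compatible with the data.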

Page 49: Umeapresjr

P-values are usually misconstrued

They do not

- describe clinical relevance, because they depend on sample size

- show that a difference “does not exist”, because statistical insignificance indicates absence of evidence, not evidence of absence

- present the uncertainty in the magnitude of an effect or difference, because they relate only to the null effect (the null hypothesis)

Page 50: Umeapresjr

Results

There was no difference in BMI (p = 0.09), see Table 1.

Table 1 BMI (mean ±SD)

Group 1: 29.2 ±6.9
Group 2: 33.8 ±7.1

Page 51: Umeapresjr

Confidence intervals are better than p-values

In contrast to p-values, they facilitate

- assessment of clinical significance

- assessment of whether a difference "does not exist", because they present lower and upper limits of potential clinical effects/differences

Page 52: Umeapresjr

Results

There was a difference in BMI of 4.1 (95% CI: −0.3 to 9.0) kg/m², see Table 1.

Table 1 BMI (mean ±SD)

Group 1: 29.2 ±6.9
Group 2: 33.8 ±7.1

Page 53: Umeapresjr

P-value and confidence interval

[Figure: effect scale with 0 marked]

Information in p-values [2 possibilities]:
- p < 0.05: statistically significant effect
- n.s.: inconclusive

Information in confidence intervals [2 possibilities]

Page 54: Umeapresjr

0Effect

Clinically significant effects

Statistically and clinically significant effect

Statistically, but not necessarily clinically, significant effect

Inconclusive

Neither statistically nor clinically significant effect

Statistically significant reversed effect

p < 0.05

p < 0.05

n.s.

n.s.

p < 0.05

P-value Conclusion from confidence intervals

P-value and confidence interval

Page 55: Umeapresjr

0Control better

Margin of non-inferiorityor equivalence

Superiority shown

Superiority shown less strongly

Superiority not shownNon-inferiority not shown

Superiority not shown

Superiority vs. non-inferiority

New agent better

Non-inferiority shown Superiority not shown

Equivalence shown

Page 56: Umeapresjr

Science as “significant observations”

Data

P < 0.05 [There is a difference]

NS [There is no difference]

Page 57: Umeapresjr

Science as “significant observations”

Data

P < 0.05 [There is a difference]

A p-value can be meaningfully interpreted only when the hypothesis is defined a priori and when multiplicity issues are considered.

NS [There is no difference]

No, statistical insignificance indicates absence of evidence, not evidence of absence.

Page 58: Umeapresjr

Science as “significant observations”

Data

What should not be asked: Is there a statistically significant difference in the studied group of patients?

What should be asked: Is there an indication of a clinically significant difference among patients in general?

Page 59: Umeapresjr

CMAJ 1989;141:881–883.

Page 60: Umeapresjr

2. The consequence of study design

Page 61: Umeapresjr

Evidence based medicine

1. Strong evidence from at least one systematic review of multiple well-designed randomized controlled trials.

2. Strong evidence from at least one properly designed randomized controlled trial of appropriate size.

3. Evidence from well-designed trials such as pseudo-randomized or non-randomized trials, cohort studies, time series or matched case-control studies.

4. Evidence from well-designed non-experimental studies from more than one center or research group or from case reports.

5. Opinions of respected authorities, based on clinical evidence, descriptive studies or reports of expert committees.

Page 62: Umeapresjr

Any claim coming from an observational study is most likely to be wrong

12 randomised trials have tested 52 observational claims (about the effects of vitamins B6, B12, C, D and E, beta carotene, hormone replacement therapy, folic acid and selenium).

“They all confirmed no claims in the direction of the observational claim. We repeat that figure: 0 out of 52. To put it in another way, 100% of the observational claims failed to replicate. In fact, five claims (9.6%) are statistically significant in the opposite direction to the observational claim.”

Stanley Young and Allan Karr, Significance, September 2011

Page 63: Umeapresjr

Even good observational research...

A series of observational studies published in the Lancet and the NEJM generated and tested during the 1980s the hypothesis that AIDS was caused by the side effect of a drug (amyl nitrite).

The authors of these publications also claimed to have identified the biological mechanism and urged for preventive measures.

Then the virus was detected.

Vandenbroucke JP and Pardoel VP. An autopsy of epidemiologic methods: the case of “poppers” in the early epidemic of the acquired immunodeficiency syndrome (AIDS). Am J Epidemiol 1989;129:455-457.

Page 64: Umeapresjr

What is the most important methodological difference between observational and

experimental studies?

Page 65: Umeapresjr

Experimental vs. observational studies

Experiments

Bias is eliminated by design (“Block what you can, randomize what you cannot”)

Statistical analysis: Focus on precision

Observation

Blocking and randomization are not possible. Bias must be taken into consideration in the statistical analysis.

Statistical analysis: Focus on validity

Page 66: Umeapresjr
Page 67: Umeapresjr

Experimental studies

- Randomized clinical trials

- Laboratory experiments

Page 68: Umeapresjr
Page 69: Umeapresjr
Page 70: Umeapresjr

Tests for baseline imbalance

Baseline imbalance after randomization is often tested. This is not meaningful.

The purpose of randomization is to avoid systematic imbalance (bias), not random errors (reduced precision).

The method to avoid random baseline imbalance is to use stratified randomization.

Page 71: Umeapresjr

Multiplicity

In contrast to many other forms of precision, statistical precision depends on the number of measurements performed (the number of hypotheses tested).

The probability of a false positive finding increases with the number of performed tests.

Page 72: Umeapresjr

Multiplicity

The risk of getting at least one false positive finding can be calculated as 1 − (1 − α)^k

where k is the number of performed comparisons and α the significance level (usually 0.05).

Number of tests   Risk of at least one false positive
1                 0.05
2                 0.10
10                0.40
20                0.64
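These figures follow directly from the formula; a minimal sketch:

```python
def familywise_error(k, alpha=0.05):
    """Risk of at least one false positive among k independent tests
    at significance level alpha."""
    return 1 - (1 - alpha) ** k

for k in (1, 2, 10, 20):
    print(f"{k:>2} tests: {familywise_error(k):.2f}")
```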

Page 73: Umeapresjr
Page 74: Umeapresjr

Multiplicity

Adjustments of p-values can be made, but these reduce the type 1 error rate at the expense of the type 2 error rate, which means that a greater number of patients will be needed, which in turn means higher cost.

Recommendation: Avoid multiplicity adjustments.

Laboratory experimenters often use Bonferroni correction to address multiplicity issues within endpoints, but hardly ever to correct for the multiplicity of endpoints. The work is therefore hypothesis generating rather than confirmatory.

Page 75: Umeapresjr

Statistical analyses

Type of test            Result

Confirmatory            Empirical support for a claim of superiority, equivalence or non-inferiority.

Hypothesis generating   A new hypothesis, which needs to be tested in a new hypothesis test.

Page 76: Umeapresjr

How can I avoid multiplicity adjustments?

Most trials include more than 1 outcome.

Define a structure or hierarchy of endpoints: primary, secondary and safety. Define primary endpoint(s) as confirmatory and secondary as hypothesis generating.

No adjustment is necessary when statistical significance is required for all of the multiple endpoints, or for supporting or exploratory hypothesis tests.

Page 77: Umeapresjr

Endpoints

Primary: The variable capable of providing the most clinically relevant evidence directly related to the primary objective of the trial.

Secondary: Effects related to secondary objectives, measurements supporting primary endpoint(s), or hypothesis generating tests.

Page 78: Umeapresjr

Validity issues in randomized trials

External validity

Inclusion/exclusion criteria affect the representativeness of the results (efficacy vs. effectiveness).

Internal validity

Some subjects withdraw from follow-up. The withdrawal may depend on treatment and on the patient's characteristics. This can bias both efficacy and effectiveness.

Page 79: Umeapresjr

Study populations

Intention-to-treat (ITT) principle: Analyze all randomized subjects according to randomized treatment.

Full analysis set (FAS): The set of subjects that is as close as possible to the ideal implied by the ITT principle.

Per protocol (PP) set: The set of subjects who complied with the protocol sufficiently to ensure that they are likely to exhibit the effects of treatment according to the underlying scientific model.

Page 80: Umeapresjr

FAS vs. PP-set

FAS: + no selection bias; − misclassification problem (effect dilution)

PP-set: + no contamination problem; − possible selection bias (confounding)

When the FAS and PP-set lead to essentially the same conclusions, confidence in the trial is supported.

Page 81: Umeapresjr
Page 82: Umeapresjr

Clinical trials: International regulatory guidelines

ICH Topic E9 - Statistical Principles for Clinical Trials

EMEA Points to consider: baseline covariates, missing data, multiplicity issues, etc.

and similar documents from the FDA

These guidelines can all be found on the internet.

Page 83: Umeapresjr

Observational studies

Main types

- Cross-sectional studies

- Cohort studies (prospective or historic)

- Case-control studies (always retrospective)

Page 84: Umeapresjr
Page 85: Umeapresjr

Observational studies

Validity

Selection bias (systematic differences between comparison groups caused by non-random allocation of subjects)

Information bias (misclassification, measurement errors, etc.)

Confounding bias (inadequate analysis, flawed interpretation of results)

Page 86: Umeapresjr
Page 87: Umeapresjr
Page 88: Umeapresjr
Page 89: Umeapresjr

Testing for confounding

Screening for statistically significant effects, or stepwise regression, is often used to select covariates for inclusion in a regression model.

However, confounding is a property of the sample, not of the population. Hypothesis tests have no relevance.

The selection of covariates to adjust for must be based on clinical knowledge and considerations of cause and effect.

Page 90: Umeapresjr

All study designs are (more or less) problematic

Observational studies
- Post hoc hypothesis tests, multiple testing
- Multiple modeling, protopathic bias, confounding
- Recycling of data

Experimental studies (laboratory experiments)
- Multiple testing (Bonferroni correction within endpoints)
- Small sample problems (often n = 3)
- Pseudoreplication and pooling of samples

Experimental studies (randomized clinical trials)
- External validity
- No long-term effects
- No infrequent events

Page 91: Umeapresjr
Page 92: Umeapresjr

Independent observations and replicates

Two rats are sampled from a population with a mean (μ) of 50 and a standard deviation (σ) of 10, and ten measurements of an arbitrary outcome variable are made on each rat.
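This setup illustrates pseudoreplication: the twenty measurements are replicates, not independent observations. A sketch with assumed illustrative data, two rats consistent with μ = 50 and σ = 10 plus a small within-rat measurement error:

```python
import math
import statistics

# Assumed illustrative data: ten measurements on each of two rats.
rat1 = [58.2, 57.9, 61.0, 59.4, 58.8, 60.1, 59.0, 58.5, 59.7, 58.4]
rat2 = [41.3, 42.0, 40.7, 41.8, 42.5, 41.1, 40.9, 42.2, 41.5, 41.0]

all_measurements = rat1 + rat2
rat_means = [statistics.mean(rat1), statistics.mean(rat2)]

# A naive analysis pretends all 20 measurements are independent (n = 20)...
naive_sem = statistics.stdev(all_measurements) / math.sqrt(len(all_measurements))

# ...but only the 2 rats are independent observations (n = 2).
correct_sem = statistics.stdev(rat_means) / math.sqrt(len(rat_means))

print(round(naive_sem, 2), round(correct_sem, 2))  # roughly 2.0 vs 8.8
```

Here the naive SEM understates the uncertainty by more than a factor of four, because it treats within-rat replicates as independent evidence about the population of rats.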

Page 93: Umeapresjr

3. Publishing uncertain results

Page 94: Umeapresjr

A scientific report

The idea is to try and give all the information to help others to judge the value of your contributions, not just the information that leads to judgment in one particular direction or another.

Richard P. Feynman

Page 95: Umeapresjr

It is impossible to do clinical research so badly that it cannot be published

“There seems to be no study too fragmented, no hypothesis too trivial, no literature citation too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print.”

Drummond Rennie 1986 (editor of NEJM and JAMA)

Page 96: Umeapresjr

Changes in publication practice

1665 – first scientific journals
1858 – the IMRAD structure
1957 – the abstract
1978 – Vancouver convention (ICMJE)
1987 – the structured abstract

Randomized clinical trials
1997 – Reporting guidelines (CONSORT)
1998 – Analysis guidelines (ICH)
2005 – Trial registration (ClinicalTrials.gov)

Observational studies
2007 – Reporting guidelines (STROBE)
2011 – Analysis guidelines (NARA, ICRS, etc.)

Page 97: Umeapresjr
Page 98: Umeapresjr

Clinical Trial Registration

In this editorial, published simultaneously in all member journals, the International Committee of Medical Journal Editors (ICMJE) proposes comprehensive trials registration as a solution to the problem of selective awareness and announces that all 11 ICMJE member journals will adopt a trials-registration policy to promote this goal.

The ICMJE member journals will require, as a condition of consideration for publication, registration in a public trials registry. Trials must register at or before the onset of patient enrollment. This policy applies to any clinical trial starting enrollment after July 1, 2005. For trials that began enrollment prior to this date, the ICMJE member journals will require registration by September 13, 2005, before considering the trial for publication. We speak only for ourselves, but we encourage editors of other biomedical journals to adopt similar policies.

Page 99: Umeapresjr
Page 100: Umeapresjr
Page 101: Umeapresjr
Page 102: Umeapresjr
Page 103: Umeapresjr
Page 104: Umeapresjr

Thank you for your attention!