RESEARCH Open Access Screening for esophageal adenocarcinoma · 2020. 1. 29. · Lise Bjerre2,...

RESEARCH Open Access

Screening for esophageal adenocarcinoma and precancerous conditions (dysplasia and Barrett’s esophagus) in patients with chronic gastroesophageal reflux disease with or without other risk factors: two systematic reviews and one overview of reviews to inform a guideline of the Canadian Task Force on Preventive Health Care (CTFPHC)

Candyce Hamel1*, Nadera Ahmadzai1, Andrew Beck1, Micere Thuku1, Becky Skidmore1, Kusala Pussegoda1, Lise Bjerre2, Avijit Chatterjee3, Kristopher Dennis4, Lorenzo Ferri5, Donna E. Maziak6, Beverley J. Shea1, Brian Hutton1,7, Julian Little7, David Moher1,7 and Adrienne Stevens1

Abstract

Background: Two reviews and an overview were produced for the Canadian Task Force on Preventive Health Care guideline on screening for esophageal adenocarcinoma in patients with chronic gastroesophageal reflux disease (GERD) without alarm symptoms. The goal was to systematically review three key questions (KQs): (1) the effectiveness of screening for these conditions; (2) how adults with chronic GERD weigh the benefits and harms of screening, and what factors contribute to their preferences and decision to undergo screening; and (3) treatment options for Barrett’s esophagus (BE), dysplasia or stage 1 EAC (overview of reviews).

Methods: Bibliographic databases (e.g. Ovid MEDLINE®) were searched for each review in October 2018. We also searched for unpublished literature (e.g. relevant websites). The liberal accelerated approach was used for title and abstract screening. Two reviewers independently screened full-text articles. Data extraction and risk of bias assessments were completed by one reviewer and verified by another reviewer (KQ1 and 2). Quality assessments were completed by two reviewers independently in duplicate (KQ3). Disagreements were resolved through discussion. We used various risk of bias tools suitable for study design. The GRADE framework was used for rating the certainty of the evidence.

© Her Majesty the Queen in Right of Canada. 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

* Correspondence: [email protected]
1 Ottawa Hospital Research Institute, Knowledge Synthesis Group, 501 Smyth Road, Ottawa, ON, Canada
Full list of author information is available at the end of the article

Hamel et al. Systematic Reviews (2020) 9:20 https://doi.org/10.1186/s13643-020-1275-2


Results: Ten studies evaluated the effectiveness of screening. One retrospective study reported no difference in long-term survival (approximately 6 to 12 years) between those who had a prior esophagogastroduodenoscopy (EGD) and those who had not (adjusted HR 0.93, 95% confidence interval (CI) 0.58–1.50), although the odds of a stage 1 diagnosis rather than a more advanced diagnosis (stage 2–4) may be higher if an EGD had been performed in the previous 5 years (OR 2.27, 95% CI 1.00–7.67). Seven studies compared different screening modalities and showed little difference between modalities. Three studies reported on patients’ unwillingness to be screened (e.g. due to anxiety, fear of gagging). Eleven systematic reviews evaluated treatment modalities, providing some evidence of early treatment effect for some outcomes.

Conclusions: Little evidence exists on the effectiveness of screening or on patient values and preferences regarding screening. Many treatment modalities have been evaluated, but studies are small. Overall, the effectiveness of screening and early treatments remains uncertain.

Systematic review registrations: PROSPERO (CRD42017049993 [KQ1], CRD42017050014 [KQ2], CRD42018084825 [KQ3]).

Keywords: Esophageal adenocarcinoma, Gastroesophageal reflux disease, Barrett’s esophagus, Dysplasia, Screening, Patient values and preferences, Treatment, Systematic review, Overview of reviews

Introduction
There are two main types of esophageal cancer: esophageal adenocarcinoma (EAC), where malignant cells form in the tissues of the lower third of the esophagus, primarily in glandular cells where Barrett’s Esophagus (BE) also develops [1], and esophageal squamous cell carcinoma (ESCC), where malignant cells form in the squamous cells of the esophagus. ESCC is the most common form of esophageal neoplasm worldwide, with 398,000 cases of ESCC compared to 52,000 cases of EAC in 2012 [2]. However, EAC is more common than ESCC in Canada, and nearly 50% of the worldwide cases of EAC occur in Northwestern Europe and North America [3]. From 1986 to 2006, EAC incidence in Canada rose by 3.9% per year in males (from 1.8 to 3.5 per 100,000) and by 3.6% per year in females (from 0.2 to 0.5 per 100,000) [3]. The Canadian Cancer Society reports overall rates of esophageal cancer in Canada (EAC and ESCC combined). In 2017, there were a projected 2330 new cases of esophageal cancer (1800 among men and 530 among women) and 2130 deaths from the disease (1650 among men and 480 among women). Although esophageal cancer has a lower incidence than other cancers (ranked 13th among men and 19th among women), it has a high mortality rate and a low 5-year survival rate (14%), the second lowest survival rate after pancreatic cancer [4]. About 20% of EAC cases are diagnosed at an early stage, where treatment with surgery leads to a 5-year survival rate of 90% [5].

Risk factors
Increases in the incidence of EAC may depend on the increasing prevalence of related risk factors such as obesity and gastroesophageal reflux disease (GERD) [3]. Other risk factors for the development of EAC are BE, age 50 years and older, male sex, European descent, current or past smoking, a family history of BE or EAC and a diet low in fruits and vegetables [1, 6–8].

The prevalence of GERD in Western countries has increased over the past few decades, and GERD is one of the most commonly encountered conditions in primary care practice, with an estimated prevalence of 18–27% in the USA and 9–26% in Europe [9]. Extrapolating these prevalence estimates to the Canadian population, since no Canadian incidence studies exist, would mean that 3.4–6.8 million persons in Canada experience GERD [10]. GERD is a chronic disease with varying definitions [10–13]. The Montreal definition has been adopted by clinicians and researchers, and defines GERD as “a condition which develops when the reflux of stomach contents causes troublesome symptoms (e.g., retrosternal burning (heartburn), regurgitation) and/or complications (e.g., esophagitis, esophageal stricture)” [14]. According to the American Society for Gastrointestinal Endoscopy, chronic, long-standing GERD is defined as frequent severe GERD symptoms for over 5 years and requiring regular acid suppression therapy [15]. However, experts differ on the duration of symptoms and on whether acid suppression therapy should be considered in defining chronic GERD [16–18].

The most common complications of GERD are esophagitis, esophageal stricture, BE and EAC [10]. Approximately 60% of people with EAC have experienced symptoms of GERD, and there is an association between the frequency and severity of symptoms and increased risk of EAC [19, 20]. In BE, the tissue lining the esophagus transforms into tissue resembling the lining of the intestines. Generally, this transformation is called intestinal metaplasia, and in the esophagus, it is called BE. It is currently not known how the transformation occurs; however, it has been suggested that the acid regurgitation associated with GERD may assist changes at the cellular level [19]. BE is known to develop in around 6–14% of people with GERD, and among those with BE (with or without GERD), 0.2–0.5% develop EAC [21]. However, not all individuals with BE will experience chronic GERD symptoms, and it is still unclear why such a small percentage of people with GERD develop BE [22, 23]. Once an individual is diagnosed with BE, regular surveillance using endoscopy should be considered, as BE can progress over time from low- to high-grade dysplasia and into EAC [24, 25]. Patients who have EAC discovered as a result of endoscopic screening or as part of a surveillance program for BE are diagnosed with earlier-stage tumours, are less likely to have lymph node involvement, and have better short-term life expectancies than those who present with alarm symptoms such as dysphagia and weight loss [26]. It has also been found that the longer the length of BE (e.g. short segment vs. long segment), the higher the risk for EAC [27].

Treatment
The goal of treatment for BE and/or low- or high-grade dysplasia is to slow or halt GERD symptoms, reduce mucosal inflammation, control dysplasia and prevent progression to adenocarcinoma [28]. The treatments for EAC depend on the stage of the disease (0 to 4). For stage 0, the disease is considered precancerous and is synonymous with high-grade dysplasia. Endoscopic therapies (e.g. radiofrequency ablation (RFA) or endoscopic mucosal resection (EMR)) are typically performed, followed by endoscopic surveillance [29]. For stage 1, the disease is generally treated with mechanical methods to remove tissue (e.g. EMR) followed by an ablative technique to destroy any remaining abnormal areas in the esophagus lining [29].

There are four main categories for managing and/or treating the conditions of interest (i.e. stage 1 EAC, BE or dysplasia): (1) pharmacological therapies; (2) surveillance (endoscopic); (3) endoscopic or endoscopic-assisted therapies; and (4) surgery (see Additional file 1). These strategies may overlap with some of the conditions of interest. For example, proton pump inhibitor (PPI) therapy is not a treatment for EAC but may reduce the risk of developing dysplasia and EAC among people with BE. These therapies may also be used in combination (e.g. pharmacological therapy and surveillance procedures for BE) depending on the disease progression.

Objectives
With Canada’s increasing senior population and longer life expectancy, there is an expected increase in the incidence rates of GERD and EAC and, therefore, increased demand for gastrointestinal endoscopies [10, 30]. According to the Canadian Institute for Health Information National Physician Database, the number of upper endoscopies performed in Canada increased by approximately 16% between 2004 and 2008 [31]. However, the reason for the endoscopy was not detailed. In order to determine the effectiveness of screening for EAC among GERD patients, three key questions (KQs) (Table 1) were addressed through two systematic reviews (SRs) (KQ1 and KQ2) and one overview of reviews (KQ3).

Methods
These SRs were developed, conducted and prepared according to the Canadian Task Force on Preventive Health Care (CTFPHC) Procedure Manual [32] or as methods were updated by the CTFPHC. The protocols for these SRs have been published with PROSPERO (CRD42017049993, CRD42017050014, CRD42018084825) and are available on the CTFPHC website (https://canadiantaskforce.ca/).

These reviews are reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [33] (Additional file 2) and include a PRISMA flow diagram for each key question. We also used AMSTAR (A Measurement Tool to Assess the Methodological Quality of Systematic Reviews) for additional quality control [34]. Any amendments made to the protocols when conducting the reviews have been outlined in Additional file 3.

Table 1 Key questions

Key question    Question

1a    In adults (≥ 18 years) with chronic gastroesophageal reflux disease (GERD)^a with or without other risk factors^b, what is the effectiveness (benefits and harms) of screening for esophageal adenocarcinoma (EAC) and precancerous conditions (Barrett’s Esophagus (BE) and low- and high-grade dysplasia)? What are the effects in relevant subgroup populations?

1b    If there is evidence of effectiveness^c, what is the optimal time to initiate and to end screening, and what is the optimal screening interval (includes single and multiple tests and ongoing ‘surveillance’)?

2    In adults with chronic GERD with or without other risk factors^b who have been offered, received, or allocated to receive screening for EAC and precancerous conditions (BE and low- and high-grade dysplasia), how do they weigh the benefits and harms of screening, and what factors contribute to these preferences and to their decisions to undergo screening?

3    What is the effectiveness (benefits and harms) of treatment for stage 1 EAC and precancerous conditions (BE and low- and high-grade dysplasia) in adults?

^a As defined by study authors
^b Risk factors will be deemed so by included studies
^c If there is evidence of at least moderate certainty of benefit, according to GRADE

Analytic frameworks
The analytic framework for these reviews is presented in Fig. 1.

Inclusion and exclusion criteria
Table 2 presents the eligibility criteria for each KQ, using the PICOTS framework.

Literature search
All search strategies (Additional file 4) were developed and tested through an iterative process by an experienced medical information specialist in consultation with the review teams. In addition, the search strategy for the MEDLINE database was peer-reviewed by another experienced librarian using the Peer Review of Electronic Search Strategies (PRESS) checklist [35] (Additional file 5). Table 3 presents an overall description of the searching for each KQ.

    Study selectionFor each KQ, duplicates across searches were identifiedand removed using Reference Manager [36]. Theremaining articles were uploaded into Distiller System-atic Review (DistillerSR) Software© [37] for title and ab-stract screening and full-text screening of the remainingpotential relevant articles.Reviewers performed a pilot testing phase of randomly

    selected title and abstracts (n = 50) and potentially rele-vant full-text articles (n = 25) prior to commencing

    Fig. 1 Guideline analytic framework

    Hamel et al. Systematic Reviews (2020) 9:20 Page 4 of 25

    https://canadiantaskforce.ca/

  • Table

    2Po

    pulatio

    n,interven

    tions,com

    parison

    s,ou

    tcom

    es,tim

    eframe,stud

    yde

    sign

    (PICOTS)

    Keyqu

    estio

    n1

    Keyqu

    estio

    n2

    Keyqu

    estio

    n3

    Popu

    latio

    nInclusion

    Adu

    lts(≥

    18yearsold)

    awith

    chronicgastroesop

    hage

    alrefluxdisease(GERD)bwith

    orwith

    outothe

    rriskfactorsc

    foresop

    hage

    aladen

    ocarcino

    ma(EAC).

    Adu

    lts(≥

    18yearsold)

    awith

    chronicGERDwith

    orwith

    outothe

    rriskfactorscforEA

    Cwho

    have

    been

    offered,

    received

    ,orallocatedto

    receivescreen

    ing,

    depe

    ndingon

    thede

    sign

    ofthestud

    y.

    Adu

    lts(≥

    18yearsold)

    awith

    stage1EA

    C,Barrett’s

    Esop

    hagu

    s(BE)

    orlow-or

    high

    -grade

    dysplasia,with

    orwith

    outchronicGERDas

    defined

    inthesystem

    aticre-

    view

    s(SRs)d

    Exclusion

    -Expe

    riencingalarm

    symptom

    sforEA

    C:d

    ysph

    agia,recurrent

    vomiting

    ,ano

    rexia,weigh

    tloss,g

    astrointestin

    albleeding

    orothe

    rsymptom

    siden

    tifiedby

    authorsas

    ‘alarm

    ’.-Diagn

    osed

    with

    othe

    rgastro-esoph

    agealcon

    ditio

    ns(e.g.gastriccancer,esoph

    agealatresia,other

    lifethreaten

    ing

    esop

    hage

    alcond

    ition

    s)or

    pre-existin

    gdisease(BE,dysplasia,or

    EAC).

    Thosediagno

    sedwith

    othe

    rgastro-esoph

    agealcon

    di-

    tions

    (e.g.g

    astriccancer,esoph

    agealatresia,and

    othe

    rlife-threaten

    ingesop

    hage

    alcond

    ition

    s).

    Interven

    tion

    /com

    parator

    Inclusion

    KQ1a:

    -Screen

    ingversus

    noscreen

    ing

    -One

    screen

    ingmod

    ality

    versus

    anothe

    rscreen

    ing

    mod

    ality

    Allscreen

    ingmod

    alities

    forBE,d

    ysplasiaor

    EACwillbe

    includ

    ed,suchas

    esop

    hago

    gastrodu

    oden

    oscopy

    (EGD)e,f

    EGDfplus

    adjuncttechniqu

    esg,transnasalend

    oscopy,

    cytologicexam

    ination

    KQ1b

    :-One

    screen

    ingmod

    ality

    vs.ano

    ther

    screen

    ingmod

    ality

    -One

    intervalof

    screen

    ingvs.ano

    ther

    intervalof

    screen

    ing

    -Timep

    oint

    atwhich

    toinitiatescreen

    ingvs.ano

    ther

    timep

    oint

    -Timep

    oint

    atwhich

    toceasescreen

    ingversus.ano

    ther

    timep

    oint

    Screen

    ingforEA

    Candothe

    rprecancerous

    lesion

    swith

    anyscreen

    ingmod

    ality

    Dep

    ending

    onstud

    yde

    sign

    ,com

    paratorsmay

    be:

    -Noscreen

    ingh

    -Differen

    tscreen

    ingmod

    ality

    -Differen

    tscreen

    ingintervals

    -Differen

    tleng

    ths/du

    ratio

    nof

    screen

    ing

    -Offeredscreen

    ingbu

    tdidno

    treceivescreen

    ing

    -Nocomparison

    Managem

    ent/treatm

    entforstage1EA

    C,low-or

    high

    -gradedysplasiaor

    BEinclud

    ing:

    -Ph

    armacolog

    icaltherapiesi

    -Surveillancemetho

    dssuch

    as:EGDe,fplus

    biop

    sy,EGDf

    plus

    biop

    syplus

    adjuncttechniqu

    esj

    -Endo

    scop

    icor

    Endo

    scop

    icAssistedtherapiesk

    -Surgery,includ

    ingfund

    oplicationandesop

    hage

    ctom

    yCom

    parator:Nomanagem

    ent/treatm

    entcomparedto

    anothe

    rmanagem

    ent/treatm

    entregimen

    ,ora

    combinatio

    nof

    managem

    ent/treatm

    entstrategies.

    Exclusion

    Any

    follow-updiagno

    stictests,such

    as24-h

    esop

    hage

    alpH

    testor

    anytestforstagingpu

    rposes,suchas

    compu

    terized

    tomog

    raph

    yandmagne

    ticresonanceim

    aging.

    Outcomes

    Inclusion

    Criticalforde

    cision

    -making

    1.Mortality—

    all-cause

    andEA

    C-related

    (1,5

    and10

    year

    oras

    available)l,m

    2.Survival(1,5

    and10

    year

    oras

    available)l

    3.Life

    threaten

    ing,

    severe,ormed

    icallysign

    ificant

    conseq

    uences

    (suchas

    requ

    iring

    hospitalizationor

    prolon

    gatio

    nof

    hospitalization;disabling(limiting

    self-

    care

    oractivities

    ofdaily

    living)

    Impo

    rtantforde

    cision

    -making

    4.Incide

    nceof

    EAC(bystage),BE,low-andhigh

    -grade

    dysplasiam

    5.Qualityof

    life(validated

    scales

    only;e.g.SF-36,

    WHOQUAL)

    6.Psycho

    logicaleffects(e.g.anxiety

    andde

    pression

    )7.Major

    orminor

    med

    icalproced

    ures

    m

    8.Overdiagn

    osisn

    1.How

    patientsweigh

    thebe

    nefitsandharm

    sof

    screen

    ing(e.g.ranking

    /ratingof

    bene

    fitsandharm

    sou

    tcom

    es)

    2.Willingn

    essto

    bescreen

    ed3.Uptakeof

    screen

    ing

    4.Factorsconsidered

    inde

    cision

    tobe

    screen

    ed:w

    hat

    compo

    nents/ou

    tcom

    esof

    screen

    ingdo

    patientsplace

    morevalueon

    whe

    nde

    ciding

    whe

    ther

    tobe

    screen

    edor

    not(e.g.p

    oten

    tialcom

    plications

    resulting

    from

    screen

    ing)

    5.Intrusiven

    essof

    thescreen

    ingmod

    ality

    Criticalforde

    cision

    -making

    1.Mortality—

    all-cause

    andEA

    C-related

    (1,5

    and10

    years,

    oras

    available)l

    2.Survival(1,5

    and10

    years,or

    asavailable)l

    3.Prog

    ressionfro

    mno

    n-dysplasticBE

    toBE

    with

    dyspla-

    sia,prog

    ressionfro

    mlow-grade

    tohigh

    -grade

    dysplasia,

    prog

    ressionto

    EAC

    4.Life

    threaten

    ing,

    severe,ormed

    icallysign

    ificant

    conseq

    uences

    (suchas

    requ

    iring

    hospitalizationor

    prolon

    gatio

    nof

    hospitalization;disabling(limiting

    self-

    care

    oractivities

    ofdaily

    living)

    Impo

    rtantforde

    cision

    -making

    5.Qualityof

    life(validated

    scales

    only;e.g.SF-36,

    WHOQUAL)

    6.Major

    orminor

    med

    icalproced

    ures

    7.Psycho

    logicaleffects(e.g.,anxiety,stress)

    8.Overtreatmen

    tPo

    st-hoc

    outcom

    es:

    9.Com

    pleteeradicationof:intestin

    almetaplasia/BE,

    dysplasia,high

    -grade

    dysplasia,ne

    oplasia

    Hamel et al. Systematic Reviews (2020) 9:20 Page 5 of 25

  • Table

    2Po

    pulatio

    n,interven

    tions,com

    parison

    s,ou

    tcom

    es,tim

    eframe,stud

    yde

    sign

    (PICOTS)(Con

    tinued)

    Keyqu

    estio

    n1

    Keyqu

    estio

    n2

    Keyqu

    estio

    n3

    10.Reductio

    n/regressio

    nof

    BE:inleng

    th(cm),inarea

    (%)

    11.Treatmen

    tFailure

    (noablatio

    n)12.EACrecurren

    ce

    Timing

    Nolim

    itsNolim

    itsNolim

    its

    Setting

    Settings

    werelim

    itedto

    prim

    arycare

    orsettings

    inwhich

    aprim

    arycare

    physiciancouldreferapatient

    for

    esop

    hage

    alscreen

    ing.

    Prim

    arycare

    orothersettings

    gene

    ralizableto

    prim

    ary

    care.

    Any

    setting.

    Stud

    yde

    sign

    sInclusion

    Rand

    omized

    controlledtrials(RCTs),includ

    ingcluster

    RCTs.

    Ifno

    orfew

    RCTs

    (i.e.<5trials)areavailable:Non

    -RCT,

    controlledbe

    fore-afte

    r,interrup

    tedtim

    esseries,coho

    rtstud

    ies,case-con

    trol

    stud

    ies,lim

    iting

    tohigh

    erlevelsof

    eviden

    cede

    pend

    ingon

    thenature

    andvolumeof

    spe-

    cific

    stud

    yde

    sign

    s.Ifno

    orfew

    RCTs

    areavailablefortheoverdiagno

    sis

    outcom

    e,ecolog

    icalandcoho

    rtstud

    ieswillbe

    considered

    forallo

    utcomes

    used

    forthejudg

    emen

    tof

    overdiagno

    sis.

    Rand

    omized

    controlledtrials

    Ifinsufficien

    tdata

    exists:

    Con

    trolledclinicaltrials,con

    trolledbe

    fore-after,case-

    controls,coh

    ort,interrup

    tedtim

    eseries(ITS),and

    cross-sectional(e.g.

    surveys)

    Ifinsufficien

    tdata

    existsfortheabove:

    Qualitativestud

    iesandmixed

    -metho

    dsstud

    ies

    System

    aticreview

    sof

    RCTs

    o

    Tobe

    defined

    asaSR,a

    review

    musthave

    met

    allfou

    rof

    thefollowingcriteria:(1)

    searched

    atleaston

    edatabase;(2)

    repo

    rted

    itsselectioncriteria;(3)

    cond

    ucted

    quality

    orriskof

    bias

    assessmen

    ton

    includ

    edstud

    ies;

    and(4)provided

    alistandsynthe

    sisof

    includ

    edstud

    ies.

    SRsthat

    iden

    tifiedob

    servationalstudies

    wereinclud

    edif

    results

    from

    RCTs

    wereprovided

    separately.

    Exclusion

    Cross-sectio

    nalstudies,caseseries,case

    repo

    rts,and

    othe

    rpu

    blicationtype

    s(editorials,com

    men

    taries,no

    tes,

    letter,opinion

    s).

    Com

    men

    taries,op

    inion,ed

    itorialsandreview

    sSRsthat

    combine

    results

    from

    RCTs

    with

    non-RC

    Ts,con

    -trolledbe

    fore-afte

    r,interrup

    tedtim

    esseries,coho

    rtstud

    -ies,case-con

    trol

    stud

    ies,cross-sectionalstudies,case

    series,case

    repo

    rtsandothe

    rpu

    blicationtype

    s(edito-

    rials,com

    men

    taries,no

    tes,letter,opinion

    s)or

    SRsthat

    onlyinclud

    eno

    n-RC

    Tandob

    servationalstudies.

    Lang

    uage

    Nolang

    uage

    restrictio

    nsin

    thesearch;how

    ever,onlyEnglishandFren

    charticleswillbe

    includ

    edat

    full-text.

    Databases

    MED

    LINE,Em

    base,C

    ochraneLibrary

    MED

    LINE,Em

    base,C

    INAHL,CochraneLibrary

    MED

    LINE,Em

    base,C

    ochraneLibrary(CDSR,D

    ARE,H

    TA)

    a Studies

    addressing

    both

    adults

    andchild

    ren,

    ifda

    taprov

    ided

    forad

    ults

    arerepo

    rted

    sepa

    rately

    bChron

    icGER

    D,asde

    fined

    bystud

    yau

    thors

    c Riskfactorswillbe

    asde

    emed

    soby

    includ

    edstud

    ies

    dWedidno

    tuseapred

    efined

    metho

    dfordiag

    nosis(e.g.h

    istopa

    tholog

    ical

    exam

    s,ICDcode

    )an

    dreliedon

    how

    itwas

    defin

    edin

    theSR

    se Alsokn

    ownas

    pane

    ndoscopy

    andup

    perGIe

    ndoscopy

    f With

    orwith

    outbiop

    syprotocol

    gFo

    rexam

    ple,

    chromen

    doscop

    yan

    dna

    rrow

    -ban

    dim

    aging

    hAlth

    ough

    wewillconsider

    compa

    rativ

    estud

    iesthat

    includ

    eano

    screen

    ingarm,w

    eun

    derstand

    that

    theou

    tcom

    esof

    interest

    dono

    tap

    plyto

    peop

    lewho

    dono

    treceiveor

    have

    notbe

    enofferedscreen

    ing.

    Forsuch

    stud

    ies,wewillon

    lyconsider

    data

    forthosewho

    receiveor

    areofferedscreen

    ing

    i Suchas

    PPI,H2receptor

    antago

    nists,Cox-2

    inhibitors,P

    rokine

    ticsan

    dan

    tacids,N

    SAIDs

    j Suchas

    high

    -definition

    /high-resolutio

    nwhite

    light

    endo

    scop

    y,chromoe

    ndoscopy

    ,electronicchromoe

    ndoscopy

    ,autofluorescensce

    imag

    ing,

    confocal

    laseren

    domicroscop

    y,lig

    htscatterin

    gspectroscopy

    ,diffuse

    refle

    ctan

    cespectroscopy

    k Suchas

    ablativ

    etechniqu

    es(the

    rmal

    orchem

    ical),an

    dmecha

    nicalm

    etho

    ds(EMR,

    ESDor

    combine

    dop

    tions)

    l From

    thetim

    eof

    allocatio

    nto

    screen

    ingor

    controla

    rmmTh

    eseou

    tcom

    eswillbe

    used

    tojudg

    etheextent

    ofov

    erdiag

    nosis,which

    isde

    fined

    asthediag

    nosisof

    diseasewhich

    wou

    ldne

    verha

    vebe

    comeclinically

    appa

    rent

    inape

    rson

    'slifetim

    e(i.e.,cau

    sing

    neith

    ersymptom

    sno

    rde

    ath)

    nAsjudg

    edby

    thestud

    yau

    thor

    orwillbe

    judg

    edby

    theCTFPH

    Cworking

    grou

    pusinginform

    ationprov

    ided

    byau

    thors,whe

    reavailable

    oSystem

    aticreview

    sthat

    combine

    RCTan

    dno

    n-RC

    Tswillbe

    includ

    edifresults

    forRC

    Tsareprov

    ided

    sepa

    rately

    from

    non-RC

    Tstud

    ies

    Hamel et al. Systematic Reviews (2020) 9:20 Page 6 of 25

  • broad screening. Screening forms can be found in Additional file 7. Titles and abstracts were independently screened for relevance by two reviewers, using the liberal accelerated method, which requires one reviewer to include a reference for further assessment at full-text and two reviewers to exclude it [38]. References were reviewed in random order, with each reviewer unaware if the reference had already been assessed and excluded by the other reviewer. Subsequently, full-texts were retrieved and two reviewers independently assessed each article for relevancy. Conflicts at full-text were resolved by consensus or a third team member. Articles not available for download were ordered from the library through interlibrary loans. Those that were not received within 30 days were excluded and labelled accordingly. For articles with abstracts only, a search was performed to locate any full-text publications.

    Where chronic GERD was not defined in a study (KQ1 and KQ2), we attempted to contact the study authors twice over 2 weeks by email to obtain more information. If authors did not respond, and the lack of definition for chronic GERD was the only reason for possible exclusion, we included the study. Reports in abstract form and protocols were coded as such and excluded, but a search was completed to see if the full-text was available. Those that were not available as full-texts were excluded, and studies available only in abstract form appear in the list of excluded studies (Additional file 8).
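The liberal accelerated rule described above for title and abstract screening (one reviewer is enough to include, two are needed to exclude) can be sketched as a small decision function. This is a minimal illustration only; the function name and the vote labels are hypothetical and do not come from the review's screening software.

```python
from typing import Optional

VALID_VOTES = {"include", "exclude"}


def liberal_accelerated_decision(first_vote: str,
                                 second_vote: Optional[str] = None) -> str:
    """Return 'include', 'exclude' or 'needs_second_reviewer'.

    Assumed rule: a single 'include' vote moves the reference to
    full-text review; exclusion requires two 'exclude' votes.
    """
    votes = [v for v in (first_vote, second_vote) if v is not None]
    for vote in votes:
        if vote not in VALID_VOTES:
            raise ValueError(f"unknown vote: {vote!r}")
    if "include" in votes:
        return "include"
    if len(votes) == 2:
        return "exclude"
    # One 'exclude' vote so far: a second reviewer must confirm it.
    return "needs_second_reviewer"
```

Under this rule a reference is never dropped on a single reviewer's judgement, which is what makes the approach "liberal" toward inclusion.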

    Data extraction and management

    For all KQs, full data extraction was completed by one reviewer using a form developed a priori and 100% of these were verified by a second reviewer (Additional file 9). Any disagreements were resolved by consensus or, if needed, with a third reviewer. For KQ1 and KQ2, where information was unclear or missing, authors were contacted by email twice over 2 weeks. If no response was received and the information affected the ability for quantitative analysis, the study was analyzed narratively. For KQ3, data were extracted as they were synthesized and/or reported in the included reviews. No additional information from the primary studies was extracted or assessed and quality control was not performed to verify the accuracy of the reviews’ data on the included studies.

    Risk of bias and quality assessment

    For KQ1 and KQ2, all included studies were assessed for the risk of bias (RoB) by one reviewer, with verification completed by a second reviewer. The Cochrane RoB tool [39] was used to evaluate the RoB in RCTs and the Newcastle-Ottawa scale (NOS) [40] was used to evaluate the RoB in cohort studies. For KQ3, the quality of the included SRs was assessed using the AMSTAR measurement tool [41]. Two reviewers assessed the quality of each included SR independently. Any discrepancies were resolved through discussion and, if needed, a third reviewer. We used the AMSTAR 2 [42] approach to determine the final assessments of quality of conduct, including consideration of four critical domains, and categorized the quality as high, moderate, low or critically low, using the criteria described in Additional file 10. For all assessments, disagreements were resolved by consensus or third party adjudication.

    Analysis

    For all KQs, characteristics of the included studies/reviews are presented in tables and summarised

    Table 3 Searching for studies

    Searchesᵃ,ᵇ
      Key question 1: Additional file 4. KQ1 searches
      Key question 2: Additional file 4. KQ2 searches
      Key question 3: Additional file 4. KQ3 searches

    Databases
      Key question 1: OVID MEDLINE®; OVID MEDLINE® Epub Ahead of Print, In-Process and Other Non-Indexed Citations; Embase Classic + Embase; Cochrane Library on Wiley
      Key question 2: Same as KQ1 plus CINAHL using the EBSCO platform
      Key question 3: Same as KQ1

    Date run
      All key questions: From the inception date on October 29–30, 2018.

    Controlled vocabulary examplesᶜ
      Key question 1: Gastroesophageal reflux, esophageal neoplasms, endoscopy
      Key question 2: Gastroesophageal reflux, patient acceptance of health care, informed consent
      Key question 3: Barrett esophagus, esophageal neoplasms, meta analysis

    Keywords examplesᶜ
      Key question 1: GERD, esophageal cancer, esophagoscopy
      Key question 2: GERD, patient perspective, informed decision-making
      Key question 3: Barrett’s dysplasia, esophageal cancer, systematic review

    Grey literature
      Key questions 1 and 2: CADTH Grey Matters, websites listed in Additional file 6, bibliographies of relevant systematic reviews and clinical practice guidelines identified from the search strategies and grey literature searching.
      Key question 3: CADTH Grey Matters plus additional references listed in Additional file 6.

    ᵃ When possible, animal-only and opinion-pieces were removed from the results
    ᵇ The search strategies were peer-reviewed using PRESS 2015 [35] and can be found in Additional file 5
    ᶜ Vocabulary and syntax adjusted across databases, as required


  • narratively. For KQ1, the results are presented in evidence sets 1 to 8 (Additional file 11), with associated forest plots, where applicable. For KQ2, due to the nature of the data, a meta-analysis of outcomes was not appropriate; however, narrative results are presented. For KQ3, the results presented in evidence sets 1-11 (Additional file 12) may omit some results due to overlap. In the case of overlap where outcome data were the same in multiple reviews, the review with the highest methodological quality or with the most complete outcome data was included; the additional reviews are listed in Additional file 12: Table 1 and mentioned in the Notes column within the evidence sets. For KQ3, odds ratios (OR) were commonly used in SRs and absolute risk differences (ARDs) were calculated accordingly. Where SR authors did not provide an OR, a relative risk (RR) was calculated based on the results and the ARD was calculated based on the RR. In instances where the RR did not approximate the OR reported in the SR, we inserted the RR in the notes column in the evidence set; however, the ARDs were calculated based on the OR. We determined the extent of overlap of evidence across reviews by outcome for each comparison using the corrected covered area (CCA) method [43].
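The paragraph above states that ARDs were derived from ORs (or from RRs where no OR was available) but does not spell out the arithmetic. The sketch below uses the conversion commonly applied in GRADE summary of findings tables; treating it as the authors' exact method is an assumption, and the baseline (control group) risk of 0.20 in the usage lines is purely hypothetical.

```python
def ard_from_or(odds_ratio: float, baseline_risk: float) -> float:
    """Absolute risk difference implied by an OR at an assumed baseline risk."""
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must be strictly between 0 and 1")
    # Risk in the intervention group implied by the odds ratio.
    intervention_risk = (odds_ratio * baseline_risk) / (
        1.0 - baseline_risk + odds_ratio * baseline_risk
    )
    return intervention_risk - baseline_risk


def ard_from_rr(risk_ratio: float, baseline_risk: float) -> float:
    """Absolute risk difference implied by an RR at an assumed baseline risk."""
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must be strictly between 0 and 1")
    return baseline_risk * (risk_ratio - 1.0)


# Hypothetical baseline risk; the OR and RR values are only example inputs.
assumed_baseline = 0.20
print(round(1000 * ard_from_or(0.38, assumed_baseline)), "per 1000 (from OR)")
print(round(1000 * ard_from_rr(0.53, assumed_baseline)), "per 1000 (from RR)")
```

The two functions give similar answers only when events are rare, which is one reason an RR that "did not approximate" the OR would be worth flagging in a notes column.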

    Meta-analysis

    For KQ1, raw data were extracted from all articles, when available. Raw data were entered into Review Manager Software version 5.3 [44] and hazard ratios (HR) were produced for the survival outcome and risk ratios (RR) were calculated for all other outcomes.
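As a rough illustration of the risk ratio calculation mentioned above, the sketch below computes an RR with a 95% confidence interval on the log scale from two-group event counts. It is not a reproduction of Review Manager's internals. The example counts (1/84 versus 4/43) are borrowed from the progression to EAC figures quoted later in this article purely to show the arithmetic; the reviews summarised there may have used a different effect measure.

```python
import math


def risk_ratio(events_a: int, total_a: int,
               events_b: int, total_b: int,
               z: float = 1.96):
    """Return (RR, lower, upper) for group A versus group B.

    Uses the usual large-sample standard error of log(RR). Zero event
    counts need a continuity correction, which this sketch omits.
    """
    if events_a == 0 or events_b == 0:
        raise ValueError("zero events: continuity correction not implemented")
    rr = (events_a / total_a) / (events_b / total_b)
    se_log_rr = math.sqrt(
        1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


rr, lower, upper = risk_ratio(1, 84, 4, 43)
print(f"RR {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With so few events the interval is very wide and spans 1, which is the usual reason a sparse comparison is reported as showing no difference.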

    Subgroup analysis

    A priori-defined subgroup analysis (KQ1) variables included age, sex, body mass index (BMI), smoking history, duration of chronic GERD, definition of chronic GERD, groupings of risk factors and various ethnic groups. Reporting did not allow for these to be undertaken.

    Sensitivity analysis

    Sensitivity analyses were planned to restrict analyses to studies judged to be at low risk of bias (KQ1) based on the overall judgement, to address any decisions made regarding handling of data or to explore statistical heterogeneity (KQ1), and to examine the timing of publication (KQ1 and KQ2). However, only two studies were considered low risk of bias and therefore sensitivity analysis was not undertaken.

    Small study effects

    For KQ1 and KQ2, to assess for small study effects, a combination of graphical aids (e.g. funnel plot) and/or statistical tests (e.g. Egger regression test, Hedges-Olkin) were planned if at least ten studies were available in any given analysis. This analysis was not undertaken.
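For readers unfamiliar with the Egger regression test named above, the following is a minimal sketch of its core step: regress each study's standardised effect (effect divided by its standard error) on its precision (one over the standard error) and inspect the intercept. The function name, the `min_studies` parameter and the input layout are illustrative assumptions; effects are assumed to be on an additive scale such as log RR. No such analysis was run in this review.

```python
import math


def egger_intercept(effects, standard_errors, min_studies=10):
    """Return (intercept, standard error of the intercept).

    An intercept far from zero, relative to its standard error,
    suggests funnel plot asymmetry (possible small study effects).
    """
    n = len(effects)
    if n != len(standard_errors):
        raise ValueError("effects and standard_errors differ in length")
    if n < min_studies:
        raise ValueError("too few studies for a small study effects test")
    precision = [1.0 / se for se in standard_errors]
    standardised = [e / se for e, se in zip(effects, standard_errors)]
    mean_x = sum(precision) / n
    mean_y = sum(standardised) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    if sxx == 0:
        raise ValueError("all studies have identical precision")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(precision, standardised))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Ordinary least squares standard error of the intercept.
    residual_ss = sum((y - intercept - slope * x) ** 2
                      for x, y in zip(precision, standardised))
    se_intercept = math.sqrt(
        residual_ss / (n - 2) * (1.0 / n + mean_x ** 2 / sxx)
    )
    return intercept, se_intercept
```

The ten-study default mirrors the threshold stated in the text; comparing the intercept with a t distribution on n − 2 degrees of freedom would complete the test and is left out here.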

    Rating the certainty of the evidence

    For each critical and important outcome, the GRADE framework [32, 45] was used to assess the strength and certainty of the evidence. We followed the GRADE guidance for determining the extent of the risk of bias for the body of evidence [46]. The online software GRADEpro GDT (https://gradepro.org/) was used for the GRADE assessments. Assessment of each GRADE domain (study limitations (i.e. risk of bias), indirectness, inconsistency, imprecision and other considerations (i.e. publication bias and comprehensiveness of the search)) was presented, where possible, with the information provided in the studies. If there was missing information, a narrative description was provided. The certainty of the evidence for each outcome, in each study/review, was rated by one reviewer and verified by a second reviewer. Any discrepancies were resolved through consensus.

    As KQ3 is an overview, and there are no published methods for performing GRADE for overviews of reviews, we have used the five domains listed above as a guide. As none of the included reviews used GRADE to evaluate the body of evidence, we performed these assessments using the reported information in the reviews and did not access the primary studies for any additional information, as was pre-specified in the protocol. When undertaking domain assessments, we considered an approach with sufficient face validity to align with GRADE guidance. We have elaborated on considerations and decisions in Additional file 13. As with existing GRADE guidance, each GRADE domain was judged as possessing no serious limitations (no rating down), serious limitations (rating down by one) or very serious limitations (rating down by two).
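The rating-down rule in the last sentence above can be expressed as a short sketch. The starting level of "high" (as is usual for evidence from randomised trials under GRADE) and the domain keys are assumptions for illustration; the review's actual judgements are documented in its additional files, not reproduced here.

```python
LEVELS = ["very low", "low", "moderate", "high"]
RATING_DOWN = {"no serious": 0, "serious": 1, "very serious": 2}


def grade_certainty(domain_judgements: dict, start: str = "high") -> str:
    """Apply per-domain rating down and return the final certainty level.

    domain_judgements maps a domain name (e.g. 'imprecision') to one of
    'no serious', 'serious' or 'very serious'. The rating cannot fall
    below 'very low'.
    """
    steps_down = sum(RATING_DOWN[j] for j in domain_judgements.values())
    final_index = max(LEVELS.index(start) - steps_down, 0)
    return LEVELS[final_index]


# Hypothetical judgements for a single outcome.
example = {
    "study limitations": "serious",
    "indirectness": "no serious",
    "inconsistency": "no serious",
    "imprecision": "very serious",
    "other considerations": "no serious",
}
print(grade_certainty(example))
```

Because the scale has only four levels, three steps down from "high" already reaches "very low", which helps explain why sparse, biased evidence so often ends up at the bottom rating.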

    Results

    Table 4 provides a summary of the literature search results and Fig. 2a–c shows the PRISMA flow diagrams for each KQ. Study characteristics and population demographics for each key question are presented in Additional file 14 and overall RoB/quality assessment for included studies and reviews are presented in Additional file 15. Additional files 11, 16, 12 provide the evidence set results, narrative results, GRADE evidence profiles and GRADE summary of findings tables for KQ1, KQ2 and KQ3, respectively. The results presented herein provide a high level overview of the results. For additional details of the individual studies and reviews within each section, the full SRs can be found on the CTFPHC website (www.canadiantaskforce.ca). Additional file 8 provides a list of excluded studies


  • Table 4 Summary of studies/reviews

    Literature search (PRISMA flow diagrams in Fig. 2a–c)

    Initial search results
      Key question 1: 7,292
      Key question 2: 1,614
      Key question 3: 4,374

    Deduplication, grey lit and suppl. searchingᵃ
      Key question 1: 4,384 (evaluated at title and abstract)
      Key question 2: 1,600
      Key question 3: 3,761

    Evaluated at full-text
      Key question 1: 1645
      Key question 2: 103
      Key question 3: 1007

    # of included studies
      Key question 1: 10 (6 RCTs, 1 randomized cross-over trial, 1 prospective cohort, 2 retrospective cohort)
      Key question 2: 3 (2 RCTs, 1 cohort)
      Key question 3: 11 SRs (10 reporting results) which included 25 articles reporting results of RCTs (1 to 16 per review) (Figs. 3 and 4)

    Study characteristics (Full tables in Additional file 14: Tables S1-S3)

    Comparisons
      Key question 1:
        - Screening in the last 5 years with EGD vs no screening [88, 90]
        - Screening modality (e.g. conventional EGD) vs. screening modality (transnasal esophagoscopy) – [80, 81, 83, 85–87]
        - Biopsy method (e.g. Four-quadrant random) vs biopsy method (chromoendoscopy) [82, 84]
      Key question 2:
        - Transnasal esophagoscopy vs. Video capsule esophagoscopy
        - Transnasal-EGD vs. Peroral-EGD
        - Peroral-EGD and sedated EGD
      Key question 3:
        - Celecoxib vs. Placebo
        - Omeprazole vs. Histamine Type 2 Receptor Antagonists
        - PDT + Omeprazole vs. Omeprazole
        - Anti-reflux surgery (Nissen fundoplication) + APC vs Anti-reflux surgery (Nissen fundoplication) + Surveillance (endoscopic)
        - Radiofrequency ablation + Proton pump inhibitor vs. Proton pump inhibitor
        - Anti-reflux surgery (Nissen fundoplication) vs. H2RA/Omeprazole
        - PDT using 5-ALA vs. PDT using Photofrin
        - PDT with different treatment parameters
        - RFA vs Surveillance (endoscopic)
        - APC + PPI vs. MPEC + PPI
        - PDT vs. APC + PPI
        - Endoscopic mucosal resection vs. RFA

    Country of conduct
      Key question 1: 8 USA; 1 India; 1 Japan
      Key question 2: 3 USA
      Key question 3: 1 Brazil; 1 China; 5 UK; 4 USA

    Years published
      Key question 1: 1999–2018
      Key question 2: 1998, 1999, 2014
      Key question 3: 2008–2018

    Study size
      Key question 1:
        - 20 to 92 participants per screening modality
        - 60–378 participants (RCTs)
        - 1580 participants (prospective cohort)
        - 153 and 155 participants (retrospective cohorts)
      Key question 2: 62, 105 and 1210 participants
      Key question 3:
        - 9 to 208 participants across SRs (most with <100 participants)
        - One SR reported on an ongoing RCT with no results [108]

    Population demographics (Full tables in Additional file 14)
      Key question 1:
        Sex: Men: 42–99%
        Ethnicity: White: 41–99%ᵇ
        Mean age: 48–67 years old
        Smokers: 43%, 80%
        PPI use: 17%, 48%, 78%
        BMI: 29.0 to 31.4ᶜ
      Key question 2: These were not reported in the included studies.
      Key question 3: Many of these were not reported across reviews.

    Outcomes not reported
      Key question 1:
        - All-cause or cause-specific mortality
        - Quality of life
        - Major or minor medical procedures
        - Overdiagnosis
      Key question 2:
        - How patients weigh the benefits and harms of screening
        - Factors considered in decision to be screened
        - Intrusiveness of the screening modality
      Key question 3:
        - EAC-related mortality
        - Quality of life
        - Overtreatment

    Overall RoB
      Key question 1:
        - 2 studies were low RoB for histologically confirmed BE
        - Overall, outcomes across comparisons were at moderate or high RoB for both RCTs and observational studies (Additional file 15: Tables S1 and 2)
      Key question 2:
        - 1 study rated as high RoB (Additional file 15: Table S3)
      Key question 3:
        - 2 SRs were rated as low quality and 8 were critically low (Additional file 15: Table S4)
        - Multiple tools used to evaluate the primary RCTs included in the SRs, with the majority rated as unclear or high RoB (Additional file 15: Table S5)

    Overall GRADE
      Key question 1: All outcomes for all comparisons were rated as very low certainty.
      Key question 2: GRADE was not performed for this KQ.
      Key question 3: Evidence for most outcomes was rated as very low certainty or was given a range of ‘very low to low’ (description of ranges in Additional file 13).

    APC: Argon Plasma Coagulation; MPEC: Multipolar electrocoagulation; PDT: Photodynamic Therapy; PPI: Proton Pump Inhibitor; RoB: risk of bias
    ᵃ Bibliography search, search for full-text articles based on abstracts and protocols
    ᵇ Reported in five studies
    ᶜ Reported in four studies
    ᵃ Including double counting


  • Fig. 2 a PRISMA flow diagram for KQ 1. b PRISMA flow diagram for KQ 2. c PRISMA flow diagram for KQ 3


  • at full-text, with reasons for each KQ. A list of ongoing studies for all KQs is provided in Additional file 17.

    Key question 1. Effectiveness of screening

    Detailed characteristics tables for the ten included studies can be found in Additional file 14: Table 1, and results are described herein. The certainty of the evidence to answer KQ1a was very low; therefore, KQ1b was not addressed.

    EGD versus no prior EGD

    Two retrospective cohort studies by Rubenstein 2008 [47] and Hammad 2019 [48] studied a group of individuals with EAC and evaluated their electronic medical records or the institutional cancer registry to examine if they had a standard sedated esophagogastroduodenoscopy (EGD) in the five years prior to cancer diagnosis or not (Additional file 11: Evidence Set 1). In Rubenstein 2008, survival data, reported using a Kaplan-Meier curve, showed no difference between survival rates at year 1 and 10 [47]. Authors report that there was no difference in long-term survival (approximately 6 to 12 years) between those who had received a prior EGD and those who had not (adjusted HR 0.93, 95% confidence interval (CI) 0.58 to 1.50) [very low certainty]. It was difficult to determine a range of effects across studies for survival analyses as the Hammad 2019 study only had one eligible patient with a prior EGD in the past 5 years.

    Both Rubenstein et al. [47] and Hammad 2019 [48] reported information to evaluate whether an EGD in the previous five years influenced the incidence of EAC by stage of diagnosis at time of detection. It was difficult to determine a range of effects across studies for most stage-based analyses as one study only had one eligible patient with a prior EGD and the stage of diagnosis was unknown (author correspondence) [48]. Rubenstein et al. [47] reported that there may be higher odds of a stage 1 diagnosis than a more advanced diagnosis (stages 2–4) (OR 2.77, 95% CI 1.00 to 7.67; p = 0.0497; Forest Plot 1.1) [very low certainty].
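The OR, confidence interval and p-value quoted above are linked: on the log scale the interval width gives the standard error, and the estimate divided by that standard error gives a z statistic. The sketch below shows this back-calculation under a normal approximation; it is an illustration, not the method used by the study authors, and because the published limits are rounded it recovers the reported p = 0.0497 only approximately.

```python
import math


def p_from_ratio_ci(estimate: float, lower: float, upper: float,
                    z_crit: float = 1.959964):
    """Back-calculate (SE of log estimate, z, two-sided p) from a 95% CI.

    Assumes the interval was built symmetrically on the log scale with a
    normal approximation.
    """
    se_log = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(estimate) / se_log
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return se_log, z, p_value


se_log, z, p_value = p_from_ratio_ci(2.77, 1.00, 7.67)
print(f"SE(log OR) = {se_log:.3f}, z = {z:.2f}, p = {p_value:.3f}")
```

A lower limit of exactly 1.00 together with a p-value just under 0.05 is what this relationship predicts, which is why the finding is worded cautiously as "may be".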

    EGD versus TNE

    Four studies evaluated EGD (sedated) compared to unsedated transnasal esophagoscopy (TNE) (RCTs by Chang 2011 [49] and Sami 2015 [50]; a randomised crossover study by Jobe 2006 [51]; one cohort study by Mori 2010 [52]) (Additional file 11: Evidence Set 2). Sami 2015 [50] evaluated safety, defined as serious adverse events (life-threatening, severe or medically significant consequences of screening), and reported no serious adverse events in either group [very low certainty].

    Jobe et al. [51] reported on incidence of EAC only in those who were receiving initial screening (i.e. excluding those who were being followed with BE). There were no cases of EAC reported [very low certainty]. Three studies [49, 50, 52] defined incidence of endoscopically suspected BE differently. The RCTs showed no significant difference between screening modalities; RR 1.90, 95% CI 0.19 to 19.27 [49] and p = 0.37 [50] [very low certainty]. However, Mori 2010 [52] (cohort study) did show a significant difference, with those being screened with TNE having a higher incidence of suspected BE (RR 2.09, 95% CI 1.30 to 3.36; Forest Plot 2.1) [very low certainty]. Two studies reported no difference in incidence of histologically confirmed BE between screening modalities; p = 0.44 [50] and RR 0.89, 95% CI 0.59 to 1.33 [51] [very low certainty]. Incidence of dysplasia was low, with zero in Chang 2011 [49] and nine (EGD: 5; TNE: 4) in Jobe 2006 [51] showing no difference between screening modalities (RR 1.54, 95% CI 0.44 to 5.44; Forest Plot 2.2) [very low certainty].

    Chang 2011 [49], Sami 2015 [50] and Jobe 2006 [51] used the same measurement tool to measure anxiety (psychological effects); however, the studies differed in when the tool was used and in how the outcomes were reported (e.g. mean, median, level of severity). Therefore, no meta-analysis was performed. There was no difference in anxiety before the procedure (p = 0.084) [51] [very low certainty], less anxiety overall during the insertion (p = 0.0001) [51] [very low certainty] and during the procedure (p < 0.001 [50] and p = 0.0001 [51]) for those who received EGD compared to TNE [very low certainty].

    EGD versus video capsule esophagoscopy

    One RCT by Chang 2011 [49] evaluated three outcomes, all with very low certainty (Additional file 11: Evidence Set 3). There was no difference in the incidence of endoscopically suspected BE between screening modalities (RR 0.57, 95% CI 0.11 to 3.01; Forest Plot 3.1). Participants with suspected BE based on video capsule esophagoscopy (VCE) (swallowed device) were offered EGD and BE was confirmed through biopsy. Of the three participants with suspected BE who received VCE, none were histologically confirmed cases of BE. There were also no cases of dysplasia in either group.

    EGD versus transoral-EGD

    One cohort study by Mori 2010 [52] allowed participants to choose between three screening modalities (sedated EGD, unsedated TNE and unsedated transoral-EGD presented here) (Additional file 11: Evidence Set 4). Overall, there was no difference in the frequency, distribution or severity in the incidence of endoscopically suspected BE between modalities in those with grade 2 or 3 BE (RR 1.30, 95% CI 0.83 to 2.03; Forest Plot 4.1) [very low certainty].

    TNE versus VCE

    Two studies, Chak 2014 [53] and Chang 2011 [49], provided data on four outcomes (Additional file 11: Evidence Set 5). There was no difference between screening modalities for incidence of endoscopically suspected BE (RR 0.86, 95% CI 0.29 to 2.56; Forest Plot 5.1) [very low certainty] [49, 53], or for those with histologically confirmed BE (RR 0.62, 95% CI 0.15 to 2.52) [very low certainty] [53]. Chang 2011 [49] reported that there were no incidences of dysplasia with either screening modality [very low certainty].

    Those in the unsedated TNE group experienced more anxiety, nervousness or worry (psychological effects) before the procedure than those in the swallowed VCE group (RR 2.28, 95% CI 1.33 to 3.88; Forest Plot 5.2) [53] [very low certainty], and anxiety during the procedure (RR 2.14, 95% CI 1.22 to 3.77; Forest Plot 5.3) [53] [very low certainty].

    Unsedated TNE versus unsedated transoral EGD

    One RCT by Zaman 1999 [54] randomised participants with upper gastrointestinal (GI) symptoms. Mori 2010 [52] (cohort) included those who had previously been screened for upper intestinal tract disorders, and allowed participants to choose between three screening modalities (Additional file 11: Evidence Set 6). Only one complication (life-threatening, severe, or medically significant consequence) was reported (facial swelling followed by surgical exploration and full recovery), with no differences between screening modalities (RR 4.04, 95% CI 0.17 to 95.20; Forest Plot 6.1) [very low certainty] [54].

    Zaman et al. [54] reported no difference between screening modalities in the incidence of endoscopically suspected BE (three cases total) (RR 0.68, 95% CI 0.07 to 7.09; Forest Plot 6.2) [very low certainty]. Mori et al. [52] reported a significant difference in the frequency of BE, with those screened with TNE less likely to have suspected BE (grade 2 or 3) compared to transoral EGD (RR 0.62, 95% CI 0.41 to 0.94; Forest Plot 6.3) [very low certainty].

    Zaman et al. [54] evaluated the levels of anxiety before the procedure, during insertion, and during the procedure (psychological effects). Anxiety was assessed on a scale of 10 (higher score representing higher level of anxiety), with no significant difference between levels of anxiety at any time (Forest Plots 6.4-6.6) [very low certainty].

    Random biopsy versus enhanced magnification-directed endoscopy biopsies (with acetic acid)

    One RCT by Ferguson 2006 [55] included patients who received standard sedated EGD, with those with suspected BE randomised at that point to different biopsy methods (Additional file 11: Evidence Set 7). As all participants were evaluated on suspected BE through EGD, only incidence of histologically confirmed BE is reported. There was no difference in the incidence of histologically confirmed BE between different methods of biopsy. This was found in both those with pattern III and IV specialized intestinal metaplasia (RR 0.98, 95% CI 0.59 to 1.64; Forest Plot 7.1) [very low certainty] and among all specialized intestinal metaplasia pattern types (RR 1.14, 95% CI 0.71 to 1.82; Forest Plot 7.2) [very low certainty].

    Random biopsy versus chromoendoscopy

    One RCT by Wani 2014 [56] included participants who were given conventional EGD (n = 378) and those with suspected BE who were randomised to either random biopsy (n = 33) or chromoendoscopy (n = 23) (Additional file 11: Evidence Set 8). There was no difference in the number of participants with histologically confirmed BE between methods (RR 0.87, 95% CI 0.26 to 2.90; Forest Plot 8.1) [very low certainty].

    Key question 2. Patient values and preferences

    Three studies (Chak 2014 [53], Zaman 1999 [54] and Zaman 1998 [57]) provided information on reasons why participants were unwilling to be part of the study or reasons for deciding against the uptake of screening once allocated [53]. Objectives of the included studies were to determine the acceptance and tolerability of different screening modalities and provide data on screening results. Studies reported on those who refused participation prior to study commencement (i.e. either prior to being screened or prior to randomisation), but did not provide participant characteristics on this patient subset. A narrative summary of the results is provided herein, with detailed results in Additional file 16. No studies provided results on how patients weigh the benefits and harms of screening, factors considered in the decision to be screened or intrusiveness of the screening modality.

    Willingness to be screened

    All three studies provided reasons why those asked had refused to be screened/participate in the study. A large proportion of these individuals were in one study [53], in which 1026 of the 1210 people asked did not participate and 184 agreed to participate. Among those who did not participate during the invitation period, 627 (52%) did not return the phone call or respond to the letter, 385 (32%) refused to participate (with no reason provided), 12 (1%) were ineligible and two (0.2%) did not participate because of difficulty getting to the hospital. The other two studies by Zaman et al. invited 105 outpatients in one study and 62 in the other. Zaman 1999 [54] reported 45 of 105 (43%) patients were unwilling to participate in the study comparing transnasal to peroral EGD. Zaman 1998 [57] reported 19 of 62 (31%) patients unwilling to participate in the study comparing peroral to sedated EGD.

    The main reason for unwillingness to be screened in both studies was anxiety, with 17% (18/105) [54] and 19% (12/62) [57] of all those asked to participate reporting this. Both studies also reported fear of gagging as a reason, cited by 10% (10/105) [54] and 5% (3/62) [57]. Lastly, not being interested in the study (10/105, 10%) [54], not wishing to undergo a transnasal procedure (7/105, 7%) [54] and unwillingness to be a study subject (4/62, 6%) [57] were also reported.

    Uptake of screening

    Chak 2014 [53] reported seven individuals (of 184 randomised) who did not receive the allocated intervention after randomisation. Five people randomised to the TNE group did not receive the procedure because they wanted capsule instead. Two people randomised to the VCE group did not receive the procedure because they were worried about the capsule getting stuck. There was no statistically significant difference in uptake between intervention groups (p = 0.25).

    Key question 3. Treatment

    The review characteristics of the 11 included SRs are shown in Additional file 14: Table 3. Additional file 12: Table 1 provides additional details of all primary studies included in each SR, and shows which treatment comparisons provided results in each SR. Additional file 12: Evidence Sets 1-11 provides detailed results and GRADE tables. Some of the individual trials were represented in more than one review since the reviews did not have mutually exclusive eligibility criteria (Figs. 3 and 4). Twenty-two sets of comparisons had overlapping data across reviews (Additional file 18). In most cases, included studies overlapped completely, according to corrected covered area (CCA) calculations. In a few cases, there was discordance among reviews. Throughout the Evidence Sets 1-11, the word “significance” refers to statistical significance unless stated otherwise.
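The corrected covered area referred to above is usually computed from a citation matrix of primary studies (rows) by reviews (columns) as CCA = (N − r) / (r × c − r), where N counts every inclusion of a study in a review, r is the number of unique primary studies and c is the number of reviews. The sketch below assumes that standard formula; the review and study labels are hypothetical and are not taken from Additional file 18.

```python
def corrected_covered_area(review_to_studies: dict) -> float:
    """Compute CCA for a mapping of review name to included primary studies."""
    study_sets = [set(studies) for studies in review_to_studies.values()]
    c = len(study_sets)                       # number of reviews (columns)
    unique_studies = set().union(*study_sets)
    r = len(unique_studies)                   # unique primary studies (rows)
    if c < 2 or r == 0:
        raise ValueError("CCA needs at least two reviews and one study")
    n = sum(len(s) for s in study_sets)       # inclusions, counting repeats
    return (n - r) / (r * c - r)


# Hypothetical comparison covered by three reviews.
overlap = corrected_covered_area({
    "Review A": ["Trial 1", "Trial 2", "Trial 3"],
    "Review B": ["Trial 2", "Trial 3"],
    "Review C": ["Trial 3"],
})
print(f"CCA = {overlap:.0%}")
```

A CCA of 100% means every review included exactly the same primary studies, which corresponds to the "overlapped completely" situation described in the text; 0% means no study appears in more than one review.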

    Celecoxib versus placebo

    Rees 2010 [58] included one primary RCT [59] and reported no difference between the groups for all-cause mortality [low certainty] and progression to adenocarcinoma at one year [very low certainty] (three cases per group) (Additional file 12: Evidence Set 1.1). For all-cause mortality, there is discordant reporting within the review: the text reports two deaths in the trial, but the forest plot reports three deaths in each group. Although not presented in the results table, the SR stated narratively that the primary trial authors did not report any statistical difference for the following outcomes: the area of BE segment at 12 months, and the reduction in the number of patients progressing from intestinal metaplasia to dysplasia between baseline and 1 year. In addition, review authors reported “no statistical difference in the number of patients” with complete eradication of dysplasia at 12 months, and with bleeding in each group.

    Omeprazole versus histamine type 2 receptor antagonists

    Rees 2010 [58] reported data from three primary studies [60–62], one of which was available only as an abstract [60]. The three studies differed with regard to drug dosage and regimens (Additional file 14: Table 4). Results and GRADE ratings are presented in Additional file 12: Evidence Set 2.1. There was no difference in the reduction in length (cm) of BE at 12 months between the compared groups, and the pooled effect estimates for both the overall and subgroups (I² statistic = 62.6% and 60%, respectively) remained non-significant when the analysis was restricted to a subgroup who received a higher dose of omeprazole [very low certainty] [61, 62]. There was a small change in the reduction in area (%) of BE with omeprazole that was statistically significant at 12 months [very low to low certainty] [61, 62].

    Photodynamic therapy + omeprazole versus omeprazolealoneTwo unique [63, 64] trials (from three studies) [63–65]reported across four SRs [58, 66–68] reported on pa-tients with BE. Overholt 2007 [63] provided 5-yearfollow-up data for progression to EAC, with Overholt2005 [65] providing 2-year follow-up data for other out-comes for the same trial participants (Additional file 12:Evidence Set 3.1). Overholt 2005 [65] and Ackroyd 2000[64] reported on all-cause mortality, using photodynamictherapy (PDT) with either 5-ALA or porfimer sodium,respectively. Overholt et al. reported no statistically sig-nificant difference between groups, but this was basedon few observed events (n = 3) and Ackroyd et al. ob-served no deaths [very low certainty].At both two- (OR 0.38, 95% CI 0.18 to 0.77) [65] and

    five- (RR 0.53, 95% CI 0.31 to 0.91) [63] years, there was astatistically lower progression to EAC with combined ther-apy than with omeprazole alone [very low to low certainty].Progression from non-dysplastic to dysplastic BE was statis-tically lower with combined therapy (n = 0) compared tothe omeprazole group (n = 12) [very low certainty] [64].Both reviews show higher eradication of dysplasia

    with combined therapy [very low to low certainty];

    Hamel et al. Systematic Reviews (2020) 9:20 Page 14 of 25

  • however, there were some data discrepancies betweenreviews [58, 67] for both studies [64, 65]. Li 2008[67] provided data among those with HGD from thesame studies as the eradication of dysplasia outcome.It is unclear why more participants experienced eradi-cation of HGD than dysplasia in general, as the de-nominators are the same. There was highereradication with PDT combined with Omeprazole[very low to low certainty]. Overholt 2007 [63] re-ported that eradication of BE by 5 years was statisti-cally greater with combined therapy (OR 14.18, 95%CI 5.38 to 37.37) [very low to low certainty].One study with 36 participants (reported in three re-

    views) reported on reduction/regression of BE usingvarious measures [58, 67, 68]. Statistically significant re-ductions in both length and area of BE were observedwith combined therapy [64] in two reviews [very low cer-tainty] [58, 67]. Fayter et al. [68] provided results of evi-dence of regression (not further described), with muchhigher percentage of those in the combined group ex-periencing regression (89% vs. 11%) [very low certainty].

    There were fewer absolute treatment failures of BE with combined therapy [very low certainty] [64, 65].

    Statistically significantly more strictures formed with combined therapy (49/138) compared to the omeprazole treatment group (0/70) in one study [very low to low certainty] [65].
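The stricture counts above (49/138 vs. 0/70) include an arm with no events, so an unadjusted odds ratio cannot be computed. The sketch below shows one common workaround, adding 0.5 to every cell before computing the odds ratio and a Wald 95% confidence interval. The correction, the choice of odds ratio as the measure and the function name are assumptions made here for illustration; they are not necessarily the methods used by the reviews or the trial.

```python
import math

def corrected_odds_ratio(events_a, total_a, events_b, total_b, z=1.96):
    """Odds ratio with a 0.5 continuity correction and a Wald 95% CI.

    Adding 0.5 to every cell is an illustrative assumption, used here
    because one arm has zero events.
    """
    a = events_a + 0.5                # events, arm A
    b = (total_a - events_a) + 0.5    # non-events, arm A
    c = events_b + 0.5                # events, arm B
    d = (total_b - events_b) + 0.5    # non-events, arm B
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Counts from the text: PDT + omeprazole 49/138, omeprazole alone 0/70.
stricture_or, stricture_lo, stricture_hi = corrected_odds_ratio(49, 138, 0, 70)
print(f"OR {stricture_or:.1f} (95% CI {stricture_lo:.1f} to {stricture_hi:.1f})")
```

Under these assumptions the interval is very wide but lies entirely above 1, which is consistent with the statement that significantly more strictures formed with combined therapy.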

    Anti-reflux surgery + Argon plasma coagulation versus anti-reflux surgery + surveillance (endoscopic)
    Three systematic reviews [58, 66, 67] reported data from a single trial with two publications [69, 70] on six outcomes (Additional file 12: Evidence Set 4.1). Nissen fundoplication was used for anti-reflux surgery. Ackroyd 2004 [70] was a short-term follow-up of the patients, with longer-term follow-up presented in Bright 2007 [69]. No patients progressed to cancer [very low certainty] [69]. Based on sparse events (two instances in the surveillance group) in Bright 2007 [69] (in Li 2008 [67]), no difference between the treatment effects was observed for progression to HGD (from LGD) [very low certainty]. Bright 2007 [69] provided 5-year follow-up data for progression from intestinal metaplasia to dysplasia, and reported no difference between the two groups (two cases of progression in the surveillance group) [very low certainty] [58, 69].

    Fig. 4 Map of Systematic Reviews and Primary RCTs

    The effect estimate favoured Argon plasma coagulation (APC) [69] at 12 months for complete eradication of BE [very low certainty]. Note: the data presented in the forest plot differed from the data in the text [58, 69]. No difference was observed between the treatment groups for complete ablation (among those with histological change) [69] in Li 2008 [very low certainty]. Ackroyd 2004 [70] in De Souza 2014 [66] reported that no difference in treatment failure was observed between the compared groups [very low certainty].

    Radiofrequency ablation + proton pump inhibitor versus PPI alone
    Three systematic reviews [58, 71, 72] reported data from Shaheen 2009 [73] (Additional file 12: Evidence Set 5.1). Rees et al. [58] included patients with both low- and high-grade dysplasia; however, Qumseya 2017 [71] and Pandey 2018 [72] restricted their reporting to patients with low-grade dysplasia. Five participants progressed to EAC at 5 years or at the latest timepoint of follow-up (RFA + PPI: 1/84; PPI: 4/43) [58], resulting in no difference between the compared treatments [low certainty]. Among those with LGD, none progressed to EAC over the follow-up period [low (Rees 2010) and very low certainty (Qumseya 2017)] [58, 71].

    Fewer patients progressed to higher grades of dysplasia with the radiofrequency ablation (RFA) treatment [low certainty] [58]. However, there is a discrepancy in how this outcome is labelled in the review. The text says there were no data for those progressing from IM to dysplasia and labels it as progression to higher grades of dysplasia, but the forest plot is titled progression from IM to dysplasia. When the outcome was restricted to progression to HGD among patients with LGD, no difference was observed [very low certainty] [71, 72].

    There was a statistically significant difference favouring RFA for complete clearance of intestinal metaplasia (RR 17.81, 95% CI 2.61–121.54) [very low certainty] [72], for complete clearance of dysplasia (OR 22.67, 95% CI 8.72 to 58.94) [58] [low certainty], which remained when restricted to patients with LGD (OR 0.03, 95% CI 0.01–0.13) [very low certainty] [72], and for complete eradication of BE (OR 143.53, 95% CI 18.53–1113.87) [low certainty] [58]. De Souza 2014 [66] showed a higher rate of treatment failure in the proton pump inhibitor (PPI) treatment group compared to the RFA + PPI group (RFA + PPI: 19/84; PPI: 42/43) [very low certainty].

    Fig. 3 Primary studies and conditions overlap among the systematic reviews


    There was no difference between treatment effects for stricture formation [58] [very low certainty]. There were no instances of perforation reported [72] [very low certainty], and only one study participant developed bleeding, but data were not presented per arm [72] [very low certainty].
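To illustrate why the progression-to-EAC counts reported for this comparison (RFA + PPI: 1/84; PPI: 4/43) are compatible with "no difference", the sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval. The effect measure and method are assumptions made here for illustration and may differ from those used in the review; the function name is hypothetical.

```python
import math

def odds_ratio_wald(events_a, total_a, events_b, total_b, z=1.96):
    """Unadjusted odds ratio (arm A vs. arm B) with a Wald 95% CI."""
    a, b = events_a, total_a - events_a   # events / non-events, arm A
    c, d = events_b, total_b - events_b   # events / non-events, arm B
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Counts from the text: RFA + PPI 1/84, PPI alone 4/43.
eac_or, eac_lo, eac_hi = odds_ratio_wald(1, 84, 4, 43)
print(f"OR {eac_or:.2f} (95% CI {eac_lo:.2f} to {eac_hi:.2f})")
# The interval spans 1, so the difference is not statistically significant.
print("CI includes 1:", eac_lo < 1 < eac_hi)
```

With only five events in total, the point estimate favours RFA + PPI but the interval is wide enough to include no effect, which is why sparse-event outcomes such as this one are rated down for imprecision.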

    Anti-reflux surgery (Nissen fundoplication) versus H2 receptor antagonist/omeprazole
    Two systematic reviews [58, 67] reported data from Parrilla 2003 [74] on five outcomes. Overall, the certainty of the evidence was very low for all outcomes (Additional file 12: Evidence Set 6.1). No deaths (all-cause mortality) were reported in either group [58].

    Few participants progressed to EAC, with two in each group (not statistically significant) [58]. Rees 2010 [58] reported a significant difference in the incidence of progression to dysplasia from intestinal metaplasia, with less progression in the surgical treatment group compared with the pharmacological treatment group. Although Li et al. [67] included the same primary study, the incidence in the surgery group differed from Rees et al., and demonstrated no significant difference between the groups [58, 67]. Because different data were reported for the intervention groups, this led to discordant results between reviews.

    Although some participants experienced eradication of dysplasia (surgery: 5/58, H2 receptor antagonist/omeprazole: 3/43) at 5-year follow-up, this was not statistically different between treatment groups [58]. None of the participants experienced complete eradication of BE at 5 years in either treatment group [58].

    PDT with 5-aminolevulinic acid versus PDT with porfimer sodium
    MacKenzie 2008 [75] in Rees 2010 [58] reported preliminary data only in abstract form and recruitment had not yet been completed. The certainty of evidence was very low for both outcomes (Additional file 12: Evidence Set 7.1). There was no statistically significant difference in eradication of HGD between the treatment groups (preliminary results included 14 patients in each treatment group) [75].

    These preliminary results showed no difference between treatment groups in stricture formation.

    Photodynamic therapy with different treatment parameters
    An SR by Fayter 2010 [68] with three primary studies [76–78], one of which was an abstract [76], compared different parameters in the PDT treatment. The certainty of the evidence was very low for cancer risk, and ranged from very low to low for the remaining four outcomes (Additional file 12: Evidence Set 7.2). Generally, higher doses and red light were associated with lower cancer risk and lower rates of adenocarcinoma [76]. These results were considered significant, but were taken from an abstract, so should be interpreted with caution.

    Radiofrequency ablation versus surveillance (endoscopic)
    Phoa 2014 [79], reported in two systematic reviews [71, 72], included patients with BE with low-grade dysplasia. These reviews also included another primary study by Shaheen et al. [73]; however, results from this study are presented in Evidence Set 5.1 as another review [58] states that both treatment groups also received pharmacological therapy (Additional file 12: Evidence Set 8.1). There were seven people with progression to EAC (RFA: 1/68, Surveillance: 6/68) [very low certainty]. Progression per patient-year is also presented [very low certainty]. Qumseya 2017 [71] reported data as cumulative progression from LGD to HGD [very low certainty] and progression per patient-year [very low to low certainty]. Few events were observed (RFA: 0, Surveillance: 12). Pandey 2018 [72] demonstrated a marginally statistically significant result favouring RFA (RR 0.03, 95% CI 0.00 to 0.44) [very low to low certainty] [72]. Although Pandey and Qumseya reported discrepant data for the surveillance group in the number of patients with progression to HGD, 18 and 12, respectively, effect estimates are similar between reviews.

    RFA resulted in more patients with complete eradication of dysplasia (RR 3.52, 95% CI 2.40 to 5.17) [very low to low certainty] [72]. A favourable treatment effect was observed with RFA for complete eradication of intestinal metaplasia (RR 123.30, 95% CI 7.78 to 1954.10) [very low to low certainty] [72].

    Eight strictures were formed among the study population; however, data were not reported per arm [very low to low certainty] [72]. None of the study patients developed perforations [very low to low certainty] [72], and only one study participant developed bleeding, but data were not reported per group [very low to low certainty] [72].

    Argon plasma coagulation + PPI versus multipolar electrocoagulation + PPI
    Rees 2010 [58] reported on two primary studies (Additional file 12: Evidence Set 9.1) [80, 81], with no instances of mortality (all-cause) reported [very low to low certainty] and one case of stricture formation in the Argon plasma coagulation (APC) + PPI group [very low certainty].

    Multipolar electrocoagulation + PPI versus Argon plasma coagulation + PPI
    Two SRs [66, 67] reported the same two primary studies as Evidence Set 9.1; however, the intervention and comparison groups are reversed (Additional file 12: Evidence Set 9.2) [80, 81]. Both outcomes are presented, as one review provided the pooled OR (OR 2.01, 95% CI 0.77 to 5.23) [very low certainty] for histological complete ablation of BE [67] and the other provided the pooled risk difference (RD − 0.14, 95% CI − 0.33 to 0.05) [very low certainty] for treatment failure (the opposite of complete ablation) [66]. Both favour multipolar electrocoagulation (MPEC) + PPI.

    Photodynamic therapy versus Argon plasma coagulation + PPI
    Five systematic reviews [58, 66–68, 82] reported on six primary studies [83–88], of which some were abstracts (e.g. Zoepf 2003 [87]) (Additional file 12: Evidence Set 10.1). There were many differences between the SRs and the primary studies within the SRs in how comparison groups were reported, heterogeneity between therapy types (e.g. PDT with 5-ALA or porfimer sodium), differences in their drug dosing and light delivery regimens [58] and differences in the participants who were included in the analyses (e.g. all levels of dysplasia or LGD only). Rees 2010 [58] reported on three studies [84–86], with a combined incidence of all-cause mortality of one in the PDT group and none in the APC + PPI group [very low certainty] [84].

    Almond 2014 [82] reported on three studies [84, 86, 88] in participants with LGD. One incident case of EAC by 12 months in the PDT group was reported [very low certainty]. Almond et al. [82] reported no events of progression to high-grade dysplasia among 17 participants [very low certainty] [84, 86].

    Rees 2010 [58] and Almond 2014 [82] show discrepant data for the PDT group in Ragunath et al. [86]. The number of patients experiencing complete eradication of dysplasia was reported as 10/13 in Rees 2010, and 8/11 in Almond 2014 [very low certainty]. As Almond et al. included only those with low-grade dysplasia, it might be that the two additional participants in Rees et al. had high-grade dysplasia, although this is not clearly reported. Five SRs [58, 66–68, 82] reported on PDT versus APC + PPI and how it affected BE in five primary studies [83–87]. These reviews reported the outcomes in several ways: complete ablation of BE, eradication of BE, reduction of BE (length, surface reduction) and treatment failure (no ablation). Overall, there was a high level of heterogeneity among studies and in the results, with very low certainty in all of these outcomes except the reduction in length (cm), which was rated as low certainty. Determining concordance of results across reviews was difficult due to the differences in how information was reported. Almond 2014 [82] reports on Ragunath 2005 [86], reporting no difference between treatments in eradication of intestinal metaplasia (two participants in each group) [very low certainty].

    Both Rees 2010 [58] and Almond 2014 [82] reported on stricture, with Rees 2010 including three primary studies [84–86] and Almond 2014 only including Ragunath 2005 [86]. Although there was discordance in the number of those experiencing stricture, neither review reported any difference between treatment groups [very low certainty].

    Endoscopic mucosal resection versus radiofrequency ablation
    Three SRs [89–91] included patients with BE and intramucosal neoplasia (i.e. early stage adenocarcinoma). Although both Fujii-Lau et al. [90] and Chadwick et al. [89] include Shaheen 2011 [92] as an included study, because only one of the treatment groups was considered relevant for those reviews, neither reported the results from the placebo group. Therefore, results from Shaheen 2011 [92] are not presented (Additional file 12: Evidence Set 11.1). All three reviews provided results for both treatment groups for the primary study of van Vilsteren 2011 [93], although all three reviews label the treatment groups differently (e.g. stepwise EMR vs. focal EMR + RFA, EMR vs. RFA, complete EMR vs. RFA). Both endoscopic mucosal resection (EMR) and radiofrequency ablation (RFA) eradicated neoplasia (eradication of cancer) in most cases (EMR: 100%; RFA: 96%), with no difference between treatments [very low certainty] [91]. Eradication of dysplasia was achieved in almost all participants at the end of the treatment and at follow-up. Only one participant in the RFA group did not have complete eradication at the end of treatment and follow-up [very low certainty] [89]. Almost all participants experienced complete eradication of intestinal metaplasia, although there was slight discordance among the percentages reported in the two reviews [very low certainty] [89, 91].

    Only one participant in the EMR treatment group experienced recurrence of cancer [very low certainty] [90], no participant experienced recurrence of dysplasia [low certainty] [90] and two participants in each treatment group experienced recurrence of intestinal metaplasia [very low certainty] [90].

    Two SRs [89, 91] reported on bleeding, with some data discrepancies, but overall concordant results. One SR [89] reported that among the 25 participants in the EMR group, only one participant experienced perforation. No one in the RFA group experienced this outcome. Most participants receiving EMR treatment experienced strictures (22 of 25, 88%) compared to only three of 22 (14%) in the RFA group. Review authors did not provide effect estimates, but a risk ratio of 6.45 (95% CI 2.23 to 18.66) for EMR compared to RFA was calculated using these data [91]. Almost all participants receiving EMR experienced stenosis requiring treatment (88%, 22/25), with only three of 21 (14%) experiencing stenosis in the RFA group [89]. This difference was statistically significant with a calculated risk ratio of 6.45 (95% CI 2.23–18.65) for EMR compared with RFA. All of these adverse events were rated as very low certainty.
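The calculated risk ratio for strictures can be reproduced from the reported counts (EMR: 22/25; RFA: 3/22). The sketch below uses the standard log-scale (Wald) confidence interval for a risk ratio; the exact method used for the calculation in the text is not stated, so this choice and the function name are assumptions.

```python
import math

def risk_ratio_wald(events_a, total_a, events_b, total_b, z=1.96):
    """Risk ratio (arm A vs. arm B) with a Wald 95% CI on the log scale."""
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    rr = risk_a / risk_b
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Counts from the text: strictures in 22 of 25 (EMR) vs. 3 of 22 (RFA).
rr, rr_lo, rr_hi = risk_ratio_wald(22, 25, 3, 22)
print(f"RR {rr:.2f} (95% CI {rr_lo:.2f} to {rr_hi:.2f})")
```

Under these assumptions the result agrees with the reported risk ratio of 6.45 and lower limit of 2.23; the upper limit falls almost exactly between 18.65 and 18.66, which may explain why both values appear in the text.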

    Discussion
    Esophageal cancer, although lower in incidence relative to other cancers, has a higher mortality rate, partly due to a more advanced stage at diagnosis, when the cancer has spread widely to other vital organs and is incurable. This makes the consideration of whether to invest in screening services important. In 2012, a Cochrane systematic review by Yang et al. [94] set out to include only RCTs comparing screening versus no screening, and found no studies meeting their inclusion criteria. Five years later, this systematic review found no additional randomised controlled trials comparing screening to no screening. Among the few studies that have assessed the effectiveness of screening of individuals with chronic GERD, there exist several limitations (e.g. small sample sizes, one-time screening test with no follow-up). Although there may be higher odds of stage 1 diagnosis if an EGD had been performed in the previous 5 years, the study included a small number of cases, resulting in low precision [47]. Those diagnosed at earlier stages (T1 and T2) can be treated with potentially curative therapies. For example, esophagectomy in patients with high-grade dysplasia and stage T1a cancer has been associated with greater survival: 89% at 1 year, 77% at 5 years and 68% at 10 years [95]. Comparatively, those with late stage cancer that cannot be cured by surgery receive chemotherapy/chemoradiation and have a 15% 5-year survival rate [2].

    There was little difference in the incidence rates of EAC, BE and dysplasia using alternative screening methods. Although EGD with biopsy is considered the gold standard for the diagnosis and surveillance of BE [96, 97], the results from these studies may encourage increased usage of alternative methods of screening for BE and EAC. Conventional EGD uses sedation, which increases the cost of screening (e.g. monitoring patients post-procedure) and resources used (e.g. availability of a gastroenterologist, recovery room). Alternate methods do not require sedation, can be done in a primary care setting and require little monitoring post-procedure. In studies where participants who had experienced a previous screening were then allowed to select which screening modality they wanted, there was a preference for unsedated methods. Of the 1574 participants, 721 (46%) chose transnasal, 599 (38%) chose transoral and 254 (16%) chose EGD [52]. Further supporting patient choice of screening modality, RCTs reported higher levels of dropouts and anxiety among those randomised to TNE compared to other screening modalities, although the differences were not always significant. The perceived discomfort of the unsedated transnasal procedure could contribute to increased anxiety.

    When considering patient values and preferences for screening, the data are also sparse. Three studies reported on the willingness, or in this case the unwillingness, to participate and be screened in a study on screening for EAC and precancerous conditions. One study also provided outcome information on uptake of screening, more specifically the reasons why participants did not take up screening after allocation. No other outcomes of interest were addressed in these studies, overall providing little evidence to answer KQ2. We are not aware of any other reviews that have been done in the area of upper GI screening in relation to how patients weigh the benefits and harms of screening and what factors contribute to these preferences and to their decision to undergo screening, so there is nothing with which to compare our findings.

    In our overview evaluating treatment for BE, with or without dysplasia, and early-stage adenocarcinoma (KQ3), 11 SRs were included. Treatment modalities covered pharmacological therapy, various ablative techniques, surgery and some combinations thereof, with a mix of statistically significant and non-significant results, meaning that treatment may show an effect on some outcomes and little to no effect on others. However, there were few studies, all with small sample sizes by outcome, and for many outcomes, only one study provided results, thereby providing little information with which to gauge the certainty of the evidence. Based on consultation with clinical experts, in addition to evidence from retrospective and prospective clinical series (e.g. AIM trial [92]) and registry data, certain treatments are currently considered the standard of care. For example, BE with HGD should be treated with ablation and T1a esophageal cancer (EAC and ESCC) should be treated with endoscopic resection (either endoscopic mucosal resection or endoscopic submucosal resection).

    Limitations
    Both reviews and the overview of reviews were developed using rigorous methodological standards, as detailed a priori in registered protocols. There may, however, still be some limitations. There is a risk of missing studies, although we minimized this risk by searching multiple databases and using several techniques to search for grey literature. We included only English and French language studies, and some studies were excluded because we could not get access to the full text (i.e. not available through open access journals or through interlibrary loans). There is a chance that some of these records may have met the inclusion criteria and provided additional results. In KQ3, most records (68%) were excluded during our screening phase due to not meeting the pre-defined SR definition [98]. Reasons for exclusion were mainly lack of quality assessment of primary studies and not a study design of interest (either a narrative review or clinical practice guideline based on a non-systematic literature review). Consequently, there is a chance that our conclusions may not be reflective of the totality of relevant, existing evidence. Updating the evidence base is an important research agenda item. Among those that did meet our pre-defined definition, some were excluded because they only included observational studies, or did not separate results of RCTs from observational studies.

    When evaluating the results for the effectiveness of screening (KQ1), given the very low certainty of the evidence, true effects may be substantially different or uncertain in light of limitations in the body of evidence. There were several important methodological limitations leading to a moderate or high risk of bias among all study outcomes. The few included studies and generally small sample sizes lead to imprecise results that could not be assessed for consistency or publication bias. This trend may continue in this area, as half of the potentially relevant ongoing trials are expecting sample populations of fewer than 200 participants (Additional file 17). Blinding of participants to screening modality was not possible in these studies. The inability to blind patients could affect psychological outcomes, as a patient might have a preference for one screening modality over another. When evaluating the results for patient values and preferences (KQ2), it was difficult to accurately assess RoB for these studies, as the primary purpose of the included studies was to evaluate acceptability after screening and effectiveness of the screening modality, a different lens to the context of our review. Most outcome data were collected before randomisation, and as there is no formal tool to assess RoB prior to randomisation, these outcomes were not assessed. Measurement bias may be present, as studies did not clearly state how these outcome data were collected. It is not clear how the data were collected among those who refused participation during the consent period, as there is no mention of questionnaires or if and how study personnel collected this information. Only the uptake of screening outcome in one study stated that a non-completion questionnaire was given to ascertain reasons for non-completion. It was difficult to assess the inconsistency among the included studies, mainly due to a lack of information among the studies contributing to outcome results. For example, the largest study invited 1210 participants, with 38% (385/1026) of those declining to participate not providing any information on why they refused. Poor reporting of patient information for those who contributed outcome data was seen in all studies. None reported on the age and sex of these participants or the indication for screening (as described above), making it difficult to understand how comparable these studies might have been. Similarly, the quality of the evidence for treating BE, dysplasia and early-stage cancer (KQ3) was low or very low across the comparisons and outcomes, indicating uncertainty that the observed effects would be representative of the true underlying effect. Poor reporting was a barrier in assessing all domains. Additionally, items within tools such as the Jadad score and Downs & Black do not directly translate to considerations that GRADE guidance suggests for assessing risk of bias. The current limited evidence originated in 11 poorly conducted reviews (two rated as low quality and nine rated as critically low quality), from small RCTs (published between 1996 and 2011 with one published in 2014) with unclear or high risk of bias and short follow-up times. Multicenter trials are needed to increase the power of the evidence base. The lack of a longer patient follow-up time to inform outcomes may be explained by patient retention issues or the cost of following patients long-term.

    The lack of a definition of chronic GERD, or even how studies defined GERD, leads to a serious concern for the direct generalizability of the population represented in these studies to the target population of this review. Among studies that did provide a description of how GERD was defined, not all studies used a validated questionnaire to define GERD, while some defined GERD inclusion based on “typical symptoms”. Some studies did not define GERD at all. A standardized definition of chronic GERD would allow trialists to better identify the population of interest. Additionally, as more data accrue, this may lead to more certainty as to whom the evidence would apply (i.e. directness), to greater precision of the estimate and to better quality of conduct (and reporting).

    Several outcomes of interest, including mortality, quality of life and overdiagnosis, were not reported in any of the included studies (KQ1). This is mostly because the study results were cross-sectional in nature and these outcomes would require follow-up. In the absence of the outcomes of interest to calculate overdiagnosis, we were una