Diagnostic and prognostic aspects of tubal patency testingChapter 2 Quality of reporting of test...
Transcript of Diagnostic and prognostic aspects of tubal patency testingChapter 2 Quality of reporting of test...
UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl)
UvA-DARE (Digital Academic Repository)
Diagnostic and prognostic aspects of tubal patency testing
Coppus, S.F.P.J.
Publication date2012Document VersionFinal published version
Link to publication
Citation for published version (APA):Coppus, S. F. P. J. (2012). Diagnostic and prognostic aspects of tubal patency testing.
General rightsIt is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s)and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an opencontent license (like Creative Commons).
Disclaimer/Complaints regulationsIf you believe that digital publication of certain material infringes any of your rights or (privacy) interests, pleaselet the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the materialinaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or a letterto: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. Youwill be contacted as soon as possible.
Download date:11 May 2021
Diagnostic and prognosticaspects of tubal patency testing
Sjors Coppus
Diagnostic and prognostic aspects of tubal patency testing Sjors Coppus
23138 Coppus_Omsl 01-10-12 14:16 Pagina 1
Diagnostic and prognostic aspects of tubal patency testing
Sjors Coppus
23138 Coppus.indd 1 28-09-12 14:55
Financial support for the publication of this thesis has been kindly provided by:
Conceptus BV, Goodlife Pharma, MSD Nederland, Johnson & Johnson Medical BV,
ChipSoft BV, Ferring Pharmaceuticals, Covidien Nederland BV, Guerbet Nederland
BV, Karl Storz Nederland BV, ERBE Nederland BV, Hologic Benelux BV, Memidis
Pharma BV, Medical Dynamics, Stöpler Instrumenten & Apparaten BV
Cover: Fallopian tube, original tissue paper sculpture by Polly and Tom Verity
Layout: Ferdinand van Nispen, Citroenvlinder-DTP.nl, Bilthoven
Print: GVO Drukkers & Vormgevers B.V. | Ponsen & Looijen, Ede
ISBN: 978-90-6464-586-0
© 2012 Sjors Coppus
23138 Coppus.indd 2 28-09-12 14:55
Diagnostic and prognostic aspects
of tubal patency testing
ACADEMISCH PROEFSCHRIFT
ter verkrijging van de graad van doctor
aan de Universiteit van Amsterdam
op gezag van de Rector Magnificus
prof. dr. D.C. van den Boom
ten overstaan van een door het college
voor promoties ingestelde commissie,
in het openbaar te verdedigen in de Agnietenkapel
op 4 december 2012, te 10.00 uur
door
Sjors Franciscus Petrus Joannes Coppus
Geboren te Deurne
23138 Coppus.indd 3 28-09-12 14:55
Promotiecommissie
Promotores: Prof. Dr. B.W.J. Mol
Prof. Dr. F. van der Veen
Copromotor: Dr. B.C. Opmeer
Overige leden: Prof. Dr. S. Bhattacharya
Prof. Dr. M.P.M. Burger
Prof. Dr. J.L.H. Evers
Prof. Dr. M.J. Heineman
Prof. Dr. J.A. Land
Prof. Dr. E.W. Steyerberg
Faculteit der Geneeskunde
23138 Coppus.indd 4 28-09-12 14:55
That slender and narrow seminal duct rises,
fibrous and pale, from the horns of the uterus itself;
becomes, when it has gone a little bit away, appreciably broader,
and curls like a branch until it comes near the end,
then losing the horn-like curl, and becoming very broad,
has a distinct extremity which appears fibrous and fleshy through its red color,
and its end is torn and ragged like the fringe of well-worn garments,
and it has a wide orifice
which lies always closed through the ends of the fringe falling together;
and, if these be carefully separated and opened out,
they resemble the orifice of a brass trumpet.
Wherefore since the seminal duct from its beginning to its end
has a likeness to the bent parts of this classic instrument,
separated or attached, therefore,
it has been called by me uteri tuba (trumpet of the uterus).
Gabriele Fallopius, 1561
23138 Coppus.indd 5 28-09-12 14:55
Contents
Chapter 1 Introduction 9
Part 1 Methodological considerations 25
Chapter 2 Quality of reporting of test accuracy studies in reproductive
medicine: impact of the standards for reporting of diagnostic
accuracy (STARD)-initiative.
Fertility and Sterility 2006, 86: 1321-1329.
27
Chapter 3 Evaluating prediction models in reproductive medicine.
Human Reproduction 2009, 24: 1774-1778.
47
Part 2 Clinical studies 61
Chapter 4 The predictive value of medical history taking and Chlamydia
IgG ELISA antibody testing (CAT) in the selection of subfertile
women for diagnostic laparoscopy: a clinical prediction model
approach.
Human Reproduction 2007, 22: 1353-1358.
63
Chapter 5 Identifying subfertile ovulatory women for timely tubal
patency testing: a clinical decision rule based on medical
history.
Human Reproduction 2007, 22: 2685-2692.
79
Chapter 6 The capacity of hysterosalpingography and laparoscopy to
predict natural conception.
Human Reproduction 2011, 26: 134-142.
99
Chapter 7 Chlamydia trachomatis IgG seropositivity is associated with
lower natural conception rates in ovulatory subfertile women
without visible tubal pathology.
Human Reproduction 2011, 26: 3061-3071.
117
Part 3 Discussion, summary and epilogue 133
Chapter 8 General discussion 135
Chapter 9 Summary 145
Samenvatting 153
Chapter 10 Epilogue 161
23138 Coppus.indd 6 28-09-12 17:06
Part 4 Appendices 169
Chapter 11 List of co-authors and their affiliations
List of publications
Dankwoord
About the author
171
175
179
185
23138 Coppus.indd 7 28-09-12 16:57
23138 Coppus.indd 8 28-09-12 14:55
1Introduction
23138 Coppus.indd 9 28-09-12 14:55
Chapter 1
10
Subfertility is customarily defined as failure to conceive after regular unprotected
sexual intercourse for one year. The prevalence of subfertility is around 14%,
affecting one in seven couples (Hull et al., 1985). Data from historical populations
estimate the average prevalence of subfertility to be 5.5%, 9.4% and 19.7%,
respectively, at ages 25-29 years, 30-34 years and 35-39 years (Bongaarts, 1982). In
the Netherlands, this percentage is between 12% and 17%, depending on female
age (Bonsel and van der Maas, 1994).
A diagnosis of unexplained subfertility is usually made after it has been
demonstrated that a woman has a regular ovulation, patent Fallopian tubes with
no evidence of peritubal adhesions, fibroids or endometriosis and that the male
partner has normal semen quality (Simon and Laufer, 1993). The standard fertility
work-up starts with a thorough medical history and physical examination followed
by the assessment of ovulation and a semen analysis. Assessment of tubal
patency is usually reserved as last test in this work-up. Tubal disease includes tubal
obstruction and pelvic adhesions due to infection, endometriosis and previous
surgery and accounts for subfertility in 10-30% of the couples (Evers, 2002). Several
diagnostic tests to assess the tubal condition are available to the clinician.
Hysterosalpingography (HSG)
HSG was first performed in 1910 by Rindfleish, who injected a bismuth solution in
the uterine cavity (Rindfleish, 1910). In 1914, the first report on HSG for determining
tubal patency with an oil-soluble contrast medium was published (Carey, 1914).
Carey used Collargol which could cause significant tissue damage and was painful
to the women (Hanlo, 1930). Because of these serious adverse events, its use was
abandoned and a tubal insufflation test was introduced by Rubin in 1920 (Rubin,
1920). Rubin insufflated oxygen (later carbon dioxide) under pressure through the
cervical canal into the uterine cavity. Tubal patency was determined by presence of
air under the diaphragm on X-ray, by auscultation of air flow into the abdomen or
a drop in pressure during insufflation. In 1925, Heuser was the first to report on the
use of lipiodol in HSGs (Heuser, 1925). Lipiodol, like Collargol an oil-soluble contrast
medium but with a lower viscosity, proved less toxic and was widely accepted
(Hanlo, 1930). Gradually lipiodol was replaced by a water-soluble contrast medium
(WSCM) for several reasons: lower cost, better imaging of the tubal mucosal folds
23138 Coppus.indd 10 28-09-12 14:55
Introduction
11
1and ampullary rugae (Soules and Spandoni, 1982), lower viscosity and more
prompt demonstration of tubal patency reducing the need for a delayed film,
lower likelihood of persistence of contrast medium within the pelvic cavity and of
complications such as anaphylaxis or long-term lipogranuloma formation. At HSG
the contrast medium is slowly injected through the cervical canal into the uterine
cavity. By fluoroscopy (continuous X-ray imaging) the flow of contrast can be
followed and the uterine cavity and lumen of the fallopian tube can be visualized.
HSG is performed as an outpatient procedure and can be painful. Some clinics
still use oil-soluble contrast medium (OSCM), because no severe side-effects have
been reported since the introduction of fluoroscopy screening during injection of
contrast and the abandonment of the procedure when intravasation of contrast
occurs (Lindequist et al., 1991). Also, lipogranuloma formation has not been
reported in women with patent tubes following OSCM (Acton et al., 1988).
Nevertheless, most clinics use WSCM these days, although there is evidence that
flushing of the tubes with OSCM causes less pain and has a positive effect on
pregnancy rates when compared with WSCM (Glatstein et al., 1998; Luttjeboer et al.,
2007). The main complication of HSG is pelvic infection after the procedure, which
can occur in 1-3% of all cases and up to 10% if tubal pathology is present (Stumpf
and March, 1980; Pittaway et al., 1983; Forsey et al., 1990). The risk of infection can
be reduced by Chlamydia culture of the cervix and prophylactic antibiotics in
women with a medical history of pelvic infections (Pittaway et al, 1983), although
some authors advocate prophylactic antibiotics before uterine instrumentation
instead of endocervical screening (Land et al., 2002).
Laparoscopy and dye
The first diagnostic laparoscopy (DL) in humans was applied by Jacobeus in Sweden
in 1910, by inducing a pneumoperitoneum and introducing a cystoscope into the
peritoneal cavity. Initially this technique to diagnose intra-abdominal abnormalities
was mainly used by physicians. Kalk in Germany was principally responsible for
developing laparoscopy into an effective diagnostic and surgical procedure in the
early 1930s (Trimbos-Kempers, 1981; Te Linde’s Operative Gynecology, 1992). It
was not until 1946 before Palmer in France introduced laparoscopy and dye test
(synonyms: dye pertubation, chromopertubation) into gynaecologic practice for
23138 Coppus.indd 11 28-09-12 14:55
Chapter 1
12
the assessment of tubal patency and published his first 250 cases (Palmer, 1946).
In this procedure, which requires general anaesthesia and operating facilities, a
water-soluble dye (usually methylene blue), is injected into the uterine cavity.
Through the laparoscope, the pelvic cavity can be inspected for presence of
pelvic-peritoneal adhesions, endometriosis and tubal patency. Although the uterine
anatomy can be assessed, intrauterine abnormalities can only be visualized if the
procedure is combined with hysteroscopy. This can easily be performed in the same
setting. Like HSG, DL is an invasive procedure and can result in complications such
as infection, vascular-, intestinal- and urologic injuries, which have been reported to
vary between 0.06 and 1.5% (Jansen et al., 1997; Chapron et al., 1998; Härkki-Siren
et al., 1999). In the nineties of the previous century, laparoscopy was integrated as
an essential diagnostic procedure in infertility protocols (ASRM guidelines, 1992;
WHO/Rowe et al., 1993), based on one publication that reported abnormal findings
at laparoscopy in 75% of 24 couples with otherwise unexplained subfertility (Drake
et al., 1977). In 1997, it was reported that 96% of the reproductive endocrinologists
in the USA routinely performed an HSG and 89% a diagnostic laparoscopy in the
diagnostic fertility work-up (Glatstein et al., 1997).
A diagnostic laparoscopy can be combined with interventions like coagulation
or excision of minimal to mild endometriosis, adhesiolysis or cystectomy. In a
randomized controlled trial, it was found that laparoscopic resection or ablation of
minimal and mild endometriosis and adhesiolysis enhances fecundity in subfertile
women (Marcoux et al., 1997). They reported that one in eight women with this
condition should benefit from surgery, but the monthly fecundity rate among
women who underwent surgery (6.1%) was much lower than the rate expected
in fertile women (20%). Whether DL with dye pertubation has a positive effect on
pregnancy rate is not known. One randomised controlled trial showed no difference
between pertubation with an OSCM versus a WSCM (Al Fahdli et al., 2006).
Chlamydia trachomatis antibody testing
Chlamydia trachomatis infection, a sexually transmitted disease (STD), is a major
cause of pelvic inflammatory disease (PID), which can lead to chronic abdominal
pain, ectopic pregnancy and tubal pathology (Weström and Wolner-Hanssen, 1993;
Paavonen and Eggert-Kruse, 1999). The causal relationship between Chlamydia
23138 Coppus.indd 12 28-09-12 14:55
Introduction
13
1trachomatis and genital tract infection was established in 1958 (Collier et al.,
1958). C. trachomatis targets the epithelial cells of the cervix. In the Netherlands an
estimated 35 000 cases of Chlamydia cervicitis occur annually (Health Council of
the Netherlands, 2004).
Early infection can be detected by culture of cervical swabs and treated effectively
with antibiotics. However, the majority of these infections is asymptomatic and
remains therefore undiagnosed. Most women will effectively clear a Chlamydia
trachomatis infection by a normal immune response and have a low risk of
complications. Still, untreated infection can through ascendance result in PID
and tubal factor subfertility (Peipert, 2003; Den Hartog et al., 2006). There are no
prospective controlled trials available on how frequently women tested positive
for Chlamydia cervicitis will develop tubal infertility (Land et al., 2010). A correlation
between the age of a woman, severity of infections, the number of PID episodes
and the risk of tubal pathology has been demonstrated (Weström, 1980; Weström
et al., 1992). The risk of tubal pathology varied from 10% after one episode of PID to
20% after two episodes and 40% after three episodes (Weström et al., 1992).
Since an association between serum C. trachomatis IgG antibodies and tubal
pathology was noted, Chlamydia antibody tests (CAT) were introduced as
screening tests in the work-up of subfertile couples to provide the clinician with
a risk estimate of the presence of tubal pathology (Punnonen et al., 1979; Den
Hartog et al., 2006). CAT is a non-invasive, non-expensive and easy to perform test,
but does not provide images of the uterine and tubal anatomy nor of the pelvis.
Other diagnostic tests
Apart from HSG, laparoscopy and CAT testing; other tools are also used to assess
tubal patency. Falloposcopy is a technique with which through a thin fibreoptic
scope the lumen of the fallopian tubes can directly be visualized (Kiss et al.,
1993; Dechaud et al., 1998; Lundberg et al., 1998). Hystero-contrast-sonography
(HyCoSy) is a method where tubal patency is tested by injection of contrast into
the uterine cavity and patency is assessed by transvaginal ultrasound (Schlief
and Deichert, 1991). Transvaginal hydrolaparoscopy (THL) is performed as a
needle puncture technique of the pouch of Douglas using an adjusted Veress-
23138 Coppus.indd 13 28-09-12 14:55
Chapter 1
14
needle trocar system and warm normal saline is injected through this system
into the pelvic cavity. It allows direct visualization of peritubal - and peri-adnexal
adhesions and endometriotic implants. Tubal patency can be assesses by injection
of methylene blue into the uterine cavity like at laparoscopy (Gordts et al., 1998).
These techniques will not be further discussed, because few studies are available
that evaluate the diagnostic accuracy and predictive performance on pregnancy
rates of these tests.
The performance of HSG, laparoscopy and CAT in terms of accuracy and fertility prognosis.
Since the early nineteen seventies many studies on the performance of HSG and
CAT for the assessment of tubal pathology have been published. In these studies
laparoscopy is usually the reference test. The ideal (or ‘gold standard’) test for
tubal disease would correctly identify all women with tubal disease. It would be a
sensitive test (i.e. the test result would be abnormal in all women with the disease)
and it would also be specific (i.e. the test result would be normal only in women
without the disease).
In a systematic review from 20 studies published between 1968 and 1994 on the
diagnostic accuracy of HSG compared with laparoscopy, only 3 studies could
be identified in which the judgment of tubal pathology at laparoscopy was
independent of the knowledge of HSG results and thus preventing observer bias
(Swart et al., 1995). A meta-analysis of these studies showed a sensitivity of 65%
and a specificity of 83% for HSG for diagnosing tubal patency. HSG was unreliable
for diagnosing peri-adnexal adhesions (Swart et al., 1995).
The accuracy of serum Chlamydia antibodies (CAT) was assessed in another meta-
analysis (Mol et al., 1997). This study showed that the discriminative capacity of
Chlamydia antibody titers depended on the immuno-essays used and that the
diagnosis of any tubal pathology was comparable with that of HSG in diagnosing
tubal occlusion (Mol et al., 1997). The negative predictive value (NPV) of CAT in
subfertile women was 75-90%, making the presence of tubal pathology in women
with a negative CAT less likely, whereas the positive predictive value (PPV) of CAT
ranged between 30 and 65%, indicating a high false-positive rate if laparoscopy is
performed in CAT positive women (Den Hartog et al., 2006).
23138 Coppus.indd 14 28-09-12 14:55
Introduction
15
1In a study performed in 1995, diagnostic laparoscopy failed to be an ideal predictor
for future fertility (Collins et al., 1995). In a prospective cohort study of 794 women,
laparoscopy performed better than HSG as a predictor of future fertility (Mol
et al., 1999). In this study HSG showed a fecundity rate ratio (FRR) of 0.8 and 0.49
for one-sided and two-sided tubal occlusion, whereas in laparoscopy these were
0.51 and 0.15, respectively. After a normal HSG or a HSG with one-sided occlusion,
laparoscopy showed two-sided occlusion in 5% of these women and their fertility
prospects were almost zero. If two-sided tubal occlusion was detected on HSG but
not during laparoscopy, fertility prospects were slightly impaired. Fertility prospects
after bilateral occlusion on HSG were strongly impaired in case laparoscopy showed
one-sided and two-sided occlusion, with FRRs of 0.38 and 0.19, respectively.
Background of the research
Many reports of diagnostic tests, including the before mentioned studies, contain
methodological shortcomings that can result in an overestimation of diagnostic
test accuracy (Lijmer et al., 1999). A number of issues in the design and conduct of
diagnostic accuracy studies can lead to bias or variation. The sources of bias and
variation for which there is the most evidence are demographic features, disease
prevalence and severity, partial verification bias, clinical review bias and observer
and instrument variation (Whiting et al., 2004). In diagnostic accuracy studies the
results of a test are compared with those of a reference standard, the best available
method to detect presence or absence of disease. These results are then expressed
in accuracy statistics, such as sensitivity and specificity. In studies on the prognostic
performance of tests, test results are compared with a clinically relevant end point
and statistics are calculated to express the predictive capacity of the test. A group of
scientist and editors developed the STARD (Standards for Reporting of Diagnostic
Accuracy) statement to improve the reporting quality of diagnostic accuracy and to
help authors and readers to assess the potential for bias in a study and to evaluate
the generalizability of the results (Bossuyt et al., 2003).
Similarly, for improvement on the reporting of meta-analyses, QUORUM (Quality
of Reporting of Meta-analyses) and MOOSE (Meta-analyses of observational
Studies in Epidemiology) are available (Moher et al., 2001; Stroup et al., 2000). A
problem with these conventional meta-analyses in diagnostic studies is that they
23138 Coppus.indd 15 28-09-12 14:55
Chapter 1
16
statistically pool the results of individual diagnostic studies which typically examine
the accuracy of a test in isolation from medical history and clinical examination,
or do not adjust for overlap of information captured by clinical history, physical
examination and additional tests. How useful a test will be in practice remains
therefore uncertain (Miettinen and Caro, 1994; Bachman et al., 2003; Khan et al.,
2003; Broeze et al., 2009).
Not only in meta-analyses, but also in individual studies, the diagnostic performance
of HSG and CAT has been assessed in isolation of patient characteristics, and the
sensitivity and specificity of HSG and CAT have been assumed to be stable across
subgroups of women (Swart et al., 1995; Mol et al., 1997; Perquin et al., 2006). Since
conventional systematic reviews and meta-analyses are based on aggregate data
at the study level and not at the level of subgroups, this is unavoidable. The use
of data at the patient level in an individual patient data (IPD) meta-analysis could
overcome this limitation, because the accuracy of tests can be assessed in relation
to other patient characteristics and allows the development or evaluation of
diagnostic algorithms for individual patients (Broeze et al., 2009).
Despite clinical research done so far and reflecting the limitations of this research
as discussed, there is still inconsistency between fertility guidelines about the best
diagnostic strategy for tubal patency testing (NICE, 2004; NVOG, 2004; ASRM, 2006)
and there is no consensus on which test should be initially used, or on the most
effective or cost-effective sequence of tests (Fatum et al., 2002; Perquin et al., 2006;
Bosteels et al., 2007; Den Hartog et al., 2008). In many fertility clinics, CAT is used
ahead of more invasive tests such as HSG or laparoscopy, whereas in other clinics
CAT is not used at all and all women are directly referred for HSG or even laparoscopy.
Because of these inconsistencies in tubal assessment of subfertile women and the
importance of early identification of women with bilateral tubal pathology, who
have significantly reduced conception chances, our research has focused on the
effective and efficient use of tools and resources available in clinical practice.
We therefore critically reappraised the literature dealing with tubal pathology
in the subfertile population by using STARD criteria and IPD meta-analysis of
diagnostic and prognostic studies if possible and used the data obtained from
a large multicentre prospective cohort study among subfertile couples that had
23138 Coppus.indd 16 28-09-12 14:55
Introduction
17
1completed their basic fertility work-up collected in 38 clinics in the Netherlands
between 2002 and 2004. The results of these studies will be presented in 3 theses.
- Diagnostic tests for tubal pathology from a clinical and economic perspective
(H.R. Verhoeve)
- Diagnostic and prognostic aspects of tubal patency testing (S.F. Coppus)
- Diagnosing tubal pathology -the individual approach- (K.A. Broeze)
Background and outline of the thesis
During initiation of this research project, we noticed a wide practice variation
between clinics on work-up for diagnosing tubal pathology. Internationally, even
larger heterogeneity existed in tubal work-up. We therefore decided to focus on
diagnostic and prognostic aspects of tubal patency testing, and identified several
knowledge gaps about the clinical usefulness of medical history taking, CAT and
HSG in daily practice. To answer these knowledge gaps, we first had to gain insight
in methodological issues, which were developing very rapidly at the time of start
of the research presented in this thesis. As a result, the research presented in this
thesis starts with a focus on methodological issues in diagnostic and prognostic
research in general. Hereafter, we investigate the value of multivariable diagnostic
models based on medical history taking, to estimate pretest probabilities for tubal
pathology that might help to distinguish women with a high and low probability
of (severe) tubal disease at first consultation. Furthermore, the added value of
Chlamydia antibody testing upon medical history taking is explored. Finally, we
research the prognostic significance of various tubal pathology tests like HSG,
laparoscopy and CAT testing for spontaneous pregnancy.
Chapter 2 focuses on the quality of reporting of test accuracy studies in
reproductive medicine. We evaluate whether the introduction of the STARD
statement has led to a better reporting of test accuracy studies in two leading
journals within the field of reproductive medicine.
In chapter 3, we report on the methodological considerations when combining
the results of several diagnostic tests into models to predict the outcome of
such diagnoses over time. The importance of a clinically relevant spread in well
23138 Coppus.indd 17 28-09-12 14:55
Chapter 1
18
calibrated probabilities is emphasized, rather than a focus on sensitivity and
specificity of a model.
In chapter 4, we present the results of a pilot study on the use of multivariable
prediction models in the diagnosis of tubal pathology in subfertile women. Using
the data of 163 women from a Scottish prospective cohort study, strategies of
medical history taking alone, CAT testing alone, or an integrated use of both are
compared.
In chapter 5, data of 3716 women who underwent tubal patency testing as
part of their routine fertility workup are used to relate elements in their medical
history to the presence of tubal pathology. With multivariable logistic regression,
we construct two diagnostic models. One in which tubal disease is defined as
occlusion and/or severe adhesions of at least one tube, whereas in a second model,
tubal disease is defined as the presence of bilateral abnormalities. For both models,
discrimination and calibration is evaluated, and the models are transformed into
simple diagnostic score charts for application of the models in clinical practice.
In chapter 6, we evaluate the impact of unilateral and bilateral tubal pathology
detected at HSG or laparoscopy on treatment-independent pregnancy rates, as
compared with women in whom invasive tubal testing showed no abnormalities.
In chapter 7, we describe the results of a cohort study of 1882 women in whom,
after CAT testing, invasive tubal testing with HSG or laparoscopy showed no tubal
pathology. The effect of CAT-seropositive status on spontaneous pregnancy rates
is evaluated.
Chapter 8 provides a general discussion of the results presented in this thesis and
outlines their clinical implications. Finally suggestions for future research are given.
Chapter 9 summarizes the data presented in this thesis.
Chapter 10 provides an epilogue in which the results of three theses on tubal
pathology from our study group are integrated.
23138 Coppus.indd 18 28-09-12 14:55
Introduction
19
1References
Acton CM, Devitt JM, Ryan EA. Hysterosalpingography in infertility--an experience of 3,631 examinations. Aust N Z J Obstet Gynaecol 1988; 28:127-33.
Al-Fadhli R, Sylvestre C, Buckett W, Tulandi T. A randomized study of laparoscopic chromopertubation with lipiodol versus saline in infertile women. Fertil Steril 2006; 85:505-7.
American Fertility Society (1992) Investigation of the Infertile Couple. American Fertility Society, Birmingham, AL, USA.
Bachmann LM, ter Riet G, Clark TJ, Gupta JK, Khan KS. Probability analysis for diagnosis of endometrial hyperplasia and cancer in postmenopausal bleeding: an approach for a rational diagnostic workup. Acta Obstet Gynecol Scand 2003; 82:564-9.
Bongaarts J. Infertility after the age 30: a false alarm. Fam Plann Perspect 1982; 14:75-8.
Bonsel GJ, van der Maas PJ. Aan de wieg van de toekomst. Scenario’s voor de zorg rond de menselijke voorplanting 1995-2010. Bohn Stafleu van Loghum. Houten 1994.
Bosteels J, Van Herendael B, Weyers S, D’Hooghe T. The position of diagnostic laparoscopy in current fertility practice. Hum Reprod Update 2007; 13:477-85.
Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Lijmer JG, Moher D, Rennie D, de Vet HC. Standards for Reporting of Diagnostic Accuracy. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. BMJ 2003; 326:41-4.
Broeze KA, Opmeer BC, Bachmann LM, Broekmans FJ, Bossuyt PM, Coppus SF, Johnson NP, Khan KS, ter Riet G, van der Veen F, van Wely M, Mol BW. Individual patient data meta-analysis of diagnostic and prognostic studies in obstetrics, gynaecology and reproductive medicine. BMC Med Res Methodol 2009; 27:9:22.
Cary WH. Note on determination of patency of fallopian tubes by the use of Collargol and the X-ray shadow. Am J Obstet 1914; 69:462.
Chapron C, Querleu D, Bruhat MA, Madelenat P, Fernandez H, Pierre F, Dubuisson JB. Surgical complications of diagnostic and operative gynaecological laparoscopy: a series of 29,966 cases. Hum Reprod 1998; 13:867-72.
Collier LH, Duke-Elder S, Jones BR. Experimental trachoma produced by cultured virus. Br J Ophthalmol 1958; 42:705-20.
Collins JA, Burrows EA, Wilan AR. The prognosis for live birth among untreated infertile couples. Fertil Steril 1995; 64:22-8.
Dechaud H, Daures JP, Hedon B. Prospective evaluation of falloposcopy. Hum Reprod 1998; 13:1815-8.
den Hartog JE, Morré SA, Land JA. Chlamydia trachomatis-associated tubal factor subfertility: Immunogenetic aspects and serological screening. Hum Reprod Update 2006; 12:719-30.
den Hartog JE, Lardenoije CM, Severens JL, Land JA, Evers JL, Kessels AG. Screening strategies for tubal factor subfertility. Hum Reprod 2008; 23:1840-8.
Donnez J, Jadoul P. Taking a history in the evaluation of infertility: obsolete or venerable tradition? Fertil Steril 2004; 81:16-7
Drake T, Tredway D, Buchanan G, Takaki N, Daane T. Unexplained infertility. A reappraisal. Obstet Gynecol 1977; 50:644-6.
23138 Coppus.indd 19 28-09-12 14:55
Chapter 1
20
Evers J. Female subfertility. Lancet 2002; 360: 151-159.
Fatum M, Laufer N, Simon A. Investigation of the infertile couple: should diagnostic laparoscopy be performed after normal hysterosalpingography in treating infertility suspected to be of unknown origin? Hum Reprod 2002; 17:1-3.
Forsey JP, Caul EO, Paul ID, Hull MG. Chlamydia trachomatis, tubal disease and the incidence of symptomatic and asymptomatic infection following hysterosalpingography. Hum Reprod 1990; 5:444-7.
Glatstein IZ, Harlow BL, Hornstein MD. Practice patterns among reproductive endocrinologists: the infertility evaluation. Fertil Steril 1997; 67:443-51.
Glatstein IZ, Harlow BL, Hornstein MD. Practice patterns among reproductive endocrinologists: further aspects of the infertility evaluation. Fertil Steril 1998; 70:263-9.
Gordts S, Campo R, Rombauts L, Brosens I. Transvaginal hydrolaparoscopy as an outpatient procedure for infertility investigation. Hum Reprod 1998; 13:99-103.
Hanlo EAJM. Uterosalpingographie (Thesis). SW Melchior. Amersfoort. 1930.
Härkki-Siren P, Sjöberg J, Kurki T. Major complications of laparoscopy: a follow-up Finnish study. Obstet Gynecol 1999; 94:94-8.
Health Council of The Netherlands (2004) Screening for Chlamydia. The Hague: Health Council of The Netherlands. Publication no. 2004/07.
Helmerhorst FM, Oei SG, Bloemenkamp KW, Keirse MJ. Consistency and variation in fertility investigations in Europe. Hum Reprod 1995; 10:2027-30.
Heuser C. La radiographie de l’uterus, des trompes et des ovaires avec ou sans injection de lipiodol, pour le diagnostic précoce de la grossesse, de la stérilité, de la perméabilité des trompes. Bull. Soc. Rad. Méd. France 1925. T. XIII no. 119.
Hubacher D, Grimes D, Lara-Ricalde R, de la Jara J, Garcia-Luna A. The limited clinical usefulness of taking a history in the evaluation of women with tubal factor infertility. Fertil Steril 2004; 81:6-10.
Hull MG, Glazener CM, Kelly NJ, Conway DI, Foster PA, Hinton RA, Coulson C, Lambert PA, Watt EM, Desai KM. Population study of causes, treatment, and outcome of infertility. BMJ 1985; 291:1693-7.
Jansen FW, Kapiteyn K, Trimbos-Kemper T, Hermans J, Trimbos JB. Complications of laparoscopy: a prospective multicentre observational study. BJOG 1997; 104:595-600.
Khan KS, Bachmann LM, ter Riet G. Systematic reviews with individual patient data meta-analysis to evaluate diagnostic tests. Eur J Obstet Gynecol Reprod Biol 2003; 108:121-5.
Kiss H, Wenzl R, Egarter C. Tuboscopy. Initial experiences with microendoscopic tubal diagnosis. Wien Klin Wochenschr 1993; 105:719-22.
Land JA, Gijsen AP, Evers JL, Bruggeman CA. Chlamydia trachomatis in subfertile women undergoing uterine instrumentation. Screen or treat? Hum Reprod 2002; 17:525-7.
Land JA, Van Bergen JE, Morré SA, Postma MJ. Epidemiology of Chlamydia trachomatis infection in women and the cost-effectiveness of screening. Hum Reprod Update 2010; 16:189-204.
Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JH, Bossuyt PM. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999; 282:1061-6.
Lindequist S, Justesen P, Larsen C, Rasmussen F. Diagnostic quality and complications of hysterosalpingography: oil- versus water-soluble contrast media--a randomized prospective study. Radiology 1991; 179:69-74.
23138 Coppus.indd 20 28-09-12 14:55
Introduction
21
1Lundberg S, Rasmussen C, Berg AA, Lindblom B. Falloposcopy in conjunction with laparoscopy: possibilities and limitations. Hum Reprod 1998; 13:1490-2.
Luttjeboer F, Harada T, Hughes E, Johnson N, Lilford R, Mol BW. Tubal flushing for subfertility. Cochrane Database Syst Rev. 2007;18:CD003718.
Luttjeboer FY, Verhoeve HR, van Dessel HJ, van der Veen F, Mol BW, Coppus SF. The value of medical history taking as risk indicator for tuboperitoneal pathology: a systematic review. BJOG 2009; 116:612-25.
Marcoux S, Maheux R, Bérubé S. Laparoscopic surgery in infertile women with minimal or mild endometriosis. Canadian Collaborative Group on Endometriosis. N Engl J Med 1997; 337:217-22.
McComb PF. Clinical history is the essence of medicine. Fertil Steril 2004; 81:13-5.
Miettinen OS, Caro JJ. Foundations of medical diagnosis: what actually are the parameters involved in Bayes’ theorem? Stat Med 1994; 13:201-9.
Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, Stroup DF. Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. QUOROM Group. Br J Surg 2000; 87:1448-54.
Mol BW, Dijkman B, Wertheim P, Lijmer J, van der Veen F, Bossuyt PM. The accuracy of serum chlamydial antibodies in the diagnosis of tubal pathology: a meta-analysis. Fertil Steril 1997; 67:1031-7.
Mol BW, Collins JA, Burrows EA, van der Veen F, Bossuyt PM. Comparison of hysterosalpingography and laparoscopy in predicting fertility outcome. Hum Reprod 1999; 14:1237-42.
Moons KG, Biesheuvel CJ and Grobbee DE. Test research versus diagnostic research. Clin Chem 2004; 50:273-6.
National institute for clinical excellence . (2004) Fertility guideline: assessment and treatment for people with fertility problems. www.nice.org.uk/pdf/download.aspx?)0=CGO11fullguideline.
NVOG Richtlijn Orienterend Fertiliteits-onderzoek (OFO) Werkgroep Voortplantings Endocrinologie en Fertiliteit. Nederlandse Vereniging voor Obstetrie en Gynaecologie (NVOG) 2004.
Paavonen J, Eggert-Kruse W. Chlamydia trachomatis: impact on human reproduction. Hum Reprod Update 1999; 5:433-47.
Palmer R. La coelioscopie gynécologique. Mémoires de l´Academie Chirurgie 1946. 72, p.363.
Peipert JF. Clinical practice. Genital chlamydial infections. N Engl J Med 2003; 349:2424-30.
Perquin DA, Dörr PJ, de Craen AJ, Helmerhorst FM. Routine use of hysterosalpingography prior to laparoscopy in the fertility workup: a multicentre randomized controlled trial. Hum Reprod 2006; 21:1227-31.
Punnonen R, Terho P, Nikkanen V, Meurman O. Chlamydial serology in infertile women by immunofluorescence. Fertil Steril 1979; 31:656-9.
Pittaway DE, Winfield AC, Maxson W, Daniell J, Herbert C, Wentz AC. Prevention of acute pelvic inflammatory disease after hysterosalpingography: efficacy of doxycycline prophylaxis. Am J Obstet Gynecol 1983; 147:623-6.
Rindfleisch W. Darstellung des Cavum Uteri. Klin Wochenschr 1910; 47:780.
Rowe PJ, Comhaire FH, Hargreave TB, Mahmoud AMA. WHO manual for the standardized investigation of the infertile couple. Cambridge, UK: Cambridge University Press Cambridge, UK, 1993.
23138 Coppus.indd 21 28-09-12 14:55
Chapter 1
22
Rubin IC. The non-operative determination of patency of fallopian tubes by means of intra-uterine inflation with oxygen and the production of an artificial pneumoperitoneum. Landmark article 1920. JAMA 1983; 250:2359-65.
Schlief R, Deichert U. Hysterosalpingo-contrast sonography of the uterus and fallopian tubes: results of a clinical trial of a new contrast medium in 120 patients. Radiology 1991; 178:213-5.
Simon A, Laufer N. Unexplained infertility: a reappraisal. Ass Reprod Rev. 1993; 3:26-36.
Soules MR, Spadoni LR. Oil versus aqueous media for hysterosalpingography: a continuing debate based on many opinions and few facts. Fertil Steril 1982; 38:1-11.
Stumpf PG, March CM. Febrile morbidity following hysterosalpingography: identification of risk factors and recommendations for prophylaxis. Fertil Steril 1982; 33:487-492.
Stroup DF, Berlin JA, Morton SC, Olkin I, Williamson GD, Rennie D, Moher D, Becker BJ, Sipe TA, Thacker SB. Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. JAMA 2000; 283:2008-12.
The practice committee of the American Society for Reproductive Medicine. Effectiveness and treatment for unexplained infertility. Fertil Steril 2006; 86:S111-114.
Tulandi T, Platt R. The art of taking a history. Fertil Steril 2004; 81:11-2.
Swart P, Mol BW, van der Veen F, van Beurden M, Redekop WK, Bossuyt PM. The accuracy of hysterosalpingography in the diagnosis of tubal pathology: a meta-analysis. Fertil Steril 1995; 64:486-91.
Te Linde’s Operative Gynecology. Editors Thompson JD, Rock JA. 1992. 7th edition. p. 361. Lippincott Company.
Trimbos-Kemper GCM. Tubachirurgie (Thesis). Leiden. 1981.
Weström L. Incidence, prevalence, and trends of acute pelvic inflammatory disease and its consequences in industrialized countries. Am J Obstet Gynecol 1980; 138:880-92.
Weström L, Joesoef R, Reynolds G, Hagdu A, Thompson SE. Pelvic inflammatory disease and fertility. A cohort study of 1,844 women with laparoscopically verified disease and 657 control women with normal laparoscopic results. Sex Transm Dis 1992; 19:185-92.
Weström L, Wölner-Hanssen P. Pathogenesis of pelvic inflammatory disease. Genitourin Med 1993; 69:9-17.
Whiting P, Rutjes AW, Dinnes J, Reitsma J, Bossuyt PM, Kleijnen J. Development and validation of methods for assessing the quality of diagnostic accuracy studies. Health Technol Assess 2004; 8):iii, 1-234.
23138 Coppus.indd 22 28-09-12 14:55
23138 Coppus.indd 23 28-09-12 14:55
23138 Coppus.indd 24 28-09-12 14:55
Part 1Methodological considerations
23138 Coppus.indd 25 28-09-12 14:55
23138 Coppus.indd 26 28-09-12 14:55
2Quality of reporting of test accuracy
studies in reproductive medicine:
impact of the standards for
reporting of diagnostic
accuracy (STARD)-initiative.
Sjors CoppusFulco van der Veen
Patrick BossuytBen Willem Mol
Fertility and Sterility 2006, 86: 1321-1329
The content of this article was presented as a Prize Paper Candidate session oral presentation at the 61st ASRM Annual Meeting, 2005, Montreal, Canada.
23138 Coppus.indd 27 28-09-12 14:55
Chapter 2
28
Abstract
BACKGROUND: To evaluate the extent to which test accuracy studies
published in two leading reproductive medicine journals in the years
1999 and 2004 adhered to the Standards for Reporting of Diagnostic
Accuracy Studies (STARD) initiative parameters, and to explore whether
the introduction of the STARD statement has led to an improved quality of
reporting.
METHODS: Structured literature search. Articles that reported on the
diagnostic performance of a test in comparison with a reference standard
were eligible for inclusion. For each article we scored how well the twenty-
five items of the STARD checklist were reported. These items deal with the
study question, study participants, study design, test methods, reference
standard, statistical methods, reporting of results and conclusions. We
calculated the total number of reported STARD items per article, summary
scores for each STARD item and the average number of reported STARD
items per publication year.
RESULTS: We found 24 studies reporting on test accuracy in reproductive
medicine in 1999 and 27 studies in 2004. The mean number of reported
STARD items for articles published in 1999 was 12.1 ± 3.3 (range 6.5-20) and
12.4 ± 3.2 (range 7-17.5) in 2004, after publication of the STARD statement.
Overall, less than half of the studies reported adequately on 50% or more of
the STARD items. The reporting of individual items showed a wide variation.
There was no significant improvement in mean number of reported items
for the articles published after the introduction of the STARD statement.
CONCLUSIONS: Authors of test accuracy studies in the two leading fertility
journals poorly report the design, conduct, methodology and statistical
analysis of their study. Strict adherence to the STARD guidelines should be
encouraged.
23138 Coppus.indd 28 28-09-12 14:55
Quality of reporting of diagnostic studies
29
2
Introduction
The number of available diagnostic tests is growing exponentially. Evaluation of
their accuracy in ruling in or ruling out disease is increasingly important as these
data can help us identify which tests are useful to perform.
A problem in test accuracy studies is that rigorous methodological standards
for research on diagnostic tests have been slower to develop than standards for
studies of therapeutic interventions. This could be explained by the fact that test
evaluation research is less straightforward than the evaluation of therapy, where
randomized controlled trials have become the de facto standard method for
evaluating effectiveness. In the last 10 years, more and more empirical evidence
has been gained on design features of test accuracy studies associated with bias
and lack of applicability. Several factors can threaten the internal and external
validation of accuracy studies. One study showed that the methodological quality
of studies of diagnostic accuracy, as published in four major medical journals
between 1978 and 1993, was mediocre at best. Information on key elements of
design, conduct and analysis was often not reported (1).
In 1999, Lijmer et al. (2) showed that many reports of diagnostic tests contain
methodological shortcomings that can result in an overestimation of diagnostic
test accuracy. A systematic review showed that a number of issues in the design
and conduct of diagnostic accuracy studies can lead to bias or variation, with the
best-documented effects found for demographic features, disease prevalence and
severity, partial verification bias, clinical review bias, and observer and instrument
variation (3).
In reproductive medicine, tests are ordered to establish the presence or absence
of disease or to predict clinical outcome such as ovarian response or pregnancy.
In diagnostic accuracy studies the results of a test are compared to those of a
reference standard, the best available method to detect the presence of absence
of disease. These results are expressed then expressed in accuracy statistics, such
as sensitivity and specificity. In studies on the prognostic performance of tests,
test results are compared with a clinically relevant endpoint and statistics are
calculated to express the predictive capacity of the test.
23138 Coppus.indd 29 28-09-12 14:55
Chapter 2
30
To interpret the results of clinical studies on testing correctly, readers must
understand the design, conduct, and data analysis and must be able to judge the
internal validity and generalizability of the results. This goal can only be achieved
through complete transparency of reporting in the articles. In recent years, a
growing number of guidelines have been published to support authors of medical
articles to clearly report the methodology used. For randomised clinical trials, the
CONSORT statement (Consolidated Standards of Reporting Trials) was published in
1996 (4) and in a revised form in 2001 (5). For meta-analyses QUOROM (Quality of
Reporting of Meta-analyses) and MOOSE (Meta-analysis of Observational Studies
in Epidemiology) are available (6, 7).
For diagnostic studies, a group of scientists and editors developed the STARD
(Standards for Reporting of Diagnostic Accuracy) statement to improve the
reporting quality of studies on diagnostic accuracy. This STARD statement contains
a 25-item checklist and a prototypical flow chart that helps authors to write a
proper report of a diagnostic accuracy study and enables readers to assess the
potential for bias in a study and to evaluate the generalizability of the results.
STARD was primarily developed for authors to ensure completeness of reporting,
not for reviewers to determine the quality of the study methodology.
In January 2003, this statement was first published simultaneously in eight
medical journals, (American Journal of Clinical Pathology, Annals of Internal Medicine,
British Medical Journal, Clinical Chemistry, Clinical Biochemistry, Clinical Chemistry
and Laboratory Medicine, the Lancet and Radiology) together with an explanation
and elaboration document published in Clinical Chemistry (8, 9). Since then, it
has been published or promoted in several other biomedical journals, including
Academic Radiology, British Journal of Obstetrics and Gynaecology, Clinical Radiology,
Gynecological Oncology, JAMA, Annals of Clinical Microbiology and others (10-15).
In reproductive medicine, neither Human Reproduction nor Fertility and Sterility
published STARD or promoted this statement with an editorial. In the instructions
for authors of these journals on how to prepare a paper, only Human Reproduction
lists the STARD statement as compulsory when writing an article on test accuracy.
Since it is not known how reports on diagnostic and prognostic accuracy in
reproductive medicine comply with the STARD-checklist, and to explore whether
23138 Coppus.indd 30 28-09-12 14:55
Quality of reporting of diagnostic studies
31
2
the introduction of the STARD-statement has improved the quality of report, we
conducted a systematic literature survey in which the quality of reporting of these
studies published in Fertility and Sterility and Human Reproduction was assessed for
the years 1999 (pre-STARD) and 2004 (post-STARD).
Materials and Methods
Study selection and data extractionWe performed a systematic search in all issues of Fertility and Sterility and Human
Reproduction published in 1999 (pre-STARD) and in 2004 (post-STARD) for articles
reporting on the diagnostic or prognostic accuracy of a test. A search was run in
Pubmed on journal name and publication year. This search was then limited to
articles that dealt with human research and had an abstract. Hereafter we excluded
all (randomized) clinical trials, meta-analyses, letters, editorials, practice guidelines
and reviews with the search limit options of the search system.
We screened the remaining articles for eligibility based on title and abstract. Articles
were eligible for inclusion if they reported on primary studies on diagnostic or
prognostic accuracy. Diagnostic accuracy studies were defined as the assessment
of a new test in comparison with a reference standard. Prognostic accuracy studies
were defined as studies in which a test was performed to predict clinical outcomes
in ART such as ovarian hyperstimulation syndrome, ovarian response, or pregnancy.
A flowchart of the study selection process is shown in Fig. 1.
All selected articles were closely examined on the extent to which they reported the
STARD criteria. A standardized score form for the quality of reporting of diagnostic
accuracy studies that had been developed and tested at the department of Clinical
Epidemiology and Biostatistics from the Academic Medical Centre, University of
Amsterdam was used for this purpose (16). A single trained reviewer scored all
articles, with a secondary reviewer (BWJM) checking a random sample of 20% to
ensure accuracy in interpretation of the articles. Inter-rater agreement between
these two reviewers was expressed as overall agreement percentage and more
formally tested with the kappa statistic.
23138 Coppus.indd 31 28-09-12 14:55
Chapter 2
32
Pubmed search for articles published in Fertility and Sterility and Human Reproduction in 1999 and 2004 (n = 2566)
Search limited to human research with available abstract (n = 2127)
Excluded: n = 439
Excluded: n = 670: • Trial n = 452 • Editorial n = 4 • Letter n = 57 • Meta-analysis n = 16 • Practice guideline n = 14 • Review n = 127
Potentially eligible articles identified and screened for retrieval based on title and abstract (n = 1457) !
Potentially eligible articles (n = 60)
Excluded: n = 1397
Assessment of the quality of reporting on diagnostic and prognostic test accuracy studies in reproductive medicine (n = 51)
Excluded: n = 9 No diagnostic or prognostic test evaluation n = 7 Correspondence n = 2
FIGURE 1. Flowchart showing search and selection process of articles reporting on diagnostic or prognostic test accuracy.
Statistical analysisThe 25-items of the STARD list can be divided into items addressing formulation
of the study question, study participants, study design, test methods, reference
standard, statistical methods, report of results and conclusions (Table 1). For
each item of the STARD statement, the total number of articles reporting all the
23138 Coppus.indd 32 28-09-12 14:55
Quality of reporting of diagnostic studies
33
2
elements needed for that item was summed. Equal weights were applied to each
item. The total number of reported STARD items, was also calculated for each
article by summing the number of reported items (0-25 points possible). A higher
number of items indicates better reporting and visa versa.
Six items (items 8, 9, 10, 11, 13 and 24) concern both the index test as well as the
reference test. If only the index test or only the reference standard was reported
adequately, this item was scored as 0.5 point. The overall mean, range, and standard
deviation of the total number of reported STARD items were calculated. Testing for
significant improvement of the total number of reported STARD items 1 year after
the introduction of this statement was performed with an unpaired two-tailed
t-test for independent samples, with the significance level set at P<0.05.
Results
A total number of 2127 articles reporting on human research with an available
abstract published in 1999 and 2004 in Fertility and Sterility and Human Reproduction
were identified. After limiting the search, as described in the method section, the
remaining 1457 articles were screened for eligibility. Of these, 51 (3.5%) were
marked as prognostic or diagnostic accuracy studies (17-67). Twenty-eight of
these were published in Fertility and Sterility and 23 in Human Reproduction.
Agreement between the two reviewers was good with an overall agreement
percentage of 91.5%. The kappa-statistic had a value of 0.82, indicating a very
good agreement beyond change.
The mean number of reported STARD items in the 24 accuracy reports published
in 1999 was 12.1 (SD 3.3) with a range of 6.5 to 20 on a scale from 0 to 25 points. In
2004, the 27 included articles reported a mean of 12.4 (SD 3.2) items with a range
of 7 to 17.5 (Fig. 2). The mean difference was a nonsignificant 0.3 points (P=0.7). No
subgroup analysis was performed. In both years, slightly less than half of studies
(11/24 and 13/27) reported on more than 50% of the items (≥12.5 items). The
maximum number of reported items in a single article in 1999 was 20 of the 25
items versus 17.5 of the 25 items in 2004.
The reporting of individual items showed a wide variation (0% to 96%), which
is shown in Table 1. Although we interpreted item 1 more broadly (allowing
23138 Coppus.indd 33 28-09-12 14:55
Chapter 2
34
“diagnostic” to be read as “prognostic”), articles about diagnostic and prognostic
accuracy were indexed poorly, with only 29% and 19% of articles clearly indexing
the study as an article reporting on diagnostic or prognostic accuracy in the way
that is recommended by the STARD committee. The aim of the study was stated
in more than 80% of articles, with an adequate description of the test under study,
the reference test, and target condition or outcome of interest.
1999
year
2004
Num
ber
of r
epor
ted
STA
RD it
ems
20
18
16
14
12
10
8
6
FIGURE 2. Median, interquartile ranges, and the range of adequately reported STARD items in 1999 and 2004.
The methods section lacked a detailed definition of the study population in which
the tests were carried out in 30% to 40% of studies. Of the 16 items in the method
section -item 8, 9, 10, 11 and 13 counted twice, for index test and reference test
separately- only three of these were reported in more than 75% of the articles
in 1999, a number that increased to six in 2004. For 1999 these items were the
description of participant recruitment, of data collection and adequate description
of the index test (items 4, 6 and 8a). In 2004, these items were participant
recruitment, participant sampling, data collection, description of the index test,
and definition of and rationale for units and cutoffs for the index test and reference
23138 Coppus.indd 34 28-09-12 14:55
Quality of reporting of diagnostic studies
35
2
standard (items 4, 5, 6, 8a, 9a and 9b). The reference standard and its rationale or
the outcome of interest was only reported in 52% of the studies.
TABLE 1. Report of items of the STARD statement in 51 articles of diagnostic and prognostic accuracy published in 1999 and 2004.
1999¶ 2004¶¶
Category and item no. (n=24) (n=27)
Title, abstract and keywords
1. Identify the article as a study of diagnostic accuracy (recommend MESH heading “sensitivity and specificity”)
7 (29) 5 (19)*
Introduction
2 State the research question or study aims, such as estimating diagnostic accuracy or comparing accuracy between tests or across participant groups.
21 (88) 22 (81)
Methods
3 Describe the study population: the inclusion and exclusion criteria, setting, and locations where data were collected.
17 (71) 17 (63)
4 Describe participant recruitment: was recruitment based on presenting symptoms, results from previous tests, or the fact that the participants had undergone the index test(s) or the reference standard?
20 (83) 26 (96)
5 Describe participant sampling: was the study population a consecutive series of participants defined by the selection criteria in item 3 and 4? If not, specify how participants were further selected.
15 (63) 21 (78)
6 Describe data collection: was data collection planned before the index test and reference standard were performed (prospective study) or after (retrospective study)?
19 (79) 23 (85)
7 Describe the reference standard and its rationale. 10 (42) 14 (52)
8 Describe technical specifications of material and methods involved including how and when measurements were taken, and/or cite references for index tests and reference standard. a. For the index test 23 (96) 24 (89)b. For the reference standard 16 (67) 16 (59)
9 Describe definition of and rationale for the units, cut-offs and/or categories of the results of the index test and the reference standard.a. For the index test 16 (67) 25 (93)b. For the reference standard 16 (67) 21 (78)
10 Describe the number, training and expertise of the persons executing and reading the index tests and the reference standard.a. For the index test 6 (25) 5 (19)b. For the reference standard 2 (8) 2 (7)
11 Describe whether or not the readers of the index tests and reference standard were blind (masked) to the results of the other test and describe any other clinical information available to the readers.a. For the index test 10 (42) 7 (26)b. For the reference standard 4 (17) 6 (22)
12 Describe methods for calculating or comparing measures of diagnostic accuracy, and the statistical methods used to quantify uncertainty (e.g. 95% confidence intervals).
2 (8) 3 (11)
13 Describe methods for calculating test reproducibility, if done.a. For the index test 14 (62) 13 (48)b. For the reference standard 5 (27) 1 (4)
23138 Coppus.indd 35 28-09-12 14:55
Chapter 2
36
Results
14 Report when study was done, including beginning and ending dates of recruitment.
14 (58) 22 (81)
15 Report clinical and demographic characteristics of the study population (e.g. age, sex, spectrum of presenting symptoms, comorbidity, current treatments, recruitment centers).
16 (67) 19 (70)
16 Report the number of participants satisfying the criteria for inclusion that did or did not undergo the index tests and/or the reference standard; describe why participants failed to receive either test (a flow diagram is strongly recommended).
19 (89) 19 (70)
17 Report time interval from the index tests to the reference standard, and any treatment administered between them.
16 (67) 17 (63)
18 Report distribution of severity of disease (define criteria) in those with the target condition; other diagnoses in participants without the target condition.
9 (38) 9 (33)
19 Report a cross tabulation of the results (including indeterminate and missing results) by the results of the reference standard; for continuous results report the distribution of the test results by the results of the reference standard.
16 (67) 12 (44)
20 Report any adverse events from performing the index tests or the reference standard.
0 1 (4)
21 Report measures of diagnostic accuracy and measures of statistical uncertainty (e.g., 95% confidence intervals).
2 (8) 6 (22)
22. Report how indeterminate results, missing responses, and outliers of the index tests were handled.
1 (4) 4 (15)
23. Report estimates of variability of diagnostic accuracy between subgroups of participants, readers or centers, if done.
4 (17) 6 (22)
24. Estimates of test reproducibility, if donea. For the index test 8 (33) 6 (22)b. For the reference standard 0 0
Discussion
25. Discussion of the clinical applicability of the study findings. 22 (92) 26 (96)
* Number in parentheses expresses percentage ¶ Mean STARD score, 12.1 ± 3.3; range 6.5-20 ¶¶ Mean STARD score, 12.4 ± 3.2; range 7-17.5
Information concerning the index test was generally better reported than that for the
reference standard (items 8-13 and 24). Notably information regarding the number
and training of the persons executing and evaluating the index test and reference
test, and blinding of readers to the tests was reported in a minority of reports. In
2004, methods for calculating measures of diagnostic or prognostic accuracy, such
as sensitivity, specificity, likelihood ratios, predictive values and ROC curves, were
reported in only three articles. These were also the only articles that reported a
statistical method used to quantify uncertainty, in all cases 95% confidence intervals.
Indices of accuracy and the corresponding confidence intervals were reported in
another three articles. Reporting indices of accuracy was done more often in articles
in 1999. However, because the STARD statement invites authors to report confidence
intervals as well, no more than 2 articles fulfilled item 12 (data not shown).
23138 Coppus.indd 36 28-09-12 14:55
Quality of reporting of diagnostic studies
37
2
The study population in which tests were executed was reported in 71% and
63% of articles. Of the 11 items in the result section, study period, clinical and
demographic characteristics, and time interval between tests were reported with
a frequency of more than 60%. A flow diagram, which is strongly encouraged by
the STARD steering committee, was presented in only 3 of the 51 articles.
Discussion
The results of this study indicate that the quality of reporting in articles on test
accuracy in reproductive medicine is poor to mediocre. For both publication
years examined, more than 50% of articles report less than 50% of STARD items.
The mean number of reported STARD items was 12.1 for the articles published in
1999 and 12.4 for articles published in 2004. If we compare our results to data on
quality of reporting from other fields of medicine, the standards of reporting in
reproductive medicine are quite comparable to these other specialities. Articles
published in the year 2000 in journals with impact factor of four or higher that
regularly publish articles on diagnostic accuracy, reported 11.9 STARD items on
average, with only 41% of articles reporting on more than 50% of STARD items
(16). In ophthalmology, only 44% of the 16 diagnostic accuracy studies published
in 2002 in the five leading ophthalmic journals reported more than half of the
STARD items (68).
Description of study population was reported in 60% to 70% of the articles.
Although this may seem a reasonable score, in our opinion this percentage is
worrisome. An absent or incomplete description of the study population affects
the generalizability of the study. Empirical evidence exists that this may be
accompanied by overestimated test accuracy (2).
The indexing of the articles as diagnostic studies fulfilled the criteria of the STARD
list in only a minority of reports. This is of concern due to the following reason:
with the number of diagnostic meta-analyses growing, electronic databases
have become indispensable tools to identify studies. To facilitate retrieval of their
diagnostic study, authors should identify their studies as such. STARD recommends
the use of the term “diagnostic accuracy” in the title or abstract of a report that
compares the results of index test(s) with that of a reference standard. The MESH
23138 Coppus.indd 37 28-09-12 14:55
Chapter 2
38
heading “Sensitivity and Specificity” is also recommended. Several reports however,
have shown that a search with this term does not identify all accuracy studies,
while incorrectly identifying many articles as such. “Sensitivity and Specificity” may
therefore be a less suitable set of keywords (69-70). Using the term “diagnostic
accuracy” in title or abstract would enable the reader to easily and quickly identify
a test accuracy study as such.
In the present study, we only valued the quality of reporting (defined as the number
of reported STARD items), not the methodological quality or the degree of bias of
the studies. If a study reported that the evaluation of the index and reference test
was not blinded and this was clearly stated as such in the report, item 11 was
considered to be well reported. Yet this may indicate a possible methodological
shortcoming, depending on the amount of subjectivity associated with reading
both tests. Good reporting though allows the reader-clinician to judge the
potentials for bias in a study. The Quality Assessment of Studies of Diagnostic
Accuracy Included in Systematic Reviews (QUADAS) tool, which contains items
concerning bias in diagnostic accuracy studies, is a checklist that can be used to
judge the potential for bias in a diagnostic accuracy study, and has been developed
to be used in systematic reviews of diagnostic test accuracy (71). Some of the items
listed in the QUADAS tool, were also selected by the US Department of Health and
Human Services in their evidence-based practice report on systems to rate the
strength of scientific evidence (www.ahrq.gov/clinic/tp/strengthtp.htm).
Confidence intervals (CIs), which are important to judge the reliability of the
estimates of diagnostic accuracy, were reported in only eight studies. It is
noteworthy that, although in studies of effects CIs are more or less obligatory;
these are much less being reported in studies of diagnostic accuracy. Like measures
of effect in therapeutic trials, estimates of diagnostic accuracy are also subject to
chance variation, and authors should quantify the amount of uncertainty around
their observed values. Methods for calculating the precision around frequently
used measures of diagnostic accuracy are readily available in literature (72, 73).
One other review revealed that CIs were reported in only 50% of 16 diagnostic
evaluation reports published in the BMJ in 1996 and 1997 (74). In our study
confidence intervals were reported in only 16% of studies.
23138 Coppus.indd 38 28-09-12 14:55
Quality of reporting of diagnostic studies
39
2
In this study, we evaluated whether or not information on each of the 25 STARD
items was reported. Although we have referred to this evaluation in this manuscript
as the “quality of reporting”, the latter may be defined by an additional number of
features, such as clarity of the language used for reporting, absence of ambiguity,
consistency between abstract, methods, results and discussion. Such a more
detailed analysis was beyond the scope of our paper.
It has been shown that the introduction of a guideline on how to report
randomized controlled trials has led to improvement in the quality of reporting
of these studies. Articles in journals that had adopted CONSORT showed greater
improvement in quality of reporting than did articles in a journal that did not
advocate its use (75). A recent study provided additional evidence to suggest a
role for checklists. Articles published in Clinical Chemistry - a journal that adopted a
preliminary checklist for the report of diagnostic accuracy studies as early as 1996
- showed a greater improvement in reporting from 1996 to 2002 than articles
published in Clinical Chemistry and Laboratory Medicine, a journal that did not use
a similar checklist before (76).
Our study was unable to detect a significant improvement of the quality of
reporting between 1999 and 2004 in a “before-and-after STARD” evaluation. A
possible explanation may be that the time involved between submitting an article
to a journal and the final acceptation for publication (the so-called “peer-review
time”) did not allow authors to incorporate the STARD-guidelines into their articles.
Another possibility might be that authors, reviewers and editors are less familiar
with the current STARD checklist than with others such as CONSORT.
Possible limitations of our study are the relatively small number of studies included,
and the fact that we applied the STARD list also to prognostic accuracy studies. As
far as the first limitation is concerned, we decided to include studies published from
January 2004 onward, specifically with the aim of reducing the effect of the above
mentioned peer review time. This obviously limited the number of eligible articles.
With respect to the second limitation, we believe that it is justified to apply the
STARD-criteria to prognostic accuracy studies as well, as in reproductive medicine
these studies have a striking similarity with diagnostic test accuracy studies. For
the evaluation of the prognostic accuracy of tests, transparent reporting of study
23138 Coppus.indd 39 28-09-12 14:55
Chapter 2
40
execution, participants, flow of participants throughout the study, statistical
uncertainty, definition of outcome and rationale, reproducibility of test results, and
so on, is as essential as it is with diagnostic test accuracy studies.
In conclusion, this study shows that the quality of reporting on diagnostic and
prognostic accuracy studies in reproductive medicine is comparable to other fields
of medicine but leaves ample room for improvement. Authors, reviewers, editors
and readers should be aware of the STARD criteria for reporting on diagnostic/
prognostic accuracy studies. We welcome stricter adherence to this guideline.
AcknowledgementsThe authors thank Nynke Smidt, Ph.D., from the Institute for Research in Extramural Medicine, VU University Medical Center Amsterdam and Anne W.S. Rutjes, Ph.D., from the department of Clinical Epidemiology and Biostatistics of the Academic Medical Centre, University of Amsterdam, for generously providing the STARD scoring form.
23138 Coppus.indd 40 28-09-12 14:55
Quality of reporting of diagnostic studies
41
2
References
1. Reid MC, Lachs MS, Feinstein AR. Use of methodological standards in diagnostic test research. Getting better but still not good. JAMA 1995;274:645-51.
2. Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JH, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999;282:1061-6. Erratum in: JAMA 2000; 283:1963.
3. Whiting P, Rutjes AW, Reitsma JB, Glas AS, Bossuyt PM, Kleijnen J. Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med 2004;140:189-202.
4. Altman DG. Better reporting of randomised controlled trials: the CONSORT statement. BMJ 1996; 313:570-1.
5. Moher D, Schulz KF, Altman D; CONSORT Group (Consolidated Standards of Reporting Trials). The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. JAMA 2001;285:1987-91.
6. Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, Stroup DF. Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Quality of Reporting of Meta-analyses. Lancet 1999; 354:1896-1900.
7. Stroup DF, Berlin JA, Morton SC, Olkin I, Williamson GD, Rennie D, et al. Meta-analysis of observational studies in epidemiology: a proposal for reporting. JAMA 2000; 283:2008-2012.
8. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. BMJ 2003;326:41-44.
9. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Clin Chem 2003;49:7-18.
10. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM et al. Toward complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Acad Radiol 2003;10:599-600.
11. Khan KS, Bakour SH, Bossuyt PM. Evidence-based obstetric and gynaecologic diagnosis: the STARD checklist for authors, peer-reviewers and readers of test accuracy studies. BJOG 2004;111:638-40.
12. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Clin Radiol 2003;58:575-80.
13. Selman TJ, Khan KS, Mann CH. An evidence-based approach to test accuracy studies in gynecologic oncology: the ‘STARD’ checklist. Gynecol Oncol 2005;96:575-8.
14. Rennie D. Improving reports of studies of diagnostic tests: the STARD initiative. JAMA 2003;289:1061-6.
15. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Ann Clin Biochem 2003;40:357-63.
16. Smidt N, Rutjes AW, van der Windt DA, Ostelo RW, Reitsma JB, Bossuyt PM, et al. Quality of reporting of diagnostic accuracy studies. Radiology 2005;235:347-53.
23138 Coppus.indd 41 28-09-12 14:55
Chapter 2
42
17. Coulam CB, Goodman C, Rinehart JS. Colour Doppler indices of follicular blood flow as predictors of pregnancy after in-vitro fertilization and embryo transfer. Hum Reprod 1999;14:1979-82.
18. Noci I, Maggi M, Biagiotti R, D’Agata A, Criscuoli L, Marchionni M. Serum CA-125 values on the day of oocyte retrieval are not predictive of subsequent pregnancy with in-vitro fertilization. Hum Reprod 1999; 14:1773-76.
19. Bjercke S, Tanbo T, Dale PO, Morkrid L, Abyholm T. Human chorionic gonadotrophin concentrations in early pregnancy after in vitro-fertilization. Hum Reprod 1999;14:1642-46.
20. Yuval Y, Lipitz S, Dor J, Achiron R. The relationships between endometrial thickness, and blood flow and pregnancy rates in in-vitro fertilization. Hum Reprod 1999;14:1067-71.
21. Hall JE, Welt CK, Cramer DW. Inhibin A and inhibin B reflect ovarian function in assisted reproduction but are less useful at predicting outcome. Hum Reprod 1999;14:409-15.
22. Urman B, Alatas C, Aksoy S, Mercan R, Isiklar A, Balaban B. Elevated serum progesterone level on the day of human chorionic gonadotropin administration does not adversely affect implantation rates after intracytoplasmic sperm injection and embryo transfer. Fertil Steril 1999:72:975-79.
23. Branigan EF, Estes MA, Muller CH. Advanced semen analysis: a simple screening test to predict intrauterine insemination success. Fertil Steril 1999;71:547-551.
24. Evenson DP, Jost LK, Marshall D, Zinaman MJ, Clegg E, Purvis K et al. Utility of the sperm chromatin structure assay as a diagnostic and prognostic tool in the human fertility clinic. Hum Reprod 1999;14:1039-49.
25. Kinkel K, Chapron C, Balleyguier C, Fritel X, Dubuisson J, Moreau J. Magnetic resonance imaging characteristics of deep endometriosis. Hum Reprod 1999;14:1080-86.
26. Kataoka ML, Togashi K, Kobayashi H, Inoue T, Fujii S, Konishi J. Evaluation of ectopic pregnancy by magnetic resonance imaging. Hum Reprod 1999;14:2644-50.
27. Yamamoto Y, Sofikitis N, Mio Y, Miyagawa I. Highly sensitive quantitative telomerase assay of diagnostic testicular biopsy material predicts the presence of haploid spermatogenic cells in therapeutic testicular biopsy in men with Sertoli cell-only syndrome. Hum Reprod 1999;14:3041-47.
28. Campo R, Gordts S, Rombauts L, Brosens I. Diagnostic accuracy of transvaginal hydrolaparoscopy in infertility. Fertil Steril 1999:71:1157-1160.
29. Huey S, Abuhamad A, Barroso G, Hsu MI, Kolm P, Mayer J, Oehninger S. Perifollicular blood flow Doppler indices, but not follicular pO
2, pCO
2, of pH, predict oocyte developmental
competence in in vitro fertilization. Fertil Steril 1999;72:707-712.
30. Agrawal R, Tan SL, Wild S, Sladkevicius P, Engmann L, Payne N et al. Serum vascular endothelial growth factor concentrations in in vitro fertilization cycles predict the risk of ovarian hyperstimulation syndrome. Fertil Steril 1999;71:287-293.
31. Mol BWJ, Hajenius PJ, Engelsbel S, Ankum WM, van der Veen F, Hemrika DJ, Bossuyt PMM. Are gestational age and endometrial thickness alternatives for serum human chorionic gonadotropin as criteria for the diagnosis of ectopic pregnancy? Fertil Steril 1999;72:643-645.
32. Kitawaki J, Kusuki I, Koshiba H, Tsukamoto K, Fushiki S, Honjo H. Detection of aromatase cytochrome P-450 in endometrial biopsy specimens as a diagnostic test for endometriosis. Fertil Steril 1999;72:1100-1106.
23138 Coppus.indd 42 28-09-12 14:55
Quality of reporting of diagnostic studies
43
2
33. Nowacek GE, Meyer WR, McMahon MJ, Thorp JR, Wells SR. Diagnostic value of cervical fetal fibronectin in detecting extrauterine pregnancy. Fertil Steril 1999;72:302-4.
34. Daniel Y, Geva E, Lerner-Geva L, Eshed-Englender T, Gamzu R, Lessing JB et al. Levels of vascular endothelial growth factor are elevated in patients with ectopic pregnancy: is this a novel marker? Fertil Steril 1999;72:1013-1017.
35. Azziz R, Hincapie LA, Knochenhauer ES, Dewailly D, Fox L, Boots LR. Screening for 21-hydroxylase-deficient nonclassic adrenal hyperplasia among hyperandrogenic women: a prospective study. Fertil Steril 1999;72:915-925.
36. Güleki B, Bulbul Y, Onvural A, Yorukoglu K, Posaci C, Demir N, Erten O. Accuracy of ovarian reserve tests. Hum Reprod 1999: 14:2822-26.
37. Ludwig M, Jelkmann W, Bauer O, Diedrich K. Prediction of severe ovarian hyperstimulation syndrome by free serum vascular endothelial growth factor concentration on the day of human chorionic gonadotrophin administration. Hum Reprod 1999;14:2437-41.
38. Haddad B, Abirached F, Louis-Sylvestre C, Le Blond J, Paniel BJ, Zorn JR. Predictive value of early human chorionic gonadotrophin serum profiles for fetal growth retardation. Hum Reprod 1999;14:2872-75.
39. Hata T, Yanagihara T, Hayashi K, Yamashiro C, Ohnishi Y, Akiyama M et al. Three-dimensional ultrasonographic evaluation of ovarian tumours: a preliminary study. Hum Reprod 1999;14:858-61.
40. Lashen H, Afnan M, McDougall L, Clark P. Prediction of over-response to ovarian stimulation in an intrauterine insemination program. Hum Reprod 1999;14:2751-2754.
41. Frazier LM, Grainger DA, Schieve LA, Toner JP. Follicle-stimulating hormone and estradiol levels independently predict the success of assisted reproductive technology treatment. Fertil Steril 2004;82:834-40.
42. Volpes A, Sammartano F, Coffaro F, Mistretta V, Scaglione P, Allegra A. Number of good quality embryos on day 3 is predictive for both pregnancy and implantation rates in in vitro fertilization/intracytoplasmic sperm injection cycles. Fertil Steril 2004;82:1330-1336.
43. Ioannidis G, Sacks G, Reddy N, Seyani L, Margara R, Lavery S, Trew G. Day 14 maternal serum progesterone levels predict pregnancy outcome in IVF/ICSI treatment cycles: a prospective study. Hum Reprod Advance Access published Dec 9, 2004.
44. Hazout A, Bouchard P, Seifer DB, Aussage P, Junca AM, Cohen-Bacrie P. Serum antimüllerian hormone/müllerian-inhibiting substance appears to be a more discriminatory marker of assisted reproductive technology outcome than follicle-simulating hormone, inhibin B, or estradiol. Fertil Steril 2004;82:1323-1329.
45. Bungum M, Humaidan P, Spano M, Jepson K, Bungum L, Giwercman A. The predictive value of sperm chromatin structure assay (SCSA) parameters for the outcome of intrauterine insemination, IVF and ICSI. Hum Reprod 2004;19:1401-08.
46. De Neuborg, Gerris J, Mangelschots K, van Royen E, Vercruyssen M, Elseviers M. Single top quality embryo transfer as a model for prediction of early pregnancy outcome. Hum Reprod 2004;19:1476-79.
47. Fasouliotis SJ, Spandorfer SD, Witkin SS, Schattman G, Liu HC, Roberts JE et al. Maternal serum levels of interferon-γ and interleukin-2 soluble receptor-α predict the outcome of early IVF pregnancies. Hum Reprod 2004;19:1357-63.
48. Blazar AS, Hogan JW, Frankfurter D, Hackett R, Keefe D. Serum estradiol positively predicts outcomes in patients undergoing in vitro fertilization. Fertil Steril 2004;81:1707-1709.
23138 Coppus.indd 43 28-09-12 14:55
Chapter 2
44
49. Frattarelli JL, Levi AJ, Miller BT, Segars JH. Prognostic use of mean ovarian volume in in vitro fertilization cycles: a prospective assessment. Fertil Steril 2004;82:811-815.
50. Hauzman E, Fedorcsák P, Klinga K, Papp Z, Rabe T, Strowitzki T et al. Use of serum inhibin A and human chorionic gonadotropin measurements to predict the outcome of in vitro fertilization pregnancies. Fertil Steril 81;66-71.
51. Bancsi LFJMM, Broekmans FJM, Looman CWN, Habbema JDF, te Velde ER. Impact of repeated antral follicle counts on the prediction of poor ovarian response in women undergoing in vitro fertilization. Fertil Steril 2004;81:35-41.
52. Fish KE, Phipps M, Trimarchi J, Weitzen S, Blazar A. CA-125 serum levels and pregnancy outcome in in vitro fertilization. Fertil Steril 2004;82:1705-07.
53. Wiener-Megnazi Z, Vardi L, Lissak A, Shnizer S, Zeev Reznick A, Ishai D et al. Oxidative stress indices in follicular fluid as measured by the thermochemiluminescence assay correlate with outcome parameters in in vitro fertilization. Fertil Steril 2004;82:1171-1176.
54. Virro MR, Larson-Cook KL, Evenson DP. Sperm chromatin structure assay (SCSA®) parameters are related to fertilization, blastocyst development, and ongoing pregnancy in in vitro fertilization and intracytoplasmatic sperm injection cycles. Fertil Steril 2004;81:1289-95.
55. Van Rooij IAJ, de Jong E, Broekmans FJM, Looman CWN, Habbema JDF, te Velde ER. High follicle-stimulating hormone levels should not necessarily lead to the exclusion of subfertile patients from treatment. Fertil Steril 2004;81:1478-1485.
56. Kin KH, Sik OH D, Heaok Jeong J, Sub Shin B, Sun Joo B, Sup Lee K. Follicular blood flow is a better predictor of the outcome of in vitro fertilization-embryo transfer than follicular fluid vascular endothelial growth factor and nitric oxide concentrations. Fertil Steril 2004;82:586-92.
57. Allamaneni SSR, Bandaranayake I, Agarwal A. Use of semen quality scores to predict pregnancy rates in couples undergoing intrauterine insemination with donor sperm. Fertil Steril 2004;82:606-11.
58. Kamiyama S, Teruya Y, Nohara M, Kanazawa K. Impact of detection of bacterial endotoxin in menstrual effluent on the pregnancy rate in in vitro fertilization and embryo transfer. Fertil Steril 2004;82:788-92.
59. Pal L, Kovacs P, Witt B, Jindal S, Santoro N, Barad D. Postthaw blastomere survival is predictive of the success of frozen-thawed embryo transfer cycles. Fertil Steril 2004;82:821-826.
60. Lédéé-Bataille N, Olivennes F, Kadoch J, Dubanchet S, Frydman N, Chaouat G et al. Detectable levels of interleukin-18 in uterine luminal secretions at oocyte retrieval predict failure of the embryo transfer. Hum Reprod 2004;19:1968-73.
61. Fasouliotis SJ, Spandorfer SD, Witkin SS, Liu HC, Roberts JE, Rosenwaks Z. Maternal serum vascular endothelial growth factor levels in early ectopic and intrauterine pregnancies after in vitro fertilization treatment. Fertil Steril 2004;82:309-13.
62. Potlog-Nahari C, Feldman AL, Stratton P, Koziol DE, Segars J, Merino MJ et al. CD10 immunohistochemical staining enhances the histological detection of endometriosis. Fertil Steril 2004;82:86-92.
63. Omodei U, Ferrazzi E, Ramazzotto R, Becorpi A, Grimaldi E, Scarselli G et al. Endometrial evaluation with transvaginal ultrasound during hormone therapy: a prospective multicenter study. Fertil Steril 2004;81:1632-1637.
64. Hubacher D, Grimes D, Lara-Ricalde R, de la Jara J, Garcia-Luna A. The limited clinical usefulness of taking a history in the evaluation of women with tubal factor infertility. Fertil Steril 2004;81:6-10.
23138 Coppus.indd 44 28-09-12 14:55
Quality of reporting of diagnostic studies
45
2
65. De Kroon CD, van der Sandt HAGM, van Houwelingen JC, Jansen FW. Sonographic assessment of non-malignant ovarian cysts: does sonohistology exist? Hum Reprod 2004;19:2138-43.
66. Den Hartog JE, Land JA, Stassen FRM, Slobbe-van Drunen MEP, Kessels AGH, Bruggeman CA. The role of Chlamydia genus-specific and species-specific IgG antibody testing in predicting tubal disease in subfertile women. Hum Reprod 2004;19:1380-84.
67. Somigliana E, Viganò P, Tirelli AS, Felicetta I, Torresani E, Vignali M et al. Use of the concomitant serum dosage of CA 125, CA 19-9 and interleukin-6 to detect the presence of endometriosis. Results from a series of reproductive age women undergoing laparoscopic surgery for benign gynaecological conditions. Hum Reprod 2004;19:1871-76.
68. Siddiqui MA, Azuara-Blanco A, Burr J. The quality of reporting of diagnostic accuracy studies published in ophthalmic journals. Br J Ophthalmol 2005;89:261-5.
69. Deville WL, Bezemer PD, Bouter LM. Publications on diagnostic test evaluation in family medicine journals: an optimal search strategy. J Clin Epidemiol 2000;53:65-69.
70. Bachmann LM, Coray R, Estermann P, Ter Riet G. Identifying diagnostic studies in MEDLINE: reducing the number needed to read. J Am Med Inform Assoc 2002;9:653-58.
71. Whiting P, Rutjes AW, Reitsma JB, Bossuyt PM, Kleijnen J. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol 2003;3:25.
72. Lang TA, Secic M. Generalizing from a sample to a population: reporting estimates and confidence intervals. In: Lang TA, Secic M, eds. How to report statistics in medicine: annotated guidelines for authors, editors, and reviewers. Philadelphia: American College of Physicians, 1997:55-63.
73. Habbema JDF, Eijkemans R, Krijnen P, Knottnerus JA. Analysis of data on the accuracy of diagnostic test. In: Knottnerus JA, ed. The evidence base of clinical diagnosis. London: BMJ Publishing Group, 2002:117-44.
74. Harper R, Reeves B. Reporting of precision of estimates for diagnostic accuracy: a review.BMJ 1999;318:1322-3.
75. Moher D, Jones A, Lepage L; CONSORT Group (Consolidated Standards for Reporting of Trials. Use of the CONSORT statement and quality of reports of randomized trials: a comparative before-and-after evaluation. JAMA 2001;285:1992-5.
76. Lumbreras-Lacarra B, Ramos-Rincon JM, Hernandez-Aguado I. Methodology in diagnostic laboratory test research in clinical chemistry and clinical chemistry and laboratory medicine. Clin Chem 2004;50:530-6.
23138 Coppus.indd 45 28-09-12 14:55
23138 Coppus.indd 46 28-09-12 14:55
3Evaluating prediction models in
reproductive medicine.
Sjors CoppusFulco van der Veen
Brent OpmeerBen Willem MolPatrick Bossuyt
Human Reproduction 2009, 24: 1774-1778
23138 Coppus.indd 47 28-09-12 14:55
Chapter 3
48
Abstract
Prediction models are used in reproductive medicine to calculate the
probability of pregnancy without treatment, as well as the probability of
pregnancy after ovulation induction, intrauterine insemination or in vitro
fertilization. The performance of such prediction models is often evaluated
with a receiver operating characteristic (ROC) curve. The area under the
ROC curve, also known as c-statistic, is then used as a measure of model
performance. The value of this c-statistic is low for most prediction
models in reproductive medicine. Here, we demonstrate that low values
of the c-statistic are to be expected in these prediction models, but we
also show that this does not imply that these models are of limited use
in clinical practice. The calibration of the model (the correspondence
between model-based probabilities and observed pregnancy rates) as well
as the availability of a clinically useful distribution of probabilities and the
ability to correctly identify the appropriate form of management are more
meaningful concepts for model evaluation.
23138 Coppus.indd 48 28-09-12 14:55
Evaluating prediction models
49
3
Introduction
In the evaluation of subfertile couples, clinicians have traditionally focused on
finding the underlying cause of subfertility. Yet, only in a minority of couples, a
causal diagnosis can be made, one which fully explains why conception has not
occurred. Examples of such causal diagnoses are anovulation, azoospermia and
bilateral tubal pathology. In the majority of couples, a relative factor is found, only
partially explaining why conception has not occurred. The latter factors include
advanced maternal age, mild male subfertility, cervical hostility, mild endometriosis
and one-sided tubal pathology. In other couples, no factors impairing fertility can
be found at all, and a diagnosis of unexplained subfertility is made.
A number of treatments options are available once a causal diagnosis has been
made. In anovulatory women, infertility can be corrected by ovulation induction.
Women with bilateral tubal disease can be treated with tubal surgery or offered
in vitro fertilisation (IVF). In men with azoospermia or oligoasthenozoospermia,
the inability to conceive can be overcome by surgical sperm retrieval and assisted
fertilization. Treatment of these couples is indicated, as their probability of natural
conception is virtually zero. In contrast to couples with a causal diagnosis, couples
with a relative factor as well as those with unexplained subfertility vary in their
prognosis. Whereas some have relatively good chances of achieving a pregnancy
without treatment, these chances are rather low in others (Collins et al., 1995). As
targeted treatments are not available for these conditions, empiric interventions
such as intrauterine insemination (IUI) and IVF are usually offered. IUI and IVF are
expensive and not without potential side effects, and these treatment options
should therefore be offered to couples only if their probability of conceiving
with treatment is sufficiently higher than their probability of conceiving without
treatment.
Clinical decision making in these couples should therefore be guided by these
probabilities (Habbema et al., 2004). A number of validated prediction models
are available to help the clinician in calculating the probability of pregnancy in
subfertile couples, with and without treatment (see Leushuis et al., 2009, for a
critical appraisal). There are models to calculate: the probability of a treatment
independent pregnancy (Eimers et al., 1994; Collins et al., 1995; Snick et al., 1997;
Hunault et al., 2004; Hunault et al., 2005; van der Steeg et al., 2007), the probability
23138 Coppus.indd 49 28-09-12 14:55
Chapter 3
50
of a pregnancy after IUI (Steures et al., 2004; Custers et al., 2007) and the probability
of a pregnancy after IVF (Templeton et al., 1996; Stolwijk et al., 1996; Smeenk et al.,
2000; Lintsen et al., 2007).
For most of these models, the ability to predict pregnancy has been evaluated
by receiver operating characteristic (ROC) curves. These curves are obtained by
comparing the proportion of couples exceeding a pre-set probability threshold
within those who achieve a pregnancy with the proportion also exceeding that
threshold in those who do not achieve a pregnancy in a specified time frame. This
is done for all possible probability thresholds. The area under the resulting ROC
curve (AUC), also known as c-statistic, expresses the extent to which a model can
identify couples who will become pregnant and is used as a measure of model
performance.
In general, the c-statistic for prediction models in reproductive medicine is rather
low, ranging between 0.59 and 0.64 for models for treatment independent
pregnancy, between 0.56 and 0.59 for IUI, and between 0.50 and 0.67 for IVF. A
graph of such a typical ROC curve is shown in Fig. 1, adapted from a paper on
the prediction of pregnancy after IUI (Custers et al., 2007). According to criteria
for evaluating areas under ROC curves, these values indicate low-to-modest
discriminatory performance (Swets, 1988). It is questionable whether the
c-statistic is the best way to express the predictive performance of a model and,
consequently, if a limited discriminatory capacity precludes the use of the model
in clinical practice. In this paper, we illustrate that the area under the ROC curve or
c-statistic has limited value in the evaluation of prediction models in reproductive
medicine. We argue that calibration and prognostic classification are more relevant
concepts for such models to guide clinical decision making.
c-Statistic in diagnostic research
In principle, the aim of diagnostic testing is to distinguish diseased from non-
diseased patients. The diagnostic accuracy of a test is studied by comparing the
result of the test under evaluation with the result of a reference standard in a series
of patients, the latter being the best available method for classifying a patient as
diseased or not. The results of the cross-classification can then be expressed as
23138 Coppus.indd 50 28-09-12 14:55
Evaluating prediction models
51
3
the sensitivity and specificity of the test under evaluation. The sensitivity, or true-
positive fraction, reflects the proportion of diseased patients with a positive test
result, whereas the specificity, or one minus the false-positive fraction, reflects the
proportion of patients without the disease with a negative test result.
1,0
0,8
0,6
0,4
0,2
0,00,0 0,2 0,4 0,6 0,8 1,0
Sens
itiv
ity
1-specificity
FIGURE 1. Typical ROC curve of a prediction model in reproductive medicine. This ROC curve derived from a paper in which a prediction model for pregnancy after IUI was externally validated, demonstrating an AUC of 0.56 (Custers et al., 2007). Sensitivity is defined here as the percentage of cycles not resulting in an ongoing pregnancy that was predicted correctly, and specificity was defined as the percentage of cycles that resulted in an ongoing pregnancy that was predicted correctly. Reprinted from Custers et al., (2007). Copyright 2007, with permission from Elsevier.
In case the studied test is of a continuous nature, the sensitivity and specificity of
the test depend on the cut-off value to define positive and negative test results.
Sensitivities and specificities can be calculated over the whole range of possible
cut-off values, where higher sensitivities are obtained at lower specificities and
visa versa. The resulting series of sensitivity-specificity pairs is plotted as an ROC
curve, showing the sensitivity versus the specificity for each possible cut-off value.
The AUC indicates how well the test discriminates between diseased and non-
diseased patients. An AUC of 1 implies perfect discrimination; whereas an AUC
of 0.5 means that the test does not discriminate at all (Hanley and McNeil, 1982).
Assuming an error-free reference standard, every diagnostic test has the potential
of being 100% discriminative.
23138 Coppus.indd 51 28-09-12 14:55
Chapter 3
52
c-Statistic in prognostic research
Prediction models in reproductive medicine evaluate whether a combination of
factors can predict the occurrence of pregnancy within a specified time period.
In contrast to the situation in diagnostic testing, in which the disease is present at
the moment of testing, pregnancy has not yet occurred at the time the potentially
predictive factors are measured. Whether or not pregnancy occurs can be
considered the outcome of a stochastic process occurring over time. This implies
that the value of a test cannot be expressed by the proportion of patients who
are pregnant at the time of testing, as this is essentially zero. The association has
to be expressed in terms of the probability that pregnancy will occur in the future.
This probability can be calculated for a single test or for a multivariable prediction
model; the latter most often developed using Cox proportional hazard modelling.
Such prediction models are often evaluated similar to diagnostic tests, using
ROC curves and their AUC or c-statistic. In doing so, the calculated probability of
pregnancy within a specified time period is regarded as the continuous test result,
and the reference standard is the actual occurrence of pregnancy within that
specified period of follow-up. Using different cut-off values for these probabilities
yields a series of sensitivity and specificity pairs, through which an ROC curve can
be drawn. For prediction models in reproductive medicine, the AUC or c-statistic
reflects how well the model is able to distinguish those with a future pregnancy
from those that will not achieve a pregnancy within the time frame specified, not
the degree to which couples with a higher probability will conceive first.
Although this may seem quite logical at prima facie, the crux of the matter is that
perfect discrimination is achievable only if the population of subfertile couples
consists of two distinct subpopulations: fertile couples that have not yet conceived
but will do so in the near future and infertile couples guaranteed not to conceive. It
is very unlikely that this is a fair representation of couples after an initial subfertility
workup. Subfertility is not dichotomous in nature but a complex multifactorial
phenomenon, corresponding to a gradual continuum of impaired reproductive
capacity. Couples with very good pregnancy chances will rarely enter a subfertility
workup, as they are the ones that usually have conceived spontaneously within
12 months after the start of unprotected intercourse. Most couples with a causal
diagnosis, the ones with poorest fertility prospects, are identified in the first part of
the subfertility workup and treated accordingly. With the two extremes eliminated,
23138 Coppus.indd 52 28-09-12 14:55
Evaluating prediction models
53
3
the remaining group of subfertile couples predominantly has an intermediate
prognosis.
When perfect discrimination is not achievable, the maximum value of the c-statistic
depends on the underlying distribution of probabilities. The lower the variability
of the probabilities, the lower the maximum value of the c-statistic will be (Gail
and Pfeiffer, 2005; Cook, 2007). Among subfertile couples, defined as couples that
have not conceived despite a year of unprotected intercourse, the probability
to conceive spontaneously within the next 12 months is more or less normally
distributed in couples with unexplained subfertility, with a mean probability of
30% (van der Steeg et al., 2007). As the distribution of probabilities in couples that
successfully conceive has considerable overlap with that of couples that are not
successful, the maximum c-statistic that can be reached with any model is in the
order of 0.62 (Gail and Pfeiffer, 2005; Cook 2007).
Calibration
So if pregnancy itself cannot be predicted with 100% accuracy, what then? We
should realize that the modern approach in subfertility treatment is to offer tailored
management to individual couples, in which treatment is offered only to couples
who have poor chances of a spontaneous pregnancy. Ideally, early treatment in
couples with high chances of natural conception should be avoided, as should
the delay of treatment in couples with poor prospects of pregnancy. Such an
approach could reduce costs and may prevent multiple pregnancies as well as
other complications of assisted reproductive technologies (Steures et al., 2006). For
this tailored treatment, a classification of couples is required. This classification will
not be in terms of those that will become pregnant versus those that will not, as
this is almost impossible. The relevant classification is one in terms of those for
whom the probability of becoming pregnant with treatment, be it IUI or IVF, is
sufficiently higher than the probability of becoming pregnant without treatment,
versus those for which this is not the case.
Prediction models in reproductive medicine should be judged on this capability.
One necessary condition is that we should be able to trust calculated probabilities.
The reliability of these model-based probabilities can be evaluated by inspecting
23138 Coppus.indd 53 28-09-12 14:55
Chapter 3
54
the calibration of the model: the degree to which calculated probabilities agree
with actual observed event rates. The level of calibration of a prediction model
can be explored by assigning couples to subgroups, based on the calculated
probabilities, and then, after follow-up, comparing the average calculated
probability in each subgroup with the proportion of couples that became
pregnant. One way of assigning couples to subgroups is using deciles of calculated
probabilities. A comparison of the mean probability in each subgroup versus the
observed proportion can be plotted in a calibration plot, as in Fig. 2. In case of
perfect calibration, all points in the calibration plot will be located on the diagonal,
the line of equality. In the left plot, probabilities are well calibrated, whereas the
middle and right plots in Fig. 2 show calculated probabilities that suffer from
underestimation and overestimation, respectively.
1,0
0,9
0,8
0,7
0,6
0,5
0,4
0,3
0,2
0,1
0,00,0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1,0
1,0
0,9
0,8
0,7
0,6
0,5
0,4
0,3
0,2
0,1
0,00,0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1,0
1,0
0,9
0,8
0,7
0,6
0,5
0,4
0,3
0,2
0,1
0,00,0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1,0
1,0
0,9
0,8
0,7
0,6
0,5
0,4
0,3
0,2
0,1
0,00,0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1,0
1,0
0,9
0,8
0,7
0,6
0,5
0,4
0,3
0,2
0,1
0,00,0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1,0
1,0
0,9
0,8
0,7
0,6
0,5
0,4
0,3
0,2
0,1
0,00,0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1,0
23138 Coppus.indd 54 28-09-12 14:55
Evaluating prediction models
55
3
1,0
0,9
0,8
0,7
0,6
0,5
0,4
0,3
0,2
0,1
0,00,0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1,0
1,0
0,9
0,8
0,7
0,6
0,5
0,4
0,3
0,2
0,1
0,00,0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1,0
1,0
0,9
0,8
0,7
0,6
0,5
0,4
0,3
0,2
0,1
0,00,0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1,0
FIGURE 2. Calibration plots with calculated probability on the X-axis and observed proportion on the Y-axis. The left plot shows perfect calibration. The middle plot demonstrates a model that tends to give underestimated probabilities, whereas the plot on the right shows systematic overestimation.
A combination of perfect calibration and perfect discrimination can only be
achieved with perfect prediction: a probability of 100% for all those with a future
event, and a probability of 0% for all others. Such an extreme bimodal distribution
is not very realistic in most applications, including subfertility (Ware, 2006; Cook,
2007). In fact, there is a trade-off between calibration and discrimination. A
simulation model showed that, assuming a realistic range of probabilities, the
c-statistic for a perfectly calibrated model would be 0.83 at best (Diamond, 1992).
Overlapping probability distributions in those with and those without the future
event will usually prevent that maximum from being reached (Gail and Pfeiffer,
2005).
Prognostic classification
Although good calibration is very important, a well-calibrated model is not
necessarily useful in subfertility care. Probabilities also need to have sufficient
variability over a clinically useful range, as the essence of prediction models is
their ability to correctly classify individuals into clinically useful risk strata (Cook
et al., 2006; Cook, 2007; Ridker and Cook, 2007). Imagine a perfectly calibrated
model that assigns a probability of achieving a treatment independent pregnancy
23138 Coppus.indd 55 28-09-12 14:55
Chapter 3
56
of 20%-30% to all subfertile couples, and a second perfectly calibrated model
that assigns a probability of IVF success of 20%-30% in the same couples. That
would probably lead to a recommendation of expectant management in all,
which is likely to be erroneous, and the models would not allow us to identify
couples that are better off with IVF, separating them from couples that are better
off with expectant management. For models to be useful, they need to not only
be well calibrated, but also produce a clinically useful distribution in probabilities
in the target population. If we believe that some couples are better off with IVF,
both the spread of probabilities with expectant management and the spread of
probabilities with IVF should be sufficiently wide and distributions should be well
apart. Only then couples can be subdivided into prognostic categories to decide
on the appropriate treatment policy. Various graphs and expressions of the range
in calculated probabilities have recently been suggested in the literature. One
example is the predictiveness curve, suggested by Pepe et al. (2007), which plots
the cumulative distribution of the calculated probabilities in a group, and as such
is a useful visual aid in expressing the relative variability in calculated probabilities.
Research is emerging, and a debate has started, on the best ways to evaluate
superiority of models for prediction (Pepe et al., 2007; Janes et al., 2008; Pencina et
al., 2008). In most of these discussions, a change in the c-statistic is not regarded
as the best statistic to measure an improvement in model performance (Cook,
2007), as a substantial improvement in the c-statistic can only be achieved by
unrealistically large associations between the predictor variables and outcome
(Ware, 2006). Whereas odds ratios for diagnostic tests exceeding 30 or more are
of no exception, odds ratios for prognostic tests rarely exceed 2 (Glas et al., 2003).
Most importantly, a comparison of areas under the curve between prognostic
models does not show us whether individual couples have a different prognosis in
one model when compared with a second model. Regarding the latter, some have
suggested examining how many couples are reclassified, i.e. reassigned from the
group of those for whom expectant management is the better option to those for
whom, say, IUI or even IVF is the better alternative. Ultimately, however, the value
of prediction models should be based on outcome: to what extent does the use of
models improve patient outcome, i.e. allow a couple to have a healthy pregnancy,
at a reasonable balance with patient burden, morbidity and costs.
23138 Coppus.indd 56 28-09-12 14:55
Evaluating prediction models
57
3
Conclusion
In reproductive medicine, prognostic models that perfectly predict pregnancy
in subfertile couples, after natural conception or after assisted reproductive
technologies, do not exist and most likely will never exist. Yet, properly calibrated
models, with sufficient variability in calculated probabilities, can be used to support
clinical decision making when the chances of a pregnancy with treatment have to
be weighed against the chances of a pregnancy without treatment. The commonly
used c-statistic or area under the ROC curve expresses only discrimination and is,
as such, not a good measure of the extent to which predictive markers and models
can guide decision making. To assess the clinical value of prediction models,
calibration, variability in probabilities and, subsequently, the degree to which
reclassification of these probabilities yield clinically relevant prognostic strata are
more relevant criteria.
23138 Coppus.indd 57 28-09-12 14:55
Chapter 3
58
References
Collins JA, Burrows EA, Wilan AR. The prognosis for live birth among untreated infertile couples. Fertil Steril 1995; 64:22-28.
Cook NR, Buring JE, Ridker PM. The effect of including c-reactive protein in cardiovascular risk prediction models for women. Ann Intern Med 2006; 145:21-29.
Cook NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation 2007; 115:928-935.
Custers IM, Steures P, van der Steeg JW, van Dessel TJ, Bernardus RE, Bourdrez P, Koks CA, Riedijk WJ, Burggraaff JM, van der Veen F, Mol BW. External validation of a prediction model for an ongoing pregnancy after intrauterine insemination. Fertil Steril 2007; 88:425-431.
Diamond GA. What price perfection? Calibration and discrimination of clinical prediction models. J Clin Epidemiol 1992; 45:85-89.
Eimers JM, te Velde ER, Gerritse R, Vogelzang ET, Looman CW, Habbema JD. The prediction of the chance to conceive in subfertile couples. Fertil Steril 1994; 61:44-52.
Gail MH, Pfeiffer RM. On criteria for evaluating models of absolute risk. Biostatistics 2005; 6:227-239.
Glas AS, Lijmer JG, Prins MH, Bonsel GJ, Bossuyt PM. The diagnostic odds ratio: a single indicator of test performance. J Clin Epidemiol 2003; 56:1129-1135.
Habbema JD, Collins J, Leridon H, Evers JL, Lunenfeld B, te Velde ER. Towards less confusing terminology in reproductive medicine: a proposal. Hum Reprod 2004; 19:1497-1501.
Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143:29-36.
Hunault CC, Habbema JD, Eijkemans MJ, Collins JA, Evers JL, te Velde ER. Two new prediction rules for spontaneous pregnany leading to live birth among subfertile couples, based on the synthesis of three previous models. Hum Reprod 2004; 19:2019-2026.
Hunault CC, Laven JS, van Rooij IA, Eijkemans MJ, te Velde ER, Habbema JD. Prospective validation of two models predicting pregnancy leading to live birth amont untreated subfertile couples. Hum Reprod 2005; 20:1636-1641.
Janes H, Pepe MS, Gu W. Assessing the value of risk predictions by using risk stratification tables. Ann Intern Med 2008; 149:75-760.
Leushuis E, van der Steeg JW, Steures P, Bossuyt PM, Eijkemans MJ, van der Veen F, Mol BW, Hompes PG. Prediction models in reproductive medicine: a critical appraisal. Hum Reprod Update 2009; 15:537-552.
Lintsen AM, Eijkemans MJ, Hunault CC, Bouwmans CA, Hakkaart L, Habbema JD, Braat DD. Predicting ongoing pregnancy chances after IVF and ICSI: a national prospective study. Hum Reprod 2007; 22:2455-2462.
Pencina MJ, D’Agostino RB Sr, D’Agostino RB Jr, Vasan RS. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med 2008; 27:157-172.
Pepe MS, Feng Z, Huang Y, Longton G, Prentice R, Thompson IM, Zheng Y. Integrating the predictiveness of a marker with its performance as a classifier. Am J Epidemiol 2007; 167:362-368.
23138 Coppus.indd 58 28-09-12 14:55
Evaluating prediction models
59
3
Ridker PM, Cook NR. Biomarkers for prediction of cardiovascular events. N Engl J Med 2007: 356:1474-1475.
Smeenk JM, Stolwijk AM, Kremer JA, Braat DD. External validation of the templeton model for predicting success after IVF. Hum Reprod 2000; 15:1065-1068.
Snick HK, Snick TS, Evers JL, Collins JA. The spontaneous pregnancy prognosis in untreated subfertile couples: the Walcheren primary care study. Hum Reprod 1997; 12:1582-1588.
Steures P, van der Steeg JW, Mol BWJ, Eijkemans MJ, van der Veen F, Habbema JD, Hompes PG, Bossuyt PM, Verhoeve HR, van Kasteren, van Dop PA. Prediction of an ongoing pregnancy after intrauterine insemination. Fertil Steril 2004; 82:45-51.
Steures P, van der Steeg JW, Hompes PG, Habbema JD, Eijkemans MJ, Broekmans FJ, Verhoeve HR, Bossuyt PM, van der Veen F, Mol BW. Intrauterine insemination with controlled ovarian hyperstimulation versus expectant management for couples with unexplained subfertility and an intermediate prognosis: a randomised clinical trial. Lancet 2006; 368:216-221.
Stolwijk AM, Zielhuis GA, Hamilton CJ, Straatman H, Hollanders JM, Goverde HJ, van Dop PA, Verbeek AL. Prognostic models for the probability of achieving an ongoing pregnancy after in-vitro fertilization and the importance of testing their predictive value. Hum Reprod 1996; 11:2298-2303.
Swets J. Measuring the accuracy of diagnostic systems. Science 1998; 240:1285–1293.
Templeton A, Morris JK, Parslow W. Factors that affect outcome of in-vitro fertilisation treatment. Lancet 1996; 348:1402-1406.
Van der Steeg JW, Steures P, Eijkemans MJ, Habbema JD, Hompes PG, Broekmans FJ, van Dessel HJ, Bossuyt PM, van der Veen F, Mol BW. Pregnancy is predictable: a large-scale prospective external validation of the prediction of spontaneous pregnancy in subfertile couples. Hum Reprod 2007; 22:536-542.
Ware JH. The limitations of risk factors as prognostic tools. N Engl J Med 2006; 355: 2615-2617.
23138 Coppus.indd 59 28-09-12 14:55
23138 Coppus.indd 60 28-09-12 14:55
Part 2Clinical studies
23138 Coppus.indd 61 28-09-12 14:55
23138 Coppus.indd 62 28-09-12 14:55
4The predictive value of medical
history taking and Chlamydia IgG
ELISA antibody testing (CAT) in the
selection of subfertile women for
diagnostic laparoscopy: a clinical
prediction model approach.
Sjors CoppusBrent Opmeer
Susan LoganFulco van der Veen
Siladitya BhattacharyaBen Willem Mol
Human Reproduction 2007, 22: 1353-1358
The content of this article was presented as an oral presentation at the 22nd ESHRE Annual Meeting, 2006, Prague, Czech Republic.
23138 Coppus.indd 63 28-09-12 14:55
Chapter 4
64
Abstract
BACKGROUND: Medical history taking as well as Chlamydia antibody titre
(CAT) testing is currently used in the selection of patients for diagnostic
laparoscopy with tubal patency testing. Most research has focused on
the predictive value of CAT in isolation from medical history. We assessed
therefore whether the combination of medical history and CAT testing
improves the efficiency of selecting patients for laparoscopy as compared
to the use of either medical history or CAT.
METHODS: Data of 207 consecutive subfertile women were used to create
multivariable logistic regression models for the prediction of tubal disease
as diagnosed by diagnostic laparoscopy.
RESULTS: The model with data of medical history only had an area under
the receiver operating characteristic curve (AUC) of 0.65 (95% CI 0.56-0.74).
Addition of CAT increased the AUC to 0.70 (95% CI 0.62-0.78) (P=0.065). CAT
was positive in 40 women and showed a sensitivity of 0.37 (95% CI 0.26-
0.49) for a specificity of 0.88 (95% CI 0.82-0.93). In CAT positive women a
blank medical history did not decrease the probability of tubal disease. Of
the 167 women tested CAT negative, 23 (14%) still had a high probability
of disease due to their medical history and 11 of them (48%) showed tubal
abnormalities on diagnostic laparoscopy.
CONCLUSIONS: CAT testing adds valuable information to a woman’s risk
profile based on her medical history. The combination of medical history
taking and CAT testing has a better yield for diagnosing tubal disease than
either of these alone.
23138 Coppus.indd 64 28-09-12 14:55
Medical history taking and CAT testing
65
4
Introduction
Tubal pathology affects approximately 15-30% of subfertile women (Evers, 2002).
Currently, laparoscopy is considered to be the best available test for diagnosing
tubal abnormalities. It allows direct visualization of the pelvis and, in addition to
tubal patency testing, offers the opportunity to detect pelvic adhesions. However,
laparoscopy does have several drawbacks. First, it is an invasive surgical procedure
that requires general anaesthesia, both of which carry associated risks. Secondly, as
an expensive investigation that requires operating time and dedicated personnel,
its availability is not unlimited. Therefore, triage is needed to limit the number of
unnecessary laparoscopies, while maintaining a high diagnostic yield.
Since it was reported in 1979 that women with positive Chlamydia IgG antibodies
are more likely to have tubal pathology than those without (Punnonen et al., 1979),
numerous studies have reported on the value of Chlamydia antibody titer (CAT)
testing to predict tubal pathology. Based on these reports, the Dutch Society for
Obstetrics and Gynaecology (NVOG) recommends the use of CAT as a first-line
test in the basic work-up of subfertile couples, with a fixed cut-off level (IgG MIF
>1:32 or ELISA >1.1) above which post-infectious pelvic disease should be ruled
out with laparoscopy and chromotubation (Swart et al., 1995; NVOG, 2004). The
pre-test probability of disease of a woman, based on specific risk factors or other
characteristics from the woman’s medical history is not formally taken into account.
In contrast to the Dutch guideline, the guideline from the National Institute for
Clinical Excellence (NICE) in the United Kingdom does not recommend CAT testing
in the workup of subfertile women, but advises the use of past medical history in
the woman to decide whether diagnostic laparoscopy should be performed or
not (NICE, 2004).
Most diagnostic test research is interpreted in isolation from its clinical context.
However, this does not reflect daily clinical practice, in which the diagnostic
investigation is a consecutive process always starting with history taking and
physical examination, followed by laboratory tests and more invasive and costly
tests such as imaging (Moons et al., 2004). With multivariable analysis one can
reveal the “add-on” value of diagnostic tests (Moons and Grobbee, 2002), in this
case the additional value of CAT testing.
23138 Coppus.indd 65 28-09-12 14:55
Chapter 4
66
The present study therefore aimed to explore the diagnostic consequences
of medical history taking by itself as recommended by the NICE guideline, CAT
testing, as recommended by the Dutch Society for Obstetrics and Gynaecology,
and the combined interpretation of the results from medical history and CAT
testing, which is intuitively used in daily practice by many clinicians.
Materials and Methods
PatientsWe used data from a previously reported study, in which 207 consecutive women
referred for evaluation of subfertility underwent a diagnostic laparoscopy.
Characteristics of included women and the materials and methods used in this
study have been published previously (Logan et al., 2003). In brief: timing of tubal
assessment was made on clinical grounds and was independent of the result of the
CAT test. All couples had undergone assessment of ovulation and a semen analysis.
Women who were physically unsuitable for laparoscopy or who had undergone
a tubal patency test in the past (either hysterosalpingography or laparoscopy), as
well as women with previous tubal surgery (including ectopic pregnancy) were
excluded from the analysis. Before surgery, all women completed a standard
questionnaire with regards to their demographic data and medical history. For
CAT testing, a Chlamydia trachomatis-specific IgG antibody peptide-based ELISA
(pELISA, Medac, Germany) was used. A positive IgG titer (>1.1) was indicative
of a past infection. If the result was equivocal, an infection with C. trachomatis
in the past could not be excluded and therefore positive and equivocal results
were pooled. Tubal pathology (uni-or bilateral) was defined as the presence
of adhesions involving the Fallopian tube and ovary, clubbing of the tube,
hydrosalpinx or obstruction to the flow of dye in the absence of endometriosis,
which was categorized as a separate entity according to the Revised American
Fertility classification (American Fertility Society, 1985).
Data analysisWe developed multivariable logistic regression models to predict tubal pathology
using the following data from the medical history: female age, number of previous
term deliveries, previous induced abortion, type of subfertility (i.e. primary
or secondary), duration of subfertility, history of pelvic inflammatory disease
(PID), history of sexually transmitted disease, previous use of an intra-uterine
23138 Coppus.indd 66 28-09-12 14:55
Medical history taking and CAT testing
67
4
contraceptive device and prior non-gynaecological pelvic surgery. We evaluated
the prognostic value of CAT both as a dichotomous and continuous variable. First,
we checked the assumption of linearity between the continuous variables age,
duration of subfertility, number of previous term deliveries and titer of CAT on
the one hand and tubal pathology on the other hand visualized with smoothed
piecewise polynomials (splines), and formally tested with a generalized additive
model. As we found no significant deviations from linearity in the relationship
with the outcome, it was not necessary to transform these continuous variables.
Subclasses within categorical variables were dichotomized. For dichotomous and
continuous variables, univariable beta coefficients (β), odds ratios (ORs) and 95%
confidence intervals (95% CI) as well as P-values were calculated. In the univariable
analysis, the P-value for statistical significance was set at < 0.05 in all analyses.
Subsequently, multivariable logistic regression analysis with a stepwise backwards
selection procedure was used to construct the prediction models. Since the
erroneous exclusion of a variable is more deleterious for a model than including
too many factors (Mol et al., 2000; Steyerberg et al., 2000), we used a significance
level of 30% for entry in the multivariable model and a significance level of 20% to
stay in the model.
To adjust for overfitting, we performed internal validation with bootstrapping.
Bootstrapping is a technique to evaluate the robustness of the model by
simulating sampling variation in repeated samples from the study population.
Cases are selected randomly from the original sample to create a new sample
of the same size. As cases are drawn with replacement, some cases may be
randomly selected multiple times and others not at all (Steyerberg et al., 2001).
We generated 500 bootstrap data sets in which the same multivariable logistic
regression model was estimated. By analyzing the difference between the
models based on the original dataset and the bootstrap estimates, a shrinkage
factor was calculated and used to correct the original model coefficients and
associated probabilities for overfit.
We developed two models. The first model (“clinical history” model) was based on
items from medical history only. In the second model (“clinical history and CAT”
model) previous infection with C. trachomatis as determined with CAT ELISA was
combined with medical history. This second model was developed to explore
what the result of CAT adds to the information already obtained from medical
23138 Coppus.indd 67 28-09-12 14:55
Chapter 4
68
history taking. To make comparisons, as a third “model”, CAT was dichotomized into
positive or negative test results, as recommended by the Dutch guideline.
The discriminative capacity of the two constructed models was evaluated with
receiver operating characteristic (ROC) curve analysis, in which the area under the
curve (AUC) was calculated. Accuracy of CAT was plotted into the ROC curve as a
comparison. Sensitivity was defined as the number of women with tubal pathology
that was predicted correctly, and specificity was defined as the number of women
without tubal pathology predicted correctly. Calibration of the model, which is the
agreement between predicted and observed probabilities, was evaluated with the
goodness-of-fit Hosmer and Lemeshow test statistic and by visual inspection of
the calibration plots. Finally, we explored the diagnostic consequences of different
strategies. Data were analyzed using SPSS 12.0.1 (SPSS Inc., Chicago, IL, USA) and
S-PLUS 2000 (MathSoft Inc.)
Results
The prevalence of tubal pathology was 30.4% (63/207). Demographic data of
included women have been published elsewhere (Logan et al., 2003). In the
univariable analysis, a higher number of previous term deliveries, a history of PID,
and a positive CAT were statistically significant predictors of tubal pathology. The
other variables were not statistically significant associated with tubal subfertility
(Table I). Four predictive variables, i.e. the number of previous term deliveries,
duration of subfertility, a history of PID and previous non-gynaecological pelvic
surgery were found to be associated with tubal pathology at a P-level of <0.20 in
the “clinical history” model (Table I).
The backward logistic regression selection procedure excluded duration of
subfertility due to a P-value of 0.38 in the “clinical history and CAT” model. However,
as one of the aims of the present study was to compare medical history taking
with an integrated approach of history taking and CAT testing, exclusion of this
variable in the second model would have prohibited this exploration. Analysis
of the regression coefficients and 95% CI for the OR of this variable showed that
duration of subfertility only had a minor contribution towards the estimated
probabilities. To allow comparison, we therefore decided to force duration of
23138 Coppus.indd 68 28-09-12 14:55
Medical history taking and CAT testing
69
4
subfertility into the second model (Table I). The possible additional value of use
of CAT as a continuous variable in the prediction model was explored, but did not
improve the discriminatory capacity (data not shown). As a result, CAT was used as
dichotomous variable in the “clinical history and CAT” model.
TABLE I. Results of the uni- and multivariable analysis
Univariable analysis Multivariable analysisclinical history model ¶
Multivariable analysis clinical history and CAT
model ¶¶
Variable OR 95% CI P OR 95% CI P OR 95% CI P
Age (per year) 1.03 0.96-1.09 0.44
Number of deliveries 2.05 1.23-3.43 0.006 2.14 1.27-3.63 0.004 2.05 1.17-3.58 0.01
Induced abortion 1.59 0.77-3.29 0.21
Secondary infertility 1.59 0.87-2.88 0.13
Duration of subfertility 1.02 1.00-1.03 0.08 1.01 1.00-1.03 0.11 1.01 1.00-1.03 0.38
History of PID 3.48 1.06-11.4 0.04 3.28 0.95-11.3 0.06 3.27 0.88-12.2 0.08
History of STD 1.15 0.33-3.98 0.82
IUCD use in history 2.40 0.67-8.59 0.18
Pelvic surgery 2.93 0.86-9.97 0.09 3.18 0.89-11.4 0.08 4.49 1.10-14.7 0.03
CAT positivity 4.30 2.09-8.83 <0.001 4.07 1.88-8.81 <0.001
OR = odds ratio ¶ Backward selection procedureCI = confidence interval ¶¶ Backward selection procedure with forced entry of duration of subfertilityP = p-value
Internal validation with bootstrapping showed a 17.5% overfit for the clinical
history model, and a 14% overfit for the “clinical history and CAT” model. Therefore
the regression coefficients of the model variables were corrected with these
shrinkage factors.
The ROC curves of the two models are shown in Fig. 1. The “clinical history” model
showed an AUC of 0.65 (95% CI 0.56-0.74). “The clinical history and CAT model”
increased the AUC to 0.70 (95% CI 0.62-0.78) (P=0.065) (De Long et al., 1988). As
a comparison, the performance of CAT alone –sensitivity 37% (95% CI 26-49),
specificity 88% (95% CI 82-93) - is plotted in the ROC space.
Calibration of the models was acceptable, as confirmed with the Hosmer and
Lemeshow goodness-of-fit test statistic. This test statistic had a P-value of 0.30 for
the “clinical history” model and a P-value of 0.35 for the “clinical history and CAT”
model. Tables IIa and IIb show the agreement between different categories of
predicted and observed probabilities in the two prediction models. Both models
show an acceptable agreement between the mean predicted probabilities and
23138 Coppus.indd 69 28-09-12 14:55
Chapter 4
70
the observed probability of tubal pathology. However, above a 30% predicted
probability, both models underestimate the chance of disease. In Tables IIa and IIb
the categories of predicted probability of the “clinical history” and “clinical history
and CAT” model are also cross-tabulated against the results of laparoscopy and
CAT. Using the “clinical history model”, 28 women that tested CAT positive, are
classified as having a predicted probability of tubal pathology <30%. However,
12 of these 28 women (43%) still showed abnormalities at laparoscopy (Table IIa).
Using the “clinical history and CAT model” the same 28 women that would have
been classified as low risk patients based on clinical history only, shift to a high
risk classification (i.e. a predicted probability of tubal disease >30%) (Table IIb). On
the other hand, Table IIb also shows that among the 167 women who tested CAT
negative, 23 (14%) still had a predicted probability of >30% of tubal disease due to
their medical history. In this selected group of CAT negative women 48% (11/23)
still showed tubal abnormalities on diagnostic laparoscopy.
1,0
0,8
0,6
0,4
0,2
0,00,0 0,2 0,4 0,6 0,8 1,0
Sens
itivi
ty
1-Specificity
FIGURE I. Receiver operating characteristic curves of the multivariable logistic regression models for the prediction of tubal pathology in subfertile patients.- - - - - = CAT-–-–-–- = clinical history model____ = clinical history + CAT model
23138 Coppus.indd 70 28-09-12 14:55
Medical history taking and CAT testing
71
4
TABLE IIa. The classification of women according to their predicted probabilities, cross-tabulated against the results of laparoscopy and univariable CAT testing.
CAT positive CAT negative
Predicted probability*
Diseased Non-diseased
Diseased Non-diseased
Total Mean predicted probability
Observed probability
10-20% 8 7 15 76 106 18% 21%
20-30% 4 9 12 34 59 25% 27%
30-40% 2 1 6 11 20 35% 40%
>40% 9 0 7 6 22 50% 73%
Total 23 17 40 127 207
40 167 207
* Probability calculated from the “clinical history” model
TABLE IIb. The classification of women according to their predicted probabilities, cross-tabulated against the results of laparoscopy and univariable CAT testing.
CAT positive CAT negative
Predicted probability**
Diseased Non-diseased
Diseased Non-diseased
Total Mean predicted probability
Observed probability
10-20% 0 0 22 91 113 15% 19%
20-30% 0 0 7 24 31 24% 23%
30-40% 9 11 7 9 36 37% 44%
>40% 14 6 4 3 27 57% 67%
Total 23 17 40 127 207
40 167 207
** Probability calculated from the “clinical history and CAT” model
To show the consequences of medical history taking by itself, CAT testing alone
by itself, and a combination of medical history taking and CAT testing, diagnostic
and clinical parameters are listed in Table III. For comparison, the consequences of
performing diagnostic laparoscopy in all women without any selection strategy are
shown in the first column. The number of laparoscopies that has to be performed
to detect one woman with tubal pathology is comparable when using history,
CAT or history and CAT and much lower than without any workup. However, the
detection rate of tubal pathology (true positive rate) is highest when using the
combined model (54% versus 38 and 37% respectively).
23138 Coppus.indd 71 28-09-12 14:55
Chapter 4
72
TABLE III.
Diagnostic parameter All LS* probability >30% by history LS
positive CAT LS
probability >30% by history and CAT LS
Number of laparoscopies 207 42 40 63
Tubal disease diagnosed 63 24 23 34
Tubal diseased missed 0 39 40 29
Unnecessary laparoscopies 144 18 17 26
NND** 3.3 1.8 1.7 1.9
Sensitivity NA 38% 37% 54%
Specificity NA 88% 88% 80%
* Laparoscopy** Number needed to diagnose: number of laparoscopies needed to detect one abnormality.
Discussion
In this study we evaluated the efficiency of medical history taking, CAT testing,
and a combination of both in selecting women for laparoscopy to detect tubal
pathology. We found that the discriminative capacity of the “medical history” model
did not differ from that of CAT testing alone. However, combined interpretation of
both history and CAT identified the women at highest risk for tubal disease, and
increased sensitivity for a similar specificity. In CAT positive women, the probability
of tubal disease is so high that a prompt diagnostic laparoscopy is indicated,
irrespective of whether the woman has a suspect or non-suspect medical history.
A small subgroup of CAT negative women (14%) still have a high probability of
tubal disease due to their history, and laparoscopy will identify tubal abnormalities
in around half of these women. The majority of women (70%) in this cohort were
classified as having a probability <30% of tubal disease due to a negative CAT
status and a non-suspect clinical history. Since 80% of these women will show no
tubal abnormalities, laparoscopy in these women can be deferred.
Our study has some limitations. First, both our study cohort and the actual number
of patients with positive factors in the medical history were relatively small. This
limits the statistical power of this study and leads to relatively large confidence
intervals. However, the strength of using this cohort of women is the fact that
the decision to perform a diagnostic laparoscopy was irrespective of the result of
CAT. This makes it possible to obtain estimates of diagnostic performance without
partial verification bias, a limitation encountered in many other diagnostic studies
on this subject (Logan et al., 2003; Whiting et al., 2004).
23138 Coppus.indd 72 28-09-12 14:55
Medical history taking and CAT testing
73
4
A second limitation of this study is the fact that we dichotomized the result of
the CAT ELISA assay. A progressive increase in the likelihood of tubal pathology
in patients with a higher titre of the Chlamydia antibody test has been shown for
microimmunofluorescence (MIF) tests (Thomas et al., 2000), as well as for ELISA
(Tanikawa et al., 1996). The antibody titres of women in our cohort showed only
little diversity, as a result of which we were not able to demonstrate this relation
in the present study. It may be possible that use of a MIF test would have resulted
in a larger diversity between IgG titres of the patients, leading to extra diagnostic
information if incorporated into the prediction model. MIF tests however have
been shown to suffer from a significant amount of cross-reactivity with Chlamydia
pneumoniae. Furthermore, interpretation of these tests is subject to inter-observer
variation (Gijsen et al., 2001; Land et al., 2003). Moreover, a recent study showed the
pELISA used in this study to be the one most specific for a previous C. trachomatis
infection (Land et al., 2003).
A third limitation concerns our definition of tubal pathology. Broad criteria were
used, namely presence of adhesions involving the Fallopian tubes and ovaries,
clubbing of the tubes and hydrosalpinges or obstruction to the flow of dye,
irrespective of severity or whether the pathology was uni- or bilateral. Previous
studies have shown that the diagnostic performance of CAT increases when the
label “tubal disease” is limited to the more severe cases (Land et al., 1998). Due to a
low number of women with bilateral tubal disease in this study, we were unable to
restrict our analysis to these cases.
Detection of endometriosis by laparoscopy was not part of the intention to screen
by medical history, since CAT will be unable to predict adhesions or damage due
to endometriosis. Although this approach might be open to debate, this has
been done previously in studies on the subject (Akande et al., 2003; Logan et al.,
2003). Data on the capacity of medical history to predict endometriosis have been
published previously (Calhaz-Jorge et al., 2004).
One previous study has attempted to predict tubal pathology by means of a
multivariable logistic regression model (Hubacher et al., 2004). Women were
divided into a ‘high probability’ and ‘low probability’ group based on a logistic
regression model that included four variables, namely previous symptoms of PID,
vaginal discharge, lower genital tract infection, and presence of antibodies to C.
23138 Coppus.indd 73 28-09-12 14:55
Chapter 4
74
trachomatis. The results demonstrated a sensitivity of 84% with a specificity of 29%.
The authors concluded that “the role of history taking in the evaluation of women
with tubal factor infertility is limited”. Although clinical history does not have the
ultimate discriminative power to distinguish between patients with and without
tubal pathology, we have demonstrated in the present study that it can be used to
create individualized patient risk profiles and have shown the additional value of
CAT testing upon medical history taking.
In our study we included women with both uni- and bilateral tubal disease.
Unilateral tubal occlusion or tubo-ovarian adhesions may compromise fertility
prospects moderately, but only bilateral tubal occlusion is known to virtually
exclude all possibility of a spontaneous pregnancy (Mol et al., 1999). Therefore,
it is questionable whether all women with minor tubal pathology should be
labelled as having tubal disease. To date, no evidence-based policies for the
treatment of unilateral tubal disease have been published (Ahmad et al., 2006),
and therefore detection of one-sided tubal disease is unlikely to result in a major
shift in treatment. Since most of these women have a low predicted probability
of disease, postponing laparoscopy in these women may allow them to conceive
spontaneously (van der Steeg et al., 2007), thereby making laparoscopy redundant.
Unfortunately, we did not have data about treatment independent pregnancy
rates in this cohort of patients, and cannot answer this question.
Our model allows differentiation between women with a high predicted chance
of tubal pathology and women with a low chance. What probability of diagnostic
uncertainty is acceptable for women is not known at the moment. It may be
possible that women prefer to have 100% certainty about their tubal integrity.
In that case, since no workup will ever be 100% accurate all women will require
diagnostic laparoscopy. This will increase costs and enlarge the pool of women
with a diagnosis of minimal pathology of questionable prognostic significance
(Collins et al., 1995). Use of a multivariable prediction model including history and
CAT allows a conjoint decision of whether or not to perform a laparoscopy, based
on the woman’s as well as doctor’s preferences. Depending on these preferences,
one could perform a diagnostic laparoscopy in case of a high predicted
probability, postpone laparoscopy for several months in cases where the predicted
probability is low, or use a less invasive alternative, i.e. hysterosalpingography or
hysterosalpingo-contrast-sonography. Women should be adequately informed
23138 Coppus.indd 74 28-09-12 14:55
Medical history taking and CAT testing
75
4
regarding the chance of delaying the diagnosis in case of low predicted probability
and negative CAT status.
In this study, we explored the value of medical history taking and Chlamydia
antibody testing in predicting tubal subfertility by constructing diagnostic
prediction models. Whether the use of such models is feasible and clinically
effective in daily routine practice is not known. Further research will have to
evaluate women’s preferences regarding the trade-off between likelihood
of detecting tubal pathology and the risks and discomfort associated with a
potentially unnecessary diagnostic laparoscopy.
In conclusion, combined use of medical history taking and CAT testing has a
superior diagnostic accuracy than one of these alone. CAT testing adds additional
predictive value to a woman’s medical history risk profile.
Appendix:The chance of a patient of having tubal pathology can be calculated with the following formula: probability of tubal pathology = 1/[1+ EXP(-β)] in which the β for the different models are: “clinical history” model: beta = -1.764 + (0.629*number of term deliveries) + (0.0116*duration of subfertility) + (0.979*positive history of PID) + (0.955*history of non-gynaecologic pelvic surgery). “history+cat” model: beta = -1.874 + (0.616*number of term deliveries) + (0.0069*duration of subfertility) + (1.020*positive history of PID) + (1.201*history of non-gynaecologic pelvic surgery) + (1.208*positive CAT).
23138 Coppus.indd 75 28-09-12 14:55
Chapter 4
76
References
American Fertility Society. Revised American Fertility Society classification of endometriosis. Fertil Steril 1985; 43:351-352.
Ahmad G, Watson A, Vandekerckhove P and Lilford R. Techniques for pelvic surgery in subfertility. Cochrane Database Syst Rev 2006; 2: CD000221.
Akande VA, Hunt LP, Cahill DJ, Caul EO, Ford WC, Jenkins JM. Tubal damage in infertile women: prediction using Chlamydia serology. Hum Reprod 2003; 18:1841-1847.
Calhaz-Jorge C, Mol BW, Nunes J, Costa AP. Clinical predictive factors for endometriosis in a Portugese infertile population. Hum Reprod 2004; 19:2126-2131.
Collins JA, Burrows EA, Wilan AR. The prognosis for live birth among untreated infertile couples. Fertil Steril 1995; 64:22-28.
De Long ER, De Long DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988; 44:837-845.
Evers JL. Female subfertility. Lancet 2002; 360:151-159.
Gijsen AP, Land JA, Goossens VJ, Leffers P, Bruggeman CA, Evers JL. Chlamydia pneumoniae and screening for tubal factor subfertility. Hum Reprod 2001; 16:487-491.
Hubacher D, Grimes D, Lara-Ricalde R, de la Jarra J, Garcia-Luna A. The limited clinical usefulness of taking a history in the evaluation of women with tubal factor subfertility. Fertil Steril 2004; 81:6-10.
Land JA, Evers JL, Goossens VJ. How to use Chlamydia antibody testing in subfertility patients. Hum Reprod 1988; 13:1094-1098.
Land JA, Gijsen AP, Kessels AG, Slobbe ME, Bruggeman CA. Performance of five serological chlamydia antibody tests in subfertile women. Hum Reprod 2003; 18:2621-2627.
Logan S, Gazvani R, McKenzie H, Templeton A, Bhattacharya S. Can history, ultrasound, or ELISA chlamydial antibodies, alone or in combination, predict tubal factor subfertility in subfertile women? Hum Reprod 2003; 18:2350-2356.
Mol BW, Swart P, Bossuyt PM, van der Veen F. Prognostic significance of diagnostic laparoscopy for spontaneous fertility. J Reprod Med 1999; 44:81-86.
Mol BW, van Wely M, Steyerberg EW. Using prognostic models in clinical infertility. Hum Fertil (Camb) 2000; 3:199-202.
Moons KG, Grobbee DE. Diagnostic studies as multivariable, prediction research. J Epidemiol Community Health 2002; 56:337-338.
Moons KG, Biesheuvel CJ, Grobbee DE. Test research versus diagnostic research. Clin Chem 2004; 50:273-6.
Nederlandse Vereniging voor Obstetrie en Obstetrie (2004) Orienterend fertiliteitsonderzoek [NVOG guideline diagnostic fertility workup]
National Institute for Clinical Excellence (2004) Guideline CG11: Fertility: assessment and treatment for people with fertility problems. (http://www.nice.org.uk/download.aspx?o=CG011fullguideline)
Punnonen R, Terho P, Nikkanen V, Meurman O. Chlamydial serology in infertile women by immunofluorescence. Fertil Steril 1979; 31:656-659.
23138 Coppus.indd 76 28-09-12 14:55
Medical history taking and CAT testing
77
4
Steyerberg EW, Eijkemans MJ, Harrell FE Jr., Habbema JD. Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets. Stat Med 2000; 19:1059-1079.
Steyerberg EW, Harrell FE Jr., Borsboom GJ, Eijkemans MJ, Vergouwe Y, Habbema JD. Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J Clin Epidemiol 2001; 54:774-781.
Swart P, Mol BW, van der Veen F, van Beurden M, Redekop WK, Bossuyt PM. The accuracy of hysterosalpingography in the diagnosis of tubal pathology: a meta-analysis. Fertil Steril 1995; 64:486-491.
Tanikawa M, Harada T, Katagiri C, Onohara Y, Yoshida S, Terakawa N. Chlamydia trachomatis antibody titres by enzyme-lined immunosorbent assay are useful in predicting severity of adnexal adhesion. Hum Reprod 1996; 11:2418-2421.
Thomas K, Coughlin L, Mannion PT, Haddad NG. The value of Chlamydia trachomatis antibody testing as part of routine infertility investigations. Hum Reprod 2000; 15:1079-1082.
Van der Steeg JW, Steures P, Eijkemans MJ, Habbema JD, Hompes PG, Broekmans FJ et al., Pregnancy is predictable: a large-scale prospective external validation of the prediction of spontaneous pregnancy in subfertile couples. Hum Reprod 2007; 22:536-542.
Whiting P, Rutjes AW, Reitsma JB, Glas AS, Bossuyt PM, Kleijnen J. Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med 204; 140: 189-202.
23138 Coppus.indd 77 28-09-12 14:55
23138 Coppus.indd 78 28-09-12 14:55
5Identifying subfertile ovulatory
women for timely tubal patency
testing: a clinical decision model
based on medical history.
Sjors CoppusHarold Verhoeve
Brent OpmeerJan Willem van der Steeg
Pieternel SteuresRene Eijkemans
Peter HompesPatrick Bossuyt
Fulco van der VeenBen Willem Mol
Human Reproduction 2007, 22: 2685-2692
23138 Coppus.indd 79 28-09-12 14:55
Chapter 5
80
Abstract
BACKGROUND: The aim of tubal testing is to identify women with bilateral
tubal pathology in a timely manner, so they can be treated with IVF or
tubal surgery. At present it is unclear for which women early tubal testing is
indicated, and in whom it can be deferred.
METHODS: Data of 3716 women who underwent tubal patency testing as
part of their routine fertility workup were used to relate elements in their
medical history to the presence of tubal pathology. With multivariable
logistic regression, we constructed two diagnostic models. One in which
tubal disease was defined as occlusion and/or severe adhesions of at least
one tube, whereas in a second model, tubal disease was defined as the
presence of bilateral abnormalities.
RESULTS: Both models discriminated moderately well between women
with and women without tubal disease with an area under the receiver-
operating characteristic curve (AUC) of 0.65 (95% CI: 0.63-0.68) for any
tubal pathology and 0.68 (95% CI: 0.65-0.71) for bilateral tubal pathology,
respectively. However, the models could make an almost perfect distinction
between women with a high and a low probability of tubal pathology. A
decision rule in the form of a simple diagnostic score chart was developed
for application of the models in clinical practice.
CONCLUSIONS: In conclusion, the present study provides two easy to
use decision rules that can accurately express the women’s probability of
(severe) tubal pathology at the couple’s first consultation. They could be
used to select women for tubal testing more efficiently.
23138 Coppus.indd 80 28-09-12 14:55
Medical history models for tubal pathology
81
5
Introduction
Tubal pathology ranks among the most frequent causes of subfertility, next
to ovulatory disorders and sperm defects (Evers, 2002). Therefore, in UK and
Dutch guidelines, tubal patency testing is a part of the fertility workup (National
Institute of Clinical Excellence 2004; Nederlandse Vereniging voor Obstetrie
en Gynaecologie, 2004). The most frequently used and widely accepted tests
for this are hysterosalpingography (HSG) and diagnostic laparoscopy with
chromopertubation. Both tests have several drawbacks however. Women consider
HSG a painful test and HSG has a considerable risk of infection (Forsey et al., 1990;
Liberty et al., 2007). Laparoscopy is an invasive surgical procedure that requires
general anesthesia with associated risks, and its availability is restricted as it is an
expensive investigation requiring operating time and dedicated personnel.
For these reasons, tubal testing is usually performed in a selected group of
subfertile women. At present, guidelines advocate medical history taking as a tool
to select women for tubal testing (National Institute of Clinical Excellence 2004). It
is not entirely clear how elements from the medical history should be combined to
identify women that should undergo early tubal patency testing. So far, only a few
studies with relatively small sample sizes have tried to summarize this information
in a quantitative manner (Hubacher et al., 2004; Coppus et al., 2007).
The aim of the present study was to form clinical decision rules based on medical
history that can help to identify women at high and low risk for (severe) tubal disease.
These decision rules, presented as simple quantitative score charts, were developed
in a large cohort of women who underwent routine tubal patency testing.
Materials and Methods
PatientsBetween January 2002 and February 2004, consecutive couples presenting at the
fertility clinic of 38 centres in The Netherlands were invited to join a prospective
cohort study. The local ethics committee of each participating centre gave
Institutional Review Board approval for this study. All couples underwent a basic
fertility workup according to the guidelines of the Dutch Society of Obstetrics and
23138 Coppus.indd 81 28-09-12 14:55
Chapter 5
82
Gynaecology. The details of this workup have been described in detail elsewhere
(van der Steeg et al., 2007).
The present analysis was limited to couples with a regular ovulatory cycle,
defined as a cycle length between 23 and 35 days with a within cycle variation
of less than 8 days. Ovulation was detected by a basal body temperature chart,
mid-luteal serum progesterone, or by ultrasonographic monitoring of the cycle.
Couples with a history of reversal of sterilization, tubal surgery, IVF or previous
tubal patency testing were excluded. Couples in whom semen analysis showed
a severe impairment of semen quality requiring IVF-ICSI (defined as a post-wash
total motile count < 1*106), were also excluded from the present analysis.
For each couple, we registered referral status (by a general practitioner or
gynaecologist), duration of child wish, female and male age, dysmenorrhoea, female
and male smoking habits, history of pelvic inflammatory disease (PID), history
of Chlamydia infection, previous pelvic surgery other than Caesarean section,
previous treatment with intrauterine insemination and previous pregnancies from
the female and male partner in the present and/or prior relationships, including
whether these resulted in an ectopic pregnancy, legally induced abortion,
spontaneous abortion or term delivery. Moreover, the total motile sperm count
(TMC) was recorded.
Duration of child wish was defined as the period between the date the couple
had started unprotected intercourse and the date of first consultation. Female and
male age was calculated at the time of this first visit. Previous PID had either been
confirmed laparoscopically or was documented if a woman had been treated with
antibiotics for a period of intense abdominal-pelvic pain and fever, for which no
other focus had been found. Infection with Chlamydia trachomatis was self reported
by the patient, and could have been symptomatic or asymptomatic. Missing data
in patients’ history were imputed, as restricting the analysis to complete cases
leads to loss of statistical power and potentially biased results in multivariable
analyses (van der Heijden et al., 2006). For this purpose the ‘aRegImpute’ multiple
imputation function in S-plus 2000 (MathSoft Inc.) was used.
Tubal patency testing was performed with either HSG, diagnostic laparoscopy
(DLS) or transvaginal hydrolaparoscopy (THL), depending on the local protocol
23138 Coppus.indd 82 28-09-12 14:55
Medical history models for tubal pathology
83
5
of participating hospitals. In case, tubal patency was evaluated both with HSG
and diagnostic laparoscopy or THL, the result of the most invasive test was taken
as conclusive. Tubal pathology was defined as uni- or bilaterally impaired or
absent flow of contrast medium on HSG, and as uni- or bilateral occlusion and/or
hydrosalpinx and/or severe adhesions disturbing ovum pick-up if the tubes were
evaluated with laparoscopy or THL. This definition included endometriosis related
tubal disease.
Statistical analysesWe developed two multivariable logistic regression models. In the first model, all
women with either uni- or bilateral abnormalities were coded as diseased, whereas
only those women with bilaterally normal tubes were considered as non-diseased.
This first model will be referred to as the ‘any pathology model’. In the second
model, only women with bilateral tubal abnormalities were classified as diseased,
while women with uni- or bilaterally normal tubes were considered to be non-
diseased. The latter model will be referred to as the ‘bilateral pathology model’.
For the development of the multivariable models, we first checked the assumption
of linearity between the continuous variables female age, male age, TMC and
duration of child wish on the one hand and both definitions of tubal pathology
on the other hand using smoothed piecewise polynomials (splines) (Harrell et al.,
1988). Non-linear associations were redefined, based on these spline functions.
Subclasses within categorical variables were dichotomized. We then performed
univariable and multivariable logistic regression analysis to estimate odds ratios
(ORs), 95% confidence intervals (95% CI) and corresponding P-values for both
dichotomous and continuous variables.
As the use of too stringent P-values for variable selection is more deleterious for a
model than including too many factors (Steyerberg et al., 2000), all variables that
showed a significance level of <30% in univariable analyses were entered in the
multivariable logistic regression model. To make parsimonious models, we used a
stepwise backwards selection procedure, using a predefined significance level of
>10% for removing variables from the models.
To adjust for overfitting, we performed internal validation with bootstrapping
(Steyerberg et al., 2001). We generated 500 bootstrap data sets in which the same
23138 Coppus.indd 83 28-09-12 14:55
Chapter 5
84
multivariable logistic regression model was estimated. By analysing the difference
between the model based on the original dataset and the bootstrap estimates, a
shrinkage factor was calculated and used to correct the original model coefficients
and associated probability estimates for overfit.
The discriminative capacity of the models was evaluated by calculating the area
under the receiver- operating curve (AUC). Calibration of the models was assessed
by comparing the calculated probabilities across five risk groups (quintiles) with the
observed proportion of tubal disease in each of these groups. The goodness-of-fit
was evaluated graphically and tested formally with the Hosmer and Lemeshow
test statistic.
Finally, a paper score worksheet was developed, for which the shrunken regression
coefficients were multiplied by 10 and rounded to simplify the computation
for clinical practice. Values for continuous variables were obtained by linear
interpolation. This diagnostic rule was then transformed into a score chart to
simplify computations in clinical practice. Data were analysed using SPSS 12.0.1
(SPSS Inc., Chicago, IL, USA) and S-PLUS 2000 (MathSoft Inc.)
Results
During the study period, data of 7860 couples were collected. Of these, 938
women did not have a regular ovulatory cycle, 636 men had severely impaired
sperm quality and 196 couples had previously undergone fertility surgery or IVF.
In another 568 women, HSG or DLS had been performed in a previous episode of
subfertility. In total, these factors left 5522 couples for inclusion.
In total, 3716 women underwent 2752 hysterosalpingographies, 1527 diagnostic
laparoscopies and 160 transvaginal hydrolaparoscopies (Fig. 1). About 2039 women
underwent only HSG, 808 women underwent only laparoscopy and 147 women
underwent only THL. The remaining 722 women underwent multiple tests: 709
women had HSG and laparoscopy, 3 women had HSG and THL, 9 women had THL
and laparoscopy, and 1 woman underwent all three procedures.
23138 Coppus.indd 84 28-09-12 14:55
Medical history models for tubal pathology
85
5
Consecutive subfertile couples n =7860
N = 5522 • 1806 women no tubal testing • 3716 women tubal testing
Excluded • n = 938 ovulation disorder • n = 636 TMC < 1 x 106 • n = 196 previous IVF or fertility surgery • n = 568 previous tubal testing
Number of tubal patency tests in 3716 women • 2752 HSG • 1527 DLS • 160 THL
FIGURE 1. Study profile
In 1806 women (32%), tubal patency was not assessed invasively. Women that
were tested were significantly less often referred by a gynaecologist (7.4% versus
10.4%, P<0.001), had a longer duration of subfertility (median 1.5 versus 1.4 years,
P<0.001), were less often treated previously with IUI (3.1% versus 6.7%), had
conceived less often previously in the present relationship (30.3% versus 36.5%,
P<0.001) and more often had a history of pelvic surgery (14.2 versus 11.3%,
P=0.008) than women that did not undergo tubal testing.
The median time between the couple’s first appointment and tubal patency
testing with HSG was 3.1 months. For laparoscopy and for THL, these intervals were
6.5 months and 4.9 months respectively. For all women, the median time from
first visit to a definitive diagnosis regarding tubal patency was 4.1 months (5th-95th
percentile 1.1 to 15.8 months). Severe bilateral tubal pathology was diagnosed in
312 women, corresponding with a prevalence of 8.4%. In 369 women, unilateral
pathology was diagnosed, which results in an overall prevalence for combined
uni- or bilateral tubal disease of 18%.
The baseline characteristics of the 3716 couples are shown in Table 1. In total,
15% of data points were missing and subsequently imputed. The spline analyses
showed that the relation between most continuous variables and the probability of
both outcomes was approximately linear. A non-linear relationship was observed
23138 Coppus.indd 85 28-09-12 14:55
Chapter 5
86
for female age, with a marked increase in risk with age over 32 years. We therefore
redefined female age as two linear variables, one for female age below and one for
age above 32 years.
TABLE I. Baseline characteristics of the 3716 included couples
Mean/Median* 5th-95th percentile
Female age (years) 32.4 25.3 – 39.0
Male age (years) 34.9 27.5 – 44.6
Duration of child wish (years) 1.5* 0.7 – 4.3
Subfertility, primary (n) (%) 2312 62.2
Results of the univariable and multivariable analyses are summarized in Table
2. For any tubal pathology, the multivariable analysis showed that prolonged
duration of subfertility, a pregnancy of both partners or of only the male partner in
a previous relationship, male smoking habits, a history of PID, history of Chlamydia
infection, history of ectopic pregnancy and previous pelvic surgery were all factors
that increased the likelihood of tubal pathology. A history of a legally induced
abortion also contributed to a higher probability of tubal pathology, although this
parameter was not statistically significant at the 5% level (P=0.07).
For bilateral tubal pathology, multivariable analysis showed that referral by a
gynaecologist, female age above 32 years, dysmenorrhoea and smoking of the
woman increased the likelihood of disease, as did a longer duration of subfertility,
a pregnancy of both partners in a previous relationship, male smoking habits,
history of PID, and history of Chlamydia infection, ectopic pregnancy and previous
pelvic surgery. In contrast, a previous pregnancy in the current relationship and a
higher male age lowered the chances of bilateral tubal pathology.
The ‘any pathology’ model had a modest overfit of 4%, whereas the ‘bilateral
pathology’ model showed 9% overfit. To improve calibration of the probabilities in
future patients, the regression coefficients of the models were shrunk with these
factors. Both models discriminated modestly between diseased and non-diseased
with an AUC of 0.65 (95% CI: 0.63-0.68) for the ‘any pathology’ model and 0.68 (95%
CI: 0.65-0.71) for the ‘bilateral pathology’ model, respectively.
23138 Coppus.indd 86 28-09-12 14:55
Medical history models for tubal pathology
87
5
TAB
LE II. Results of the uni- and multivariable analysis
Variable
Any tub
al path
ology
Bilateral tub
al path
ology
Un
ivariable an
alysisM
ultivariable an
alysisU
nivariab
le analysis
Multivariab
le analysis
OR
95%C
IP value
OR
95% C
IP value
OR
95% C
IP value
OR
95% C
IP value
Referral by gynaecologist1.52
1.14-2.02 0.005
1.791.24-2.59
0.0021.47
1.00-2.170.05
Female age
o per year <
32o
per year > 32
1.00 1.04
0.97-1.041.01-1.07
0.980.02
1.021.04
0.97-1.071.00-1.09
0.410.06
1.061.01-1.12
0.02
Male age (per year)
1.000.98-1.02
0.980.98
0.97-1.010.25
0.960.94-0.99
0.004
Child w
ish (per year)1.11
1.06-1.18<
0.0011.09
1.03-1.160.002
1.181.11-1.26
<0.001
1.151.08-1.23
<0.001
Previous IUI treatm
ent0.91
0.57-1.460.70
0.910.47-1.76
0.79
Previously conceivedo
current relationo
only male partner in previous
relationo
only female partner in previous
relationo
both partners in previous relation
1.141.51
1.29
2.56
0.95-1.361.07-2.12
0.96-1.75
1.35-3.20
0.150.02
0.09
<0.001
1.59
2.01
1.12-2.27
1.30-3.12
0.01
0.002
0.761.47
1.34
2.87
0.59-1.000.93-2.33
0.89-2.00
1.79-4.60
0.050.10
0.25
<0.001
0.77
2.39
0.58-1.03
1.44-3.95
0.08
0.001
Dysm
enorrhoea1.10
0.92-1.310.29
1.351.06-1.71
0.011.29
1.01-1.650.04
Female sm
oking1.35
1.12-1.62<
0.0011.73
1.35-2.21<
0.0011.35
1.03-1.770.03
Male sm
oking1.36
1.15-1.62<
0.0011.22
1.02-1.460.03
1.641.30-2.07
<0.001
1.331.03-1.72
0.03
History of PID
2.901.96-4.28
<0.001
2.031.34-3.09
0.0013.90
2.50-6.08<
0.0012.66
1.65-4.29<
0.001
History of C
hlamydia
2.081.54-2.80
<0.001
1.673.59-10.2
<0.001
2.691.88-3.87
<0.001
1.901.28-2.82
0.002
History of ectopic pregnancy
9.73 5.9-16.0
<0.001
6.063.59-10.2
<0.001
3.171.80-5.60
<0.001
3.781.74-8.18
0.001
History of induced abortion
1.571.19-2.07
0.0021.34
0.98-1.830.07
1.511.04-2.21
0.03
History of m
iscarriage or life birth1.07
0.90-1.280.44
0.930.73-1.20
0.58
Pelvic surgery excluding Caesarean section3.19
2.57-3.98<
0.0012.23
1.79-2.79<
0.0011.87
1.41-2.48<
0.0011.40
1.02-1.920.004
TMC
(*10*10^6/l)
1.001.00-1.01
0.561.00
1.00-1.010.63
23138 Coppus.indd 87 28-09-12 14:55
Chapter 5
88
The calibration of the model for any tubal pathology is shown in Fig. 2A, and that
for bilateral tubal pathology in Fig. 2B. The calculated probabilities of bilateral tubal
pathology correlated well with the observed proportions of affected women, as all
points were located close to the line of equality (x = y), indicating good calibration
of the model. This was confirmed by the Hosmer and Lemeshow test statistic
(P=0.83). Because of the overestimation of the lowest probabilities for women
with any tubal pathology, the calibration of this model was less optimal (P=0.27).
However, the women with the highest likelihood of tubal disease could still be
distinguished adequately from women with an average probability.
obse
rved
frac
tion 1
- or 2
-side
d tub
al pa
tholo
gy
0,20
0,25
0,30
0,35
0,15
0,10
0,05
0,000,00 0,05 0,10 0,15 0,20 0,25 0,30 0,35
calculated probability of 1- or 2-sided tubal pathology
obse
rved
frac
tion 2
-side
d tub
al pa
tholo
gy 0,20
0,15
0,10
0,05
0,000,00 0,05 0,10 0,15 0,20
calculated probability of 2-sided tubal pathology
FIGURE 2. Calibration plot of the decision rule for uni- or bilateral tubal pathology (left) or bilateral tubal pathology (right)
A summary score can be calculated based on the respective elements in a
couple’s medical history. By plotting this score, the probability of tubal pathology
can subsequently be read from a graph. The score charts for both models are
presented in Fig 3.
The consequences of the use of the clinical decision rules are shown in Table IIIa
and IIIb. For example, deferring tubal testing in case of a probability of bilateral tubal
pathology of less than 5% results in a 25% reduction in the number of invasive tests
procedures, while 90% of women with bilateral tubal disease are still detected. At
this cut-off level, 10 tests have to be performed to detect one case of bilateral tubal
23138 Coppus.indd 88 28-09-12 14:55
Medical history models for tubal pathology
89
5
disease. Increasing the cut-off to a higher pre-test probability of tubal pathology
at which tubal patency testing is performed will result in fewer procedures, but
also in a higher number of women in whom tubal testing is incorrectly postponed.
By planning tubal testing in women at highest risk for tubal pathology early, the
detection rate will increase while delaying testing in the majority of women correctly.
TABLE IIIA. Diagnostic consequence of use of decision rule for bilateral tubal pathology.
Strategy Number of women
undergoing invasive tubal
testing
Number (%) of tested
women with bilateral tubal
pathology
Reduction of number of invasive tubal tests
NND* Number (%) of women in whom tubal testing was
withhold correctly
Number (%) of women not tested
with bilateral pathology
No testing 0 0 3716 (100%)
3404 (91.6%) 312 (8.4%)
Test all with HSG or DLS
3716 (100%) 312 (8.4%) 0 12 0 0
Test if probability of bilateral tubal pathology >5%
2786 (75%) 279 (10%) 930 (25%) 10 897 (96.5%) 33 (3.5%)
Test if probability of bilateral tubal pathology >10%
801 (22%) 139 (17%) 2915 (78%) 6 2742 (94%) 173 (6%)
Test if probability of bilateral tubal pathology >15%
350 (9.4%) 79 (23%) 3366 (84%) 4 3133 (93%) 233 (7%)
*NND: Number of tubal tests needed to diagnose one woman with bilateral tubal disease
TABLE IIIB. Diagnostic consequence of use of decision rule for any tubal pathology.
Strategy Number of women
undergoing invasive
tubal testing
Number (%) of tested
women with any tubal pathology
Reduction of number of invasive tubal tests
NND* Number (%) of women in whom
tubal testing was withhold
correctly
Number (%) of women not tested with any
pathology
No testing 0 0 3716 (100%) 3035 (82%) 681 (18%)
Test all with HSG or DLS 3716 (100%) 681 (18%) 0 5.5 0 0
Test if probability of any tubal pathology >15%
1600 (42%) 422 (26%) 2116 (57%) 3.8 1857 (88%) 259 (12%)
Test if probability of any tubal pathology >20%
921 (25%) 306 (33%) 2795 (75%) 3 2420 (87%) 375 (13%)
Test if probability of any tubal pathology >25%
547 (15%) 208 (38%) 3169 (85%) 2.6 2696 (85%) 473 (15%)
*NND: Number of tubal tests needed to diagnose one woman with any tubal disease
23138 Coppus.indd 89 28-09-12 14:55
Chapter 5
90
Discussion
We developed two clinical decision rules to calculate the probability of any tubal
pathology as well as the probability of severe bilateral tubal pathology in subfertile
women. These rules were derived from the clinical characteristics of 3716 couples
who had been referred for evaluation of their subfertility in a large prospective
multicentre study in the Netherlands. Both models could discriminate moderately
well between women with and women without tubal pathology. Calibration
of the models was good, allowing distinction between women with a high-risk
profile and women with a low-risk profile.
The relative importance of discrimination and calibration depends on the clinical
application of a model (Mol et al., 2005). As our models are intended to counsel
women about the probability that tubal testing would reveal a cause for their
subfertility, the accuracy of the numeric probability (calibration) is relevant, less so
the fact of having tubal disease or not (discrimination).
A strength of this study is the national multicentre design, which enabled us to
include a large number of women. A possible disadvantage of the multicentre
nature is the potential for heterogeneity, due to differences in tubal testing
protocols between hospitals or because of variability in tubal pathology prevalence
between different areas of the country. Therefore we explored whether the centre
as such acted as a confounder for variables that were included in the models. To
do so we added centre as a covariate to the multivariate regression models. This
analysis did not alter the models, nor did it affect the strength of associations.
Data from medical history might suffer from a lack of reliability, and one cannot
exclude that certain items have been reported incorrectly. For example, due to
the design of the study, not all cases that reported a history of PID or Chlamydia
infection were verified by laparoscopy or culture. On the other hand, some women
may have reported incorrectly not having suffered an episode of PID.This will have
caused ‘false-positive’ or ‘false-negative’ histories. One should realise however
that these reports do also occur in daily clinical practice, where the information
from medical history is collected during the fertility workup on a routine basis, a
situation which is reflected by our prospective study.
From a study design perspective, ideally all women would have undergone visual
assessment of their Fallopian tubes. In the present study, 32% of eligible women
23138 Coppus.indd 90 28-09-12 14:55
Medical history models for tubal pathology
91
5
did not undergo tubal testing, as they conceived prior to tubal testing, or were
referred for treatment with IVF, thereby surpassing the need for tubal testing. As our
purpose was to examine the relation between medical history and outcome of tubal
assessment, this mechanism could have led to biased estimates of associations. To
examine the impact of this partial verification, we explored whether there were any
systematic differences between women that did and did not undergo tubal testing.
This was indeed the case, although the absolute differences were small. However,
studies in which the diagnosis is immediately verified in all women are in our opinion
hardly feasible, leaving the present approach as the most practical and possible one.
We chose to develop models for two distinct definitions of tubal pathology; one
in which any type of tubal pathology was considered abnormal, and one in which
only bilateral abnormalities were considered to be abnormal. The latter is in our
opinion clinically the one most relevant, as these women have virtually no chance of
conceiving either spontaneously or after intra-uterine insemination. From a clinical
point of view, it is undesirable that these women are exposed to a long period of
expectant management and they should be identified early in the diagnostic workup
and counselled for IVF. At present, the detection of unilateral tubal pathology is unlikely
to imply a major change in clinical management, unless the woman already has a poor
chance of achieving a treatment independent pregnancy (van der Steeg et al., 2007).
0
5
10
15
20
25
30
35
40
45
50
55
60
65
70
75
80
85
90
95
100
0 5 10 15 20 25 30 35 40
Pred
icted prob
ability of abn
ormal tu
bal test
Diagnos6c index score
FIGURE 3. Score chart of the bilateral and any tubal pathology models to estimate the probability of tubal disease. (A) Calculation of the sumscore; (B) Probability of disease corresponding to this sum score. Upper line indicates the probability of any tubal pathology. Lower line indicates the probability of bilateral disease.
23138 Coppus.indd 91 28-09-12 14:55
Chapter 5
92
(A)
Scor
e b
ilate
ral m
odel
Scor
e un
i- o
r bila
tera
l mod
el
Wom
an’s
ag
e (y
r)<
3232
-24
34-3
6>
36
Bila
tera
l pat
holo
gy m
odel
00
23
……
……
……
……
….
Dur
atio
n o
f ch
ildw
ish
(yr)
<1
1-2
2-3
3-4
>4
Bila
tera
l pat
holo
gy m
odel
12
34
8…
……
……
……
……
.
Any
pat
holo
gy m
odel
01
23
5…
……
……
……
……
.
Mal
e ag
e (y
r)<
2525
-30
30-3
5>
35
Bila
tera
l pat
holo
gy m
odel
-8-1
0-1
1-1
3…
……
……
……
……
.
Refe
rral
sta
tus
(GP
or g
yn)
GP
Gyn
Bila
tera
l pat
holo
gy m
odel
04
……
……
……
……
….
Cou
ple
con
ceiv
ed in
pre
sent
rela
tion
No
Yes
Bila
tera
l pat
holo
gy m
odel
0-2
……
……
……
……
….
Bot
h c
once
ived
in p
rior
rela
tion
No
Yes
Bila
tera
l pat
holo
gy m
odel
08
……
……
……
……
….
Any
pat
holo
gy m
odel
07
……
……
……
……
….
On
ly m
an c
once
ived
in p
rior
rela
tion
No
Yes
Any
pat
holo
gy m
odel
04
……
……
……
……
….
Dys
men
orrh
oea
No
Yes
Any
pat
holo
gy m
odel
02
……
……
……
……
….
Wom
an s
mok
erN
oYe
s
Bila
tera
l pat
holo
gy m
odel
03
……
……
……
……
….
Mal
e sm
oker
No
Yes
Bila
tera
l pat
holo
gy m
odel
03
……
……
……
……
….
Any
pat
holo
gy m
odel
02
……
……
……
……
….
Prev
ious
sal
pin
giti
sN
o Ye
s
Bila
tera
l pat
holo
gy m
odel
09
……
……
……
……
….
Any
pat
holo
gy m
odel
07
……
……
……
……
….
23138 Coppus.indd 92 28-09-12 14:55
Medical history models for tubal pathology
93
5
Prev
ious
Ch
lam
ydia
infe
ctio
nN
o Ye
s
Bila
tera
l pat
holo
gy m
odel
06
……
……
……
……
….
Any
pat
holo
gy m
odel
05
……
……
……
……
….
His
tory
of e
xtra
-ute
rin
e p
regn
ancy
No
Yes
Bila
tera
l pat
holo
gy m
odel
010
……
……
……
……
….
Any
pat
holo
gy m
odel
017
……
……
……
……
….
His
tory
of l
egal
ly in
duc
ed a
bor
tion
No
Yes
Any
pat
holo
gy m
odel
03
……
……
……
……
….
His
tory
of p
elvi
c su
rger
yN
oYe
s
Bila
tera
l pat
holo
gy m
odel
03
……
……
……
……
….
Any
pat
holo
gy m
odel
08
……
……
……
……
….
Sum
of a
bov
e…
……
……
……
……
.…
……
……
……
……
.
+15
+0
Dia
gnos
tic
ind
ex s
core
……
……
……
……
….
……
……
……
……
….
Proc
edur
e: c
ircle
the
scor
es fo
r eac
h of
the
varia
bles
, tra
nsfe
r to
right
mos
t col
umn
and
add
to g
et th
e di
agno
stic
inde
x sc
ore.
Mar
k th
e sc
ore
in th
e ap
prop
riate
figu
re b
elow
in
ord
er to
read
off
on th
e Y-
axe
the
prob
abili
ty o
f (se
vere
) tub
al p
atho
logy
.
23138 Coppus.indd 93 28-09-12 14:55
Chapter 5
94
We limited the study to those women that underwent their first tubal patency test.
Couples with a history of previous tubal testing, previous reproductive surgery
or previous IVF are less likely to undergo tubal testing than couples without
such a history. We also excluded couples in whom the woman was anovulatory
or in whom the man had severe oligoasthenozoospermia requiring IVF-ICSI. The
rationale for excluding these couples was to prevent biased associations as in
these conditions tubal patency testing will not always be performed routinely.
To our surprise, we found male characteristics to be associated with the probability of
tubal subfertility. Although male age, male smoking habits, as well as a pregnancy in a
previous relationship cannot causally be related to female tubal damage, it is possible
that these male characteristics act as indicators for some unrecorded variables,
such as number of sexual partners or socio-economic status, or reflect lifestyle and
selection mechanisms. We performed sub-analyses in which only female factors were
evaluated, but these models had a significantly lower predictive performance than
the models in which data of the couple were included (data not shown). Furthermore,
the model was developed from data collected in over 30 centres, and we corrected
for a possible confounding cluster effects of male characteristics. However, it might
be possible that these associations do not hold in a different population.
There are two ways in which our decision rule can be used. A clinician can use the
paper score chart, which transforms the total sum score into a probability of tubal
disease. Alternatively, the clinician can use the regression formula, which can be
incorporated into software, i.e. used in an electronic patient file, or a handheld
device such as a PDA.
Preferences of subfertile women regarding the trade-off between the likelihood
of detecting tubal pathology and the risks and discomforts associated with tubal
testing are currently unknown; therefore, a clear cut-off level cannot be advised at
present. The decision rules presented here cannot be used to exclude subfertile
women from tubal patency testing overall, but allow one to estimate a probability
of disease in order to time tubal patency tests adequately. This means that in clinical
practice, tubal testing can be performed directly in case of a high probability of
tubal disease or postponed for several months in women with a low probability.
In conclusion, the present study provides two easy-to-use decision rules that can
accurately express the woman’s probability of (severe) tubal pathology at the
23138 Coppus.indd 94 28-09-12 14:55
Medical history models for tubal pathology
95
5
couple’s first consultation. They could be used to select women for tubal testing
more efficiently.
AppendixThe chance of tubal pathology can be calculated from the multivariable models with the formula: probability = 1/[1 + exp(-β)], where the β for the different models are:
‘any tubal pathology’ model: β = -2.086+ (0.085*duration of child wish) + (0.445*conceiving of only male partner prior relation) + (0.672*both partners conceived prior relation) + (0.193*male smoker) + (0.680*PID) + (0.494*Chlamydia infection) + (1.729*ectopic pregnancy) + (0.280*legally induced abortion) + (0.771*pelvic surgery).
‘bilateral tubal pathology’ model: β = -2.201 + (0.351*referral gynaecologist) + (0.054*female age above 32 years) + (0.129*duration of child wish) + (-0.235*conceived in present relation) + (0.792*both partners conceived prior relation) + (-0.034*male age) + (0.230*dysmenorrhoea) + (0.272*female smoker) + (0.259*male smoker) + (0.889*PID) + (0.583*Chlamydia infection) + (0.961*ectopic pregnancy) + (0.305*pelvic surgery).
23138 Coppus.indd 95 28-09-12 14:55
Chapter 5
96
References
Coppus SF, Opmeer BC, Logan S, van der Veen F, Bhattacharya S, Mol BW. The predictive value of medical history taking and Chlamydia IgG ELISA antibody testing (CAT) in the selection of subfertile women for diagnostic laparoscopy: a clinical prediction model approach. Hum Reprod 2007; 22: 1353-1358.
Evers JL. Female subfertility. Lancet 2002; 360:151-159.
Forsey JP, Caul EO, Paul ID, Hull MG. Chlamydia trachomatis, tubal disease and the incidence of symptomatic and asymptomatic infection following hysterosalpingography. Hum Reprod 1990; 5:444-447.
Harrell FE Jr, Lee KL, Pollock BG. Regression models in clinical studies: determining relationships between predictors and response. J Natl Cancer Inst 1988; 80:1198-1202.
Hubacher D, Grimes D, Lara-Ricalde R, de la Jarra J, Garcia-Luna A. The limited clinical usefulness of taking a history in the evaluation of women with tubal factor subfertility. Fertil Steril 2004; 81:6-10.
Liberty G, Gal M, Halevy-Shalem T, Michaelson-Cohen R, Galoyan N, Hyman J, Eldar-Geva T, Vatashky E, Margalioth E. Lidocaine-Prilocaine (EMLA) cream as analgesia for hysterosalpingography: a prospective, randomized, controlled, double blinded study. Hum Reprod 2007; 22:1335-1339.
Mol BW, Coppus SF, Van der Veen F, Bossuyt PM. Evaluating predictors for the outcome of assisted reproductive technology: ROC-curves are misleading; calibration is not! Fertil Steril 2005; 84;s253–s254 (Abstract).
National Institute for Clinical Excellence (2004) Guideline CG11: Fertility: assessment and treatment for people with fertility problems. (http://www.nice.org.uk/download.aspx?o=CG011fullguideline)
Nederlandse Vereniging voor Obstetrie en Obstetrie (2004) Orienterend fertiliteitsonderzoek [NVOG guideline diagnostic fertility workup]
Steyerberg EW, Eijkemans MJ, Harrell FE Jr., Habbema JD. Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets. Stat Med 2000; 19:1059-1079.
Steyerberg EW, Harrell FE Jr., Borsboom GJ, Eijkemans MJ, Vergouwe Y, Habbema JD. Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J Clin Epidemiol 2001; 54:774-781.
Van der Heijden GJ, Donders AR, Stijnen T, Moons KG. Imputation of missing values is superior to complete case analysis and the missing-indicator method in multivariable diagnostic research: a clinical example. J Clin Epidemiol 2006; 59:1102-1109.
Van der Steeg JW, Steures P, Eijkemans MJ, Habbema JD, Hompes PG, Broekmans FJ, van Dessel HJ, Bossuyt PM, van der Veen F, Mol BW. Pregnancy is predictable: a large-scale prospective external validation of the prediction of spontaneous pregnancy in subfertile. Hum Reprod 2007; 22:536-342.
23138 Coppus.indd 96 28-09-12 14:55
23138 Coppus.indd 97 28-09-12 14:55
23138 Coppus.indd 98 28-09-12 14:55
6The capacity of
hysterosalpingography and
laparoscopy to predict natural
conception.
Harold VerhoeveSjors Coppus
Jan Willem van der SteegPieternel Steures
Peter HompesPetra BourdrezPatrick Bossuyt
Fulco van der VeenBen Willem Mol
Human Reproduction 2011, 26: 134-142
The contents of this article were presented as a poster presentation at the 63nd ASRM Annual Meeting, 2007, Washington, USA.
23138 Coppus.indd 99 28-09-12 14:55
Chapter 6
100
Abstract
BACKGROUND: Laparoscopy has been claimed to be superior to
hysterosalpingography (HSG) in predicting fertility. Whether this conclusion
is applicable to a general subfertile population can be questioned as data in
support of this claim were collected in third line centres. The aim of this study
was to assess the prognostic capacity of HSG and laparoscopy in a general
subfertile population.
METHODS: In 38 centres, we prospectively studied a cohort of patients
referred for subfertility between 2002 and 2004, who underwent HSG and/or
laparoscopy as part of their subfertility work-up. Follow-up started immediately
after tubal testing and ended 12 months thereafter. Time to pregnancy was
censored at the date of last contact, when the woman was not pregnant or at
the start of treatment. Kaplan-Meier curves for the occurrence of spontaneous
intrauterine pregnancy were constructed for patients without tubal pathology,
for those with unilateral tubal pathology and for patients with bilateral tubal
pathology at HSG or laparoscopy. Multivariable Cox regression analysis was
used to calculate fecundity rate ratios (FRRs) to express associations between
tubal pathology and the occurrence of an intrauterine pregnancy.
RESULTS: Of the 3301 included patients, 2043 underwent HSG only, 747
underwent diagnostic laparoscopy only and 511 underwent both. At HSG, 322
(13%) patients showed unilateral tubal pathology and 135 (5%) showed bilateral
tubal pathology. At laparoscopy, 167 (13%) showed unilateral tubal pathology
and 215 (17%) showed bilateral tubal pathology. Multivariable analysis resulted
in FRRs of 0.81 (95% confidence interval (CI): 0.59-1.1) for unilateral, and 0.28
(95% CI: 0.13-0.59) for bilateral, tubal pathology at HSG. The FRRs at laparoscopy
were 0.85 (95% CI: 0.47-1.52) for unilateral, and 0.24 (95% CI: 0.11-0.54) for
bilateral, tubal pathology.
CONCLUSIONS: Patients with unilateral tubal pathology at HSG or laparoscopy
had a moderate reduction in pregnancy chances, whereas those with bilateral
tubal pathology at HSG or laparoscopy had a severe reduction in pregnancy
chances. This reduction was similar for HSG and laparoscopy, suggesting
that HSG and laparoscopy have a comparable predictive capacity for natural
conception.
23138 Coppus.indd 100 28-09-12 14:55
Prognostic significance of HSG and laparoscopy
101
6
Introduction
The prevalence of tubal pathology in subfertile populations varies between 11 and
30% and depends on whether one deals with a population in a primary, secondary
or tertiary setting, with the lowest prevalence in a primary care setting (Hull et al.,
1985; Collins et al., 1995; Snick et al., 1997). Chlamydia trachomatis is a major cause
for tubal pathology. Chlamydia infection is often asymptomatic and untreated
Chlamydia can progress to pelvic inflammatory disease (PID), infertility, ectopic
pregnancy, and chronic pelvic pain. An increase in the prevalence of Chlamydia
infections has been observed in women in their reproductive phase of life (Rekart
and Brunham, 2008) and hence tubal pathology remains an important cause of
subfertility.
A number of diagnostic tests are being used in clinical practice to assess tubal
patency as part of the work-up for subfertility. The most commonly used tests
are hysterosalpingography (HSG) and laparoscopy. HSG and Chlamydia Antibody
Titer test (CAT) are often seen as screening tests for the presence of tubal
pathology (den Hartog et al., 2008), whereas laparoscopy is considered the clinical
reference test for diagnosing tubal pathology (Swart et al.,1995; Mol et al., 1999b).
Laparoscopy allows visualization of peri-adnexal adhesions and the presence of
endometriosis, which cannot be done with HSG. Because of their invasive nature,
HSG and laparoscopy are usually reserved as the last investigation in the fertility
work-up. The findings at these diagnostic tests should translate into a prognosis
of natural pregnancy chances, which can be used to counsel patients on whether
their pregnancy chances can be improved by surgery, IUI or IVF.
The relative merits of HSG and laparoscopy for assessing tubal status have
been discussed for many years, reflecting a lack of agreement amongst fertility
subspecialists on which diagnostic tests have to be performed and their prognostic
utility (Helmerhorst et al., 1995; Balash, 2000; Fatum et al., 2002; Lavy et al., 2004;
Perquin et al., 2006).
Some studies have shown that patients with unilateral tubal pathology at HSG do
not have reduced chances of a treatment-independent pregnancy, in contrast to
those with bilateral tubal pathology (Mol et al., 1997). For diagnostic laparoscopy
(DLS) and dye, a moderate reduction of natural pregnancy chances in case
23138 Coppus.indd 101 28-09-12 14:55
Chapter 6
102
of unilateral tubal pathology and a severe reduction in case of bilateral tubal
pathology have been observed (Mol et al., 1999a). However these studies were
retrospective in design, relatively small and had included patients visiting a tertiary
referral hospital. In another study, laparoscopy was found to be a better predictor
of future fertility than HSG (Mol et al., 1999b). This study analysed data of couples
who underwent both HSG and laparoscopy and also consisted of a tertiary care
population. The conclusion that laparoscopy is a better predictor of infertility than
HSG was weakened by the fact that the median interval to laparoscopy after a
normal HSG was 10 months, compared with 4.5 months for women in whom the
HSG showed two-sided tubal abnormalities.
It is not clear if the results of these studies are applicable to the general subfertile
population. The purpose of the study presented here was to evaluate the impact
of unilateral and bilateral tubal pathology at HSG and laparoscopy on treatment-
independent pregnancy rates in a large prospective cohort of subfertile ovulatory
women from mixed secondary and tertiary hospital population. In particular, we
evaluated whether a difference in prognostic capacity exists between HSG and
laparoscopy.
Materials and Methods
Between January 2002 and February 2004, consecutive couples presenting at
the fertility clinic of 38 centres in The Netherlands were asked to participate in
a prospective cohort study. The study was approved by the Institutional Review
Board in each institution. All couples underwent a basic fertility work-up according
to the guidelines of the Dutch Society of Obstetrics and Gynaecology. The details
of this work-up have been described previously (van der Steeg et al., 2007).
PatientsThe present analysis was limited to couples with a regular ovulatory cycle, defined
as a cycle length between 23 and 35 days, with a within cycle variation of less
than 8 days. None of the patients analysed used Clomiphene Citrate. Ovulation
was detected by a basal body temperature chart, midluteal serum progesterone,
or by ultrasonographic monitoring of the cycle. Couples with a history of reversal
of sterilization, tubal surgery, IVF, or previous tubal patency testing were excluded.
23138 Coppus.indd 102 28-09-12 14:55
Prognostic significance of HSG and laparoscopy
103
6
Only those women who underwent HSG and/or laparoscopy as part of their
fertility work-up were included in the analysis presented here.
Duration of subfertility was defined as the period between the time the couple had
started trying to conceive and the time of tubal testing. Female age was calculated
at the time of HSG or laparoscopy. Subfertility was considered to be secondary
if a woman had conceived in this or in a previous partnership, regardless of the
pregnancy outcome.
In all male partners, at least one semen analysis was performed. The total motile
sperm count (TMC) was calculated by multiplying semen volume, sperm
concentration and percentage of motile spermatozoa. Couples in whom semen
analysis showed a severe impairment of semen quality requiring IVF-ICSI (defined
as a total motile count < 1*106), were also excluded from the present analysis.
Tubal testing and follow-upClinics were free to use their own tubal testing protocol. In general, three different
protocols could be distinguished. With the first strategy, tubal patency was
routinely evaluated early in the work-up with either HSG or laparoscopy. In case
of abnormal findings at HSG, a laparoscopy was planned to verify these findings.
With the second strategy, tubal pathology was considered to be absent in case
of a negative CAT and HSG or laparoscopy was only performed in CAT-positive
women. With the third strategy, CAT-negative women were evaluated with HSG,
while laparoscopy was planned in CAT-positive women.
HSG was performed in the follicular phase of the menstrual cycle with either a
water-soluble or oil-based contrast medium. Patients were in a supine position
during the whole procedure. X-ray photographs were taken. Spasmolytic
drugs were allowed to be used. Each HSG was evaluated by a fertility specialist.
Laparoscopy was performed with a double-puncture technique. Methylene blue
was injected at room temperature through a Foley catheter in the uterine cavity.
The amount of methylene blue injected was variable, depending on the time
necessary to assess tubal function. Findings at HSG were classified as no tubal
occlusion, one-sided tubal occlusion or two-sided tubal occlusion and impaired
flow of contrast if contrast was not shown beyond the isthmic portion of the tube.
Findings at laparoscopy were classified as normal, one-sided tubal occlusion or
23138 Coppus.indd 103 28-09-12 14:55
Chapter 6
104
two-sided tubal occlusion and proximal- or distal tubal occlusion. Additional tubal
pathology at laparoscopy, i.e. adhesions disturbing ovum pick-up were classified
separately.
Follow-up started after tubal testing and ended 12 months thereafter. A model was
used to calculate the chances of natural conception (Hunault et al., 2004). Expectant
management was advised if the fertility work-up showed no abnormalities and
couples had a probability of natural conception in the next 12 months of 40% or
higher. Treatment was generally advised to those with a probability below 30%.
For all women lost to follow-up, the general practitioner was sent a questionnaire
and asked about the fertility status of the couple.
AnalysisThe primary end-point in this study was a spontaneous ongoing pregnancy at
12 weeks of gestational age, confirmed by ultrasonography. The first day of the
last menstrual cycle was considered to mark the end of time until spontaneous
conception. Time to pregnancy was censored at the moment treatment (IUI, IVF
or tubal surgery) started within 12 months after counseling, or at the last date of
contact during follow-up, when the couple had no ongoing pregnancy.
To examine if proximal occlusions identified at HSG or laparoscopy had a different
prognostic effect on treatment-independent pregnancy than distal occlusion /
hydrosalpinx or hydrosalpinges, ongoing pregnancy rates were scored in relation
to the findings of tubal testing at HSG and laparoscopy. We classified the findings
at HSG and laparoscopy into three groups and constructed Kaplan-Meier curves
for each i.e. for women without tubal pathology, for those with one-sided tubal
pathology and for women with two-sided tubal pathology, irrespective of their
nature. Subsequently, adjusted fecundity rate ratios (FRRs) and 95% confidence
intervals (CIs) were calculated through multivariable Cox regression modeling.
These FRRs express the instantaneous probability of an ongoing pregnancy for
women with a particular feature relative to the probability in those without that
feature. All analyses were stratified according to centre, to adjust for any potential
clustering effects of type of tubal work-up and for differences in prognostic profile
of women presenting to the participating clinics (Harrell, 2001).
23138 Coppus.indd 104 28-09-12 14:55
Prognostic significance of HSG and laparoscopy
105
6
In case a woman had been evaluated both with HSG and DLS, the results of these
were evaluated separately, i.e. the result of the respective tubal test was used in
the analysis, not the definitive diagnosis regarding tubal pathology in an individual
woman. In those women who underwent both HSG and laparoscopy, we classified
tubal pathology as no abnormalities, unilateral pathology and bilateral pathology
and subsequently constructed 3 x 3 tables to compare tubal occlusion detected at
HSG with occlusion detected at laparoscopy.
Results
Data of 7860 couples were collected. Of these, 938 women (12%) did not have a
regular ovulatory cycle, 636 men (8%) had severely impaired semen quality and
196 couples (2.5%) had previously undergone fertility surgery or IVF. In another
568 women (7%) HSG or laparoscopy had been performed in a previous episode
of subfertility. A further 2221 women (28%) did not undergo HSG or DLS as part of
their fertility work-up, leaving 3301 women (42%) for this analysis (Fig. 1).
The baseline characteristics of the 3301 couples are shown in Table I. Median
duration of subfertility at HSG was 1.8 years (5th-95th percentiles: 1.0-4.3 years) and
at laparoscopy 2.2 years (5th-95th percentiles: 1.2-5.0 years). The median time from
the first visit to HSG was 3.0 months (5th-95th percentiles: 1-11.4), the median time
from the first visit to laparoscopy was 5.9 months (5th-95th percentile: 1.4-16.8). Of
the 3301 women in the analysis, 2043 underwent HSG, 747 women
underwent DLS and in 511 women the tubal status was evaluated with both.
In 1626 of the HSGs (64%) a water-based contrast medium was used and in 359
of the HSG’s (14%) an oil-based contrast medium was used; in 569 (22%) it was
unknown which medium was used. The findings at HSG and laparoscopy and
occurrence of ongoing pregnancies are shown in Tables II and III. The data we
collected allowed us to make a distinction between proximal and distal occlusion/
hydrosalpinx for laparoscopy. In the collected data for HSG, the only distinction we
were allowed to make was between no occlusion, occlusion and impaired flow of
contrast, by which was meant the absence of flow of contrast beyond the isthmic
portion of the fallopian tube. No distinction could be made between proximal and
distal occlusion.
23138 Coppus.indd 105 28-09-12 14:55
Chapter 6
106
TABLE I. Baseline characteristics of 3301 couples at time of procedure
HSG (n=2554) Diagnostic laparoscopy (n=1258)
Mean/Median* 5th-95th percentiles Mean/Median* 5th-95th percentiles
Male age (years) 35.4 28.0 – 45.0 35.1 27.6 – 44.8
Duration of subfertility (years) 1.8* 1.0 – 4.3 2.2* 1.2 – 5.0
Semen analysis – TMC (106) 57.6* 4.0 – 308 59.5* 4.8 – 274
Subfertility, primary (n) (%)Time to tubal test (months) †
15683.0*
61.41.0- 11.4
8065.9*
64.11.4- 16.8
TMC, total motile sperm count.*Value is the median.† time from first consultation to tubal testing
Of the women who had a HSG, 186 (7%) were lost to follow-up and of those who
had a laparoscopy 88 (7%) were lost to follow-up. A natural conception within one
year occurred in 454 (18%) of the women after HSG, in 126 (10%) after laparoscopy
and in 48 (9.4%) of those who underwent both HSG and laparoscopy, which would
correspond with cumulative treatment-independent pregnancy rates of 33% after
HSG and 23% after laparoscopy (Fig. 2). The number of patients, who were still
under study at 6 and 12 months, respectively was 800 and 240 after HSG. For
laparoscopy these numbers were 243 and 81, respectively. In those with unilateral
tubal pathology, these numbers were 145 and 57 after HSG and 46 and 14 after
laparoscopy. For patients with bilateral tubal pathology, the numbers remaining
for analysis at 6 and 12 months were 68 and 20 after HSG in comparison to 74 and
24 after laparoscopy.
TABLE II. Fertility outcome after HSG
Findings at HSG Frequency (%) Number of ongoing
pregnancies
FRR
Bilateral normal 2097 (82%) 350 1
Unilateral impaired flow of contrast * 84(3%) 10 0.68 (0.36-1.3)
Unilateral occlusion 238 (9%) 38 0.89 (0.63-1.3)
Unilateral occlusion, contra-lateral impaired flow of contrast*
25 (1%) 2 0.36 (0.09-1.5)
Bilateral impaired flow of contrast* 51 (2%) 2 0.19 (0.05-0.80)
Bilateral occlusion 59 (2%) 3 0.26 (0.08-0.80)
FFR = Fecundity Rate Ratio*Impaired flow meaning no flow of contrast beyond the isthmic portion of the tube.
In 511 patients both HSG and laparoscopy were performed. Tubal status detected
at HSG compared to tubal status detected at laparoscopy is shown in table IV. HSG
showed one-sided tubal occlusion in 153 (30%) and two-sided tubal occlusion
23138 Coppus.indd 106 28-09-12 14:55
Prognostic significance of HSG and laparoscopy
107
6
in 82 (16%) of these patients. Laparoscopy showed one-sided tubal occlusion in
79 (15%) and two-sided tubal occlusion in 56 (11%) of these patients. In those
patients where HSG showed one-sided occlusion, laparoscopy revealed no
occlusion in 60%, whereas if HSG showed two-sided occlusion, laparoscopy
revealed no occlusion in 44%. If laparoscopy showed one-sided occlusion, HSG
revealed no occlusion in 22% of these patients and if laparoscopy showed two-
sided occlusion, HSG showed no occlusion in 23% of cases.
TABLE III. Fertility outcome after laparoscopy.
Findings at laparoscopy Frequency (%)
Number of ongoing pregnancies
FRR
Bilateral normal 876 (70%) 90 1
Unilateral proximal occlusion 126 (10%) 12 0.58 (0.44-1.7)
Unilateral hydrosalpinx/distal occlusion 25 (2%) 3 1.40 (0.43-4.5)
Unilateral proximal occlusion, contralateral hydrosalpinx/distal occlusion
21 (2%) 3 1.83 (0.56-6.0)
Bilateral proximal occlusion 114 (9%) 3 0.19 (0.06-0.62)
Bilateral hydrosalpinx/distal occlusion 23 (2%) 1 0.34 (0.05-2.5)
Unilateral proximal occlusion/ hydrosalpinx/distal occlusion, contra-lateral adhesions
32 (3%) 1 0.58 (0.08-4.4)
Bilateral patent, unilateral adhesions 16 (1%) 0 0 (0-NE)
Bilateral patent, bilateral adhesions 25 (2%) 0 0 (0-NE)
FFR = Fecundity Rate RatioNE = not estimable
After HSG, the presence of unilateral impaired flow and unilateral occlusion
showed no significant difference in FRR (0.68 and 0.89, respectively, with wide and
overlapping CIs). Similarly, a small difference in FRR (0.19 and 0.26) between bilateral
impaired flow and occlusion (which included proximal and distal occlusion) was
seen (table II). For this reason impaired flow of contrast was considered as complete
occlusion, because this did not prove tubal patency.
With laparoscopy, we found no differences in prognostic outcome for proximal
occlusions identified at laparoscopy compared with distal occlusion / hydrosalpinx
or hydrosalpinges. For unilateral proximal occlusion the FRR was 0.58 and for distal
unilateral occlusion 1.40, but both showed wide and overlapping CIs. For bilateral
proximal occlusion the FRR was 0.19 compared to 0.34 in case of bilateral distal
occlusion, with a wide and overlapping CI in the latter (Table III). We therefore
classified tubal pathology into three main categories, namely no tubal pathology,
unilateral tubal pathology and bilateral tubal pathology.
23138 Coppus.indd 107 28-09-12 14:55
Chapter 6
108
Using this classification, the results of the uni- and multivariable Cox regression
analysis are shown in Table V. At HSG the prevalence of one-sided tubal pathology
was 13% (322 women), and of bilateral tubal pathology 5.3% (135 women). At
laparoscopy, the prevalence of one-sided tubal pathology was 13% (167 women)
and of bilateral tubal pathology 17% (215 women).
TABLE IV. Tubal status detected at HSG as compared to the tubal status detected at laparoscopy.
Laparoscopy Two-sided occlusion One-sided occlusion No occlusion Total
HSG
Two-sided occlusion 31 15 36 82
One-sided occlusion 12 47 94 153
No occlusion 13 17 246 276
Total 56 79 376 511
Bilateral tubal pathology at HSG or laparoscopy was associated with a lower
probability of treatment-independent pregnancy, with an adjusted FRR of 0.28
(95% CI: 0.13-0.59) for HSG and 0.24 (95% CI: 0.11-0.54) for laparoscopy. Unilateral
tubal pathology at HSG or laparoscopy affected the probability of treatment-
independent pregnancy slightly, with an adjusted FRR of 0.81 (95% CI: 0.59-1.11)
for HSG and 0.85 (95% CI: 0.47-1.52) for laparoscopy. A comparable reduction in
the probability of treatment-independent pregnancy was seen for duration of
subfertility with a FRR of 0.90 (95% CI: 0.90 to 1.00) and 0.78 (95% CI: 0.64 to 0.95)
for HSG and laparoscopy respectively. Kaplan-Meier analyses of the cumulative
probability of spontaneous intrauterine pregnancy up to I year are shown in Fig. 2a
for HSG and Fig. 2b for laparoscopy. For HSG the cumulative pregnancy rate after
finding unilateral tubal pathology was slightly reduced, but after bilateral tubal
pathology the cumulative pregnancy rate was markedly reduced. Similar results
were seen for laparoscopy.
23138 Coppus.indd 108 28-09-12 14:55
Prognostic significance of HSG and laparoscopy
109
6
cum
ulat
ive
ongo
ing
pre
gnac
y ra
te (%
)40
30
20
10
0 0 2 4
time after HSG (months)6 8 10 12
FIGURE 2A. Cumulative ongoing pregnancy rate after HSG.
Results of the Kaplan-Meier analysis for HSG. Black line represents bilateral normal HSG. Light grey line represents one-sided abnormal HSG. Dark grey line represents women with a bilateral abnormal HSG. Crosses indicate censored data.
cum
ulat
ive
ongo
ing
pre
gnac
y ra
te (%
)
40
30
20
10
0 0 2 4
time after laparoscopy (months)6 8 10 12
FIGURE 2B. Cumulative ongoing pregnancy rate after laparoscopy.
Results of the Kaplan-Meier analysis for laparoscopy (DLS). Black line represents bilateral normal DLS. light grey line represents one-sided abnormal DLS. Dark grey line represents women with a bilateral abnormal DLS. Crosses indicate censored data.
23138 Coppus.indd 109 28-09-12 14:55
Chapter 6
110
TABLE V. Results of the uni- and multivariable Cox regression analysis
N (%)
Univariable analysis Multivariable analysis
HSG HSG
FRR 95% CI P-value FRR 95% CI P-value
No tubal pathology 2097 (82) 1 - - 1 - -
Unilateral tubal pathology
322 (13) 0.83 0.61-1.13 0.25 0.81 0.59-1.11 0.19
Bilateral tubal pathology
135 (5) 0.25 0.12-0.54 <0.001 0.28 0.13-0.59 0.001
Duration of subfertility (years)
0.89 0.89-0.99 0.02 0.90 0.90-1.00 0.05
Female age (years) 0.95 0.93-0.98 <0.001 0.95 0.93-0.97 <0.001
Secondary subfertility 1.41 1.15-1.72 <0.001 1.53 1.24-1.88 <0.001
N (%)
Laparoscopy Laparoscopy
FRR 95% CI P-value FRR 95% CI P-value
No tubal pathology 876 (70) 1 - - 1 - -
Unilateral tubal pathology
167 (13) 0.90 0.51-1.61 0.73 0.85 0.47-1.52 0.58
Bilateral tubal pathology
215 (17) 0.26 0.12-0.57 0.001 0.24 0.11-0.54 0.001
Duration of subfertility (years)
0.80 0.66-0.97 0.03 0.78 0.64-0.95 0.02
Female age (years) 1.02 0.97-1.08 0.37 1.01 0.96-1.07 0.59
Secondary subfertility 1.66 1.13-2.46 0.01 1.78 1.19-2.66 0.005
Discussion
In this study we compared findings at HSG and laparoscopy with fertility outcome
in untreated subfertile couples, from the time of tubal testing up to I year of follow-
up. For HSG as well as for laparoscopy, patients with two-sided tubal pathology
had significantly worse fertility prospects, whereas fertility prospects in those
with one-sided tubal pathology were only moderately worse than those without
tubal pathology. Both imaging tests showed a comparable reduction in FRRs for
unilateral as well as for bilateral tubal pathology.
Because proximal tubal pathology (occlusion as well as impaired flow) is more
likely to be due to artefacts, such as tubal spasm or ’steal effect’ than distal
tubal occlusion or hydrosalpinx, we analysed whether there was a difference in
FRR between proximal and distal tubal pathology for the detection of natural
pregnancy, but we were unable to detect such a difference. For this reason we
classified the findings at HSG and laparoscopy into three main groups, namely
23138 Coppus.indd 110 28-09-12 14:55
Prognostic significance of HSG and laparoscopy
111
6
no tubal pathology, unilateral tubal pathology and bilateral tubal pathology to
calculate the FRRs.
The fertility rate ratios for two-sided tubal pathology are in accordance with results
from our previous small retrospective Dutch cohort study, which examined the
prognostic value of HSG for fertility outcome, as well as with data of the previous
large prospective Canadian study (Mol et al., 1997, 1999b). However, in the present
study we found a lower FRR in case of bilateral tubal pathology at HSG, compared
with the Canadian cohort. This could be due to the lower prevalence of bilateral
tubal pathology at HSG (5% compared to 24% in the Canadian study) whereas
the prevalence of unilateral tubal pathology at HSG was comparable (13% versus
14%). This might be explained by the fact that in the Canadian study only patients
that underwent both HSG and laparoscopy were analysed. Patients who had an
abnormal HSG but had a natural conception before a laparoscopy was performed
were not part of the analysis. Since natural conception indicates absence of
significant tubal pathology, a selection of patients with a higher prevalence of
tubal pathology took place in this study. Although the present study showed a
similarly low FRR for bilateral tubal pathology at laparoscopy, the prognostic
value of unilateral tubal pathology identified at laparoscopy (FRR 0.85) differed
markedly from that in our previous two studies, which showed a FRR of 0.65 and
0.51 respectively (Mol et al., 1999a,b). Both studies included only women from a
tertiary patient population who had a HSG followed by laparoscopy. Laparoscopy
was usually performed shortly after an abnormal HSG, whereas for women with a
normal HSG the laparoscopy was withheld for a longer time. In the present study,
15% of the patients underwent both HSG and laparoscopy. The median time
between the first visit of a couple and laparoscopy was 5.9 months compared to a
median time in the Canadian study between HSG and laparoscopy of 10 months in
those in whom HSG showed no abnormalities, and 8.5 months in those in whom
HSG showed unilateral tubal occlusion and 4.5 months in those in whom HSG
showed bilateral tubal occlusion, respectively.
In that study there were relatively few women in whom bilateral occlusion was
found at laparoscopy after a normal HSG or one-sided tubal occlusion at HSG (5%)
(Mol et al., 1999b). The longer delay between HSG and laparoscopy in the Canadian
study resulted in a higher proportion of patients with poor prognosis, which
could have negatively influenced the FRR. Selection bias may have influenced
23138 Coppus.indd 111 28-09-12 14:55
Chapter 6
112
the outcome of these previous studies, overestimating the detection rate of tubal
pathology at laparoscopy in comparison to patients who only undergo HSG or a
laparoscopy without a previous HSG.
Of notice is the lack of agreement between HSG and laparoscopy in those patients
who underwent both imaging tests. At laparoscopy bilateral tubal occlusion was
diagnosed in 5% of the patients where previously HSG showed no occlusion.
However, in 60% of the patients where HSG showed one-sided occlusion, and in
44% where HSG showed two-sided occlusion, laparoscopy did not show any tubal
occlusion. In 23% of the patients with two-sided occlusion at laparoscopy, HSG
showed tubal patency, emphasising that laparoscopy cannot be considered the
gold standard for diagnosing tubal pathology (Mol et al. 1999b).
Our finding that the presence of tubal pathology at HSG and laparoscopy has
a similar FRR is of interest. Unilateral tubal pathology reduced the probability of
a natural conception by 20% and bilateral tubal pathology reduced pregnancy
chances by 75%, whether diagnosed at HSG or at laparoscopy. Although we
were not able to directly compare the prognostic capacity of both imaging tests,
because only 15% of the included patients underwent HSG as well as laparoscopy
and as there were several months in between these tests, our findings suggest
that the prognostic capacity of HSG and that of laparoscopy do not differ much.
This is in contrast with previous studies, which concluded that laparoscopy is a
better predictor of treatment-independent pregnancy (Mol et al., 1999b).
The results of our study are in support of the recommendation of the fertility-
guidelines of the National Institute for Clinical Excellence to use HSG to test for
tubal pathology in women who are not known to have co-morbidities (NICE,
2004). Additional advantages of HSG are that a normal HSG reduces the probability
that tubal pathology plays a role in future fertility chances, as we can see in this
study, and which confirms the findings of previous studies (Swart et al., 1995, Mol
et al., 1999b, den Hartog et al., 2008). If an oil-soluble contrast medium is used for
tubal flushing, this can have a positive effect on pregnancy rates (Luttjeboer et
al., 2007). Using HSG in women at low risk for tubal pathology limits the number
of unnecessary laparoscopies because the prevalence of tubal pathology after
normal HSG is low (Bosteels et al., 2007, den Hartog et al., 2008). A randomised
trial did not show an improved pregnancy outcome if diagnostic laparoscopy
23138 Coppus.indd 112 28-09-12 14:55
Prognostic significance of HSG and laparoscopy
113
6
was routinely performed after normal HSG and before treatment with intrauterine
insemination (Tanahatoe et al., 2005).
An important limitation of our study is that not in every patient HSG as well
as laparoscopy was performed. Patients at low risk for tubal pathology were
offered HSG and those with co-morbidities and considered to be at risk for tubal
pathology, were offered laparoscopy. Furthermore, the interpretation of this study,
like previous studies, is hampered by selection bias caused by a difference in
timing of the imaging tests. It is not clear to what extent these limitations influence
the outcome of our study. Correction for these requires a study in which women
are offered CAT, HSG and laparoscopy as part of their subfertility work-up. These
tests should preferably be performed on the same day, or with a minimal delay in
between these tests. To address the clinical consequences of mild tubal pathology,
patients with mild tubal pathology should then be randomized to compare
natural conception with IUI and IVF. Until such a trial has been conducted, the
recommendations regarding expectant management versus treatment, in case of
unilateral pathology, depends on several factors such as age of the woman and
duration of subfertility. In patients with unexplained subfertility, the prognostic
model by Hunault et al. (2004) following tubal testing is advised. The most
important prognostic factors are age of the woman, duration of subfertility
and whether or not the couple has ever had a pregnancy before consultation.
If the chances for natural conception are above 30%, expectant management is
generally recommended. Below 30%, treatment is advised (Steures et al., 2006).
23138 Coppus.indd 113 28-09-12 14:55
Chapter 6
114
References
Balasch J. Investigation of the infertile couple: investigation of the infertile couple in the era of assisted reproductive technology: a time for reappraisal. Hum Reprod 2000; 15: 2251-2257.
Bosteels J, Van Herendael B, Weyers S, D’Hooghe T. The position of diagnostic laparoscopy in current fertility practice. Hum Reprod Update 2007; 13:477-485.
Collins JA, Burrows EA, Wilan AR. The prognosis for live birth among untreated infertile couples. Fertil Steril 1995; 64:22-28.
Den Hartog JE, Lardenoije CM, Severens JL, Land JA, Evers JL, Kessels AG. Screening strategies for tubal factor subfertility. Hum Reprod 2008; 23:1840-1848.
Fatum M, Laufer N, Simon A. Investigation of the infertile couple: should diagnostic laparoscopy be performed after normal hysterosalpingography in treating infertility suspected to be of unknown origin? Hum Reprod 2002; 17:1-3.
Harrell FE jr. Regression Modeling Strategies. With Applications to Linear Models, Logistic Regression and Survival Analysis. Springer Series in Statistics. New York, USA: Springer Science Inc, 2001.
Helmerhorst FM, Oei SG, Bloemenkamp KW and Keirse MJ. Consistancy and variation in infertility investigations in Europe. Hum Reprod 1995; 10:2027-2030.
Hull MG, Glazener CM, Kelly NJ, Conway DI, Foster PA, Hinton RA, Coulson C, Lambert PA, Watt EM, Desai KM. Population study of causes, treatment, and outcome of infertility. Br Med J (Clin Res Ed) 1985; 291:1693-1697.
Hunault CC, Habbema JD, Eijkemans MJ, Collins JA, Evers JL, te Velde ER. Two new prediction rules for spontaneous pregnancy leading to live birth among subfertile couples, based on the synthesis of three previous models. Hum Reprod 2004; 19:2019-2026.
Luttjeboer FY, Harada T, Hughes E, Johnson N, Lilford R, Mol BW. Tubal flushing for subfertility. Cochrane Database Syst Rev 2007; 3:CD003718.
Lavy Y, Lev-Sagie A, Holtzer H, Revel A, Hurwitz A. Should laparoscopy be a mandatory component of the infertility evaluation in infertile women with normal hysterosalpingogram or suspected unilateral distal tubal pathology. Eur J Obstet Gynecol Reprod Biol 2004; 114:64-68.
Mol BW, Swart P, Bossuyt PM, van der Veen F. Is hysterosalpingography an important tool in predicting fertility outcome? Fertil Steril 1997; 67:663-669.
Mol BW, Swart P, Bossuyt PM, van der Veen F. Prognostic significance of diagnostic laparoscopy for spontaneous fertility. J Reprod Med 1999a; 44:81-86.
Mol BW, Collins JA, Burrows EA, Van der Veen F, Bossuyt PM. Comparison of hysterosalpingography and laparoscopy in predicting fertility outcome. Hum Reprod 1999b; 14: 1237-1242.
National Institute for Clinical Excellence. Fertility: Assessment and Treatment for People with Fertility Problems. http://www.nice.org.uk, 2004
Perquin DA, Dorr PJ, de Craen AJ, Helmerhorst FM. Routine use of hysterosalpingography prior to laparoscopy in the fertility workup: a multicentre randomised controlled trial. Hum Reprod 2006. 21: 1227-1231.
Rekart ML, Brunham RC. Epidemiology of chlamydial infection: are we losing ground? Sex Transm Infect 2008; 84:87-91.
23138 Coppus.indd 114 28-09-12 14:55
Prognostic significance of HSG and laparoscopy
115
6
Snick HK, Snick TS, Evers JL, Collins JA. The spontaneous pregnancy prognosis in untreated subfertile couple: the Walcheren primary care study. Hum Reprod 1997; 12:1582-1588.
Swart P, Mol BW, van der Veen F, van Beurden M, Redekop WK, Bossuyt PM. The value of hysterosalpingography in the diagnosis of tubal pathology, a meta-analysis. Fertil Steril 1995; 64:486-491.
Van der Steeg JW, Steures P, Eijkemans MJ, Habbema JD, Hompes PG, Broekmans FJ, van Dessel HJ, Bossuyt PM, van der Veen F, Mol BW. Pregnancy is predictable: a large-scale prospective external validation of the prediction of spontaneous pregnancy in subfertile couples. Hum Reprod 2007; 22:536-542.
Steures P, van der Steeg JW, Hompes PG, Habbema JD, Eijkemans MJ, Broekmans FJ, Verhoeve HR, Bossuyt PM, van der Veen F, Mol BW. Intrauterine insemination with controlled ovarian hyperstimulation versus expectant management for couples with unexplained subfertility and an intermediate prognosis: a randomised clinical trial. Lancet 2006; 368:216-221.
Tanahatoe S, Lambalk CB, Hompes PG. The role of laparoscopy in intrauterine insemination: a prospective randomized reallocation study. Hum Reprod 2005; 20: 3225-3230.
23138 Coppus.indd 115 28-09-12 14:55
23138 Coppus.indd 116 28-09-12 14:55
7Chlamydia trachomatis IgG
seropositivity is associated with lower
natural conception rates in ovulatory
women without visible tubal
pathology.
Sjors CoppusJolande Land
Brent OpmeerPieternel SteuresRene Eijkemans
Peter HompesFulco van der Veen
Ben Willem MolJan Willem van der Steeg
Human Reproduction 2011, 26: 3061-3071
The contents of this article were presented as an oral presentation at the 25th ESHRE Annual Meeting, 2009, Amsterdam, Netherlands.
23138 Coppus.indd 117 28-09-12 14:55
Chapter 7
118
Abstract
BACKGROUND: The relation between Chlamydia trachomatis infection and
subsequent tubal damage is widely recognized. As such, C. trachomatis
antibody (CAT) testing can be used to triage women for immediate tubal
testing with hysterosalpingography (HSG) or laparoscopy. However, once
invasive tubal testing has ruled out tubal pathology, CAT serology status is
ignored, as its clinical significance is currently unknown. This study aimed
to determine whether positive CAT serology is associated with lower
spontaneous pregnancy rates in women in whom HSG and/or diagnostic
laparoscopy showed no visible tubal pathology.
METHODS: We studied ovulatory women in whom HSG or laparoscopy
showed patent tubes. Women were tested for C. trachomatis immunoglobulin
G (IgG) antibodies with either micro-immunofluorescence (MIF) or an ELISA.
CAT serology was positive if the MIF titre was ≥ 1:32 or if the ELISA index was
> 1.1. The proportion of couples pregnant without treatment was estimated
at 12 months of follow-up. Time to pregnancy was considered censored at
the date of the last contact when the woman was not pregnant or at the
start of treatment. The association between CAT positivity and an ongoing
pregnancy was evaluated with Cox regression analyses.
RESULTS: Of the 1882 included women without visible tubal pathology, 338
(18%) had a treatment- independent pregnancy within 1 year [estimated
cumulative pregnancy rate 31%; 95% confidence interval (CI): 27%-35%].
Because of differential censoring after 9 months of follow-up, regression
analyses were limited to the first 9 months after tubal testing. Positive C.
trachomatis IgG serology was associated with a statistically significant 33%
lower probability of an ongoing pregnancy [adjusted fecundity rate ratio
0.66 (95% CI 0.49 to 0.89)].
CONCLUSIONS: Even after HSG or laparoscopy has shown no visible tubal
pathology, subfertile women with a positive CAT have lower pregnancy
chances than CAT negative women. After external validation, this finding
could be incorporated into existing prognostic models.
23138 Coppus.indd 118 28-09-12 14:55
CAT testing and natural conception
119
7
Introduction
The devastating effect of Chlamydia trachomatis infections on the female
reproductive tract is widely recognized (Paavonen and Eggert-Kruse, 1999). The
diagnostic value of C. trachomatis immunoglobulin G (IgG) antibody titre (CAT)
testing has been studied extensively, showing fairly good screening properties. In
the Netherlands, this has resulted in a recommendation to include CAT testing as
part of the routine fertility workup of subfertile couples (NVOG, 2004).
CAT is mainly used to identify women at high risk of tubal pathology and to
triage them for further tubal testing (Coppus et al., 2007a). CAT serology status is
ignored however, once invasive testing has ruled out tubal pathology, as its clinical
significance is currently unknown. The causal relationship between Chlamydia
infections and tubal factor subfertility has been well established, but the question
whether positive CAT serology is associated with lower pregnancy rates has
been studied less often. It has been postulated that Chlamydia infections have
a detrimental effect on the endometrium resulting in impaired implantation and
lower pregnancy rates (Witkin, 1999; Neuer et al., 2000). Yet studies examining the
association between elevated C. trachomatis IgG antibody levels and pregnancy
rates in an IVF population were not uniformly conclusive. Whereas some studies
suggested a lower implantation and a lower ongoing pregnancy rate (Rowland
et al., 1985; Lunenfeld et al., 1989; Witkin et al., 1994; Pacchiarotti et al., 2009), other
work suggested that IVF outcome was not associated with CAT status (Osser et al.,
1990; Tasdemir et al., 1994; Claman et al., 1996; Sharara et al., 1997; Spandorfer et al.,
1999).
The prognostic value of C. trachomatis antibodies on spontaneous pregnancy
rates in subfertile women is equally uncertain. To the best of our knowledge, four
previous studies have examined this relation, with conflicting results (Idahl et al.,
2004; Kelz et al., 2006; Perquin et al., 2007; El Hakim et al., 2009). As a consequence, it
is still unclear whether positive CAT serology is associated with lower spontaneous
pregnancy rates once invasive tubal testing has ruled out tubal pathology. The
objective of this study was to assess in a prospective consecutive series of subfertile
couples, whether evidence of a past Chlamydia infection affects the probability of
spontaneous pregnancy in women without visible tubal pathology.
23138 Coppus.indd 119 28-09-12 14:55
Chapter 7
120
Materials and Methods
PatientsBetween January 2002 and February 2004, consecutive couples presenting at the
fertility clinics of 38 hospitals in The Netherlands were invited to participate in a
prospective cohort study. Institutional Review Board approval for this study was
obtained from each participating hospital. For all couples, a thorough medical
history was taken, including previous pelvic inflammatory disease (PID) and
Chlamydia infection (Coppus et al., 2007b), and all couples underwent a basic
fertility work-up according to the guidelines of the Dutch Society of Obstetrics and
Gynaecology. The details of this workup have been described in detail elsewhere
(van der Steeg et al., 2007).
Couples with a history of reversal of tubal ligation, tubal surgery, IVF, or previous
tubal patency testing were excluded. The present study was limited to couples
with a regular ovulatory cycle, defined as a cycle length between 23 and 35 days
with a within cycle variation of less than 8 days. Ovulation was confirmed by a basal
body temperature chart, mid-luteal serum progesterone, or by ultrasonographic
monitoring of the cycle. Duration of subfertility was defined as the period between
the date the couple had started unprotected intercourse and date of tubal testing.
We calculated female age at the time of tubal testing. Subfertility was considered
to be secondary if a woman had conceived in this or in a previous partnership,
regardless of the pregnancy outcome.
Semen analysis was performed at least once, and a total motile sperm count
(TMC) was calculated by multiplying semen volume, semen concentration and
percentage of progressive motile spermatozoa. Couples in whom semen analysis
showed a severe impairment of semen quality requiring IVF-ICSI (defined as a TMC
< 1 x 106) were excluded from the present analysis.
Chlamydia antibody testing was performed either with a micro-immuno-
fluorescence (MIF) technique (BioMerieux, Paris, France) or with an ELISA (Medac
GmbH, Wedel, Germany; Savyon Diagnostics LTD, Marne La Valle, France). CAT
testing was performed according to the manufacturer’s instructions and the test
result was considered positive at an MIF titre of ≥ 1:32 or an ELISA index of >1.1
(NVOG, 2004). Endocervical swabs were not routinely taken.
23138 Coppus.indd 120 28-09-12 14:55
CAT testing and natural conception
121
7
Tubal patency was evaluated by hysterosalpingography (HSG) and/or diagnostic
laparoscopy, depending on the local protocol of the participating hospital. Tubal
pathology was considered absent if the HSG showed passage of contrast medium
through both tubes and normal intra-abdominal spread of contrast medium. When
laparoscopy was used, tubal pathology was considered absent if there were no signs
of tubal obstruction or severe adhesions interfering with ovum pickup. We did not
take into account minimal stage endometriosis. In case a woman underwent both
HSG and diagnostic laparoscopy, which was often done in case of an abnormal HSG,
the results of the laparoscopy were decisive. No therapeutic interventions were
performed during these procedures. Only women with bilateral patent tubes were
included in the analyses; all women with intra-abdominal pockets of contrast after
HSG or extensive adhesions at laparoscopy were excluded.
Follow-up started after tubal testing and ended 12 months thereafter. Primary
end-point was spontaneous conception resulting in an ongoing pregnancy at 12
weeks of gestational age. The first day of the last menstrual cycle was considered
to mark the end of time until spontaneous conception.
Statistical analysis Time to spontaneous conception resulting in an ongoing pregnancy was
considered censored at the moment treatment had started within 12 months
after testing, or at the last date of contact during follow-up if the couple did not
have an ongoing pregnancy at that date. If a woman became lost to follow-up
within 12 months after evaluation of tubal status, her general practitioner was sent
a questionnaire to document the most recent fertility status of the couple.
Kaplan-Meier curves for the chances of spontaneous ongoing pregnancy over
time were constructed for CAT positive and CAT negative women separately. The
difference in time to pregnancy between these groups was tested for significance
with the log-rank test statistic. Subsequently, we evaluated the influence of CAT
positivity using Cox regression modelling, expressing the effect as a fecundity rate
ratio (FRR) with corresponding 95% confidence intervals (CIs). An FRR expresses
the probability of an ongoing pregnancy per time unit for women with a particular
feature relative to the probability in women without that feature. We estimated
unconditional and conditional FRRs. For the conditional FRR, we built a multivariable
Cox model by additionally including the following known prognostic factors:
23138 Coppus.indd 121 28-09-12 14:55
Chapter 7
122
duration of subfertility, female age at tubal testing, and secondary subfertility
(Hunault et al., 2004).
The proportional hazard assumption of the Cox model was checked separately
for each variable before performing the regression analysis. Such checking was
done graphically by inspecting the graphs of loge [-log
eS(t)] versus log
e(t) for each
dichotomous covariate to assess whether the curves run parallel. Additionally, time-
dependent covariates were included and we checked whether their coefficient
differed significantly from zero. All data were analysed using SPSS 12.0.1 (SPSS Inc.,
Chicago, IL, USA).
Results
Data of 7860 couples were collected during the study period. After excluding
couples with anovulation or severe male factor subfertility, and those that had
undergone previous tubal testing, IVF or tubal surgery, data of 5522 ovulatory
subfertile women were available. In 1569 women (28%) CAT was not measured; in
1602 other women (29%) CAT was measured but no tubal testing was performed.
Tubal pathology was diagnosed in 469 (20%) at HSG or laparoscopy. The data of
the remaining 1882 women with no visible tubal pathology and a known CAT
result were included in this analysis (Fig. 1).
Baseline characteristics of the 1882 included couples are shown in Table 1. CAT
positive women were slightly, but statistically significantly older and more often
suffered from a secondary subfertility than did CAT negative women. In 1244
women (66%) tubal pathology was excluded by HSG, whereas in 638 of women
(34%) this was done by laparoscopy. CAT testing was performed by MIF in 443
women (24%) and by ELISA in 1439 women (76%). Overall, 438 women (23%) had
a positive CAT result.
Follow-up status is shown in Fig. 1. The median duration of follow-up, from time of
tubal testing to pregnancy or start of treatment was 3.6 months (interquartile range
1.4 - 7.4 months). Of all included couples, 338 (18%) had an ongoing pregnancy
without treatment within 1 year after evaluation of tubal function, including
four multiple pregnancies. In 29 women pregnancy resulted in a miscarriage
23138 Coppus.indd 122 28-09-12 14:55
CAT testing and natural conception
123
7
and one pregnancy was ectopic, resulting in an overall miscarriage and ectopic
pregnancy rate of 8.6% and 0.3% respectively. A total of 1162 couples (62%) had
started treatment within 12 months; 363 others (19%) neither started treatment
nor became pregnant. The estimated overall fraction of untreated couples with a
treatment independent pregnancy at 12 months of follow-up for all women was
31% (95% CI: 27%-35%).
Consecutive subfertile womenN=7860
Excluded - Anovulation n= 938 - TMC<1*106 n= 636 - previous IVF or tubal surgery n= 196 - previous tubal testing n= 568
Excluded - CAT not measured n= 1569
Follow-up Pregnancy without treatment in 1 year N= 338 (18%)
• Ongoing pregnancy, singleton N= 304 (16.2%) • Ongoing pregnancy, multiple N= 4 (0.2%) • Miscarriage N= 29 (1.5%) • Ectopic pregnancy N= 1 (0.1%)
No spontaneous pregnancy in 1 year N= 1544 (82%)
• Treatment < 12 months N= 1162 (62%) • No conception in 12 months N= 363 (19%) • Lost to follow-up N= 19 (1%)
Subfertile ovulatory women included in cohortN=5522
Excluded - tubal pathology n= 469
Excluded - no tubal testing n= 1602
Ovulatory women with CAT testing and normal tubesN=1882
Available CAT resultN=3953
CAT testing and tubal testingN=2351
FIGURE 1. Flow of patients
23138 Coppus.indd 123 28-09-12 14:55
Chapter 7
124
TABLE I. Baseline characteristics of 1882 couples
CAT positive (n = 438) CAT negative (n = 1444) P-value
Median 5th-95th percentile Median 5th-95th percentile
Female age (year) 34.0 27.0-39.3 32.2 25.2-39.3 <0.001
Male age (year) 36.2 29.1-46.8 34.1 27.5-44.3 <0.001
Duration of subfertility (year) 2.0 1.1-5.2 1.9 1.1-4.1 <0.001
Semen analysis – TMC (*106) 52.2 4.8-284 50.9 3.2-285 0.73
Secondary subfertility 46% 36% <0.001
History of PID 5.3% 1.2% <0.001
History of Chlamydia 14.8% 2.4% <0.001
The Kaplan-Meier curves of women with a positive CAT status and those with a
negative CAT status are shown in Fig. 2. There was a significant difference (P = 0.02).
The proportional hazards assumption had to be rejected for the effect of CAT status
on pregnancy chances over time (P < 0.001). Further analyses revealed that this
deviance from proportionality was not constant over time, so we could not model
the effect with a time-dependent covariate. Based on the loge
[-logeS(t)] versus
loge(t) plots, we came to the conclusion that there was differential censoring after
9 months. We therefore decided to limit the analysis to the first 9 months after
tubal testing. In this time period, positive C. trachomatis IgG serology had a strong
effect on time to natural conception, with 35% lower pregnancy chances (FRR
0.65, 95% CI 0.48-0.87) (Table II).
Multivariable Cox regression analysis showed that the conditional FRR was similar
to the unconditional one, indicating that the lower pregnancy rates in CAT positive
women were also present when known prognostic factors for spontaneous
pregnancy were taken into account. This makes it unlikely that the observed
association can be attributed to a confounding effect of duration of subfertility,
female age and type of subfertility (Table II). As laparoscopy is considered the
reference standard to exclude tubal pathology (NVOG, 2004), we performed a
sub-analysis in which we compared women who had HSG only with those who
underwent laparoscopy, which produced similar results.
23138 Coppus.indd 124 28-09-12 14:55
CAT testing and natural conception
125
7
cum
ula
tive
on
go
ing
pre
gn
acy
rate
(%)
40
30
20
10
0 0 2 4
time after spontaneous pregnancy (months)6 8 10 12
FIGURE 2. Spontaneous pregnancy rates for CAT IgG negative (upper line) and CAT positive (lower line) without visible tubal pathology.
TABLE II. The effect of C. trachomatis IgG serology on spontaneous pregnancy at 9 months follow-up*
Unconditional Conditional
FRR 95% CI P-value FRR 95% CI P-value
Positive CAT 0.65 0.48-0.87 <0.001 0.66 0.49-0.89 0.007
Duration of subfertility (per year) 0.77 0.68-0.88 <0.001
Female age (per year) 0.97 0.94-0.99 0.03
Secondary subfertility 1.88 1.49-2.36 <0.001
Results obtained in Cox regression analyses; FRR, fecundity rate ratio; 95% CI, 95% confidence interval; *Analysis limited to first 9 months of follow-up due to differential censoring
Discussion
In this large prospective multicentre study, we observed that subfertile ovulatory
women with a positive CAT result but without visible tubal pathology at HSG or
laparoscopy, had significantly lower spontaneous pregnancy rates in the first 9
months after tubal testing than comparable women with a negative CAT test. This
effect was also observed when other patient characteristics (secondary subfertility,
female age and duration of subfertility) were taken into account. Pregnancy rates
in CAT positive women were about a third lower.
23138 Coppus.indd 125 28-09-12 14:55
Chapter 7
126
Strength of our study is the multicentre design through which a large number of
couples could be included. Although no uniform CAT assay was used in all clinics,
the majority used ELISA assays, which are less laborious and reader-dependent
than MIF. It has been shown that ELISAs in general show less cross-reactivity with
other Chlamydia species than the MIF Biomerieux assay used in the present study
(Gijssen et al., 2001; Land et al., 2003), although a recent study in general showed
better diagnostic performance of MIF compared with ELISA (Broeze et al., 2011).
However, as the CAT assays that were used in this study are all commercially
available and routinely used in hospitals throughout the country, we feel that our
study reflects routine daily practice, thereby improving the generalizability of the
results of the study.
For tubal testing, both HSG and diagnostic laparoscopy were used in the
participating hospitals. In the present study, hospitals were free to use their own
protocol for evaluating tubal status. This generated the practice variation from
which our analysis could benefit. The Dutch guideline on the diagnostic subfertility
work-up (NVOG, 2004) recommends laparoscopy for CAT positive patients, and HSG
or no tubal assessment in case of a negative CAT. Strict adherence to this guideline
would have introduced a bias into our research, but in our study about half of CAT
positive women underwent HSG, whereas 30% of CAT negative women underwent
diagnostic laparoscopy. We acknowledge that, ideally, tubal status in all women
would have been verified by laparoscopy, but feel that such a design would be
hardly feasible for ethical reasons and due to restricted availability as a result of high
costs and limited operating time. Around 10% of women with a normal HSG show
tubal pathology on laparoscopy (Mol et al., 1999; den Hartog et al., 2008; Verhoeve
et al., 2011). Taking into account that more CAT negative women underwent tubal
testing with HSG than did CAT positive women, the effect of CAT found in the
present study cannot be attributed to differential verification. Instead, it is because
of this variability in the use of HSG and laparoscopy that we were able to evaluate
the effect of CAT positivity on pregnancy chances. In addition to this differential
verification, women who were tested with CAT were only partially verified. In the
present study, 46% of women with an available CAT result did not undergo invasive
tubal testing, of which 79% were CAT negative. The present study addressed
women in whom pathology had been ruled out, but had different CAT status. We
therefore believe that the main impact of the partial verification was a reduction in
the precision, rather than a bias of fecundity rate estimates.
23138 Coppus.indd 126 28-09-12 14:55
CAT testing and natural conception
127
7
In the present study we tried to control for confounding effects of tubal pathology
by limiting the analysis to those women without tubal abnormalities on HSG
and/or laparoscopy. In addition, the confounding effect of the use of assisted
reproductive technologies was eliminated by censoring time to pregnancy
whenever treatment was started. Unfortunately, we were not able to analyse CAT
titres as a continuous or categorical variable, taking into account the quantitative
nature of this test, as the majority of clinics used an ELISA assay of which the results
are reported as either positive or negative and no titres are available. Nevertheless,
cut-off values advocated in the guideline of the Dutch Society for Obstetrics and
Gynaecology were used, i.e. a titre of ≥1:32 in case of MIF and an index of >1.1 in
case of ELISA. We feel this approach reflects daily clinical practice, where the CAT
result is usually interpreted by the clinician as either being positive or negative.
Our results are in contrast with that of three other studies, which found no
significant effect of C. trachomatis antibodies on pregnancy rates (Idahl et al., 2004;
Perquin et al., 2007; El Hakim et al., 2009). The most recent study (El Hakim et al.,
2009) explored the relation between CAT titres and probability of pregnancy in
women with normal-looking Fallopian tubes at laparoscopy. However, with a total
sample size of 174 women, the power of the study to detect an association was
limited, due to the formation of multiple subgroups, including endometriosis.
Idahl and colleagues, studied 244 women followed for a mean of 37 months, but
unfortunately their analysis did not take into account time to pregnancy (Idahl et
al., 2004). Instead, pregnancy was evaluated as a dichotomous outcome, ignoring
the fact that fecundity is a gradual continuum based on time to pregnancy.
Furthermore, the analysis did not correct for the presence of tubal pathology
and type of pregnancy (spontaneous or treatment-related). The third study,
using data of 153 women who participated in a randomized controlled trial,
evaluated the prognostic significance of CAT on cumulative pregnancy rates at
18 months follow-up (Perquin et al., 2007). No statistically significant difference
in cumulative pregnancy rates was found (FRR 0.73, 95% CI 0.42-1.25). While this
study used a time-to-event analysis, the interpretation of its results is hampered
by heterogeneity in outcomes, as the authors did not solely evaluate treatment
independent pregnancy rates, but also included pregnancies achieved after intra-
uterine insemination, IVF, tubal surgery and other therapeutic options.
23138 Coppus.indd 127 28-09-12 14:55
Chapter 7
128
The results of our study are in accordance with a previous study, which showed in
170 studied women that the number of treatment independent pregnancies was
lower with increasing CAT MIF titres (Kelz et al., 2006). However, a drawback of that
study is the fact that it did not correct for the effects of tubal pathology visible at
HSG or laparoscopy related to a previous Chlamydia infection. As higher MIF titres
are associated with a higher likelihood and severity of tubal disease (Akande et al.,
2003), it is possible that an indirect association of titres on spontaneous pregnancy
rates was measured; an association which in fact was due to the presence of tubal
pathology.
We found a marked difference in effect of CAT positivity on pregnancy rates at 9
months of follow-up. This difference might be explained by the increasing duration
of subfertility in those couples who did not conceive, which has been shown to be
associated with a decrease in spontaneous pregnancy rates (van der Steeg et al.,
2007). It might be that the remaining group of women at a later time in follow-up
differs from the total cohort at the beginning of follow-up, as it can be hypothesized
that when pregnancy has not occurred after 3 or 4 years, spontaneous pregnancy
chances are low, due to a preponderance of severe -undefined- pathology that
overrules other test results, such as the CAT that we investigated. Alternatively,
the marked difference between pregnancy rates in CAT positive and CAT negative
women at 9 months of follow-up could be explained by selection bias due to
non-random censoring, i.e. higher proportions of CAT positive couples are starting
treatment.
Two hypotheses could explain why CAT seropositivity influences pregnancy rates
negatively even in case of patent tubes. First, it is known that C. trachomatis may
persist in the upper female genital tract and this persistent exposure to the micro-
organism could result in a chronic inflammatory response that is responsible
for damage to the Fallopian tubes (Patton et al., 1994; Den Hartog et al., 2005). It
has been suggested that these persistent C. trachomatis infections also elicit an
autoimmune response to human heat shock proteins (HSPs) due to their structural
similarity with Chlamydia HSP (Neuer et al., 1997; Witkin, 1999). Human HSPs play
an important role in early pregnancy (Witkin, 2002), and animal as well as human
research indicates that autoimmunity to human HSP exerts a negative influence
on embryo development and implantation (Witkin et al., 1994; Witkin 1999, Neuer
et al., 2000). From this it has been hypothesized that women with Chlamydia
23138 Coppus.indd 128 28-09-12 14:55
CAT testing and natural conception
129
7
IgG antibodies, as marker of a previous infection, might be at risk for impaired
implantation and lower pregnancy rates. Whether women who do not develop
tubal pathology after Chlamydia infection are able to clear the micro-organism
from their genital tracts before tubal pathology and an autoimmune response to
human HSPs are elicited remains to be elucidated. Some research has focused on
persistent endometrial infections that might either affect embryonic development
or the implantation capacity of the endometrium, leading to implantation failure
(Kamiyama et al., 2004; Romero et al., 2004; Den Hartog 2010). A second hypothesis
to explain lower fecundity in CAT positive women without visible tubal pathology,
is that intratubal microdamage may have resulted from a previous Chlamydia
infection that cannot be detected with conventional patency tests such as HSG
or laparoscopy. In addition, these tests are known to be imperfect, with some
between-reader variability, but it is unlikely that this would explain the substantially
lower pregnancy rates in CAT positive women.
We showed that in 9 months follow-up subfertile women with a positive CAT
test without visible tubal pathology on HSG or laparoscopy have a 33% lower
probability of pregnancy as compared to CAT negative women. This finding could
be useful in counselling subfertile couples on their spontaneous fertility prospects.
If our findings hold in external validation, CAT has to be incorporated in future
prognostic models on spontaneous pregnancy. Our finding demonstrates that
even in absence of tubal pathology, decreased fecundity is a late effect of lower
genital tract Chlamydia infections. This might increase the late costs of Chlamydia
infections, and as such alter the cost-effectiveness of Chlamydia screening
(Land et al., 2010). CAT apparently does not only have diagnostic value, but also
gives valuable additional prognostic information on the probability to conceive
spontaneously, even after tubal testing has shown no visible tubal pathology.
23138 Coppus.indd 129 28-09-12 14:55
Chapter 7
130
References
Akande VA, Hunt LP, Cahill CJ, Caul EO, Ford WC, Jenkins JM. Tubal damage in infertile women: prediction using chlamydia serology. Hum Reprod 2003; 18:1841-1847.
Broeze KA, Opmeer BC, Coppus SF, van Geloven N, Alves MF, Anestad G, Bhattacharya S, Allen J, Guerra-Infante MF, den Hartog JE et al. Chlamydia antibody testing and diagnosing tubal pathology in subfertile women: an individual patient data meta-analysis. Hum Reprod Update 2011; 17:301-310.
Claman P, Amimi MN, Peeling RW, Toye B, Jessamine P. Does serologic evidence of remote Chlamydia trachomatis infection and its heat shock protein (CHSP60) affect in vitro fertilization-embryo transfer outcome? Fertil Steril 1996; 65:146-149.
Coppus SF, Opmeer BC, Logan S, van der Veen F, Bhattacharya S, Mol BW. The predictive value of medical history taking and Chlamydia IgG ELISA antibody testing (CAT) in the selection of subfertile women for diagnostic laparoscopy: a clinical prediction model approach. Hum Reprod 2007; 22:1353-1358.
Coppus SF, Verhoeve HR, Opmeer BC, van der Steeg JW, Steures P, Eijkemans MJ, Hompes PG, Bossuyt PM, van der Veen F, Mol BW. Identifying subfertile ovulatory women for timely tubal patency testing: a clinical decision rule based on medical history. Hum Reprod 2007;22:2685-2692.
Den Hartog JE, Land JA, Stassen FR, Kessels AG, Bruggeman CA. Serological markers of persistent C. trachomatis infections in women with tubal factor subfertility. Hum Reprod 2005; 20:986-990.
Den Hartog JE, Lardenoije CM, Severens JL, Land JA, Evers JL, Kessels AG. Screening strategies for tubal factor subfertility. Hum Reprod 2008; 23:1840-1848.
Den Hartog JE. Academic Thesis, Maastricht University 2010.
El Hakim EA, Epee M, Draycott T, Gordon UD, Akande VA. Significance of positive Chlamydia serology in women with normal-looking Fallopian tubes. Reprod Biomed Online 2009; 19:847-851.
Gijssen AP, Land JA, Goossens VJ, Leffers P, Bruggeman CA, Evers JL. Chlamydia pneumoniae and screening for tubal factor subfertility. Hum Reprod 2001; 16:487-491.
Hunault CC, Habbema JD, Eijkemans MJ, Collins JA, Evers JL, te Velde ER. Two new prediction rules for spontaneous pregnancy leading to live birth among subfertile couples, based on the synthesis of three previous models. Hum Reprod 2004, 19:2019-2026.
Idahl A, Boman J, Kumlin U, Oloffson JI. Demonstration of Chlamydia trachomatis IgG antibodies in the male partner of the infertile couple is correlated with a reduced likelihood of achieving pregnancy. Hum Reprod 2004; 19:1121-1126.
Kamiyama S, Teruya Y, Nohara M, Kanazawa K. Impact of detection of bacterial endotoxin in menstrual effluent on the pregnancy rate in in vitro fertilization and embryo transfer. Fertil Steril 2004; 82:788-792.
Keltz MD, Gera PS, Moustakis M. Chlamydia serology screening in infertility patients. Fertil Steril 2006; 85:752-754.
Land JA, Gijsen AP, Kessels AG, Slobbe ME, Bruggeman CA. Performance of five serological Chlamydia antibody tests in subfertile women. Hum Reprod 2003; 18:2621-2627.
Land JA, van Bergen JE, Morré SA, Postma MJ. Epidemiology of Chlamydia trachomatis infection in women and the cost-effectiveness of screening. Hum Reprod Update 2010; 16:189-204.
Lunenfeld E, Shapiro BS, Sarov B, Sarov I, Insler V, Decherney AH. The association between chlamydial-specific IgG and IgA antibodies and pregnancy outcome in an in vitro fertilization program. J In Vitro Fert Embryo Transf 1989; 6:222-227.
Mol BW, Collins JA, Burrows EA, van der Veen F, Bossuyt PM. Comparison of hysterosalpingography and laparoscopy in predicting fertility outcome. Hum Reprod 1999; 14:1237-1242.
23138 Coppus.indd 130 28-09-12 14:55
CAT testing and natural conception
131
7
Nederlandse Vereniging voor Obstetrie en Gynaecologie. Orienterend fertiliteitsonderzoek [Dutch Society for Obstetrics and Gynaecology: guideline diagnostic fertility workup] 2004.
Neuer A, Lam KN, Tiller FW, Kiesel L, Witkin SS. Humoral immune response to membrane components of Chlamydia trachomatis and expression of human 60 kDa heat shock protein in follicular fluid of in-vitro fertilization patients. Hum Reprod 1997; 12:925-929.
Neuer A, Spandorfer SD, Giraldo P, Dieterle S, Rosenwaks Z, Witkin SS. The role of heat shock proteins in reproduction. Hum Reprod Update 2000; 6:149-159.
Osser S, Persson K, Wramsby H, Liedholm P. Does previous Chlamydia trachomatis infection influence the pregnancy rate of in vitro fertilization and embryo replacement? Am J Obstet Gynecol 1990; 162:40-44.
Paavonen J and Eggert-Kruse W. Chlamydia trachomatis: impact on human reproduction. Hum Reprod Update 1999; 5:433-447.
Pacchiarotti A, Sbracia M, Mohamed MA, Frega A, Pacchiarotti A, Espinola SM, Aragona C. Autoimmune response to Chlamydia trachomatis infection and in vitro fertilization outcome. Fertil Steril 2009; 91:946-948.
Patton DL, Cosgrove Sweeney YT, Kuo CC. Demonstration of delayed hypersensitivity in Chlamydia trachomatis salpingitis in monkeys: a pathogenic mechanism of tubal damage. J Infect Dis 1994; 169:680-683.
Perquin DA, Beersma MF, de Craen AJ, Helmerhorst FM. The value of Chlamydia trachomatis-specific IgG antibody testing and hysterosalpingography for predicting tubal pathology and occurrence of pregnancy. Fertil Steril 2007; 88:224-226.
Romero R, Espinoza J, Mazor M. Can endometrial infection/inflammation explain implantation failure, spontaneous abortion, and preterm birth after in vitro fertilization? Fertil Steril 2004; 82:799-804.
Rowland GF, Forsey T, Moss TR, Steptoe PC, Hewitt J, Darougar S. Failure of in vitro fertilization and embryo replacement following infection with Chlamydia trachomatis. J In Vitro Fert Embryo Transf 1985; 2:151-155.
Sharara FI, Queenan JT, Springer RS, Marut EL, Scoccia B, Scommegna A. Evaluated serum Chlamydia trachomatis IgG antibodies. What do they mean for IVF pregnancy rates and loss? J Reprod Med 1997; 42:281-286.
Spandorfer SD, Neuer A, LaVerda D, Byrne G, Liu HC, Rosenwaks Z, Witkin SS. Previously undetected Chlamydia trachomatis infection, immunity to heat shock proteins and tubal occlusion in women undergoing in-vitro fertilization. Hum Reprod 1999; 14:60-64.
Tasdemir I, Tasdemir M, Kodama H, Sekine K, Tanaka T. Effect of chlamydial antibodies on the outcome of in vitro fertilization (IVF) treatment. J Assist Reprod Genet 1994; 11:104-106.
Van der Steeg JW, Steures P, Eijkemans MJ, Habbema JD, Hompes PG, Broekmans FJ, van Dessel HJ, Bossuyt PM, van der Veen F, Mol BW. Pregnancy is predictable: a large-scale prospective external validation of the prediction of spontaneous pregnancy in subfertile. Hum Reprod 2007; 22:536-342.
Verhoeve HR, Coppus SF, van der Steeg JW, Steures P, Hompes PG, Bourdrez P, Bossuyt PM, van der Veen F, Mol BW. The capacity of hysterosalpingography and laparoscopy to predict natural conception. Hum Reprod 2011, 26:134-142.
Witkin SS, Sultan KM, Neal GS, Jeremias J, Grifo JA, Rosenwaks Z. Unsuspected Chlamydia trachomatis infection and in vivo fertilization outcome. Am J Obstet Gynecol 1994; 171:1208-1214.
Witkin SS. Immunity to heat shock proteins and pregnancy outcome. Infect Dis Obstet Gynecol 1999; 7:35-38.
Witkin SS. Immunological aspects of genital clamydia infections. Best Pract Res Clin Obstet Gynaecol 2002; 16:865-874.
23138 Coppus.indd 131 28-09-12 14:55
23138 Coppus.indd 132 28-09-12 14:55
Part 3Discussion, summary and epilogue
23138 Coppus.indd 133 28-09-12 16:57
23138 Coppus.indd 134 28-09-12 14:55
8General discussion
23138 Coppus.indd 135 28-09-12 14:55
Chapter 8
136
It is estimated that tubal pathology accounts for subfertility in around 10 to 30
percent of couples (Evers 2002). The subfertility work-up is therefore usually
ended with invasive tubal patency testing, such as hysterosalpingography (HSG)
or laparoscopy with tubal testing, of which the latter is still regarded as the gold
standard for diagnosing tubal pathology. In the late nineties, 89% of all fertility
specialists in the USA routinely performed diagnostic laparoscopy in the fertility
work-up (Glatstein 1997). The majority of diagnostic laparoscopies will not reveal
any abnormality (Forman et al., 1993), as a result of which its role in current fertility
practice is heavily debated (Tanahatoe et al., 2003; Lavy et al., 2004; Tanahatoe et al.,
2005; Bosteels et al., 2008).
As laparoscopy is an expensive investigation, and not without risk of complications
(Jansen et al., 1997; Chapron et al., 1998; Härkki-Siren et al., 1999), some authors
have searched for less invasive alternatives like hysterosalpingo-contrast-
sonography (Schlief and Deichert, 1991) or transvaginal hydrolaparoscopy (Gordts
et al., 1998). However, these newer test have not gained widespread use in daily
clinical practice, as a result of which most clinicians still use hysterosalpingography
and/or diagnostic laparoscopy to conclude the fertility work-up. The reasons for
this conservatism are unclear, but it might be due to uncertainty about the validity
of published reports on such test alternatives (Chapter 2).
Other groups have searched for better screening strategies to only subject
those women with a high probability of finding tubal abnormalities to invasive
tubal testing. The majority of this work has focused on the value of Chlamydia
trachomatis antibody (CAT) testing (Mol et al., 1997; den Hartog et al., 2006;
Broeze et al., 2011). CAT testing is not routinely performed worldwide, and is not
a recommended investigation in international guidelines on the fertility work-up.
Some authors have explored the predictive value of medical history taking, and
concluded that medical history is of limited use for a policy of restricted use of
laparoscopy (Johnson et al., 2000; Hubacher et al., 2004). This conclusion has been
disputed by others (Thomas et al., 2001; Donnez and Jadoul, 2004; McComb, 2004;
Tulandi and Platt, 2004).
23138 Coppus.indd 136 28-09-12 14:55
General discussion
137
8
Diagnostic aspects of tubal testing
For the work presented in this thesis, we decided to go back to the basis and
explore the grounds on which guidelines advocate medical history taking for
selecting women for invasive tubal testing (Luttjeboer et al., 2009). We found
that a history of complicated appendicitis, pelvic surgery, ectopic pregnancy and
endometriosis are strong risk indicators for tubal pathology in subfertile women.
Due to limitations related to conventional meta-analytic techniques (Broeze et al.,
2009), we were unable to estimate exact probabilities of tubal pathology, and to
correct for dependency.
Therefore, we developed two prediction models for (severe) tubal pathology that
express the probability of tubal pathology, and correct for mutual dependence
(Chapters 4 and 5). While the first model was derived in a relatively small data set,
and used as a pilot study for the development of the large-scale model, all factors
from this model were also present in the larger model. We showed for any tubal
pathology, that a prolonged duration of subfertility, a pregnancy of both partners
or of only the male partner in a previous relationship, male smoking habits,
a history of PID, history of Chlamydia infection, history of ectopic pregnancy, a
history of legally induced abortion and previous pelvic surgery are all factors that
increase the likelihood of tubal pathology. Clinically more relevant is the model for
bilateral tubal pathology, for which it was shown that referral by a gynaecologist,
female age above 32 years, dysmenorrhoea and smoking of the woman increase
the likelihood of disease, as does a longer duration of subfertility, a pregnancy
of both partners in a previous relationship, male smoking habits, history of PID,
history of Chlamydia infection, ectopic pregnancy and previous pelvic surgery. In
contrast, a previous pregnancy in the current relationship and a higher male age
lower the probability of bilateral tubal pathology.
The association of male characteristics with female tubal disease was remarkable,
as male age, male smoking habits, as well as having fathered a pregnancy in a
previous relationship cannot causally be related to tubal damage in a woman.
Possibly these male characteristics act as indicators for some unrecorded variables,
such as number of sexual partners or socio-economic status, or they reflect lifestyle
and selection mechanisms.
23138 Coppus.indd 137 28-09-12 14:55
Chapter 8
138
Although both models are able to make an almost perfect distinction between
women with a low and high probability of tubal pathology, our models have not
been externally validated yet, so caution has to be taken to apply them already in daily
clinical practice. Reassuringly though, a recent study from Israel confirmed several of
the risk factors identified in our study, including increasing age, secondary subfertility,
and male smoking habits (Farhi et al., 2011). Our findings were also confirmed in an
individual patient data-meta analysis (Broeze et al., 2012), which showed a similar
predictive performance for medical history taking as did our models. Interestingly, this
study also showed that both CAT and HSG independently improve the performance of
the model when compared with the results of diagnostic laparoscopy.
Prognostic aspects of tubal testing
It is our opinion that medical history taking is certainly of value in selecting those
women that might benefit from early tubal testing. The next question is which
tubal test should then be applied. As the purpose of the fertility work-up has shifted
from a purely diagnostic question to a prognostic orientation, it is not so much the
question which tubal test is the most accurate, but to what extent an abnormal
test result interferes with chances of natural conception. To answer this question
we related findings at HSG and laparoscopy to spontaneous pregnancy (Chapter
6). We found that women with unilateral tubal pathology at HSG or laparoscopy
have a moderate, non-significant reduction in spontaneous pregnancy chances,
whereas those with bilateral tubal pathology at HSG or laparoscopy have a
severe reduction in the ability to achieve natural conception. This reduction in
fertility prospects was similar for HSG and laparoscopy, suggesting that HSG and
laparoscopy have a comparable predictive capacity for natural conception.
It is noteworthy that unilateral pathology is of only minor prognostic relevance,
confirming that the goal of tubal testing should be to detect women with bilateral
tubal pathology, so that they can be treated with IVF. Studies on the prediction
of pregnancy after intra-uterine insemination also failed to show a statistically
significant impact of unilateral tubal pathology on pregnancy rates (Steures et al.,
2004). In addition, no evidence-based treatment policies are known for women
affected by unilateral tubal pathology. In case of unilateral abnormalities on
HSG, one could consider several options, e.g. one could perform a diagnostic
23138 Coppus.indd 138 28-09-12 14:55
General discussion
139
8
laparoscopy. In 95% of cases though, laparoscopic findings will not change the
HSG-based treatment plan (Lavy et al., 2004). Alternatively, one could aim to
increase pregnancy chances with IUI with mild ovarian stimulation or bypass the
Fallopian tube with IVF. We found only one case-control study in the literature,
which evaluated IUI/COH in women with unilateral tubal pathology on HSG as
compared to IUI/COH in women with unexplained subfertility. Similar pregnancy
rates were found in both groups (Farhi et al., 2007). One of the studies that still
needs to be performed therefore is a randomized controlled trial comparing
expectant management, IUI/COH and IVF in women with unilateral tubal
pathology, evaluating pregnancy rates and costs.
Currently, after invasive tubal testing has shown bilateral patent tubes, the result
of CAT testing is generally ignored (Chapter 7). We questioned whether evidence
of a past Chlamydia infection affects the probability of spontaneous pregnancy in
women without visible tubal pathology, and found that subfertile women with
a positive CAT test but without visible tubal pathology on HSG or laparoscopy
have a 33% lower probability of pregnancy as compared to CAT negative women.
Results from previous studies on this topic are conflicting, which urges the need for
external validation of our findings. If confirmed, it would imply a new prognostic
factor that could be incorporated when developing updated prognostic models
for treatment independent pregnancy.
Ultimate value of tubal testing
Although much effort is being put in evaluating accuracy of different screening
strategies, the main question is whether ultimately tubal testing will increase the
likelihood of conception. For laparoscopy, there is no evidence that the procedure
itself increases this likelihood (Al-Fadhli et al., 2006; Luttjeboer et al., 2007). For
hysterosalpingography, there is some evidence that performing the procedure
with oil-soluble contrast medium increases the probability of natural conception
when compared with water-soluble contrast medium. However, the evidence is
based on aggregated data of two trials, where statistical heterogeneity was present
and the higher quality trial failed to show a significant difference. Currently, the
H2Oil study is recruiting women to answer the question whether we should use
oil-based contrast medium during a diagnostic HSG (H2Oil trial number: NTR3270).
23138 Coppus.indd 139 28-09-12 14:55
Chapter 8
140
One randomized controlled trial examined the value of a standard laparoscopy
after normal hysterosalpingography before start of intrauterine insemination
and found that a standard laparoscopy with therapeutic interventions does not
increase the likelihood of conception (Tanahatoe et al., 2005). Another randomized
study showed that early tubal testing did not increase spontaneous pregnancy
rates if compared with delayed tubal testing (Lindborg et al., 2009). Whether a
tubal test strategy should start with hysterosalpingography or whether this test
can be abandoned was evaluated in a randomised trial (Perquin et al., 2006). In
this study the fertility workup was completed with either HSG with water-based
contrast or immediate laparoscopy. In case of no abnormalities at HSG, women
were managed expectantly for 6 months. In case of no pregnancy after this time
period, women underwent a laparoscopy. In case of an abnormal HSG, women
underwent laparoscopy within one to two months. At 18 months of follow-up, no
differences in pregnancy rates between both strategies were noted. The authors
concluded that HSG has no role in the fertility work-up , but we criticized this study
due to an inadequate methodological design (as no specific treatment strategies
followed a specific test result) and showed that a strategy starting with HSG results
in a 30% reduction in the number of laparoscopies (Coppus et al., 2006). Although
more and more authors advocate that HSG is a useless test in modern fertility
practice (den Hartog et al., 2008; Lim et al., 2011), we feel that given the current
evidence HSG still deserves a prominent role in the fertility workup.
In conclusion, a difference in outcome between different tubal testing strategies
can currently only be made by referring women with bilateral tubal pathology for
IVF early. To end the debate on whether we should perform tubal testing, and if
so, with which tests, a large scale multilevel randomized trial is needed, comparing
different screening strategies with explicit treatment rules in case of abnormal test
results. Only by performing such a study, the true value and cost-effectiveness of
tubal testing can be revealed.
23138 Coppus.indd 140 28-09-12 14:55
General discussion
141
8
ReferencesAl-Fadhli R, Sylvestre C, Buckett W, Tulandi T. A randomized study of laparoscopic chromopertubation with lipiodol versus saline in infertile women. Fertil Steril 2006; 85:505-507.
Bosteels J, Van Herendael B, Weyers S, D’Hooghe T. The position of diagnostic laparoscopy in current fertility practice. Hum Reprod Update 2007; 13:477-85.
Broeze KA, Opmeer BC, Bachmann LM, Broekmans FJ, Bossuyt PM, Coppus SF, Johnson NP, Khan KS, ter Riet G, van der Veen F, van Wely M, Mol BW. Individual patient data meta-analysis of diagnostic and prognostic studies in obstetrics, gynaecology and reproductive medicine. BMC Med Res Methodol 2009; 27:9:22.
Broeze KA, Opmeer BC, Coppus SF, van Geloven N, Alves MF, Anestad G, Bhattacharya S, Allen J, Guerra-Infante MF, den Hartog JE, Land JA, Idahl A, van der Linden PJ, Mouton JW, Ng EH, van der Steeg JW, Steures P, Svenstrup HF, Tiitinen A, Toye B, van der Veen F, Mol BW. Chlamydia antibody testing and diagnosing tubal pathology in subfertile women: an individual patient data meta-analysis. Hum Reprod Update 2011; 17:301-310.
Broeze KA, Opmeer BC, Coppus SF, van Geloven N, den Hartog JE, Land JA, van der Linden PJ, Ng EH, van der Steeg JW, Steures P, van der Veen F, Mol BW. Integration of patient characteristics and the results of Chlamydia antibody testing and hysterosalpingography in the diagnosis of tubal pathology: an individual patient data meta-analysis. Hum Reprod, 2012; 27;2979-2790.
Chapron C, Querleu D, Bruhat MA, Madelenat P, Fernandez H, Pierre F, Dubuisson JB. Surgical complications of diagnostic and operative gynaecological laparoscopy: a series of 29,966 cases. Hum Reprod 1998; 13:867-72.
Coppus SF, Opmeer BC, van der Veen F, Bossuyt PM, Mol BW. Routine use of hysterosalpingography prior to diagnostic laparoscopy in the fertility workup. Hum Reprod 2006; 21:2726-2727.
Den Hartog JE, Morré SA, Land JA. Chlamydia trachomatis-associated tubal factor subfertility: Immunogenetic aspects and serological screening. Hum Reprod Update 2006; 12:719-730.
Den Hartog JE, Lardenoije CM, Severens JL, Land JA, Evers JL, Kessels AG. Screening strategies for tubal factor subfertility. Hum Reprod 2008; 23:1840-1848.
Donnez J, Jadoul P. Taking a history in the evaluation of infertility: obsolete or venerable tradition? Fertil Steril 2004; 81:16-17.
Evers J. Female subfertility. Lancet 2002; 360: 151-159.
Farhi J, Ben-Haroush A, Lande Y, Fisch B. Role of treatment with ovarian stimulation and intrauterine insemination in women with unilateral tubal occlusion by hysterosalpingography. Fertil Steril 2007; 88:396-400.
Farhi J, Homburg R, Ben-Haroush A. Male factor infertility may be associated with a lower risk for tubal abnormalities. Reprod Biomed Online 2011; 22:335-340.
Forman RG, Robinson JN, Mehta Z, Barlow DH. Patient history as a simple predictor of pelvic pathology in subfertile women. Hum Reprod 1993; 8:53-55.
Glatstein IZ, Harlow BL, Hornstein MD. Practice patterns among reproductive endocrinologists: the infertility evaluation. Fertil Steril 1997; 67:443-51.
Gordts S, Campo R, Rombauts L, Brosens I. Transvaginal hydrolaparoscopy as an outpatient procedure for infertility investigation. Hum Reprod 1998; 13:99-103.
H2Oil study: www.studies-obsgyn.nl/h2olie
23138 Coppus.indd 141 28-09-12 14:55
Chapter 8
142
Härkki-Siren P, Sjöberg J, Kurki T. Major complications of laparoscopy: a follow-up Finnish study. Obstet Gynecol 1999; 94:94-8.
Hubacher D, Grimes D, Lara-Ricalde R, de la Jara J, Garcia-Luna A. The limited clinical usefulness of taking a history in the evaluation of women with tubal factor infertility. Fertil Steril 2004; 81:6-10.
Jansen FW, Kapiteyn K, Trimbos-Kemper T, Hermans J, Trimbos JB. Complications of laparoscopy: a prospective multicentre observational study. BJOG 1997; 104:595-600.
Johnson NP, Taylor K, Nadgir AA, Chinn DJ, Taylor PJ. Can diagnostic laparoscopy be avoided in routine investigation for infertility? BJOG 2000; 107:174-178.
Land JA, Evers JL, Goossens VJ. How to use Chlamydia antibody testing in subfertility patients. Hum Reprod 1998; 13:1094-1098.
Lavy Y, Lev-Sagie A, Holtzer H, Revel A, Hurwitz A. Should laparoscopy be a mandatory component of the infertility evaluation in infertile women with normal hysterosalpingogram or suspected unilateral distal tubal pathology? Eur J Obstet Gynecol Reprod Biol 2004; 114:64-68.
Lim CP, Hasafa Z, Bhattacharya S, Maheshwari A. Should a hysterosalpingogram be a first-line investigation to diagnose female tubal subfertility in the modern fertility workup? Hum Reprod 2011; 26:967-971.
Lindborg L, Thorburn J, Bergh C, Strandell A. Influence of HyCoSy on spontaneous pregnancy: a randomized controlled trial. Hum Reprod 2009; 24: 1075-1079.
Luttjeboer FY, Verhoeve HR, van Dessel HJ, van der Veen F, Mol BW, Coppus SF. The value of medical history taking as risk indicator for tuboperitoneal pathology: a systematic review. BJOG 2009; 116:612-625.
Luttjeboer F, Harada T, Hughes E, Johnson N, Lilford R, Mol BW. Tubal flushing for subfertility. Cochrane Database Syst Rev 2007; 18: CD003718.
McComb PF. Clinical history is the essence of medicine. Fertil Steril 2004; 81:13-15.
Mol BW, Dijkman B, Wertheim P, Lijmer J, van der Veen F, Bossuyt PM. The accuracy of serum chlamydial antibodies in the diagnosis of tubal pathology: a meta-analysis. Fertil Steril 1997; 67:1031-7.
Perquin DA, Dörr PJ, de Craen AJ, Helmerhorst FM. Routine use of hysterosalpingography prior to laparoscopy in the fertility workup: a multicentre randomized controlled trial. Hum Reprod 2006; 21:1227-1231.
Schlief R, Deichert U. Hysterosalpingo-contrast sonography of the uterus and fallopian tubes: results of a clinical trial of a new contrast medium in 120 patients. Radiology 1991; 178:213-5.
Steures P, van der Steeg JW, Mol BW, Eijkemans MJ, van der Veen F, Habbema JD, Hompes PG, Bossuyt PM, Verhoeve HR, van Kasteren YM, van Dop PA. Prediction of an ongoing pregnancy after intrauterine insemination. Fertil Steril 2004; 82:45-51.
Tanahatoe SJ, Hompes PG, Lambalk CB. Investigation of the infertile couple: should diagnostic laparoscopy be performed in the infertility work up programme in patients undergoing intrauterine insemination? Hum Reprod 2003; 18:8-11.
Tanahatoe SJ, Lambalk CB, Hompes PG. The role of laparoscopy in intrauterine insemination: a prospective randomized reallocation study. Hum Reprod 2005; 20:3225-3230.
Thomas DK, Mannion P, Coughlin L. Can diagnostic laparoscopy be avoided in routine investigation for infertility? BJOG 2001; 108:127-128.
Tulandi T, Platt R. The art of taking a history. Fertil Steril 2004; 81:11-12.
23138 Coppus.indd 142 28-09-12 14:55
23138 Coppus.indd 143 28-09-12 14:55
23138 Coppus.indd 144 28-09-12 14:55
9Summary
23138 Coppus.indd 145 28-09-12 14:55
Chapter 9
146
Subfertility is customarily defined as failure to conceive after regular unprotected
sexual intercourse for one year. The prevalence of subfertility is around 14%,
affecting one in seven couples. It has been estimated that in around 10-30% of
these couples, subfertility is due to tubal pathology. These tubal abnormalities
include tubal obstruction and pelvic adhesions which can be due to previous
infection, endometriosis, or previous surgery. To rule in or rule out this pathology, the
fertility work-up in subfertile couples is usually concluded with tubal patency testing.
Currently, laparoscopy is considered to be the best available test for diagnosing tubal
abnormalities, but it has several drawbacks. First, it is an invasive surgical procedure
that requires general anaesthesia, both of which carry associated risks. Secondly, as
an expensive investigation that requires operating time and dedicated personnel,
its availability is not unlimited. Therefore, triage is needed to limit the number of
unnecessary laparoscopies, while maintaining a high diagnostic yield. Nationally and
internationally, a large practice variation exists in the timing of testing, the test used,
and the basis on which women are selected for tubal testing. While some guidelines
advocate medical history taking to triage women for tubal patency testing, others
advocate Chlamydia antibody testing, or testing in all women. At the start of this
project, test research on which these guideline are based on, had been conducted
in isolation from its clinical context.
The work presented in the studies in this thesis started with a focus on
methodological issues in diagnostic and prognostic research in general. Hereafter,
we researched multivariable diagnostic models, to estimate pretest probabilities
for tubal pathology that can help to distinguish women with a high and low
probability of (severe) tubal disease at first consultation, and explored the
added value of Chlamydia antibody testing to medical history taking. Finally, we
investigated the prognostic significance of various tubal pathology tests like HSG,
laparoscopy and CAT testing.
Chapter 1 gives an outline and describes the objectives of this thesis.
Chapter 2 describes the results of a study in which we evaluated the impact of
publication of the Standards for Reporting of Diagnostic Accuracy Studies (STARD)
statement on the quality of reporting of test accuracy studies in two leading
journals in the field of reproductive medicine. 24 articles published in 1999 (pre-
STARD) and 27 published in 2004 (post-STARD) that reported on the diagnostic
23138 Coppus.indd 146 28-09-12 14:55
Summary
147
9
performance of a test in comparison with a reference standard were included.
Each article was scored on a 25 items checklist, dealing with the study question,
study participants, study design, test methods, reference standard, statistical
methods, reporting of results and conclusions. The mean number of reported
STARD items for articles published in 1999 was 12.1 ± 3.3 (range 6.5-20) and 12.4
± 3.2 (range 7-17.5) in 2004, after publication of the STARD statement. Overall, less
than half of the studies reported adequately on 50% or more of the STARD items.
The reporting of individual items showed a wide variation. There was no significant
improvement in mean number of reported items for the articles published after
the introduction of the STARD statement. From these data we concluded that
authors of test accuracy studies in the two leading fertility journals poorly reported
the design, conduct, methodology and statistical analysis of their study, which
hampers the ability to interpret these studies on their scientific soundness. Strict
adherence to the STARD guidelines should therefore be encouraged.
Chapter 3 presents an opinion article on methodological considerations when
evaluating prediction models in reproductive medicine. Such models are used
to to calculate the probability of pregnancy without treatment, as well as the
probability of pregnancy after ovulation induction, intrauterine insemination or
in vitro fertilization. The performance of such prediction models is often evaluated
with a receiver operating characteristic (ROC) curve. The area under the ROC
curve, also known as c-statistic, is then used as a measure of model performance.
However, the value of this c-statistic is low for most prediction models in
reproductive medicine. We demonstrate that low values of the c-statistic are to be
expected in these prediction models, but we also show that this does not imply
that these models are of limited use in clinical practice. Arguments are given why
the calibration of the model which is the correspondence between model-based
probabilities and observed pregnancy rates, the availability of a clinically useful
distribution of probabilities, the ability of a model to correctly classify couples for
treatment decisions, and reclassification indices are more meaningful concepts for
model evaluation.
Chapter 4 presents an exploratory investigation on whether information
obtained from medical history taking can be quantified in a probabilistic model.
We also explored whether CAT testing has additional value to medical history
taking, given differences in prior probabilities. To answer these questions, data
23138 Coppus.indd 147 28-09-12 14:55
Chapter 9
148
of 207 consecutive subfertile women were used to create multivariable logistic
regression models for the prediction of tubal disease as diagnosed by diagnostic
laparoscopy. The model with data of medical history only had an area under the
receiver operating characteristic curve (AUC) of 0.65 (95% CI 0.56-0.74). Addition of
CAT increased the AUC to 0.70 (95% CI 0.62-0.78) (P=0.065). CAT was positive in 40
women and showed a sensitivity of 0.37 (95% CI 0.26-0.49) for a specificity of 0.88
(95% CI 0.82-0.93). In CAT positive women a blank medical history did not decrease
the probability of tubal disease. Of the 167 women tested CAT negative, 23 (14%)
still had a high probability of disease due to their medical history and 11 of them
(48%) showed tubal abnormalities on diagnostic laparoscopy. From these data we
concluded that CAT testing adds valuable information to a woman’s risk profile
based on her medical history. The combination of medical history taking and CAT
testing has a better yield for diagnosing tubal disease than either of these alone.
Chapter 5 investigates the ability of medical history taking alone to predict tubal
pathology at first consultation of the subfertile couple, as the ultimate aim of tubal
testing is to identify women with bilateral tubal pathology in a timely manner,
so they can be treated with IVF or tubal surgery. For this purpose, data of 3716
women who underwent tubal patency testing as part of their routine fertility
workup, and participated in a nationwide cohort study were used. Elements
from their medical history were related to the presence of tubal pathology. With
multivariable logistic regression, we constructed two diagnostic models. In the first
model tubal disease was defined as occlusion and/or severe adhesions of at least
one tube, whereas in a second model tubal disease was defined as the presence
of bilateral abnormalities. Both models discriminated moderately well between
women with and women without tubal disease with an area under the receiver-
operating characteristic curve (AUC) of 0.65 (95% CI: 0.63-0.68) for any tubal
pathology and 0.68 (95% CI: 0.65-0.71) for bilateral tubal pathology, respectively.
However, the models could make an almost perfect distinction between women
with a high and a low probability of tubal pathology. A decision rule in the form
of a simple diagnostic score chart was developed for application of the models in
clinical practice. This decision rule could be used to select women for diagnostic
laparoscopy more efficiently.
Chapter 6 explores the prognostic capacity of hysterosalpingography and
laparoscopy in a general subfertile population. From data that were prospectively
23138 Coppus.indd 148 28-09-12 14:55
Summary
149
9
collected in 38 fertility centres, we selected those women who underwent
HSG and/or laparoscopy as part of their subfertility work-up. Follow-up started
immediately after tubal testing and follow-up was ended 12 months thereafter.
We correlated findings at HSG or laparoscopy with time to pregnancy. Follow-
up was censored at the date of last contact, when the woman was not pregnant
or at the start of fertility treatment. Kaplan-Meier curves for the occurrence of
spontaneous intrauterine pregnancy were constructed for women without tubal
pathology, for those with unilateral tubal pathology and for women with bilateral
tubal pathology at HSG or laparoscopy. With multivariable Cox regression analysis
the association between tubal pathology and the occurrence of an intrauterine
pregnancy was expressed in fecundity rate ratios (FRRs). In total, 3301 women
were included, of whom 2043 underwent HSG only, 747 underwent diagnostic
laparoscopy only and 511 underwent both. At HSG, 322 (13%) women showed
unilateral tubal pathology and 135 (5%) showed bilateral tubal pathology. At
laparoscopy, 167 (13%) showed unilateral tubal pathology and 215 (17%) showed
bilateral tubal pathology. Multivariable analysis resulted in FRRs of 0.81 [95%
confidence interval (CI): 0.59-1.1] for unilateral, and 0.28 (95% CI: 0.13-0.59) for
bilateral, tubal pathology at HSG. The FRRs at laparoscopy were 0.85 (95% CI: 0.47-
1.52) for unilateral, and 0.24 (95%CI: 0.11-0.54) for bilateral, tubal pathology. From
this study it was concluded that women with unilateral tubal pathology at HSG or
laparoscopy had moderate, non-significant reduction in spontaneous pregnancy
chances, whereas those with bilateral tubal pathology at HSG or laparoscopy had a
severe reduction in the ability to achieve spontaneous conception. This reduction
in fertility prospects was similar for HSG and laparoscopy, suggesting that HSG and
laparoscopy have a comparable predictive capacity for natural conception.
Chapter 7 investigates the prognostic significance of positive Chlamydia
trachomatis antibodies for spontaneous pregnancy once tubal pathology has been
ruled out. We studied ovulatory women in whom HSG or laparoscopy showed
patent tubes. Women were tested for C. trachomatis immunoglobulin G (IgG)
antibodies with either micro-immunofluorescence (MIF) or an ELISA. CAT serology
was positive if the MIF titre was ≥ 1:32 or if the ELISA index was > 1.1. The proportion
of couples pregnant without treatment was estimated at 12 months of follow-up.
Time to pregnancy was considered censored at the date of the last contact when
the woman was not pregnant or at the start of treatment. The association between
CAT positivity and an ongoing pregnancy was evaluated with Cox regression
23138 Coppus.indd 149 28-09-12 14:55
Chapter 9
150
analyses. Of the 1882 included women without visible tubal pathology, 338 (18%)
had a treatment- independent pregnancy within 1 year [estimated cumulative
pregnancy rate 31%; 95% confidence interval (CI): 27%-35%]. Because of violation
of Cox regression analysis assumptions after 9 months of follow-up, which could
not be explained on sound clinical nor statistical grounds, regression analyses
were limited to the first 9 months after tubal testing. Nevertheless, positive C.
trachomatis IgG serology was associated with a statistically significant 33% lower
probability of an ongoing pregnancy [adjusted fecundity rate ratio 0.66 (95% CI
0.49 to 0.89)]. We concluded that CAT testing does not only have diagnostic value,
but also prognostic value, even after HSG or laparoscopy has shown no visible
tubal pathology, as subfertile women with a positive CAT have lower pregnancy
chances than CAT negative women. After external validation, this finding could be
incorporated into existing prognostic models.
Chapter 8 presents a discussion of the findings of this thesis, clinical implications
and future research recommendations.
Chapter 10 provides an epilogue in which the results of three different theses on
tubal pathology from our study group are integrated.
23138 Coppus.indd 150 28-09-12 14:55
23138 Coppus.indd 151 28-09-12 14:55
23138 Coppus.indd 152 28-09-12 14:55
9Samenvatting
23138 Coppus.indd 153 28-09-12 14:55
Chapter 9
154
Subfertiliteit wordt gewoonlijk gedefinieerd als het uitblijven van zwangerschap na
regelmatige onbeschermde samenleving gedurende 12 maanden. De prevalentie
van subfertiliteit bedraagt rond de 14%, ofwel 1 op de 7 koppels dat probeert
zwanger te worden zal ermee geconfronteerd worden. Er wordt verondersteld
dat bij 10-30% van deze stellen de subfertiliteit het gevolg is van tubapathologie.
Eerdere infecties, endometriose, of chirurgische ingrepen in het verleden kunnen
geleid hebben tot occlusie van de tuba of adhesies. Om deze afwijkingen aan te
tonen danwel uit te sluiten ondergaan subfertiele paren vaak tubadiagnostiek als
afronding van het fertiliteitonderzoek. Momenteel wordt de laparoscopie met het
testen van de doorgankelijkheid van de tubae gezien als de meest informatieve
vorm van tubadiagnostiek, maar laparoscopie heeft verschillende nadelen.
Allereerst is het een invasieve chirurgische procedure onder algehele anesthesie,
met bijbehorende risico’s. Ten tweede is de beschikbaarheid niet ongelimiteerd, als
gevolg van hoge kosten, benodigde operatietijd en personeel. Daarom is gezocht
naar manieren om juist die vrouwen waarbij men ook verwacht afwijkingen te
vinden een laparoscopie te laten ondergaan, en tegelijkertijd het aantal ‘negatieve’
laparoscopieën te verminderen.
Nationaal en internationaal bestaat een grote praktijkvariatie in het type tubatest,
het tijdstip waarop de tubatest verricht wordt, en indicatie. Sommige richtlijnen
bevelen een zorgvuldige anamnese aan om vrouwen voor tubadiagnostiek te
selecteren, andere richtlijnen pleiten voor het gebruik van de Chlamydia antistof
test, terwijl weer andere richtlijnen tubadiagnostiek bij alle subfertiele vrouwen
adviseren. Een probleem bij de interpretatie van het onderzoek dat tot op heden
gedaan is, is dat het meeste test evaluatie onderzoek geïsoleerd van de klinische
context verricht is. Voor het onderzoek beschreven in dit proefschrift werd
gestart met het ophelderen van methodologische principes in diagnostisch en
prognostisch onderzoek. Hierna werden multivariabele diagnostische modellen
ontwikkeld, om voorafkansen op tubapathologie te genereren die kunnen helpen
vrouwen met een hoge kans op (ernstige) tubapathologie te onderscheiden van
vrouwen met een lage kans. Verder werd onderzocht wat de toegevoegde waarde
van de Chlamydia antistof test is bovenop hetgeen al bekend is uit zorgvuldig
afnemen van de anamnese. Ten slotte werd de prognostische waarde voor het
voorspellen van een spontane zwangerschap onderzocht voor de meest gebruikte
tubatesten testen zoals HSG, laparoscopie en de Chlamydia antistof test.
23138 Coppus.indd 154 28-09-12 14:55
Samenvatting
155
9
Hoofdstuk 1 beschrijft de achtergrond en doelstellingen van dit proefschrift.
Hoofdstuk 2 beschrijft de resultaten van een onderzoek waarin de gevolgen van
het Standards for Reporting of Diagnostic Accuracy Studies (STARD) statement op
de kwaliteit van rapportage van test accuratesse onderzoek bekeken werd. Hiervoor
werden in twee toptijdschriften op het gebied van de voortplantingsgeneeskunde
24 artikelen uit 1999 (pre-STARD), en 27 artikelen uit 2004 (post-STARD) nader
onderzocht. Studies die rapporteerden over de waarde van een diagnostische
test vergeleken met een referentiestandaard werden geïncludeerd. Elk artikel
werd gescoord aan de hand van een checklist bestaand uit 25 onderdelen,
waarin zaken als de studie vraag, samenstelling van de studiepopulatie, opzet
van de studie, testmethoden, referentiestandaard, statistiek, en rapportage van
resultaten en conclusie aan de orde komen. Het gemiddeld aantal gerapporteerde
STARD onderdelen was 12.1 ± 3.3 (spreiding 6.5-20) voor artikelen uit 1999 (pre-
STARD), en 12.4 ± 3.2 (spreiding 7-17.5) voor artikelen uit 2004 (post-STARD). In
totaal werd in minder dan de helft van de artikelen 50% of meer van de STARD
onderdelen adequaat gerapporteerd, met een ruime spreiding in de rapportage
van individuele onderdelen. Er werd geen significante verbetering gezien in het
gemiddeld aantal gerapporteerde zaken in de artikelen gepubliceerd na invoering
van het STARD statement. Wij concludeerden derhalve dat auteurs van test
accuratesse onderzoek in de twee vooraanstaande tijdschriften op het gebied van
de voortplantingsgeneeskunde heel matig hun opzet, uitvoering, methodologie
en statistische analyse van hun onderzoek rapporteren. Dit belemmert de
mogelijkheid om deze studies goed op hun waarde te kunnen beoordelen. Strikte
naleving van de STARD criteria voor publicatie van een test accuratesse onderzoek
dient daarom sterk aangemoedigd te worden.
Hoofdstuk 3 beschrijft een opiniërend artikel over de methodologische
overwegingen wanneer prognostische modellen binnen de voortplantings-
geneeskunde geëvalueerd worden. Vaak worden dergelijke modellen gebruikt
om de kans op spontane zwangerschap te berekenen, maar ook om de kans
op zwangerschap na ovulatie inductie, intra-uteriene inseminatie of na in vitro
fertilisatie uit te drukken. Het prestatievermogen van zulke prognostische
predictiemodellen wordt frequent geëvalueerd met een receiver operating
characteristic (ROC) curve. Het oppervlak onder de ROC curve, ook wel c-statistic
indien tijd tot zwangerschap gemodelleerd wordt, wordt dan gebruikt als maat
23138 Coppus.indd 155 28-09-12 14:55
Chapter 9
156
voor model prestatie. Echter, voor de meeste predictiemodellen binnen de
voortplantingsgeneeskunde is deze c-statistic waarde laag. In dit hoofdstuk
wordt aangetoond dat lage waarden van de c-statistic verwacht kunnen worden
in dergelijke predictiemodellen, maar dat dit niet betekent dat deze modellen
onbruikbaar zijn in de dagelijkse klinische praktijk. Beargumenteerd wordt
waarom de calibratie van een model (de overeenkomst tussen voorspelde kansen
afkomstig van het model en de geobserveerde zwangerschapscijfers), een klinisch
bruikbare verdeling van kansen, het vermogen van een model om koppels in te
delen in categorieën die buikbaar zijn om behandelbeslissingen te nemen, en
reclassificatie indices belangrijkere begrippen zijn voor evaluatie van modellen.
Hoofdstuk 4 beslaat een verkennend onderzoek naar de mogelijkheid om door
middel van een probabilistisch model, gebaseerd op onderdelen uit de anamnese,
een kansschatting op tubapathologie te maken. Daarnaast werd onderzocht of
testen op aanwezigheid van Chlamydia trachomatis antistoffen (CAT) toegevoegde
waarde heeft, wanneer de verschillen in kans op tubapathologie (gebaseerd op de
anamnese), meegenomen wordt. Om deze vragen te beantwoorden werden data
van 207 opeenvolgende subfertiele vrouwen gebruikt voor het construeren van
multivariabele logistische regressie modellen, die tubapathologie - vastgesteld
door middel van laparoscopie - kunnen voorspellen. Het model gebaseerd op
alleen factoren uit de anamnese toonde een oppervlak onder de ROC curve
(AUC) van 0.65 (95% BI 0.56-0.74). Toevoegen van CAT aan het anamnese model
verhoogde de AUC tot 0.70 (95% BI 0.62-0.78) [p=0.065]. CAT was positief in 40
vrouwen, met een sensitiviteit van 0.37 (95% CI 0.26-0.49) en een specificiteit van
0.88 (95% CI 0.82-0.93). In CAT positieve vrouwen verlaagde een blanco anamnese
de kans op tubapathologie niet. In de 167 CAT negatieve vrouwen, waren er 23
(14%) die ondanks afwezigheid van Chlamydia antistoffen toch een hoge kans op
tubapathologie hadden op basis van hun anamnese. Bij 11 van deze 23 vrouwen
(48%) werden inderdaad afwijkingen bij de laparoscopie gezien. Op basis van
deze gegevens kan geconcludeerd worden dat CAT een toegevoegde waarde
heeft aan het risico profiel van de vrouw gebaseerd op de anamnese, en dat een
combinatie van anamnese en CAT een betere diagnostische waarde heeft dan een
van beide alleen.
Hoofdstuk 5 onderzoekt in een veel groter cohort de waarde van de anamnese
om bij het eerste consult een inschatting van de kans op tubapathologie te
23138 Coppus.indd 156 28-09-12 14:55
Samenvatting
157
9
kunnen maken. Het uiteindelijke doel van tubadiagnostiek is namelijk vrouwen
met dubbelzijdige tubapathologie - met een vrijwel uitzichtloze kans op
spontane zwangerschap - te identificeren, zodat zij met in-vitro-fertilisatie
(IVF) of tubachirurgie behandeld kunnen worden. Om de onderzoeksvraag te
beantwoorden, gebruikten we de gegevens van 3716 vrouwen die een tubatest
ondergingen binnen een nationaal cohort onderzoek naar de waarde van de
testen uit het oriënterend fertiliteitonderzoek. Elementen uit de anamnese werden
gerelateerd aan tubapathologie. Met behulp van multivariabele logistische
regressie analyses werden twee diagnostische modellen geconstrueerd. In het
eerste model werd tubapathologie gedefinieerd als occlusie en/of ernstige
adhesies van tenminste één tuba, terwijl in het tweede model tubapathologie
werd gedefinieerd als bilaterale afwijkingen. Beide modellen konden matig goed
vrouwen met en vrouwen zonder tubapathologie onderscheiden, met AUC’s van
0.65 (95% BI 0.63-0.68) voor elke vorm van tubapathologie, en 0.68 (95% BI 0.65-
0.71) voor bilaterale tubapathologie. De modellen konden echter bijna perfect de
vrouwen met een heel hoge kans op afwijkingen onderscheiden van de vrouwen
met een lage kans op afwijkingen. Een simpele diagnostische score kaart werd
ontwikkeld om beide modellen in de dagelijkse praktijk eenvoudig te kunnen
gebruiken. Enkele rekenvoorbeelden laten zien dat een meer mathematisch
gebruik van de anamnese zou kunnen leiden tot een efficiëntere inzet van de
beperkt beschikbare diagnostische laparoscopie.
Hoofdstuk 6 beschrijft de prognostische waarde van hysterosalpingografie
en laparoscopie voor het optreden van een spontane zwangerschap in een
doorsnee subfertiele populatie. Uit gegevens die prospectief verzameld waren in
38 fertiliteitcentra werden die vrouwen geselecteerd die als onderdeel van het
fertiliteitonderzoek een HSG danwel een laparoscopie ondergingen. Vrouwen
werden tot 12 maanden na het moment van tubadiagnostiek opgevolgd. Follow-
up gegevens werden gecensureerd op het moment van het laatste patiëntcontact,
indien een vrouw niet zwanger werd binnen de follow-up periode, of op het
moment dat een fertiliteitbehandeling gestart werd. De bevindingen van het HSG
of de laparoscopie werden gecorreleerd aan de tijd tot spontane zwangerschap.
Kaplan-Meier curves werden gemaakt voor vrouwen zonder afwijkingen, voor
vrouwen met unilaterale tubapathologie en voor vrouwen met dubbelzijdige
afwijkingen bij het HSG of de laparoscopie. Met multivariabele Cox regressie analyse
werd de associatie tussen tubapathologie en het optreden van een zwangerschap
23138 Coppus.indd 157 28-09-12 14:55
Chapter 9
158
gemodelleerd. In totaal konden 3301 vrouwen geïncludeerd worden. Hiervan
ondergingen 2042 vrouwen een HSG, 747 een diagnostische laparoscopie en 511
vrouwen zowel een HSG als laparoscopie. Het HSG was unilateraal afwijkend in 322
vrouwen (13%), en bilateraal afwijkend bij 135 vrouwen (5%). Bij de laparoscopie
was in 167 vrouwen (13%) sprake van unilaterale pathologie, en in 215 vrouwen
(17%) bilaterale pathologie. Multivariabele analyses resulteerden in een fertiliteit
ratio (FR) van 0.81 (95% BI 0.59-1.1) voor unilaterale, en 0.28 (95% BI 0.13-0.59)
voor bilaterale tubapathologie op het HSG. Voor laparoscopie bedroeg de HR
0.85 (95% BI 0.47-1.52) voor unilaterale, en 0.24 (95% BI 0.11-0.5) voor bilaterale
tubapathologie. Op basis van dit onderzoek kan geconcludeerd worden dat
vrouwen met eenzijdige tuba afwijkingen op HSG en laparoscopie een matige, niet
significante verlaagde spontane zwangerschapskans hebben, terwijl een bilateraal
afwijkend HSG of laparoscopie de kans op spontane zwangerschap in hoge mate
verminderd. Deze verlaagde fertiliteit prognose was van gelijke grootte voor het
HSG en de laparoscopie, hetgeen inhoudt dat beide testen een vergelijkbare
prognostische waarde voor het optreden van spontane zwangerschap hebben.
Hoofdstuk 7 beschrijft een onderzoek naar de prognostische waarde van
positieve Chlamydia trachomatis antistoffen op het optreden van een spontane
zwangerschap als invasieve tubadiagnostiek geen afwijkingen heeft laten zien.
Hiervoor werd een cohort ovulatoire vrouwen, bij wie een HSG of laparoscopie
doorgankelijke tubae toonde, en die getest waren op de aanwezigheid
van C. trachomatis immunoglobuline (IgG) antistoffen met hetzij micro-
immunofluorescentie technieken (MIF) danwel een ELISA test, onderzocht. CAT
serologie werd als positief beschouwd indien de MIF titer ≥ 1:32 was of indien
de ELISA index > 1.1 bedroeg. De spontane zwangerschapskans na 12 maanden
follow-up werd berekend, waarbij tijd tot zwangerschap gecensureerd werd
op het moment van het laatste patiëntcontact, indien een vrouw niet zwanger
werd binnen deze periode, of op het moment dat een fertiliteitbehandeling
gestart werd. De associatie tussen positieve CAT en doorgaande zwangerschap
werd gemodelleerd met Cox regressie analyses. Van de 1882 vrouwen met
doorgankelijke tubae, werden er 338 (18%) spontaan zwanger binnen het jaar
(geschatte cumulatieve zwangerschapskans 31%; 95% BI 27%-35%). Omdat de
aannames voor het toepassen van Cox regressie analyses geschonden werden
na 9 maanden follow-up, wat noch op basis van klinische, noch op basis van
statistische gronden verklaard kon worden, moesten de regressie analyses tot 9
23138 Coppus.indd 158 28-09-12 14:55
Samenvatting
159
9
maanden follow-up duur beperkt worden. Toch waren positieve C. trachomatis
IgG antistoffen duidelijk en statistisch significant geassocieerd met een 33%
lagere kans op doorgaande zwangerschap (gecorrigeerde fertiliteit ratio 0.66
(95% BI 0.49-0.89). Wij concluderen daarom dat CAT in het fertiliteitonderzoek
niet alleen diagnostisch, maar ook prognostisch relevant is, zelfs nadat HSG of
laparoscopie evidente tubapathologie heeft uitgesloten. Subfertiele vrouwen
met een positieve CAT maar met normale tubae hebben een duidelijk lagere kans
op spontane conceptie dan de CAT negatieve vrouwen bij wie tubapathologie
uitgesloten is. Na externe validatie in een andere studie, zou deze nieuwe
bevinding toegevoegd kunnen worden aan bestaande prognostische modellen
voor spontane zwangerschap.
Hoofdstuk 8 tenslotte geeft een algemene bespreking van de resultaten uit
dit proefschrift, beschrijft de klinische implicaties en bevat aanbevelingen voor
toekomstig onderzoek.
Hoofdstuk 10 bestaat uit een epiloog, in welke de resultaten van drie proefschriften
uit onze tubapathologie onderzoeksgroep geïntegreerd worden.
23138 Coppus.indd 159 28-09-12 14:55
23138 Coppus.indd 160 28-09-12 14:55
10Epilogue
23138 Coppus.indd 161 28-09-12 14:55
Chapter 10
162
With an estimated prevalence between 11 and 30% in subfertile populations,
tubal pathology is an important cause for subfertility (Hull et al., 1985, Collins et al.,
1995, Snick et al., 1997). The American Society of Reproductive Medicine (ASRM)
recommends a careful medical history and physical examination to identify
symptoms and signs suggesting a specific cause for subfertility, which can be
the focus of subsequent diagnostic evaluation. The National Institute for Clinical
Excellence (NICE) guideline advises the use of patient characteristics to decide
whether tubal testing should be performed and women without comorbidities
should be offered HSG. The guideline of the Dutch Society for Obstetrics and
Gynaecology (NVOG) mentions the use of patient characteristics as a first step in
the diagnostic strategy (ASRM, 2006; NICE, 2004; NVOG, 2004).
Although all these guidelines recommend medical history and physical
examination as a primary evaluation in the fertility work-up, a clear evidence
based recommendation in whom, when and which tubal patency tests should
be performed is not provided. As a consequence there is wide variation in clinical
practice concerning the use of tubal patency tests. A diagnostic strategy in which
all available information from the medical history and physical examination is
integrated with the results of tubal patency tests could potentially lead to more
cost-effective testing of tubal pathology.
We have performed and published several studies on patient characteristics and
these diagnostic tubal tests since the publication of these guidelines. In one study
we developed decision rules to express the probability of tubal pathology at the
first consultation based on patient characteristics only (Coppus et al., 2007). In
another study we showed that the addition of Chlamydia trachomatis Antibody
Test (CAT) to a diagnostic model based on patient characteristics increased the
AUC for the diagnosis of any tubal pathology from 0.65 to 0.70, although not
significantly (Coppus et al., 2007).
In a separate IPD-analysis, it was shown that from three commonly used Chlamydia
Antibody Tests, the Micro Immuno Fluorescence (MIF) test, showed a moderate
ability to discriminate between women with and without tubal pathology, but
performed best of the three CAT tests (Broeze et al., 2011). Additional testing for
a high-sensitive CRP (hs-CRP), a possible marker for persistence of a Chlamydia
trachomatis infection, to the CAT, increased the diagnostic accuracy of CAT, but
23138 Coppus.indd 162 28-09-12 14:55
Epilogue
163
10
the result of that study requires confirmation before it is implemented in clinical
practice (Den Hartog et al. 2008).
In another study we provided decision rules in which information from medical
history and physical examination were combined with the results of tubal testing in
order to calculate the predicted probability of tubal pathology (Broeze et al., 2012).
The combination of patient characteristics with CAT and HSG results provided the
best diagnostic performance for the diagnosis of bilateral tubal pathology.
We also showed that the diagnostic performance of HSG is invariant over several
subgroups of patients, suggesting that HSG is able to diagnose both any and bilateral
tubal pathology equally in all subfertile women and is a useful screening test for all
subfertile women. Of note is that in women at low risk for tubal pathology (i.e. no
risk indicators in the history and a negative CAT result), the sensitivity of HSG was
low, but the specificity remained stable (Broeze et al., 2011). This is most likely due to
false positive results at laparoscopy, which is the standard reference test in diagnostic
studies on tubal patency. In women at low-risk for tubal pathology and a normal HSG,
laparoscopy can show abnormalities which, we think, are often caused by technical
problems at laparoscopy. These technical problems can consist of vaginal leakage
of dye, low pressure at chromopertubation, premature ending of the procedure,
difference in flow when one tube is patent or invisibility of the fimbrial ends.
In another study we found that HSG and laparoscopy show comparable
performance in predicting natural conception, indicating that from that
perspective there is no preference for one of these tests (Verhoeve et al., 2011).
One randomised trial showed no additional advantage of diagnostic laparoscopy if
this was performed following a normal HSG, on treatment decision and pregnancy
outcome (Tanahatoe et al., 2005) and, in another randomised trial, the number of
diagnostic laparoscopies was substantially reduced if diagnostic laparoscopy was
preceded by HSG (Perquin et al., 2006).
Combining the results of these studies, it can be concluded that medical history
and physical examination can differentiate between women at low and at high risk
for tubal pathology. Identification of those women at highest risk for bilateral tubal
pathology, who have the lowest chances for natural conception, is best obtained
by combining patient characteristics with CAT and HSG.
23138 Coppus.indd 163 28-09-12 14:55
Chapter 10
164
In a cost-effective analysis, different diagnostic strategies for presence of tubal
pathology were assessed. In this study, patient characteristics were taken into
account and obtained from IPD-analyses (Broeze et al., 2012), the prognostic
model of Hunault for unexplained subfertility was used (Hunault et al., 2004)
as well as a prognostic model for pregnancy outcome after IVF treatment in
a Dutch cohort (Lintsen et al., 2007). This study showed that no diagnostic test
and expectant management is the most cost-effective strategy until the age of
38 years, and no diagnostic test but direct treatment from the age of 39 years.
If, however, a diagnostic tubal test is planned, a strategy of first HSG followed by
diagnostic laparoscopy, where HSG shows bilateral tubal pathology, followed by
management depending on the test result, is the most cost-effective strategy
(Verhoeve et al., 2012; submitted).
We suggest the following for tubal patency tests in the fertility work-up; in women
until 38 years and at low risk for tubal pathology based on medical history, physical
examination and CAT result, expectant management and no diagnostic test for at
least 12 months is justified and will reduce the number of unnecessary invasive
diagnostic tests, complications and cost. An HSG followed by laparoscopy, if HSG
shows bilateral occlusion, should be considered, if conception does not occur
after expectant management and if a couple prefers fertility treatment other
than IVF. In women with bilateral distal occlusion, HSG can be helpful to decide
whether laparoscopic salpingostomy is preferable above or before IVF, although
randomised evidence for this is lacking. In women 39 and older, direct treatment
is the most cost-effective scenario, irrespective of the medical history, physical
examination and CAT result. It is not to be expected that every couple is prepared
to start directly with IVF treatment. The second best strategy is then to prove tubal
patency by HSG and if the tubes are found to be open, couples can be counselled
to choose between expectant management, intra uterine insemination and IVF,
obviously taking the prognosis for natural conception into account (Hunault et
al., 2004; Steures et al., 2006). In some women sonographically visible bilateral
hydrosalpinges may be detected before tubal testing. In these women direct
laparoscopy is advised and can be combined at the same time with salpingectomy
or laparoscopic tubal occlusion, since it has been shown that this improves IVF-
outcome (Johnson et al., 2010).
23138 Coppus.indd 164 28-09-12 14:55
Epilogue
165
10
Our recommendation and findings can serve as a framework for a new evidence
based guideline for the diagnosis of tubal pathology in subfertile couples. Several
aspects still need to be addressed and considered. Although our decision rules
showed good calibration (the correspondence between model-based probabilities
and observed tubal pathology rates) and can be easily applied in clinical practice,
external validation is still required (Leushuis et al., 2009).
Also further research is needed concerning the finding of unilateral tubal pathology.
Although our findings did not show a significant reduction in pregnancy rates
in this group of women (Mol et al., 1999; Verhoeve et al., 2011), the pregnancy
rate may be overestimated due to use of conventional methods of analysis (van
Geloven et al., 2012). It is thus possible that in case of unilateral tubal pathology,
active management such as surgery, IUI or IVF may result in significantly higher
pregnancy rates. To answer this question requires a randomised controlled trial in
this group of women.
Finally, we recommend expectant management and deference of tubal testing in a
substantial number of couples. A recent survey amongst patients and professionals
in the Netherlands showed that not only patients’ appreciation of expectant
management was moderate, but also the professionals’ adherence to expectant
management. Improvement of adherence may be obtained by providing more
information material to patients about prognostic models and providing protocols
and training to professionals and by improving their communications skills (van
den Boogaard et al., 2012). Tubal tests may have additional effects on patients’
health apart from the consequences of subsequent management decisions
(Bossuyt and McCaffery, 2009; Lenhard et al., 2005). These additional effects, such
as knowing the cause of the subfertility, being reassured that tubes are patent or
anxiety provoked when the tests reveal bad news, have not been studied. The
value of such information may influence the decision to test or not and should be
studied.
23138 Coppus.indd 165 28-09-12 14:55
Chapter 10
166
References
ASRM Practice Committee Reports. Optimal evaluation of the infertile female. Fertil Steril 2006; 86:s264-s267.
Bossuyt PM and McCaffery K. Additional patient outcomes and pathways in evaluations of testing. Med Decis Making 2009; 29:E30-E38.
Broeze KA, Opmeer BC, Coppus SF, Van Geloven N, Den Hartog JE, Land JA, Van der Linden PJQ, Ng EHY, Van der Steeg JW, Steures P, Van der Veen F, Mol BW. Integration of patient characteristics and the results of Chlamydia antibody testing and hysterosalpingography in the diagnosis of tubal pathology: an individual patient data meta-analysis. Hum Reprod 2012; 27:2979-2790.
Broeze K, Opmeer BC, Coppus SF, Van Geloven N, Alves MF, Ånestad G, Bhattacharya S, Allan J, Guerra-Infante MF, Den Hartog JE et al. Chlamydia antibody testing and diagnosing tubal pathology in subfertile women: an individual patient data meta-analysis. Hum Reprod Update 2011; 17:301-310.
Coppus SF, Verhoeve HR, Opmeer BC, van der Steeg JW, Steures P, Eijkemans MJ, Hompes PG, Bossuyt PM, van der Veen F, Mol BW. Identifying subfertile ovulatory women for timely tubal patency testing: a clinical decision rule based on medical history. Hum Reprod 2007; 22:2685-2692.
Coppus SF, Opmeer BC, Logan S, van der Veen F, Bhattacharya S, Mol BW. The predictive value of medical history taking and Chlamydia IgG ELISA antibody testing (CAT) in the selection of subfertile women for diagnostic laparoscopy: a clinical prediction model approach. Hum Reprod 2007; 22:1353-1358.
Collins JA, Burrows EA, Wilan AR. The prognosis for live birth among untreated infertile couples. Fertil Steril 1995; 64:22-8.
Hull MG, Glazener CM, Kelly NJ, Conway DI, Foster PA, Hinton RA, Coulson C, Lambert PA, Watt EM, Desai KM. Population study of causes, treatment, and outcome of infertility. BMJ 1985; 291:1693-7.
Hunault CC, Habbema JD, Eijkemans MJ, Collins JA, Evers JL, van der Veen F, te Velde ER. Two new prediction rules for spontaneous pregnancy leading to live birth among subfertile couples, based on the synthesis of three previous models. Hum Reprod 2004; 19:2019-2026.
Johnson N, van Voorst S, Sowter MC, Strandell A, Mol BW. Surgical treatment for tubal disease in women due to undergo in vitro fertilisation. Cochrane Database Syst Rev 2010; 20: CD002125.
Lenhard W, Breitenbach E, Ebert H, Schindelhauer-Deutscher HJ, Henn W. Psychological benefit of diagnostic certainty for mothers of children with disabilities: lessons from Down syndrome. Am J Med Genet 2005, 133A: 170-175.
Leushuis E, van der Steeg JW, Steures P, Bossuyt PM, Eijkemans MJ, van der Veen F, Mol BW, Hompes PG. Prediction models in reproductive medicine: a critical appraisal. Hum Reprod Update 2009; 15:537-52.
Lintsen AM, Eijkemans MJ, Hunault CC, Bouwmans CA, Hakkaart L, Habbema JD, Braat DD. Predicting ongoing pregnancy chances after IVF and ICSI: a national prospective study. Hum Reprod 2007; 22:2455-2462.
Mol BW, Collins JA, Burrows EA, van der Veen F, Bossuyt PM. Comparison of hysterosalpingography and laparoscopy in predicting fertility outcome. Hum Reprod 1999; 14:1237-1242.
www.nice.org.uk/pdf/download.aspx?)0=CGO11fullguideline
23138 Coppus.indd 166 28-09-12 14:55
Epilogue
167
10
NVOG Richtlijn Orienterend Fertiliteits-onderzoek (OFO) Werkgroep Voortplantings Endocrinologie en Fertiliteit. Nederlandse Vereniging voor Obstetrie en Gynaecologie (NVOG). 2004.
Perquin DAM, Dorr PJ, de Craen AJM, Helmerhorst FM. Routine use of hysterosalpingography prior to laparoscopy in the fertility workup: a multicentre randomised controlled trial. Hum Reprod 2006; 21:1227-1231.
Snick HK, Snick TS, Evers JL, Collins JA. The spontaneous pregnancy prognosis in untreated subfertile couple: the Walcheren primary care study. Hum Reprod 1997; 12:1582-1588
Steures P, van der Steeg JW, Hompes PG, Habbema JD, Eijkemans MJ, Broekmans FJ, Verhoeve HR, Bossuyt PM, van der Veen F, Mol BW; Collaborative Effort on the Clinical Evaluation in Reproductive Medicine. Intrauterine insemination with controlled ovarian hyperstimulation versus expectant management for couples with unexplained subfertility and an intermediate prognosis: a randomised clinical trial. Lancet 2006; 368:216-221.
Tanahatoe S, Lambalk CB, Hompes PG. The role of laparoscopy in intrauterine insemination: a prospective randomized reallocation study. Hum Reprod 2005; 20:3225-3230.
Van Geloven N, Broeze KA, Bossuyt PM, Zwinderman AH, Mol BW. Treatment should be considered a competing risk when predicting natural conception in subfertile women. Hum Reprod 2012; 27:889-95.
Verhoeve HR, Moolenaar LM, Hompes P, Van der Veen F, Mol BW. Cost-effectiveness of tubal patency tests. 2012 Submitted.
Verhoeve HR, Coppus SF, van der Steeg JW, Steures P, Hompes PG, Bourdrez P, Bossuyt PM, van der Veen F, Mol BW. The capacity of hysterosalpingography and laparoscopy to predict natural conception. Hum Reprod 2011; 26: 134-142.
23138 Coppus.indd 167 28-09-12 14:55
23138 Coppus.indd 168 28-09-12 14:55
Part 4Appendices
23138 Coppus.indd 169 28-09-12 14:55
23138 Coppus.indd 170 28-09-12 14:55
11List of co-authors and their affiliations
23138 Coppus.indd 171 28-09-12 14:55
Chapter 11
172
23138 Coppus.indd 172 28-09-12 14:55
173
11
List of co-authors
Prof. Dr. S. Bhattacharya, Department of Obstetrics and Gynaecology, Aberdeen
Maternity Hospital, Aberdeen, UK.
Drs. P. Boudrez, Department of Obstetrics and Gynaecologie, VieCuri Medical
Centre, Venlo/Venray.
Prof. Dr. P.M. Bossuyt, Department of Clinical Epidemiology, Biostatistics and Bio-
informatics, Academic Medical Centre, Amsterdam.
Dr. M.J. Eijkemans, Julius Centrum, University Medical Centre Utrecht.
Dr. P.G. Hompes, Department of Obstetrics and Gynaecology, Vrije Universiteit
Medical Centre, Amsterdam.
Prof. Dr. J.A. Land, Department of Obstetrics and Gynaecology, University Medical
Centre Groningen, Groningen.
Dr. S. Logan, Department of Obstetrics and Gynaecology, Aberdeen Maternity
Hospital, Aberdeen, UK.
Prof. Dr. B.W. Mol, Department of Obstetrics and Gynaecology, Academic Medical
Centre, Amsterdam / Máxima Medical Centre, Veldhoven
Dr. B.C. Opmeer, Department of Clinical Epidemiology, Biostatistics and Bio-
informatics, Academic Medical Centre, Amsterdam.
Dr. J.W. van der Steeg, Department of Obstetrics and Gynaecology, Academic
Medical Centre, Amsterdam.
Dr. P. Steures, Department of Obstetrics and Gynaecology, Academic Medical
Centre, Amsterdam.
Prof. Dr. F. van der Veen, Centre for Reproductive Medicine, Academic Medical
Centre, Amsterdam
Drs. H.R. Verhoeve, Department of Obstetrics and Gynaecology, Onze Lieve Vrouwe
Gasthuis, Amsterdam.
23138 Coppus.indd 173 28-09-12 14:55
23138 Coppus.indd 174 28-09-12 14:55
11List of publications
23138 Coppus.indd 175 28-09-12 14:55
Chapter 11
176
List of publications
Coppus SF, van der Veen F, Bossuyt PM, Mol BW. Quality of reporting of test accuracy studies in reproductive medicine: impact of the standards for reporting of diagnostic accuracy (STARD)-initiative. Fertil Steril 2006, 86(5): 1321-1329.*
Coppus SF, Mol BW. Diagnostic accuracy of three-dimensional hysterosalpingo-contrast sonography [letter]. Acta Obstet Gynecol Scand 2006, 85(4): 508-509.
Coppus SF, Opmeer BC, van der Veen F, Bossuyt PM, Mol BW. Routine use of hysterosalpingography prior to diagnostic laparoscopy in the fertility workup [letter]. Hum Reprod 2006, 21(10): 2725-2726.
Coppus SF, Opmeer BC, Logan S, van der Veen F, Bhattacharya S, Mol BW. The predictive value of medical history taking and Chlamydia IgG ELISA antibody testing (CAT) in the selection of subfertile women for diagnostic laparoscopy: a clinical prediction model approach. Hum Reprod 2007, 22(5): 1353-1358.*
Van Leeuwen M, Opmeer BC, Coppus SF, Mol BW. Fasting capillary glucose as a screening test for gestational diabetes [letter]. BJOG 2007, 114(3): 372.
Geomini PM, Coppus SF, Kluivers KB, Bremer GL, Kruitwagen RF, Mol BW. Is three-dimensional ultrasonography of additional value in the assessment of adnexal masses? Gynecol Oncol 2007, 106(1): 153-159.
Coppus SF, Langenveld J, Oei SG. Een onderschatte techniek voor het opheffen van schouderdystocie: baren op handen en knieën (‘all fours manoeuvre’). Ned Tijdschr Geneeskd 2007, 151(27): 1493-1497.
Coppus SF, Verhoeve HR, Opmeer BC, van der Steeg JW, Steures P, Eijkemans MJ, Hompes PG, Bossuyt PM, van der Veen F, Mol BW. Identifying subfertile ovulatory women for timely tubal patency testing: a clinical decision rule based on medical history. Hum Reprod 2007, 22(10): 2685-2692.*
Coppus SF, Emparanza JI, Hadley J, Kulier R, Weinbrenner S, Arvanitis TN, Burls A, Cabello JB, Decsi T, Horvath AR, Kaczor M, Zanrei G, Pierer K, Stawiarz K, Kunz R, Mol BW, Khan KS. A clinically integrated curriculum in Evidence-Based Medicine for just-in-time learning through on-the-job training: The EU-EBM project. BMC Med Educ 2007, 7:46.
Kulier R, Hadley J, Weinbrenner S, Meyerrose B, Decsi T, Horvath AR, Nagy E, Emparanza JI, Coppus SF, Arvanitis TN, Burls A, Cabello JB, Kaczor M, Zanrei G, Pierer K, Stawiarz K, Kunz R, Mol BW, Khan KS. Harmonizing evidence-based medicine teaching: a study of the outcomes of e-learning in five European countries. BMC Med Educ 2008, 8:27.
Luttjeboer FY, Verhoeve HR, van Dessel HJ, van der Veen F, Mol BW, Coppus SF. The value of medical history taking as risk indicator for tuboperitoneal pathology: a systematic review. BJOG 2009, 116(5): 612-625.
Broeze KA, Opmeer BC, Bachmann LM, Broekmans FJ, Bossuyt PM, Coppus SF, Johnson NP, Khan KS, ter Riet G, van der Veen F, van Wely M, Mol BW. Individual patient data meta-analysis of diagnostic and prognostic studies in obstetrics, gynaecology and reproductive medicine. BMC Med Res Methodol 2009, 9:22.
23138 Coppus.indd 176 28-09-12 14:55
177
11
List of publications
Weinbrenner S, Meyerrose B, Vega-Perez A, Kulier R, Coppus SF, Kunz R. [EUebm – integrating a Europe-wide harmonised training and continuing medical education in evidence based medicine (BbM) with patient care]. Z Evid Fortbild Qual Gesundhwese 2009, 103: 35-39.
Coppus SF, van der Veen F, Opmeer BC, Mol BW, Bossuyt PM. Evaluating prediction models in reproductive medicine. Hum Reprod 2009, 24(8): 1774-1778.*
Kulier R, Coppus SF, Zamora J, Hadley J, Malick S, Das K, Weinbrenner S, Meyerrose B, Decsi T, Horvath AR, Nagy E, Emparanza JI, Arvanitis TN, Burls A, Cabello JB, Kaczor M, Zanrei G, Pierer K, Stawiarz K, Kunz R, Mol BW, Khan KS. The effectiveness of a clinically integrated e-learning course in evidence-based medicine: a cluster randomised controlled trial. BMC Med Educ, 2009, 9:21.
Kunz R, Nagy E, Coppus SF, Emparanza JI, Hadley J, Kulier R, Weinbrenner S, Arvanitis TN, Burls A, Cabello JB, Desci T, Horvath AR, Walzak J, Kaczor MP, Zanrei G, Pierer K, Schaffler R, Suter K, Mol BW, Khan KS. How far did we get? How far to go? A European survey on post-graduate courses in evidence-based medicine. J Eval Clin Pract, 2009, 15(6): 1169-1204.
Hadley J, Kulier R, Zamora J, Coppus SF, Weinbrenner S, Meyerrose B, Decsi T, Horvath AR, Nagy E, Emparanza JI, Arvanitis TN, Burls A, Cabello JB, Kaczor M, Zanrei G, Pierer K, Kunz R, Wilkie V, Wall D, Mol BW, Khan KS. Effectiveness of an e-learning course in evidence-based medicine for foundation (internship) training. J R Soc Med 2010, 103(7): 288-294.
Verhoeve HR, Coppus SF, van der Steeg JW, Steures P, Hompes PG, Bourdrez P, Bossuyt PM, van der Veen F, Mol BW. The capacity of hysterosalpingography and laparoscopy to predict natural conception. Hum Reprod 2011, 26(1): 134-142.*
Broeze KA, Opmeer BC, van Geloven N, Coppus SF, Collins JA, den Hartog JE, van der Linden PJ, Marianowski P, Ng EH, van der Steeg JW, Steures P, Strandell A, van der Veen F, Mol BW. Are patient characteristics associated with the accuracy of hysterosalpingography in diagnosing tubal pathology? An individual patient data meta-analysis. Hum Reprod Update 2011, 17(3): 293-300.
Broeze KA, Opmeer BC, Coppus SF, van Geloven N, Alves MF, Anestad G, Bhattachary S, Allan J, Guerra-Infante MF, Den Hartog JE, Land JA, Idahl A, van der Linden PJ, Mouton JW, Ng EH, van der Steeg JW, Steures P, Svenstrup HF, Tiitinen A, Toye B, van der Veen F, Mol BW. Chlamydia antibody testing and diagnosing tubal pathology in subfertile women: an individual patient data meta-analysis. Hum Reprod Update 2011, 17(3): 301-310.
Coppus SF, Land JA, Opmeer BC, Steures P, Eijkemans MJ, Hompes PG, Bossuyt PM, van der Veen F, Mol BW, van der Steeg JW. Chlamydia trachomatis IgG seropositivity is associated with lower natural conception rates in ovulatory subfertile women without visible tubal pathology. Hum Reprod 2011, 26(11): 3061-3071.*
Broeze KA, Opmeer BC, Coppus SF, van Geloven N, den Hartog JE, Land JA, van der Linden PJ, Ng EH, van der Steeg JW, Steures P, van der Veen F, Mol BW. Integration of patient characteristics and the results of Chlamydia antibody testing and hysterosalpingography in the diagnosis of tubal pathology: an individual patient data meta-analysis. Hum Reprod 2012, 27(10): 2979-2790.
* presented in this thesis
23138 Coppus.indd 177 28-09-12 14:55
23138 Coppus.indd 178 28-09-12 14:55
11Dankwoord
23138 Coppus.indd 179 28-09-12 14:55
Chapter 11
180
Dankwoord
Promoveren doe je niet alleen, dat is me in al die jaren wel duidelijk geworden.
Allereerst een woord van dank aan al die patiënten die hun gegevens ter
beschikking hebben gesteld voor dit onderzoek. Daarnaast ben ik aan een aantal
specifieke personen dank verschuldigd voor de totstandkoming en afronding van
dit proefschrift.
Prof. Dr. B.W. Mol, beste Ben Willem. Ik weet nog goed hoe het allemaal begonnen
is, jij als jonge klare in Veldhoven, ik als jonge ANIOS. Of ik even met je mee wilde
komen, en vervolgens de vraag of onderzoek doen niks voor mij was. Na enig
beraad was mijn antwoord uiteraard ja, en 1 maand later hing ik totaal uitgeput aan
de telefoon omdat ik je tempo niet bij kon houden. Het vertrouwen dat je in me
gehad hebt de afgelopen jaren is mij heel dierbaar, en ondanks enkele haperingen
in de progressie is het resultaat nu daar. Ik waardeer je humor, tomeloze inzet en
werklust, en uitgesproken loyaliteit en vriendschap. Gelieve niet meer om 4.30 am
te bellen. Thx mate!
Prof. Dr. F. van der Veen, beste Fulco. Daar kwam ik in het kielzog van BW in het
AMC aangezet. Totaal geen notie van voortplantingsgeneeskunde, laat staan
van het doen van onderzoek. Vanaf t=0 ging je er echter vol voor, en kwamen
de karakteristieke commentaren met gezwinde spoed retour. Heerlijk scherp, niet
zelden zonder humor, maar ook totaal onwetend van de Wordknop ‘wijzigingen
bijhouden’. Ik weet zeker dat je het wereldrecord ‘abstract corrigeren voor de
deadline’ gebroken hebt. Ook al werd ik bijna tot wanhoop gedreven toen we
voor revisieronde 17 hadden bereikt voor een bepaald manuscript, het feit dat dit
artikel zonder verdere revisie direct geaccepteerd werd heeft je gelijk bewezen!
Dank.
Dr. B.C. Opmeer, beste Brent, dank voor al je statistische hulp en acceptatie van
een Brabander op je kamer in het AMC. Wat was het fijn om op je terug te kunnen
vallen als SPSS weer eens iets heel anders deed dan dat ik voor ogen had. Dat ik
ooit nog eens statistiek onderwijs zou gaan geven had ik tijdens mijn geneeskunde
studie niet gedacht, maar het heeft me goed gedaan!
23138 Coppus.indd 180 28-09-12 14:55
Dankwoord
181
11
Dankwoord
Prof. Dr. P.M.M. Bossuyt, beste Patrick. Dank voor het creëren van een plekje op
de afdeling Epidemiologie en Biostatistiek. Voor enkele van de stukken in dit
proefschrift is jouw hulp onontbeerlijk geweest, hoewel ik je mailtje over het
Chlamydia paper nog steeds niet opnieuw durf te openen. Gelieve voortaan
in een eerder stadium te constateren dat we toch net die andere analytische
techniek hadden moeten kiezen. Jouw opvatting, kennis en kunde van de klinische
epidemiologie, en diagnostisch onderzoek in het bijzonder is van wereldniveau.
Prof. dr. M.P.M. Burger, prof. dr. J.A. Land, prof. dr. J.L.H. Evers, prof. dr. M.J. Heineman,
prof. dr. E.W. Steyerberg en prof. dr. S. Bhattacharya dank voor het beoordelen van
mijn manuscript.
Dr. J.W. van der Steeg, beste Jan Willem – Dr. P. Steures, lieve Pieternel. 1000x dank
dat in jullie OFO data mocht rondneuzen!
Dear Bhatty, thanks for allowing me to explore your tubal pathology data.
Beste co-auteurs, zonder een ieder bij naam te noemen heeft ieder van jullie iets
essentieels aan de individuele manuscripten bijgedragen. Dank.
Maatschap gynaecologie en verloskunde van het Máxima Medisch Centrum in
Veldhoven. Wat een mooi stel zijn jullie bij elkaar en wat heb ik veel geleerd bij
jullie. Dank voor het niet aflatend vertrouwen, en de mogelijkheid om naast al
tijdens mijn ANIOS tijd gedeeltelijk vrij geroosterd te worden om mijn eerste
stappen in de wetenschap te zetten. Die laparoscopische hysterectomie heb ik
toch maar mooi in de pocket!
Huidige en oud-collega assistenten gynaecologie van het MMC. Wat heerlijk om
met jullie te werken. Lieve Judith, beste dr. van Laar, wat was het fijn om met
jouw hoogte- en dieptepunten te delen. Lieve Josje, beste dr. Langenveld, ook
met jou was het heerlijk zuchten en steunen. Ik was best trots dat je met mij je
lekenpraatje wilde oefenen. Dank voor jullie promotie tips en tricks. Stijn, Tamara,
Rafli, Minouche, Femke, Josien, Leonie, Joost, en alle jonkies, jullie kunnen het ook!
Lieve verpleging, kraamverzorgenden, verloskundigen, polikliniek medewerksters
en secretaresses in Veldhoven en Maastricht, excuses als ik met mijn hoofd
23138 Coppus.indd 181 28-09-12 14:55
Chapter 11
182
weer eens bij andere zaken was dan de kliniek. Dank voor alle kopjes koffie,
geïnteresseerde praatjes en het aanhoren van mijn geklaag. Zonder jullie geen
gynaecologie/verloskunde!
Beste stafleden en junior staf van het Maastricht Universitair Medisch Centrum.
Sorry voor mijn ‘ja maar…’, dank voor alle interesse in mijn vorderingen, en de
plezierige sfeer op de werkvloer. Beste Janneke, het moment dat we per ongeluk
ontdekten dat we zo ongeveer de helft van elkaars boekje gereviewed hadden
blijft hilarisch!
Harold Verhoeve en Kimiko Broeze, mijn tuba-collega’s. Een tuba kan in Bes, C of Es
gestemd staan, maar voor allen geldt dat je veel adem moet hebben. Dat hebben
we wel gehad volgens mij, een lange adem. Dank voor jullie hulp bij het tuba
project.
Dr. L.G.M. Mulders, mijn mentor. Beste Leon, vanaf de zijlijn heb je mijn vorderingen
steeds in de gaten gehouden en me achter de broek gezeten. Dank voor je
relativeringsvermogen, je ‘draadje van Coppus’, en je soms heerlijk ontwapende
aanpak als pragmatisch gynaecoloog. We lijken soms verdacht veel op elkaar.
Beste staf voortplantingsgeneeskunde, beste staf Klinische Epidemiologie, beste
collega onderzoekers uit het AMC. Hoewel vreemde eend in de bijt, heb ik me altijd
ontzettend in jullie clubjes opgenomen gevoeld. Bij besprekingen, congressen in
binnen- en buitenland werd ik direct in de AMC club opgenomen. Ontzettend
veel dank daarvoor, het heeft me enorm goed gedaan. Wat wel betekent dat ik nu
op nationale congressen vaak tijd tekort kom om iedereen te spreken.
Lieve vrienden en vriendinnen, dank voor alle gezellige feesten, borrels en
stapavonden, waardoor de geplande zondagochtend werken vaak alweer
verloren ging. Hoewel de meesten van jullie weinig aanstalten maken om zelf te
promoveren, zijn jullie erg goed in het wegnemen van alle promotiestress! Beste
medemuzikanten, ik probeer weer wat vaker op de repetities te zijn!
Lieve familie en schoonfamilie, het is zo fijn om samen met jullie te zijn!
Mijn paranimfen, mr. Michiel Dankers en mr. drs. Jerom Coppus. Dank voor al jullie
23138 Coppus.indd 182 28-09-12 14:55
Dankwoord
183
11
Dankwoord
relativerende opmerkingen en onuitputtelijke voorraad consumptiebonnen. Ook
al hebben jullie geen kaas gegeten van geneeskunde, op juridisch vlak zijn jullie
een kei. Wat na genoeg erosieve veranderingen dan weer dicht bij grind en zand
komt….
Lonneke en Léonieke (mét streepje!), lieve tweelingzusjes. Wat gezellig dat jullie
gewoon binnen komen vallen voor een glas sinas. En wat fijn dat ik jullie ‘broerke’
mag zijn!
Jerom, beste broer, als spreken pudding zou zijn, was jij al lang gepromoveerd tot
doctor Oetker.
Beste Sjaak, dank voor je fantastische dochter. Dank voor je nuchtere kijk op het
leven. Wat is het ontzettend jammer dat Hanneke dit niet meer mee mag maken.
Lieve mama, lieve papa, zonder jullie had ik dit alles niet gekund. Als je het talent
hebt gekregen moet je het ook gebruiken was jullie motto. Dat heb ik geprobeerd.
Ontzettend bedankt voor het altijd creëren van de mogelijkheden hiervoor. Pap,
ook al heb je dat niet gewild, je was een behoorlijk vertragende factor bij het
afronden van dit proefschrift… Wat zou je ontzettend trots geweest zou zijn. Ik
ben trots dat ik 35 jaar na dato in jouw voetsporen treedt.
Lieve Elke. Wat ben jij een kanjer! Punt.
Yfke en Fiene, kleine droppies, papa gaat voorlezen!
23138 Coppus.indd 183 28-09-12 14:55
23138 Coppus.indd 184 28-09-12 14:55
11About the author
23138 Coppus.indd 185 28-09-12 14:55
Chapter 11
186
23138 Coppus.indd 186 28-09-12 14:55
187
11
About the author
Sjors Coppus werd geboren op 17 januari 1979 te Deurne. In 1997 behaalde hij
zijn gymnasiumdiploma aan het Sint Willibrord Gymnasium te Deurne. Hetzelfde
jaar startte hij met de opleiding Geneeskunde aan de Universiteit Maastricht. Na
het cumlaude afsluiten van de doctoraalfase behaalde hij zijn artsexamen in
november 2003. Hierna werkte hij als ANIOS gynaecologie/obstetrie in het
Máxima Medisch Centrum te Veldhoven, waarbij hij in een latere fase als parttime
onderzoeker werd aangesteld in het AMC te Amsterdam. In 2007 werd een jaar
doorgebracht op de afdeling Klinische Epidemiologie van het AMC te
Amsterdam, in welke periode hij betrokken was bij het EU-EBM project, een
internationaal project naar uniformering van evidence based medicine onderwijs
binnen Europa. In 2008 startte hij met de opleiding tot gynaecoloog in het
Máxima Medisch Centrum (opleiders prof. dr. S.G. Oei en dr. M.Y. Bongers). In
datzelfde jaar verwierf hij een ZonMW AGIKO stipendium. Het academische deel
van de opleiding werd in 2009 en 2010 doorgebracht in het Maastricht
Universitair Medisch Centrum (opleiders prof. dr. G.G. Essed, prof. dr. R.F.
Kruitwagen en dr. G.A. Dunselman). In 2012 keerde hij terug naar het Máxima
Medisch Centrum voor het laatste jaar van de opleiding (opleiders dr. M.Y.
Bongers, dr. C.A. Koks en dr. J.W. Maas). Sjors Coppus is getrouwd met Elke
Verhees. In 2012 werden zij de trotse ouders van een dichoriale tweeling, Yfke en
Fiene. Per 1 januari 2013 is hij aangesteld als staflid gynaecologie in het UMC St.
Radboud te Nijmegen met aandachtsgebied endoscopische chirurgie.
23138 Coppus.indd 187 28-09-12 14:55
23138 Coppus.indd 188 28-09-12 14:55
23138 Coppus.indd 189 28-09-12 14:55
23138 Coppus.indd 190 28-09-12 14:55
23138 Coppus.indd 191 28-09-12 14:55
23138 Coppus.indd 192 28-09-12 14:55
Diagnostic and prognosticaspects of tubal patency testing
Sjors Coppus
Diagnostic and prognostic aspects of tubal patency testing Sjors Coppus
23138 Coppus_Omsl 01-10-12 14:16 Pagina 1