Diagnostic and prognostic aspects of tubal patency testingChapter 2 Quality of reporting of test...

UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl)

UvA-DARE (Digital Academic Repository)

Diagnostic and prognostic aspects of tubal patency testing

Coppus, S.F.P.J.

Publication date2012Document VersionFinal published version

Link to publication

Citation for published version (APA):Coppus, S. F. P. J. (2012). Diagnostic and prognostic aspects of tubal patency testing.

General rightsIt is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s)and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an opencontent license (like Creative Commons).

Disclaimer/Complaints regulationsIf you believe that digital publication of certain material infringes any of your rights or (privacy) interests, pleaselet the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the materialinaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or a letterto: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. Youwill be contacted as soon as possible.

Download date:11 May 2021

https://dare.uva.nl/personal/pure/en/publications/diagnostic-and-prognostic-aspects-of-tubal-patency-testing(99faf135-bfec-4e09-8034-e43444fd6c5c).html

Diagnostic and prognosticaspects of tubal patency testing

Sjors Coppus

Diagnostic and prognostic aspects of tubal patency testing Sjors Coppus

23138 Coppus_Omsl 01-10-12 14:16 Pagina 1

Diagnostic and prognostic aspects of tubal patency testing

Sjors Coppus

23138 Coppus.indd 1 28-09-12 14:55

Financial support for the publication of this thesis has been kindly provided by:

Conceptus BV, Goodlife Pharma, MSD Nederland, Johnson & Johnson Medical BV,

ChipSoft BV, Ferring Pharmaceuticals, Covidien Nederland BV, Guerbet Nederland

BV, Karl Storz Nederland BV, ERBE Nederland BV, Hologic Benelux BV, Memidis

Pharma BV, Medical Dynamics, Stöpler Instrumenten & Apparaten BV

Cover: Fallopian tube, original tissue paper sculpture by Polly and Tom Verity

Layout: Ferdinand van Nispen, Citroenvlinder-DTP.nl, Bilthoven

Print: GVO Drukkers & Vormgevers B.V. | Ponsen & Looijen, Ede

ISBN: 978-90-6464-586-0

© 2012 Sjors Coppus

23138 Coppus.indd 2 28-09-12 14:55

Diagnostic and prognostic aspects

of tubal patency testing

ACADEMISCH PROEFSCHRIFT

ter verkrijging van de graad van doctor

aan de Universiteit van Amsterdam

op gezag van de Rector Magnificus

prof. dr. D.C. van den Boom

ten overstaan van een door het college

voor promoties ingestelde commissie,

in het openbaar te verdedigen in de Agnietenkapel

op 4 december 2012, te 10.00 uur

door

Sjors Franciscus Petrus Joannes Coppus

Geboren te Deurne

23138 Coppus.indd 3 28-09-12 14:55

Promotiecommissie

Promotores: Prof. Dr. B.W.J. Mol

Prof. Dr. F. van der Veen

Copromotor: Dr. B.C. Opmeer

Overige leden: Prof. Dr. S. Bhattacharya

Prof. Dr. M.P.M. Burger

Prof. Dr. J.L.H. Evers

Prof. Dr. M.J. Heineman

Prof. Dr. J.A. Land

Prof. Dr. E.W. Steyerberg

Faculteit der Geneeskunde

23138 Coppus.indd 4 28-09-12 14:55

That slender and narrow seminal duct rises,

fibrous and pale, from the horns of the uterus itself;

becomes, when it has gone a little bit away, appreciably broader,

and curls like a branch until it comes near the end,

then losing the horn-like curl, and becoming very broad,

has a distinct extremity which appears fibrous and fleshy through its red color,

and its end is torn and ragged like the fringe of well-worn garments,

and it has a wide orifice

which lies always closed through the ends of the fringe falling together;

and, if these be carefully separated and opened out,

they resemble the orifice of a brass trumpet.

Wherefore since the seminal duct from its beginning to its end

has a likeness to the bent parts of this classic instrument,

separated or attached, therefore,

it has been called by me uteri tuba (trumpet of the uterus).

Gabriele Fallopius, 1561

23138 Coppus.indd 5 28-09-12 14:55

Contents

Chapter 1 Introduction 9

Part 1 Methodological considerations 25

Chapter 2 Quality of reporting of test accuracy studies in reproductive

medicine: impact of the standards for reporting of diagnostic

accuracy (STARD)-initiative.

Fertility and Sterility 2006, 86: 1321-1329.

27

Chapter 3 Evaluating prediction models in reproductive medicine.

Human Reproduction 2009, 24: 1774-1778.

47

Part 2 Clinical studies 61

Chapter 4 The predictive value of medical history taking and Chlamydia

IgG ELISA antibody testing (CAT) in the selection of subfertile

women for diagnostic laparoscopy: a clinical prediction model

approach.


63

Chapter 5 Identifying subfertile ovulatory women for timely tubal

patency testing: a clinical decision rule based on medical

history.


79

Chapter 6 The capacity of hysterosalpingography and laparoscopy to

predict natural conception.


99

Chapter 7 Chlamydia trachomatis IgG seropositivity is associated with

lower natural conception rates in ovulatory subfertile women

without visible tubal pathology.


117

Part 3 Discussion, summary and epilogue 133

Chapter 8 General discussion 135

Chapter 9 Summary 145

Samenvatting 153

Chapter 10 Epilogue 161

23138 Coppus.indd 6 28-09-12 17:06

Part 4 Appendices 169

Chapter 11 List of co-authors and their affiliations

List of publications

Dankwoord

About the author

171

175

179

185

23138 Coppus.indd 7 28-09-12 16:57

23138 Coppus.indd 8 28-09-12 14:55

1Introduction

23138 Coppus.indd 9 28-09-12 14:55

Chapter 1

10

Subfertility is customarily defined as failure to conceive after regular unprotected

sexual intercourse for one year. The prevalence of subfertility is around 14%,

affecting one in seven couples (Hull et al., 1985). Data from historical populations

estimate the average prevalence of subfertility to be 5.5%, 9.4% and 19.7%,

respectively, at ages 25-29 years, 30-34 years and 35-39 years (Bongaarts, 1982). In

the Netherlands, this percentage is between 12% and 17%, depending on female

age (Bonsel and van der Maas, 1994).

A diagnosis of unexplained subfertility is usually made after it has been

demonstrated that a woman has a regular ovulation, patent Fallopian tubes with

no evidence of peritubal adhesions, fibroids or endometriosis and that the male

partner has normal semen quality (Simon and Laufer, 1993). The standard fertility

work-up starts with a thorough medical history and physical examination followed

by the assessment of ovulation and a semen analysis. Assessment of tubal

patency is usually reserved as last test in this work-up. Tubal disease includes tubal

obstruction and pelvic adhesions due to infection, endometriosis and previous

surgery and accounts for subfertility in 10-30% of the couples (Evers, 2002). Several

diagnostic tests to assess the tubal condition are available to the clinician.

Hysterosalpingography (HSG)

HSG was first performed in 1910 by Rindfleish, who injected a bismuth solution in

the uterine cavity (Rindfleish, 1910). In 1914, the first report on HSG for determining

tubal patency with an oil-soluble contrast medium was published (Carey, 1914).

Carey used Collargol which could cause significant tissue damage and was painful

to the women (Hanlo, 1930). Because of these serious adverse events, its use was

abandoned and a tubal insufflation test was introduced by Rubin in 1920 (Rubin,

1920). Rubin insufflated oxygen (later carbon dioxide) under pressure through the

cervical canal into the uterine cavity. Tubal patency was determined by presence of

air under the diaphragm on X-ray, by auscultation of air flow into the abdomen or

a drop in pressure during insufflation. In 1925, Heuser was the first to report on the

use of lipiodol in HSGs (Heuser, 1925). Lipiodol, like Collargol an oil-soluble contrast

medium but with a lower viscosity, proved less toxic and was widely accepted

(Hanlo, 1930). Gradually lipiodol was replaced by a water-soluble contrast medium

(WSCM) for several reasons: lower cost, better imaging of the tubal mucosal folds

23138 Coppus.indd 10 28-09-12 14:55

Introduction

11

1and ampullary rugae (Soules and Spandoni, 1982), lower viscosity and more

prompt demonstration of tubal patency reducing the need for a delayed film,

lower likelihood of persistence of contrast medium within the pelvic cavity and of

complications such as anaphylaxis or long-term lipogranuloma formation. At HSG

the contrast medium is slowly injected through the cervical canal into the uterine

cavity. By fluoroscopy (continuous X-ray imaging) the flow of contrast can be

followed and the uterine cavity and lumen of the fallopian tube can be visualized.

HSG is performed as an outpatient procedure and can be painful. Some clinics

still use oil-soluble contrast medium (OSCM), because no severe side-effects have

been reported since the introduction of fluoroscopy screening during injection of

contrast and the abandonment of the procedure when intravasation of contrast

occurs (Lindequist et al., 1991). Also, lipogranuloma formation has not been

reported in women with patent tubes following OSCM (Acton et al., 1988).

Nevertheless, most clinics use WSCM these days, although there is evidence that

flushing of the tubes with OSCM causes less pain and has a positive effect on

pregnancy rates when compared with WSCM (Glatstein et al., 1998; Luttjeboer et al.,

2007). The main complication of HSG is pelvic infection after the procedure, which

can occur in 1-3% of all cases and up to 10% if tubal pathology is present (Stumpf

and March, 1980; Pittaway et al., 1983; Forsey et al., 1990). The risk of infection can

be reduced by Chlamydia culture of the cervix and prophylactic antibiotics in

women with a medical history of pelvic infections (Pittaway et al, 1983), although

some authors advocate prophylactic antibiotics before uterine instrumentation

instead of endocervical screening (Land et al., 2002).

Laparoscopy and dye

The first diagnostic laparoscopy (DL) in humans was applied by Jacobeus in Sweden

in 1910, by inducing a pneumoperitoneum and introducing a cystoscope into the

peritoneal cavity. Initially this technique to diagnose intra-abdominal abnormalities

was mainly used by physicians. Kalk in Germany was principally responsible for

developing laparoscopy into an effective diagnostic and surgical procedure in the

early 1930s (Trimbos-Kempers, 1981; Te Linde’s Operative Gynecology, 1992). It

was not until 1946 before Palmer in France introduced laparoscopy and dye test

(synonyms: dye pertubation, chromopertubation) into gynaecologic practice for

23138 Coppus.indd 11 28-09-12 14:55

Chapter 1

12

the assessment of tubal patency and published his first 250 cases (Palmer, 1946).

In this procedure, which requires general anaesthesia and operating facilities, a

water-soluble dye (usually methylene blue), is injected into the uterine cavity.

Through the laparoscope, the pelvic cavity can be inspected for presence of

pelvic-peritoneal adhesions, endometriosis and tubal patency. Although the uterine

anatomy can be assessed, intrauterine abnormalities can only be visualized if the

procedure is combined with hysteroscopy. This can easily be performed in the same

setting. Like HSG, DL is an invasive procedure and can result in complications such

as infection, vascular-, intestinal- and urologic injuries, which have been reported to

vary between 0.06 and 1.5% (Jansen et al., 1997; Chapron et al., 1998; Härkki-Siren

et al., 1999). In the nineties of the previous century, laparoscopy was integrated as

an essential diagnostic procedure in infertility protocols (ASRM guidelines, 1992;

WHO/Rowe et al., 1993), based on one publication that reported abnormal findings

at laparoscopy in 75% of 24 couples with otherwise unexplained subfertility (Drake

et al., 1977). In 1997, it was reported that 96% of the reproductive endocrinologists

in the USA routinely performed an HSG and 89% a diagnostic laparoscopy in the

diagnostic fertility work-up (Glatstein et al., 1997).

A diagnostic laparoscopy can be combined with interventions like coagulation

or excision of minimal to mild endometriosis, adhesiolysis or cystectomy. In a

randomized controlled trial, it was found that laparoscopic resection or ablation of

minimal and mild endometriosis and adhesiolysis enhances fecundity in subfertile

women (Marcoux et al., 1997). They reported that one in eight women with this

condition should benefit from surgery, but the monthly fecundity rate among

women who underwent surgery (6.1%) was much lower than the rate expected

in fertile women (20%). Whether DL with dye pertubation has a positive effect on

pregnancy rate is not known. One randomised controlled trial showed no difference

between pertubation with an OSCM versus a WSCM (Al Fahdli et al., 2006).

Chlamydia trachomatis antibody testing

Chlamydia trachomatis infection, a sexually transmitted disease (STD), is a major

cause of pelvic inflammatory disease (PID), which can lead to chronic abdominal

pain, ectopic pregnancy and tubal pathology (Weström and Wolner-Hanssen, 1993;

Paavonen and Eggert-Kruse, 1999). The causal relationship between Chlamydia

23138 Coppus.indd 12 28-09-12 14:55

Introduction

13

1trachomatis and genital tract infection was established in 1958 (Collier et al.,

1958). C. trachomatis targets the epithelial cells of the cervix. In the Netherlands an

estimated 35 000 cases of Chlamydia cervicitis occur annually (Health Council of

the Netherlands, 2004).

Early infection can be detected by culture of cervical swabs and treated effectively

with antibiotics. However, the majority of these infections is asymptomatic and

remains therefore undiagnosed. Most women will effectively clear a Chlamydia

trachomatis infection by a normal immune response and have a low risk of

complications. Still, untreated infection can through ascendance result in PID

and tubal factor subfertility (Peipert, 2003; Den Hartog et al., 2006). There are no

prospective controlled trials available on how frequently women tested positive

for Chlamydia cervicitis will develop tubal infertility (Land et al., 2010). A correlation

between the age of a woman, severity of infections, the number of PID episodes

and the risk of tubal pathology has been demonstrated (Weström, 1980; Weström

et al., 1992). The risk of tubal pathology varied from 10% after one episode of PID to

20% after two episodes and 40% after three episodes (Weström et al., 1992).

Since an association between serum C. trachomatis IgG antibodies and tubal

pathology was noted, Chlamydia antibody tests (CAT) were introduced as

screening tests in the work-up of subfertile couples to provide the clinician with

a risk estimate of the presence of tubal pathology (Punnonen et al., 1979; Den

Hartog et al., 2006). CAT is a non-invasive, non-expensive and easy to perform test,

but does not provide images of the uterine and tubal anatomy nor of the pelvis.

Other diagnostic tests

Apart from HSG, laparoscopy and CAT testing; other tools are also used to assess

tubal patency. Falloposcopy is a technique with which through a thin fibreoptic

scope the lumen of the fallopian tubes can directly be visualized (Kiss et al.,

1993; Dechaud et al., 1998; Lundberg et al., 1998). Hystero-contrast-sonography

(HyCoSy) is a method where tubal patency is tested by injection of contrast into

the uterine cavity and patency is assessed by transvaginal ultrasound (Schlief

and Deichert, 1991). Transvaginal hydrolaparoscopy (THL) is performed as a

needle puncture technique of the pouch of Douglas using an adjusted Veress-

23138 Coppus.indd 13 28-09-12 14:55

Chapter 1

14

needle trocar system and warm normal saline is injected through this system

into the pelvic cavity. It allows direct visualization of peritubal - and peri-adnexal

adhesions and endometriotic implants. Tubal patency can be assesses by injection

of methylene blue into the uterine cavity like at laparoscopy (Gordts et al., 1998).

These techniques will not be further discussed, because few studies are available

that evaluate the diagnostic accuracy and predictive performance on pregnancy

rates of these tests.

The performance of HSG, laparoscopy and CAT in terms of accuracy and fertility prognosis.

Since the early nineteen seventies many studies on the performance of HSG and

CAT for the assessment of tubal pathology have been published. In these studies

laparoscopy is usually the reference test. The ideal (or ‘gold standard’) test for

tubal disease would correctly identify all women with tubal disease. It would be a

sensitive test (i.e. the test result would be abnormal in all women with the disease)

and it would also be specific (i.e. the test result would be normal only in women

without the disease).

In a systematic review from 20 studies published between 1968 and 1994 on the

diagnostic accuracy of HSG compared with laparoscopy, only 3 studies could

be identified in which the judgment of tubal pathology at laparoscopy was

independent of the knowledge of HSG results and thus preventing observer bias

(Swart et al., 1995). A meta-analysis of these studies showed a sensitivity of 65%

and a specificity of 83% for HSG for diagnosing tubal patency. HSG was unreliable

for diagnosing peri-adnexal adhesions (Swart et al., 1995).

The accuracy of serum Chlamydia antibodies (CAT) was assessed in another meta-

analysis (Mol et al., 1997). This study showed that the discriminative capacity of

Chlamydia antibody titers depended on the immuno-essays used and that the

diagnosis of any tubal pathology was comparable with that of HSG in diagnosing

tubal occlusion (Mol et al., 1997). The negative predictive value (NPV) of CAT in

subfertile women was 75-90%, making the presence of tubal pathology in women

with a negative CAT less likely, whereas the positive predictive value (PPV) of CAT

ranged between 30 and 65%, indicating a high false-positive rate if laparoscopy is

performed in CAT positive women (Den Hartog et al., 2006).

23138 Coppus.indd 14 28-09-12 14:55

Introduction

15

1In a study performed in 1995, diagnostic laparoscopy failed to be an ideal predictor

for future fertility (Collins et al., 1995). In a prospective cohort study of 794 women,

laparoscopy performed better than HSG as a predictor of future fertility (Mol

et al., 1999). In this study HSG showed a fecundity rate ratio (FRR) of 0.8 and 0.49

for one-sided and two-sided tubal occlusion, whereas in laparoscopy these were

0.51 and 0.15, respectively. After a normal HSG or a HSG with one-sided occlusion,

laparoscopy showed two-sided occlusion in 5% of these women and their fertility

prospects were almost zero. If two-sided tubal occlusion was detected on HSG but

not during laparoscopy, fertility prospects were slightly impaired. Fertility prospects

after bilateral occlusion on HSG were strongly impaired in case laparoscopy showed

one-sided and two-sided occlusion, with FRRs of 0.38 and 0.19, respectively.

Background of the research

Many reports of diagnostic tests, including the before mentioned studies, contain

methodological shortcomings that can result in an overestimation of diagnostic

test accuracy (Lijmer et al., 1999). A number of issues in the design and conduct of

diagnostic accuracy studies can lead to bias or variation. The sources of bias and

variation for which there is the most evidence are demographic features, disease

prevalence and severity, partial verification bias, clinical review bias and observer

and instrument variation (Whiting et al., 2004). In diagnostic accuracy studies the

results of a test are compared with those of a reference standard, the best available

method to detect presence or absence of disease. These results are then expressed

in accuracy statistics, such as sensitivity and specificity. In studies on the prognostic

performance of tests, test results are compared with a clinically relevant end point

and statistics are calculated to express the predictive capacity of the test. A group of

scientist and editors developed the STARD (Standards for Reporting of Diagnostic

Accuracy) statement to improve the reporting quality of diagnostic accuracy and to

help authors and readers to assess the potential for bias in a study and to evaluate

the generalizability of the results (Bossuyt et al., 2003).

Similarly, for improvement on the reporting of meta-analyses, QUORUM (Quality

of Reporting of Meta-analyses) and MOOSE (Meta-analyses of observational

Studies in Epidemiology) are available (Moher et al., 2001; Stroup et al., 2000). A

problem with these conventional meta-analyses in diagnostic studies is that they

23138 Coppus.indd 15 28-09-12 14:55

Chapter 1

16

statistically pool the results of individual diagnostic studies which typically examine

the accuracy of a test in isolation from medical history and clinical examination,

or do not adjust for overlap of information captured by clinical history, physical

examination and additional tests. How useful a test will be in practice remains

therefore uncertain (Miettinen and Caro, 1994; Bachman et al., 2003; Khan et al.,

2003; Broeze et al., 2009).

Not only in meta-analyses, but also in individual studies, the diagnostic performance

of HSG and CAT has been assessed in isolation of patient characteristics, and the

sensitivity and specificity of HSG and CAT have been assumed to be stable across

subgroups of women (Swart et al., 1995; Mol et al., 1997; Perquin et al., 2006). Since

conventional systematic reviews and meta-analyses are based on aggregate data

at the study level and not at the level of subgroups, this is unavoidable. The use

of data at the patient level in an individual patient data (IPD) meta-analysis could

overcome this limitation, because the accuracy of tests can be assessed in relation

to other patient characteristics and allows the development or evaluation of

diagnostic algorithms for individual patients (Broeze et al., 2009).

Despite clinical research done so far and reflecting the limitations of this research

as discussed, there is still inconsistency between fertility guidelines about the best

diagnostic strategy for tubal patency testing (NICE, 2004; NVOG, 2004; ASRM, 2006)

and there is no consensus on which test should be initially used, or on the most

effective or cost-effective sequence of tests (Fatum et al., 2002; Perquin et al., 2006;

Bosteels et al., 2007; Den Hartog et al., 2008). In many fertility clinics, CAT is used

ahead of more invasive tests such as HSG or laparoscopy, whereas in other clinics

CAT is not used at all and all women are directly referred for HSG or even laparoscopy.

Because of these inconsistencies in tubal assessment of subfertile women and the

importance of early identification of women with bilateral tubal pathology, who

have significantly reduced conception chances, our research has focused on the

effective and efficient use of tools and resources available in clinical practice.

We therefore critically reappraised the literature dealing with tubal pathology

in the subfertile population by using STARD criteria and IPD meta-analysis of

diagnostic and prognostic studies if possible and used the data obtained from

a large multicentre prospective cohort study among subfertile couples that had

23138 Coppus.indd 16 28-09-12 14:55

Introduction

17

1completed their basic fertility work-up collected in 38 clinics in the Netherlands

between 2002 and 2004. The results of these studies will be presented in 3 theses.

- Diagnostic tests for tubal pathology from a clinical and economic perspective

(H.R. Verhoeve)

- Diagnostic and prognostic aspects of tubal patency testing (S.F. Coppus)

- Diagnosing tubal pathology -the individual approach- (K.A. Broeze)

Background and outline of the thesis

During initiation of this research project, we noticed a wide practice variation

between clinics on work-up for diagnosing tubal pathology. Internationally, even

larger heterogeneity existed in tubal work-up. We therefore decided to focus on

diagnostic and prognostic aspects of tubal patency testing, and identified several

knowledge gaps about the clinical usefulness of medical history taking, CAT and

HSG in daily practice. To answer these knowledge gaps, we first had to gain insight

in methodological issues, which were developing very rapidly at the time of start

of the research presented in this thesis. As a result, the research presented in this

thesis starts with a focus on methodological issues in diagnostic and prognostic

research in general. Hereafter, we investigate the value of multivariable diagnostic

models based on medical history taking, to estimate pretest probabilities for tubal

pathology that might help to distinguish women with a high and low probability

of (severe) tubal disease at first consultation. Furthermore, the added value of

Chlamydia antibody testing upon medical history taking is explored. Finally, we

research the prognostic significance of various tubal pathology tests like HSG,

laparoscopy and CAT testing for spontaneous pregnancy.

Chapter 2 focuses on the quality of reporting of test accuracy studies in

reproductive medicine. We evaluate whether the introduction of the STARD

statement has led to a better reporting of test accuracy studies in two leading

journals within the field of reproductive medicine.

In chapter 3, we report on the methodological considerations when combining

the results of several diagnostic tests into models to predict the outcome of

such diagnoses over time. The importance of a clinically relevant spread in well

23138 Coppus.indd 17 28-09-12 14:55

Chapter 1

18

calibrated probabilities is emphasized, rather than a focus on sensitivity and

specificity of a model.

In chapter 4, we present the results of a pilot study on the use of multivariable

prediction models in the diagnosis of tubal pathology in subfertile women. Using

the data of 163 women from a Scottish prospective cohort study, strategies of

medical history taking alone, CAT testing alone, or an integrated use of both are

compared.

In chapter 5, data of 3716 women who underwent tubal patency testing as

part of their routine fertility workup are used to relate elements in their medical

history to the presence of tubal pathology. With multivariable logistic regression,

we construct two diagnostic models. One in which tubal disease is defined as

occlusion and/or severe adhesions of at least one tube, whereas in a second model,

tubal disease is defined as the presence of bilateral abnormalities. For both models,

discrimination and calibration is evaluated, and the models are transformed into

simple diagnostic score charts for application of the models in clinical practice.

In chapter 6, we evaluate the impact of unilateral and bilateral tubal pathology

detected at HSG or laparoscopy on treatment-independent pregnancy rates, as

compared with women in whom invasive tubal testing showed no abnormalities.

In chapter 7, we describe the results of a cohort study of 1882 women in whom,

after CAT testing, invasive tubal testing with HSG or laparoscopy showed no tubal

pathology. The effect of CAT-seropositive status on spontaneous pregnancy rates

is evaluated.

Chapter 8 provides a general discussion of the results presented in this thesis and

outlines their clinical implications. Finally suggestions for future research are given.

Chapter 9 summarizes the data presented in this thesis.

Chapter 10 provides an epilogue in which the results of three theses on tubal

pathology from our study group are integrated.

23138 Coppus.indd 18 28-09-12 14:55

Introduction

19

1References

Acton CM, Devitt JM, Ryan EA. Hysterosalpingography in infertility--an experience of 3,631 examinations. Aust N Z J Obstet Gynaecol 1988; 28:127-33.

Al-Fadhli R, Sylvestre C, Buckett W, Tulandi T. A randomized study of laparoscopic chromopertubation with lipiodol versus saline in infertile women. Fertil Steril 2006; 85:505-7.

American Fertility Society (1992) Investigation of the Infertile Couple. American Fertility Society, Birmingham, AL, USA.

Bachmann LM, ter Riet G, Clark TJ, Gupta JK, Khan KS. Probability analysis for diagnosis of endometrial hyperplasia and cancer in postmenopausal bleeding: an approach for a rational diagnostic workup. Acta Obstet Gynecol Scand 2003; 82:564-9.

Bongaarts J. Infertility after the age 30: a false alarm. Fam Plann Perspect 1982; 14:75-8.

Bonsel GJ, van der Maas PJ. Aan de wieg van de toekomst. Scenario’s voor de zorg rond de menselijke voorplanting 1995-2010. Bohn Stafleu van Loghum. Houten 1994.

Bosteels J, Van Herendael B, Weyers S, D’Hooghe T. The position of diagnostic laparoscopy in current fertility practice. Hum Reprod Update 2007; 13:477-85.

Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Lijmer JG, Moher D, Rennie D, de Vet HC. Standards for Reporting of Diagnostic Accuracy. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. BMJ 2003; 326:41-4.

Broeze KA, Opmeer BC, Bachmann LM, Broekmans FJ, Bossuyt PM, Coppus SF, Johnson NP, Khan KS, ter Riet G, van der Veen F, van Wely M, Mol BW. Individual patient data meta-analysis of diagnostic and prognostic studies in obstetrics, gynaecology and reproductive medicine. BMC Med Res Methodol 2009; 27:9:22.

Cary WH. Note on determination of patency of fallopian tubes by the use of Collargol and the X-ray shadow. Am J Obstet 1914; 69:462.

Chapron C, Querleu D, Bruhat MA, Madelenat P, Fernandez H, Pierre F, Dubuisson JB. Surgical complications of diagnostic and operative gynaecological laparoscopy: a series of 29,966 cases. Hum Reprod 1998; 13:867-72.

Collier LH, Duke-Elder S, Jones BR. Experimental trachoma produced by cultured virus. Br J Ophthalmol 1958; 42:705-20.

Collins JA, Burrows EA, Wilan AR. The prognosis for live birth among untreated infertile couples. Fertil Steril 1995; 64:22-8.

Dechaud H, Daures JP, Hedon B. Prospective evaluation of falloposcopy. Hum Reprod 1998; 13:1815-8.

den Hartog JE, Morré SA, Land JA. Chlamydia trachomatis-associated tubal factor subfertility: Immunogenetic aspects and serological screening. Hum Reprod Update 2006; 12:719-30.

den Hartog JE, Lardenoije CM, Severens JL, Land JA, Evers JL, Kessels AG. Screening strategies for tubal factor subfertility. Hum Reprod 2008; 23:1840-8.

Donnez J, Jadoul P. Taking a history in the evaluation of infertility: obsolete or venerable tradition? Fertil Steril 2004; 81:16-7

Drake T, Tredway D, Buchanan G, Takaki N, Daane T. Unexplained infertility. A reappraisal. Obstet Gynecol 1977; 50:644-6.

23138 Coppus.indd 19 28-09-12 14:55

Chapter 1

20

Evers J. Female subfertility. Lancet 2002; 360: 151-159.

Fatum M, Laufer N, Simon A. Investigation of the infertile couple: should diagnostic laparoscopy be performed after normal hysterosalpingography in treating infertility suspected to be of unknown origin? Hum Reprod 2002; 17:1-3.

Forsey JP, Caul EO, Paul ID, Hull MG. Chlamydia trachomatis, tubal disease and the incidence of symptomatic and asymptomatic infection following hysterosalpingography. Hum Reprod 1990; 5:444-7.

Glatstein IZ, Harlow BL, Hornstein MD. Practice patterns among reproductive endocrinologists: the infertility evaluation. Fertil Steril 1997; 67:443-51.

Glatstein IZ, Harlow BL, Hornstein MD. Practice patterns among reproductive endocrinologists: further aspects of the infertility evaluation. Fertil Steril 1998; 70:263-9.

Gordts S, Campo R, Rombauts L, Brosens I. Transvaginal hydrolaparoscopy as an outpatient procedure for infertility investigation. Hum Reprod 1998; 13:99-103.

Hanlo EAJM. Uterosalpingographie (Thesis). SW Melchior. Amersfoort. 1930.

Härkki-Siren P, Sjöberg J, Kurki T. Major complications of laparoscopy: a follow-up Finnish study. Obstet Gynecol 1999; 94:94-8.

Health Council of The Netherlands (2004) Screening for Chlamydia. The Hague: Health Council of The Netherlands. Publication no. 2004/07.

Helmerhorst FM, Oei SG, Bloemenkamp KW, Keirse MJ. Consistency and variation in fertility investigations in Europe. Hum Reprod 1995; 10:2027-30.

Heuser C. La radiographie de l’uterus, des trompes et des ovaires avec ou sans injection de lipiodol, pour le diagnostic précoce de la grossesse, de la stérilité, de la perméabilité des trompes. Bull. Soc. Rad. Méd. France 1925. T. XIII no. 119.

Hubacher D, Grimes D, Lara-Ricalde R, de la Jara J, Garcia-Luna A. The limited clinical usefulness of taking a history in the evaluation of women with tubal factor infertility. Fertil Steril 2004; 81:6-10.

Hull MG, Glazener CM, Kelly NJ, Conway DI, Foster PA, Hinton RA, Coulson C, Lambert PA, Watt EM, Desai KM. Population study of causes, treatment, and outcome of infertility. BMJ 1985; 291:1693-7.

Jansen FW, Kapiteyn K, Trimbos-Kemper T, Hermans J, Trimbos JB. Complications of laparoscopy: a prospective multicentre observational study. BJOG 1997; 104:595-600.

Khan KS, Bachmann LM, ter Riet G. Systematic reviews with individual patient data meta-analysis to evaluate diagnostic tests. Eur J Obstet Gynecol Reprod Biol 2003; 108:121-5.

Kiss H, Wenzl R, Egarter C. Tuboscopy. Initial experiences with microendoscopic tubal diagnosis. Wien Klin Wochenschr 1993; 105:719-22.

Land JA, Gijsen AP, Evers JL, Bruggeman CA. Chlamydia trachomatis in subfertile women undergoing uterine instrumentation. Screen or treat? Hum Reprod 2002; 17:525-7.

Land JA, Van Bergen JE, Morré SA, Postma MJ. Epidemiology of Chlamydia trachomatis infection in women and the cost-effectiveness of screening. Hum Reprod Update 2010; 16:189-204.

Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JH, Bossuyt PM. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999; 282:1061-6.

Lindequist S, Justesen P, Larsen C, Rasmussen F. Diagnostic quality and complications of hysterosalpingography: oil- versus water-soluble contrast media--a randomized prospective study. Radiology 1991; 179:69-74.

23138 Coppus.indd 20 28-09-12 14:55

Introduction

21

1Lundberg S, Rasmussen C, Berg AA, Lindblom B. Falloposcopy in conjunction with laparoscopy: possibilities and limitations. Hum Reprod 1998; 13:1490-2.

Luttjeboer F, Harada T, Hughes E, Johnson N, Lilford R, Mol BW. Tubal flushing for subfertility. Cochrane Database Syst Rev. 2007;18:CD003718.

Luttjeboer FY, Verhoeve HR, van Dessel HJ, van der Veen F, Mol BW, Coppus SF. The value of medical history taking as risk indicator for tuboperitoneal pathology: a systematic review. BJOG 2009; 116:612-25.

Marcoux S, Maheux R, Bérubé S. Laparoscopic surgery in infertile women with minimal or mild endometriosis. Canadian Collaborative Group on Endometriosis. N Engl J Med 1997; 337:217-22.

McComb PF. Clinical history is the essence of medicine. Fertil Steril 2004; 81:13-5.

Miettinen OS, Caro JJ. Foundations of medical diagnosis: what actually are the parameters involved in Bayes’ theorem? Stat Med 1994; 13:201-9.

Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, Stroup DF. Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. QUOROM Group. Br J Surg 2000; 87:1448-54.

Mol BW, Dijkman B, Wertheim P, Lijmer J, van der Veen F, Bossuyt PM. The accuracy of serum chlamydial antibodies in the diagnosis of tubal pathology: a meta-analysis. Fertil Steril 1997; 67:1031-7.

Mol BW, Collins JA, Burrows EA, van der Veen F, Bossuyt PM. Comparison of hysterosalpingography and laparoscopy in predicting fertility outcome. Hum Reprod 1999; 14:1237-42.

Moons KG, Biesheuvel CJ and Grobbee DE. Test research versus diagnostic research. Clin Chem 2004; 50:273-6.

National institute for clinical excellence . (2004) Fertility guideline: assessment and treatment for people with fertility problems. www.nice.org.uk/pdf/download.aspx?)0=CGO11fullguideline.

NVOG Richtlijn Orienterend Fertiliteits-onderzoek (OFO) Werkgroep Voortplantings Endocrinologie en Fertiliteit. Nederlandse Vereniging voor Obstetrie en Gynaecologie (NVOG) 2004.

Paavonen J, Eggert-Kruse W. Chlamydia trachomatis: impact on human reproduction. Hum Reprod Update 1999; 5:433-47.

Palmer R. La coelioscopie gynécologique. Mémoires de l´Academie Chirurgie 1946. 72, p.363.

Peipert JF. Clinical practice. Genital chlamydial infections. N Engl J Med 2003; 349:2424-30.

Perquin DA, Dörr PJ, de Craen AJ, Helmerhorst FM. Routine use of hysterosalpingography prior to laparoscopy in the fertility workup: a multicentre randomized controlled trial. Hum Reprod 2006; 21:1227-31.

Punnonen R, Terho P, Nikkanen V, Meurman O. Chlamydial serology in infertile women by immunofluorescence. Fertil Steril 1979; 31:656-9.

Pittaway DE, Winfield AC, Maxson W, Daniell J, Herbert C, Wentz AC. Prevention of acute pelvic inflammatory disease after hysterosalpingography: efficacy of doxycycline prophylaxis. Am J Obstet Gynecol 1983; 147:623-6.

Rindfleisch W. Darstellung des Cavum Uteri. Klin Wochenschr 1910; 47:780.

Rowe PJ, Comhaire FH, Hargreave TB, Mahmoud AMA. WHO manual for the standardized investigation of the infertile couple. Cambridge, UK: Cambridge University Press Cambridge, UK, 1993.

23138 Coppus.indd 21 28-09-12 14:55

Chapter 1

22

Rubin IC. The non-operative determination of patency of fallopian tubes by means of intra-uterine inflation with oxygen and the production of an artificial pneumoperitoneum. Landmark article 1920. JAMA 1983; 250:2359-65.

Schlief R, Deichert U. Hysterosalpingo-contrast sonography of the uterus and fallopian tubes: results of a clinical trial of a new contrast medium in 120 patients. Radiology 1991; 178:213-5.

Simon A, Laufer N. Unexplained infertility: a reappraisal. Ass Reprod Rev. 1993; 3:26-36.

Soules MR, Spadoni LR. Oil versus aqueous media for hysterosalpingography: a continuing debate based on many opinions and few facts. Fertil Steril 1982; 38:1-11.

Stumpf PG, March CM. Febrile morbidity following hysterosalpingography: identification of risk factors and recommendations for prophylaxis. Fertil Steril 1982; 33:487-492.

Stroup DF, Berlin JA, Morton SC, Olkin I, Williamson GD, Rennie D, Moher D, Becker BJ, Sipe TA, Thacker SB. Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. JAMA 2000; 283:2008-12.

The practice committee of the American Society for Reproductive Medicine. Effectiveness and treatment for unexplained infertility. Fertil Steril 2006; 86:S111-114.

Tulandi T, Platt R. The art of taking a history. Fertil Steril 2004; 81:11-2.

Swart P, Mol BW, van der Veen F, van Beurden M, Redekop WK, Bossuyt PM. The accuracy of hysterosalpingography in the diagnosis of tubal pathology: a meta-analysis. Fertil Steril 1995; 64:486-91.

Te Linde’s Operative Gynecology. Editors Thompson JD, Rock JA. 1992. 7th edition. p. 361. Lippincott Company.

Trimbos-Kemper GCM. Tubachirurgie (Thesis). Leiden. 1981.

Weström L. Incidence, prevalence, and trends of acute pelvic inflammatory disease and its consequences in industrialized countries. Am J Obstet Gynecol 1980; 138:880-92.

Weström L, Joesoef R, Reynolds G, Hagdu A, Thompson SE. Pelvic inflammatory disease and fertility. A cohort study of 1,844 women with laparoscopically verified disease and 657 control women with normal laparoscopic results. Sex Transm Dis 1992; 19:185-92.

Weström L, Wölner-Hanssen P. Pathogenesis of pelvic inflammatory disease. Genitourin Med 1993; 69:9-17.

Whiting P, Rutjes AW, Dinnes J, Reitsma J, Bossuyt PM, Kleijnen J. Development and validation of methods for assessing the quality of diagnostic accuracy studies. Health Technol Assess 2004; 8):iii, 1-234.

23138 Coppus.indd 22 28-09-12 14:55

23138 Coppus.indd 23 28-09-12 14:55

23138 Coppus.indd 24 28-09-12 14:55

Part 1Methodological considerations

23138 Coppus.indd 25 28-09-12 14:55

23138 Coppus.indd 26 28-09-12 14:55

2Quality of reporting of test accuracy

studies in reproductive medicine:

impact of the standards for

reporting of diagnostic

accuracy (STARD)-initiative.

Sjors CoppusFulco van der Veen

Patrick BossuytBen Willem Mol

Fertility and Sterility 2006, 86: 1321-1329

The content of this article was presented as a Prize Paper Candidate session oral presentation at the 61st ASRM Annual Meeting, 2005, Montreal, Canada.

23138 Coppus.indd 27 28-09-12 14:55

Chapter 2

28

Abstract

BACKGROUND: To evaluate the extent to which test accuracy studies

published in two leading reproductive medicine journals in the years

1999 and 2004 adhered to the Standards for Reporting of Diagnostic

Accuracy Studies (STARD) initiative parameters, and to explore whether

the introduction of the STARD statement has led to an improved quality of

reporting.

METHODS: Structured literature search. Articles that reported on the

diagnostic performance of a test in comparison with a reference standard

were eligible for inclusion. For each article we scored how well the twenty-

five items of the STARD checklist were reported. These items deal with the

study question, study participants, study design, test methods, reference

standard, statistical methods, reporting of results and conclusions. We

calculated the total number of reported STARD items per article, summary

scores for each STARD item and the average number of reported STARD

items per publication year.

RESULTS: We found 24 studies reporting on test accuracy in reproductive

medicine in 1999 and 27 studies in 2004. The mean number of reported

STARD items for articles published in 1999 was 12.1 ± 3.3 (range 6.5-20) and

12.4 ± 3.2 (range 7-17.5) in 2004, after publication of the STARD statement.

Overall, less than half of the studies reported adequately on 50% or more of

the STARD items. The reporting of individual items showed a wide variation.

There was no significant improvement in mean number of reported items

for the articles published after the introduction of the STARD statement.

CONCLUSIONS: Authors of test accuracy studies in the two leading fertility

journals poorly report the design, conduct, methodology and statistical

analysis of their study. Strict adherence to the STARD guidelines should be

encouraged.

23138 Coppus.indd 28 28-09-12 14:55

Quality of reporting of diagnostic studies

29

2

Introduction

The number of available diagnostic tests is growing exponentially. Evaluation of

their accuracy in ruling in or ruling out disease is increasingly important as these

data can help us identify which tests are useful to perform.

A problem in test accuracy studies is that rigorous methodological standards

for research on diagnostic tests have been slower to develop than standards for

studies of therapeutic interventions. This could be explained by the fact that test

evaluation research is less straightforward than the evaluation of therapy, where

randomized controlled trials have become the de facto standard method for

evaluating effectiveness. In the last 10 years, more and more empirical evidence

has been gained on design features of test accuracy studies associated with bias

and lack of applicability. Several factors can threaten the internal and external

validation of accuracy studies. One study showed that the methodological quality

of studies of diagnostic accuracy, as published in four major medical journals

between 1978 and 1993, was mediocre at best. Information on key elements of

design, conduct and analysis was often not reported (1).

In 1999, Lijmer et al. (2) showed that many reports of diagnostic tests contain

methodological shortcomings that can result in an overestimation of diagnostic

test accuracy. A systematic review showed that a number of issues in the design

and conduct of diagnostic accuracy studies can lead to bias or variation, with the

best-documented effects found for demographic features, disease prevalence and

severity, partial verification bias, clinical review bias, and observer and instrument

variation (3).

In reproductive medicine, tests are ordered to establish the presence or absence

of disease or to predict clinical outcome such as ovarian response or pregnancy.

In diagnostic accuracy studies the results of a test are compared to those of a

reference standard, the best available method to detect the presence of absence

of disease. These results are expressed then expressed in accuracy statistics, such

as sensitivity and specificity. In studies on the prognostic performance of tests,

test results are compared with a clinically relevant endpoint and statistics are

calculated to express the predictive capacity of the test.

23138 Coppus.indd 29 28-09-12 14:55

Chapter 2

30

To interpret the results of clinical studies on testing correctly, readers must

understand the design, conduct, and data analysis and must be able to judge the

internal validity and generalizability of the results. This goal can only be achieved

through complete transparency of reporting in the articles. In recent years, a

growing number of guidelines have been published to support authors of medical

articles to clearly report the methodology used. For randomised clinical trials, the

CONSORT statement (Consolidated Standards of Reporting Trials) was published in

1996 (4) and in a revised form in 2001 (5). For meta-analyses QUOROM (Quality of

Reporting of Meta-analyses) and MOOSE (Meta-analysis of Observational Studies

in Epidemiology) are available (6, 7).

For diagnostic studies, a group of scientists and editors developed the STARD

(Standards for Reporting of Diagnostic Accuracy) statement to improve the

reporting quality of studies on diagnostic accuracy. This STARD statement contains

a 25-item checklist and a prototypical flow chart that helps authors to write a

proper report of a diagnostic accuracy study and enables readers to assess the

potential for bias in a study and to evaluate the generalizability of the results.

STARD was primarily developed for authors to ensure completeness of reporting,

not for reviewers to determine the quality of the study methodology.

In January 2003, this statement was first published simultaneously in eight

medical journals, (American Journal of Clinical Pathology, Annals of Internal Medicine,

British Medical Journal, Clinical Chemistry, Clinical Biochemistry, Clinical Chemistry

and Laboratory Medicine, the Lancet and Radiology) together with an explanation

and elaboration document published in Clinical Chemistry (8, 9). Since then, it

has been published or promoted in several other biomedical journals, including

Academic Radiology, British Journal of Obstetrics and Gynaecology, Clinical Radiology,

Gynecological Oncology, JAMA, Annals of Clinical Microbiology and others (10-15).

In reproductive medicine, neither Human Reproduction nor Fertility and Sterility

published STARD or promoted this statement with an editorial. In the instructions

for authors of these journals on how to prepare a paper, only Human Reproduction

lists the STARD statement as compulsory when writing an article on test accuracy.

Since it is not known how reports on diagnostic and prognostic accuracy in

reproductive medicine comply with the STARD-checklist, and to explore whether

23138 Coppus.indd 30 28-09-12 14:55


31

2

the introduction of the STARD-statement has improved the quality of report, we

conducted a systematic literature survey in which the quality of reporting of these

studies published in Fertility and Sterility and Human Reproduction was assessed for

the years 1999 (pre-STARD) and 2004 (post-STARD).

Materials and Methods

Study selection and data extractionWe performed a systematic search in all issues of Fertility and Sterility and Human

Reproduction published in 1999 (pre-STARD) and in 2004 (post-STARD) for articles

reporting on the diagnostic or prognostic accuracy of a test. A search was run in

Pubmed on journal name and publication year. This search was then limited to

articles that dealt with human research and had an abstract. Hereafter we excluded

all (randomized) clinical trials, meta-analyses, letters, editorials, practice guidelines

and reviews with the search limit options of the search system.

We screened the remaining articles for eligibility based on title and abstract. Articles

were eligible for inclusion if they reported on primary studies on diagnostic or

prognostic accuracy. Diagnostic accuracy studies were defined as the assessment

of a new test in comparison with a reference standard. Prognostic accuracy studies

were defined as studies in which a test was performed to predict clinical outcomes

in ART such as ovarian hyperstimulation syndrome, ovarian response, or pregnancy.

A flowchart of the study selection process is shown in Fig. 1.

All selected articles were closely examined on the extent to which they reported the

STARD criteria. A standardized score form for the quality of reporting of diagnostic

accuracy studies that had been developed and tested at the department of Clinical

Epidemiology and Biostatistics from the Academic Medical Centre, University of

Amsterdam was used for this purpose (16). A single trained reviewer scored all

articles, with a secondary reviewer (BWJM) checking a random sample of 20% to

ensure accuracy in interpretation of the articles. Inter-rater agreement between

these two reviewers was expressed as overall agreement percentage and more

formally tested with the kappa statistic.

23138 Coppus.indd 31 28-09-12 14:55

Chapter 2

32

Pubmed search for articles published in Fertility and Sterility and Human Reproduction in 1999 and 2004 (n = 2566)

Search limited to human research with available abstract (n = 2127)

Excluded: n = 439

Excluded: n = 670: • Trial n = 452 • Editorial n = 4 • Letter n = 57 • Meta-analysis n = 16 • Practice guideline n = 14 • Review n = 127

Potentially eligible articles identified and screened for retrieval based on title and abstract (n = 1457) !

Potentially eligible articles (n = 60)

Excluded: n = 1397

Assessment of the quality of reporting on diagnostic and prognostic test accuracy studies in reproductive medicine (n = 51)

Excluded: n = 9 No diagnostic or prognostic test evaluation n = 7 Correspondence n = 2

FIGURE 1. Flowchart showing search and selection process of articles reporting on diagnostic or prognostic test accuracy.

Statistical analysisThe 25-items of the STARD list can be divided into items addressing formulation

of the study question, study participants, study design, test methods, reference

standard, statistical methods, report of results and conclusions (Table 1). For

each item of the STARD statement, the total number of articles reporting all the

23138 Coppus.indd 32 28-09-12 14:55


33

2

elements needed for that item was summed. Equal weights were applied to each

item. The total number of reported STARD items, was also calculated for each

article by summing the number of reported items (0-25 points possible). A higher

number of items indicates better reporting and visa versa.

Six items (items 8, 9, 10, 11, 13 and 24) concern both the index test as well as the

reference test. If only the index test or only the reference standard was reported

adequately, this item was scored as 0.5 point. The overall mean, range, and standard

deviation of the total number of reported STARD items were calculated. Testing for

significant improvement of the total number of reported STARD items 1 year after

the introduction of this statement was performed with an unpaired two-tailed

t-test for independent samples, with the significance level set at P<0.05.

Results

A total number of 2127 articles reporting on human research with an available

abstract published in 1999 and 2004 in Fertility and Sterility and Human Reproduction

were identified. After limiting the search, as described in the method section, the

remaining 1457 articles were screened for eligibility. Of these, 51 (3.5%) were

marked as prognostic or diagnostic accuracy studies (17-67). Twenty-eight of

these were published in Fertility and Sterility and 23 in Human Reproduction.

Agreement between the two reviewers was good with an overall agreement

percentage of 91.5%. The kappa-statistic had a value of 0.82, indicating a very

good agreement beyond change.

The mean number of reported STARD items in the 24 accuracy reports published

in 1999 was 12.1 (SD 3.3) with a range of 6.5 to 20 on a scale from 0 to 25 points. In

2004, the 27 included articles reported a mean of 12.4 (SD 3.2) items with a range

of 7 to 17.5 (Fig. 2). The mean difference was a nonsignificant 0.3 points (P=0.7). No

subgroup analysis was performed. In both years, slightly less than half of studies

(11/24 and 13/27) reported on more than 50% of the items (≥12.5 items). The

maximum number of reported items in a single article in 1999 was 20 of the 25

items versus 17.5 of the 25 items in 2004.

The reporting of individual items showed a wide variation (0% to 96%), which

is shown in Table 1. Although we interpreted item 1 more broadly (allowing

23138 Coppus.indd 33 28-09-12 14:55

Chapter 2

34

“diagnostic” to be read as “prognostic”), articles about diagnostic and prognostic

accuracy were indexed poorly, with only 29% and 19% of articles clearly indexing

the study as an article reporting on diagnostic or prognostic accuracy in the way

that is recommended by the STARD committee. The aim of the study was stated

in more than 80% of articles, with an adequate description of the test under study,

the reference test, and target condition or outcome of interest.

1999

year

2004

Num

ber

of r

epor

ted

STA

RD it

ems

20

18

16

14

12

10

8

6

FIGURE 2. Median, interquartile ranges, and the range of adequately reported STARD items in 1999 and 2004.

The methods section lacked a detailed definition of the study population in which

the tests were carried out in 30% to 40% of studies. Of the 16 items in the method

section -item 8, 9, 10, 11 and 13 counted twice, for index test and reference test

separately- only three of these were reported in more than 75% of the articles

in 1999, a number that increased to six in 2004. For 1999 these items were the

description of participant recruitment, of data collection and adequate description

of the index test (items 4, 6 and 8a). In 2004, these items were participant

recruitment, participant sampling, data collection, description of the index test,

and definition of and rationale for units and cutoffs for the index test and reference

23138 Coppus.indd 34 28-09-12 14:55


35

2

standard (items 4, 5, 6, 8a, 9a and 9b). The reference standard and its rationale or

the outcome of interest was only reported in 52% of the studies.

TABLE 1. Report of items of the STARD statement in 51 articles of diagnostic and prognostic accuracy published in 1999 and 2004.

1999¶ 2004¶¶

Category and item no. (n=24) (n=27)

Title, abstract and keywords

1. Identify the article as a study of diagnostic accuracy (recommend MESH heading “sensitivity and specificity”)

7 (29) 5 (19)*

Introduction

2 State the research question or study aims, such as estimating diagnostic accuracy or comparing accuracy between tests or across participant groups.

21 (88) 22 (81)

Methods

3 Describe the study population: the inclusion and exclusion criteria, setting, and locations where data were collected.

17 (71) 17 (63)

4 Describe participant recruitment: was recruitment based on presenting symptoms, results from previous tests, or the fact that the participants had undergone the index test(s) or the reference standard?

20 (83) 26 (96)

5 Describe participant sampling: was the study population a consecutive series of participants defined by the selection criteria in item 3 and 4? If not, specify how participants were further selected.

15 (63) 21 (78)

6 Describe data collection: was data collection planned before the index test and reference standard were performed (prospective study) or after (retrospective study)?

19 (79) 23 (85)

7 Describe the reference standard and its rationale. 10 (42) 14 (52)

8 Describe technical specifications of material and methods involved including how and when measurements were taken, and/or cite references for index tests and reference standard. a. For the index test 23 (96) 24 (89)b. For the reference standard 16 (67) 16 (59)

9 Describe definition of and rationale for the units, cut-offs and/or categories of the results of the index test and the reference standard.a. For the index test 16 (67) 25 (93)b. For the reference standard 16 (67) 21 (78)

10 Describe the number, training and expertise of the persons executing and reading the index tests and the reference standard.a. For the index test 6 (25) 5 (19)b. For the reference standard 2 (8) 2 (7)

11 Describe whether or not the readers of the index tests and reference standard were blind (masked) to the results of the other test and describe any other clinical information available to the readers.a. For the index test 10 (42) 7 (26)b. For the reference standard 4 (17) 6 (22)

12 Describe methods for calculating or comparing measures of diagnostic accuracy, and the statistical methods used to quantify uncertainty (e.g. 95% confidence intervals).

2 (8) 3 (11)

13 Describe methods for calculating test reproducibility, if done.a. For the index test 14 (62) 13 (48)b. For the reference standard 5 (27) 1 (4)

23138 Coppus.indd 35 28-09-12 14:55

Chapter 2

36

Results

14 Report when study was done, including beginning and ending dates of recruitment.

14 (58) 22 (81)

15 Report clinical and demographic characteristics of the study population (e.g. age, sex, spectrum of presenting symptoms, comorbidity, current treatments, recruitment centers).

16 (67) 19 (70)

16 Report the number of participants satisfying the criteria for inclusion that did or did not undergo the index tests and/or the reference standard; describe why participants failed to receive either test (a flow diagram is strongly recommended).

19 (89) 19 (70)

17 Report time interval from the index tests to the reference standard, and any treatment administered between them.

16 (67) 17 (63)

18 Report distribution of severity of disease (define criteria) in those with the target condition; other diagnoses in participants without the target condition.

9 (38) 9 (33)

19 Report a cross tabulation of the results (including indeterminate and missing results) by the results of the reference standard; for continuous results report the distribution of the test results by the results of the reference standard.

16 (67) 12 (44)

20 Report any adverse events from performing the index tests or the reference standard.

0 1 (4)

21 Report measures of diagnostic accuracy and measures of statistical uncertainty (e.g., 95% confidence intervals).

2 (8) 6 (22)

22. Report how indeterminate results, missing responses, and outliers of the index tests were handled.

1 (4) 4 (15)

23. Report estimates of variability of diagnostic accuracy between subgroups of participants, readers or centers, if done.

4 (17) 6 (22)

24. Estimates of test reproducibility, if donea. For the index test 8 (33) 6 (22)b. For the reference standard 0 0

Discussion

25. Discussion of the clinical applicability of the study findings. 22 (92) 26 (96)

* Number in parentheses expresses percentage ¶ Mean STARD score, 12.1 ± 3.3; range 6.5-20 ¶¶ Mean STARD score, 12.4 ± 3.2; range 7-17.5

Information concerning the index test was generally better reported than that for the

reference standard (items 8-13 and 24). Notably information regarding the number

and training of the persons executing and evaluating the index test and reference

test, and blinding of readers to the tests was reported in a minority of reports. In

2004, methods for calculating measures of diagnostic or prognostic accuracy, such

as sensitivity, specificity, likelihood ratios, predictive values and ROC curves, were

reported in only three articles. These were also the only articles that reported a

statistical method used to quantify uncertainty, in all cases 95% confidence intervals.

Indices of accuracy and the corresponding confidence intervals were reported in

another three articles. Reporting indices of accuracy was done more often in articles

in 1999. However, because the STARD statement invites authors to report confidence

intervals as well, no more than 2 articles fulfilled item 12 (data not shown).

23138 Coppus.indd 36 28-09-12 14:55


37

2

The study population in which tests were executed was reported in 71% and

63% of articles. Of the 11 items in the result section, study period, clinical and

demographic characteristics, and time interval between tests were reported with

a frequency of more than 60%. A flow diagram, which is strongly encouraged by

the STARD steering committee, was presented in only 3 of the 51 articles.

Discussion

The results of this study indicate that the quality of reporting in articles on test

accuracy in reproductive medicine is poor to mediocre. For both publication

years examined, more than 50% of articles report less than 50% of STARD items.

The mean number of reported STARD items was 12.1 for the articles published in

1999 and 12.4 for articles published in 2004. If we compare our results to data on

quality of reporting from other fields of medicine, the standards of reporting in

reproductive medicine are quite comparable to these other specialities. Articles

published in the year 2000 in journals with impact factor of four or higher that

regularly publish articles on diagnostic accuracy, reported 11.9 STARD items on

average, with only 41% of articles reporting on more than 50% of STARD items

(16). In ophthalmology, only 44% of the 16 diagnostic accuracy studies published

in 2002 in the five leading ophthalmic journals reported more than half of the

STARD items (68).

Description of study population was reported in 60% to 70% of the articles.

Although this may seem a reasonable score, in our opinion this percentage is

worrisome. An absent or incomplete description of the study population affects

the generalizability of the study. Empirical evidence exists that this may be

accompanied by overestimated test accuracy (2).

The indexing of the articles as diagnostic studies fulfilled the criteria of the STARD

list in only a minority of reports. This is of concern due to the following reason:

with the number of diagnostic meta-analyses growing, electronic databases

have become indispensable tools to identify studies. To facilitate retrieval of their

diagnostic study, authors should identify their studies as such. STARD recommends

the use of the term “diagnostic accuracy” in the title or abstract of a report that

compares the results of index test(s) with that of a reference standard. The MESH

23138 Coppus.indd 37 28-09-12 14:55

Chapter 2

38

heading “Sensitivity and Specificity” is also recommended. Several reports however,

have shown that a search with this term does not identify all accuracy studies,

while incorrectly identifying many articles as such. “Sensitivity and Specificity” may

therefore be a less suitable set of keywords (69-70). Using the term “diagnostic

accuracy” in title or abstract would enable the reader to easily and quickly identify

a test accuracy study as such.

In the present study, we only valued the quality of reporting (defined as the number

of reported STARD items), not the methodological quality or the degree of bias of

the studies. If a study reported that the evaluation of the index and reference test

was not blinded and this was clearly stated as such in the report, item 11 was

considered to be well reported. Yet this may indicate a possible methodological

shortcoming, depending on the amount of subjectivity associated with reading

both tests. Good reporting though allows the reader-clinician to judge the

potentials for bias in a study. The Quality Assessment of Studies of Diagnostic

Accuracy Included in Systematic Reviews (QUADAS) tool, which contains items

concerning bias in diagnostic accuracy studies, is a checklist that can be used to

judge the potential for bias in a diagnostic accuracy study, and has been developed

to be used in systematic reviews of diagnostic test accuracy (71). Some of the items

listed in the QUADAS tool, were also selected by the US Department of Health and

Human Services in their evidence-based practice report on systems to rate the

strength of scientific evidence (www.ahrq.gov/clinic/tp/strengthtp.htm).

Confidence intervals (CIs), which are important to judge the reliability of the

estimates of diagnostic accuracy, were reported in only eight studies. It is

noteworthy that, although in studies of effects CIs are more or less obligatory;

these are much less being reported in studies of diagnostic accuracy. Like measures

of effect in therapeutic trials, estimates of diagnostic accuracy are also subject to

chance variation, and authors should quantify the amount of uncertainty around

their observed values. Methods for calculating the precision around frequently

used measures of diagnostic accuracy are readily available in literature (72, 73).

One other review revealed that CIs were reported in only 50% of 16 diagnostic

evaluation reports published in the BMJ in 1996 and 1997 (74). In our study

confidence intervals were reported in only 16% of studies.

23138 Coppus.indd 38 28-09-12 14:55


39

2

In this study, we evaluated whether or not information on each of the 25 STARD

items was reported. Although we have referred to this evaluation in this manuscript

as the “quality of reporting”, the latter may be defined by an additional number of

features, such as clarity of the language used for reporting, absence of ambiguity,

consistency between abstract, methods, results and discussion. Such a more

detailed analysis was beyond the scope of our paper.

It has been shown that the introduction of a guideline on how to report

randomized controlled trials has led to improvement in the quality of reporting

of these studies. Articles in journals that had adopted CONSORT showed greater

improvement in quality of reporting than did articles in a journal that did not

advocate its use (75). A recent study provided additional evidence to suggest a

role for checklists. Articles published in Clinical Chemistry - a journal that adopted a

preliminary checklist for the report of diagnostic accuracy studies as early as 1996

- showed a greater improvement in reporting from 1996 to 2002 than articles

published in Clinical Chemistry and Laboratory Medicine, a journal that did not use

a similar checklist before (76).

Our study was unable to detect a significant improvement of the quality of

reporting between 1999 and 2004 in a “before-and-after STARD” evaluation. A

possible explanation may be that the time involved between submitting an article

to a journal and the final acceptation for publication (the so-called “peer-review

time”) did not allow authors to incorporate the STARD-guidelines into their articles.

Another possibility might be that authors, reviewers and editors are less familiar

with the current STARD checklist than with others such as CONSORT.

Possible limitations of our study are the relatively small number of studies included,

and the fact that we applied the STARD list also to prognostic accuracy studies. As

far as the first limitation is concerned, we decided to include studies published from

January 2004 onward, specifically with the aim of reducing the effect of the above

mentioned peer review time. This obviously limited the number of eligible articles.

With respect to the second limitation, we believe that it is justified to apply the

STARD-criteria to prognostic accuracy studies as well, as in reproductive medicine

these studies have a striking similarity with diagnostic test accuracy studies. For

the evaluation of the prognostic accuracy of tests, transparent reporting of study

23138 Coppus.indd 39 28-09-12 14:55

Chapter 2

40

execution, participants, flow of participants throughout the study, statistical

uncertainty, definition of outcome and rationale, reproducibility of test results, and

so on, is as essential as it is with diagnostic test accuracy studies.

In conclusion, this study shows that the quality of reporting on diagnostic and

prognostic accuracy studies in reproductive medicine is comparable to other fields

of medicine but leaves ample room for improvement. Authors, reviewers, editors

and readers should be aware of the STARD criteria for reporting on diagnostic/

prognostic accuracy studies. We welcome stricter adherence to this guideline.

AcknowledgementsThe authors thank Nynke Smidt, Ph.D., from the Institute for Research in Extramural Medicine, VU University Medical Center Amsterdam and Anne W.S. Rutjes, Ph.D., from the department of Clinical Epidemiology and Biostatistics of the Academic Medical Centre, University of Amsterdam, for generously providing the STARD scoring form.

23138 Coppus.indd 40 28-09-12 14:55


41

2

References

1. Reid MC, Lachs MS, Feinstein AR. Use of methodological standards in diagnostic test research. Getting better but still not good. JAMA 1995;274:645-51.

2. Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JH, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999;282:1061-6. Erratum in: JAMA 2000; 283:1963.

3. Whiting P, Rutjes AW, Reitsma JB, Glas AS, Bossuyt PM, Kleijnen J. Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med 2004;140:189-202.

4. Altman DG. Better reporting of randomised controlled trials: the CONSORT statement. BMJ 1996; 313:570-1.

5. Moher D, Schulz KF, Altman D; CONSORT Group (Consolidated Standards of Reporting Trials). The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. JAMA 2001;285:1987-91.

6. Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, Stroup DF. Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Quality of Reporting of Meta-analyses. Lancet 1999; 354:1896-1900.

7. Stroup DF, Berlin JA, Morton SC, Olkin I, Williamson GD, Rennie D, et al. Meta-analysis of observational studies in epidemiology: a proposal for reporting. JAMA 2000; 283:2008-2012.

8. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. BMJ 2003;326:41-44.

9. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Clin Chem 2003;49:7-18.

10. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM et al. Toward complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Acad Radiol 2003;10:599-600.

11. Khan KS, Bakour SH, Bossuyt PM. Evidence-based obstetric and gynaecologic diagnosis: the STARD checklist for authors, peer-reviewers and readers of test accuracy studies. BJOG 2004;111:638-40.

12. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Clin Radiol 2003;58:575-80.

13. Selman TJ, Khan KS, Mann CH. An evidence-based approach to test accuracy studies in gynecologic oncology: the ‘STARD’ checklist. Gynecol Oncol 2005;96:575-8.

14. Rennie D. Improving reports of studies of diagnostic tests: the STARD initiative. JAMA 2003;289:1061-6.

15. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Ann Clin Biochem 2003;40:357-63.

16. Smidt N, Rutjes AW, van der Windt DA, Ostelo RW, Reitsma JB, Bossuyt PM, et al. Quality of reporting of diagnostic accuracy studies. Radiology 2005;235:347-53.

23138 Coppus.indd 41 28-09-12 14:55

Chapter 2

42

17. Coulam CB, Goodman C, Rinehart JS. Colour Doppler indices of follicular blood flow as predictors of pregnancy after in-vitro fertilization and embryo transfer. Hum Reprod 1999;14:1979-82.

18. Noci I, Maggi M, Biagiotti R, D’Agata A, Criscuoli L, Marchionni M. Serum CA-125 values on the day of oocyte retrieval are not predictive of subsequent pregnancy with in-vitro fertilization. Hum Reprod 1999; 14:1773-76.

19. Bjercke S, Tanbo T, Dale PO, Morkrid L, Abyholm T. Human chorionic gonadotrophin concentrations in early pregnancy after in vitro-fertilization. Hum Reprod 1999;14:1642-46.

20. Yuval Y, Lipitz S, Dor J, Achiron R. The relationships between endometrial thickness, and blood flow and pregnancy rates in in-vitro fertilization. Hum Reprod 1999;14:1067-71.

21. Hall JE, Welt CK, Cramer DW. Inhibin A and inhibin B reflect ovarian function in assisted reproduction but are less useful at predicting outcome. Hum Reprod 1999;14:409-15.

22. Urman B, Alatas C, Aksoy S, Mercan R, Isiklar A, Balaban B. Elevated serum progesterone level on the day of human chorionic gonadotropin administration does not adversely affect implantation rates after intracytoplasmic sperm injection and embryo transfer. Fertil Steril 1999:72:975-79.

23. Branigan EF, Estes MA, Muller CH. Advanced semen analysis: a simple screening test to predict intrauterine insemination success. Fertil Steril 1999;71:547-551.

24. Evenson DP, Jost LK, Marshall D, Zinaman MJ, Clegg E, Purvis K et al. Utility of the sperm chromatin structure assay as a diagnostic and prognostic tool in the human fertility clinic. Hum Reprod 1999;14:1039-49.

25. Kinkel K, Chapron C, Balleyguier C, Fritel X, Dubuisson J, Moreau J. Magnetic resonance imaging characteristics of deep endometriosis. Hum Reprod 1999;14:1080-86.

26. Kataoka ML, Togashi K, Kobayashi H, Inoue T, Fujii S, Konishi J. Evaluation of ectopic pregnancy by magnetic resonance imaging. Hum Reprod 1999;14:2644-50.

27. Yamamoto Y, Sofikitis N, Mio Y, Miyagawa I. Highly sensitive quantitative telomerase assay of diagnostic testicular biopsy material predicts the presence of haploid spermatogenic cells in therapeutic testicular biopsy in men with Sertoli cell-only syndrome. Hum Reprod 1999;14:3041-47.

28. Campo R, Gordts S, Rombauts L, Brosens I. Diagnostic accuracy of transvaginal hydrolaparoscopy in infertility. Fertil Steril 1999:71:1157-1160.

29. Huey S, Abuhamad A, Barroso G, Hsu MI, Kolm P, Mayer J, Oehninger S. Perifollicular blood flow Doppler indices, but not follicular pO

2, pCO

2, of pH, predict oocyte developmental

competence in in vitro fertilization. Fertil Steril 1999;72:707-712.

30. Agrawal R, Tan SL, Wild S, Sladkevicius P, Engmann L, Payne N et al. Serum vascular endothelial growth factor concentrations in in vitro fertilization cycles predict the risk of ovarian hyperstimulation syndrome. Fertil Steril 1999;71:287-293.

31. Mol BWJ, Hajenius PJ, Engelsbel S, Ankum WM, van der Veen F, Hemrika DJ, Bossuyt PMM. Are gestational age and endometrial thickness alternatives for serum human chorionic gonadotropin as criteria for the diagnosis of ectopic pregnancy? Fertil Steril 1999;72:643-645.

32. Kitawaki J, Kusuki I, Koshiba H, Tsukamoto K, Fushiki S, Honjo H. Detection of aromatase cytochrome P-450 in endometrial biopsy specimens as a diagnostic test for endometriosis. Fertil Steril 1999;72:1100-1106.

23138 Coppus.indd 42 28-09-12 14:55


43

2

33. Nowacek GE, Meyer WR, McMahon MJ, Thorp JR, Wells SR. Diagnostic value of cervical fetal fibronectin in detecting extrauterine pregnancy. Fertil Steril 1999;72:302-4.

34. Daniel Y, Geva E, Lerner-Geva L, Eshed-Englender T, Gamzu R, Lessing JB et al. Levels of vascular endothelial growth factor are elevated in patients with ectopic pregnancy: is this a novel marker? Fertil Steril 1999;72:1013-1017.

35. Azziz R, Hincapie LA, Knochenhauer ES, Dewailly D, Fox L, Boots LR. Screening for 21-hydroxylase-deficient nonclassic adrenal hyperplasia among hyperandrogenic women: a prospective study. Fertil Steril 1999;72:915-925.

36. Güleki B, Bulbul Y, Onvural A, Yorukoglu K, Posaci C, Demir N, Erten O. Accuracy of ovarian reserve tests. Hum Reprod 1999: 14:2822-26.

37. Ludwig M, Jelkmann W, Bauer O, Diedrich K. Prediction of severe ovarian hyperstimulation syndrome by free serum vascular endothelial growth factor concentration on the day of human chorionic gonadotrophin administration. Hum Reprod 1999;14:2437-41.

38. Haddad B, Abirached F, Louis-Sylvestre C, Le Blond J, Paniel BJ, Zorn JR. Predictive value of early human chorionic gonadotrophin serum profiles for fetal growth retardation. Hum Reprod 1999;14:2872-75.

39. Hata T, Yanagihara T, Hayashi K, Yamashiro C, Ohnishi Y, Akiyama M et al. Three-dimensional ultrasonographic evaluation of ovarian tumours: a preliminary study. Hum Reprod 1999;14:858-61.

40. Lashen H, Afnan M, McDougall L, Clark P. Prediction of over-response to ovarian stimulation in an intrauterine insemination program. Hum Reprod 1999;14:2751-2754.

41. Frazier LM, Grainger DA, Schieve LA, Toner JP. Follicle-stimulating hormone and estradiol levels independently predict the success of assisted reproductive technology treatment. Fertil Steril 2004;82:834-40.

42. Volpes A, Sammartano F, Coffaro F, Mistretta V, Scaglione P, Allegra A. Number of good quality embryos on day 3 is predictive for both pregnancy and implantation rates in in vitro fertilization/intracytoplasmic sperm injection cycles. Fertil Steril 2004;82:1330-1336.

43. Ioannidis G, Sacks G, Reddy N, Seyani L, Margara R, Lavery S, Trew G. Day 14 maternal serum progesterone levels predict pregnancy outcome in IVF/ICSI treatment cycles: a prospective study. Hum Reprod Advance Access published Dec 9, 2004.

44. Hazout A, Bouchard P, Seifer DB, Aussage P, Junca AM, Cohen-Bacrie P. Serum antimüllerian hormone/müllerian-inhibiting substance appears to be a more discriminatory marker of assisted reproductive technology outcome than follicle-simulating hormone, inhibin B, or estradiol. Fertil Steril 2004;82:1323-1329.

45. Bungum M, Humaidan P, Spano M, Jepson K, Bungum L, Giwercman A. The predictive value of sperm chromatin structure assay (SCSA) parameters for the outcome of intrauterine insemination, IVF and ICSI. Hum Reprod 2004;19:1401-08.

46. De Neuborg, Gerris J, Mangelschots K, van Royen E, Vercruyssen M, Elseviers M. Single top quality embryo transfer as a model for prediction of early pregnancy outcome. Hum Reprod 2004;19:1476-79.

47. Fasouliotis SJ, Spandorfer SD, Witkin SS, Schattman G, Liu HC, Roberts JE et al. Maternal serum levels of interferon-γ and interleukin-2 soluble receptor-α predict the outcome of early IVF pregnancies. Hum Reprod 2004;19:1357-63.

48. Blazar AS, Hogan JW, Frankfurter D, Hackett R, Keefe D. Serum estradiol positively predicts outcomes in patients undergoing in vitro fertilization. Fertil Steril 2004;81:1707-1709.

23138 Coppus.indd 43 28-09-12 14:55

Chapter 2

44

49. Frattarelli JL, Levi AJ, Miller BT, Segars JH. Prognostic use of mean ovarian volume in in vitro fertilization cycles: a prospective assessment. Fertil Steril 2004;82:811-815.

50. Hauzman E, Fedorcsák P, Klinga K, Papp Z, Rabe T, Strowitzki T et al. Use of serum inhibin A and human chorionic gonadotropin measurements to predict the outcome of in vitro fertilization pregnancies. Fertil Steril 81;66-71.

51. Bancsi LFJMM, Broekmans FJM, Looman CWN, Habbema JDF, te Velde ER. Impact of repeated antral follicle counts on the prediction of poor ovarian response in women undergoing in vitro fertilization. Fertil Steril 2004;81:35-41.

52. Fish KE, Phipps M, Trimarchi J, Weitzen S, Blazar A. CA-125 serum levels and pregnancy outcome in in vitro fertilization. Fertil Steril 2004;82:1705-07.

53. Wiener-Megnazi Z, Vardi L, Lissak A, Shnizer S, Zeev Reznick A, Ishai D et al. Oxidative stress indices in follicular fluid as measured by the thermochemiluminescence assay correlate with outcome parameters in in vitro fertilization. Fertil Steril 2004;82:1171-1176.

54. Virro MR, Larson-Cook KL, Evenson DP. Sperm chromatin structure assay (SCSA®) parameters are related to fertilization, blastocyst development, and ongoing pregnancy in in vitro fertilization and intracytoplasmatic sperm injection cycles. Fertil Steril 2004;81:1289-95.

55. Van Rooij IAJ, de Jong E, Broekmans FJM, Looman CWN, Habbema JDF, te Velde ER. High follicle-stimulating hormone levels should not necessarily lead to the exclusion of subfertile patients from treatment. Fertil Steril 2004;81:1478-1485.

56. Kin KH, Sik OH D, Heaok Jeong J, Sub Shin B, Sun Joo B, Sup Lee K. Follicular blood flow is a better predictor of the outcome of in vitro fertilization-embryo transfer than follicular fluid vascular endothelial growth factor and nitric oxide concentrations. Fertil Steril 2004;82:586-92.

57. Allamaneni SSR, Bandaranayake I, Agarwal A. Use of semen quality scores to predict pregnancy rates in couples undergoing intrauterine insemination with donor sperm. Fertil Steril 2004;82:606-11.

58. Kamiyama S, Teruya Y, Nohara M, Kanazawa K. Impact of detection of bacterial endotoxin in menstrual effluent on the pregnancy rate in in vitro fertilization and embryo transfer. Fertil Steril 2004;82:788-92.

59. Pal L, Kovacs P, Witt B, Jindal S, Santoro N, Barad D. Postthaw blastomere survival is predictive of the success of frozen-thawed embryo transfer cycles. Fertil Steril 2004;82:821-826.

60. Lédéé-Bataille N, Olivennes F, Kadoch J, Dubanchet S, Frydman N, Chaouat G et al. Detectable levels of interleukin-18 in uterine luminal secretions at oocyte retrieval predict failure of the embryo transfer. Hum Reprod 2004;19:1968-73.

61. Fasouliotis SJ, Spandorfer SD, Witkin SS, Liu HC, Roberts JE, Rosenwaks Z. Maternal serum vascular endothelial growth factor levels in early ectopic and intrauterine pregnancies after in vitro fertilization treatment. Fertil Steril 2004;82:309-13.

62. Potlog-Nahari C, Feldman AL, Stratton P, Koziol DE, Segars J, Merino MJ et al. CD10 immunohistochemical staining enhances the histological detection of endometriosis. Fertil Steril 2004;82:86-92.

63. Omodei U, Ferrazzi E, Ramazzotto R, Becorpi A, Grimaldi E, Scarselli G et al. Endometrial evaluation with transvaginal ultrasound during hormone therapy: a prospective multicenter study. Fertil Steril 2004;81:1632-1637.

64. Hubacher D, Grimes D, Lara-Ricalde R, de la Jara J, Garcia-Luna A. The limited clinical usefulness of taking a history in the evaluation of women with tubal factor infertility. Fertil Steril 2004;81:6-10.

23138 Coppus.indd 44 28-09-12 14:55


45

2

65. De Kroon CD, van der Sandt HAGM, van Houwelingen JC, Jansen FW. Sonographic assessment of non-malignant ovarian cysts: does sonohistology exist? Hum Reprod 2004;19:2138-43.

66. Den Hartog JE, Land JA, Stassen FRM, Slobbe-van Drunen MEP, Kessels AGH, Bruggeman CA. The role of Chlamydia genus-specific and species-specific IgG antibody testing in predicting tubal disease in subfertile women. Hum Reprod 2004;19:1380-84.

67. Somigliana E, Viganò P, Tirelli AS, Felicetta I, Torresani E, Vignali M et al. Use of the concomitant serum dosage of CA 125, CA 19-9 and interleukin-6 to detect the presence of endometriosis. Results from a series of reproductive age women undergoing laparoscopic surgery for benign gynaecological conditions. Hum Reprod 2004;19:1871-76.

68. Siddiqui MA, Azuara-Blanco A, Burr J. The quality of reporting of diagnostic accuracy studies published in ophthalmic journals. Br J Ophthalmol 2005;89:261-5.

69. Deville WL, Bezemer PD, Bouter LM. Publications on diagnostic test evaluation in family medicine journals: an optimal search strategy. J Clin Epidemiol 2000;53:65-69.

70. Bachmann LM, Coray R, Estermann P, Ter Riet G. Identifying diagnostic studies in MEDLINE: reducing the number needed to read. J Am Med Inform Assoc 2002;9:653-58.

71. Whiting P, Rutjes AW, Reitsma JB, Bossuyt PM, Kleijnen J. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol 2003;3:25.

72. Lang TA, Secic M. Generalizing from a sample to a population: reporting estimates and confidence intervals. In: Lang TA, Secic M, eds. How to report statistics in medicine: annotated guidelines for authors, editors, and reviewers. Philadelphia: American College of Physicians, 1997:55-63.

73. Habbema JDF, Eijkemans R, Krijnen P, Knottnerus JA. Analysis of data on the accuracy of diagnostic test. In: Knottnerus JA, ed. The evidence base of clinical diagnosis. London: BMJ Publishing Group, 2002:117-44.

74. Harper R, Reeves B. Reporting of precision of estimates for diagnostic accuracy: a review.BMJ 1999;318:1322-3.

75. Moher D, Jones A, Lepage L; CONSORT Group (Consolidated Standards for Reporting of Trials. Use of the CONSORT statement and quality of reports of randomized trials: a comparative before-and-after evaluation. JAMA 2001;285:1992-5.

76. Lumbreras-Lacarra B, Ramos-Rincon JM, Hernandez-Aguado I. Methodology in diagnostic laboratory test research in clinical chemistry and clinical chemistry and laboratory medicine. Clin Chem 2004;50:530-6.

23138 Coppus.indd 45 28-09-12 14:55

23138 Coppus.indd 46 28-09-12 14:55

3Evaluating prediction models in

reproductive medicine.

Sjors CoppusFulco van der Veen

Brent OpmeerBen Willem MolPatrick Bossuyt

Human Reproduction 2009, 24: 1774-1778

23138 Coppus.indd 47 28-09-12 14:55

Chapter 3

48

Abstract

Prediction models are used in reproductive medicine to calculate the

probability of pregnancy without treatment, as well as the probability of

pregnancy after ovulation induction, intrauterine insemination or in vitro

fertilization. The performance of such prediction models is often evaluated

with a receiver operating characteristic (ROC) curve. The area under the

ROC curve, also known as c-statistic, is then used as a measure of model

performance. The value of this c-statistic is low for most prediction

models in reproductive medicine. Here, we demonstrate that low values

of the c-statistic are to be expected in these prediction models, but we

also show that this does not imply that these models are of limited use

in clinical practice. The calibration of the model (the correspondence

between model-based probabilities and observed pregnancy rates) as well

as the availability of a clinically useful distribution of probabilities and the

ability to correctly identify the appropriate form of management are more

meaningful concepts for model evaluation.

23138 Coppus.indd 48 28-09-12 14:55

Evaluating prediction models

49

3

Introduction

In the evaluation of subfertile couples, clinicians have traditionally focused on

finding the underlying cause of subfertility. Yet, only in a minority of couples, a

causal diagnosis can be made, one which fully explains why conception has not

occurred. Examples of such causal diagnoses are anovulation, azoospermia and

bilateral tubal pathology. In the majority of couples, a relative factor is found, only

partially explaining why conception has not occurred. The latter factors include

advanced maternal age, mild male subfertility, cervical hostility, mild endometriosis

and one-sided tubal pathology. In other couples, no factors impairing fertility can

be found at all, and a diagnosis of unexplained subfertility is made.

A number of treatments options are available once a causal diagnosis has been

made. In anovulatory women, infertility can be corrected by ovulation induction.

Women with bilateral tubal disease can be treated with tubal surgery or offered

in vitro fertilisation (IVF). In men with azoospermia or oligoasthenozoospermia,

the inability to conceive can be overcome by surgical sperm retrieval and assisted

fertilization. Treatment of these couples is indicated, as their probability of natural

conception is virtually zero. In contrast to couples with a causal diagnosis, couples

with a relative factor as well as those with unexplained subfertility vary in their

prognosis. Whereas some have relatively good chances of achieving a pregnancy

without treatment, these chances are rather low in others (Collins et al., 1995). As

targeted treatments are not available for these conditions, empiric interventions

such as intrauterine insemination (IUI) and IVF are usually offered. IUI and IVF are

expensive and not without potential side effects, and these treatment options

should therefore be offered to couples only if their probability of conceiving

with treatment is sufficiently higher than their probability of conceiving without

treatment.

Clinical decision making in these couples should therefore be guided by these

probabilities (Habbema et al., 2004). A number of validated prediction models

are available to help the clinician in calculating the probability of pregnancy in

subfertile couples, with and without treatment (see Leushuis et al., 2009, for a

critical appraisal). There are models to calculate: the probability of a treatment

independent pregnancy (Eimers et al., 1994; Collins et al., 1995; Snick et al., 1997;

Hunault et al., 2004; Hunault et al., 2005; van der Steeg et al., 2007), the probability

23138 Coppus.indd 49 28-09-12 14:55

Chapter 3

50

of a pregnancy after IUI (Steures et al., 2004; Custers et al., 2007) and the probability

of a pregnancy after IVF (Templeton et al., 1996; Stolwijk et al., 1996; Smeenk et al.,

2000; Lintsen et al., 2007).

For most of these models, the ability to predict pregnancy has been evaluated

by receiver operating characteristic (ROC) curves. These curves are obtained by

comparing the proportion of couples exceeding a pre-set probability threshold

within those who achieve a pregnancy with the proportion also exceeding that

threshold in those who do not achieve a pregnancy in a specified time frame. This

is done for all possible probability thresholds. The area under the resulting ROC

curve (AUC), also known as c-statistic, expresses the extent to which a model can

identify couples who will become pregnant and is used as a measure of model

performance.

In general, the c-statistic for prediction models in reproductive medicine is rather

low, ranging between 0.59 and 0.64 for models for treatment independent

pregnancy, between 0.56 and 0.59 for IUI, and between 0.50 and 0.67 for IVF. A

graph of such a typical ROC curve is shown in Fig. 1, adapted from a paper on

the prediction of pregnancy after IUI (Custers et al., 2007). According to criteria

for evaluating areas under ROC curves, these values indicate low-to-modest

discriminatory performance (Swets, 1988). It is questionable whether the

c-statistic is the best way to express the predictive performance of a model and,

consequently, if a limited discriminatory capacity precludes the use of the model

in clinical practice. In this paper, we illustrate that the area under the ROC curve or

c-statistic has limited value in the evaluation of prediction models in reproductive

medicine. We argue that calibration and prognostic classification are more relevant

concepts for such models to guide clinical decision making.

c-Statistic in diagnostic research

In principle, the aim of diagnostic testing is to distinguish diseased from non-

diseased patients. The diagnostic accuracy of a test is studied by comparing the

result of the test under evaluation with the result of a reference standard in a series

of patients, the latter being the best available method for classifying a patient as

diseased or not. The results of the cross-classification can then be expressed as

23138 Coppus.indd 50 28-09-12 14:55


51

3

the sensitivity and specificity of the test under evaluation. The sensitivity, or true-

positive fraction, reflects the proportion of diseased patients with a positive test

result, whereas the specificity, or one minus the false-positive fraction, reflects the

proportion of patients without the disease with a negative test result.

1,0

0,8

0,6

0,4

0,2

0,00,0 0,2 0,4 0,6 0,8 1,0

Sens

itiv

ity

1-specificity

FIGURE 1. Typical ROC curve of a prediction model in reproductive medicine. This ROC curve derived from a paper in which a prediction model for pregnancy after IUI was externally validated, demonstrating an AUC of 0.56 (Custers et al., 2007). Sensitivity is defined here as the percentage of cycles not resulting in an ongoing pregnancy that was predicted correctly, and specificity was defined as the percentage of cycles that resulted in an ongoing pregnancy that was predicted correctly. Reprinted from Custers et al., (2007). Copyright 2007, with permission from Elsevier.

In case the studied test is of a continuous nature, the sensitivity and specificity of

the test depend on the cut-off value to define positive and negative test results.

Sensitivities and specificities can be calculated over the whole range of possible

cut-off values, where higher sensitivities are obtained at lower specificities and

visa versa. The resulting series of sensitivity-specificity pairs is plotted as an ROC

curve, showing the sensitivity versus the specificity for each possible cut-off value.

The AUC indicates how well the test discriminates between diseased and non-

diseased patients. An AUC of 1 implies perfect discrimination; whereas an AUC

of 0.5 means that the test does not discriminate at all (Hanley and McNeil, 1982).

Assuming an error-free reference standard, every diagnostic test has the potential

of being 100% discriminative.

23138 Coppus.indd 51 28-09-12 14:55

Chapter 3

52

c-Statistic in prognostic research

Prediction models in reproductive medicine evaluate whether a combination of

factors can predict the occurrence of pregnancy within a specified time period.

In contrast to the situation in diagnostic testing, in which the disease is present at

the moment of testing, pregnancy has not yet occurred at the time the potentially

predictive factors are measured. Whether or not pregnancy occurs can be

considered the outcome of a stochastic process occurring over time. This implies

that the value of a test cannot be expressed by the proportion of patients who

are pregnant at the time of testing, as this is essentially zero. The association has

to be expressed in terms of the probability that pregnancy will occur in the future.

This probability can be calculated for a single test or for a multivariable prediction

model; the latter most often developed using Cox proportional hazard modelling.

Such prediction models are often evaluated similar to diagnostic tests, using

ROC curves and their AUC or c-statistic. In doing so, the calculated probability of

pregnancy within a specified time period is regarded as the continuous test result,

and the reference standard is the actual occurrence of pregnancy within that

specified period of follow-up. Using different cut-off values for these probabilities

yields a series of sensitivity and specificity pairs, through which an ROC curve can

be drawn. For prediction models in reproductive medicine, the AUC or c-statistic

reflects how well the model is able to distinguish those with a future pregnancy

from those that will not achieve a pregnancy within the time frame specified, not

the degree to which couples with a higher probability will conceive first.

Although this may seem quite logical at prima facie, the crux of the matter is that

perfect discrimination is achievable only if the population of subfertile couples

consists of two distinct subpopulations: fertile couples that have not yet conceived

but will do so in the near future and infertile couples guaranteed not to conceive. It

is very unlikely that this is a fair representation of couples after an initial subfertility

workup. Subfertility is not dichotomous in nature but a complex multifactorial

phenomenon, corresponding to a gradual continuum of impaired reproductive

capacity. Couples with very good pregnancy chances will rarely enter a subfertility

workup, as they are the ones that usually have conceived spontaneously within

12 months after the start of unprotected intercourse. Most couples with a causal

diagnosis, the ones with poorest fertility prospects, are identified in the first part of

the subfertility workup and treated accordingly. With the two extremes eliminated,

23138 Coppus.indd 52 28-09-12 14:55


53

3

the remaining group of subfertile couples predominantly has an intermediate

prognosis.

When perfect discrimination is not achievable, the maximum value of the c-statistic

depends on the underlying distribution of probabilities. The lower the variability

of the probabilities, the lower the maximum value of the c-statistic will be (Gail

and Pfeiffer, 2005; Cook, 2007). Among subfertile couples, defined as couples that

have not conceived despite a year of unprotected intercourse, the probability

to conceive spontaneously within the next 12 months is more or less normally

distributed in couples with unexplained subfertility, with a mean probability of

30% (van der Steeg et al., 2007). As the distribution of probabilities in couples that

successfully conceive has considerable overlap with that of couples that are not

successful, the maximum c-statistic that can be reached with any model is in the

order of 0.62 (Gail and Pfeiffer, 2005; Cook 2007).

Calibration

So if pregnancy itself cannot be predicted with 100% accuracy, what then? We

should realize that the modern approach in subfertility treatment is to offer tailored

management to individual couples, in which treatment is offered only to couples

who have poor chances of a spontaneous pregnancy. Ideally, early treatment in

couples with high chances of natural conception should be avoided, as should

the delay of treatment in couples with poor prospects of pregnancy. Such an

approach could reduce costs and may prevent multiple pregnancies as well as

other complications of assisted reproductive technologies (Steures et al., 2006). For

this tailored treatment, a classification of couples is required. This classification will

not be in terms of those that will become pregnant versus those that will not, as

this is almost impossible. The relevant classification is one in terms of those for

whom the probability of becoming pregnant with treatment, be it IUI or IVF, is

sufficiently higher than the probability of becoming pregnant without treatment,

versus those for which this is not the case.

Prediction models in reproductive medicine should be judged on this capability.

One necessary condition is that we should be able to trust calculated probabilities.

The reliability of these model-based probabilities can be evaluated by inspecting

23138 Coppus.indd 53 28-09-12 14:55

Chapter 3

54

the calibration of the model: the degree to which calculated probabilities agree

with actual observed event rates. The level of calibration of a prediction model

can be explored by assigning couples to subgroups, based on the calculated

probabilities, and then, after follow-up, comparing the average calculated

probability in each subgroup with the proportion of couples that became

pregnant. One way of assigning couples to subgroups is using deciles of calculated

probabilities. A comparison of the mean probability in each subgroup versus the

observed proportion can be plotted in a calibration plot, as in Fig. 2. In case of

perfect calibration, all points in the calibration plot will be located on the diagonal,

the line of equality. In the left plot, probabilities are well calibrated, whereas the

middle and right plots in Fig. 2 show calculated probabilities that suffer from

underestimation and overestimation, respectively.

1,0

0,9

0,8

0,7

0,6

0,5

0,4

0,3

0,2

0,1

0,00,0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1,0

1,0

0,9

0,8

0,7

0,6

0,5

0,4

0,3

0,2

0,1

0,00,0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1,0

1,0

0,9

0,8

0,7

0,6

0,5

0,4

0,3

0,2

0,1

0,00,0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1,0

1,0

0,9

0,8

0,7

0,6

0,5

0,4

0,3

0,2

0,1

0,00,0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1,0

1,0

0,9

0,8

0,7

0,6

0,5

0,4

0,3

0,2

0,1

0,00,0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1,0

1,0

0,9

0,8

0,7

0,6

0,5

0,4

0,3

0,2

0,1

0,00,0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1,0

23138 Coppus.indd 54 28-09-12 14:55


55

3

1,0

0,9

0,8

0,7

0,6

0,5

0,4

0,3

0,2

0,1

0,00,0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1,0

1,0

0,9

0,8

0,7

0,6

0,5

0,4

0,3

0,2

0,1

0,00,0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1,0

1,0

0,9

0,8

0,7

0,6

0,5

0,4

0,3

0,2

0,1

0,00,0 0,1 0,2 0,3 0,4 0,5 0,6 0,7 0,8 0,9 1,0

FIGURE 2. Calibration plots with calculated probability on the X-axis and observed proportion on the Y-axis. The left plot shows perfect calibration. The middle plot demonstrates a model that tends to give underestimated probabilities, whereas the plot on the right shows systematic overestimation.

A combination of perfect calibration and perfect discrimination can only be

achieved with perfect prediction: a probability of 100% for all those with a future

event, and a probability of 0% for all others. Such an extreme bimodal distribution

is not very realistic in most applications, including subfertility (Ware, 2006; Cook,

2007). In fact, there is a trade-off between calibration and discrimination. A

simulation model showed that, assuming a realistic range of probabilities, the

c-statistic for a perfectly calibrated model would be 0.83 at best (Diamond, 1992).

Overlapping probability distributions in those with and those without the future

event will usually prevent that maximum from being reached (Gail and Pfeiffer,

2005).

Prognostic classification

Although good calibration is very important, a well-calibrated model is not

necessarily useful in subfertility care. Probabilities also need to have sufficient

variability over a clinically useful range, as the essence of prediction models is

their ability to correctly classify individuals into clinically useful risk strata (Cook

et al., 2006; Cook, 2007; Ridker and Cook, 2007). Imagine a perfectly calibrated

model that assigns a probability of achieving a treatment independent pregnancy

23138 Coppus.indd 55 28-09-12 14:55

Chapter 3

56

of 20%-30% to all subfertile couples, and a second perfectly calibrated model

that assigns a probability of IVF success of 20%-30% in the same couples. That

would probably lead to a recommendation of expectant management in all,

which is likely to be erroneous, and the models would not allow us to identify

couples that are better off with IVF, separating them from couples that are better

off with expectant management. For models to be useful, they need to not only

be well calibrated, but also produce a clinically useful distribution in probabilities

in the target population. If we believe that some couples are better off with IVF,

both the spread of probabilities with expectant management and the spread of

probabilities with IVF should be sufficiently wide and distributions should be well

apart. Only then couples can be subdivided into prognostic categories to decide

on the appropriate treatment policy. Various graphs and expressions of the range

in calculated probabilities have recently been suggested in the literature. One

example is the predictiveness curve, suggested by Pepe et al. (2007), which plots

the cumulative distribution of the calculated probabilities in a group, and as such

is a useful visual aid in expressing the relative variability in calculated probabilities.

Research is emerging, and a debate has started, on the best ways to evaluate

superiority of models for prediction (Pepe et al., 2007; Janes et al., 2008; Pencina et

al., 2008). In most of these discussions, a change in the c-statistic is not regarded

as the best statistic to measure an improvement in model performance (Cook,

2007), as a substantial improvement in the c-statistic can only be achieved by

unrealistically large associations between the predictor variables and outcome

(Ware, 2006). Whereas odds ratios for diagnostic tests exceeding 30 or more are

of no exception, odds ratios for prognostic tests rarely exceed 2 (Glas et al., 2003).

Most importantly, a comparison of areas under the curve between prognostic

models does not show us whether individual couples have a different prognosis in

one model when compared with a second model. Regarding the latter, some have

suggested examining how many couples are reclassified, i.e. reassigned from the

group of those for whom expectant management is the better option to those for

whom, say, IUI or even IVF is the better alternative. Ultimately, however, the value

of prediction models should be based on outcome: to what extent does the use of

models improve patient outcome, i.e. allow a couple to have a healthy pregnancy,

at a reasonable balance with patient burden, morbidity and costs.

23138 Coppus.indd 56 28-09-12 14:55


57

3

Conclusion

In reproductive medicine, prognostic models that perfectly predict pregnancy

in subfertile couples, after natural conception or after assisted reproductive

technologies, do not exist and most likely will never exist. Yet, properly calibrated

models, with sufficient variability in calculated probabilities, can be used to support

clinical decision making when the chances of a pregnancy with treatment have to

be weighed against the chances of a pregnancy without treatment. The commonly

used c-statistic or area under the ROC curve expresses only discrimination and is,

as such, not a good measure of the extent to which predictive markers and models

can guide decision making. To assess the clinical value of prediction models,

calibration, variability in probabilities and, subsequently, the degree to which

reclassification of these probabilities yield clinically relevant prognostic strata are

more relevant criteria.

23138 Coppus.indd 57 28-09-12 14:55

Chapter 3

58

References


Cook NR, Buring JE, Ridker PM. The effect of including c-reactive protein in cardiovascular risk prediction models for women. Ann Intern Med 2006; 145:21-29.

Cook NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation 2007; 115:928-935.

Custers IM, Steures P, van der Steeg JW, van Dessel TJ, Bernardus RE, Bourdrez P, Koks CA, Riedijk WJ, Burggraaff JM, van der Veen F, Mol BW. External validation of a prediction model for an ongoing pregnancy after intrauterine insemination. Fertil Steril 2007; 88:425-431.

Diamond GA. What price perfection? Calibration and discrimination of clinical prediction models. J Clin Epidemiol 1992; 45:85-89.

Eimers JM, te Velde ER, Gerritse R, Vogelzang ET, Looman CW, Habbema JD. The prediction of the chance to conceive in subfertile couples. Fertil Steril 1994; 61:44-52.

Gail MH, Pfeiffer RM. On criteria for evaluating models of absolute risk. Biostatistics 2005; 6:227-239.

Glas AS, Lijmer JG, Prins MH, Bonsel GJ, Bossuyt PM. The diagnostic odds ratio: a single indicator of test performance. J Clin Epidemiol 2003; 56:1129-1135.

Habbema JD, Collins J, Leridon H, Evers JL, Lunenfeld B, te Velde ER. Towards less confusing terminology in reproductive medicine: a proposal. Hum Reprod 2004; 19:1497-1501.

Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143:29-36.

Hunault CC, Habbema JD, Eijkemans MJ, Collins JA, Evers JL, te Velde ER. Two new prediction rules for spontaneous pregnany leading to live birth among subfertile couples, based on the synthesis of three previous models. Hum Reprod 2004; 19:2019-2026.

Hunault CC, Laven JS, van Rooij IA, Eijkemans MJ, te Velde ER, Habbema JD. Prospective validation of two models predicting pregnancy leading to live birth amont untreated subfertile couples. Hum Reprod 2005; 20:1636-1641.

Janes H, Pepe MS, Gu W. Assessing the value of risk predictions by using risk stratification tables. Ann Intern Med 2008; 149:75-760.

Leushuis E, van der Steeg JW, Steures P, Bossuyt PM, Eijkemans MJ, van der Veen F, Mol BW, Hompes PG. Prediction models in reproductive medicine: a critical appraisal. Hum Reprod Update 2009; 15:537-552.

Lintsen AM, Eijkemans MJ, Hunault CC, Bouwmans CA, Hakkaart L, Habbema JD, Braat DD. Predicting ongoing pregnancy chances after IVF and ICSI: a national prospective study. Hum Reprod 2007; 22:2455-2462.

Pencina MJ, D’Agostino RB Sr, D’Agostino RB Jr, Vasan RS. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med 2008; 27:157-172.

Pepe MS, Feng Z, Huang Y, Longton G, Prentice R, Thompson IM, Zheng Y. Integrating the predictiveness of a marker with its performance as a classifier. Am J Epidemiol 2007; 167:362-368.

23138 Coppus.indd 58 28-09-12 14:55


59

3

Ridker PM, Cook NR. Biomarkers for prediction of cardiovascular events. N Engl J Med 2007: 356:1474-1475.

Smeenk JM, Stolwijk AM, Kremer JA, Braat DD. External validation of the templeton model for predicting success after IVF. Hum Reprod 2000; 15:1065-1068.

Snick HK, Snick TS, Evers JL, Collins JA. The spontaneous pregnancy prognosis in untreated subfertile couples: the Walcheren primary care study. Hum Reprod 1997; 12:1582-1588.

Steures P, van der Steeg JW, Mol BWJ, Eijkemans MJ, van der Veen F, Habbema JD, Hompes PG, Bossuyt PM, Verhoeve HR, van Kasteren, van Dop PA. Prediction of an ongoing pregnancy after intrauterine insemination. Fertil Steril 2004; 82:45-51.

Steures P, van der Steeg JW, Hompes PG, Habbema JD, Eijkemans MJ, Broekmans FJ, Verhoeve HR, Bossuyt PM, van der Veen F, Mol BW. Intrauterine insemination with controlled ovarian hyperstimulation versus expectant management for couples with unexplained subfertility and an intermediate prognosis: a randomised clinical trial. Lancet 2006; 368:216-221.

Stolwijk AM, Zielhuis GA, Hamilton CJ, Straatman H, Hollanders JM, Goverde HJ, van Dop PA, Verbeek AL. Prognostic models for the probability of achieving an ongoing pregnancy after in-vitro fertilization and the importance of testing their predictive value. Hum Reprod 1996; 11:2298-2303.

Swets J. Measuring the accuracy of diagnostic systems. Science 1998; 240:1285–1293.

Templeton A, Morris JK, Parslow W. Factors that affect outcome of in-vitro fertilisation treatment. Lancet 1996; 348:1402-1406.

Van der Steeg JW, Steures P, Eijkemans MJ, Habbema JD, Hompes PG, Broekmans FJ, van Dessel HJ, Bossuyt PM, van der Veen F, Mol BW. Pregnancy is predictable: a large-scale prospective external validation of the prediction of spontaneous pregnancy in subfertile couples. Hum Reprod 2007; 22:536-542.

Ware JH. The limitations of risk factors as prognostic tools. N Engl J Med 2006; 355: 2615-2617.

23138 Coppus.indd 59 28-09-12 14:55

23138 Coppus.indd 60 28-09-12 14:55

Part 2Clinical studies

23138 Coppus.indd 61 28-09-12 14:55

23138 Coppus.indd 62 28-09-12 14:55

4The predictive value of medical

history taking and Chlamydia IgG

ELISA antibody testing (CAT) in the

selection of subfertile women for

diagnostic laparoscopy: a clinical

prediction model approach.

Sjors CoppusBrent Opmeer

Susan LoganFulco van der Veen

Siladitya BhattacharyaBen Willem Mol


The content of this article was presented as an oral presentation at the 22nd ESHRE Annual Meeting, 2006, Prague, Czech Republic.

23138 Coppus.indd 63 28-09-12 14:55

Chapter 4

64

Abstract

BACKGROUND: Medical history taking as well as Chlamydia antibody titre

(CAT) testing is currently used in the selection of patients for diagnostic

laparoscopy with tubal patency testing. Most research has focused on

the predictive value of CAT in isolation from medical history. We assessed

therefore whether the combination of medical history and CAT testing

improves the efficiency of selecting patients for laparoscopy as compared

to the use of either medical history or CAT.

METHODS: Data of 207 consecutive subfertile women were used to create

multivariable logistic regression models for the prediction of tubal disease

as diagnosed by diagnostic laparoscopy.

RESULTS: The model with data of medical history only had an area under

the receiver operating characteristic curve (AUC) of 0.65 (95% CI 0.56-0.74).

Addition of CAT increased the AUC to 0.70 (95% CI 0.62-0.78) (P=0.065). CAT

was positive in 40 women and showed a sensitivity of 0.37 (95% CI 0.26-

0.49) for a specificity of 0.88 (95% CI 0.82-0.93). In CAT positive women a

blank medical history did not decrease the probability of tubal disease. Of

the 167 women tested CAT negative, 23 (14%) still had a high probability

of disease due to their medical history and 11 of them (48%) showed tubal

abnormalities on diagnostic laparoscopy.

CONCLUSIONS: CAT testing adds valuable information to a woman’s risk

profile based on her medical history. The combination of medical history

taking and CAT testing has a better yield for diagnosing tubal disease than

either of these alone.

23138 Coppus.indd 64 28-09-12 14:55

Medical history taking and CAT testing

65

4

Introduction

Tubal pathology affects approximately 15-30% of subfertile women (Evers, 2002).

Currently, laparoscopy is considered to be the best available test for diagnosing

tubal abnormalities. It allows direct visualization of the pelvis and, in addition to

tubal patency testing, offers the opportunity to detect pelvic adhesions. However,

laparoscopy does have several drawbacks. First, it is an invasive surgical procedure

that requires general anaesthesia, both of which carry associated risks. Secondly, as

an expensive investigation that requires operating time and dedicated personnel,

its availability is not unlimited. Therefore, triage is needed to limit the number of

unnecessary laparoscopies, while maintaining a high diagnostic yield.

Since it was reported in 1979 that women with positive Chlamydia IgG antibodies

are more likely to have tubal pathology than those without (Punnonen et al., 1979),

numerous studies have reported on the value of Chlamydia antibody titer (CAT)

testing to predict tubal pathology. Based on these reports, the Dutch Society for

Obstetrics and Gynaecology (NVOG) recommends the use of CAT as a first-line

test in the basic work-up of subfertile couples, with a fixed cut-off level (IgG MIF

>1:32 or ELISA >1.1) above which post-infectious pelvic disease should be ruled

out with laparoscopy and chromotubation (Swart et al., 1995; NVOG, 2004). The

pre-test probability of disease of a woman, based on specific risk factors or other

characteristics from the woman’s medical history is not formally taken into account.

In contrast to the Dutch guideline, the guideline from the National Institute for

Clinical Excellence (NICE) in the United Kingdom does not recommend CAT testing

in the workup of subfertile women, but advises the use of past medical history in

the woman to decide whether diagnostic laparoscopy should be performed or

not (NICE, 2004).

Most diagnostic test research is interpreted in isolation from its clinical context.

However, this does not reflect daily clinical practice, in which the diagnostic

investigation is a consecutive process always starting with history taking and

physical examination, followed by laboratory tests and more invasive and costly

tests such as imaging (Moons et al., 2004). With multivariable analysis one can

reveal the “add-on” value of diagnostic tests (Moons and Grobbee, 2002), in this

case the additional value of CAT testing.

23138 Coppus.indd 65 28-09-12 14:55

Chapter 4

66

The present study therefore aimed to explore the diagnostic consequences

of medical history taking by itself as recommended by the NICE guideline, CAT

testing, as recommended by the Dutch Society for Obstetrics and Gynaecology,

and the combined interpretation of the results from medical history and CAT

testing, which is intuitively used in daily practice by many clinicians.


PatientsWe used data from a previously reported study, in which 207 consecutive women

referred for evaluation of subfertility underwent a diagnostic laparoscopy.

Characteristics of included women and the materials and methods used in this

study have been published previously (Logan et al., 2003). In brief: timing of tubal

assessment was made on clinical grounds and was independent of the result of the

CAT test. All couples had undergone assessment of ovulation and a semen analysis.

Women who were physically unsuitable for laparoscopy or who had undergone

a tubal patency test in the past (either hysterosalpingography or laparoscopy), as

well as women with previous tubal surgery (including ectopic pregnancy) were

excluded from the analysis. Before surgery, all women completed a standard

questionnaire with regards to their demographic data and medical history. For

CAT testing, a Chlamydia trachomatis-specific IgG antibody peptide-based ELISA

(pELISA, Medac, Germany) was used. A positive IgG titer (>1.1) was indicative

of a past infection. If the result was equivocal, an infection with C. trachomatis

in the past could not be excluded and therefore positive and equivocal results

were pooled. Tubal pathology (uni-or bilateral) was defined as the presence

of adhesions involving the Fallopian tube and ovary, clubbing of the tube,

hydrosalpinx or obstruction to the flow of dye in the absence of endometriosis,

which was categorized as a separate entity according to the Revised American

Fertility classification (American Fertility Society, 1985).

Data analysisWe developed multivariable logistic regression models to predict tubal pathology

using the following data from the medical history: female age, number of previous

term deliveries, previous induced abortion, type of subfertility (i.e. primary

or secondary), duration of subfertility, history of pelvic inflammatory disease

(PID), history of sexually transmitted disease, previous use of an intra-uterine

23138 Coppus.indd 66 28-09-12 14:55


67

4

contraceptive device and prior non-gynaecological pelvic surgery. We evaluated

the prognostic value of CAT both as a dichotomous and continuous variable. First,

we checked the assumption of linearity between the continuous variables age,

duration of subfertility, number of previous term deliveries and titer of CAT on

the one hand and tubal pathology on the other hand visualized with smoothed

piecewise polynomials (splines), and formally tested with a generalized additive

model. As we found no significant deviations from linearity in the relationship

with the outcome, it was not necessary to transform these continuous variables.

Subclasses within categorical variables were dichotomized. For dichotomous and

continuous variables, univariable beta coefficients (β), odds ratios (ORs) and 95%

confidence intervals (95% CI) as well as P-values were calculated. In the univariable

analysis, the P-value for statistical significance was set at < 0.05 in all analyses.

Subsequently, multivariable logistic regression analysis with a stepwise backwards

selection procedure was used to construct the prediction models. Since the

erroneous exclusion of a variable is more deleterious for a model than including

too many factors (Mol et al., 2000; Steyerberg et al., 2000), we used a significance

level of 30% for entry in the multivariable model and a significance level of 20% to

stay in the model.

To adjust for overfitting, we performed internal validation with bootstrapping.

Bootstrapping is a technique to evaluate the robustness of the model by

simulating sampling variation in repeated samples from the study population.

Cases are selected randomly from the original sample to create a new sample

of the same size. As cases are drawn with replacement, some cases may be

randomly selected multiple times and others not at all (Steyerberg et al., 2001).

We generated 500 bootstrap data sets in which the same multivariable logistic

regression model was estimated. By analyzing the difference between the

models based on the original dataset and the bootstrap estimates, a shrinkage

factor was calculated and used to correct the original model coefficients and

associated probabilities for overfit.

We developed two models. The first model (“clinical history” model) was based on

items from medical history only. In the second model (“clinical history and CAT”

model) previous infection with C. trachomatis as determined with CAT ELISA was

combined with medical history. This second model was developed to explore

what the result of CAT adds to the information already obtained from medical

23138 Coppus.indd 67 28-09-12 14:55

Chapter 4

68

history taking. To make comparisons, as a third “model”, CAT was dichotomized into

positive or negative test results, as recommended by the Dutch guideline.

The discriminative capacity of the two constructed models was evaluated with

receiver operating characteristic (ROC) curve analysis, in which the area under the

curve (AUC) was calculated. Accuracy of CAT was plotted into the ROC curve as a

comparison. Sensitivity was defined as the number of women with tubal pathology

that was predicted correctly, and specificity was defined as the number of women

without tubal pathology predicted correctly. Calibration of the model, which is the

agreement between predicted and observed probabilities, was evaluated with the

goodness-of-fit Hosmer and Lemeshow test statistic and by visual inspection of

the calibration plots. Finally, we explored the diagnostic consequences of different

strategies. Data were analyzed using SPSS 12.0.1 (SPSS Inc., Chicago, IL, USA) and

S-PLUS 2000 (MathSoft Inc.)

Results

The prevalence of tubal pathology was 30.4% (63/207). Demographic data of

included women have been published elsewhere (Logan et al., 2003). In the

univariable analysis, a higher number of previous term deliveries, a history of PID,

and a positive CAT were statistically significant predictors of tubal pathology. The

other variables were not statistically significant associated with tubal subfertility

(Table I). Four predictive variables, i.e. the number of previous term deliveries,

duration of subfertility, a history of PID and previous non-gynaecological pelvic

surgery were found to be associated with tubal pathology at a P-level of <0.20 in

the “clinical history” model (Table I).

The backward logistic regression selection procedure excluded duration of

subfertility due to a P-value of 0.38 in the “clinical history and CAT” model. However,

as one of the aims of the present study was to compare medical history taking

with an integrated approach of history taking and CAT testing, exclusion of this

variable in the second model would have prohibited this exploration. Analysis

of the regression coefficients and 95% CI for the OR of this variable showed that

duration of subfertility only had a minor contribution towards the estimated

probabilities. To allow comparison, we therefore decided to force duration of

23138 Coppus.indd 68 28-09-12 14:55


69

4

subfertility into the second model (Table I). The possible additional value of use

of CAT as a continuous variable in the prediction model was explored, but did not

improve the discriminatory capacity (data not shown). As a result, CAT was used as

dichotomous variable in the “clinical history and CAT” model.

TABLE I. Results of the uni- and multivariable analysis

Univariable analysis Multivariable analysisclinical history model ¶

Multivariable analysis clinical history and CAT

model ¶¶

Variable OR 95% CI P OR 95% CI P OR 95% CI P

Age (per year) 1.03 0.96-1.09 0.44

Number of deliveries 2.05 1.23-3.43 0.006 2.14 1.27-3.63 0.004 2.05 1.17-3.58 0.01

Induced abortion 1.59 0.77-3.29 0.21

Secondary infertility 1.59 0.87-2.88 0.13

Duration of subfertility 1.02 1.00-1.03 0.08 1.01 1.00-1.03 0.11 1.01 1.00-1.03 0.38

History of PID 3.48 1.06-11.4 0.04 3.28 0.95-11.3 0.06 3.27 0.88-12.2 0.08

History of STD 1.15 0.33-3.98 0.82

IUCD use in history 2.40 0.67-8.59 0.18

Pelvic surgery 2.93 0.86-9.97 0.09 3.18 0.89-11.4 0.08 4.49 1.10-14.7 0.03

CAT positivity 4.30 2.09-8.83 <0.001 4.07 1.88-8.81 <0.001

OR = odds ratio ¶ Backward selection procedureCI = confidence interval ¶¶ Backward selection procedure with forced entry of duration of subfertilityP = p-value

Internal validation with bootstrapping showed a 17.5% overfit for the clinical

history model, and a 14% overfit for the “clinical history and CAT” model. Therefore

the regression coefficients of the model variables were corrected with these

shrinkage factors.

The ROC curves of the two models are shown in Fig. 1. The “clinical history” model

showed an AUC of 0.65 (95% CI 0.56-0.74). “The clinical history and CAT model”

increased the AUC to 0.70 (95% CI 0.62-0.78) (P=0.065) (De Long et al., 1988). As

a comparison, the performance of CAT alone –sensitivity 37% (95% CI 26-49),

specificity 88% (95% CI 82-93) - is plotted in the ROC space.

Calibration of the models was acceptable, as confirmed with the Hosmer and

Lemeshow goodness-of-fit test statistic. This test statistic had a P-value of 0.30 for

the “clinical history” model and a P-value of 0.35 for the “clinical history and CAT”

model. Tables IIa and IIb show the agreement between different categories of

predicted and observed probabilities in the two prediction models. Both models

show an acceptable agreement between the mean predicted probabilities and

23138 Coppus.indd 69 28-09-12 14:55

Chapter 4

70

the observed probability of tubal pathology. However, above a 30% predicted

probability, both models underestimate the chance of disease. In Tables IIa and IIb

the categories of predicted probability of the “clinical history” and “clinical history

and CAT” model are also cross-tabulated against the results of laparoscopy and

CAT. Using the “clinical history model”, 28 women that tested CAT positive, are

classified as having a predicted probability of tubal pathology <30%. However,

12 of these 28 women (43%) still showed abnormalities at laparoscopy (Table IIa).

Using the “clinical history and CAT model” the same 28 women that would have

been classified as low risk patients based on clinical history only, shift to a high

risk classification (i.e. a predicted probability of tubal disease >30%) (Table IIb). On

the other hand, Table IIb also shows that among the 167 women who tested CAT

negative, 23 (14%) still had a predicted probability of >30% of tubal disease due to

their medical history. In this selected group of CAT negative women 48% (11/23)

still showed tubal abnormalities on diagnostic laparoscopy.

1,0

0,8

0,6

0,4

0,2

0,00,0 0,2 0,4 0,6 0,8 1,0

Sens

itivi

ty

1-Specificity

FIGURE I. Receiver operating characteristic curves of the multivariable logistic regression models for the prediction of tubal pathology in subfertile patients.- - - - - = CAT-–-–-–- = clinical history model____ = clinical history + CAT model

23138 Coppus.indd 70 28-09-12 14:55


71

4

TABLE IIa. The classification of women according to their predicted probabilities, cross-tabulated against the results of laparoscopy and univariable CAT testing.

CAT positive CAT negative

Predicted probability*

Diseased Non-diseased


Total Mean predicted probability

Observed probability

10-20% 8 7 15 76 106 18% 21%

20-30% 4 9 12 34 59 25% 27%

30-40% 2 1 6 11 20 35% 40%

>40% 9 0 7 6 22 50% 73%

Total 23 17 40 127 207

40 167 207

* Probability calculated from the “clinical history” model

TABLE IIb. The classification of women according to their predicted probabilities, cross-tabulated against the results of laparoscopy and univariable CAT testing.

CAT positive CAT negative

Predicted probability**



Total Mean predicted probability

Observed probability

10-20% 0 0 22 91 113 15% 19%

20-30% 0 0 7 24 31 24% 23%

30-40% 9 11 7 9 36 37% 44%

>40% 14 6 4 3 27 57% 67%

Total 23 17 40 127 207

40 167 207

** Probability calculated from the “clinical history and CAT” model

To show the consequences of medical history taking by itself, CAT testing alone

by itself, and a combination of medical history taking and CAT testing, diagnostic

and clinical parameters are listed in Table III. For comparison, the consequences of

performing diagnostic laparoscopy in all women without any selection strategy are

shown in the first column. The number of laparoscopies that has to be performed

to detect one woman with tubal pathology is comparable when using history,

CAT or history and CAT and much lower than without any workup. However, the

detection rate of tubal pathology (true positive rate) is highest when using the

combined model (54% versus 38 and 37% respectively).

23138 Coppus.indd 71 28-09-12 14:55

Chapter 4

72

TABLE III.

Diagnostic parameter All LS* probability >30% by history LS

positive CAT LS

probability >30% by history and CAT LS

Number of laparoscopies 207 42 40 63

Tubal disease diagnosed 63 24 23 34

Tubal diseased missed 0 39 40 29

Unnecessary laparoscopies 144 18 17 26

NND** 3.3 1.8 1.7 1.9

Sensitivity NA 38% 37% 54%

Specificity NA 88% 88% 80%

* Laparoscopy** Number needed to diagnose: number of laparoscopies needed to detect one abnormality.

Discussion

In this study we evaluated the efficiency of medical history taking, CAT testing,

and a combination of both in selecting women for laparoscopy to detect tubal

pathology. We found that the discriminative capacity of the “medical history” model

did not differ from that of CAT testing alone. However, combined interpretation of

both history and CAT identified the women at highest risk for tubal disease, and

increased sensitivity for a similar specificity. In CAT positive women, the probability

of tubal disease is so high that a prompt diagnostic laparoscopy is indicated,

irrespective of whether the woman has a suspect or non-suspect medical history.

A small subgroup of CAT negative women (14%) still have a high probability of

tubal disease due to their history, and laparoscopy will identify tubal abnormalities

in around half of these women. The majority of women (70%) in this cohort were

classified as having a probability <30% of tubal disease due to a negative CAT

status and a non-suspect clinical history. Since 80% of these women will show no

tubal abnormalities, laparoscopy in these women can be deferred.

Our study has some limitations. First, both our study cohort and the actual number

of patients with positive factors in the medical history were relatively small. This

limits the statistical power of this study and leads to relatively large confidence

intervals. However, the strength of using this cohort of women is the fact that

the decision to perform a diagnostic laparoscopy was irrespective of the result of

CAT. This makes it possible to obtain estimates of diagnostic performance without

partial verification bias, a limitation encountered in many other diagnostic studies

on this subject (Logan et al., 2003; Whiting et al., 2004).

23138 Coppus.indd 72 28-09-12 14:55


73

4

A second limitation of this study is the fact that we dichotomized the result of

the CAT ELISA assay. A progressive increase in the likelihood of tubal pathology

in patients with a higher titre of the Chlamydia antibody test has been shown for

microimmunofluorescence (MIF) tests (Thomas et al., 2000), as well as for ELISA

(Tanikawa et al., 1996). The antibody titres of women in our cohort showed only

little diversity, as a result of which we were not able to demonstrate this relation

in the present study. It may be possible that use of a MIF test would have resulted

in a larger diversity between IgG titres of the patients, leading to extra diagnostic

information if incorporated into the prediction model. MIF tests however have

been shown to suffer from a significant amount of cross-reactivity with Chlamydia

pneumoniae. Furthermore, interpretation of these tests is subject to inter-observer

variation (Gijsen et al., 2001; Land et al., 2003). Moreover, a recent study showed the

pELISA used in this study to be the one most specific for a previous C. trachomatis

infection (Land et al., 2003).

A third limitation concerns our definition of tubal pathology. Broad criteria were

used, namely presence of adhesions involving the Fallopian tubes and ovaries,

clubbing of the tubes and hydrosalpinges or obstruction to the flow of dye,

irrespective of severity or whether the pathology was uni- or bilateral. Previous

studies have shown that the diagnostic performance of CAT increases when the

label “tubal disease” is limited to the more severe cases (Land et al., 1998). Due to a

low number of women with bilateral tubal disease in this study, we were unable to

restrict our analysis to these cases.

Detection of endometriosis by laparoscopy was not part of the intention to screen

by medical history, since CAT will be unable to predict adhesions or damage due

to endometriosis. Although this approach might be open to debate, this has

been done previously in studies on the subject (Akande et al., 2003; Logan et al.,

2003). Data on the capacity of medical history to predict endometriosis have been

published previously (Calhaz-Jorge et al., 2004).

One previous study has attempted to predict tubal pathology by means of a

multivariable logistic regression model (Hubacher et al., 2004). Women were

divided into a ‘high probability’ and ‘low probability’ group based on a logistic

regression model that included four variables, namely previous symptoms of PID,

vaginal discharge, lower genital tract infection, and presence of antibodies to C.

23138 Coppus.indd 73 28-09-12 14:55

Chapter 4

74

trachomatis. The results demonstrated a sensitivity of 84% with a specificity of 29%.

The authors concluded that “the role of history taking in the evaluation of women

with tubal factor infertility is limited”. Although clinical history does not have the

ultimate discriminative power to distinguish between patients with and without

tubal pathology, we have demonstrated in the present study that it can be used to

create individualized patient risk profiles and have shown the additional value of

CAT testing upon medical history taking.

In our study we included women with both uni- and bilateral tubal disease.

Unilateral tubal occlusion or tubo-ovarian adhesions may compromise fertility

prospects moderately, but only bilateral tubal occlusion is known to virtually

exclude all possibility of a spontaneous pregnancy (Mol et al., 1999). Therefore,

it is questionable whether all women with minor tubal pathology should be

labelled as having tubal disease. To date, no evidence-based policies for the

treatment of unilateral tubal disease have been published (Ahmad et al., 2006),

and therefore detection of one-sided tubal disease is unlikely to result in a major

shift in treatment. Since most of these women have a low predicted probability

of disease, postponing laparoscopy in these women may allow them to conceive

spontaneously (van der Steeg et al., 2007), thereby making laparoscopy redundant.

Unfortunately, we did not have data about treatment independent pregnancy

rates in this cohort of patients, and cannot answer this question.

Our model allows differentiation between women with a high predicted chance

of tubal pathology and women with a low chance. What probability of diagnostic

uncertainty is acceptable for women is not known at the moment. It may be

possible that women prefer to have 100% certainty about their tubal integrity.

In that case, since no workup will ever be 100% accurate all women will require

diagnostic laparoscopy. This will increase costs and enlarge the pool of women

with a diagnosis of minimal pathology of questionable prognostic significance

(Collins et al., 1995). Use of a multivariable prediction model including history and

CAT allows a conjoint decision of whether or not to perform a laparoscopy, based

on the woman’s as well as doctor’s preferences. Depending on these preferences,

one could perform a diagnostic laparoscopy in case of a high predicted

probability, postpone laparoscopy for several months in cases where the predicted

probability is low, or use a less invasive alternative, i.e. hysterosalpingography or

hysterosalpingo-contrast-sonography. Women should be adequately informed

23138 Coppus.indd 74 28-09-12 14:55


75

4

regarding the chance of delaying the diagnosis in case of low predicted probability

and negative CAT status.

In this study, we explored the value of medical history taking and Chlamydia

antibody testing in predicting tubal subfertility by constructing diagnostic

prediction models. Whether the use of such models is feasible and clinically

effective in daily routine practice is not known. Further research will have to

evaluate women’s preferences regarding the trade-off between likelihood

of detecting tubal pathology and the risks and discomfort associated with a

potentially unnecessary diagnostic laparoscopy.

In conclusion, combined use of medical history taking and CAT testing has a

superior diagnostic accuracy than one of these alone. CAT testing adds additional

predictive value to a woman’s medical history risk profile.

Appendix:The chance of a patient of having tubal pathology can be calculated with the following formula: probability of tubal pathology = 1/[1+ EXP(-β)] in which the β for the different models are: “clinical history” model: beta = -1.764 + (0.629*number of term deliveries) + (0.0116*duration of subfertility) + (0.979*positive history of PID) + (0.955*history of non-gynaecologic pelvic surgery). “history+cat” model: beta = -1.874 + (0.616*number of term deliveries) + (0.0069*duration of subfertility) + (1.020*positive history of PID) + (1.201*history of non-gynaecologic pelvic surgery) + (1.208*positive CAT).

23138 Coppus.indd 75 28-09-12 14:55

Chapter 4

76

References

American Fertility Society. Revised American Fertility Society classification of endometriosis. Fertil Steril 1985; 43:351-352.

Ahmad G, Watson A, Vandekerckhove P and Lilford R. Techniques for pelvic surgery in subfertility. Cochrane Database Syst Rev 2006; 2: CD000221.

Akande VA, Hunt LP, Cahill DJ, Caul EO, Ford WC, Jenkins JM. Tubal damage in infertile women: prediction using Chlamydia serology. Hum Reprod 2003; 18:1841-1847.

Calhaz-Jorge C, Mol BW, Nunes J, Costa AP. Clinical predictive factors for endometriosis in a Portugese infertile population. Hum Reprod 2004; 19:2126-2131.


De Long ER, De Long DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988; 44:837-845.

Evers JL. Female subfertility. Lancet 2002; 360:151-159.

Gijsen AP, Land JA, Goossens VJ, Leffers P, Bruggeman CA, Evers JL. Chlamydia pneumoniae and screening for tubal factor subfertility. Hum Reprod 2001; 16:487-491.

Hubacher D, Grimes D, Lara-Ricalde R, de la Jarra J, Garcia-Luna A. The limited clinical usefulness of taking a history in the evaluation of women with tubal factor subfertility. Fertil Steril 2004; 81:6-10.

Land JA, Evers JL, Goossens VJ. How to use Chlamydia antibody testing in subfertility patients. Hum Reprod 1988; 13:1094-1098.

Land JA, Gijsen AP, Kessels AG, Slobbe ME, Bruggeman CA. Performance of five serological chlamydia antibody tests in subfertile women. Hum Reprod 2003; 18:2621-2627.

Logan S, Gazvani R, McKenzie H, Templeton A, Bhattacharya S. Can history, ultrasound, or ELISA chlamydial antibodies, alone or in combination, predict tubal factor subfertility in subfertile women? Hum Reprod 2003; 18:2350-2356.

Mol BW, Swart P, Bossuyt PM, van der Veen F. Prognostic significance of diagnostic laparoscopy for spontaneous fertility. J Reprod Med 1999; 44:81-86.

Mol BW, van Wely M, Steyerberg EW. Using prognostic models in clinical infertility. Hum Fertil (Camb) 2000; 3:199-202.

Moons KG, Grobbee DE. Diagnostic studies as multivariable, prediction research. J Epidemiol Community Health 2002; 56:337-338.

Moons KG, Biesheuvel CJ, Grobbee DE. Test research versus diagnostic research. Clin Chem 2004; 50:273-6.

Nederlandse Vereniging voor Obstetrie en Obstetrie (2004) Orienterend fertiliteitsonderzoek [NVOG guideline diagnostic fertility workup]

National Institute for Clinical Excellence (2004) Guideline CG11: Fertility: assessment and treatment for people with fertility problems. (http://www.nice.org.uk/download.aspx?o=CG011fullguideline)

Punnonen R, Terho P, Nikkanen V, Meurman O. Chlamydial serology in infertile women by immunofluorescence. Fertil Steril 1979; 31:656-659.

23138 Coppus.indd 76 28-09-12 14:55


77

4

Steyerberg EW, Eijkemans MJ, Harrell FE Jr., Habbema JD. Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets. Stat Med 2000; 19:1059-1079.

Steyerberg EW, Harrell FE Jr., Borsboom GJ, Eijkemans MJ, Vergouwe Y, Habbema JD. Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J Clin Epidemiol 2001; 54:774-781.

Swart P, Mol BW, van der Veen F, van Beurden M, Redekop WK, Bossuyt PM. The accuracy of hysterosalpingography in the diagnosis of tubal pathology: a meta-analysis. Fertil Steril 1995; 64:486-491.

Tanikawa M, Harada T, Katagiri C, Onohara Y, Yoshida S, Terakawa N. Chlamydia trachomatis antibody titres by enzyme-lined immunosorbent assay are useful in predicting severity of adnexal adhesion. Hum Reprod 1996; 11:2418-2421.

Thomas K, Coughlin L, Mannion PT, Haddad NG. The value of Chlamydia trachomatis antibody testing as part of routine infertility investigations. Hum Reprod 2000; 15:1079-1082.

Van der Steeg JW, Steures P, Eijkemans MJ, Habbema JD, Hompes PG, Broekmans FJ et al., Pregnancy is predictable: a large-scale prospective external validation of the prediction of spontaneous pregnancy in subfertile couples. Hum Reprod 2007; 22:536-542.

Whiting P, Rutjes AW, Reitsma JB, Glas AS, Bossuyt PM, Kleijnen J. Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med 204; 140: 189-202.

23138 Coppus.indd 77 28-09-12 14:55

23138 Coppus.indd 78 28-09-12 14:55

5Identifying subfertile ovulatory

women for timely tubal patency

testing: a clinical decision model

based on medical history.

Sjors CoppusHarold Verhoeve

Brent OpmeerJan Willem van der Steeg

Pieternel SteuresRene Eijkemans

Peter HompesPatrick Bossuyt

Fulco van der VeenBen Willem Mol


23138 Coppus.indd 79 28-09-12 14:55

Chapter 5

80

Abstract

BACKGROUND: The aim of tubal testing is to identify women with bilateral

tubal pathology in a timely manner, so they can be treated with IVF or

tubal surgery. At present it is unclear for which women early tubal testing is

indicated, and in whom it can be deferred.

METHODS: Data of 3716 women who underwent tubal patency testing as

part of their routine fertility workup were used to relate elements in their

medical history to the presence of tubal pathology. With multivariable

logistic regression, we constructed two diagnostic models. One in which

tubal disease was defined as occlusion and/or severe adhesions of at least

one tube, whereas in a second model, tubal disease was defined as the

presence of bilateral abnormalities.

RESULTS: Both models discriminated moderately well between women

with and women without tubal disease with an area under the receiver-

operating characteristic curve (AUC) of 0.65 (95% CI: 0.63-0.68) for any

tubal pathology and 0.68 (95% CI: 0.65-0.71) for bilateral tubal pathology,

respectively. However, the models could make an almost perfect distinction

between women with a high and a low probability of tubal pathology. A

decision rule in the form of a simple diagnostic score chart was developed

for application of the models in clinical practice.

CONCLUSIONS: In conclusion, the present study provides two easy to

use decision rules that can accurately express the women’s probability of

(severe) tubal pathology at the couple’s first consultation. They could be

used to select women for tubal testing more efficiently.

23138 Coppus.indd 80 28-09-12 14:55

Medical history models for tubal pathology

81

5

Introduction

Tubal pathology ranks among the most frequent causes of subfertility, next

to ovulatory disorders and sperm defects (Evers, 2002). Therefore, in UK and

Dutch guidelines, tubal patency testing is a part of the fertility workup (National

Institute of Clinical Excellence 2004; Nederlandse Vereniging voor Obstetrie

en Gynaecologie, 2004). The most frequently used and widely accepted tests

for this are hysterosalpingography (HSG) and diagnostic laparoscopy with

chromopertubation. Both tests have several drawbacks however. Women consider

HSG a painful test and HSG has a considerable risk of infection (Forsey et al., 1990;

Liberty et al., 2007). Laparoscopy is an invasive surgical procedure that requires

general anesthesia with associated risks, and its availability is restricted as it is an

expensive investigation requiring operating time and dedicated personnel.

For these reasons, tubal testing is usually performed in a selected group of

subfertile women. At present, guidelines advocate medical history taking as a tool

to select women for tubal testing (National Institute of Clinical Excellence 2004). It

is not entirely clear how elements from the medical history should be combined to

identify women that should undergo early tubal patency testing. So far, only a few

studies with relatively small sample sizes have tried to summarize this information

in a quantitative manner (Hubacher et al., 2004; Coppus et al., 2007).

The aim of the present study was to form clinical decision rules based on medical

history that can help to identify women at high and low risk for (severe) tubal disease.

These decision rules, presented as simple quantitative score charts, were developed

in a large cohort of women who underwent routine tubal patency testing.


PatientsBetween January 2002 and February 2004, consecutive couples presenting at the

fertility clinic of 38 centres in The Netherlands were invited to join a prospective

cohort study. The local ethics committee of each participating centre gave

Institutional Review Board approval for this study. All couples underwent a basic

fertility workup according to the guidelines of the Dutch Society of Obstetrics and

23138 Coppus.indd 81 28-09-12 14:55

Chapter 5

82

Gynaecology. The details of this workup have been described in detail elsewhere

(van der Steeg et al., 2007).

The present analysis was limited to couples with a regular ovulatory cycle,

defined as a cycle length between 23 and 35 days with a within cycle variation

of less than 8 days. Ovulation was detected by a basal body temperature chart,

mid-luteal serum progesterone, or by ultrasonographic monitoring of the cycle.

Couples with a history of reversal of sterilization, tubal surgery, IVF or previous

tubal patency testing were excluded. Couples in whom semen analysis showed

a severe impairment of semen quality requiring IVF-ICSI (defined as a post-wash

total motile count < 1*106), were also excluded from the present analysis.

For each couple, we registered referral status (by a general practitioner or

gynaecologist), duration of child wish, female and male age, dysmenorrhoea, female

and male smoking habits, history of pelvic inflammatory disease (PID), history

of Chlamydia infection, previous pelvic surgery other than Caesarean section,

previous treatment with intrauterine insemination and previous pregnancies from

the female and male partner in the present and/or prior relationships, including

whether these resulted in an ectopic pregnancy, legally induced abortion,

spontaneous abortion or term delivery. Moreover, the total motile sperm count

(TMC) was recorded.

Duration of child wish was defined as the period between the date the couple

had started unprotected intercourse and the date of first consultation. Female and

male age was calculated at the time of this first visit. Previous PID had either been

confirmed laparoscopically or was documented if a woman had been treated with

antibiotics for a period of intense abdominal-pelvic pain and fever, for which no

other focus had been found. Infection with Chlamydia trachomatis was self reported

by the patient, and could have been symptomatic or asymptomatic. Missing data

in patients’ history were imputed, as restricting the analysis to complete cases

leads to loss of statistical power and potentially biased results in multivariable

analyses (van der Heijden et al., 2006). For this purpose the ‘aRegImpute’ multiple

imputation function in S-plus 2000 (MathSoft Inc.) was used.

Tubal patency testing was performed with either HSG, diagnostic laparoscopy

(DLS) or transvaginal hydrolaparoscopy (THL), depending on the local protocol

23138 Coppus.indd 82 28-09-12 14:55


83

5

of participating hospitals. In case, tubal patency was evaluated both with HSG

and diagnostic laparoscopy or THL, the result of the most invasive test was taken

as conclusive. Tubal pathology was defined as uni- or bilaterally impaired or

absent flow of contrast medium on HSG, and as uni- or bilateral occlusion and/or

hydrosalpinx and/or severe adhesions disturbing ovum pick-up if the tubes were

evaluated with laparoscopy or THL. This definition included endometriosis related

tubal disease.

Statistical analysesWe developed two multivariable logistic regression models. In the first model, all

women with either uni- or bilateral abnormalities were coded as diseased, whereas

only those women with bilaterally normal tubes were considered as non-diseased.

This first model will be referred to as the ‘any pathology model’. In the second

model, only women with bilateral tubal abnormalities were classified as diseased,

while women with uni- or bilaterally normal tubes were considered to be non-

diseased. The latter model will be referred to as the ‘bilateral pathology model’.

For the development of the multivariable models, we first checked the assumption

of linearity between the continuous variables female age, male age, TMC and

duration of child wish on the one hand and both definitions of tubal pathology

on the other hand using smoothed piecewise polynomials (splines) (Harrell et al.,

1988). Non-linear associations were redefined, based on these spline functions.

Subclasses within categorical variables were dichotomized. We then performed

univariable and multivariable logistic regression analysis to estimate odds ratios

(ORs), 95% confidence intervals (95% CI) and corresponding P-values for both

dichotomous and continuous variables.

As the use of too stringent P-values for variable selection is more deleterious for a

model than including too many factors (Steyerberg et al., 2000), all variables that

showed a significance level of <30% in univariable analyses were entered in the

multivariable logistic regression model. To make parsimonious models, we used a

stepwise backwards selection procedure, using a predefined significance level of

>10% for removing variables from the models.

To adjust for overfitting, we performed internal validation with bootstrapping

(Steyerberg et al., 2001). We generated 500 bootstrap data sets in which the same

23138 Coppus.indd 83 28-09-12 14:55

Chapter 5

84

multivariable logistic regression model was estimated. By analysing the difference

between the model based on the original dataset and the bootstrap estimates, a

shrinkage factor was calculated and used to correct the original model coefficients

and associated probability estimates for overfit.

The discriminative capacity of the models was evaluated by calculating the area

under the receiver- operating curve (AUC). Calibration of the models was assessed

by comparing the calculated probabilities across five risk groups (quintiles) with the

observed proportion of tubal disease in each of these groups. The goodness-of-fit

was evaluated graphically and tested formally with the Hosmer and Lemeshow

test statistic.

Finally, a paper score worksheet was developed, for which the shrunken regression

coefficients were multiplied by 10 and rounded to simplify the computation

for clinical practice. Values for continuous variables were obtained by linear

interpolation. This diagnostic rule was then transformed into a score chart to

simplify computations in clinical practice. Data were analysed using SPSS 12.0.1

(SPSS Inc., Chicago, IL, USA) and S-PLUS 2000 (MathSoft Inc.)

Results

During the study period, data of 7860 couples were collected. Of these, 938

women did not have a regular ovulatory cycle, 636 men had severely impaired

sperm quality and 196 couples had previously undergone fertility surgery or IVF.

In another 568 women, HSG or DLS had been performed in a previous episode of

subfertility. In total, these factors left 5522 couples for inclusion.

In total, 3716 women underwent 2752 hysterosalpingographies, 1527 diagnostic

laparoscopies and 160 transvaginal hydrolaparoscopies (Fig. 1). About 2039 women

underwent only HSG, 808 women underwent only laparoscopy and 147 women

underwent only THL. The remaining 722 women underwent multiple tests: 709

women had HSG and laparoscopy, 3 women had HSG and THL, 9 women had THL

and laparoscopy, and 1 woman underwent all three procedures.

23138 Coppus.indd 84 28-09-12 14:55


85

5

Consecutive subfertile couples n =7860

N = 5522 • 1806 women no tubal testing • 3716 women tubal testing

Excluded • n = 938 ovulation disorder • n = 636 TMC < 1 x 106 • n = 196 previous IVF or fertility surgery • n = 568 previous tubal testing

Number of tubal patency tests in 3716 women • 2752 HSG • 1527 DLS • 160 THL

FIGURE 1. Study profile

In 1806 women (32%), tubal patency was not assessed invasively. Women that

were tested were significantly less often referred by a gynaecologist (7.4% versus

10.4%, P<0.001), had a longer duration of subfertility (median 1.5 versus 1.4 years,

P<0.001), were less often treated previously with IUI (3.1% versus 6.7%), had

conceived less often previously in the present relationship (30.3% versus 36.5%,

P<0.001) and more often had a history of pelvic surgery (14.2 versus 11.3%,

P=0.008) than women that did not undergo tubal testing.

The median time between the couple’s first appointment and tubal patency

testing with HSG was 3.1 months. For laparoscopy and for THL, these intervals were

6.5 months and 4.9 months respectively. For all women, the median time from

first visit to a definitive diagnosis regarding tubal patency was 4.1 months (5th-95th

percentile 1.1 to 15.8 months). Severe bilateral tubal pathology was diagnosed in

312 women, corresponding with a prevalence of 8.4%. In 369 women, unilateral

pathology was diagnosed, which results in an overall prevalence for combined

uni- or bilateral tubal disease of 18%.

The baseline characteristics of the 3716 couples are shown in Table 1. In total,

15% of data points were missing and subsequently imputed. The spline analyses

showed that the relation between most continuous variables and the probability of

both outcomes was approximately linear. A non-linear relationship was observed

23138 Coppus.indd 85 28-09-12 14:55

Chapter 5

86

for female age, with a marked increase in risk with age over 32 years. We therefore

redefined female age as two linear variables, one for female age below and one for

age above 32 years.

TABLE I. Baseline characteristics of the 3716 included couples

Mean/Median* 5th-95th percentile

Female age (years) 32.4 25.3 – 39.0

Male age (years) 34.9 27.5 – 44.6

Duration of child wish (years) 1.5* 0.7 – 4.3

Subfertility, primary (n) (%) 2312 62.2

Results of the univariable and multivariable analyses are summarized in Table

2. For any tubal pathology, the multivariable analysis showed that prolonged

duration of subfertility, a pregnancy of both partners or of only the male partner in

a previous relationship, male smoking habits, a history of PID, history of Chlamydia

infection, history of ectopic pregnancy and previous pelvic surgery were all factors

that increased the likelihood of tubal pathology. A history of a legally induced

abortion also contributed to a higher probability of tubal pathology, although this

parameter was not statistically significant at the 5% level (P=0.07).

For bilateral tubal pathology, multivariable analysis showed that referral by a

gynaecologist, female age above 32 years, dysmenorrhoea and smoking of the

woman increased the likelihood of disease, as did a longer duration of subfertility,

a pregnancy of both partners in a previous relationship, male smoking habits,

history of PID, and history of Chlamydia infection, ectopic pregnancy and previous

pelvic surgery. In contrast, a previous pregnancy in the current relationship and a

higher male age lowered the chances of bilateral tubal pathology.

The ‘any pathology’ model had a modest overfit of 4%, whereas the ‘bilateral

pathology’ model showed 9% overfit. To improve calibration of the probabilities in

future patients, the regression coefficients of the models were shrunk with these

factors. Both models discriminated modestly between diseased and non-diseased

with an AUC of 0.65 (95% CI: 0.63-0.68) for the ‘any pathology’ model and 0.68 (95%

CI: 0.65-0.71) for the ‘bilateral pathology’ model, respectively.

23138 Coppus.indd 86 28-09-12 14:55


87

5

TAB

LE II. Results of the uni- and multivariable analysis

Variable

Any tub

al path

ology

Bilateral tub

al path

ology

Un

ivariable an

alysisM

ultivariable an

alysisU

nivariab

le analysis

Multivariab

le analysis

OR

95%C

IP value

OR

95% C

IP value

OR

95% C

IP value

OR

95% C

IP value

Referral by gynaecologist1.52

1.14-2.02 0.005

1.791.24-2.59

0.0021.47

1.00-2.170.05

Female age

o per year <

32o

per year > 32

1.00 1.04

0.97-1.041.01-1.07

0.980.02

1.021.04

0.97-1.071.00-1.09

0.410.06

1.061.01-1.12

0.02

Male age (per year)

1.000.98-1.02

0.980.98

0.97-1.010.25

0.960.94-0.99

0.004

Child w

ish (per year)1.11

1.06-1.18<

0.0011.09

1.03-1.160.002

1.181.11-1.26

<0.001

1.151.08-1.23

<0.001

Previous IUI treatm

ent0.91

0.57-1.460.70

0.910.47-1.76

0.79

Previously conceivedo

current relationo

only male partner in previous

relationo

only female partner in previous

relationo

both partners in previous relation

1.141.51

1.29

2.56

0.95-1.361.07-2.12

0.96-1.75

1.35-3.20

0.150.02

0.09

<0.001

1.59

2.01

1.12-2.27

1.30-3.12

0.01

0.002

0.761.47

1.34

2.87

0.59-1.000.93-2.33

0.89-2.00

1.79-4.60

0.050.10

0.25

<0.001

0.77

2.39

0.58-1.03

1.44-3.95

0.08

0.001

Dysm

enorrhoea1.10

0.92-1.310.29

1.351.06-1.71

0.011.29

1.01-1.650.04

Female sm

oking1.35

1.12-1.62<

0.0011.73

1.35-2.21<

0.0011.35

1.03-1.770.03

Male sm

oking1.36

1.15-1.62<

0.0011.22

1.02-1.460.03

1.641.30-2.07

<0.001

1.331.03-1.72

0.03

History of PID

2.901.96-4.28

<0.001

2.031.34-3.09

0.0013.90

2.50-6.08<

0.0012.66

1.65-4.29<

0.001

History of C

hlamydia

2.081.54-2.80

<0.001

1.673.59-10.2

<0.001

2.691.88-3.87

<0.001

1.901.28-2.82

0.002

History of ectopic pregnancy

9.73 5.9-16.0

<0.001

6.063.59-10.2

<0.001

3.171.80-5.60

<0.001

3.781.74-8.18

0.001

History of induced abortion

1.571.19-2.07

0.0021.34

0.98-1.830.07

1.511.04-2.21

0.03

History of m

iscarriage or life birth1.07

0.90-1.280.44

0.930.73-1.20

0.58

Pelvic surgery excluding Caesarean section3.19

2.57-3.98<

0.0012.23

1.79-2.79<

0.0011.87

1.41-2.48<

0.0011.40

1.02-1.920.004

TMC

(*10*10^6/l)

1.001.00-1.01

0.561.00

1.00-1.010.63

23138 Coppus.indd 87 28-09-12 14:55

Chapter 5

88

The calibration of the model for any tubal pathology is shown in Fig. 2A, and that

for bilateral tubal pathology in Fig. 2B. The calculated probabilities of bilateral tubal

pathology correlated well with the observed proportions of affected women, as all

points were located close to the line of equality (x = y), indicating good calibration

of the model. This was confirmed by the Hosmer and Lemeshow test statistic

(P=0.83). Because of the overestimation of the lowest probabilities for women

with any tubal pathology, the calibration of this model was less optimal (P=0.27).

However, the women with the highest likelihood of tubal disease could still be

distinguished adequately from women with an average probability.

obse

rved

frac

tion 1

- or 2

-side

d tub

al pa

tholo

gy

0,20

0,25

0,30

0,35

0,15

0,10

0,05

0,000,00 0,05 0,10 0,15 0,20 0,25 0,30 0,35

calculated probability of 1- or 2-sided tubal pathology

obse

rved

frac

tion 2

-side

d tub

al pa

tholo

gy 0,20

0,15

0,10

0,05

0,000,00 0,05 0,10 0,15 0,20

calculated probability of 2-sided tubal pathology

FIGURE 2. Calibration plot of the decision rule for uni- or bilateral tubal pathology (left) or bilateral tubal pathology (right)

A summary score can be calculated based on the respective elements in a

couple’s medical history. By plotting this score, the probability of tubal pathology

can subsequently be read from a graph. The score charts for both models are

presented in Fig 3.

The consequences of the use of the clinical decision rules are shown in Table IIIa

and IIIb. For example, deferring tubal testing in case of a probability of bilateral tubal

pathology of less than 5% results in a 25% reduction in the number of invasive tests

procedures, while 90% of women with bilateral tubal disease are still detected. At

this cut-off level, 10 tests have to be performed to detect one case of bilateral tubal

23138 Coppus.indd 88 28-09-12 14:55


89

5

disease. Increasing the cut-off to a higher pre-test probability of tubal pathology

at which tubal patency testing is performed will result in fewer procedures, but

also in a higher number of women in whom tubal testing is incorrectly postponed.

By planning tubal testing in women at highest risk for tubal pathology early, the

detection rate will increase while delaying testing in the majority of women correctly.

TABLE IIIA. Diagnostic consequence of use of decision rule for bilateral tubal pathology.

Strategy Number of women

undergoing invasive tubal

testing

Number (%) of tested

women with bilateral tubal

pathology

Reduction of number of invasive tubal tests

NND* Number (%) of women in whom tubal testing was

withhold correctly

Number (%) of women not tested

with bilateral pathology

No testing 0 0 3716 (100%)

3404 (91.6%) 312 (8.4%)

Test all with HSG or DLS

3716 (100%) 312 (8.4%) 0 12 0 0

Test if probability of bilateral tubal pathology >5%

2786 (75%) 279 (10%) 930 (25%) 10 897 (96.5%) 33 (3.5%)


801 (22%) 139 (17%) 2915 (78%) 6 2742 (94%) 173 (6%)


350 (9.4%) 79 (23%) 3366 (84%) 4 3133 (93%) 233 (7%)

*NND: Number of tubal tests needed to diagnose one woman with bilateral tubal disease

TABLE IIIB. Diagnostic consequence of use of decision rule for any tubal pathology.

Strategy Number of women

undergoing invasive

tubal testing

Number (%) of tested

women with any tubal pathology

Reduction of number of invasive tubal tests

NND* Number (%) of women in whom

tubal testing was withhold

correctly

Number (%) of women not tested with any

pathology

No testing 0 0 3716 (100%) 3035 (82%) 681 (18%)

Test all with HSG or DLS 3716 (100%) 681 (18%) 0 5.5 0 0

Test if probability of any tubal pathology >15%

1600 (42%) 422 (26%) 2116 (57%) 3.8 1857 (88%) 259 (12%)


921 (25%) 306 (33%) 2795 (75%) 3 2420 (87%) 375 (13%)


547 (15%) 208 (38%) 3169 (85%) 2.6 2696 (85%) 473 (15%)

*NND: Number of tubal tests needed to diagnose one woman with any tubal disease

23138 Coppus.indd 89 28-09-12 14:55

Chapter 5

90

Discussion

We developed two clinical decision rules to calculate the probability of any tubal

pathology as well as the probability of severe bilateral tubal pathology in subfertile

women. These rules were derived from the clinical characteristics of 3716 couples

who had been referred for evaluation of their subfertility in a large prospective

multicentre study in the Netherlands. Both models could discriminate moderately

well between women with and women without tubal pathology. Calibration

of the models was good, allowing distinction between women with a high-risk

profile and women with a low-risk profile.

The relative importance of discrimination and calibration depends on the clinical

application of a model (Mol et al., 2005). As our models are intended to counsel

women about the probability that tubal testing would reveal a cause for their

subfertility, the accuracy of the numeric probability (calibration) is relevant, less so

the fact of having tubal disease or not (discrimination).

A strength of this study is the national multicentre design, which enabled us to

include a large number of women. A possible disadvantage of the multicentre

nature is the potential for heterogeneity, due to differences in tubal testing

protocols between hospitals or because of variability in tubal pathology prevalence

between different areas of the country. Therefore we explored whether the centre

as such acted as a confounder for variables that were included in the models. To

do so we added centre as a covariate to the multivariate regression models. This

analysis did not alter the models, nor did it affect the strength of associations.

Data from medical history might suffer from a lack of reliability, and one cannot

exclude that certain items have been reported incorrectly. For example, due to

the design of the study, not all cases that reported a history of PID or Chlamydia

infection were verified by laparoscopy or culture. On the other hand, some women

may have reported incorrectly not having suffered an episode of PID.This will have

caused ‘false-positive’ or ‘false-negative’ histories. One should realise however

that these reports do also occur in daily clinical practice, where the information

from medical history is collected during the fertility workup on a routine basis, a

situation which is reflected by our prospective study.

From a study design perspective, ideally all women would have undergone visual

assessment of their Fallopian tubes. In the present study, 32% of eligible women

23138 Coppus.indd 90 28-09-12 14:55


91

5

did not undergo tubal testing, as they conceived prior to tubal testing, or were

referred for treatment with IVF, thereby surpassing the need for tubal testing. As our

purpose was to examine the relation between medical history and outcome of tubal

assessment, this mechanism could have led to biased estimates of associations. To

examine the impact of this partial verification, we explored whether there were any

systematic differences between women that did and did not undergo tubal testing.

This was indeed the case, although the absolute differences were small. However,

studies in which the diagnosis is immediately verified in all women are in our opinion

hardly feasible, leaving the present approach as the most practical and possible one.

We chose to develop models for two distinct definitions of tubal pathology; one

in which any type of tubal pathology was considered abnormal, and one in which

only bilateral abnormalities were considered to be abnormal. The latter is in our

opinion clinically the one most relevant, as these women have virtually no chance of

conceiving either spontaneously or after intra-uterine insemination. From a clinical

point of view, it is undesirable that these women are exposed to a long period of

expectant management and they should be identified early in the diagnostic workup

and counselled for IVF. At present, the detection of unilateral tubal pathology is unlikely

to imply a major change in clinical management, unless the woman already has a poor

chance of achieving a treatment independent pregnancy (van der Steeg et al., 2007).

0

5

10

15

20

25

30

35

40

45

50

55

60

65

70

75

80

85

90

95

100

0 5 10 15 20 25 30 35 40

Pred

icted prob

ability of abn

ormal tu

bal test

Diagnos6c index score

FIGURE 3. Score chart of the bilateral and any tubal pathology models to estimate the probability of tubal disease. (A) Calculation of the sumscore; (B) Probability of disease corresponding to this sum score. Upper line indicates the probability of any tubal pathology. Lower line indicates the probability of bilateral disease.

23138 Coppus.indd 91 28-09-12 14:55

Chapter 5

92

(A)

Scor

e b

ilate

ral m

odel

Scor

e un

i- o

r bila

tera

l mod

el

Wom

an’s

ag

e (y

r)<

3232

-24

34-3

6>

36

Bila

tera

l pat

holo

gy m

odel

00

23

……

……

……

……

….

Dur

atio

n o

f ch

ildw

ish

(yr)

<1

1-2

2-3

3-4

>4

Bila

tera

l pat

holo

gy m

odel

12

34

8…

……

……

……

……

.

Any

pat

holo

gy m

odel

01

23

5…

……

……

……

……

.

Mal

e ag

e (y

r)<

2525

-30

30-3

5>

35

Bila

tera

l pat

holo

gy m

odel

-8-1

0-1

1-1

3…

……

……

……

……

.

Refe

rral

sta

tus

(GP

or g

yn)

GP

Gyn

Bila

tera

l pat

holo

gy m

odel

04

……

……

……

……

….

Cou

ple

con

ceiv

ed in

pre

sent

rela

tion

No

Yes

Bila

tera

l pat

holo

gy m

odel

0-2

……

……

……

……

….

Bot

h c

once

ived

in p

rior

rela

tion

No

Yes

Bila

tera

l pat

holo

gy m

odel

08

……

……

……

……

….

Any

pat

holo

gy m

odel

07

……

……

……

……

….

On

ly m

an c

once

ived

in p

rior

rela

tion

No

Yes

Any

pat

holo

gy m

odel

04

……

……

……

……

….

Dys

men

orrh

oea

No

Yes

Any

pat

holo

gy m

odel

02

……

……

……

……

….

Wom

an s

mok

erN

oYe

s

Bila

tera

l pat

holo

gy m

odel

03

……

……

……

……

….

Mal

e sm

oker

No

Yes

Bila

tera

l pat

holo

gy m

odel

03

……

……

……

……

….

Any

pat

holo

gy m

odel

02

……

……

……

……

….

Prev

ious

sal

pin

giti

sN

o Ye

s

Bila

tera

l pat

holo

gy m

odel

09

……

……

……

……

….

Any

pat

holo

gy m

odel

07

……

……

……

……

….

23138 Coppus.indd 92 28-09-12 14:55


93

5

Prev

ious

Ch

lam

ydia

infe

ctio

nN

o Ye

s

Bila

tera

l pat

holo

gy m

odel

06

……

……

……

……

….

Any

pat

holo

gy m

odel

05

……

……

……

……

….

His

tory

of e

xtra

-ute

rin

e p

regn

ancy

No

Yes

Bila

tera

l pat

holo

gy m

odel

010

……

……

……

……

….

Any

pat

holo

gy m

odel

017

……

……

……

……

….

His

tory

of l

egal

ly in

duc

ed a

bor

tion

No

Yes

Any

pat

holo

gy m

odel

03

……

……

……

……

….

His

tory

of p

elvi

c su

rger

yN

oYe

s

Bila

tera

l pat

holo

gy m

odel

03

……

……

……

……

….

Any

pat

holo

gy m

odel

08

……

……

……

……

….

Sum

of a

bov

e…

……

……

……

……

.…

……

……

……

……

.

+15

+0

Dia

gnos

tic

ind

ex s

core

……

……

……

……

….

……

……

……

……

….

Proc

edur

e: c

ircle

the

scor

es fo

r eac

h of

the

varia

bles

, tra

nsfe

r to

right

mos

t col

umn

and

add

to g

et th

e di

agno

stic

inde

x sc

ore.

Mar

k th

e sc

ore

in th

e ap

prop

riate

figu

re b

elow

in

ord

er to

read

off

on th

e Y-

axe

the

prob

abili

ty o

f (se

vere

) tub

al p

atho

logy

.

23138 Coppus.indd 93 28-09-12 14:55

Chapter 5

94

We limited the study to those women that underwent their first tubal patency test.

Couples with a history of previous tubal testing, previous reproductive surgery

or previous IVF are less likely to undergo tubal testing than couples without

such a history. We also excluded couples in whom the woman was anovulatory

or in whom the man had severe oligoasthenozoospermia requiring IVF-ICSI. The

rationale for excluding these couples was to prevent biased associations as in

these conditions tubal patency testing will not always be performed routinely.

To our surprise, we found male characteristics to be associated with the probability of

tubal subfertility. Although male age, male smoking habits, as well as a pregnancy in a

previous relationship cannot causally be related to female tubal damage, it is possible

that these male characteristics act as indicators for some unrecorded variables,

such as number of sexual partners or socio-economic status, or reflect lifestyle and

selection mechanisms. We performed sub-analyses in which only female factors were

evaluated, but these models had a significantly lower predictive performance than

the models in which data of the couple were included (data not shown). Furthermore,

the model was developed from data collected in over 30 centres, and we corrected

for a possible confounding cluster effects of male characteristics. However, it might

be possible that these associations do not hold in a different population.

There are two ways in which our decision rule can be used. A clinician can use the

paper score chart, which transforms the total sum score into a probability of tubal

disease. Alternatively, the clinician can use the regression formula, which can be

incorporated into software, i.e. used in an electronic patient file, or a handheld

device such as a PDA.

Preferences of subfertile women regarding the trade-off between the likelihood

of detecting tubal pathology and the risks and discomforts associated with tubal

testing are currently unknown; therefore, a clear cut-off level cannot be advised at

present. The decision rules presented here cannot be used to exclude subfertile

women from tubal patency testing overall, but allow one to estimate a probability

of disease in order to time tubal patency tests adequately. This means that in clinical

practice, tubal testing can be performed directly in case of a high probability of

tubal disease or postponed for several months in women with a low probability.

In conclusion, the present study provides two easy-to-use decision rules that can

accurately express the woman’s probability of (severe) tubal pathology at the

23138 Coppus.indd 94 28-09-12 14:55


95

5

couple’s first consultation. They could be used to select women for tubal testing

more efficiently.

AppendixThe chance of tubal pathology can be calculated from the multivariable models with the formula: probability = 1/[1 + exp(-β)], where the β for the different models are:

‘any tubal pathology’ model: β = -2.086+ (0.085*duration of child wish) + (0.445*conceiving of only male partner prior relation) + (0.672*both partners conceived prior relation) + (0.193*male smoker) + (0.680*PID) + (0.494*Chlamydia infection) + (1.729*ectopic pregnancy) + (0.280*legally induced abortion) + (0.771*pelvic surgery).

‘bilateral tubal pathology’ model: β = -2.201 + (0.351*referral gynaecologist) + (0.054*female age above 32 years) + (0.129*duration of child wish) + (-0.235*conceived in present relation) + (0.792*both partners conceived prior relation) + (-0.034*male age) + (0.230*dysmenorrhoea) + (0.272*female smoker) + (0.259*male smoker) + (0.889*PID) + (0.583*Chlamydia infection) + (0.961*ectopic pregnancy) + (0.305*pelvic surgery).

23138 Coppus.indd 95 28-09-12 14:55

Chapter 5

96

References

Coppus SF, Opmeer BC, Logan S, van der Veen F, Bhattacharya S, Mol BW. The predictive value of medical history taking and Chlamydia IgG ELISA antibody testing (CAT) in the selection of subfertile women for diagnostic laparoscopy: a clinical prediction model approach. Hum Reprod 2007; 22: 1353-1358.

Evers JL. Female subfertility. Lancet 2002; 360:151-159.

Forsey JP, Caul EO, Paul ID, Hull MG. Chlamydia trachomatis, tubal disease and the incidence of symptomatic and asymptomatic infection following hysterosalpingography. Hum Reprod 1990; 5:444-447.

Harrell FE Jr, Lee KL, Pollock BG. Regression models in clinical studies: determining relationships between predictors and response. J Natl Cancer Inst 1988; 80:1198-1202.

Hubacher D, Grimes D, Lara-Ricalde R, de la Jarra J, Garcia-Luna A. The limited clinical usefulness of taking a history in the evaluation of women with tubal factor subfertility. Fertil Steril 2004; 81:6-10.

Liberty G, Gal M, Halevy-Shalem T, Michaelson-Cohen R, Galoyan N, Hyman J, Eldar-Geva T, Vatashky E, Margalioth E. Lidocaine-Prilocaine (EMLA) cream as analgesia for hysterosalpingography: a prospective, randomized, controlled, double blinded study. Hum Reprod 2007; 22:1335-1339.

Mol BW, Coppus SF, Van der Veen F, Bossuyt PM. Evaluating predictors for the outcome of assisted reproductive technology: ROC-curves are misleading; calibration is not! Fertil Steril 2005; 84;s253–s254 (Abstract).

National Institute for Clinical Excellence (2004) Guideline CG11: Fertility: assessment and treatment for people with fertility problems. (http://www.nice.org.uk/download.aspx?o=CG011fullguideline)

Nederlandse Vereniging voor Obstetrie en Obstetrie (2004) Orienterend fertiliteitsonderzoek [NVOG guideline diagnostic fertility workup]

Steyerberg EW, Eijkemans MJ, Harrell FE Jr., Habbema JD. Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets. Stat Med 2000; 19:1059-1079.

Steyerberg EW, Harrell FE Jr., Borsboom GJ, Eijkemans MJ, Vergouwe Y, Habbema JD. Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J Clin Epidemiol 2001; 54:774-781.

Van der Heijden GJ, Donders AR, Stijnen T, Moons KG. Imputation of missing values is superior to complete case analysis and the missing-indicator method in multivariable diagnostic research: a clinical example. J Clin Epidemiol 2006; 59:1102-1109.

Van der Steeg JW, Steures P, Eijkemans MJ, Habbema JD, Hompes PG, Broekmans FJ, van Dessel HJ, Bossuyt PM, van der Veen F, Mol BW. Pregnancy is predictable: a large-scale prospective external validation of the prediction of spontaneous pregnancy in subfertile. Hum Reprod 2007; 22:536-342.

23138 Coppus.indd 96 28-09-12 14:55

23138 Coppus.indd 97 28-09-12 14:55

23138 Coppus.indd 98 28-09-12 14:55

6The capacity of

hysterosalpingography and

laparoscopy to predict natural

conception.

Harold VerhoeveSjors Coppus

Jan Willem van der SteegPieternel Steures

Peter HompesPetra BourdrezPatrick Bossuyt

Fulco van der VeenBen Willem Mol


The contents of this article were presented as a poster presentation at the 63nd ASRM Annual Meeting, 2007, Washington, USA.

23138 Coppus.indd 99 28-09-12 14:55

Chapter 6

100

Abstract

BACKGROUND: Laparoscopy has been claimed to be superior to

hysterosalpingography (HSG) in predicting fertility. Whether this conclusion

is applicable to a general subfertile population can be questioned as data in

support of this claim were collected in third line centres. The aim of this study

was to assess the prognostic capacity of HSG and laparoscopy in a general

subfertile population.

METHODS: In 38 centres, we prospectively studied a cohort of patients

referred for subfertility between 2002 and 2004, who underwent HSG and/or

laparoscopy as part of their subfertility work-up. Follow-up started immediately

after tubal testing and ended 12 months thereafter. Time to pregnancy was

censored at the date of last contact, when the woman was not pregnant or at

the start of treatment. Kaplan-Meier curves for the occurrence of spontaneous

intrauterine pregnancy were constructed for patients without tubal pathology,

for those with unilateral tubal pathology and for patients with bilateral tubal

pathology at HSG or laparoscopy. Multivariable Cox regression analysis was

used to calculate fecundity rate ratios (FRRs) to express associations between

tubal pathology and the occurrence of an intrauterine pregnancy.

RESULTS: Of the 3301 included patients, 2043 underwent HSG only, 747

underwent diagnostic laparoscopy only and 511 underwent both. At HSG, 322

(13%) patients showed unilateral tubal pathology and 135 (5%) showed bilateral

tubal pathology. At laparoscopy, 167 (13%) showed unilateral tubal pathology

and 215 (17%) showed bilateral tubal pathology. Multivariable analysis resulted

in FRRs of 0.81 (95% confidence interval (CI): 0.59-1.1) for unilateral, and 0.28

(95% CI: 0.13-0.59) for bilateral, tubal pathology at HSG. The FRRs at laparoscopy

were 0.85 (95% CI: 0.47-1.52) for unilateral, and 0.24 (95% CI: 0.11-0.54) for

bilateral, tubal pathology.

CONCLUSIONS: Patients with unilateral tubal pathology at HSG or laparoscopy

had a moderate reduction in pregnancy chances, whereas those with bilateral

tubal pathology at HSG or laparoscopy had a severe reduction in pregnancy

chances. This reduction was similar for HSG and laparoscopy, suggesting

that HSG and laparoscopy have a comparable predictive capacity for natural

conception.

23138 Coppus.indd 100 28-09-12 14:55

Prognostic significance of HSG and laparoscopy

101

6

Introduction

The prevalence of tubal pathology in subfertile populations varies between 11 and

30% and depends on whether one deals with a population in a primary, secondary

or tertiary setting, with the lowest prevalence in a primary care setting (Hull et al.,

1985; Collins et al., 1995; Snick et al., 1997). Chlamydia trachomatis is a major cause

for tubal pathology. Chlamydia infection is often asymptomatic and untreated

Chlamydia can progress to pelvic inflammatory disease (PID), infertility, ectopic

pregnancy, and chronic pelvic pain. An increase in the prevalence of Chlamydia

infections has been observed in women in their reproductive phase of life (Rekart

and Brunham, 2008) and hence tubal pathology remains an important cause of

subfertility.

A number of diagnostic tests are being used in clinical practice to assess tubal

patency as part of the work-up for subfertility. The most commonly used tests

are hysterosalpingography (HSG) and laparoscopy. HSG and Chlamydia Antibody

Titer test (CAT) are often seen as screening tests for the presence of tubal

pathology (den Hartog et al., 2008), whereas laparoscopy is considered the clinical

reference test for diagnosing tubal pathology (Swart et al.,1995; Mol et al., 1999b).

Laparoscopy allows visualization of peri-adnexal adhesions and the presence of

endometriosis, which cannot be done with HSG. Because of their invasive nature,

HSG and laparoscopy are usually reserved as the last investigation in the fertility

work-up. The findings at these diagnostic tests should translate into a prognosis

of natural pregnancy chances, which can be used to counsel patients on whether

their pregnancy chances can be improved by surgery, IUI or IVF.

The relative merits of HSG and laparoscopy for assessing tubal status have

been discussed for many years, reflecting a lack of agreement amongst fertility

subspecialists on which diagnostic tests have to be performed and their prognostic

utility (Helmerhorst et al., 1995; Balash, 2000; Fatum et al., 2002; Lavy et al., 2004;

Perquin et al., 2006).

Some studies have shown that patients with unilateral tubal pathology at HSG do

not have reduced chances of a treatment-independent pregnancy, in contrast to

those with bilateral tubal pathology (Mol et al., 1997). For diagnostic laparoscopy

(DLS) and dye, a moderate reduction of natural pregnancy chances in case

23138 Coppus.indd 101 28-09-12 14:55

Chapter 6

102

of unilateral tubal pathology and a severe reduction in case of bilateral tubal

pathology have been observed (Mol et al., 1999a). However these studies were

retrospective in design, relatively small and had included patients visiting a tertiary

referral hospital. In another study, laparoscopy was found to be a better predictor

of future fertility than HSG (Mol et al., 1999b). This study analysed data of couples

who underwent both HSG and laparoscopy and also consisted of a tertiary care

population. The conclusion that laparoscopy is a better predictor of infertility than

HSG was weakened by the fact that the median interval to laparoscopy after a

normal HSG was 10 months, compared with 4.5 months for women in whom the

HSG showed two-sided tubal abnormalities.

It is not clear if the results of these studies are applicable to the general subfertile

population. The purpose of the study presented here was to evaluate the impact

of unilateral and bilateral tubal pathology at HSG and laparoscopy on treatment-

independent pregnancy rates in a large prospective cohort of subfertile ovulatory

women from mixed secondary and tertiary hospital population. In particular, we

evaluated whether a difference in prognostic capacity exists between HSG and

laparoscopy.


Between January 2002 and February 2004, consecutive couples presenting at

the fertility clinic of 38 centres in The Netherlands were asked to participate in

a prospective cohort study. The study was approved by the Institutional Review

Board in each institution. All couples underwent a basic fertility work-up according

to the guidelines of the Dutch Society of Obstetrics and Gynaecology. The details

of this work-up have been described previously (van der Steeg et al., 2007).

PatientsThe present analysis was limited to couples with a regular ovulatory cycle, defined

as a cycle length between 23 and 35 days, with a within cycle variation of less

than 8 days. None of the patients analysed used Clomiphene Citrate. Ovulation

was detected by a basal body temperature chart, midluteal serum progesterone,

or by ultrasonographic monitoring of the cycle. Couples with a history of reversal

of sterilization, tubal surgery, IVF, or previous tubal patency testing were excluded.

23138 Coppus.indd 102 28-09-12 14:55


103

6

Only those women who underwent HSG and/or laparoscopy as part of their

fertility work-up were included in the analysis presented here.

Duration of subfertility was defined as the period between the time the couple had

started trying to conceive and the time of tubal testing. Female age was calculated

at the time of HSG or laparoscopy. Subfertility was considered to be secondary

if a woman had conceived in this or in a previous partnership, regardless of the

pregnancy outcome.

In all male partners, at least one semen analysis was performed. The total motile

sperm count (TMC) was calculated by multiplying semen volume, sperm

concentration and percentage of motile spermatozoa. Couples in whom semen

analysis showed a severe impairment of semen quality requiring IVF-ICSI (defined

as a total motile count < 1*106), were also excluded from the present analysis.

Tubal testing and follow-upClinics were free to use their own tubal testing protocol. In general, three different

protocols could be distinguished. With the first strategy, tubal patency was

routinely evaluated early in the work-up with either HSG or laparoscopy. In case

of abnormal findings at HSG, a laparoscopy was planned to verify these findings.

With the second strategy, tubal pathology was considered to be absent in case

of a negative CAT and HSG or laparoscopy was only performed in CAT-positive

women. With the third strategy, CAT-negative women were evaluated with HSG,

while laparoscopy was planned in CAT-positive women.

HSG was performed in the follicular phase of the menstrual cycle with either a

water-soluble or oil-based contrast medium. Patients were in a supine position

during the whole procedure. X-ray photographs were taken. Spasmolytic

drugs were allowed to be used. Each HSG was evaluated by a fertility specialist.

Laparoscopy was performed with a double-puncture technique. Methylene blue

was injected at room temperature through a Foley catheter in the uterine cavity.

The amount of methylene blue injected was variable, depending on the time

necessary to assess tubal function. Findings at HSG were classified as no tubal

occlusion, one-sided tubal occlusion or two-sided tubal occlusion and impaired

flow of contrast if contrast was not shown beyond the isthmic portion of the tube.

Findings at laparoscopy were classified as normal, one-sided tubal occlusion or

23138 Coppus.indd 103 28-09-12 14:55

Chapter 6

104

two-sided tubal occlusion and proximal- or distal tubal occlusion. Additional tubal

pathology at laparoscopy, i.e. adhesions disturbing ovum pick-up were classified

separately.

Follow-up started after tubal testing and ended 12 months thereafter. A model was

used to calculate the chances of natural conception (Hunault et al., 2004). Expectant

management was advised if the fertility work-up showed no abnormalities and

couples had a probability of natural conception in the next 12 months of 40% or

higher. Treatment was generally advised to those with a probability below 30%.

For all women lost to follow-up, the general practitioner was sent a questionnaire

and asked about the fertility status of the couple.

AnalysisThe primary end-point in this study was a spontaneous ongoing pregnancy at

12 weeks of gestational age, confirmed by ultrasonography. The first day of the

last menstrual cycle was considered to mark the end of time until spontaneous

conception. Time to pregnancy was censored at the moment treatment (IUI, IVF

or tubal surgery) started within 12 months after counseling, or at the last date of

contact during follow-up, when the couple had no ongoing pregnancy.

To examine if proximal occlusions identified at HSG or laparoscopy had a different

prognostic effect on treatment-independent pregnancy than distal occlusion /

hydrosalpinx or hydrosalpinges, ongoing pregnancy rates were scored in relation

to the findings of tubal testing at HSG and laparoscopy. We classified the findings

at HSG and laparoscopy into three groups and constructed Kaplan-Meier curves

for each i.e. for women without tubal pathology, for those with one-sided tubal

pathology and for women with two-sided tubal pathology, irrespective of their

nature. Subsequently, adjusted fecundity rate ratios (FRRs) and 95% confidence

intervals (CIs) were calculated through multivariable Cox regression modeling.

These FRRs express the instantaneous probability of an ongoing pregnancy for

women with a particular feature relative to the probability in those without that

feature. All analyses were stratified according to centre, to adjust for any potential

clustering effects of type of tubal work-up and for differences in prognostic profile

of women presenting to the participating clinics (Harrell, 2001).

23138 Coppus.indd 104 28-09-12 14:55


105

6

In case a woman had been evaluated both with HSG and DLS, the results of these

were evaluated separately, i.e. the result of the respective tubal test was used in

the analysis, not the definitive diagnosis regarding tubal pathology in an individual

woman. In those women who underwent both HSG and laparoscopy, we classified

tubal pathology as no abnormalities, unilateral pathology and bilateral pathology

and subsequently constructed 3 x 3 tables to compare tubal occlusion detected at

HSG with occlusion detected at laparoscopy.

Results

Data of 7860 couples were collected. Of these, 938 women (12%) did not have a

regular ovulatory cycle, 636 men (8%) had severely impaired semen quality and

196 couples (2.5%) had previously undergone fertility surgery or IVF. In another

568 women (7%) HSG or laparoscopy had been performed in a previous episode

of subfertility. A further 2221 women (28%) did not undergo HSG or DLS as part of

their fertility work-up, leaving 3301 women (42%) for this analysis (Fig. 1).

The baseline characteristics of the 3301 couples are shown in Table I. Median

duration of subfertility at HSG was 1.8 years (5th-95th percentiles: 1.0-4.3 years) and

at laparoscopy 2.2 years (5th-95th percentiles: 1.2-5.0 years). The median time from

the first visit to HSG was 3.0 months (5th-95th percentiles: 1-11.4), the median time

from the first visit to laparoscopy was 5.9 months (5th-95th percentile: 1.4-16.8). Of

the 3301 women in the analysis, 2043 underwent HSG, 747 women

underwent DLS and in 511 women the tubal status was evaluated with both.

In 1626 of the HSGs (64%) a water-based contrast medium was used and in 359

of the HSG’s (14%) an oil-based contrast medium was used; in 569 (22%) it was

unknown which medium was used. The findings at HSG and laparoscopy and

occurrence of ongoing pregnancies are shown in Tables II and III. The data we

collected allowed us to make a distinction between proximal and distal occlusion/

hydrosalpinx for laparoscopy. In the collected data for HSG, the only distinction we

were allowed to make was between no occlusion, occlusion and impaired flow of

contrast, by which was meant the absence of flow of contrast beyond the isthmic

portion of the fallopian tube. No distinction could be made between proximal and

distal occlusion.

23138 Coppus.indd 105 28-09-12 14:55

Chapter 6

106

TABLE I. Baseline characteristics of 3301 couples at time of procedure

HSG (n=2554) Diagnostic laparoscopy (n=1258)

Mean/Median* 5th-95th percentiles Mean/Median* 5th-95th percentiles

Male age (years) 35.4 28.0 – 45.0 35.1 27.6 – 44.8

Duration of subfertility (years) 1.8* 1.0 – 4.3 2.2* 1.2 – 5.0

Semen analysis – TMC (106) 57.6* 4.0 – 308 59.5* 4.8 – 274

Subfertility, primary (n) (%)Time to tubal test (months) †

15683.0*

61.41.0- 11.4

8065.9*

64.11.4- 16.8

TMC, total motile sperm count.*Value is the median.† time from first consultation to tubal testing

Of the women who had a HSG, 186 (7%) were lost to follow-up and of those who

had a laparoscopy 88 (7%) were lost to follow-up. A natural conception within one

year occurred in 454 (18%) of the women after HSG, in 126 (10%) after laparoscopy

and in 48 (9.4%) of those who underwent both HSG and laparoscopy, which would

correspond with cumulative treatment-independent pregnancy rates of 33% after

HSG and 23% after laparoscopy (Fig. 2). The number of patients, who were still

under study at 6 and 12 months, respectively was 800 and 240 after HSG. For

laparoscopy these numbers were 243 and 81, respectively. In those with unilateral

tubal pathology, these numbers were 145 and 57 after HSG and 46 and 14 after

laparoscopy. For patients with bilateral tubal pathology, the numbers remaining

for analysis at 6 and 12 months were 68 and 20 after HSG in comparison to 74 and

24 after laparoscopy.

TABLE II. Fertility outcome after HSG

Findings at HSG Frequency (%) Number of ongoing

pregnancies

FRR

Bilateral normal 2097 (82%) 350 1

Unilateral impaired flow of contrast * 84(3%) 10 0.68 (0.36-1.3)

Unilateral occlusion 238 (9%) 38 0.89 (0.63-1.3)

Unilateral occlusion, contra-lateral impaired flow of contrast*

25 (1%) 2 0.36 (0.09-1.5)

Bilateral impaired flow of contrast* 51 (2%) 2 0.19 (0.05-0.80)

Bilateral occlusion 59 (2%) 3 0.26 (0.08-0.80)

FFR = Fecundity Rate Ratio*Impaired flow meaning no flow of contrast beyond the isthmic portion of the tube.

In 511 patients both HSG and laparoscopy were performed. Tubal status detected

at HSG compared to tubal status detected at laparoscopy is shown in table IV. HSG

showed one-sided tubal occlusion in 153 (30%) and two-sided tubal occlusion

23138 Coppus.indd 106 28-09-12 14:55


107

6

in 82 (16%) of these patients. Laparoscopy showed one-sided tubal occlusion in

79 (15%) and two-sided tubal occlusion in 56 (11%) of these patients. In those

patients where HSG showed one-sided occlusion, laparoscopy revealed no

occlusion in 60%, whereas if HSG showed two-sided occlusion, laparoscopy

revealed no occlusion in 44%. If laparoscopy showed one-sided occlusion, HSG

revealed no occlusion in 22% of these patients and if laparoscopy showed two-

sided occlusion, HSG showed no occlusion in 23% of cases.

TABLE III. Fertility outcome after laparoscopy.

Findings at laparoscopy Frequency (%)

Number of ongoing pregnancies

FRR

Bilateral normal 876 (70%) 90 1

Unilateral proximal occlusion 126 (10%) 12 0.58 (0.44-1.7)

Unilateral hydrosalpinx/distal occlusion 25 (2%) 3 1.40 (0.43-4.5)

Unilateral proximal occlusion, contralateral hydrosalpinx/distal occlusion

21 (2%) 3 1.83 (0.56-6.0)

Bilateral proximal occlusion 114 (9%) 3 0.19 (0.06-0.62)

Bilateral hydrosalpinx/distal occlusion 23 (2%) 1 0.34 (0.05-2.5)

Unilateral proximal occlusion/ hydrosalpinx/distal occlusion, contra-lateral adhesions

32 (3%) 1 0.58 (0.08-4.4)

Bilateral patent, unilateral adhesions 16 (1%) 0 0 (0-NE)

Bilateral patent, bilateral adhesions 25 (2%) 0 0 (0-NE)

FFR = Fecundity Rate RatioNE = not estimable

After HSG, the presence of unilateral impaired flow and unilateral occlusion

showed no significant difference in FRR (0.68 and 0.89, respectively, with wide and

overlapping CIs). Similarly, a small difference in FRR (0.19 and 0.26) between bilateral

impaired flow and occlusion (which included proximal and distal occlusion) was

seen (table II). For this reason impaired flow of contrast was considered as complete

occlusion, because this did not prove tubal patency.

With laparoscopy, we found no differences in prognostic outcome for proximal

occlusions identified at laparoscopy compared with distal occlusion / hydrosalpinx

or hydrosalpinges. For unilateral proximal occlusion the FRR was 0.58 and for distal

unilateral occlusion 1.40, but both showed wide and overlapping CIs. For bilateral

proximal occlusion the FRR was 0.19 compared to 0.34 in case of bilateral distal

occlusion, with a wide and overlapping CI in the latter (Table III). We therefore

classified tubal pathology into three main categories, namely no tubal pathology,

unilateral tubal pathology and bilateral tubal pathology.

23138 Coppus.indd 107 28-09-12 14:55

Chapter 6

108

Using this classification, the results of the uni- and multivariable Cox regression

analysis are shown in Table V. At HSG the prevalence of one-sided tubal pathology

was 13% (322 women), and of bilateral tubal pathology 5.3% (135 women). At

laparoscopy, the prevalence of one-sided tubal pathology was 13% (167 women)

and of bilateral tubal pathology 17% (215 women).

TABLE IV. Tubal status detected at HSG as compared to the tubal status detected at laparoscopy.

Laparoscopy Two-sided occlusion One-sided occlusion No occlusion Total

HSG

Two-sided occlusion 31 15 36 82

One-sided occlusion 12 47 94 153

No occlusion 13 17 246 276

Total 56 79 376 511

Bilateral tubal pathology at HSG or laparoscopy was associated with a lower

probability of treatment-independent pregnancy, with an adjusted FRR of 0.28

(95% CI: 0.13-0.59) for HSG and 0.24 (95% CI: 0.11-0.54) for laparoscopy. Unilateral

tubal pathology at HSG or laparoscopy affected the probability of treatment-

independent pregnancy slightly, with an adjusted FRR of 0.81 (95% CI: 0.59-1.11)

for HSG and 0.85 (95% CI: 0.47-1.52) for laparoscopy. A comparable reduction in

the probability of treatment-independent pregnancy was seen for duration of

subfertility with a FRR of 0.90 (95% CI: 0.90 to 1.00) and 0.78 (95% CI: 0.64 to 0.95)

for HSG and laparoscopy respectively. Kaplan-Meier analyses of the cumulative

probability of spontaneous intrauterine pregnancy up to I year are shown in Fig. 2a

for HSG and Fig. 2b for laparoscopy. For HSG the cumulative pregnancy rate after

finding unilateral tubal pathology was slightly reduced, but after bilateral tubal

pathology the cumulative pregnancy rate was markedly reduced. Similar results

were seen for laparoscopy.

23138 Coppus.indd 108 28-09-12 14:55


109

6

cum

ulat

ive

ongo

ing

pre

gnac

y ra

te (%

)40

30

20

10

0 0 2 4

time after HSG (months)6 8 10 12

FIGURE 2A. Cumulative ongoing pregnancy rate after HSG.

Results of the Kaplan-Meier analysis for HSG. Black line represents bilateral normal HSG. Light grey line represents one-sided abnormal HSG. Dark grey line represents women with a bilateral abnormal HSG. Crosses indicate censored data.

cum

ulat

ive

ongo

ing

pre

gnac

y ra

te (%

)

40

30

20

10

0 0 2 4

time after laparoscopy (months)6 8 10 12

FIGURE 2B. Cumulative ongoing pregnancy rate after laparoscopy.

Results of the Kaplan-Meier analysis for laparoscopy (DLS). Black line represents bilateral normal DLS. light grey line represents one-sided abnormal DLS. Dark grey line represents women with a bilateral abnormal DLS. Crosses indicate censored data.

23138 Coppus.indd 109 28-09-12 14:55

Chapter 6

110

TABLE V. Results of the uni- and multivariable Cox regression analysis

N (%)

Univariable analysis Multivariable analysis

HSG HSG

FRR 95% CI P-value FRR 95% CI P-value

No tubal pathology 2097 (82) 1 - - 1 - -

Unilateral tubal pathology

322 (13) 0.83 0.61-1.13 0.25 0.81 0.59-1.11 0.19

Bilateral tubal pathology

135 (5) 0.25 0.12-0.54 <0.001 0.28 0.13-0.59 0.001

Duration of subfertility (years)

0.89 0.89-0.99 0.02 0.90 0.90-1.00 0.05

Female age (years) 0.95 0.93-0.98 <0.001 0.95 0.93-0.97 <0.001

Secondary subfertility 1.41 1.15-1.72 <0.001 1.53 1.24-1.88 <0.001

N (%)

Laparoscopy Laparoscopy


No tubal pathology 876 (70) 1 - - 1 - -

Unilateral tubal pathology

167 (13) 0.90 0.51-1.61 0.73 0.85 0.47-1.52 0.58

Bilateral tubal pathology

215 (17) 0.26 0.12-0.57 0.001 0.24 0.11-0.54 0.001

Duration of subfertility (years)

0.80 0.66-0.97 0.03 0.78 0.64-0.95 0.02

Female age (years) 1.02 0.97-1.08 0.37 1.01 0.96-1.07 0.59

Secondary subfertility 1.66 1.13-2.46 0.01 1.78 1.19-2.66 0.005

Discussion

In this study we compared findings at HSG and laparoscopy with fertility outcome

in untreated subfertile couples, from the time of tubal testing up to I year of follow-

up. For HSG as well as for laparoscopy, patients with two-sided tubal pathology

had significantly worse fertility prospects, whereas fertility prospects in those

with one-sided tubal pathology were only moderately worse than those without

tubal pathology. Both imaging tests showed a comparable reduction in FRRs for

unilateral as well as for bilateral tubal pathology.

Because proximal tubal pathology (occlusion as well as impaired flow) is more

likely to be due to artefacts, such as tubal spasm or ’steal effect’ than distal

tubal occlusion or hydrosalpinx, we analysed whether there was a difference in

FRR between proximal and distal tubal pathology for the detection of natural

pregnancy, but we were unable to detect such a difference. For this reason we

classified the findings at HSG and laparoscopy into three main groups, namely

23138 Coppus.indd 110 28-09-12 14:55


111

6

no tubal pathology, unilateral tubal pathology and bilateral tubal pathology to

calculate the FRRs.

The fertility rate ratios for two-sided tubal pathology are in accordance with results

from our previous small retrospective Dutch cohort study, which examined the

prognostic value of HSG for fertility outcome, as well as with data of the previous

large prospective Canadian study (Mol et al., 1997, 1999b). However, in the present

study we found a lower FRR in case of bilateral tubal pathology at HSG, compared

with the Canadian cohort. This could be due to the lower prevalence of bilateral

tubal pathology at HSG (5% compared to 24% in the Canadian study) whereas

the prevalence of unilateral tubal pathology at HSG was comparable (13% versus

14%). This might be explained by the fact that in the Canadian study only patients

that underwent both HSG and laparoscopy were analysed. Patients who had an

abnormal HSG but had a natural conception before a laparoscopy was performed

were not part of the analysis. Since natural conception indicates absence of

significant tubal pathology, a selection of patients with a higher prevalence of

tubal pathology took place in this study. Although the present study showed a

similarly low FRR for bilateral tubal pathology at laparoscopy, the prognostic

value of unilateral tubal pathology identified at laparoscopy (FRR 0.85) differed

markedly from that in our previous two studies, which showed a FRR of 0.65 and

0.51 respectively (Mol et al., 1999a,b). Both studies included only women from a

tertiary patient population who had a HSG followed by laparoscopy. Laparoscopy

was usually performed shortly after an abnormal HSG, whereas for women with a

normal HSG the laparoscopy was withheld for a longer time. In the present study,

15% of the patients underwent both HSG and laparoscopy. The median time

between the first visit of a couple and laparoscopy was 5.9 months compared to a

median time in the Canadian study between HSG and laparoscopy of 10 months in

those in whom HSG showed no abnormalities, and 8.5 months in those in whom

HSG showed unilateral tubal occlusion and 4.5 months in those in whom HSG

showed bilateral tubal occlusion, respectively.

In that study there were relatively few women in whom bilateral occlusion was

found at laparoscopy after a normal HSG or one-sided tubal occlusion at HSG (5%)

(Mol et al., 1999b). The longer delay between HSG and laparoscopy in the Canadian

study resulted in a higher proportion of patients with poor prognosis, which

could have negatively influenced the FRR. Selection bias may have influenced

23138 Coppus.indd 111 28-09-12 14:55

Chapter 6

112

the outcome of these previous studies, overestimating the detection rate of tubal

pathology at laparoscopy in comparison to patients who only undergo HSG or a

laparoscopy without a previous HSG.

Of notice is the lack of agreement between HSG and laparoscopy in those patients

who underwent both imaging tests. At laparoscopy bilateral tubal occlusion was

diagnosed in 5% of the patients where previously HSG showed no occlusion.

However, in 60% of the patients where HSG showed one-sided occlusion, and in

44% where HSG showed two-sided occlusion, laparoscopy did not show any tubal

occlusion. In 23% of the patients with two-sided occlusion at laparoscopy, HSG

showed tubal patency, emphasising that laparoscopy cannot be considered the

gold standard for diagnosing tubal pathology (Mol et al. 1999b).

Our finding that the presence of tubal pathology at HSG and laparoscopy has

a similar FRR is of interest. Unilateral tubal pathology reduced the probability of

a natural conception by 20% and bilateral tubal pathology reduced pregnancy

chances by 75%, whether diagnosed at HSG or at laparoscopy. Although we

were not able to directly compare the prognostic capacity of both imaging tests,

because only 15% of the included patients underwent HSG as well as laparoscopy

and as there were several months in between these tests, our findings suggest

that the prognostic capacity of HSG and that of laparoscopy do not differ much.

This is in contrast with previous studies, which concluded that laparoscopy is a

better predictor of treatment-independent pregnancy (Mol et al., 1999b).

The results of our study are in support of the recommendation of the fertility-

guidelines of the National Institute for Clinical Excellence to use HSG to test for

tubal pathology in women who are not known to have co-morbidities (NICE,

2004). Additional advantages of HSG are that a normal HSG reduces the probability

that tubal pathology plays a role in future fertility chances, as we can see in this

study, and which confirms the findings of previous studies (Swart et al., 1995, Mol

et al., 1999b, den Hartog et al., 2008). If an oil-soluble contrast medium is used for

tubal flushing, this can have a positive effect on pregnancy rates (Luttjeboer et

al., 2007). Using HSG in women at low risk for tubal pathology limits the number

of unnecessary laparoscopies because the prevalence of tubal pathology after

normal HSG is low (Bosteels et al., 2007, den Hartog et al., 2008). A randomised

trial did not show an improved pregnancy outcome if diagnostic laparoscopy

23138 Coppus.indd 112 28-09-12 14:55


113

6

was routinely performed after normal HSG and before treatment with intrauterine

insemination (Tanahatoe et al., 2005).

An important limitation of our study is that not in every patient HSG as well

as laparoscopy was performed. Patients at low risk for tubal pathology were

offered HSG and those with co-morbidities and considered to be at risk for tubal

pathology, were offered laparoscopy. Furthermore, the interpretation of this study,

like previous studies, is hampered by selection bias caused by a difference in

timing of the imaging tests. It is not clear to what extent these limitations influence

the outcome of our study. Correction for these requires a study in which women

are offered CAT, HSG and laparoscopy as part of their subfertility work-up. These

tests should preferably be performed on the same day, or with a minimal delay in

between these tests. To address the clinical consequences of mild tubal pathology,

patients with mild tubal pathology should then be randomized to compare

natural conception with IUI and IVF. Until such a trial has been conducted, the

recommendations regarding expectant management versus treatment, in case of

unilateral pathology, depends on several factors such as age of the woman and

duration of subfertility. In patients with unexplained subfertility, the prognostic

model by Hunault et al. (2004) following tubal testing is advised. The most

important prognostic factors are age of the woman, duration of subfertility

and whether or not the couple has ever had a pregnancy before consultation.

If the chances for natural conception are above 30%, expectant management is

generally recommended. Below 30%, treatment is advised (Steures et al., 2006).

23138 Coppus.indd 113 28-09-12 14:55

Chapter 6

114

References

Balasch J. Investigation of the infertile couple: investigation of the infertile couple in the era of assisted reproductive technology: a time for reappraisal. Hum Reprod 2000; 15: 2251-2257.



Den Hartog JE, Lardenoije CM, Severens JL, Land JA, Evers JL, Kessels AG. Screening strategies for tubal factor subfertility. Hum Reprod 2008; 23:1840-1848.

Fatum M, Laufer N, Simon A. Investigation of the infertile couple: should diagnostic laparoscopy be performed after normal hysterosalpingography in treating infertility suspected to be of unknown origin? Hum Reprod 2002; 17:1-3.

Harrell FE jr. Regression Modeling Strategies. With Applications to Linear Models, Logistic Regression and Survival Analysis. Springer Series in Statistics. New York, USA: Springer Science Inc, 2001.

Helmerhorst FM, Oei SG, Bloemenkamp KW and Keirse MJ. Consistancy and variation in infertility investigations in Europe. Hum Reprod 1995; 10:2027-2030.

Hull MG, Glazener CM, Kelly NJ, Conway DI, Foster PA, Hinton RA, Coulson C, Lambert PA, Watt EM, Desai KM. Population study of causes, treatment, and outcome of infertility. Br Med J (Clin Res Ed) 1985; 291:1693-1697.

Hunault CC, Habbema JD, Eijkemans MJ, Collins JA, Evers JL, te Velde ER. Two new prediction rules for spontaneous pregnancy leading to live birth among subfertile couples, based on the synthesis of three previous models. Hum Reprod 2004; 19:2019-2026.

Luttjeboer FY, Harada T, Hughes E, Johnson N, Lilford R, Mol BW. Tubal flushing for subfertility. Cochrane Database Syst Rev 2007; 3:CD003718.

Lavy Y, Lev-Sagie A, Holtzer H, Revel A, Hurwitz A. Should laparoscopy be a mandatory component of the infertility evaluation in infertile women with normal hysterosalpingogram or suspected unilateral distal tubal pathology. Eur J Obstet Gynecol Reprod Biol 2004; 114:64-68.

Mol BW, Swart P, Bossuyt PM, van der Veen F. Is hysterosalpingography an important tool in predicting fertility outcome? Fertil Steril 1997; 67:663-669.

Mol BW, Swart P, Bossuyt PM, van der Veen F. Prognostic significance of diagnostic laparoscopy for spontaneous fertility. J Reprod Med 1999a; 44:81-86.

Mol BW, Collins JA, Burrows EA, Van der Veen F, Bossuyt PM. Comparison of hysterosalpingography and laparoscopy in predicting fertility outcome. Hum Reprod 1999b; 14: 1237-1242.

National Institute for Clinical Excellence. Fertility: Assessment and Treatment for People with Fertility Problems. http://www.nice.org.uk, 2004

Perquin DA, Dorr PJ, de Craen AJ, Helmerhorst FM. Routine use of hysterosalpingography prior to laparoscopy in the fertility workup: a multicentre randomised controlled trial. Hum Reprod 2006. 21: 1227-1231.

Rekart ML, Brunham RC. Epidemiology of chlamydial infection: are we losing ground? Sex Transm Infect 2008; 84:87-91.

23138 Coppus.indd 114 28-09-12 14:55


115

6

Snick HK, Snick TS, Evers JL, Collins JA. The spontaneous pregnancy prognosis in untreated subfertile couple: the Walcheren primary care study. Hum Reprod 1997; 12:1582-1588.

Swart P, Mol BW, van der Veen F, van Beurden M, Redekop WK, Bossuyt PM. The value of hysterosalpingography in the diagnosis of tubal pathology, a meta-analysis. Fertil Steril 1995; 64:486-491.

Van der Steeg JW, Steures P, Eijkemans MJ, Habbema JD, Hompes PG, Broekmans FJ, van Dessel HJ, Bossuyt PM, van der Veen F, Mol BW. Pregnancy is predictable: a large-scale prospective external validation of the prediction of spontaneous pregnancy in subfertile couples. Hum Reprod 2007; 22:536-542.

Steures P, van der Steeg JW, Hompes PG, Habbema JD, Eijkemans MJ, Broekmans FJ, Verhoeve HR, Bossuyt PM, van der Veen F, Mol BW. Intrauterine insemination with controlled ovarian hyperstimulation versus expectant management for couples with unexplained subfertility and an intermediate prognosis: a randomised clinical trial. Lancet 2006; 368:216-221.

Tanahatoe S, Lambalk CB, Hompes PG. The role of laparoscopy in intrauterine insemination: a prospective randomized reallocation study. Hum Reprod 2005; 20: 3225-3230.

23138 Coppus.indd 115 28-09-12 14:55

23138 Coppus.indd 116 28-09-12 14:55

7Chlamydia trachomatis IgG

seropositivity is associated with lower

natural conception rates in ovulatory

women without visible tubal

pathology.

Sjors CoppusJolande Land

Brent OpmeerPieternel SteuresRene Eijkemans

Peter HompesFulco van der Veen

Ben Willem MolJan Willem van der Steeg


The contents of this article were presented as an oral presentation at the 25th ESHRE Annual Meeting, 2009, Amsterdam, Netherlands.

23138 Coppus.indd 117 28-09-12 14:55

Chapter 7

118

Abstract

BACKGROUND: The relation between Chlamydia trachomatis infection and

subsequent tubal damage is widely recognized. As such, C. trachomatis

antibody (CAT) testing can be used to triage women for immediate tubal

testing with hysterosalpingography (HSG) or laparoscopy. However, once

invasive tubal testing has ruled out tubal pathology, CAT serology status is

ignored, as its clinical significance is currently unknown. This study aimed

to determine whether positive CAT serology is associated with lower

spontaneous pregnancy rates in women in whom HSG and/or diagnostic

laparoscopy showed no visible tubal pathology.

METHODS: We studied ovulatory women in whom HSG or laparoscopy

showed patent tubes. Women were tested for C. trachomatis immunoglobulin

G (IgG) antibodies with either micro-immunofluorescence (MIF) or an ELISA.

CAT serology was positive if the MIF titre was ≥ 1:32 or if the ELISA index was

> 1.1. The proportion of couples pregnant without treatment was estimated

at 12 months of follow-up. Time to pregnancy was considered censored at

the date of the last contact when the woman was not pregnant or at the

start of treatment. The association between CAT positivity and an ongoing

pregnancy was evaluated with Cox regression analyses.

RESULTS: Of the 1882 included women without visible tubal pathology, 338

(18%) had a treatment- independent pregnancy within 1 year [estimated

cumulative pregnancy rate 31%; 95% confidence interval (CI): 27%-35%].

Because of differential censoring after 9 months of follow-up, regression

analyses were limited to the first 9 months after tubal testing. Positive C.

trachomatis IgG serology was associated with a statistically significant 33%

lower probability of an ongoing pregnancy [adjusted fecundity rate ratio

0.66 (95% CI 0.49 to 0.89)].

CONCLUSIONS: Even after HSG or laparoscopy has shown no visible tubal

pathology, subfertile women with a positive CAT have lower pregnancy

chances than CAT negative women. After external validation, this finding

could be incorporated into existing prognostic models.

23138 Coppus.indd 118 28-09-12 14:55

CAT testing and natural conception

119

7

Introduction

The devastating effect of Chlamydia trachomatis infections on the female

reproductive tract is widely recognized (Paavonen and Eggert-Kruse, 1999). The

diagnostic value of C. trachomatis immunoglobulin G (IgG) antibody titre (CAT)

testing has been studied extensively, showing fairly good screening properties. In

the Netherlands, this has resulted in a recommendation to include CAT testing as

part of the routine fertility workup of subfertile couples (NVOG, 2004).

CAT is mainly used to identify women at high risk of tubal pathology and to

triage them for further tubal testing (Coppus et al., 2007a). CAT serology status is

ignored however, once invasive testing has ruled out tubal pathology, as its clinical

significance is currently unknown. The causal relationship between Chlamydia

infections and tubal factor subfertility has been well established, but the question

whether positive CAT serology is associated with lower pregnancy rates has

been studied less often. It has been postulated that Chlamydia infections have

a detrimental effect on the endometrium resulting in impaired implantation and

lower pregnancy rates (Witkin, 1999; Neuer et al., 2000). Yet studies examining the

association between elevated C. trachomatis IgG antibody levels and pregnancy

rates in an IVF population were not uniformly conclusive. Whereas some studies

suggested a lower implantation and a lower ongoing pregnancy rate (Rowland

et al., 1985; Lunenfeld et al., 1989; Witkin et al., 1994; Pacchiarotti et al., 2009), other

work suggested that IVF outcome was not associated with CAT status (Osser et al.,

1990; Tasdemir et al., 1994; Claman et al., 1996; Sharara et al., 1997; Spandorfer et al.,

1999).

The prognostic value of C. trachomatis antibodies on spontaneous pregnancy

rates in subfertile women is equally uncertain. To the best of our knowledge, four

previous studies have examined this relation, with conflicting results (Idahl et al.,

2004; Kelz et al., 2006; Perquin et al., 2007; El Hakim et al., 2009). As a consequence, it

is still unclear whether positive CAT serology is associated with lower spontaneous

pregnancy rates once invasive tubal testing has ruled out tubal pathology. The

objective of this study was to assess in a prospective consecutive series of subfertile

couples, whether evidence of a past Chlamydia infection affects the probability of

spontaneous pregnancy in women without visible tubal pathology.

23138 Coppus.indd 119 28-09-12 14:55

Chapter 7

120


PatientsBetween January 2002 and February 2004, consecutive couples presenting at the

fertility clinics of 38 hospitals in The Netherlands were invited to participate in a

prospective cohort study. Institutional Review Board approval for this study was

obtained from each participating hospital. For all couples, a thorough medical

history was taken, including previous pelvic inflammatory disease (PID) and

Chlamydia infection (Coppus et al., 2007b), and all couples underwent a basic

fertility work-up according to the guidelines of the Dutch Society of Obstetrics and

Gynaecology. The details of this workup have been described in detail elsewhere

(van der Steeg et al., 2007).

Couples with a history of reversal of tubal ligation, tubal surgery, IVF, or previous

tubal patency testing were excluded. The present study was limited to couples

with a regular ovulatory cycle, defined as a cycle length between 23 and 35 days

with a within cycle variation of less than 8 days. Ovulation was confirmed by a basal

body temperature chart, mid-luteal serum progesterone, or by ultrasonographic

monitoring of the cycle. Duration of subfertility was defined as the period between

the date the couple had started unprotected intercourse and date of tubal testing.

We calculated female age at the time of tubal testing. Subfertility was considered

to be secondary if a woman had conceived in this or in a previous partnership,

regardless of the pregnancy outcome.

Semen analysis was performed at least once, and a total motile sperm count

(TMC) was calculated by multiplying semen volume, semen concentration and

percentage of progressive motile spermatozoa. Couples in whom semen analysis

showed a severe impairment of semen quality requiring IVF-ICSI (defined as a TMC

< 1 x 106) were excluded from the present analysis.

Chlamydia antibody testing was performed either with a micro-immuno-

fluorescence (MIF) technique (BioMerieux, Paris, France) or with an ELISA (Medac

GmbH, Wedel, Germany; Savyon Diagnostics LTD, Marne La Valle, France). CAT

testing was performed according to the manufacturer’s instructions and the test

result was considered positive at an MIF titre of ≥ 1:32 or an ELISA index of >1.1

(NVOG, 2004). Endocervical swabs were not routinely taken.

23138 Coppus.indd 120 28-09-12 14:55


121

7

Tubal patency was evaluated by hysterosalpingography (HSG) and/or diagnostic

laparoscopy, depending on the local protocol of the participating hospital. Tubal

pathology was considered absent if the HSG showed passage of contrast medium

through both tubes and normal intra-abdominal spread of contrast medium. When

laparoscopy was used, tubal pathology was considered absent if there were no signs

of tubal obstruction or severe adhesions interfering with ovum pickup. We did not

take into account minimal stage endometriosis. In case a woman underwent both

HSG and diagnostic laparoscopy, which was often done in case of an abnormal HSG,

the results of the laparoscopy were decisive. No therapeutic interventions were

performed during these procedures. Only women with bilateral patent tubes were

included in the analyses; all women with intra-abdominal pockets of contrast after

HSG or extensive adhesions at laparoscopy were excluded.

Follow-up started after tubal testing and ended 12 months thereafter. Primary

end-point was spontaneous conception resulting in an ongoing pregnancy at 12

weeks of gestational age. The first day of the last menstrual cycle was considered

to mark the end of time until spontaneous conception.

Statistical analysis Time to spontaneous conception resulting in an ongoing pregnancy was

considered censored at the moment treatment had started within 12 months

after testing, or at the last date of contact during follow-up if the couple did not

have an ongoing pregnancy at that date. If a woman became lost to follow-up

within 12 months after evaluation of tubal status, her general practitioner was sent

a questionnaire to document the most recent fertility status of the couple.

Kaplan-Meier curves for the chances of spontaneous ongoing pregnancy over

time were constructed for CAT positive and CAT negative women separately. The

difference in time to pregnancy between these groups was tested for significance

with the log-rank test statistic. Subsequently, we evaluated the influence of CAT

positivity using Cox regression modelling, expressing the effect as a fecundity rate

ratio (FRR) with corresponding 95% confidence intervals (CIs). An FRR expresses

the probability of an ongoing pregnancy per time unit for women with a particular

feature relative to the probability in women without that feature. We estimated

unconditional and conditional FRRs. For the conditional FRR, we built a multivariable

Cox model by additionally including the following known prognostic factors:

23138 Coppus.indd 121 28-09-12 14:55

Chapter 7

122

duration of subfertility, female age at tubal testing, and secondary subfertility

(Hunault et al., 2004).

The proportional hazard assumption of the Cox model was checked separately

for each variable before performing the regression analysis. Such checking was

done graphically by inspecting the graphs of loge [-log

eS(t)] versus log

e(t) for each

dichotomous covariate to assess whether the curves run parallel. Additionally, time-

dependent covariates were included and we checked whether their coefficient

differed significantly from zero. All data were analysed using SPSS 12.0.1 (SPSS Inc.,

Chicago, IL, USA).

Results

Data of 7860 couples were collected during the study period. After excluding

couples with anovulation or severe male factor subfertility, and those that had

undergone previous tubal testing, IVF or tubal surgery, data of 5522 ovulatory

subfertile women were available. In 1569 women (28%) CAT was not measured; in

1602 other women (29%) CAT was measured but no tubal testing was performed.

Tubal pathology was diagnosed in 469 (20%) at HSG or laparoscopy. The data of

the remaining 1882 women with no visible tubal pathology and a known CAT

result were included in this analysis (Fig. 1).

Baseline characteristics of the 1882 included couples are shown in Table 1. CAT

positive women were slightly, but statistically significantly older and more often

suffered from a secondary subfertility than did CAT negative women. In 1244

women (66%) tubal pathology was excluded by HSG, whereas in 638 of women

(34%) this was done by laparoscopy. CAT testing was performed by MIF in 443

women (24%) and by ELISA in 1439 women (76%). Overall, 438 women (23%) had

a positive CAT result.

Follow-up status is shown in Fig. 1. The median duration of follow-up, from time of

tubal testing to pregnancy or start of treatment was 3.6 months (interquartile range

1.4 - 7.4 months). Of all included couples, 338 (18%) had an ongoing pregnancy

without treatment within 1 year after evaluation of tubal function, including

four multiple pregnancies. In 29 women pregnancy resulted in a miscarriage

23138 Coppus.indd 122 28-09-12 14:55


123

7

and one pregnancy was ectopic, resulting in an overall miscarriage and ectopic

pregnancy rate of 8.6% and 0.3% respectively. A total of 1162 couples (62%) had

started treatment within 12 months; 363 others (19%) neither started treatment

nor became pregnant. The estimated overall fraction of untreated couples with a

treatment independent pregnancy at 12 months of follow-up for all women was

31% (95% CI: 27%-35%).

Consecutive subfertile womenN=7860

Excluded - Anovulation n= 938 - TMC<1*106 n= 636 - previous IVF or tubal surgery n= 196 - previous tubal testing n= 568

Excluded - CAT not measured n= 1569

Follow-up Pregnancy without treatment in 1 year N= 338 (18%)

• Ongoing pregnancy, singleton N= 304 (16.2%) • Ongoing pregnancy, multiple N= 4 (0.2%) • Miscarriage N= 29 (1.5%) • Ectopic pregnancy N= 1 (0.1%)

No spontaneous pregnancy in 1 year N= 1544 (82%)

• Treatment < 12 months N= 1162 (62%) • No conception in 12 months N= 363 (19%) • Lost to follow-up N= 19 (1%)

Subfertile ovulatory women included in cohortN=5522

Excluded - tubal pathology n= 469

Excluded - no tubal testing n= 1602

Ovulatory women with CAT testing and normal tubesN=1882

Available CAT resultN=3953

CAT testing and tubal testingN=2351

FIGURE 1. Flow of patients

23138 Coppus.indd 123 28-09-12 14:55

Chapter 7

124

TABLE I. Baseline characteristics of 1882 couples

CAT positive (n = 438) CAT negative (n = 1444) P-value

Median 5th-95th percentile Median 5th-95th percentile

Female age (year) 34.0 27.0-39.3 32.2 25.2-39.3 <0.001

Male age (year) 36.2 29.1-46.8 34.1 27.5-44.3 <0.001

Duration of subfertility (year) 2.0 1.1-5.2 1.9 1.1-4.1 <0.001

Semen analysis – TMC (*106) 52.2 4.8-284 50.9 3.2-285 0.73

Secondary subfertility 46% 36% <0.001

History of PID 5.3% 1.2% <0.001

History of Chlamydia 14.8% 2.4% <0.001

The Kaplan-Meier curves of women with a positive CAT status and those with a

negative CAT status are shown in Fig. 2. There was a significant difference (P = 0.02).

The proportional hazards assumption had to be rejected for the effect of CAT status

on pregnancy chances over time (P < 0.001). Further analyses revealed that this

deviance from proportionality was not constant over time, so we could not model

the effect with a time-dependent covariate. Based on the loge

[-logeS(t)] versus

loge(t) plots, we came to the conclusion that there was differential censoring after

9 months. We therefore decided to limit the analysis to the first 9 months after

tubal testing. In this time period, positive C. trachomatis IgG serology had a strong

effect on time to natural conception, with 35% lower pregnancy chances (FRR

0.65, 95% CI 0.48-0.87) (Table II).

Multivariable Cox regression analysis showed that the conditional FRR was similar

to the unconditional one, indicating that the lower pregnancy rates in CAT positive

women were also present when known prognostic factors for spontaneous

pregnancy were taken into account. This makes it unlikely that the observed

association can be attributed to a confounding effect of duration of subfertility,

female age and type of subfertility (Table II). As laparoscopy is considered the

reference standard to exclude tubal pathology (NVOG, 2004), we performed a

sub-analysis in which we compared women who had HSG only with those who

underwent laparoscopy, which produced similar results.

23138 Coppus.indd 124 28-09-12 14:55


125

7

cum

ula

tive

on

go

ing

pre

gn

acy

rate

(%)

40

30

20

10

0 0 2 4

time after spontaneous pregnancy (months)6 8 10 12

FIGURE 2. Spontaneous pregnancy rates for CAT IgG negative (upper line) and CAT positive (lower line) without visible tubal pathology.

TABLE II. The effect of C. trachomatis IgG serology on spontaneous pregnancy at 9 months follow-up*

Unconditional Conditional


Positive CAT 0.65 0.48-0.87 <0.001 0.66 0.49-0.89 0.007

Duration of subfertility (per year) 0.77 0.68-0.88 <0.001

Female age (per year) 0.97 0.94-0.99 0.03

Secondary subfertility 1.88 1.49-2.36 <0.001

Results obtained in Cox regression analyses; FRR, fecundity rate ratio; 95% CI, 95% confidence interval; *Analysis limited to first 9 months of follow-up due to differential censoring

Discussion

In this large prospective multicentre study, we observed that subfertile ovulatory

women with a positive CAT result but without visible tubal pathology at HSG or

laparoscopy, had significantly lower spontaneous pregnancy rates in the first 9

months after tubal testing than comparable women with a negative CAT test. This

effect was also observed when other patient characteristics (secondary subfertility,

female age and duration of subfertility) were taken into account. Pregnancy rates

in CAT positive women were about a third lower.

23138 Coppus.indd 125 28-09-12 14:55

Chapter 7

126

Strength of our study is the multicentre design through which a large number of

couples could be included. Although no uniform CAT assay was used in all clinics,

the majority used ELISA assays, which are less laborious and reader-dependent

than MIF. It has been shown that ELISAs in general show less cross-reactivity with

other Chlamydia species than the MIF Biomerieux assay used in the present study

(Gijssen et al., 2001; Land et al., 2003), although a recent study in general showed

better diagnostic performance of MIF compared with ELISA (Broeze et al., 2011).

However, as the CAT assays that were used in this study are all commercially

available and routinely used in hospitals throughout the country, we feel that our

study reflects routine daily practice, thereby improving the generalizability of the

results of the study.

For tubal testing, both HSG and diagnostic laparoscopy were used in the

participating hospitals. In the present study, hospitals were free to use their own

protocol for evaluating tubal status. This generated the practice variation from

which our analysis could benefit. The Dutch guideline on the diagnostic subfertility

work-up (NVOG, 2004) recommends laparoscopy for CAT positive patients, and HSG

or no tubal assessment in case of a negative CAT. Strict adherence to this guideline

would have introduced a bias into our research, but in our study about half of CAT

positive women underwent HSG, whereas 30% of CAT negative women underwent

diagnostic laparoscopy. We acknowledge that, ideally, tubal status in all women

would have been verified by laparoscopy, but feel that such a design would be

hardly feasible for ethical reasons and due to restricted availability as a result of high

costs and limited operating time. Around 10% of women with a normal HSG show

tubal pathology on laparoscopy (Mol et al., 1999; den Hartog et al., 2008; Verhoeve

et al., 2011). Taking into account that more CAT negative women underwent tubal

testing with HSG than did CAT positive women, the effect of CAT found in the

present study cannot be attributed to differential verification. Instead, it is because

of this variability in the use of HSG and laparoscopy that we were able to evaluate

the effect of CAT positivity on pregnancy chances. In addition to this differential

verification, women who were tested with CAT were only partially verified. In the

present study, 46% of women with an available CAT result did not undergo invasive

tubal testing, of which 79% were CAT negative. The present study addressed

women in whom pathology had been ruled out, but had different CAT status. We

therefore believe that the main impact of the partial verification was a reduction in

the precision, rather than a bias of fecundity rate estimates.

23138 Coppus.indd 126 28-09-12 14:55


127

7

In the present study we tried to control for confounding effects of tubal pathology

by limiting the analysis to those women without tubal abnormalities on HSG

and/or laparoscopy. In addition, the confounding effect of the use of assisted

reproductive technologies was eliminated by censoring time to pregnancy

whenever treatment was started. Unfortunately, we were not able to analyse CAT

titres as a continuous or categorical variable, taking into account the quantitative

nature of this test, as the majority of clinics used an ELISA assay of which the results

are reported as either positive or negative and no titres are available. Nevertheless,

cut-off values advocated in the guideline of the Dutch Society for Obstetrics and

Gynaecology were used, i.e. a titre of ≥1:32 in case of MIF and an index of >1.1 in

case of ELISA. We feel this approach reflects daily clinical practice, where the CAT

result is usually interpreted by the clinician as either being positive or negative.

Our results are in contrast with that of three other studies, which found no

significant effect of C. trachomatis antibodies on pregnancy rates (Idahl et al., 2004;

Perquin et al., 2007; El Hakim et al., 2009). The most recent study (El Hakim et al.,

2009) explored the relation between CAT titres and probability of pregnancy in

women with normal-looking Fallopian tubes at laparoscopy. However, with a total

sample size of 174 women, the power of the study to detect an association was

limited, due to the formation of multiple subgroups, including endometriosis.

Idahl and colleagues, studied 244 women followed for a mean of 37 months, but

unfortunately their analysis did not take into account time to pregnancy (Idahl et

al., 2004). Instead, pregnancy was evaluated as a dichotomous outcome, ignoring

the fact that fecundity is a gradual continuum based on time to pregnancy.

Furthermore, the analysis did not correct for the presence of tubal pathology

and type of pregnancy (spontaneous or treatment-related). The third study,

using data of 153 women who participated in a randomized controlled trial,

evaluated the prognostic significance of CAT on cumulative pregnancy rates at

18 months follow-up (Perquin et al., 2007). No statistically significant difference

in cumulative pregnancy rates was found (FRR 0.73, 95% CI 0.42-1.25). While this

study used a time-to-event analysis, the interpretation of its results is hampered

by heterogeneity in outcomes, as the authors did not solely evaluate treatment

independent pregnancy rates, but also included pregnancies achieved after intra-

uterine insemination, IVF, tubal surgery and other therapeutic options.

23138 Coppus.indd 127 28-09-12 14:55

Chapter 7

128

The results of our study are in accordance with a previous study, which showed in

170 studied women that the number of treatment independent pregnancies was

lower with increasing CAT MIF titres (Kelz et al., 2006). However, a drawback of that

study is the fact that it did not correct for the effects of tubal pathology visible at

HSG or laparoscopy related to a previous Chlamydia infection. As higher MIF titres

are associated with a higher likelihood and severity of tubal disease (Akande et al.,

2003), it is possible that an indirect association of titres on spontaneous pregnancy

rates was measured; an association which in fact was due to the presence of tubal

pathology.

We found a marked difference in effect of CAT positivity on pregnancy rates at 9

months of follow-up. This difference might be explained by the increasing duration

of subfertility in those couples who did not conceive, which has been shown to be

associated with a decrease in spontaneous pregnancy rates (van der Steeg et al.,

2007). It might be that the remaining group of women at a later time in follow-up

differs from the total cohort at the beginning of follow-up, as it can be hypothesized

that when pregnancy has not occurred after 3 or 4 years, spontaneous pregnancy

chances are low, due to a preponderance of severe -undefined- pathology that

overrules other test results, such as the CAT that we investigated. Alternatively,

the marked difference between pregnancy rates in CAT positive and CAT negative

women at 9 months of follow-up could be explained by selection bias due to

non-random censoring, i.e. higher proportions of CAT positive couples are starting

treatment.

Two hypotheses could explain why CAT seropositivity influences pregnancy rates

negatively even in case of patent tubes. First, it is known that C. trachomatis may

persist in the upper female genital tract and this persistent exposure to the micro-

organism could result in a chronic inflammatory response that is responsible

for damage to the Fallopian tubes (Patton et al., 1994; Den Hartog et al., 2005). It

has been suggested that these persistent C. trachomatis infections also elicit an

autoimmune response to human heat shock proteins (HSPs) due to their structural

similarity with Chlamydia HSP (Neuer et al., 1997; Witkin, 1999). Human HSPs play

an important role in early pregnancy (Witkin, 2002), and animal as well as human

research indicates that autoimmunity to human HSP exerts a negative influence

on embryo development and implantation (Witkin et al., 1994; Witkin 1999, Neuer

et al., 2000). From this it has been hypothesized that women with Chlamydia

23138 Coppus.indd 128 28-09-12 14:55


129

7

IgG antibodies, as marker of a previous infection, might be at risk for impaired

implantation and lower pregnancy rates. Whether women who do not develop

tubal pathology after Chlamydia infection are able to clear the micro-organism

from their genital tracts before tubal pathology and an autoimmune response to

human HSPs are elicited remains to be elucidated. Some research has focused on

persistent endometrial infections that might either affect embryonic development

or the implantation capacity of the endometrium, leading to implantation failure

(Kamiyama et al., 2004; Romero et al., 2004; Den Hartog 2010). A second hypothesis

to explain lower fecundity in CAT positive women without visible tubal pathology,

is that intratubal microdamage may have resulted from a previous Chlamydia

infection that cannot be detected with conventional patency tests such as HSG

or laparoscopy. In addition, these tests are known to be imperfect, with some

between-reader variability, but it is unlikely that this would explain the substantially

lower pregnancy rates in CAT positive women.

We showed that in 9 months follow-up subfertile women with a positive CAT

test without visible tubal pathology on HSG or laparoscopy have a 33% lower

probability of pregnancy as compared to CAT negative women. This finding could

be useful in counselling subfertile couples on their spontaneous fertility prospects.

If our findings hold in external validation, CAT has to be incorporated in future

prognostic models on spontaneous pregnancy. Our finding demonstrates that

even in absence of tubal pathology, decreased fecundity is a late effect of lower

genital tract Chlamydia infections. This might increase the late costs of Chlamydia

infections, and as such alter the cost-effectiveness of Chlamydia screening

(Land et al., 2010). CAT apparently does not only have diagnostic value, but also

gives valuable additional prognostic information on the probability to conceive

spontaneously, even after tubal testing has shown no visible tubal pathology.

23138 Coppus.indd 129 28-09-12 14:55

Chapter 7

130

References

Akande VA, Hunt LP, Cahill CJ, Caul EO, Ford WC, Jenkins JM. Tubal damage in infertile women: prediction using chlamydia serology. Hum Reprod 2003; 18:1841-1847.

Broeze KA, Opmeer BC, Coppus SF, van Geloven N, Alves MF, Anestad G, Bhattacharya S, Allen J, Guerra-Infante MF, den Hartog JE et al. Chlamydia antibody testing and diagnosing tubal pathology in subfertile women: an individual patient data meta-analysis. Hum Reprod Update 2011; 17:301-310.

Claman P, Amimi MN, Peeling RW, Toye B, Jessamine P. Does serologic evidence of remote Chlamydia trachomatis infection and its heat shock protein (CHSP60) affect in vitro fertilization-embryo transfer outcome? Fertil Steril 1996; 65:146-149.

Coppus SF, Opmeer BC, Logan S, van der Veen F, Bhattacharya S, Mol BW. The predictive value of medical history taking and Chlamydia IgG ELISA antibody testing (CAT) in the selection of subfertile women for diagnostic laparoscopy: a clinical prediction model approach. Hum Reprod 2007; 22:1353-1358.

Coppus SF, Verhoeve HR, Opmeer BC, van der Steeg JW, Steures P, Eijkemans MJ, Hompes PG, Bossuyt PM, van der Veen F, Mol BW. Identifying subfertile ovulatory women for timely tubal patency testing: a clinical decision rule based on medical history. Hum Reprod 2007;22:2685-2692.

Den Hartog JE, Land JA, Stassen FR, Kessels AG, Bruggeman CA. Serological markers of persistent C. trachomatis infections in women with tubal factor subfertility. Hum Reprod 2005; 20:986-990.


Den Hartog JE. Academic Thesis, Maastricht University 2010.

El Hakim EA, Epee M, Draycott T, Gordon UD, Akande VA. Significance of positive Chlamydia serology in women with normal-looking Fallopian tubes. Reprod Biomed Online 2009; 19:847-851.

Gijssen AP, Land JA, Goossens VJ, Leffers P, Bruggeman CA, Evers JL. Chlamydia pneumoniae and screening for tubal factor subfertility. Hum Reprod 2001; 16:487-491.

Hunault CC, Habbema JD, Eijkemans MJ, Collins JA, Evers JL, te Velde ER. Two new prediction rules for spontaneous pregnancy leading to live birth among subfertile couples, based on the synthesis of three previous models. Hum Reprod 2004, 19:2019-2026.

Idahl A, Boman J, Kumlin U, Oloffson JI. Demonstration of Chlamydia trachomatis IgG antibodies in the male partner of the infertile couple is correlated with a reduced likelihood of achieving pregnancy. Hum Reprod 2004; 19:1121-1126.

Kamiyama S, Teruya Y, Nohara M, Kanazawa K. Impact of detection of bacterial endotoxin in menstrual effluent on the pregnancy rate in in vitro fertilization and embryo transfer. Fertil Steril 2004; 82:788-792.

Keltz MD, Gera PS, Moustakis M. Chlamydia serology screening in infertility patients. Fertil Steril 2006; 85:752-754.

Land JA, Gijsen AP, Kessels AG, Slobbe ME, Bruggeman CA. Performance of five serological Chlamydia antibody tests in subfertile women. Hum Reprod 2003; 18:2621-2627.

Land JA, van Bergen JE, Morré SA, Postma MJ. Epidemiology of Chlamydia trachomatis infection in women and the cost-effectiveness of screening. Hum Reprod Update 2010; 16:189-204.

Lunenfeld E, Shapiro BS, Sarov B, Sarov I, Insler V, Decherney AH. The association between chlamydial-specific IgG and IgA antibodies and pregnancy outcome in an in vitro fertilization program. J In Vitro Fert Embryo Transf 1989; 6:222-227.


23138 Coppus.indd 130 28-09-12 14:55


131

7

Nederlandse Vereniging voor Obstetrie en Gynaecologie. Orienterend fertiliteitsonderzoek [Dutch Society for Obstetrics and Gynaecology: guideline diagnostic fertility workup] 2004.

Neuer A, Lam KN, Tiller FW, Kiesel L, Witkin SS. Humoral immune response to membrane components of Chlamydia trachomatis and expression of human 60 kDa heat shock protein in follicular fluid of in-vitro fertilization patients. Hum Reprod 1997; 12:925-929.

Neuer A, Spandorfer SD, Giraldo P, Dieterle S, Rosenwaks Z, Witkin SS. The role of heat shock proteins in reproduction. Hum Reprod Update 2000; 6:149-159.

Osser S, Persson K, Wramsby H, Liedholm P. Does previous Chlamydia trachomatis infection influence the pregnancy rate of in vitro fertilization and embryo replacement? Am J Obstet Gynecol 1990; 162:40-44.

Paavonen J and Eggert-Kruse W. Chlamydia trachomatis: impact on human reproduction. Hum Reprod Update 1999; 5:433-447.

Pacchiarotti A, Sbracia M, Mohamed MA, Frega A, Pacchiarotti A, Espinola SM, Aragona C. Autoimmune response to Chlamydia trachomatis infection and in vitro fertilization outcome. Fertil Steril 2009; 91:946-948.

Patton DL, Cosgrove Sweeney YT, Kuo CC. Demonstration of delayed hypersensitivity in Chlamydia trachomatis salpingitis in monkeys: a pathogenic mechanism of tubal damage. J Infect Dis 1994; 169:680-683.

Perquin DA, Beersma MF, de Craen AJ, Helmerhorst FM. The value of Chlamydia trachomatis-specific IgG antibody testing and hysterosalpingography for predicting tubal pathology and occurrence of pregnancy. Fertil Steril 2007; 88:224-226.

Romero R, Espinoza J, Mazor M. Can endometrial infection/inflammation explain implantation failure, spontaneous abortion, and preterm birth after in vitro fertilization? Fertil Steril 2004; 82:799-804.

Rowland GF, Forsey T, Moss TR, Steptoe PC, Hewitt J, Darougar S. Failure of in vitro fertilization and embryo replacement following infection with Chlamydia trachomatis. J In Vitro Fert Embryo Transf 1985; 2:151-155.

Sharara FI, Queenan JT, Springer RS, Marut EL, Scoccia B, Scommegna A. Evaluated serum Chlamydia trachomatis IgG antibodies. What do they mean for IVF pregnancy rates and loss? J Reprod Med 1997; 42:281-286.

Spandorfer SD, Neuer A, LaVerda D, Byrne G, Liu HC, Rosenwaks Z, Witkin SS. Previously undetected Chlamydia trachomatis infection, immunity to heat shock proteins and tubal occlusion in women undergoing in-vitro fertilization. Hum Reprod 1999; 14:60-64.

Tasdemir I, Tasdemir M, Kodama H, Sekine K, Tanaka T. Effect of chlamydial antibodies on the outcome of in vitro fertilization (IVF) treatment. J Assist Reprod Genet 1994; 11:104-106.

Van der Steeg JW, Steures P, Eijkemans MJ, Habbema JD, Hompes PG, Broekmans FJ, van Dessel HJ, Bossuyt PM, van der Veen F, Mol BW. Pregnancy is predictable: a large-scale prospective external validation of the prediction of spontaneous pregnancy in subfertile. Hum Reprod 2007; 22:536-342.

Verhoeve HR, Coppus SF, van der Steeg JW, Steures P, Hompes PG, Bourdrez P, Bossuyt PM, van der Veen F, Mol BW. The capacity of hysterosalpingography and laparoscopy to predict natural conception. Hum Reprod 2011, 26:134-142.

Witkin SS, Sultan KM, Neal GS, Jeremias J, Grifo JA, Rosenwaks Z. Unsuspected Chlamydia trachomatis infection and in vivo fertilization outcome. Am J Obstet Gynecol 1994; 171:1208-1214.

Witkin SS. Immunity to heat shock proteins and pregnancy outcome. Infect Dis Obstet Gynecol 1999; 7:35-38.

Witkin SS. Immunological aspects of genital clamydia infections. Best Pract Res Clin Obstet Gynaecol 2002; 16:865-874.

23138 Coppus.indd 131 28-09-12 14:55

23138 Coppus.indd 132 28-09-12 14:55

Part 3Discussion, summary and epilogue

23138 Coppus.indd 133 28-09-12 16:57

23138 Coppus.indd 134 28-09-12 14:55

8General discussion

23138 Coppus.indd 135 28-09-12 14:55

Chapter 8

136

It is estimated that tubal pathology accounts for subfertility in around 10 to 30

percent of couples (Evers 2002). The subfertility work-up is therefore usually

ended with invasive tubal patency testing, such as hysterosalpingography (HSG)

or laparoscopy with tubal testing, of which the latter is still regarded as the gold

standard for diagnosing tubal pathology. In the late nineties, 89% of all fertility

specialists in the USA routinely performed diagnostic laparoscopy in the fertility

work-up (Glatstein 1997). The majority of diagnostic laparoscopies will not reveal

any abnormality (Forman et al., 1993), as a result of which its role in current fertility

practice is heavily debated (Tanahatoe et al., 2003; Lavy et al., 2004; Tanahatoe et al.,

2005; Bosteels et al., 2008).

As laparoscopy is an expensive investigation, and not without risk of complications

(Jansen et al., 1997; Chapron et al., 1998; Härkki-Siren et al., 1999), some authors

have searched for less invasive alternatives like hysterosalpingo-contrast-

sonography (Schlief and Deichert, 1991) or transvaginal hydrolaparoscopy (Gordts

et al., 1998). However, these newer test have not gained widespread use in daily

clinical practice, as a result of which most clinicians still use hysterosalpingography

and/or diagnostic laparoscopy to conclude the fertility work-up. The reasons for

this conservatism are unclear, but it might be due to uncertainty about the validity

of published reports on such test alternatives (Chapter 2).

Other groups have searched for better screening strategies to only subject

those women with a high probability of finding tubal abnormalities to invasive

tubal testing. The majority of this work has focused on the value of Chlamydia

trachomatis antibody (CAT) testing (Mol et al., 1997; den Hartog et al., 2006;

Broeze et al., 2011). CAT testing is not routinely performed worldwide, and is not

a recommended investigation in international guidelines on the fertility work-up.

Some authors have explored the predictive value of medical history taking, and

concluded that medical history is of limited use for a policy of restricted use of

laparoscopy (Johnson et al., 2000; Hubacher et al., 2004). This conclusion has been

disputed by others (Thomas et al., 2001; Donnez and Jadoul, 2004; McComb, 2004;

Tulandi and Platt, 2004).

23138 Coppus.indd 136 28-09-12 14:55

General discussion

137

8

Diagnostic aspects of tubal testing

For the work presented in this thesis, we decided to go back to the basis and

explore the grounds on which guidelines advocate medical history taking for

selecting women for invasive tubal testing (Luttjeboer et al., 2009). We found

that a history of complicated appendicitis, pelvic surgery, ectopic pregnancy and

endometriosis are strong risk indicators for tubal pathology in subfertile women.

Due to limitations related to conventional meta-analytic techniques (Broeze et al.,

2009), we were unable to estimate exact probabilities of tubal pathology, and to

correct for dependency.

Therefore, we developed two prediction models for (severe) tubal pathology that

express the probability of tubal pathology, and correct for mutual dependence

(Chapters 4 and 5). While the first model was derived in a relatively small data set,

and used as a pilot study for the development of the large-scale model, all factors

from this model were also present in the larger model. We showed for any tubal

pathology, that a prolonged duration of subfertility, a pregnancy of both partners

or of only the male partner in a previous relationship, male smoking habits,

a history of PID, history of Chlamydia infection, history of ectopic pregnancy, a

history of legally induced abortion and previous pelvic surgery are all factors that

increase the likelihood of tubal pathology. Clinically more relevant is the model for

bilateral tubal pathology, for which it was shown that referral by a gynaecologist,

female age above 32 years, dysmenorrhoea and smoking of the woman increase

the likelihood of disease, as does a longer duration of subfertility, a pregnancy

of both partners in a previous relationship, male smoking habits, history of PID,

history of Chlamydia infection, ectopic pregnancy and previous pelvic surgery. In

contrast, a previous pregnancy in the current relationship and a higher male age

lower the probability of bilateral tubal pathology.

The association of male characteristics with female tubal disease was remarkable,

as male age, male smoking habits, as well as having fathered a pregnancy in a

previous relationship cannot causally be related to tubal damage in a woman.

Possibly these male characteristics act as indicators for some unrecorded variables,

such as number of sexual partners or socio-economic status, or they reflect lifestyle

and selection mechanisms.

23138 Coppus.indd 137 28-09-12 14:55

Chapter 8

138

Although both models are able to make an almost perfect distinction between

women with a low and high probability of tubal pathology, our models have not

been externally validated yet, so caution has to be taken to apply them already in daily

clinical practice. Reassuringly though, a recent study from Israel confirmed several of

the risk factors identified in our study, including increasing age, secondary subfertility,

and male smoking habits (Farhi et al., 2011). Our findings were also confirmed in an

individual patient data-meta analysis (Broeze et al., 2012), which showed a similar

predictive performance for medical history taking as did our models. Interestingly, this

study also showed that both CAT and HSG independently improve the performance of

the model when compared with the results of diagnostic laparoscopy.

Prognostic aspects of tubal testing

It is our opinion that medical history taking is certainly of value in selecting those

women that might benefit from early tubal testing. The next question is which

tubal test should then be applied. As the purpose of the fertility work-up has shifted

from a purely diagnostic question to a prognostic orientation, it is not so much the

question which tubal test is the most accurate, but to what extent an abnormal

test result interferes with chances of natural conception. To answer this question

we related findings at HSG and laparoscopy to spontaneous pregnancy (Chapter

6). We found that women with unilateral tubal pathology at HSG or laparoscopy

have a moderate, non-significant reduction in spontaneous pregnancy chances,

whereas those with bilateral tubal pathology at HSG or laparoscopy have a

severe reduction in the ability to achieve natural conception. This reduction in

fertility prospects was similar for HSG and laparoscopy, suggesting that HSG and

laparoscopy have a comparable predictive capacity for natural conception.

It is noteworthy that unilateral pathology is of only minor prognostic relevance,

confirming that the goal of tubal testing should be to detect women with bilateral

tubal pathology, so that they can be treated with IVF. Studies on the prediction

of pregnancy after intra-uterine insemination also failed to show a statistically

significant impact of unilateral tubal pathology on pregnancy rates (Steures et al.,

2004). In addition, no evidence-based treatment policies are known for women

affected by unilateral tubal pathology. In case of unilateral abnormalities on

HSG, one could consider several options, e.g. one could perform a diagnostic

23138 Coppus.indd 138 28-09-12 14:55

General discussion

139

8

laparoscopy. In 95% of cases though, laparoscopic findings will not change the

HSG-based treatment plan (Lavy et al., 2004). Alternatively, one could aim to

increase pregnancy chances with IUI with mild ovarian stimulation or bypass the

Fallopian tube with IVF. We found only one case-control study in the literature,

which evaluated IUI/COH in women with unilateral tubal pathology on HSG as

compared to IUI/COH in women with unexplained subfertility. Similar pregnancy

rates were found in both groups (Farhi et al., 2007). One of the studies that still

needs to be performed therefore is a randomized controlled trial comparing

expectant management, IUI/COH and IVF in women with unilateral tubal

pathology, evaluating pregnancy rates and costs.

Currently, after invasive tubal testing has shown bilateral patent tubes, the result

of CAT testing is generally ignored (Chapter 7). We questioned whether evidence

of a past Chlamydia infection affects the probability of spontaneous pregnancy in

women without visible tubal pathology, and found that subfertile women with

a positive CAT test but without visible tubal pathology on HSG or laparoscopy

have a 33% lower probability of pregnancy as compared to CAT negative women.

Results from previous studies on this topic are conflicting, which urges the need for

external validation of our findings. If confirmed, it would imply a new prognostic

factor that could be incorporated when developing updated prognostic models

for treatment independent pregnancy.

Ultimate value of tubal testing

Although much effort is being put in evaluating accuracy of different screening

strategies, the main question is whether ultimately tubal testing will increase the

likelihood of conception. For laparoscopy, there is no evidence that the procedure

itself increases this likelihood (Al-Fadhli et al., 2006; Luttjeboer et al., 2007). For

hysterosalpingography, there is some evidence that performing the procedure

with oil-soluble contrast medium increases the probability of natural conception

when compared with water-soluble contrast medium. However, the evidence is

based on aggregated data of two trials, where statistical heterogeneity was present

and the higher quality trial failed to show a significant difference. Currently, the

H2Oil study is recruiting women to answer the question whether we should use

oil-based contrast medium during a diagnostic HSG (H2Oil trial number: NTR3270).

23138 Coppus.indd 139 28-09-12 14:55

Chapter 8

140

One randomized controlled trial examined the value of a standard laparoscopy

after normal hysterosalpingography before start of intrauterine insemination

and found that a standard laparoscopy with therapeutic interventions does not

increase the likelihood of conception (Tanahatoe et al., 2005). Another randomized

study showed that early tubal testing did not increase spontaneous pregnancy

rates if compared with delayed tubal testing (Lindborg et al., 2009). Whether a

tubal test strategy should start with hysterosalpingography or whether this test

can be abandoned was evaluated in a randomised trial (Perquin et al., 2006). In

this study the fertility workup was completed with either HSG with water-based

contrast or immediate laparoscopy. In case of no abnormalities at HSG, women

were managed expectantly for 6 months. In case of no pregnancy after this time

period, women underwent a laparoscopy. In case of an abnormal HSG, women

underwent laparoscopy within one to two months. At 18 months of follow-up, no

differences in pregnancy rates between both strategies were noted. The authors

concluded that HSG has no role in the fertility work-up , but we criticized this study

due to an inadequate methodological design (as no specific treatment strategies

followed a specific test result) and showed that a strategy starting with HSG results

in a 30% reduction in the number of laparoscopies (Coppus et al., 2006). Although

more and more authors advocate that HSG is a useless test in modern fertility

practice (den Hartog et al., 2008; Lim et al., 2011), we feel that given the current

evidence HSG still deserves a prominent role in the fertility workup.

In conclusion, a difference in outcome between different tubal testing strategies

can currently only be made by referring women with bilateral tubal pathology for

IVF early. To end the debate on whether we should perform tubal testing, and if

so, with which tests, a large scale multilevel randomized trial is needed, comparing

different screening strategies with explicit treatment rules in case of abnormal test

results. Only by performing such a study, the true value and cost-effectiveness of

tubal testing can be revealed.

23138 Coppus.indd 140 28-09-12 14:55

General discussion

141

8

ReferencesAl-Fadhli R, Sylvestre C, Buckett W, Tulandi T. A randomized study of laparoscopic chromopertubation with lipiodol versus saline in infertile women. Fertil Steril 2006; 85:505-507.


Broeze KA, Opmeer BC, Bachmann LM, Broekmans FJ, Bossuyt PM, Coppus SF, Johnson NP, Khan KS, ter Riet G, van der Veen F, van Wely M, Mol BW. Individual patient data meta-analysis of diagnostic and prognostic studies in obstetrics, gynaecology and reproductive medicine. BMC Med Res Methodol 2009; 27:9:22.

Broeze KA, Opmeer BC, Coppus SF, van Geloven N, Alves MF, Anestad G, Bhattacharya S, Allen J, Guerra-Infante MF, den Hartog JE, Land JA, Idahl A, van der Linden PJ, Mouton JW, Ng EH, van der Steeg JW, Steures P, Svenstrup HF, Tiitinen A, Toye B, van der Veen F, Mol BW. Chlamydia antibody testing and diagnosing tubal pathology in subfertile women: an individual patient data meta-analysis. Hum Reprod Update 2011; 17:301-310.

Broeze KA, Opmeer BC, Coppus SF, van Geloven N, den Hartog JE, Land JA, van der Linden PJ, Ng EH, van der Steeg JW, Steures P, van der Veen F, Mol BW. Integration of patient characteristics and the results of Chlamydia antibody testing and hysterosalpingography in the diagnosis of tubal pathology: an individual patient data meta-analysis. Hum Reprod, 2012; 27;2979-2790.

Chapron C, Querleu D, Bruhat MA, Madelenat P, Fernandez H, Pierre F, Dubuisson JB. Surgical complications of diagnostic and operative gynaecological laparoscopy: a series of 29,966 cases. Hum Reprod 1998; 13:867-72.

Coppus SF, Opmeer BC, van der Veen F, Bossuyt PM, Mol BW. Routine use of hysterosalpingography prior to diagnostic laparoscopy in the fertility workup. Hum Reprod 2006; 21:2726-2727.

Den Hartog JE, Morré SA, Land JA. Chlamydia trachomatis-associated tubal factor subfertility: Immunogenetic aspects and serological screening. Hum Reprod Update 2006; 12:719-730.


Donnez J, Jadoul P. Taking a history in the evaluation of infertility: obsolete or venerable tradition? Fertil Steril 2004; 81:16-17.

Evers J. Female subfertility. Lancet 2002; 360: 151-159.

Farhi J, Ben-Haroush A, Lande Y, Fisch B. Role of treatment with ovarian stimulation and intrauterine insemination in women with unilateral tubal occlusion by hysterosalpingography. Fertil Steril 2007; 88:396-400.

Farhi J, Homburg R, Ben-Haroush A. Male factor infertility may be associated with a lower risk for tubal abnormalities. Reprod Biomed Online 2011; 22:335-340.

Forman RG, Robinson JN, Mehta Z, Barlow DH. Patient history as a simple predictor of pelvic pathology in subfertile women. Hum Reprod 1993; 8:53-55.

Glatstein IZ, Harlow BL, Hornstein MD. Practice patterns among reproductive endocrinologists: the infertility evaluation. Fertil Steril 1997; 67:443-51.

Gordts S, Campo R, Rombauts L, Brosens I. Transvaginal hydrolaparoscopy as an outpatient procedure for infertility investigation. Hum Reprod 1998; 13:99-103.

H2Oil study: www.studies-obsgyn.nl/h2olie

23138 Coppus.indd 141 28-09-12 14:55

Chapter 8

142

Härkki-Siren P, Sjöberg J, Kurki T. Major complications of laparoscopy: a follow-up Finnish study. Obstet Gynecol 1999; 94:94-8.

Hubacher D, Grimes D, Lara-Ricalde R, de la Jara J, Garcia-Luna A. The limited clinical usefulness of taking a history in the evaluation of women with tubal factor infertility. Fertil Steril 2004; 81:6-10.

Jansen FW, Kapiteyn K, Trimbos-Kemper T, Hermans J, Trimbos JB. Complications of laparoscopy: a prospective multicentre observational study. BJOG 1997; 104:595-600.

Johnson NP, Taylor K, Nadgir AA, Chinn DJ, Taylor PJ. Can diagnostic laparoscopy be avoided in routine investigation for infertility? BJOG 2000; 107:174-178.

Land JA, Evers JL, Goossens VJ. How to use Chlamydia antibody testing in subfertility patients. Hum Reprod 1998; 13:1094-1098.

Lavy Y, Lev-Sagie A, Holtzer H, Revel A, Hurwitz A. Should laparoscopy be a mandatory component of the infertility evaluation in infertile women with normal hysterosalpingogram or suspected unilateral distal tubal pathology? Eur J Obstet Gynecol Reprod Biol 2004; 114:64-68.

Lim CP, Hasafa Z, Bhattacharya S, Maheshwari A. Should a hysterosalpingogram be a first-line investigation to diagnose female tubal subfertility in the modern fertility workup? Hum Reprod 2011; 26:967-971.

Lindborg L, Thorburn J, Bergh C, Strandell A. Influence of HyCoSy on spontaneous pregnancy: a randomized controlled trial. Hum Reprod 2009; 24: 1075-1079.

Luttjeboer FY, Verhoeve HR, van Dessel HJ, van der Veen F, Mol BW, Coppus SF. The value of medical history taking as risk indicator for tuboperitoneal pathology: a systematic review. BJOG 2009; 116:612-625.

Luttjeboer F, Harada T, Hughes E, Johnson N, Lilford R, Mol BW. Tubal flushing for subfertility. Cochrane Database Syst Rev 2007; 18: CD003718.

McComb PF. Clinical history is the essence of medicine. Fertil Steril 2004; 81:13-15.

Mol BW, Dijkman B, Wertheim P, Lijmer J, van der Veen F, Bossuyt PM. The accuracy of serum chlamydial antibodies in the diagnosis of tubal pathology: a meta-analysis. Fertil Steril 1997; 67:1031-7.

Perquin DA, Dörr PJ, de Craen AJ, Helmerhorst FM. Routine use of hysterosalpingography prior to laparoscopy in the fertility workup: a multicentre randomized controlled trial. Hum Reprod 2006; 21:1227-1231.

Schlief R, Deichert U. Hysterosalpingo-contrast sonography of the uterus and fallopian tubes: results of a clinical trial of a new contrast medium in 120 patients. Radiology 1991; 178:213-5.

Steures P, van der Steeg JW, Mol BW, Eijkemans MJ, van der Veen F, Habbema JD, Hompes PG, Bossuyt PM, Verhoeve HR, van Kasteren YM, van Dop PA. Prediction of an ongoing pregnancy after intrauterine insemination. Fertil Steril 2004; 82:45-51.

Tanahatoe SJ, Hompes PG, Lambalk CB. Investigation of the infertile couple: should diagnostic laparoscopy be performed in the infertility work up programme in patients undergoing intrauterine insemination? Hum Reprod 2003; 18:8-11.

Tanahatoe SJ, Lambalk CB, Hompes PG. The role of laparoscopy in intrauterine insemination: a prospective randomized reallocation study. Hum Reprod 2005; 20:3225-3230.

Thomas DK, Mannion P, Coughlin L. Can diagnostic laparoscopy be avoided in routine investigation for infertility? BJOG 2001; 108:127-128.

Tulandi T, Platt R. The art of taking a history. Fertil Steril 2004; 81:11-12.

23138 Coppus.indd 142 28-09-12 14:55

23138 Coppus.indd 143 28-09-12 14:55

23138 Coppus.indd 144 28-09-12 14:55

9Summary

23138 Coppus.indd 145 28-09-12 14:55

Chapter 9

146

Subfertility is customarily defined as failure to conceive after regular unprotected

sexual intercourse for one year. The prevalence of subfertility is around 14%,

affecting one in seven couples. It has been estimated that in around 10-30% of

these couples, subfertility is due to tubal pathology. These tubal abnormalities

include tubal obstruction and pelvic adhesions which can be due to previous

infection, endometriosis, or previous surgery. To rule in or rule out this pathology, the

fertility work-up in subfertile couples is usually concluded with tubal patency testing.

Currently, laparoscopy is considered to be the best available test for diagnosing tubal

abnormalities, but it has several drawbacks. First, it is an invasive surgical procedure

that requires general anaesthesia, both of which carry associated risks. Secondly, as

an expensive investigation that requires operating time and dedicated personnel,

its availability is not unlimited. Therefore, triage is needed to limit the number of

unnecessary laparoscopies, while maintaining a high diagnostic yield. Nationally and

internationally, a large practice variation exists in the timing of testing, the test used,

and the basis on which women are selected for tubal testing. While some guidelines

advocate medical history taking to triage women for tubal patency testing, others

advocate Chlamydia antibody testing, or testing in all women. At the start of this

project, test research on which these guideline are based on, had been conducted

in isolation from its clinical context.

The work presented in the studies in this thesis started with a focus on

methodological issues in diagnostic and prognostic research in general. Hereafter,

we researched multivariable diagnostic models, to estimate pretest probabilities

for tubal pathology that can help to distinguish women with a high and low

probability of (severe) tubal disease at first consultation, and explored the

added value of Chlamydia antibody testing to medical history taking. Finally, we

investigated the prognostic significance of various tubal pathology tests like HSG,

laparoscopy and CAT testing.

Chapter 1 gives an outline and describes the objectives of this thesis.

Chapter 2 describes the results of a study in which we evaluated the impact of

publication of the Standards for Reporting of Diagnostic Accuracy Studies (STARD)

statement on the quality of reporting of test accuracy studies in two leading

journals in the field of reproductive medicine. 24 articles published in 1999 (pre-

STARD) and 27 published in 2004 (post-STARD) that reported on the diagnostic

23138 Coppus.indd 146 28-09-12 14:55

Summary

147

9

performance of a test in comparison with a reference standard were included.

Each article was scored on a 25 items checklist, dealing with the study question,

study participants, study design, test methods, reference standard, statistical

methods, reporting of results and conclusions. The mean number of reported

STARD items for articles published in 1999 was 12.1 ± 3.3 (range 6.5-20) and 12.4

± 3.2 (range 7-17.5) in 2004, after publication of the STARD statement. Overall, less

than half of the studies reported adequately on 50% or more of the STARD items.

The reporting of individual items showed a wide variation. There was no significant

improvement in mean number of reported items for the articles published after

the introduction of the STARD statement. From these data we concluded that

authors of test accuracy studies in the two leading fertility journals poorly reported

the design, conduct, methodology and statistical analysis of their study, which

hampers the ability to interpret these studies on their scientific soundness. Strict

adherence to the STARD guidelines should therefore be encouraged.

Chapter 3 presents an opinion article on methodological considerations when

evaluating prediction models in reproductive medicine. Such models are used

to to calculate the probability of pregnancy without treatment, as well as the

probability of pregnancy after ovulation induction, intrauterine insemination or

in vitro fertilization. The performance of such prediction models is often evaluated

with a receiver operating characteristic (ROC) curve. The area under the ROC

curve, also known as c-statistic, is then used as a measure of model performance.

However, the value of this c-statistic is low for most prediction models in

reproductive medicine. We demonstrate that low values of the c-statistic are to be

expected in these prediction models, but we also show that this does not imply

that these models are of limited use in clinical practice. Arguments are given why

the calibration of the model which is the correspondence between model-based

probabilities and observed pregnancy rates, the availability of a clinically useful

distribution of probabilities, the ability of a model to correctly classify couples for

treatment decisions, and reclassification indices are more meaningful concepts for

model evaluation.

Chapter 4 presents an exploratory investigation on whether information

obtained from medical history taking can be quantified in a probabilistic model.

We also explored whether CAT testing has additional value to medical history

taking, given differences in prior probabilities. To answer these questions, data

23138 Coppus.indd 147 28-09-12 14:55

Chapter 9

148

of 207 consecutive subfertile women were used to create multivariable logistic

regression models for the prediction of tubal disease as diagnosed by diagnostic

laparoscopy. The model with data of medical history only had an area under the

receiver operating characteristic curve (AUC) of 0.65 (95% CI 0.56-0.74). Addition of

CAT increased the AUC to 0.70 (95% CI 0.62-0.78) (P=0.065). CAT was positive in 40

women and showed a sensitivity of 0.37 (95% CI 0.26-0.49) for a specificity of 0.88

(95% CI 0.82-0.93). In CAT positive women a blank medical history did not decrease

the probability of tubal disease. Of the 167 women tested CAT negative, 23 (14%)

still had a high probability of disease due to their medical history and 11 of them

(48%) showed tubal abnormalities on diagnostic laparoscopy. From these data we

concluded that CAT testing adds valuable information to a woman’s risk profile

based on her medical history. The combination of medical history taking and CAT

testing has a better yield for diagnosing tubal disease than either of these alone.

Chapter 5 investigates the ability of medical history taking alone to predict tubal

pathology at first consultation of the subfertile couple, as the ultimate aim of tubal

testing is to identify women with bilateral tubal pathology in a timely manner,

so they can be treated with IVF or tubal surgery. For this purpose, data of 3716

women who underwent tubal patency testing as part of their routine fertility

workup, and participated in a nationwide cohort study were used. Elements

from their medical history were related to the presence of tubal pathology. With

multivariable logistic regression, we constructed two diagnostic models. In the first

model tubal disease was defined as occlusion and/or severe adhesions of at least

one tube, whereas in a second model tubal disease was defined as the presence

of bilateral abnormalities. Both models discriminated moderately well between

women with and women without tubal disease with an area under the receiver-

operating characteristic curve (AUC) of 0.65 (95% CI: 0.63-0.68) for any tubal

pathology and 0.68 (95% CI: 0.65-0.71) for bilateral tubal pathology, respectively.

However, the models could make an almost perfect distinction between women

with a high and a low probability of tubal pathology. A decision rule in the form

of a simple diagnostic score chart was developed for application of the models in

clinical practice. This decision rule could be used to select women for diagnostic

laparoscopy more efficiently.

Chapter 6 explores the prognostic capacity of hysterosalpingography and

laparoscopy in a general subfertile population. From data that were prospectively

23138 Coppus.indd 148 28-09-12 14:55

Summary

149

9

collected in 38 fertility centres, we selected those women who underwent

HSG and/or laparoscopy as part of their subfertility work-up. Follow-up started

immediately after tubal testing and follow-up was ended 12 months thereafter.

We correlated findings at HSG or laparoscopy with time to pregnancy. Follow-

up was censored at the date of last contact, when the woman was not pregnant

or at the start of fertility treatment. Kaplan-Meier curves for the occurrence of

spontaneous intrauterine pregnancy were constructed for women without tubal

pathology, for those with unilateral tubal pathology and for women with bilateral

tubal pathology at HSG or laparoscopy. With multivariable Cox regression analysis

the association between tubal pathology and the occurrence of an intrauterine

pregnancy was expressed in fecundity rate ratios (FRRs). In total, 3301 women

were included, of whom 2043 underwent HSG only, 747 underwent diagnostic

laparoscopy only and 511 underwent both. At HSG, 322 (13%) women showed

unilateral tubal pathology and 135 (5%) showed bilateral tubal pathology. At

laparoscopy, 167 (13%) showed unilateral tubal pathology and 215 (17%) showed

bilateral tubal pathology. Multivariable analysis resulted in FRRs of 0.81 [95%

confidence interval (CI): 0.59-1.1] for unilateral, and 0.28 (95% CI: 0.13-0.59) for

bilateral, tubal pathology at HSG. The FRRs at laparoscopy were 0.85 (95% CI: 0.47-

1.52) for unilateral, and 0.24 (95%CI: 0.11-0.54) for bilateral, tubal pathology. From

this study it was concluded that women with unilateral tubal pathology at HSG or

laparoscopy had moderate, non-significant reduction in spontaneous pregnancy

chances, whereas those with bilateral tubal pathology at HSG or laparoscopy had a

severe reduction in the ability to achieve spontaneous conception. This reduction

in fertility prospects was similar for HSG and laparoscopy, suggesting that HSG and

laparoscopy have a comparable predictive capacity for natural conception.

Chapter 7 investigates the prognostic significance of positive Chlamydia

trachomatis antibodies for spontaneous pregnancy once tubal pathology has been

ruled out. We studied ovulatory women in whom HSG or laparoscopy showed

patent tubes. Women were tested for C. trachomatis immunoglobulin G (IgG)

antibodies with either micro-immunofluorescence (MIF) or an ELISA. CAT serology

was positive if the MIF titre was ≥ 1:32 or if the ELISA index was > 1.1. The proportion

of couples pregnant without treatment was estimated at 12 months of follow-up.

Time to pregnancy was considered censored at the date of the last contact when

the woman was not pregnant or at the start of treatment. The association between

CAT positivity and an ongoing pregnancy was evaluated with Cox regression

23138 Coppus.indd 149 28-09-12 14:55

Chapter 9

150

analyses. Of the 1882 included women without visible tubal pathology, 338 (18%)

had a treatment- independent pregnancy within 1 year [estimated cumulative

pregnancy rate 31%; 95% confidence interval (CI): 27%-35%]. Because of violation

of Cox regression analysis assumptions after 9 months of follow-up, which could

not be explained on sound clinical nor statistical grounds, regression analyses

were limited to the first 9 months after tubal testing. Nevertheless, positive C.

trachomatis IgG serology was associated with a statistically significant 33% lower

probability of an ongoing pregnancy [adjusted fecundity rate ratio 0.66 (95% CI

0.49 to 0.89)]. We concluded that CAT testing does not only have diagnostic value,

but also prognostic value, even after HSG or laparoscopy has shown no visible

tubal pathology, as subfertile women with a positive CAT have lower pregnancy

chances than CAT negative women. After external validation, this finding could be

incorporated into existing prognostic models.

Chapter 8 presents a discussion of the findings of this thesis, clinical implications

and future research recommendations.

Chapter 10 provides an epilogue in which the results of three different theses on

tubal pathology from our study group are integrated.

23138 Coppus.indd 150 28-09-12 14:55

23138 Coppus.indd 151 28-09-12 14:55

23138 Coppus.indd 152 28-09-12 14:55

9Samenvatting

23138 Coppus.indd 153 28-09-12 14:55

Chapter 9

154

Subfertiliteit wordt gewoonlijk gedefinieerd als het uitblijven van zwangerschap na

regelmatige onbeschermde samenleving gedurende 12 maanden. De prevalentie

van subfertiliteit bedraagt rond de 14%, ofwel 1 op de 7 koppels dat probeert

zwanger te worden zal ermee geconfronteerd worden. Er wordt verondersteld

dat bij 10-30% van deze stellen de subfertiliteit het gevolg is van tubapathologie.

Eerdere infecties, endometriose, of chirurgische ingrepen in het verleden kunnen

geleid hebben tot occlusie van de tuba of adhesies. Om deze afwijkingen aan te

tonen danwel uit te sluiten ondergaan subfertiele paren vaak tubadiagnostiek als

afronding van het fertiliteitonderzoek. Momenteel wordt de laparoscopie met het

testen van de doorgankelijkheid van de tubae gezien als de meest informatieve

vorm van tubadiagnostiek, maar laparoscopie heeft verschillende nadelen.

Allereerst is het een invasieve chirurgische procedure onder algehele anesthesie,

met bijbehorende risico’s. Ten tweede is de beschikbaarheid niet ongelimiteerd, als

gevolg van hoge kosten, benodigde operatietijd en personeel. Daarom is gezocht

naar manieren om juist die vrouwen waarbij men ook verwacht afwijkingen te

vinden een laparoscopie te laten ondergaan, en tegelijkertijd het aantal ‘negatieve’

laparoscopieën te verminderen.

Nationaal en internationaal bestaat een grote praktijkvariatie in het type tubatest,

het tijdstip waarop de tubatest verricht wordt, en indicatie. Sommige richtlijnen

bevelen een zorgvuldige anamnese aan om vrouwen voor tubadiagnostiek te

selecteren, andere richtlijnen pleiten voor het gebruik van de Chlamydia antistof

test, terwijl weer andere richtlijnen tubadiagnostiek bij alle subfertiele vrouwen

adviseren. Een probleem bij de interpretatie van het onderzoek dat tot op heden

gedaan is, is dat het meeste test evaluatie onderzoek geïsoleerd van de klinische

context verricht is. Voor het onderzoek beschreven in dit proefschrift werd

gestart met het ophelderen van methodologische principes in diagnostisch en

prognostisch onderzoek. Hierna werden multivariabele diagnostische modellen

ontwikkeld, om voorafkansen op tubapathologie te genereren die kunnen helpen

vrouwen met een hoge kans op (ernstige) tubapathologie te onderscheiden van

vrouwen met een lage kans. Verder werd onderzocht wat de toegevoegde waarde

van de Chlamydia antistof test is bovenop hetgeen al bekend is uit zorgvuldig

afnemen van de anamnese. Ten slotte werd de prognostische waarde voor het

voorspellen van een spontane zwangerschap onderzocht voor de meest gebruikte

tubatesten testen zoals HSG, laparoscopie en de Chlamydia antistof test.

23138 Coppus.indd 154 28-09-12 14:55

Samenvatting

155

9

Hoofdstuk 1 beschrijft de achtergrond en doelstellingen van dit proefschrift.

Hoofdstuk 2 beschrijft de resultaten van een onderzoek waarin de gevolgen van

het Standards for Reporting of Diagnostic Accuracy Studies (STARD) statement op

de kwaliteit van rapportage van test accuratesse onderzoek bekeken werd. Hiervoor

werden in twee toptijdschriften op het gebied van de voortplantingsgeneeskunde

24 artikelen uit 1999 (pre-STARD), en 27 artikelen uit 2004 (post-STARD) nader

onderzocht. Studies die rapporteerden over de waarde van een diagnostische

test vergeleken met een referentiestandaard werden geïncludeerd. Elk artikel

werd gescoord aan de hand van een checklist bestaand uit 25 onderdelen,

waarin zaken als de studie vraag, samenstelling van de studiepopulatie, opzet

van de studie, testmethoden, referentiestandaard, statistiek, en rapportage van

resultaten en conclusie aan de orde komen. Het gemiddeld aantal gerapporteerde

STARD onderdelen was 12.1 ± 3.3 (spreiding 6.5-20) voor artikelen uit 1999 (pre-

STARD), en 12.4 ± 3.2 (spreiding 7-17.5) voor artikelen uit 2004 (post-STARD). In

totaal werd in minder dan de helft van de artikelen 50% of meer van de STARD

onderdelen adequaat gerapporteerd, met een ruime spreiding in de rapportage

van individuele onderdelen. Er werd geen significante verbetering gezien in het

gemiddeld aantal gerapporteerde zaken in de artikelen gepubliceerd na invoering

van het STARD statement. Wij concludeerden derhalve dat auteurs van test

accuratesse onderzoek in de twee vooraanstaande tijdschriften op het gebied van

de voortplantingsgeneeskunde heel matig hun opzet, uitvoering, methodologie

en statistische analyse van hun onderzoek rapporteren. Dit belemmert de

mogelijkheid om deze studies goed op hun waarde te kunnen beoordelen. Strikte

naleving van de STARD criteria voor publicatie van een test accuratesse onderzoek

dient daarom sterk aangemoedigd te worden.

Hoofdstuk 3 beschrijft een opiniërend artikel over de methodologische

overwegingen wanneer prognostische modellen binnen de voortplantings-

geneeskunde geëvalueerd worden. Vaak worden dergelijke modellen gebruikt

om de kans op spontane zwangerschap te berekenen, maar ook om de kans

op zwangerschap na ovulatie inductie, intra-uteriene inseminatie of na in vitro

fertilisatie uit te drukken. Het prestatievermogen van zulke prognostische

predictiemodellen wordt frequent geëvalueerd met een receiver operating

characteristic (ROC) curve. Het oppervlak onder de ROC curve, ook wel c-statistic

indien tijd tot zwangerschap gemodelleerd wordt, wordt dan gebruikt als maat

23138 Coppus.indd 155 28-09-12 14:55

Chapter 9

156

voor model prestatie. Echter, voor de meeste predictiemodellen binnen de

voortplantingsgeneeskunde is deze c-statistic waarde laag. In dit hoofdstuk

wordt aangetoond dat lage waarden van de c-statistic verwacht kunnen worden

in dergelijke predictiemodellen, maar dat dit niet betekent dat deze modellen

onbruikbaar zijn in de dagelijkse klinische praktijk. Beargumenteerd wordt

waarom de calibratie van een model (de overeenkomst tussen voorspelde kansen

afkomstig van het model en de geobserveerde zwangerschapscijfers), een klinisch

bruikbare verdeling van kansen, het vermogen van een model om koppels in te

delen in categorieën die buikbaar zijn om behandelbeslissingen te nemen, en

reclassificatie indices belangrijkere begrippen zijn voor evaluatie van modellen.

Hoofdstuk 4 beslaat een verkennend onderzoek naar de mogelijkheid om door

middel van een probabilistisch model, gebaseerd op onderdelen uit de anamnese,

een kansschatting op tubapathologie te maken. Daarnaast werd onderzocht of

testen op aanwezigheid van Chlamydia trachomatis antistoffen (CAT) toegevoegde

waarde heeft, wanneer de verschillen in kans op tubapathologie (gebaseerd op de

anamnese), meegenomen wordt. Om deze vragen te beantwoorden werden data

van 207 opeenvolgende subfertiele vrouwen gebruikt voor het construeren van

multivariabele logistische regressie modellen, die tubapathologie - vastgesteld

door middel van laparoscopie - kunnen voorspellen. Het model gebaseerd op

alleen factoren uit de anamnese toonde een oppervlak onder de ROC curve

(AUC) van 0.65 (95% BI 0.56-0.74). Toevoegen van CAT aan het anamnese model

verhoogde de AUC tot 0.70 (95% BI 0.62-0.78) [p=0.065]. CAT was positief in 40

vrouwen, met een sensitiviteit van 0.37 (95% CI 0.26-0.49) en een specificiteit van

0.88 (95% CI 0.82-0.93). In CAT positieve vrouwen verlaagde een blanco anamnese

de kans op tubapathologie niet. In de 167 CAT negatieve vrouwen, waren er 23

(14%) die ondanks afwezigheid van Chlamydia antistoffen toch een hoge kans op

tubapathologie hadden op basis van hun anamnese. Bij 11 van deze 23 vrouwen

(48%) werden inderdaad afwijkingen bij de laparoscopie gezien. Op basis van

deze gegevens kan geconcludeerd worden dat CAT een toegevoegde waarde

heeft aan het risico profiel van de vrouw gebaseerd op de anamnese, en dat een

combinatie van anamnese en CAT een betere diagnostische waarde heeft dan een

van beide alleen.

Hoofdstuk 5 onderzoekt in een veel groter cohort de waarde van de anamnese

om bij het eerste consult een inschatting van de kans op tubapathologie te

23138 Coppus.indd 156 28-09-12 14:55

Samenvatting

157

9

kunnen maken. Het uiteindelijke doel van tubadiagnostiek is namelijk vrouwen

met dubbelzijdige tubapathologie - met een vrijwel uitzichtloze kans op

spontane zwangerschap - te identificeren, zodat zij met in-vitro-fertilisatie

(IVF) of tubachirurgie behandeld kunnen worden. Om de onderzoeksvraag te

beantwoorden, gebruikten we de gegevens van 3716 vrouwen die een tubatest

ondergingen binnen een nationaal cohort onderzoek naar de waarde van de

testen uit het oriënterend fertiliteitonderzoek. Elementen uit de anamnese werden

gerelateerd aan tubapathologie. Met behulp van multivariabele logistische

regressie analyses werden twee diagnostische modellen geconstrueerd. In het

eerste model werd tubapathologie gedefinieerd als occlusie en/of ernstige

adhesies van tenminste één tuba, terwijl in het tweede model tubapathologie

werd gedefinieerd als bilaterale afwijkingen. Beide modellen konden matig goed

vrouwen met en vrouwen zonder tubapathologie onderscheiden, met AUC’s van

0.65 (95% BI 0.63-0.68) voor elke vorm van tubapathologie, en 0.68 (95% BI 0.65-

0.71) voor bilaterale tubapathologie. De modellen konden echter bijna perfect de

vrouwen met een heel hoge kans op afwijkingen onderscheiden van de vrouwen

met een lage kans op afwijkingen. Een simpele diagnostische score kaart werd

ontwikkeld om beide modellen in de dagelijkse praktijk eenvoudig te kunnen

gebruiken. Enkele rekenvoorbeelden laten zien dat een meer mathematisch

gebruik van de anamnese zou kunnen leiden tot een efficiëntere inzet van de

beperkt beschikbare diagnostische laparoscopie.

Hoofdstuk 6 beschrijft de prognostische waarde van hysterosalpingografie

en laparoscopie voor het optreden van een spontane zwangerschap in een

doorsnee subfertiele populatie. Uit gegevens die prospectief verzameld waren in

38 fertiliteitcentra werden die vrouwen geselecteerd die als onderdeel van het

fertiliteitonderzoek een HSG danwel een laparoscopie ondergingen. Vrouwen

werden tot 12 maanden na het moment van tubadiagnostiek opgevolgd. Follow-

up gegevens werden gecensureerd op het moment van het laatste patiëntcontact,

indien een vrouw niet zwanger werd binnen de follow-up periode, of op het

moment dat een fertiliteitbehandeling gestart werd. De bevindingen van het HSG

of de laparoscopie werden gecorreleerd aan de tijd tot spontane zwangerschap.

Kaplan-Meier curves werden gemaakt voor vrouwen zonder afwijkingen, voor

vrouwen met unilaterale tubapathologie en voor vrouwen met dubbelzijdige

afwijkingen bij het HSG of de laparoscopie. Met multivariabele Cox regressie analyse

werd de associatie tussen tubapathologie en het optreden van een zwangerschap

23138 Coppus.indd 157 28-09-12 14:55

Chapter 9

158

gemodelleerd. In totaal konden 3301 vrouwen geïncludeerd worden. Hiervan

ondergingen 2042 vrouwen een HSG, 747 een diagnostische laparoscopie en 511

vrouwen zowel een HSG als laparoscopie. Het HSG was unilateraal afwijkend in 322

vrouwen (13%), en bilateraal afwijkend bij 135 vrouwen (5%). Bij de laparoscopie

was in 167 vrouwen (13%) sprake van unilaterale pathologie, en in 215 vrouwen

(17%) bilaterale pathologie. Multivariabele analyses resulteerden in een fertiliteit

ratio (FR) van 0.81 (95% BI 0.59-1.1) voor unilaterale, en 0.28 (95% BI 0.13-0.59)

voor bilaterale tubapathologie op het HSG. Voor laparoscopie bedroeg de HR

0.85 (95% BI 0.47-1.52) voor unilaterale, en 0.24 (95% BI 0.11-0.5) voor bilaterale

tubapathologie. Op basis van dit onderzoek kan geconcludeerd worden dat

vrouwen met eenzijdige tuba afwijkingen op HSG en laparoscopie een matige, niet

significante verlaagde spontane zwangerschapskans hebben, terwijl een bilateraal

afwijkend HSG of laparoscopie de kans op spontane zwangerschap in hoge mate

verminderd. Deze verlaagde fertiliteit prognose was van gelijke grootte voor het

HSG en de laparoscopie, hetgeen inhoudt dat beide testen een vergelijkbare

prognostische waarde voor het optreden van spontane zwangerschap hebben.

Hoofdstuk 7 beschrijft een onderzoek naar de prognostische waarde van

positieve Chlamydia trachomatis antistoffen op het optreden van een spontane

zwangerschap als invasieve tubadiagnostiek geen afwijkingen heeft laten zien.

Hiervoor werd een cohort ovulatoire vrouwen, bij wie een HSG of laparoscopie

doorgankelijke tubae toonde, en die getest waren op de aanwezigheid

van C. trachomatis immunoglobuline (IgG) antistoffen met hetzij micro-

immunofluorescentie technieken (MIF) danwel een ELISA test, onderzocht. CAT

serologie werd als positief beschouwd indien de MIF titer ≥ 1:32 was of indien

de ELISA index > 1.1 bedroeg. De spontane zwangerschapskans na 12 maanden

follow-up werd berekend, waarbij tijd tot zwangerschap gecensureerd werd

op het moment van het laatste patiëntcontact, indien een vrouw niet zwanger

werd binnen deze periode, of op het moment dat een fertiliteitbehandeling

gestart werd. De associatie tussen positieve CAT en doorgaande zwangerschap

werd gemodelleerd met Cox regressie analyses. Van de 1882 vrouwen met

doorgankelijke tubae, werden er 338 (18%) spontaan zwanger binnen het jaar

(geschatte cumulatieve zwangerschapskans 31%; 95% BI 27%-35%). Omdat de

aannames voor het toepassen van Cox regressie analyses geschonden werden

na 9 maanden follow-up, wat noch op basis van klinische, noch op basis van

statistische gronden verklaard kon worden, moesten de regressie analyses tot 9

23138 Coppus.indd 158 28-09-12 14:55

Samenvatting

159

9

maanden follow-up duur beperkt worden. Toch waren positieve C. trachomatis

IgG antistoffen duidelijk en statistisch significant geassocieerd met een 33%

lagere kans op doorgaande zwangerschap (gecorrigeerde fertiliteit ratio 0.66

(95% BI 0.49-0.89). Wij concluderen daarom dat CAT in het fertiliteitonderzoek

niet alleen diagnostisch, maar ook prognostisch relevant is, zelfs nadat HSG of

laparoscopie evidente tubapathologie heeft uitgesloten. Subfertiele vrouwen

met een positieve CAT maar met normale tubae hebben een duidelijk lagere kans

op spontane conceptie dan de CAT negatieve vrouwen bij wie tubapathologie

uitgesloten is. Na externe validatie in een andere studie, zou deze nieuwe

bevinding toegevoegd kunnen worden aan bestaande prognostische modellen

voor spontane zwangerschap.

Hoofdstuk 8 tenslotte geeft een algemene bespreking van de resultaten uit

dit proefschrift, beschrijft de klinische implicaties en bevat aanbevelingen voor

toekomstig onderzoek.

Hoofdstuk 10 bestaat uit een epiloog, in welke de resultaten van drie proefschriften

uit onze tubapathologie onderzoeksgroep geïntegreerd worden.

23138 Coppus.indd 159 28-09-12 14:55

23138 Coppus.indd 160 28-09-12 14:55

10Epilogue

23138 Coppus.indd 161 28-09-12 14:55

Chapter 10

162

With an estimated prevalence between 11 and 30% in subfertile populations,

tubal pathology is an important cause for subfertility (Hull et al., 1985, Collins et al.,

1995, Snick et al., 1997). The American Society of Reproductive Medicine (ASRM)

recommends a careful medical history and physical examination to identify

symptoms and signs suggesting a specific cause for subfertility, which can be

the focus of subsequent diagnostic evaluation. The National Institute for Clinical

Excellence (NICE) guideline advises the use of patient characteristics to decide

whether tubal testing should be performed and women without comorbidities

should be offered HSG. The guideline of the Dutch Society for Obstetrics and

Gynaecology (NVOG) mentions the use of patient characteristics as a first step in

the diagnostic strategy (ASRM, 2006; NICE, 2004; NVOG, 2004).

Although all these guidelines recommend medical history and physical

examination as a primary evaluation in the fertility work-up, a clear evidence

based recommendation in whom, when and which tubal patency tests should

be performed is not provided. As a consequence there is wide variation in clinical

practice concerning the use of tubal patency tests. A diagnostic strategy in which

all available information from the medical history and physical examination is

integrated with the results of tubal patency tests could potentially lead to more

cost-effective testing of tubal pathology.

We have performed and published several studies on patient characteristics and

these diagnostic tubal tests since the publication of these guidelines. In one study

we developed decision rules to express the probability of tubal pathology at the

first consultation based on patient characteristics only (Coppus et al., 2007). In

another study we showed that the addition of Chlamydia trachomatis Antibody

Test (CAT) to a diagnostic model based on patient characteristics increased the

AUC for the diagnosis of any tubal pathology from 0.65 to 0.70, although not

significantly (Coppus et al., 2007).

In a separate IPD-analysis, it was shown that from three commonly used Chlamydia

Antibody Tests, the Micro Immuno Fluorescence (MIF) test, showed a moderate

ability to discriminate between women with and without tubal pathology, but

performed best of the three CAT tests (Broeze et al., 2011). Additional testing for

a high-sensitive CRP (hs-CRP), a possible marker for persistence of a Chlamydia

trachomatis infection, to the CAT, increased the diagnostic accuracy of CAT, but

23138 Coppus.indd 162 28-09-12 14:55

Epilogue

163

10

the result of that study requires confirmation before it is implemented in clinical

practice (Den Hartog et al. 2008).

In another study we provided decision rules in which information from medical

history and physical examination were combined with the results of tubal testing in

order to calculate the predicted probability of tubal pathology (Broeze et al., 2012).

The combination of patient characteristics with CAT and HSG results provided the

best diagnostic performance for the diagnosis of bilateral tubal pathology.

We also showed that the diagnostic performance of HSG is invariant over several

subgroups of patients, suggesting that HSG is able to diagnose both any and bilateral

tubal pathology equally in all subfertile women and is a useful screening test for all

subfertile women. Of note is that in women at low risk for tubal pathology (i.e. no

risk indicators in the history and a negative CAT result), the sensitivity of HSG was

low, but the specificity remained stable (Broeze et al., 2011). This is most likely due to

false positive results at laparoscopy, which is the standard reference test in diagnostic

studies on tubal patency. In women at low-risk for tubal pathology and a normal HSG,

laparoscopy can show abnormalities which, we think, are often caused by technical

problems at laparoscopy. These technical problems can consist of vaginal leakage

of dye, low pressure at chromopertubation, premature ending of the procedure,

difference in flow when one tube is patent or invisibility of the fimbrial ends.

In another study we found that HSG and laparoscopy show comparable

performance in predicting natural conception, indicating that from that

perspective there is no preference for one of these tests (Verhoeve et al., 2011).

One randomised trial showed no additional advantage of diagnostic laparoscopy if

this was performed following a normal HSG, on treatment decision and pregnancy

outcome (Tanahatoe et al., 2005) and, in another randomised trial, the number of

diagnostic laparoscopies was substantially reduced if diagnostic laparoscopy was

preceded by HSG (Perquin et al., 2006).

Combining the results of these studies, it can be concluded that medical history

and physical examination can differentiate between women at low and at high risk

for tubal pathology. Identification of those women at highest risk for bilateral tubal

pathology, who have the lowest chances for natural conception, is best obtained

by combining patient characteristics with CAT and HSG.

23138 Coppus.indd 163 28-09-12 14:55

Chapter 10

164

In a cost-effective analysis, different diagnostic strategies for presence of tubal

pathology were assessed. In this study, patient characteristics were taken into

account and obtained from IPD-analyses (Broeze et al., 2012), the prognostic

model of Hunault for unexplained subfertility was used (Hunault et al., 2004)

as well as a prognostic model for pregnancy outcome after IVF treatment in

a Dutch cohort (Lintsen et al., 2007). This study showed that no diagnostic test

and expectant management is the most cost-effective strategy until the age of

38 years, and no diagnostic test but direct treatment from the age of 39 years.

If, however, a diagnostic tubal test is planned, a strategy of first HSG followed by

diagnostic laparoscopy, where HSG shows bilateral tubal pathology, followed by

management depending on the test result, is the most cost-effective strategy

(Verhoeve et al., 2012; submitted).

We suggest the following for tubal patency tests in the fertility work-up; in women

until 38 years and at low risk for tubal pathology based on medical history, physical

examination and CAT result, expectant management and no diagnostic test for at

least 12 months is justified and will reduce the number of unnecessary invasive

diagnostic tests, complications and cost. An HSG followed by laparoscopy, if HSG

shows bilateral occlusion, should be considered, if conception does not occur

after expectant management and if a couple prefers fertility treatment other

than IVF. In women with bilateral distal occlusion, HSG can be helpful to decide

whether laparoscopic salpingostomy is preferable above or before IVF, although

randomised evidence for this is lacking. In women 39 and older, direct treatment

is the most cost-effective scenario, irrespective of the medical history, physical

examination and CAT result. It is not to be expected that every couple is prepared

to start directly with IVF treatment. The second best strategy is then to prove tubal

patency by HSG and if the tubes are found to be open, couples can be counselled

to choose between expectant management, intra uterine insemination and IVF,

obviously taking the prognosis for natural conception into account (Hunault et

al., 2004; Steures et al., 2006). In some women sonographically visible bilateral

hydrosalpinges may be detected before tubal testing. In these women direct

laparoscopy is advised and can be combined at the same time with salpingectomy

or laparoscopic tubal occlusion, since it has been shown that this improves IVF-

outcome (Johnson et al., 2010).

23138 Coppus.indd 164 28-09-12 14:55

Epilogue

165

10

Our recommendation and findings can serve as a framework for a new evidence

based guideline for the diagnosis of tubal pathology in subfertile couples. Several

aspects still need to be addressed and considered. Although our decision rules

showed good calibration (the correspondence between model-based probabilities

and observed tubal pathology rates) and can be easily applied in clinical practice,

external validation is still required (Leushuis et al., 2009).

Also further research is needed concerning the finding of unilateral tubal pathology.

Although our findings did not show a significant reduction in pregnancy rates

in this group of women (Mol et al., 1999; Verhoeve et al., 2011), the pregnancy

rate may be overestimated due to use of conventional methods of analysis (van

Geloven et al., 2012). It is thus possible that in case of unilateral tubal pathology,

active management such as surgery, IUI or IVF may result in significantly higher

pregnancy rates. To answer this question requires a randomised controlled trial in

this group of women.

Finally, we recommend expectant management and deference of tubal testing in a

substantial number of couples. A recent survey amongst patients and professionals

in the Netherlands showed that not only patients’ appreciation of expectant

management was moderate, but also the professionals’ adherence to expectant

management. Improvement of adherence may be obtained by providing more

information material to patients about prognostic models and providing protocols

and training to professionals and by improving their communications skills (van

den Boogaard et al., 2012). Tubal tests may have additional effects on patients’

health apart from the consequences of subsequent management decisions

(Bossuyt and McCaffery, 2009; Lenhard et al., 2005). These additional effects, such

as knowing the cause of the subfertility, being reassured that tubes are patent or

anxiety provoked when the tests reveal bad news, have not been studied. The

value of such information may influence the decision to test or not and should be

studied.

23138 Coppus.indd 165 28-09-12 14:55

Chapter 10

166

References

ASRM Practice Committee Reports. Optimal evaluation of the infertile female. Fertil Steril 2006; 86:s264-s267.

Bossuyt PM and McCaffery K. Additional patient outcomes and pathways in evaluations of testing. Med Decis Making 2009; 29:E30-E38.

Broeze KA, Opmeer BC, Coppus SF, Van Geloven N, Den Hartog JE, Land JA, Van der Linden PJQ, Ng EHY, Van der Steeg JW, Steures P, Van der Veen F, Mol BW. Integration of patient characteristics and the results of Chlamydia antibody testing and hysterosalpingography in the diagnosis of tubal pathology: an individual patient data meta-analysis. Hum Reprod 2012; 27:2979-2790.

Broeze K, Opmeer BC, Coppus SF, Van Geloven N, Alves MF, Ånestad G, Bhattacharya S, Allan J, Guerra-Infante MF, Den Hartog JE et al. Chlamydia antibody testing and diagnosing tubal pathology in subfertile women: an individual patient data meta-analysis. Hum Reprod Update 2011; 17:301-310.

Coppus SF, Verhoeve HR, Opmeer BC, van der Steeg JW, Steures P, Eijkemans MJ, Hompes PG, Bossuyt PM, van der Veen F, Mol BW. Identifying subfertile ovulatory women for timely tubal patency testing: a clinical decision rule based on medical history. Hum Reprod 2007; 22:2685-2692.

Coppus SF, Opmeer BC, Logan S, van der Veen F, Bhattacharya S, Mol BW. The predictive value of medical history taking and Chlamydia IgG ELISA antibody testing (CAT) in the selection of subfertile women for diagnostic laparoscopy: a clinical prediction model approach. Hum Reprod 2007; 22:1353-1358.


Hull MG, Glazener CM, Kelly NJ, Conway DI, Foster PA, Hinton RA, Coulson C, Lambert PA, Watt EM, Desai KM. Population study of causes, treatment, and outcome of infertility. BMJ 1985; 291:1693-7.

Hunault CC, Habbema JD, Eijkemans MJ, Collins JA, Evers JL, van der Veen F, te Velde ER. Two new prediction rules for spontaneous pregnancy leading to live birth among subfertile couples, based on the synthesis of three previous models. Hum Reprod 2004; 19:2019-2026.

Johnson N, van Voorst S, Sowter MC, Strandell A, Mol BW. Surgical treatment for tubal disease in women due to undergo in vitro fertilisation. Cochrane Database Syst Rev 2010; 20: CD002125.

Lenhard W, Breitenbach E, Ebert H, Schindelhauer-Deutscher HJ, Henn W. Psychological benefit of diagnostic certainty for mothers of children with disabilities: lessons from Down syndrome. Am J Med Genet 2005, 133A: 170-175.

Leushuis E, van der Steeg JW, Steures P, Bossuyt PM, Eijkemans MJ, van der Veen F, Mol BW, Hompes PG. Prediction models in reproductive medicine: a critical appraisal. Hum Reprod Update 2009; 15:537-52.

Lintsen AM, Eijkemans MJ, Hunault CC, Bouwmans CA, Hakkaart L, Habbema JD, Braat DD. Predicting ongoing pregnancy chances after IVF and ICSI: a national prospective study. Hum Reprod 2007; 22:2455-2462.


www.nice.org.uk/pdf/download.aspx?)0=CGO11fullguideline

23138 Coppus.indd 166 28-09-12 14:55

Epilogue

167

10

NVOG Richtlijn Orienterend Fertiliteits-onderzoek (OFO) Werkgroep Voortplantings Endocrinologie en Fertiliteit. Nederlandse Vereniging voor Obstetrie en Gynaecologie (NVOG). 2004.

Perquin DAM, Dorr PJ, de Craen AJM, Helmerhorst FM. Routine use of hysterosalpingography prior to laparoscopy in the fertility workup: a multicentre randomised controlled trial. Hum Reprod 2006; 21:1227-1231.

Snick HK, Snick TS, Evers JL, Collins JA. The spontaneous pregnancy prognosis in untreated subfertile couple: the Walcheren primary care study. Hum Reprod 1997; 12:1582-1588

Steures P, van der Steeg JW, Hompes PG, Habbema JD, Eijkemans MJ, Broekmans FJ, Verhoeve HR, Bossuyt PM, van der Veen F, Mol BW; Collaborative Effort on the Clinical Evaluation in Reproductive Medicine. Intrauterine insemination with controlled ovarian hyperstimulation versus expectant management for couples with unexplained subfertility and an intermediate prognosis: a randomised clinical trial. Lancet 2006; 368:216-221.

Tanahatoe S, Lambalk CB, Hompes PG. The role of laparoscopy in intrauterine insemination: a prospective randomized reallocation study. Hum Reprod 2005; 20:3225-3230.

Van Geloven N, Broeze KA, Bossuyt PM, Zwinderman AH, Mol BW. Treatment should be considered a competing risk when predicting natural conception in subfertile women. Hum Reprod 2012; 27:889-95.

Verhoeve HR, Moolenaar LM, Hompes P, Van der Veen F, Mol BW. Cost-effectiveness of tubal patency tests. 2012 Submitted.

Verhoeve HR, Coppus SF, van der Steeg JW, Steures P, Hompes PG, Bourdrez P, Bossuyt PM, van der Veen F, Mol BW. The capacity of hysterosalpingography and laparoscopy to predict natural conception. Hum Reprod 2011; 26: 134-142.

23138 Coppus.indd 167 28-09-12 14:55

23138 Coppus.indd 168 28-09-12 14:55

Part 4Appendices

23138 Coppus.indd 169 28-09-12 14:55

23138 Coppus.indd 170 28-09-12 14:55

11List of co-authors and their affiliations

23138 Coppus.indd 171 28-09-12 14:55

Chapter 11

172

23138 Coppus.indd 172 28-09-12 14:55

173

11

List of co-authors

Prof. Dr. S. Bhattacharya, Department of Obstetrics and Gynaecology, Aberdeen

Maternity Hospital, Aberdeen, UK.

Drs. P. Boudrez, Department of Obstetrics and Gynaecologie, VieCuri Medical

Centre, Venlo/Venray.

Prof. Dr. P.M. Bossuyt, Department of Clinical Epidemiology, Biostatistics and Bio-

informatics, Academic Medical Centre, Amsterdam.

Dr. M.J. Eijkemans, Julius Centrum, University Medical Centre Utrecht.

Dr. P.G. Hompes, Department of Obstetrics and Gynaecology, Vrije Universiteit

Medical Centre, Amsterdam.

Prof. Dr. J.A. Land, Department of Obstetrics and Gynaecology, University Medical

Centre Groningen, Groningen.

Dr. S. Logan, Department of Obstetrics and Gynaecology, Aberdeen Maternity

Hospital, Aberdeen, UK.

Prof. Dr. B.W. Mol, Department of Obstetrics and Gynaecology, Academic Medical

Centre, Amsterdam / Máxima Medical Centre, Veldhoven

Dr. B.C. Opmeer, Department of Clinical Epidemiology, Biostatistics and Bio-

informatics, Academic Medical Centre, Amsterdam.

Dr. J.W. van der Steeg, Department of Obstetrics and Gynaecology, Academic

Medical Centre, Amsterdam.

Dr. P. Steures, Department of Obstetrics and Gynaecology, Academic Medical

Centre, Amsterdam.

Prof. Dr. F. van der Veen, Centre for Reproductive Medicine, Academic Medical

Centre, Amsterdam

Drs. H.R. Verhoeve, Department of Obstetrics and Gynaecology, Onze Lieve Vrouwe

Gasthuis, Amsterdam.

23138 Coppus.indd 173 28-09-12 14:55

23138 Coppus.indd 174 28-09-12 14:55

11List of publications

23138 Coppus.indd 175 28-09-12 14:55

Chapter 11

176


Coppus SF, van der Veen F, Bossuyt PM, Mol BW. Quality of reporting of test accuracy studies in reproductive medicine: impact of the standards for reporting of diagnostic accuracy (STARD)-initiative. Fertil Steril 2006, 86(5): 1321-1329.*

Coppus SF, Mol BW. Diagnostic accuracy of three-dimensional hysterosalpingo-contrast sonography [letter]. Acta Obstet Gynecol Scand 2006, 85(4): 508-509.

Coppus SF, Opmeer BC, van der Veen F, Bossuyt PM, Mol BW. Routine use of hysterosalpingography prior to diagnostic laparoscopy in the fertility workup [letter]. Hum Reprod 2006, 21(10): 2725-2726.

Coppus SF, Opmeer BC, Logan S, van der Veen F, Bhattacharya S, Mol BW. The predictive value of medical history taking and Chlamydia IgG ELISA antibody testing (CAT) in the selection of subfertile women for diagnostic laparoscopy: a clinical prediction model approach. Hum Reprod 2007, 22(5): 1353-1358.*

Van Leeuwen M, Opmeer BC, Coppus SF, Mol BW. Fasting capillary glucose as a screening test for gestational diabetes [letter]. BJOG 2007, 114(3): 372.

Geomini PM, Coppus SF, Kluivers KB, Bremer GL, Kruitwagen RF, Mol BW. Is three-dimensional ultrasonography of additional value in the assessment of adnexal masses? Gynecol Oncol 2007, 106(1): 153-159.

Coppus SF, Langenveld J, Oei SG. Een onderschatte techniek voor het opheffen van schouderdystocie: baren op handen en knieën (‘all fours manoeuvre’). Ned Tijdschr Geneeskd 2007, 151(27): 1493-1497.

Coppus SF, Verhoeve HR, Opmeer BC, van der Steeg JW, Steures P, Eijkemans MJ, Hompes PG, Bossuyt PM, van der Veen F, Mol BW. Identifying subfertile ovulatory women for timely tubal patency testing: a clinical decision rule based on medical history. Hum Reprod 2007, 22(10): 2685-2692.*

Coppus SF, Emparanza JI, Hadley J, Kulier R, Weinbrenner S, Arvanitis TN, Burls A, Cabello JB, Decsi T, Horvath AR, Kaczor M, Zanrei G, Pierer K, Stawiarz K, Kunz R, Mol BW, Khan KS. A clinically integrated curriculum in Evidence-Based Medicine for just-in-time learning through on-the-job training: The EU-EBM project. BMC Med Educ 2007, 7:46.

Kulier R, Hadley J, Weinbrenner S, Meyerrose B, Decsi T, Horvath AR, Nagy E, Emparanza JI, Coppus SF, Arvanitis TN, Burls A, Cabello JB, Kaczor M, Zanrei G, Pierer K, Stawiarz K, Kunz R, Mol BW, Khan KS. Harmonizing evidence-based medicine teaching: a study of the outcomes of e-learning in five European countries. BMC Med Educ 2008, 8:27.

Luttjeboer FY, Verhoeve HR, van Dessel HJ, van der Veen F, Mol BW, Coppus SF. The value of medical history taking as risk indicator for tuboperitoneal pathology: a systematic review. BJOG 2009, 116(5): 612-625.

Broeze KA, Opmeer BC, Bachmann LM, Broekmans FJ, Bossuyt PM, Coppus SF, Johnson NP, Khan KS, ter Riet G, van der Veen F, van Wely M, Mol BW. Individual patient data meta-analysis of diagnostic and prognostic studies in obstetrics, gynaecology and reproductive medicine. BMC Med Res Methodol 2009, 9:22.

23138 Coppus.indd 176 28-09-12 14:55

177

11


Weinbrenner S, Meyerrose B, Vega-Perez A, Kulier R, Coppus SF, Kunz R. [EUebm – integrating a Europe-wide harmonised training and continuing medical education in evidence based medicine (BbM) with patient care]. Z Evid Fortbild Qual Gesundhwese 2009, 103: 35-39.

Coppus SF, van der Veen F, Opmeer BC, Mol BW, Bossuyt PM. Evaluating prediction models in reproductive medicine. Hum Reprod 2009, 24(8): 1774-1778.*

Kulier R, Coppus SF, Zamora J, Hadley J, Malick S, Das K, Weinbrenner S, Meyerrose B, Decsi T, Horvath AR, Nagy E, Emparanza JI, Arvanitis TN, Burls A, Cabello JB, Kaczor M, Zanrei G, Pierer K, Stawiarz K, Kunz R, Mol BW, Khan KS. The effectiveness of a clinically integrated e-learning course in evidence-based medicine: a cluster randomised controlled trial. BMC Med Educ, 2009, 9:21.

Kunz R, Nagy E, Coppus SF, Emparanza JI, Hadley J, Kulier R, Weinbrenner S, Arvanitis TN, Burls A, Cabello JB, Desci T, Horvath AR, Walzak J, Kaczor MP, Zanrei G, Pierer K, Schaffler R, Suter K, Mol BW, Khan KS. How far did we get? How far to go? A European survey on post-graduate courses in evidence-based medicine. J Eval Clin Pract, 2009, 15(6): 1169-1204.

Hadley J, Kulier R, Zamora J, Coppus SF, Weinbrenner S, Meyerrose B, Decsi T, Horvath AR, Nagy E, Emparanza JI, Arvanitis TN, Burls A, Cabello JB, Kaczor M, Zanrei G, Pierer K, Kunz R, Wilkie V, Wall D, Mol BW, Khan KS. Effectiveness of an e-learning course in evidence-based medicine for foundation (internship) training. J R Soc Med 2010, 103(7): 288-294.

Verhoeve HR, Coppus SF, van der Steeg JW, Steures P, Hompes PG, Bourdrez P, Bossuyt PM, van der Veen F, Mol BW. The capacity of hysterosalpingography and laparoscopy to predict natural conception. Hum Reprod 2011, 26(1): 134-142.*

Broeze KA, Opmeer BC, van Geloven N, Coppus SF, Collins JA, den Hartog JE, van der Linden PJ, Marianowski P, Ng EH, van der Steeg JW, Steures P, Strandell A, van der Veen F, Mol BW. Are patient characteristics associated with the accuracy of hysterosalpingography in diagnosing tubal pathology? An individual patient data meta-analysis. Hum Reprod Update 2011, 17(3): 293-300.

Broeze KA, Opmeer BC, Coppus SF, van Geloven N, Alves MF, Anestad G, Bhattachary S, Allan J, Guerra-Infante MF, Den Hartog JE, Land JA, Idahl A, van der Linden PJ, Mouton JW, Ng EH, van der Steeg JW, Steures P, Svenstrup HF, Tiitinen A, Toye B, van der Veen F, Mol BW. Chlamydia antibody testing and diagnosing tubal pathology in subfertile women: an individual patient data meta-analysis. Hum Reprod Update 2011, 17(3): 301-310.

Coppus SF, Land JA, Opmeer BC, Steures P, Eijkemans MJ, Hompes PG, Bossuyt PM, van der Veen F, Mol BW, van der Steeg JW. Chlamydia trachomatis IgG seropositivity is associated with lower natural conception rates in ovulatory subfertile women without visible tubal pathology. Hum Reprod 2011, 26(11): 3061-3071.*

Broeze KA, Opmeer BC, Coppus SF, van Geloven N, den Hartog JE, Land JA, van der Linden PJ, Ng EH, van der Steeg JW, Steures P, van der Veen F, Mol BW. Integration of patient characteristics and the results of Chlamydia antibody testing and hysterosalpingography in the diagnosis of tubal pathology: an individual patient data meta-analysis. Hum Reprod 2012, 27(10): 2979-2790.

* presented in this thesis

23138 Coppus.indd 177 28-09-12 14:55

23138 Coppus.indd 178 28-09-12 14:55

11Dankwoord

23138 Coppus.indd 179 28-09-12 14:55

Chapter 11

180

Dankwoord

Promoveren doe je niet alleen, dat is me in al die jaren wel duidelijk geworden.

Allereerst een woord van dank aan al die patiënten die hun gegevens ter

beschikking hebben gesteld voor dit onderzoek. Daarnaast ben ik aan een aantal

specifieke personen dank verschuldigd voor de totstandkoming en afronding van

dit proefschrift.

Prof. Dr. B.W. Mol, beste Ben Willem. Ik weet nog goed hoe het allemaal begonnen

is, jij als jonge klare in Veldhoven, ik als jonge ANIOS. Of ik even met je mee wilde

komen, en vervolgens de vraag of onderzoek doen niks voor mij was. Na enig

beraad was mijn antwoord uiteraard ja, en 1 maand later hing ik totaal uitgeput aan

de telefoon omdat ik je tempo niet bij kon houden. Het vertrouwen dat je in me

gehad hebt de afgelopen jaren is mij heel dierbaar, en ondanks enkele haperingen

in de progressie is het resultaat nu daar. Ik waardeer je humor, tomeloze inzet en

werklust, en uitgesproken loyaliteit en vriendschap. Gelieve niet meer om 4.30 am

te bellen. Thx mate!

Prof. Dr. F. van der Veen, beste Fulco. Daar kwam ik in het kielzog van BW in het

AMC aangezet. Totaal geen notie van voortplantingsgeneeskunde, laat staan

van het doen van onderzoek. Vanaf t=0 ging je er echter vol voor, en kwamen

de karakteristieke commentaren met gezwinde spoed retour. Heerlijk scherp, niet

zelden zonder humor, maar ook totaal onwetend van de Wordknop ‘wijzigingen

bijhouden’. Ik weet zeker dat je het wereldrecord ‘abstract corrigeren voor de

deadline’ gebroken hebt. Ook al werd ik bijna tot wanhoop gedreven toen we

voor revisieronde 17 hadden bereikt voor een bepaald manuscript, het feit dat dit

artikel zonder verdere revisie direct geaccepteerd werd heeft je gelijk bewezen!

Dank.

Dr. B.C. Opmeer, beste Brent, dank voor al je statistische hulp en acceptatie van

een Brabander op je kamer in het AMC. Wat was het fijn om op je terug te kunnen

vallen als SPSS weer eens iets heel anders deed dan dat ik voor ogen had. Dat ik

ooit nog eens statistiek onderwijs zou gaan geven had ik tijdens mijn geneeskunde

studie niet gedacht, maar het heeft me goed gedaan!

23138 Coppus.indd 180 28-09-12 14:55

Dankwoord

181

11

Dankwoord

Prof. Dr. P.M.M. Bossuyt, beste Patrick. Dank voor het creëren van een plekje op

de afdeling Epidemiologie en Biostatistiek. Voor enkele van de stukken in dit

proefschrift is jouw hulp onontbeerlijk geweest, hoewel ik je mailtje over het

Chlamydia paper nog steeds niet opnieuw durf te openen. Gelieve voortaan

in een eerder stadium te constateren dat we toch net die andere analytische

techniek hadden moeten kiezen. Jouw opvatting, kennis en kunde van de klinische

epidemiologie, en diagnostisch onderzoek in het bijzonder is van wereldniveau.

Prof. dr. M.P.M. Burger, prof. dr. J.A. Land, prof. dr. J.L.H. Evers, prof. dr. M.J. Heineman,

prof. dr. E.W. Steyerberg en prof. dr. S. Bhattacharya dank voor het beoordelen van

mijn manuscript.

Dr. J.W. van der Steeg, beste Jan Willem – Dr. P. Steures, lieve Pieternel. 1000x dank

dat in jullie OFO data mocht rondneuzen!

Dear Bhatty, thanks for allowing me to explore your tubal pathology data.

Beste co-auteurs, zonder een ieder bij naam te noemen heeft ieder van jullie iets

essentieels aan de individuele manuscripten bijgedragen. Dank.

Maatschap gynaecologie en verloskunde van het Máxima Medisch Centrum in

Veldhoven. Wat een mooi stel zijn jullie bij elkaar en wat heb ik veel geleerd bij

jullie. Dank voor het niet aflatend vertrouwen, en de mogelijkheid om naast al

tijdens mijn ANIOS tijd gedeeltelijk vrij geroosterd te worden om mijn eerste

stappen in de wetenschap te zetten. Die laparoscopische hysterectomie heb ik

toch maar mooi in de pocket!

Huidige en oud-collega assistenten gynaecologie van het MMC. Wat heerlijk om

met jullie te werken. Lieve Judith, beste dr. van Laar, wat was het fijn om met

jouw hoogte- en dieptepunten te delen. Lieve Josje, beste dr. Langenveld, ook

met jou was het heerlijk zuchten en steunen. Ik was best trots dat je met mij je

lekenpraatje wilde oefenen. Dank voor jullie promotie tips en tricks. Stijn, Tamara,

Rafli, Minouche, Femke, Josien, Leonie, Joost, en alle jonkies, jullie kunnen het ook!

Lieve verpleging, kraamverzorgenden, verloskundigen, polikliniek medewerksters

en secretaresses in Veldhoven en Maastricht, excuses als ik met mijn hoofd

23138 Coppus.indd 181 28-09-12 14:55

Chapter 11

182

weer eens bij andere zaken was dan de kliniek. Dank voor alle kopjes koffie,

geïnteresseerde praatjes en het aanhoren van mijn geklaag. Zonder jullie geen

gynaecologie/verloskunde!

Beste stafleden en junior staf van het Maastricht Universitair Medisch Centrum.

Sorry voor mijn ‘ja maar…’, dank voor alle interesse in mijn vorderingen, en de

plezierige sfeer op de werkvloer. Beste Janneke, het moment dat we per ongeluk

ontdekten dat we zo ongeveer de helft van elkaars boekje gereviewed hadden

blijft hilarisch!

Harold Verhoeve en Kimiko Broeze, mijn tuba-collega’s. Een tuba kan in Bes, C of Es

gestemd staan, maar voor allen geldt dat je veel adem moet hebben. Dat hebben

we wel gehad volgens mij, een lange adem. Dank voor jullie hulp bij het tuba

project.

Dr. L.G.M. Mulders, mijn mentor. Beste Leon, vanaf de zijlijn heb je mijn vorderingen

steeds in de gaten gehouden en me achter de broek gezeten. Dank voor je

relativeringsvermogen, je ‘draadje van Coppus’, en je soms heerlijk ontwapende

aanpak als pragmatisch gynaecoloog. We lijken soms verdacht veel op elkaar.

Beste staf voortplantingsgeneeskunde, beste staf Klinische Epidemiologie, beste

collega onderzoekers uit het AMC. Hoewel vreemde eend in de bijt, heb ik me altijd

ontzettend in jullie clubjes opgenomen gevoeld. Bij besprekingen, congressen in

binnen- en buitenland werd ik direct in de AMC club opgenomen. Ontzettend

veel dank daarvoor, het heeft me enorm goed gedaan. Wat wel betekent dat ik nu

op nationale congressen vaak tijd tekort kom om iedereen te spreken.

Lieve vrienden en vriendinnen, dank voor alle gezellige feesten, borrels en

stapavonden, waardoor de geplande zondagochtend werken vaak alweer

verloren ging. Hoewel de meesten van jullie weinig aanstalten maken om zelf te

promoveren, zijn jullie erg goed in het wegnemen van alle promotiestress! Beste

medemuzikanten, ik probeer weer wat vaker op de repetities te zijn!

Lieve familie en schoonfamilie, het is zo fijn om samen met jullie te zijn!

Mijn paranimfen, mr. Michiel Dankers en mr. drs. Jerom Coppus. Dank voor al jullie

23138 Coppus.indd 182 28-09-12 14:55

Dankwoord

183

11

Dankwoord

relativerende opmerkingen en onuitputtelijke voorraad consumptiebonnen. Ook

al hebben jullie geen kaas gegeten van geneeskunde, op juridisch vlak zijn jullie

een kei. Wat na genoeg erosieve veranderingen dan weer dicht bij grind en zand

komt….

Lonneke en Léonieke (mét streepje!), lieve tweelingzusjes. Wat gezellig dat jullie

gewoon binnen komen vallen voor een glas sinas. En wat fijn dat ik jullie ‘broerke’

mag zijn!

Jerom, beste broer, als spreken pudding zou zijn, was jij al lang gepromoveerd tot

doctor Oetker.

Beste Sjaak, dank voor je fantastische dochter. Dank voor je nuchtere kijk op het

leven. Wat is het ontzettend jammer dat Hanneke dit niet meer mee mag maken.

Lieve mama, lieve papa, zonder jullie had ik dit alles niet gekund. Als je het talent

hebt gekregen moet je het ook gebruiken was jullie motto. Dat heb ik geprobeerd.

Ontzettend bedankt voor het altijd creëren van de mogelijkheden hiervoor. Pap,

ook al heb je dat niet gewild, je was een behoorlijk vertragende factor bij het

afronden van dit proefschrift… Wat zou je ontzettend trots geweest zou zijn. Ik

ben trots dat ik 35 jaar na dato in jouw voetsporen treedt.

Lieve Elke. Wat ben jij een kanjer! Punt.

Yfke en Fiene, kleine droppies, papa gaat voorlezen!

23138 Coppus.indd 183 28-09-12 14:55

23138 Coppus.indd 184 28-09-12 14:55

11About the author

23138 Coppus.indd 185 28-09-12 14:55

Chapter 11

186

23138 Coppus.indd 186 28-09-12 14:55

187

11

About the author

Sjors Coppus werd geboren op 17 januari 1979 te Deurne. In 1997 behaalde hij

zijn gymnasiumdiploma aan het Sint Willibrord Gymnasium te Deurne. Hetzelfde

jaar startte hij met de opleiding Geneeskunde aan de Universiteit Maastricht. Na

het cumlaude afsluiten van de doctoraalfase behaalde hij zijn artsexamen in

november 2003. Hierna werkte hij als ANIOS gynaecologie/obstetrie in het

Máxima Medisch Centrum te Veldhoven, waarbij hij in een latere fase als parttime

onderzoeker werd aangesteld in het AMC te Amsterdam. In 2007 werd een jaar

doorgebracht op de afdeling Klinische Epidemiologie van het AMC te

Amsterdam, in welke periode hij betrokken was bij het EU-EBM project, een

internationaal project naar uniformering van evidence based medicine onderwijs

binnen Europa. In 2008 startte hij met de opleiding tot gynaecoloog in het

Máxima Medisch Centrum (opleiders prof. dr. S.G. Oei en dr. M.Y. Bongers). In

datzelfde jaar verwierf hij een ZonMW AGIKO stipendium. Het academische deel

van de opleiding werd in 2009 en 2010 doorgebracht in het Maastricht

Universitair Medisch Centrum (opleiders prof. dr. G.G. Essed, prof. dr. R.F.

Kruitwagen en dr. G.A. Dunselman). In 2012 keerde hij terug naar het Máxima

Medisch Centrum voor het laatste jaar van de opleiding (opleiders dr. M.Y.

Bongers, dr. C.A. Koks en dr. J.W. Maas). Sjors Coppus is getrouwd met Elke

Verhees. In 2012 werden zij de trotse ouders van een dichoriale tweeling, Yfke en

Fiene. Per 1 januari 2013 is hij aangesteld als staflid gynaecologie in het UMC St.

Radboud te Nijmegen met aandachtsgebied endoscopische chirurgie.

23138 Coppus.indd 187 28-09-12 14:55

23138 Coppus.indd 188 28-09-12 14:55

23138 Coppus.indd 189 28-09-12 14:55

23138 Coppus.indd 190 28-09-12 14:55

23138 Coppus.indd 191 28-09-12 14:55

23138 Coppus.indd 192 28-09-12 14:55

Diagnostic and prognosticaspects of tubal patency testing

Sjors Coppus

Diagnostic and prognostic aspects of tubal patency testing Sjors Coppus

23138 Coppus_Omsl 01-10-12 14:16 Pagina 1

Diagnostic and prognostic aspects of tubal patency testingChapter 2 Quality of reporting of test...

Documents

Transcript of Diagnostic and prognostic aspects of tubal patency testingChapter 2 Quality of reporting of test...