Computerized Medical Imaging and Graphics 31 (2007) 224–235
Current status and future directions of computer-aided diagnosis in mammography
Robert M. Nishikawa ∗
Carl J. Vyborny Translational Laboratory for Breast Imaging Research, Department of Radiology and Committee on Medical Physics,
The University of Chicago, 5841 S. Maryland Avenue, MC2026, Chicago, IL 60637-1463, United States
Abstract
The concept of computer-aided detection (CADe) was introduced more than 50 years ago; however, only in the last 20 years have there been serious and successful attempts at developing CADe for mammography. CADe schemes have high sensitivity, but poor specificity compared to
radiologists. CADe has been shown to help radiologists find more cancers both in observer studies and in clinical evaluations. Clinically, CADe
increases the number of cancers detected by approximately 10%, which is comparable to double reading by two radiologists.
© 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.compmedimag.2007.02.009
Keywords: Breast cancer; Mammography; Computer-aided diagnosis; Computer-aided detection; Observer studies; Clinical evaluation; ROC analysis
1. Introduction
Breast cancer is a major killer of women in the United States
and in many other parts of the world. Each year approximately
41,000 women die from breast cancer in the United States, and
213,000 women are diagnosed with breast cancer [1]. Screening
of asymptomatic women by mammography has led to a reduction in breast cancer mortality. Several randomized, controlled
screening studies have shown an overall decrease in breast can-
cer mortality of up to 30% [2–4]. Further, using mathematical
modeling, Berry et al. have shown that the recent decrease in
breast cancer mortality in the United States has been due equally
to screening with mammography and to better treatment [5].
The detection and diagnosis of breast cancer with mammog-
raphy are composed of two steps. The first is asymptomatic
screening, where suspicious areas in a mammogram are
identified. The second is diagnostic mammography, where
symptomatic women with an abnormal mammogram or some
physical or clinical abnormality (e.g., a palpable lump) receive
special view mammograms (e.g., magnification views or spot
compression views) and possibly ultrasound and MRI. The goal
Financial disclosure: Robert M. Nishikawa has a research agreement with Eastman Kodak Company and is a shareholder in Hologic, Inc. Both he and the University of Chicago receive research funding and royalties from Hologic, Inc.
∗ Tel.: +1 773 702 9047; fax: +1 773 702 0371.
E-mail address: [email protected].
of obtaining a diagnostic mammogram is to determine whether
a woman should have a biopsy.
1.1. Screening mammography
Mammography, although effective as a screening tool, has
limitations. On a screening mammogram, cancers can be missed
(false-negative mammogram), and non-cancerous lesions can
be mistaken as cancer, leading to a false-positive mammogram.
Depending on how the true cancer status of a woman is deter-
mined, the miss rate in mammography can be nearly 50% [6].
Retrospective analyses of missed cancers [7–13] indicated that
approximately 60% are visible in retrospect, although in some
cases the cancer may be very subtle [12]. These studies also
show that approximately 30% of cancers are not visible in retro-
spect. In many of these cases, the reason for the cancer not being
visible is that there is normal tissue above and below the cancer
that camouflages the cancer. This is because a mammogram is
a 2D image of the 3D breast, so that the superposition of tissue
can hide cancers.
The superposition of tissues can also produce patterns in
the mammogram that look suspicious to a radiologist. As a
result, between 5 and 15% of screening mammograms are read
as abnormal [14], even though the prevalence of cancer in the
screening population is typically 0.5%.
To address the superposition problem, two new 3D X-ray imaging techniques for the breast are being developed: computed
tomography [15–17] (CT) and digital breast tomosynthesis [18]
(DBT). These techniques produce slices, typically 1 mm or less
in thickness, that can be stacked to produce a 3D image of the
breast. Compared to CT, which can have isotropic resolution,
DBT has superior resolution within a slice, but much poorer
resolution in the direction perpendicular to the slice. Whereas CT collects images to cover at least a complete 360° angle, DBT collects images over only 60° or less, leading to a loss of spatial resolution in the direction perpendicular to the detector.
One drawback of both of these techniques is that there are many
images that a radiologist must review. For example, in DBT
there can be as many as 80 slices, with each slice having the
same information content as a standard mammogram. In this
situation, CADe may be useful in helping radiologists to handle
the large amount of data [19,20].
1.2. Diagnostic mammography
When a suspicious lesion is found on a screening mam-
mogram, or the patient has some physical symptoms (e.g., a
palpable lump), diagnostic mammography is performed. On
diagnostic mammograms, benign lesions are often difficult to
distinguish from cancers, and thus, a cancer can be misinter-
preted as a benign lesion. Clinically, differentiating benign from
malignant lesions is a difficult task. In the USA, the positive-
predictive value (PPV) for diagnostic breast imaging is generally
less than 50%. The PPV measures the percentage of all breast
biopsies that are positive for cancer. Using data from the Breast
Cancer Surveillance Consortium, Barlow et al. determined that
the PPV based on 41,427 diagnostic mammograms was 21.8%
[21]. Elmore et al., examining the results from eight large mammography registries (containing follow-up information on more than 300,000 screening mammograms), found that the
PPV ranged from 16.9 to 51.8%, with a median value of 27.5%
[22]. Thus, approximately three biopsies of benign lesions are
performed for every biopsy of a malignant lesion. Unnecessary biopsies are physically and emotionally traumatic for the patient, costly to the health care system, and add unnecessarily to the workload of radiologists, pathologists, and
surgeons. Improving radiologists’ PPV can have a substantial
positive effect on patient care and on the healthcare system.
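The "approximately three benign biopsies per malignant biopsy" figure follows directly from the PPV: a PPV of p implies (1 - p)/p benign biopsies for each malignant one. A quick check of the values quoted above (a sketch; the function name is ours):

```python
def benign_per_malignant(ppv: float) -> float:
    """Benign biopsies performed per malignant biopsy, given the PPV."""
    return (1.0 - ppv) / ppv

# PPV values quoted above: 21.8% (Barlow et al.) and the 27.5% median (Elmore et al.)
for ppv in (0.218, 0.275):
    print(f"PPV {ppv:.1%} -> {benign_per_malignant(ppv):.1f} benign biopsies per malignant")
```

At the median PPV of 27.5% this gives about 2.6 benign biopsies per malignant one, consistent with the figure stated above.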
In addition, the interpretation of a mammogram is inherently variable because mammograms are read by human beings. There is both inter- and intra-observer variability among radiologists [23,24]. Furthermore, there are substantial differences
between the performance of radiologists in Europe and of those
in North America [14,22].
Computer-aided diagnosis (CAD) is being developed to
address some of the limitations of mammography. Two differ-
ent types of CAD systems are being developed: computer-aided
detection (CADe) can be used to help radiologists find breast
cancer on screening mammograms, and computer-aided diag-
nosis (CADx) can be used to help radiologists decide whether
a known lesion is benign or malignant on diagnostic mammo-
grams. It should be noted that here CAD refers to the whole field
and comprises both CADe and CADx. There is good evidence
that CADx systems may be useful for improving radiologists’
PPV [25–28]. Nevertheless, in this paper, I will discuss only
CADe, giving a description of the current status and possible
future directions. I will start with a brief description of the
historical development of CAD in mammography.
2. Historical development
As early as 1955, Lee Lusted talked about automated diagnosis of radiographs by computers. In 1967, Fred Winsberg et al. published a paper in Radiology describing a CADx system in
which the computer determined whether a lesion on a mammo-
gram was malignant or benign [29]. By today’s standards, the
film digitization, computer power, and computer vision tech-
niques at that time were very crude, and Winsberg’s method
was not successful. During the next few years, there were several
unsuccessful attempts at automating both detection and diagnosis. Through most of the late seventies to the mid-eighties, there
was a period of inactivity, at least as reflected in publications.
In the mid-eighties at the University of Chicago, Doi, Chan,
Giger, MacMahon, and Vyborny started to investigate the con-
cept called computer-aided diagnosis, which is different from
the automated diagnosis of many earlier attempts. Their goal
was not to replace radiologists, but to develop systems that may
help radiologists render better clinical decisions. A breakthrough
came with two studies. In the first, Getty et al. showed that a
CADx system, the input for which was a checklist that a radiologist used to characterize the features of a lesion, could improve
radiologists’ ability to predict whether a lesion was benign or
malignant [25]. Theirs was not an automated system. The sec-
ond was an observer study conducted by Chan et al. in which
15 radiologists read 60 mammograms, half of them containing a
cluster of microcalcifications [30]. They showed that, by using a
computer-aided detection scheme, which was completely auto-
mated, radiologists could find additional calcification clusters in
a mammogram.
These two studies opened the field to many new investigators
and approaches for developing CADe and CADx algorithms.
This has led to several observer studies that have shown the
potential for computer-aided diagnosis to help radiologists not
only in mammography, but in chest radiography and thoracic CT
as well [31–35]. In 1998, the first commercial system received
FDA approval. Another important milestone in terms of clin-
ical implementation was approval for reimbursement in 2000
by Medicare and other health care payers. A timeline of these
developments is shown in Fig. 1.
2.1. Computer-aided detection (CADe) algorithms
Many different techniques are used for developing a CADe
scheme. Various techniques have been summarized in several
review papers [36–41]. In addition, much of the CADe research
has been presented at three main conferences, all of which have
conference proceedings: SPIE Medical Imaging, Computer-
Assisted Radiology and Surgery (CARS), and the International
Workshop on Digital Mammography.
A digital image is the starting point for all techniques,
although an optical computing method was proposed more than
10 years ago.
Fig. 1. Timeline of CAD development.
The digital image may come from a full-field digital mammography (FFDM) system, or it may be obtained by
digitizing of a screen-film mammogram. An FFDM image will
have properties that differ from those of a digitized screen-film
mammogram (dSFM) in terms of response to X-ray exposure
(which is linear), contrast, spatial resolution, and noise. These
differences, which are discussed below, are important when
CAD algorithms are designed.
2.1.1. Linearity
An FFDM image is either logarithmically or linearly related to the exposure to the X-ray detector. A dSFM image has
a sigmoidal relationship to the exposure to the X-ray detector,
even though film digitizers are inherently linear. Screen-film
(SF) systems are relatively insensitive at low X-ray exposures
and saturate at high exposures, as shown in Fig. 2. A curve of
pixel value versus log exposure or versus exposure to the detector
is called a characteristic curve.
2.1.2. Contrast
Because of the non-linear response of the SF system, contrast
is reduced at high and low exposures. The slope of the character-
istic curve shown in Fig. 2 is proportional to the contrast in the
image. For FFDM images that are linear, the inherent contrast
of the system is constant at all exposures.
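The linear, logarithmic, and sigmoidal responses, and the relationship between the slope of the characteristic curve and contrast, can be sketched numerically. The curve parameters below are illustrative stand-ins, not any vendor's calibration:

```python
import numpy as np

exposure = np.logspace(-1, 2, 500)          # relative X-ray exposure (arb. units)

# Linear FFDM: pixel value proportional to exposure
pv_linear = 40.0 * exposure

# Logarithmic FFDM: pixel value proportional to log exposure
pv_log = 1000.0 + 500.0 * np.log10(exposure)

# Digitized screen-film: sigmoidal characteristic curve (illustrative parameters)
pv_sfm = 4000.0 / (1.0 + np.exp(-3.0 * (np.log10(exposure) - 0.5)))

# The SF curve flattens at both ends, so its slope (proportional to contrast)
# is highest at mid-range exposures and reduced at low and high exposures.
slope = np.gradient(pv_sfm, np.log10(exposure))
print("SF contrast is maximal near a relative exposure of", exposure[np.argmax(slope)])
```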
2.1.3. Spatial resolution
The spatial resolution of a digital image is dependent on two
factors: the inherent resolution of the X-ray detector and the
size of the pixels in the image. SF systems have higher spatial
resolution than FFDM systems do. However, once an image is
digitized, the resolution difference can disappear. FFDM systems have pixel sizes between 0.05 and 0.1 mm. Commercial CADe systems use 0.05-mm pixels; however, in many of the systems reported in the literature, 0.1-mm pixels are used. For
detection of clustered microcalcifications, the pixel size of the
image will affect the performance. Chan et al. showed that, as
the pixel size decreased from 0.105 mm down to 0.035 mm, the
performance of their CADe scheme improved [42]. For detection of masses, pixel size is less important, because masses are typically 5 mm or larger in diameter. Therefore, images are usually reduced in size so that the pixels are approximately 0.4 mm. This reduces
the memory requirements and allows for reduced computation
time.
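This pixel-size reduction is typically a simple block average; a minimal sketch, assuming 4 × 4 averaging to take 0.1-mm pixels to 0.4 mm:

```python
import numpy as np

def block_average(img: np.ndarray, factor: int = 4) -> np.ndarray:
    """Reduce pixel size by an integer factor via block averaging
    (e.g., 0.1-mm -> 0.4-mm pixels for mass detection)."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor          # crop to a multiple of factor
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

# A hypothetical 1800 x 2400 image at 0.1-mm pixels becomes 450 x 600 at 0.4 mm:
img = np.random.rand(1800, 2400).astype(np.float32)
small = block_average(img, 4)
print(small.shape)   # (450, 600)
```

The 16-fold reduction in pixel count is what lowers the memory and computation requirements mentioned above.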
2.1.4. Noise
In FFDM, the image noise is proportional to the square root of
the X-ray exposure to the detector. At low exposures, however,
the electronic noise of the detector can be significant. This is
true for a linear system. If the FFDM records the log of the
measured exposure, then the noise is proportional to the inverse
of the square root of the X-ray exposure to the detector. In an
SF system, the noise is proportional to the inverse of the square
root of the detector exposure, but is modified by the slope of
the characteristic curve (shown in Fig. 2), so that it is decreased
at both high and low exposures. In addition, the film digitizer
adds noise to the digitized image, principally at high exposures.
The film is dark at high exposures, so that the amount of light
transmitted through the film is low. As a result, the electronic
noise of the film digitizer becomes significant, and the total noise
in the image increases.
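The stated scalings follow from Poisson counting statistics: for quantum-limited detection the photon count N has a standard deviation of √N, so a linear system's noise grows as the square root of exposure, whereas log(N) has a standard deviation of approximately 1/√N. A quick numerical check (assuming pure Poisson noise, no electronic noise):

```python
import numpy as np

rng = np.random.default_rng(0)

for mean_count in (100, 10_000):                  # relative exposures, 100x apart
    counts = rng.poisson(mean_count, size=200_000)
    sigma_linear = counts.std()                   # noise of a linear detector
    sigma_log = np.log(counts).std()              # noise after a log response
    print(mean_count, sigma_linear, sigma_log)

# sigma_linear scales like sqrt(exposure): ~10 at 100 counts, ~100 at 10,000.
# sigma_log scales like 1/sqrt(exposure): ~0.10 at 100 counts, ~0.01 at 10,000.
```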
Fig. 2. Characteristic curve for a full-field digital mammography system (left) and a digitized screen-film system (right).
Fig. 3. Flowchart of a generic CADe scheme.
Two general approaches are used for the automated detec-
tion of cancer on mammograms. The first approach is to apply
statistical classifiers, such as artificial neural networks [43] and
support vector machines [44,45], directly to the image data. The
image is divided into small regions of interest, typically 32 × 32 pixels. This produces approximately 50,000 non-overlapping ROIs per image with 100-µm pixels. Therefore, for reducing the number of false ROIs down to even five per image, the classifier must
be able to eliminate 99.99% of the false ROIs without appre-
ciably eliminating ROIs containing malignant lesions. This is
extremely difficult to achieve and this approach to automated
detection has not yet been successful.
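The 99.99% figure is simple arithmetic on the numbers above:

```python
rois_per_image = 50_000      # approximate non-overlapping 32 x 32 ROIs per image
target_false_rois = 5        # tolerable false ROIs per image

required_rejection = 1.0 - target_false_rois / rois_per_image
print(f"the classifier must reject {required_rejection:.2%} of false ROIs")  # 99.99%
```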
The second approach is outlined in Fig. 3. After a digital
mammogram is obtained, potential signals are identified. This
is usually accomplished by transforming of the image by use of
linear filters, morphologic operators, wavelets, and other means.
Next, thresholding is applied. The goal is to identify as many
true signals as possible without an excessive number of false
signals. For detection of calcification, the ratio of false to true
signals can be 100:1 or higher.
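A minimal version of this "transform, then threshold" step can be sketched with a difference-of-Gaussians band-pass filter, which responds to small bright spots such as calcifications. The filter widths and threshold below are illustrative, not those of any published scheme:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def candidate_signals(img: np.ndarray, sigma_small: float = 1.0,
                      sigma_large: float = 4.0, n_sigma: float = 4.0) -> np.ndarray:
    """Return a boolean mask of candidate calcification pixels:
    band-pass filter the image, then threshold at n_sigma above the mean."""
    bandpass = gaussian_filter(img, sigma_small) - gaussian_filter(img, sigma_large)
    threshold = bandpass.mean() + n_sigma * bandpass.std()
    return bandpass > threshold

# Synthetic example: flat noisy background plus one bright 2-pixel "calcification"
img = np.random.default_rng(0).normal(100.0, 1.0, (128, 128))
img[60:62, 60:62] += 50.0
mask = candidate_signals(img)
print(mask[60, 60], mask.sum())   # the spot is flagged, with few false candidates
```

A real scheme would follow this with segmentation and feature-based classification to prune the many false signals that survive thresholding.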
Once the signals have been identified, they are segmented
from the image. Many different techniques have been developed.
Most rely on thresholding of the image either in the transformed
space or in the acquired pixel value space. More sophisticated
methods have also been developed, such as a Markov ran-
dom field model [46]. In this approach, pixels in the image are
modeled as belonging to one of four classes: background, cal-
cification, lines/edge, and film emulsion errors. Three different
features are used in the model: local contrast at two different
spatial resolutions and the output of a line/edge detector.
Once signals have been segmented, features of the signals
are extracted and used in statistical classifiers to distinguish true
from false signals. Many different types of classifiers have been
Fig. 4. An illustration of the utility of FROC curves. Two points given by the cir-
cle and triangle represent the performance of two hypothetical CADe schemes.
If the two points lie on the same FROC curve (broken line), the two CADe
schemes have the same performance. If the two points lie on different curves
(solid lines), the curve closer to the upper left corner of the graph has the better
performance.
employed. A partial list includes simple thresholds [47], artifi-
cial neural networks [48], nearest neighbor methods [49], Fuzzy
logic [50], linear discriminant analysis [51], quadratic classifier
[52], Bayesian classifier [53], genetic algorithms [54], multi-
objective genetic algorithms [55], and support vector machines
[44,45].
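The final classification step can be sketched with a two-class Fisher linear discriminant trained on a few signal features (e.g., contrast and size); the synthetic features below are placeholders for whatever a real scheme extracts:

```python
import numpy as np

def fisher_lda(x_true: np.ndarray, x_false: np.ndarray) -> np.ndarray:
    """Fisher linear discriminant: weight vector separating true from false signals."""
    sw = np.cov(x_true.T) + np.cov(x_false.T)          # within-class scatter
    return np.linalg.solve(sw, x_true.mean(0) - x_false.mean(0))

rng = np.random.default_rng(0)
# Synthetic 2-feature signals: true detections have higher contrast and size
x_true = rng.normal([3.0, 2.0], 1.0, (200, 2))
x_false = rng.normal([0.0, 0.0], 1.0, (1000, 2))

w = fisher_lda(x_true, x_false)
scores_true = x_true @ w
scores_false = x_false @ w
cut = np.quantile(scores_false, 0.95)      # reject ~95% of false signals
sensitivity = (scores_true > cut).mean()
print(f"sensitivity at 95% false-signal rejection: {sensitivity:.2f}")
```

Any of the classifiers listed above (neural networks, support vector machines, etc.) could be swapped in for the discriminant here.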
2.2. Evaluation of CADe schemes
CADe schemes are typically evaluated by use of free-
response receiver operating characteristic (FROC) curves (see
Fig. 4). These are plots of sensitivity versus the average number
of false detections per image. Sensitivity is calculated in two
ways. The first method is calculation by case. A case consists of
two views of each breast, or four mammograms. Here, if a cancer
is detected by the computer in at least one view, it is considered
detected. The second method is calculation by image. Here, sensitivity is calculated based on each image; that is, if the computer
detects a cancer in only one of two views, the sensitivity is only
50%. The sensitivity by case is almost always higher than the
sensitivity by image. Sensitivity by case is often reported when
CADe is evaluated clinically, because it is assumed that, if the
computer detects a cancer in at least one view, the radiologist
will be able to locate the cancer in the other view, if neces-
sary. However, there is evidence that, if an overlooked cancer is
detected only in one view by the computer, it is likely that the
radiologist will not recognize the correct computer prompt [56].
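The two sensitivity definitions are easy to confuse; a small sketch with made-up per-view detection flags illustrates the difference:

```python
# Each cancer case has detection flags for two views
# (made-up data: True = the computer detected the cancer in that view).
cases = [
    (True, True),    # detected in both views
    (True, False),   # detected in one of two views
    (False, False),  # missed in both views
    (True, False),
]

# By case: a cancer counts as detected if found in at least one view.
sens_by_case = sum(any(views) for views in cases) / len(cases)

# By image: every view counts separately.
all_views = [v for views in cases for v in views]
sens_by_image = sum(all_views) / len(all_views)

print(sens_by_case, sens_by_image)   # 0.75 0.5
```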
FROC curves are more useful than just measuring a single
sensitivity and false-detection rate, which is essentially a sin-
gle point on the FROC curve. If one is comparing two different
CADe schemes, and one has a sensitivity of 80% with 0.1 false
detection per image and the other has 85% sensitivity with 0.5
false detections per image, it is not clear which system is better
(see Fig. 4). The two points could belong to the same FROC
curve, in which case they have the same performance. That is,
by “tuning” a CADe scheme, it is possible to obtain any sensitivity/false-detection rate on the curve.
Fig. 5. Effect of CADe scoring criteria on measured performance by use of FROC curves. The circle method scores a true positive if two computer-detected signals are within the smallest circle that encloses all actual microcalcifications. The centroid method scores a true positive if the centroid of the computer-detected cluster is within 6 mm of the centroid of the actual cluster and at least two actual microcalcifications are detected by the computer. The bounding-box method scores a true positive as follows. The smallest box that completely encloses all actual microcalcifications is drawn, and then the smallest box that completely encloses all computer-detected signals is drawn. The computer-detected cluster is scored as a true positive if any of the following conditions is true: (i) the computer-detected bounding box is entirely within the truth bounding box; (ii) the truth bounding box is completely within the computer-detected bounding box and the area of the computer-detected bounding box is no larger than twice that of the truth bounding box; or (iii) the center of the truth bounding box lies in the computer-detected bounding box, the center of the computer-detected bounding box lies within the truth bounding box, and the area of the computer-detected bounding box is no larger than twice that of the truth bounding box.
Alternatively, the two points
could belong to different curves, in which case the scheme with
the higher curve has a better performance. For comparing two
FROC curves, a statistical technique called JAFROC (jackknife
FROC) analysis can be used [57]. Free software for performing
such an analysis is available at http://www.devchakraborty.com/.
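An FROC curve is traced by sweeping the score threshold of a CADe scheme and recording, at each threshold, the sensitivity and the mean number of false detections per image. A minimal sketch with synthetic detection scores (the score distributions are invented for illustration):

```python
import numpy as np

def froc_points(scores_true, scores_false, n_images, thresholds):
    """(false detections per image, sensitivity) at each score threshold."""
    pts = []
    for t in thresholds:
        sens = np.mean(scores_true >= t)
        fp_per_image = np.sum(scores_false >= t) / n_images
        pts.append((fp_per_image, sens))
    return pts

rng = np.random.default_rng(0)
scores_true = rng.normal(2.0, 1.0, 100)      # scores of true cancers
scores_false = rng.normal(0.0, 1.0, 2000)    # scores of false candidates
pts = froc_points(scores_true, scores_false, n_images=200,
                  thresholds=np.linspace(-2, 5, 50))

# Lowering the threshold moves up the curve: more sensitivity, more false marks
for fp, sens in pts[::12]:
    print(f"{fp:5.2f} false marks/image at sensitivity {sens:.2f}")
```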
Comparing published results of different CADe schemes is problematic. Differences in the criteria used for scoring whether the computer detected a cancer, the database used for the evaluation, and differences in the way the CADe scheme was trained all can affect the measured performance. Fig. 5 shows the measured FROC curves for one CADe scheme evaluated by use of various scoring criteria, all of which have been used in published studies.
Fig. 6 shows the effect of the database on the measured FROC
curve. As expected, the easiest cases produce the highest per-
formance. Finally, bias can arise in the measured performance
depending on how the algorithm is trained and tested. If the
same cases are used for training and testing, there is a positive
bias, which can be very large. To avoid this, researchers often use
either bootstrapping or some type of jackknifing to train and test.
Recent studies show that the bootstrap method has advantages over jackknife methods based on cross-validation [58]. However,
it has been shown that, if the same cases are used for select-
ing features and for training a classifier by use of those selected
features, there again will be a positive bias.
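The training/testing bias is easy to demonstrate: a flexible classifier scored on its own training cases looks far better than it is. Below, a 1-nearest-neighbor classifier trained on pure-noise features scores 100% on the training set but near chance on bootstrap out-of-bag cases (a sketch, not any published evaluation protocol):

```python
import numpy as np

def nn1_predict(train_x, train_y, x):
    """1-nearest-neighbor prediction."""
    d = ((train_x[None, :, :] - x[:, None, :]) ** 2).sum(-1)
    return train_y[d.argmin(1)]

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5))                 # features with NO class information
y = rng.integers(0, 2, 200)                   # random labels

# Resubstitution: test on the training cases -> perfect (biased) accuracy
resub = (nn1_predict(x, y, x) == y).mean()

# Bootstrap: train on a resample, test on out-of-bag cases -> chance accuracy
idx = rng.integers(0, 200, 200)
oob = np.setdiff1d(np.arange(200), idx)
boot = (nn1_predict(x[idx], y[idx], x[oob]) == y[oob]).mean()

print(resub, round(boot, 2))   # 1.0, and roughly 0.5
```

The same positive bias arises, more subtly, when feature selection is performed on the full data set before the train/test split.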
Whereas an FROC curve characterizes the performance of a CADe scheme, when the scheme is used clinically a single operating point on the curve is chosen.
Fig. 6. Effect of database on the performance of CADe algorithms. The performance of the computer detection scheme was tested with three different databases: “easy”, “altered-easy”, and “difficult”. Each is a subset of 50 pairs of mammograms from a larger database of 90 pairs. Whereas the “easy” and “difficult” databases have only 10 pairs of images in common, the “easy” and “altered-easy” databases are identical except for 10 pairs.
There is no accepted method for
choosing the operating point. The choice is generally based on
the perceived tradeoff between sensitivity and the false-detection
rate. Usually, the operating point is chosen to give the highest
clinically acceptable false-detection rate. Because the highest clinically acceptable false-detection rate is not known, there can
be differences in the selection of the operating point depending
on who is making the choice.
In general, clinical CADe systems have high sensitivity. Commercial systems have reported sensitivities of 98% for clustered microcalcifications and 85% for masses, which are comparable to or exceed the sensitivity of most radiologists. Therefore, it is
possible for CADe systems to detect cancers. The difficulty is to
achieve this at a low false-detection rate. False detections reduce
radiologists’ productivity because the radiologists must spend
time reviewing all computer detections. If the false-detection rate is high (greater than approximately one per image for an exam with four images), then a radiologist who is trying to read as efficiently as possible may choose to ignore all of the computer prompts rather than spend the time to review multiple prompts that are most likely all false, because in screening the cancer prevalence is typically 0.5%. Radiologists typically recall between 5 and 10% of women screened, which means that there are approximately 0.012–0.05 false detections per image.
Current CADe schemes have between 0.1 and 0.5 false detec-
tions per image, an order of magnitude higher than those of
radiologists.
In the detection of clustered microcalcifications, most false
detections are caused by benign calcifications, most often cal-
cified vessels (see Fig. 7). It can be difficult to distinguish
calcified vessels from malignant calcifications that appear in
a linear pattern. In the detection of masses, most false detec-
tions are caused by superposition of tissue and benign lesions,
the same causes of false detections by radiologists. However,
even though the causes are the same, the same false detections
are not found in each image by the computer and the radiologist.
Fig. 7. A mammogram with calcified vessels, which lead to multiple false detections by the computer.
Thus, if the radiologist cannot reliably distinguish computer false detections from computer-prompted cancers, CADe could
preferentially increase the radiologist’s recall rate (the fraction
of women considered to have an abnormal screening mam-
mogram). The difficulty for radiologists to distinguish actual
cancers from false lesions implies that actual cancers and false
lesions appear similar visually. Therefore, it is also difficult for
the computer accurately to separate cancers from false detec-
tions. As a result, the false-detection rate for mass detection is
higher than that for detection of calcifications.
2.3. Clinical effectiveness of CADe
The goal of CADe is not the detection of cancer. The goal
is to help radiologists avoid overlooking a cancer that is visible
in a mammogram. Thus, whereas a high CADe performance is
good, in theory it is neither a necessary nor a sufficient condition
for CADe to be successful clinically. Therefore, it is possible for
a CADe scheme to have a sensitivity of less than 50% and still be
a useful aid. The computer, in theory, only needs to prompt those
cancers that the radiologist missed, because the radiologist gains
no advantage from the computer prompting cancers that he or
she has already detected. However, from a practical viewpoint,
if the computer misses too many cancers that the radiologist has
detected, the radiologist will lose confidence in the ability of the
computer to detect cancers and CADe will not be an effective
aid.
The two necessary conditions for CADe to be successful are:
(1) The computer is able to detect cancers that the radiologist
misses.
(2) The radiologist must be able to recognize when the computer
has detected a missed cancer.
There is good evidence in the literature that CADe can
detect clinically missed cancers. Several studies have shown
that between 50 and 77% of missed cancers can be detected by
CADe. In these studies, previous mammograms from women
with a screen-detected cancer are reviewed for signs that a can-
cer was visible in an earlier mammogram. These missed cancers
are collected and subjected to CADe. Whereas detecting 77% of
the missed cancers is a large fraction of the misses, not all com-
puter prompts are “actionable”. In women with mammograms
that appear “lumpy”, there are many areas that resemble cancer.
A small and subtle cancer cannot be reliably detected by a radi-
ologist in the presence of multiple similar lumps. Therefore, a
computer prompt in this situation is unlikely to prevent a radiol-
ogist from missing a cancer. Thus, it is important to determine
radiologists’ ability to distinguish computer prompts for cancer
from computer prompts for false lesions.
Four observer studies have been performed for measuring
the benefits of CADe. The first two studies showed a statistically
significant improvement in radiologists’ performance when they
used CADe [30,59–61]. These were small studies and were con-
ducted in such a manner as to produce a bias in favor of using
CADe. In the study by Chan et al., a time limit was given to
reading the images, and radiologists were shown only a single image, instead of the four images that are standard in most screening exams. Nevertheless, this is the seminal paper in the field,
and it launched renewed interest in computer analysis of mam-
mograms [30]. The study by Kegelmeyer et al. looked only at
spiculated lesions, and the CADe scheme had 100% sensitiv-
ity [61]. The two more recent studies were much larger than
the first. The study by Taylor et al. did not show a statistically
significant improvement in radiologists’ performance as mea-
sured in terms of sensitivity and specificity [59]. The sensitivity
increased from 0.78 to 0.81 with 95% CI for a difference of
[−0.003, 0.064], and the specificity increased from 0.86 to 0.87
with 95% CI for a difference of [−0.003, 0.034]. These values are close to significance, and one can speculate whether, had data been collected to allow an ROC analysis to be performed, there would have been a statistically significant increase in the area under the ROC curves, since ROC experiments have substantially higher statistical power than sensitivity and specificity calculations.
The fourth observer study was performed by Gilbert et al.,
called the CADET study (computer aided detection evaluation
trial) [60]. This was a very large study (10,267 cases containing
236 cancers) with eight readers. They obtained several important
results. First, radiologists need to be trained with a large number
of cases, at least 400 in their study, before they use CADe consistently [62]. That is, their recall rate when CADe was used
decreased as the training increased up to 400 cases. They were
able to show that, compared to double reading by two radiolo-
gists, when using CADe radiologists detected 49.1% of cancer
cases, whereas only 42.6% were found by double reading. The
sensitivity was low in this study because, in addition to the 236
cancers detected in the time frame from which the cases were
collected, there were an additional 85 patients who developed
breast cancer after the study period.
2.4. Clinical requirements
There is good evidence that CADe can detect cancers missed
by radiologists and that radiologists can use CADe to find more
cancers, at least as noted in observer studies. These results can
be extrapolated to clinical effectiveness, but there are limitations. In observer studies, the goal is to simulate clinical reading
conditions, and yet there are differences. The greatest differ-
ence is that, in an observer study, the radiologist’s interpretation
has no effect on patient management. Thus, radiologists are not
under the same clinical pressure in an observer study. Also, the
cancer prevalence is usually significantly higher in an observer
study, typically 25–50% compared to 0.5% in clinical practice.
Therefore, clinical studies must be performed for assessment of
the clinical effectiveness of CADe. Seven such studies have been published; these are summarized in Table 1. Overall, the average
increase in cancers detected when using CADe is approximately
10%. This is comparable to the increase in the cancer detection
rate from double reading by two radiologists [63–65].
The first published clinical evaluation of CADe was done
by Freer and Ulissey [66]. They found a 19.6% increase in the
number of cancers detected when CADe was used. The second
published clinical evaluation was performed by Gur et al. [67], who found that the cancer detection rate increased only from
3.49 to 3.55 per 1000 women screened. Feig et al. did a reanaly-
sis of the Gur data and found that the low-volume readers had a
19.7% increase in the cancer detection rate, but the high-volume
readers had a 3.2% decrease [68]. These two studies are impor-
tant because they represent the two different methods used for
measuring the effectiveness of CADe in screening mammography. The Freer study was a cross-sectional study; that is, data were collected sequentially. The radiologist first reads the image
without CADe and renders an opinion. He or she then examines
the computer results and renders a new opinion if necessary.
As a result, the effectiveness of CADe is determined patient by
patient, and the number of extra cancers detected because CADe
was used can be computed. The Gur study, on the other hand,
was a longitudinal study, that is, historical or temporal compar-
isons were made. The cancer detection rate can be compared
between two time periods, one before CADe was implemented
clinically and the other after CADe was implemented. In this
method, the effectiveness of CADe is determined by the change
in the cancer detection rate.
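The two endpoints can be stated concretely. The following sketch applies them to the numbers reported in Table 1 (the function names are illustrative):

```python
def cross_sectional_gain(unaided_cancers, aided_cancers):
    """Cross-sectional endpoint: extra cancers found when the same
    cases are re-read with CADe, relative to the unaided count."""
    return (aided_cancers - unaided_cancers) / unaided_cancers

def longitudinal_gain(cancers_before, screens_before,
                      cancers_after, screens_after):
    """Longitudinal endpoint: change in the cancer detection rate
    between the pre-CADe and post-CADe periods."""
    rate_before = cancers_before / screens_before
    rate_after = cancers_after / screens_after
    return (rate_after - rate_before) / rate_before

# Freer and Ulissey (cross-sectional): 41 cancers unaided, 49 with CADe
print(f"{cross_sectional_gain(41, 49):.1%}")                # 19.5%

# Gur et al. (longitudinal): 197/56,432 before vs 210/59,139 after
print(f"{longitudinal_gain(197, 56432, 210, 59139):.1%}")   # 1.7%
```

Note that the two endpoints are not interchangeable: the cross-sectional figure isolates the radiologist-plus-computer difference within the same patients, while the longitudinal figure also absorbs any change in the screened population between the two periods.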
Overall, there is a large range in the measured increase in cancers
detected by use of CADe, in part for two reasons. The first is
that two different methodologies were used for measuring the
clinical effectiveness of CADe. The second is that, although a
large number of women may have been screened in a given study,
the number of cancers in the population is small, typically 5 per
1000 women screened, and therefore the statistical uncertainty
in the cancer detection rate is large, large enough to account for the
apparent variation.
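A rough estimate shows how large this uncertainty is if the number of detected cancers is treated as a Poisson count (a standard approximation; the 5-per-1000 rate is the figure quoted above):

```python
import math

def count_relative_sd(n_screens, cancers_per_1000=5.0):
    """Relative standard deviation of the number of cancers detected,
    treating detections as a Poisson count with mean
    n_screens * cancers_per_1000 / 1000."""
    expected = n_screens * cancers_per_1000 / 1000.0
    return 1.0 / math.sqrt(expected)  # sqrt(N)/N for a Poisson count

# The Freer study screened 12,860 women, so ~64 cancers are expected
print(f"{count_relative_sd(12860):.1%}")  # 12.5%
```

At that study size the statistical noise is roughly 12%, which is comparable to the approximately 10% effect being measured, so large study-to-study differences are expected from counting statistics alone.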
To examine these two effects, I previously developed a Monte
Carlo simulation of CADe in screening mammography [69].
The flowchart for the simulation model is shown in Fig. 8. Can-
cers are assumed to grow exponentially, with a volume doubling
time of 157 days. Once the cancer is greater than the detection
threshold, assumed to be 0.5 cm, it can be detected in one of
three ways. First, it can be detected by non-mammographic means,
such as palpation; these are considered to be interval cancers,
which are assumed to occur in 15% of cancers. Second, it can
be detected by the radiologist without the help of CADe. These
are assumed to constitute 85% of non-interval cancers. Third,
it can be detected by the radiologist with the help of CADe.
These are assumed to include 75% of the cancers missed by the
radiologist.
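A minimal Python sketch of this part of the model is shown below. The detection probabilities and the 157-day doubling time are those stated above; the 0.1 cm starting diameter in `diameter_cm` is an illustrative assumption, not a parameter of the published model:

```python
import random

# Parameters stated in the text
P_INTERVAL = 0.15      # fraction presenting clinically between screens
P_RADIOLOGIST = 0.85   # fraction of remaining cancers found without CADe
P_CADE_RESCUE = 0.75   # fraction of radiologist misses prompted by CADe

def diameter_cm(t_days, d0_cm=0.1, doubling_days=157.0):
    """Tumor diameter under exponential volume growth; diameter grows
    as the cube root of volume, so it doubles every three volume
    doublings. The 0.1 cm starting diameter is illustrative."""
    return d0_cm * 2.0 ** (t_days / doubling_days / 3.0)

def detection_route(rng):
    """Assign one detectable (>0.5 cm) cancer to a detection route."""
    if rng.random() < P_INTERVAL:
        return "interval"      # found clinically, e.g. by palpation
    if rng.random() < P_RADIOLOGIST:
        return "radiologist"   # found at screening without CADe
    if rng.random() < P_CADE_RESCUE:
        return "cade"          # radiologist miss recovered by the prompt
    return "missed"            # not found at this screening round

rng = random.Random(0)
routes = [detection_route(rng) for _ in range(100_000)]
for route in ("interval", "radiologist", "cade", "missed"):
    print(route, routes.count(route) / len(routes))
```

With these parameters the expected split is 15% interval cancers, about 72% detected by the radiologist alone, about 10% rescued by CADe, and about 3% missed.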
Two different conditions were simulated. The first was an
idealized situation in which all cancers grow at the same rate.
Table 1
Summary of seven clinical studies of CADe

Study                            Screened (unaided/aided)   Cancers detected (unaided/aided)   % Change
Longitudinal studies
  Gur et al. [67]                56,432 / 59,139            197 / 210                          +1.7
  Feig et al. (high volume) [68] 44,629 / 37,500            161 / 131                          −3.2
  Feig et al. (low volume) [68]  11,803 / 21,639            36 / 79                            +19.7
  Cupples et al. [70]            7,872 / 19,402             29 / 83                            +16.1
Cross-sectional studies
  Freer and Ulissey [66]         N/A / 12,860               41 / 49                            +19.5
  Birdwell et al. [73]           N/A / 8,692                27 / 29                            +7.4
  Helvie et al. [74]             N/A / 2,389                10 / 11                            +10.0
  Khoo et al. [75]               N/A / 6,111                116 / 118                          +1.7

Fig. 8. Flowchart of simulation model.

There were 125 women who developed cancer each year. We
then repeated the simulation 100 times and averaged the results.
This produces outcomes with very little statistical variation. The
second was a more realistic situation in which 125 women devel-
oped cancer each year, there was a log-normal distribution of
growth rates with a median of 157 days and a standard deviation
of 90 days, and there was only a single run (i.e., no averaging
over multiple repeated simulations). When the growth rate is
the same for each cancer, there is the same number of detectable
cancers in the patient population each year. With a spectrum of
growth rates, there is a large variation in the number of detectable
cancers present each year. As a result, this leads to a large vari-
ability in the number of screening-detected cancers from year
to year. This can be seen by comparing Fig. 9a and b. If the
cross-sectional method were used for assessment of the benefits
of CADe, the result would depend upon the year the data were
collected, because there is variation in the lower curve of Fig. 9a.
If we were to use the longitudinal method, it is likely that we
would measure only a very small change in the cancer detection
rate, because as shown in Fig. 9b, the actual difference is very
small.

Fig. 9. Results of simulation of CADe in screening mammography: number of
cancers detected per year in a screening population as a function of time. CADe
is introduced in year 20. (a) All cancers grow with the same doubling time,
and the curves are averaged over 100 trials. (b) A single trial result is shown,
and the cancers have a distribution of doubling times. The horizontal lines with
arrowheads indicate two time periods to compare the benefits of CADe for
historical comparison (longitudinal method) and the vertical line indicates the
difference in the radiologist's cancer detection with and without computer aid
(cross-sectional method).

If we were to repeat the realistic simulation, we would get a
different result. This result can vary greatly from the one shown
because it is possible to measure a decrease in the cancer
detection rate when using the longitudinal method [69]. There-
fore, the large variation in the results of the clinical studies is
not unexpected.
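This year-to-year variability can be reproduced with a short sketch. The rule used here, that a cancer remains screen-detectable for a time proportional to its volume-doubling time, is a simplifying assumption for illustration, not the detection model of [69]:

```python
import math
import random

def yearly_detectable(n_years=40, incidence=125,
                      median_dt=157.0, sd_dt=90.0, seed=1):
    """Number of cancers above the detection threshold present in each
    year, assuming each cancer stays in the detectable window for a
    time proportional to its volume-doubling time (slow growers
    linger longer in the screening pool)."""
    rng = random.Random(seed)
    mu = math.log(median_dt)
    # Choose sigma so the log-normal SD equals sd_dt:
    # SD^2 = median^2 * e^s (e^s - 1), with s = sigma^2
    r = (sd_dt / median_dt) ** 2
    sigma = math.sqrt(math.log((1.0 + math.sqrt(1.0 + 4.0 * r)) / 2.0))
    counts = [0] * n_years
    for year in range(n_years):
        for _ in range(incidence):
            # avoid exp(log(m)) round-off when sigma == 0
            dt = rng.lognormvariate(mu, sigma) if sigma > 0 else median_dt
            dwell = dt / median_dt  # detectable window, in years
            t, y = 0.0, year
            while t < dwell and y < n_years:
                counts[y] += 1
                y += 1
                t += 1.0
    return counts

fixed = yearly_detectable(sd_dt=0.0)   # every cancer grows alike
varied = yearly_detectable()           # log-normal doubling times
print("fixed growth:  min", min(fixed), "max", max(fixed))
print("varied growth: min", min(varied), "max", max(varied))
```

With a single doubling time the detectable pool is identical every year; with a distribution of doubling times the pool fluctuates from year to year, which is exactly what makes a small CADe effect hard to measure with a historical comparison.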
These simulation results may indicate that the cross-sectional
method is a better method to use. However, both methods have
strengths and weaknesses, as listed in Table 2. Although the list
of weaknesses for the longitudinal method is longer, the potential
for large positive or negative biases in the cross-sectional method
makes it difficult to interpret the results from such studies. The
three sources of variation listed in Table 2 for the longitudinal
method imply that there can be large variations in the results
between studies (compare the Gur and Cupples studies). This
makes it extremely difficult to measure the small difference in
the cancer detection rate that actually exists. Therefore, clinically,
the longitudinal method will not be effective in measuring the
benefits of using CADe. As discussed in another paper, the change
in the size of the cancer may be a better endpoint [69]. A change
in cancer stage when CADe was used was found in one of the
clinical studies [70].

Table 2
Strengths and weaknesses of two different methods for measuring the clinical effectiveness of CADe

Cross-sectional study
  Method: sequential reading of each patient without and then with CADe.
  Outcome: change in the number of cancers detected.
  Strengths: straightforward to implement; not subject to the variations apparent in the longitudinal method.
  Weaknesses: subject to potentially large positive and negative biases; not possible to determine which type of bias is present.

Longitudinal study
  Method: temporal comparison of two groups of patients, one read without and one read with CADe.
  Outcome: change in the cancer detection rate.
  Strengths: possible to conduct large studies; not subject to the biases that may be present in the cross-sectional method.
  Weaknesses: subject to variation in the number of prevalence screens, which is difficult to control for; subject to variation in radiologists' ability to read mammograms; subject to variation in the number of cancers in the screening population from year to year; the CADe effect on the cancer detection rate is small.
2.5. Improving CADe performance
There is good evidence that radiologists often ignore CADe
prompts of actual cancers. The reason for this is not well under-
stood. One possibility is that the false-detection rate is too high
and radiologists tend to dismiss computer prompts or pay less
attention when an image has many prompts. Generally, radiol-
ogists like using CADe for clustered calcifications, but are less
enthusiastic about CADe for masses. The obvious difference
is that clustered calcification algorithms have a better perfor-
mance than do those for masses. However, there are less obvious
reasons.
There is very little structure in the breast that can mimic
clustered calcifications. Therefore, false calcification prompts
are either due to benign calcifications or obviously not to cal-
cifications. Radiologists, in general, can evaluate calcification
prompts quickly. On the other hand, the superposition of over-
lapping normal tissues can produce a pseudo-lesion in the
mammogram. This is a very common occurrence, and it is some-
times difficult for radiologists to determine whether a lesion is
real or is merely overlapping tissue. Approximately 30% of all
screening mammograms classified as abnormal are due to the
superposition of tissue. One technique that radiologists use to
determine whether an apparent lesion is real or not is to deter-
mine whether the lesion is visible in both views. If it is, then it
is highly likely to be an actual lesion. If it is not, it may or may
not be an actual lesion. Radiologists will then compare the area
containing the apparent lesions with other regions within the
mammogram. If the pattern where the apparent lesion is located
is similar to patterns elsewhere in the mammogram, then it is
likely that the apparent lesion is not real. The radiologist also
compares the current films to previous films to see whether the
apparent lesion is new or has changed over time.
CADe schemes need to use this approach to reduce the false-
detection rate. It has been shown that observers improved their
performance when using context information for classification
of false-positive and true-positive regions. That is, better dis-
crimination was achieved when radiologists looked at a whole
image rather than a small ROI around the lesion [71]. Furthermore,
radiologists are better than computers at this task.
There are several approaches to correlating views either
within the same examinations or between examinations done
at different times. One approach is to transform one image to
match the corresponding image taken at a previous time, or an
image of the opposite breast. Using geometry based on patient
positioning, possible match pairs of detections are determined
[72]. Then, using features of the match pairs, radiologists devel-
oped a matching pair score. This score allowed corresponding
pairs to be determined.
Another approach is to extract features of CADe detections
and compare detections from different views of the same breast
or the same view taken at different times to match the detections
between views.
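In outline, such feature-based matching can be sketched as follows. The features, weights, and threshold here are illustrative, not those of any published system; distance to the nipple is used because it is roughly preserved between the CC and MLO views:

```python
import math

# Hypothetical per-detection features:
# (distance_to_nipple_mm, area_mm2, mean_contrast)

def pair_score(det_a, det_b, weights=(1.0, 0.05, 10.0)):
    """Weighted Euclidean distance in feature space between two
    detections from different views; lower score means the pair is
    more likely the same physical lesion. Weights are illustrative."""
    return math.sqrt(sum(w * (a - b) ** 2
                         for w, a, b in zip(weights, det_a, det_b)))

def best_matches(dets_cc, dets_mlo, threshold=15.0):
    """Greedily pair each CC detection with its best-scoring MLO
    detection; pairs scoring above the threshold are left unmatched,
    which flags the detection as a possible false positive."""
    matches = []
    for i, a in enumerate(dets_cc):
        score, j = min((pair_score(a, b), j)
                       for j, b in enumerate(dets_mlo))
        if score < threshold:
            matches.append((i, j, score))
    return matches

cc  = [(62.0, 110.0, 0.30), (31.0, 45.0, 0.12)]
mlo = [(60.0, 120.0, 0.28), (95.0, 30.0, 0.05)]
print(best_matches(cc, mlo))
```

In this toy example only the first CC detection finds a plausible partner in the MLO view; a real system would learn the score from training data rather than fix the weights by hand.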
2.6. CADe as a first reader
One of the most demanding and time-consuming aspects of
reading a screening mammogram is to find clustered microcalci-
fications. This is because microcalcifications can be very small,
a few tenths of a millimeter. Radiologists typically use a magni-
fying glass and carefully examine all areas of each of four film
mammograms. On a digital system, electronic zoom is used.
Given that CADe for clustered calcifications has a high sen-
sitivity, approximately 98%, as radiologists gain confidence in
the computer’s ability to find clustered calcifications, the need
to search the image with a magnifying glass may be reduced
to the point where the radiologist relies on the computer to
detect these calcifications. This would allow radiologists just to
check the computer-detected clusters of calcification and then
read the mammograms for mass lesions. This should improve
radiologists’ productivity and reduce reading fatigue.
2.7. CADe and picture archiving and communication
system (PACS)
As mammography migrates from film to digital acquisition,
it becomes important for CADe to be integrated with PACS.
Proper integration of CADe and PACS is critical for CADe to
be used as a tool to increase productivity. Digital images need
to be sent to the CADe server, where the actual CADe algo-
rithms are run; then the output of the CADe schemes needs to
be stored with the images so that they are available for review.
If the digital images are printed and viewed on light boxes, then
the method used for reviewing the CADe output with film mam-
mography can still be used. If the digital images are viewed on
soft-copy monitors, the CADe prompts must be displayed as
an overlay on the digital mammogram. In either case, a mecha-
nism is needed for storing, transmitting, and displaying the CADe
marks. This can be done by use of the structured report feature
of DICOM. DICOM stands for Digital Imaging and Commu-
nications in Medicine (http://medical.nema.org/). DICOM is a
set of standards for handling, storing, printing, and transmitting
information in medical imaging. It is a global standard used by
virtually all medical imaging enterprises.
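Conceptually, the mark information carried by such a structured report is a small content tree: a container of findings, each with coded meaning, an image reference, and graphic coordinates. The sketch below uses plain Python dictionaries with illustrative field names, not actual DICOM attributes or tags, purely to show the hierarchy:

```python
import json

# A single CAD mark as a nested content tree, loosely mirroring the
# container -> finding -> coordinates/image-reference structure of a
# DICOM Structured Report. All field names and the UID are placeholders.
cad_report = {
    "value_type": "CONTAINER",
    "concept": "CAD Processing and Findings Summary",
    "content": [
        {
            "value_type": "CODE",
            "concept": "Finding",
            "value": "Calcification cluster",
            "content": [
                {   # where the mark should be drawn
                    "value_type": "SCOORD",
                    "graphic_type": "POINT",
                    "graphic_data": [1024.0, 768.0],  # column, row (pixels)
                },
                {   # which image the coordinates refer to
                    "value_type": "IMAGE",
                    "referenced_sop_instance_uid": "1.2.840.0.0.0.1",  # placeholder
                },
            ],
        },
    ],
}

# Any workstation that can parse the tree can redraw the marks as an
# overlay on the referenced image.
print(json.dumps(cad_report, indent=2))
```

The key design point is that the marks are stored as structured data tied to image references, rather than burned into the pixels, so the same report can be rendered by any compliant review workstation.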
Although CAD companies have embraced DICOM structured
reports as a method for storing and retrieving information about
the output of CAD analyses, not all PACS companies currently
are able to utilize structured reports. As a result, those systems
cannot store and retrieve CAD output. Further, one of the
strong features of DICOM is its flexibility, but this flexibility is
also a drawback in terms of integrating different systems. For example, a
woman may have a digital mammogram taken on system A,
but her previous mammograms were taken on system B. If the
review workstation that is being used cannot display mammo-
grams from both system A and system B, there will be a problem
for the radiologist. This can occur even though both systems
use DICOM because, in addition to the standard features that all
systems use, each system can specify added features,
and these may differ from one system to another. Even if both
images can be displayed, they may be displayed differently (e.g.,
they may have different image-processing techniques applied to
them). This is very problematic. This is not a problem for CADe
display per se, but integration of CADe must occur in this envi-
ronment. Fortunately, a consortium of industry, radiologists, and
informaticists is developing a superset of standards for DICOM
that will allow full integration of all of the systems involved
in digital mammography (PACS, acquisition hardware, display
hardware, and CAD). This consortium is called IHE (Integrat-
ing the Healthcare Enterprise: http://www.ihe.net/). The goal of
IHE is to standardize many of the optional features of standard
DICOM. This should allow true compatibility among all of the
components necessary for digital mammography to work seam-
lessly in the clinic, allowing radiologists to work productively
and use CADe routinely.
3. Summary
The concept of computer-aided detection (CADe) was intro-
duced more than 50 years ago; however, only in the last 20 years
have there been serious and successful attempts at developing
CADe for mammography. CADe schemes have high sensitivity,
but poor specificity compared to radiologists. CADe has been
shown both in observer studies and in clinical evaluations to help
radiologists find more cancers. Recent clinical studies indicate
that CADe increases the number of cancers detected by approx-
imately 10%, which is comparable to double reading by two
radiologists. However, it is difficult to measure the clinical ben-
efits of CADe because of variability in the number of cancers
present in the screened population from year to year. Further-
more, the actual increase in the cancer detection rate is very
small, and yet a radiologist can reduce his or her missed cancer
rate by using CADe. Finally, one important goal of CADe is to
improve radiologists’ productivity. To accomplish this goal, it
is important to incorporate CADe seamlessly into the clinical
workflow. This goal can be achieved by careful integration of
CADe into the clinical PACS.
References
[1] American Cancer Society. Cancer facts and figures 2006. Atlanta, GA:
American Cancer Society; 2006.
[2] Anderson I, Aspegren K, Janzon L, et al. Mammographic screening and
mortality from breast cancer: the Malmo mammographic screening trial.
Br Med J 1988;297:943–8.
[3] Shapiro S, Venet W, Strax PH, Venet L, Roeset R. Ten to fourteen-year effect
of screening on breast cancermortality. J Natl Cancer Inst 1982;69:349–55.
[4] Tabar L, Yen MF, Vitak B, Tony Chen HH, Smith RA, Duffy SW.
Mammography service screening and mortality in breast cancer patients:
20-year follow-up before and after introduction of screening. Lancet
2003;361(9367):1405–10.
[5] Berry DA, Cronin KA, Plevritis SK, et al. Effect of screening and
adjuvant therapy on mortality from breast cancer. N Engl J Med
2005;353(17):1784–92.
[6] Pisano ED, Gatsonis C, Hendrick E, et al. Diagnostic performance of dig-
ital versus film mammography for breast-cancer screening. N Engl J Med
2005;353(17):1773–83.
[7] Andersson I. What can we learn from interval carcinomas? Recent Results
Cancer Res 1984;90:161–3.
[8] Frisell J, Eklund G, Hellstrom L, Somell A. Analysis of interval breast
carcinomas in a randomized screening trial in Stockholm. Breast Cancer
Res Treat 1987;9:219–25.
[9] Harvey JA, Fajardo LL, Innis CA. Previous mammograms in patients with
impalpable breast carcinoma: retrospective versus blinded interpretation.
Am J Roentgenol 1993;161:1167–72.
[10] Holland T, Mrvunac M, Hendriks JHCL, Bekker BV. So-called inter-
val cancers of the breast. Pathologic and radiographic analysis. Cancer
1982;49:2527–33.
[11] Ma L, Fishell E, Wright B, Hanna W, Allen S, Boyd NF. A controlled
study of the factors associated with failure to detect breast cancer by
mammography. J Natl Cancer Inst 1992;84:781–5.
[12] Martin JE, Moskowitz M, Milbrath JR. Breast cancers missed by mam-
mography. Am J Roentgenol 1979;132:737–9.
[13] Peeters PHM, Verbeek ALM, Hendriks JHCL, Holland R, Mrvunac M,
Vooijs GP. The occurrence of interval cancers in the Nijmegen screening
programme. Br J Cancer 1989;59:929–32.
[14] Smith-Bindman R, Chu PW, Miglioretti DL, et al. Comparison of screening
mammography in the United States and the United Kingdom. J Am Med
Assoc 2003;290:2129–37.
[15] Boone JM, Kwan AL, Yang K, Burkett GW, Lindfors KK, Nelson TR.
Computed tomography for imaging the breast. J Mammary Gland Biol
Neoplasia; 2006.
[16] Boone JM, Nelson TR, Lindfors KK, Seibert JA. Dedicated breast CT:
radiation dose and image quality evaluation. Radiology 2001;221(3):
657–67.
[17] Chen B, Ning R. Cone-beam volume CT breast imaging: feasibility study.
Med Phys 2002;29(5):755–70.
[18] Niklason LT, Christian BT, Niklason LE, et al. Digital tomosynthesis in
breast imaging. Radiology 1997;205(2):399–406.
[19] Chan HP, Wei J, Sahiner B, et al. Computer-aided detection system for
breast masses on digital tomosynthesis mammograms: preliminary experi-
ence. Radiology 2005;237(3):1075–80.
[20] Reiser I, Nishikawa RM, Giger ML, et al. Computerized mass detection
for digital breast tomosynthesis directly from the projection images. Med
Phys 2006;33(2):482–91.
[21] Barlow WE, Chi C, Carney PA, et al. Accuracy of screening mammog-
raphy interpretation by characteristics of radiologists. J Natl Cancer Inst
2004;96(24):1840–50.
[22] Elmore JG, Nakano CY, Koepsell TD, Desnick LM, D’Orsi CJ,
Ransohoff DF. International variation in screening mammography inter-
pretations in community-based programs. J Natl Cancer Inst 2003;95(18):
1384–93.
[23] Beam CA, Layde PM, Sullivan DC. Variability in the interpretation of
screening mammograms by US radiologists. Findings from a national
sample. Arch Intern Med 1996;156(2):209–13.
[24] Elmore JG, Wells CK, Lee CH, Howard DH, Feinstein AR. Variabil-
ity in radiologists’ interpretations of mammograms. N Engl J Med
1994;331(22):1493–9.
[25] Getty DJ, Pickett RM, D’Orsi CJ, Swets JA. Enhanced interpretation of
diagnostic images. Invest Radiol 1988;23:240–52.
[26] Horsch K, Giger ML, Vyborny CJ, Lan L, Mendelson EB, Hendrick RE.
Classification of breast lesions with multimodality computer-aided diag-
nosis: observer study results on an independent clinical data set. Radiology
2006;240(2):357–68.
[27] Huo Z, Giger ML, Vyborny CJ, Metz CE. Effectiveness of computer-aided
diagnosis—observer study with independent database of mammograms.
Radiology 2002;224:560–8.
[28] Jiang Y, Nishikawa RM, Schmidt RA, Metz CE, Giger ML, Doi K. Improv-
ing breast cancer diagnosis with computer-aided diagnosis. Acad Radiol
1999;6(1):22–33.
[29] Winsberg F, Elkin M, Macy J, Bordaz V,WeymouthW. Detection of radio-
graphic abnormalities in mammograms by means of optical scanning and
computer analysis. Radiology 1967;89:211–5.
[30] Chan H-P, Doi K, Vyborny CJ, et al. Improvement in radiologists’ detec-
tion of clustered microcalcifications on mammograms: the potential of
computer-aided diagnosis. Invest Radiol 1990;25(10):1102–10.
[31] Kobayashi T, Xu X-W, MacMahon H, Metz CE, Doi K. Effect of
a computer-aided diagnosis scheme on radiologists’ performance in
detection of lung nodules on radiographs. Radiology 1996;199:843–
8.
[32] Abe K, Doi K, MacMahon H, et al. Computer-aided diagnosis in chest
radiography: analysis of results in a large clinical series. Invest Radiol
1993;28:987–93.
[33] Abe H, MacMahon H, Engelmann R, et al. Computer-aided diagnosis in
chest radiography: results of large-scale observer tests at the 1996-2001
RSNA scientific assemblies. Radiographics 2003;23(1):255–65.
[34] Abe H, Ashizawa K, Li F, et al. Artificial neural networks (ANNs) for
differential diagnosis of interstitial lung disease: results of a simulation test
with actual clinical cases. Acad Radiol 2004;11(1):29–37.
[35] Shiraishi J, Abe H, Engelmann R, Aoyama M, MacMahon H, Doi K.
Computer-aided diagnosis for distinction between benign and malignant
solitary pulmonary nodules in chest radiographs: ROC analysis of radiol-
ogists’ performance. Radiology 2003;227:469–74.
[36] Malich A, Fischer DR, Bottcher J. CAD for mammography: the
technique, results, current role and further developments. Eur Radiol
2006;16:1449–60.
[37] Karssemeijer N. Detection of masses in mammograms. In: Strickland RN,
editor. Image-processing techniques in tumor detection. New York, NY:
Marcel Dekker Inc.; 2002. p. 187–212.
[38] Nishikawa RM. Detection of microcalcifications. In: Strickland RN, editor.
Image-processing techniques in tumor detection. New York, NY: Marcel
Dekker Inc.; 2002. p. 131–53.
[39] Giger ML, Huo Z, Kupinski MA, Vyborny CJ. Computer-aided diagno-
sis in mammography. In: Sonka M, Fitzpatrick JM, editors. Handbook of
medical imaging, vol. 2. Bellingham, WA: The Society of Photo-Optical
Instrumentation Engineers; 2000. p. 915–1004.
[40] Sampat MP, Markey MK, Bovik AC. Computer-aided detection and diag-
nosis in mammography. In: Bovik AC, editor. The handbook of image
and video processing. 2nd ed. New York: Elsevier; 2005. p. 1195–
217.
[41] Nishikawa RM. Computer-assisted detection and diagnosis. Wiley; 2005.
[42] Chan H-P, Niklason LT, Ikeda DM, Lam KL, Adler DD. Digitization
requirements in mammography: effects on computer-aided detection of
microcalcifications. Med Phys 1994;21(7):1203–11.
[43] Stafford RG, Beutel J, Mickewich DJ. Application of neural networks
to computer-aided pathology detection in mammography. Proc SPIE
1993:1898.
[44] El-Naqa I, Yang Y, Wernick MN, Galatsanos NP, Nishikawa RM. A support
vector machine approach for detection of microcalcifications. IEEE Trans
Med Imag 2002;21(12):1552–63.
[45] Campanini R, Dongiovanni D, Iampieri E, et al. A novel featureless
approach to mass detection in digital mammograms based on support vector
machines. Phys Med Biol 2004;49(6):961–75.
[46] Veldkamp WJH, Karssemeijer N. Improved correction for signal dependent
noise applied to automatic detection of microcalcifications. In: Karsse-
meijer N, Thijssen M, Hendriks J, van Erning L, editors. Digital mam-
mography Nijmegen 98. Amsterdam: Kluwer Academic Publishers; 1998.
p. 160–76.
[47] Chan HP, Doi K, Galhotra S, Vyborny CJ, MacMahon H, Jokich PM.
Image feature analysis and computer-aided diagnosis in digital radiography
I. Automated detection of microcalcifications in mammography. Med Phys
1987;14(4):538–48.
[48] Nagel RH, Nishikawa RM, Doi K. Analysis of methods for reducing false
positives in the automated detection of clustered microcalcifications in
mammograms. Med Phys 1998;25(8):1502–6.
[49] Davies DH, Dance DR. Automatic computer detection of clustered calci-
fications in digital mammograms. Phys Med Biol 1990;35:1111–8.
[50] Cheng H-D, Lui YM, Freimanis RI. A novel approach to microcalci-
fication detection using fuzzy logic technology. IEEE Trans Med Imag
1998;17(3):442–50.
[51] Cernadas E, Zwiggelaar R, Veldkamp W, et al. Detection of mammographic
microcalcifications using a statistical model. In: Karssemeijer N, Thijssen
M, Hendriks J, van Erning L, editors. Digital mammography. Amsterdam:
Kluwer; 1998. p. 205–8.
[52] Brown S, Li R, Brandt L, Wilson L, Kossoff G, Kossoff M. Development
of a multi-feature CAD system for mammography. In: Karssemeijer N,
Thijssen M, Hendriks J, van Erning L, editors. Digital mammography.
Amsterdam: Kluwer; 1998. p. 189–96.
[53] Bankman IN, Christens-Barry WA, Kim DW, Weinberg IN, Gatewood
OB, Brody WR. Automated recognition of microcalcification clusters in
mammograms. Proc SPIE 1993;1905:731–8.
[54] Anastasio MA, Yoshida H, Nagel R, Nishikawa RM, Doi K. A genetic
algorithm-based method for optimizing the performance of a computer-
aided diagnosis scheme for detection of clustered microcalcifications in
mammograms. Med Phys 1998;25(9):1613–20.
[55] Anastasio MA, Kupinski MA, Nishikawa RM. Optimization and FROC
analysis of rule-based detection schemes using a multiobjective approach.
IEEE Trans Med Imag 1998;17(6):1089–93.
[56] Nishikawa R, Edwards A, Schmidt R, Papaioannou J, Linver M. Can radi-
ologists recognize that a computer has identified cancers that they have
overlooked? Proc SPIE 2006;6146:1–8.
[57] Chakraborty DP, Berbaum KS. Observer studies involving detec-
tion and localization: modeling, analysis, and validation. Med Phys
2004;31(8):2313–30.
[58] Efron B, Tibshirani R. Improvements on cross-validation: the .632+ boot-
strap method. J Am Stat Assoc 1997;92(438):548–60.
[59] Taylor P, Champness J, Given-Wilson R, Johnston K, Potts H. Impact
of computer-aided detection prompts on the sensitivity and specificity of
screening mammography. Health Technol Assess 2005;9(6):1–70.
[60] Gilbert FJ, Astley SM, McGee MA, et al. Single reading with computer-
aided detection and double reading of screening mammograms in
the United Kingdom National Breast Screening Program. Radiology
2006;241(1):47–53.
[61] Kegelmeyer Jr WP, Pruneda JM, Bourland PD, Hillis A, Riggs MW, Nip-
per ML. Computer-aided mammographic screening for spiculated lesions.
Radiology 1994;191(2):331–7.
[62] Astley S, Quarterman C, Al Nuaimi Y, et al. Computer-aided detection
in screening mammography: the impact of training on reader perfor-
mance. In: Pisano E, editor. Digital mammography 2004. Chapel Hill;
2004.
[63] Anderson ED, Muir BB, Walsh JS, Kirkpatrick AE. The efficacy of dou-
ble reading mammograms in breast screening. Clin Radiol 1994;49(4):
248–51.
[64] Harvey SC, Geller B, Oppenheimer RG, Pinet M, Riddell L, Garra
B. Increase in cancer detection and recall rates with independent
double interpretation of screening mammography. Am J Roentgenol
2003;180(5):1461–7.
[65] Thurfjell EL, Lernevall KA, Taube AA. Benefit of independent double read-
ing in a population-based mammography screening program. Radiology
1994;191(1):241–4.
[66] Freer TW, Ulissey MJ. Screening mammography with computer-aided
detection: prospective study of 12,860 patients in a community breast
center. Radiology 2001;220(3):781–6.
[67] Gur D, Sumkin JH, Rockette HE, et al. Changes in breast cancer detection
and mammography recall rates after the introduction of a computer-aided
detection system. J Natl Cancer Inst 2004;96(3):185–90.
[68] Feig SA, Sickles EA, Evans WP, Linver MN. Re: changes in breast can-
cer detection and mammography recall rates after the introduction of a
computer-aided detection system. J Natl Cancer Inst 2004;96(16):1260–1,
author reply 1261.
[69] Nishikawa RM. Modeling the effect of computer-aided detection on the
sensitivity of screening mammography. In: Astley SM, editor. Digital mam-
mography 2006. Berlin: Springer-Verlag; 2006. p. 136–42.
[70] Cupples TE, Cunningham JE, Reynolds JC. Impact of computer-aided
detection in a regional screening mammography program. AJR Am J
Roentgenol 2005;185(4):944–50.
[71] van Engeland S, Varela C, Timp S, Snoeren PR, Karssemeijer N. Using
context for mass detection and classification in mammograms. Proc SPIE
2006;5749:94–102.
[72] Paquerault S, Petrick N, Chan HP, Sahiner B, Helvie MA. Improvement
of computerized mass detection on mammograms: fusion of two-view
information. Med Phys 2002;29(2):238–47.
[73] Birdwell RL, Bandodkar P, Ikeda DM. Computer-aided detection with
screening mammography in a university hospital setting. Radiology
2005;236:451–7.
[74] Helvie MA, Hadjiiski L, Makariou E, et al. Sensitivity of noncommer-
cial computer-aided detection system for mammographic breast cancer
detection: pilot clinical trial. Radiology 2004;231(1):208–14.
[75] Khoo LA, Taylor P, Given-Wilson RM. Computer-aided detection in the
UnitedKingdomNational Breast Screening Programme: prospective study.
Radiology 2005;237(2):444–9.
Robert M. Nishikawa received his B.Sc. in Physics in 1981 and his M.Sc. and
Ph.D. in Medical Biophysics in 1984 and 1990, respectively, all from the Uni-
versity of Toronto. He is currently an Associate Professor in the Department of
Radiology and the Committee on Medical Physics at the University of Chicago.
He is director of the Carl J. Vyborny Translational Laboratory for Breast Imag-
ing Research. He is also a fellow of the American Association of Physicists in
Medicine (AAPM).
His research has three intertwining themes. The first is the development of
computer-aided diagnosis (CAD) techniques for x-ray imaging of the breast, in
particular for digital breast tomosynthesis and full-field digital mammography
(FFDM). The second is the evaluation of CAD, principally its clinical effective-
ness. The evaluations include Monte Carlo modeling of using computer-aided
detection in screening mammography and observer studies to understand how
effectively radiologists can use computers as aids when interpreting mammo-
grams. The third is the investigation of the performance of new breast x-ray
imaging systems. These studies include the evaluation of new clinical systems,
such as FFDM and phase contrast mammography, and the optimization of digital
breast tomosynthesis.