
Computerized Medical Imaging and Graphics 31 (2007) 224–235

Current status and future directions of computer-aided diagnosis in mammography

Robert M. Nishikawa ∗

Carl J. Vyborny Translational Laboratory for Breast Imaging Research, Department of Radiology and Committee on Medical Physics,

The University of Chicago, 5841 S. Maryland Avenue, MC-2026, Chicago, IL 60637-1463, United States

Abstract

The concept of computer-aided detection (CADe) was introduced more than 50 years ago; however, only in the last 20 years have there been

serious and successful attempts at developing CADe for mammography. CADe schemes have high sensitivity, but poor specificity compared to

radiologists. CADe has been shown to help radiologists find more cancers both in observer studies and in clinical evaluations. Clinically, CADe

increases the number of cancers detected by approximately 10%, which is comparable to double reading by two radiologists.

© 2007 Elsevier Ltd. All rights reserved.

Keywords: Breast cancer; Mammography; Computer-aided diagnosis; Computer-aided detection; Observer studies; Clinical evaluation; ROC analysis

1. Introduction

Breast cancer is a major killer of women in the United States

and in many other parts of the world. Each year approximately

41,000 women die from breast cancer in the United States, and

213,000 women are diagnosed with breast cancer [1]. Screening

of asymptomatic women by mammography has led to a reduc-

tion in breast cancer mortality. Several randomized, controlled

screening studies have shown an overall decrease in breast can-

cer mortality of up to 30% [2–4]. Further, using mathematical

modeling, Berry et al. have shown that the recent decrease in

breast cancer mortality in the United States has been due equally

to screening with mammography and to better treatment [5].

The detection and diagnosis of breast cancer with mammog-

raphy are composed of two steps. The first is asymptomatic

screening, where suspicious areas in a mammogram are

identified. The second is diagnostic mammography, where

symptomatic women with an abnormal mammogram or some

physical or clinical abnormality (e.g., a palpable lump) receive

special view mammograms (e.g., magnification views or spot

compression views) and possibly ultrasound and MRI. The goal

Financial disclosure: Robert M. Nishikawa has a research agreement with

Eastman Kodak Company and he is a shareholder in Hologic Inc. Both he and

the University of Chicago receive research funding and royalties from Hologic,

Inc.

∗ Tel.: +1 773 702 9047; fax: +1 773 702 0371.

E-mail address: [email protected].

of obtaining a diagnostic mammogram is to determine whether

a woman should have a biopsy.

1.1. Screening mammography

Mammography, although effective as a screening tool, has

limitations. On a screening mammogram, cancers can be missed

(false-negative mammogram), and non-cancerous lesions can

be mistaken as cancer, leading to a false-positive mammogram.

Depending on how the true cancer status of a woman is deter-

mined, the miss rate in mammography can be nearly 50% [6].

Retrospective analyses of missed cancers [7–13] indicated that

approximately 60% are visible in retrospect, although in some

cases the cancer may be very subtle [12]. These studies also

show that approximately 30% of cancers are not visible in retro-

spect. In many of these cases, the reason for the cancer not being

visible is that there is normal tissue above and below the cancer

that camouflages the cancer. This is because a mammogram is

a 2D image of the 3D breast, so that the superposition of tissue

can hide cancers.

The superposition of tissues can also produce patterns in

the mammogram that look suspicious to a radiologist. As a

result, between 5 and 15% of screening mammograms are read

as abnormal [14], even though the prevalence of cancer in the

screening population is typically 0.5%.

To address the superposition problem, two new 3D X-ray

imaging techniques for the breast are being developed: computed

tomography [15–17] (CT) and digital breast tomosynthesis [18]

0895-6111/$ – see front matter © 2007 Elsevier Ltd. All rights reserved.

doi:10.1016/j.compmedimag.2007.02.009


(DBT). These techniques produce slices, typically 1 mm or less

in thickness, that can be stacked to produce a 3D image of the

breast. Compared to CT, which can have isotropic resolution,

DBT has superior resolution within a slice, but much poorer

resolution in the direction perpendicular to the slice. Whereas

CT collects images to cover at least a complete 360° angle,

DBT collects images over only 60° or less, leading to a loss of

spatial resolution in the direction perpendicular to the detector.

One drawback of both of these techniques is that there are many

images that a radiologist must review. For example, in DBT

there can be as many as 80 slices, with each slice having the

same information content as a standard mammogram. In this

situation, CADe may be useful in helping radiologists to handle

the large amount of data [19,20].

1.2. Diagnostic mammography

When a suspicious lesion is found on a screening mam-

mogram, or the patient has some physical symptoms (e.g., a

palpable lump), diagnostic mammography is performed. On

diagnostic mammograms, benign lesions are often difficult to

distinguish from cancers, and thus, a cancer can be misinter-

preted as a benign lesion. Clinically, differentiating benign from

malignant lesions is a difficult task. In the USA, the positive

predictive value (PPV) for diagnostic breast imaging is generally

less than 50%. The PPV measures the percentage of all breast

biopsies that are positive for cancer. Using data from the Breast

Cancer Surveillance Consortium, Barlow et al. determined that

the PPV based on 41,427 diagnostic mammograms was 21.8%

[21]. Elmore et al., examining the results from eight large mam-

mography registries (containing the follow up information on

more than 300,000 screening mammograms), found that the

PPV ranged from 16.9 to 51.8%, with a median value of 27.5%

[22]. Thus, approximately three biopsies of benign lesions are

performed for every biopsy of a malignant lesion. Unneces-

sary biopsies are both physically and emotionally traumatic for

the patient; they are costly to the health care system, and they add

unnecessarily to the workload of radiologists, pathologists, and

surgeons. Improving radiologists’ PPV can have a substantial

positive effect on patient care and on the healthcare system.

In addition, the interpretation of a mammogram is inher-

ently variable because mammograms are read by human

beings. There is both inter- and intra-observer variability among radi-

ologists [23,24]. Furthermore, there are substantial differences

between the performance of radiologists in Europe and of those

in North America [14,22].

Computer-aided diagnosis (CAD) is being developed to

address some of the limitations of mammography. Two differ-

ent types of CAD systems are being developed: computer-aided

detection (CADe) can be used to help radiologists find breast

cancer on screening mammograms, and computer-aided diag-

nosis (CADx) can be used to help radiologists decide whether

a known lesion is benign or malignant on diagnostic mammo-

grams. It should be noted that here CAD refers to the whole field

and comprises both CADe and CADx. There is good evidence

that CADx systems may be useful for improving radiologists’

PPV [25–28]. Nevertheless, in this paper, I will discuss only

CADe, giving a description of the current status and possible

future directions. I will start with a brief description of the

historical development of CAD in mammography.

2. Historical development

As early as 1955, Lee Lusted talked about automated diag-

nosis of radiographs by computers. In 1967, Fred Winsberg et

al. published a paper in Radiology describing a CADx system in

which the computer determined whether a lesion on a mammo-

gram was malignant or benign [29]. By today’s standards, the

film digitization, computer power, and computer vision tech-

niques at that time were very crude, and Winsberg’s method

was not successful. During the next few years, there were several

unsuccessful attempts at automating both detection and diagno-

sis. Through most of the late seventies to the mid eighties, there

was a period of inactivity, at least as reflected in publications.

In the mid-eighties at the University of Chicago, Doi, Chan,

Giger, MacMahon, and Vyborny started to investigate the con-

cept called computer-aided diagnosis, which is different from

the automated diagnosis of many earlier attempts. Their goal

was not to replace radiologists, but to develop systems that may

help radiologists render better clinical decisions. A breakthrough

came with two studies. In the first, Getty et al. showed that a

CADx system, the input for which was a checklist that a radiol-

ogist used to characterize the features of a lesion, could improve

radiologists’ ability to predict whether a lesion was benign or

malignant [25]. Theirs was not an automated system. The sec-

ond was an observer study conducted by Chan et al. in which

15 radiologists read 60 mammograms, half of them containing a

cluster of microcalcifications [30]. They showed that, by using a

computer-aided detection scheme, which was completely auto-

mated, radiologists could find additional calcification clusters in

a mammogram.

These two studies opened the field to many new investigators

and approaches for developing CADe and CADx algorithms.

This has led to several observer studies that have shown the

potential for computer-aided diagnosis to help radiologists not

only in mammography, but in chest radiography and thoracic CT

as well [31–35]. In 1998, the first commercial system received

FDA approval. Another important milestone in terms of clin-

ical implementation was approval for reimbursement in 2000

by Medicare and other health care payers. A timeline of these

developments is shown in Fig. 1.

2.1. Computer-aided detection (CADe) algorithms

Many different techniques are used for developing a CADe

scheme. Various techniques have been summarized in several

review papers [36–41]. In addition, much of the CADe research

has been presented at three main conferences, all of which have

conference proceedings: SPIE Medical Imaging, Computer-

Assisted Radiology and Surgery (CARS), and the International

Workshop on Digital Mammography.

A digital image is the starting point for all techniques,

although an optical computing method was proposed more than

10 years ago.

Fig. 1. Timeline of CAD development.

The digital image may come from a full-field digital mammography (FFDM) system, or it may be obtained by

digitizing a screen-film mammogram. An FFDM image will

have properties that differ from those of a digitized screen-film

mammogram (dSFM) in terms of response to X-ray exposure

(which is linear), contrast, spatial resolution, and noise. These

differences, which are discussed below, are important when

CAD algorithms are designed.

2.1.1. Linearity

An FFDM image either has a log relationship or is linearly

related to the exposure to the X-ray detector. A dSFM image has

a sigmoidal relationship to the exposure to the X-ray detector,

even though film digitizers are inherently linear. Screen-film

(SF) systems are relatively insensitive at low X-ray exposures

and saturate at high exposures, as shown in Fig. 2. A curve of

pixel value versus log exposure or versus exposure to the detector

is called a characteristic curve.
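The contrast behavior described in Sections 2.1.1 and 2.1.2 can be sketched numerically. The logistic form below is an illustrative stand-in for a real H&D characteristic curve, and the parameter values (`d_min`, `d_max`, `speed`, `latitude`, `gain`) are assumptions, not measured data:

```python
import math

def ffdm_pixel_value(exposure, gain=1000.0):
    """Linear FFDM response: pixel value proportional to detector exposure,
    so the slope (contrast) is constant at all exposures."""
    return gain * exposure

def sfm_optical_density(exposure, d_min=0.2, d_max=3.5, speed=1.0, latitude=1.2):
    """Sigmoidal screen-film response (logistic stand-in for an H&D curve):
    optical density vs. log exposure, flat at both ends, so contrast is
    reduced at low and high exposures."""
    log_e = math.log10(max(exposure, 1e-9))
    return d_min + (d_max - d_min) / (1.0 + 10.0 ** (-(log_e - speed) / latitude))
```

Comparing the slope of `sfm_optical_density` at mid, low, and high exposures reproduces the contrast loss at the toe and shoulder of the curve.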

2.1.2. Contrast

Because of the non-linear response of the SF system, contrast

is reduced at high and low exposures. The slope of the character-

istic curve shown in Fig. 2 is proportional to the contrast in the

image. For FFDM images that are linear, the inherent contrast

of the system is constant at all exposures.

2.1.3. Spatial resolution

The spatial resolution of a digital image is dependent on two

factors: the inherent resolution of the X-ray detector and the

size of the pixels in the image. SF systems have higher spatial

resolution than FFDM systems do. However, once an image is

digitized, the resolution difference can disappear. FFDM sys-

tems have pixel sizes between 0.05 and 0.1 mm. Commercial

CADe systems use 0.05 mm pixels; however, in many of the

systems reported in the literature, 0.1 mm pixels are used. For

detection of clustered microcalcifications, the pixel size of the

image will affect the performance. Chan et al. showed that, as

the pixel size decreased from 0.105 mm down to 0.035 mm, the

performance of their CADe scheme improved [42]. For detec-

tion of masses, pixel size is less important, because masses

are typically 5 mm or larger in diameter. Therefore, pixels are

usually reduced in size to approximately 0.4 mm. This reduces

the memory requirements and allows for reduced computation

time.
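The pixel-size reduction described above amounts to block averaging; a minimal sketch, where a factor of 4 corresponds to resampling a 0.1 mm image to 0.4 mm pixels:

```python
def downsample(image, factor=4):
    """Block-average a 2D image (a list of rows) by `factor` in each
    dimension, reducing the pixel count (and memory) by factor**2."""
    h, w = len(image), len(image[0])
    out = []
    for by in range(h // factor):
        row = []
        for bx in range(w // factor):
            # Mean of one factor x factor block of the original image.
            block = [image[by * factor + i][bx * factor + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```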

2.1.4. Noise

In FFDM, the image noise is proportional to the square root of

the X-ray exposure to the detector. At low exposures, however,

the electronic noise of the detector can be significant. This is

true for a linear system. If the FFDM records the log of the

measured exposure, then the noise is proportional to the inverse

of the square root of the X-ray exposure to the detector. In an

SF system, the noise is proportional to the inverse of the square

root of the detector exposure, but is modified by the slope of

the characteristic curve (shown in Fig. 2), so that it is decreased

at both high and low exposures. In addition, the film digitizer

adds noise to the digitized image, principally at high exposures.

The film is dark at high exposures, so that the amount of light

transmitted through the film is low. As a result, the electronic

noise of the film digitizer becomes significant, and the total noise

in the image increases.
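The noise relationships above follow from simple error propagation; a sketch under an assumed quantum-plus-electronic noise model (the gain and electronic-noise values are illustrative, not taken from any real detector):

```python
import math

def ffdm_linear_noise(exposure, quantum_gain=1.0, electronic_sigma=2.0):
    """Noise in a linear FFDM image: quantum noise grows as sqrt(exposure);
    at low exposure the fixed electronic noise of the detector dominates.
    The two components are assumed to add in quadrature."""
    quantum_sigma = quantum_gain * math.sqrt(exposure)
    return math.hypot(quantum_sigma, electronic_sigma)

def ffdm_log_noise(exposure, quantum_gain=1.0):
    """If the system records log(exposure), propagation of error gives
    sigma_log = sigma_E / E = sqrt(E) / E = 1 / sqrt(E), so the noise
    *decreases* with increasing exposure."""
    return quantum_gain * math.sqrt(exposure) / exposure
```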

Fig. 2. Characteristic curve for a full-field digital mammography system (left) and a digitized screen-film system (right).


Fig. 3. Flowchart of a generic CADe scheme.

Two general approaches are used for the automated detec-

tion of cancer on mammograms. The first approach is to apply

statistical classifiers, such as artificial neural networks [43] and

support vector machines [44,45], directly to the image data. The

image is divided into small regions of interest, typically 32 × 32

pixels. This produces approximately 50,000 non-overlapping

ROIs per 100-µm-pixel image. Therefore, to reduce the num-

ber of false ROIs to even five per image, the classifier must

be able to eliminate 99.99% of the false ROIs without appre-

ciably eliminating ROIs containing malignant lesions. This is

extremely difficult to achieve and this approach to automated

detection has not yet been successful.
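The rejection requirement follows directly from the numbers above:

```python
# Back-of-envelope for the ROI-classifier approach described above:
# ~50,000 non-overlapping 32x32 ROIs per image, nearly all of them false.
rois_per_image = 50_000
target_false_rois = 5  # acceptable false ROIs remaining per image

# Fraction of false ROIs the classifier must eliminate:
required_rejection = 1 - target_false_rois / rois_per_image
print(f"{required_rejection:.2%}")  # -> 99.99%
```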

The second approach is outlined in Fig. 3. After a digital

mammogram is obtained, potential signals are identified. This

is usually accomplished by transforming the image by use of

linear filters, morphologic operators, wavelets, and other means.

Next, thresholding is applied. The goal is to identify as many

true signals as possible without an excessive number of false

signals. For detection of calcifications, the ratio of false to true

signals can be 100:1 or higher.
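The filter-and-threshold step of Fig. 3 can be sketched with a simple local-background-subtraction filter. This is an illustrative stand-in for the linear, morphological, or wavelet filters used in practice; the neighborhood radius and threshold are arbitrary choices for this sketch:

```python
def find_candidates(image, radius=2, threshold=50):
    """Flag candidate signals: enhance small bright structures by
    subtracting a local background estimate (the neighborhood mean),
    then keep pixels whose enhanced value exceeds a threshold."""
    h, w = len(image), len(image[0])
    candidates = []
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            neighborhood = [image[y + dy][x + dx]
                            for dy in range(-radius, radius + 1)
                            for dx in range(-radius, radius + 1)]
            background = sum(neighborhood) / len(neighborhood)
            if image[y][x] - background > threshold:
                candidates.append((y, x))
    return candidates
```

In a real scheme the threshold would be set low enough to pass many false signals (100:1 or more, as noted above), leaving the later feature-extraction and classification stages to reject them.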

Once the signals have been identified, they are segmented

from the image. Many different techniques have been developed.

Most rely on thresholding of the image either in the transformed

space or in the acquired pixel value space. More sophisticated

methods have also been developed, such as a Markov ran-

dom field model [46]. In this approach, pixels in the image are

modeled as belonging to one of four classes: background, cal-

cification, lines/edge, and film emulsion errors. Three different

features are used in the model: local contrast at two different

spatial resolutions and the output of a line/edge detector.

Once signals have been segmented, features of the signals

are extracted and used in statistical classifiers to distinguish true

from false signals. Many different types of classifiers have been

Fig. 4. An illustration of the utility of FROC curves. Two points given by the cir-

cle and triangle represent the performance of two hypothetical CADe schemes.

If the two points lie on the same FROC curve (broken line), the two CADe

schemes have the same performance. If the two points lie on different curves

(solid lines), the curve closer to the upper left corner of the graph has the better

performance.

employed. A partial list includes simple thresholds [47], artifi-

cial neural networks [48], nearest neighbor methods [49], fuzzy

logic [50], linear discriminant analysis [51], quadratic classifiers

[52], Bayesian classifiers [53], genetic algorithms [54], multi-

objective genetic algorithms [55], and support vector machines

[44,45].

2.2. Evaluation of CADe schemes

CADe schemes are typically evaluated by use of free-

response receiver operating characteristic (FROC) curves (see

Fig. 4). These are plots of sensitivity versus the average number

of false detections per image. Sensitivity is calculated in two

ways. The first method is calculation by case. A case consists of

two views of each breast, or four mammograms. Here, if a cancer

is detected by the computer in at least one view, it is considered

detected. The second method is calculation by image. Here, sen-

sitivity is calculated based on each image; that is, if the computer

detects a cancer in only one of two views, the sensitivity is only

50%. The sensitivity by case is almost always higher than the

sensitivity by image. Sensitivity by case is often reported when

CADe is evaluated clinically, because it is assumed that, if the

computer detects a cancer in at least one view, the radiologist

will be able to locate the cancer in the other view, if neces-

sary. However, there is evidence that, if an overlooked cancer is

detected only in one view by the computer, it is likely that the

radiologist will not recognize the correct computer prompt [56].
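The two sensitivity definitions can be made concrete; a minimal sketch, assuming each cancer is imaged in two views:

```python
def sensitivity_by_image(detections):
    """detections: one (seen_in_view1, seen_in_view2) pair per cancer.
    Each view counts separately, so a cancer marked in only one of its
    two views contributes 50%."""
    flags = [seen for pair in detections for seen in pair]
    return sum(flags) / len(flags)

def sensitivity_by_case(detections):
    """A cancer counts as detected if the computer marks it in at
    least one view."""
    return sum(1 for pair in detections if any(pair)) / len(detections)
```

For example, three cancers marked in (both, one, neither) of their views give a by-case sensitivity of 2/3 but a by-image sensitivity of only 1/2, illustrating why the by-case figure is almost always higher.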

FROC curves are more useful than just measuring a single

sensitivity and false-detection rate, which is essentially a sin-

gle point on the FROC curve. If one is comparing two different

CADe schemes, and one has a sensitivity of 80% with 0.1 false

detection per image and the other has 85% sensitivity with 0.5

false detections per image, it is not clear which system is better

(see Fig. 4). The two points could belong to the same FROC

curve, in which case they have the same performance. That is,

by “tuning” a CADe scheme, it is possible to obtain any sensitivity/false-detection rate on the curve.

Fig. 5. Effect of CADe scoring criteria on measured performance by use of FROC curves. The circle method scores a true positive if two computer-detected signals are within the circle of smallest diameter that encloses all actual microcalcifications. The centroid method scores a true positive if the centroid of the computer-detected cluster is within 6 mm of the centroid of the actual cluster and at least two actual microcalcifications are detected by the computer. The bounding box method scores a true positive as follows. The smallest box that completely encloses all actual microcalcifications is drawn, and then the smallest box that completely encloses all computer-detected signals is drawn. The computer-detected cluster is scored as a true positive if any of the following conditions are true: (i) the detected-cluster bounding box is entirely within the truth bounding box; (ii) the truth bounding box is completely within the computer-detected bounding box and the area of the computer-detected bounding box is no larger than twice that of the truth bounding box; or (iii) the center of the truth bounding box lies in the computer-detected bounding box, the center of the computer-detected bounding box lies within the truth bounding box, and the area of the computer-detected bounding box is no larger than twice that of the truth bounding box.

Alternatively, the two points

could belong to different curves, in which case the scheme with

the higher curve has a better performance. For comparing two

FROC curves, a statistical technique called JAFROC (jackknife

FROC) analysis can be used [57]. Free software for performing

such an analysis is available at http://www.devchakraborty.com/.

Comparing published results of different CADe schemes is

problematic. Differences in the criteria used for scoring whether

the computer detected a cancer, the database used for the evalua-

tion, and differences in the way the CADe scheme was trained all

can affect the measured performance. Fig. 5 shows the measured

FROC curves for one CADe scheme evaluated by use of various

scoring criteria, all of which have been used in published studies.
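As an example, the centroid scoring criterion of Fig. 5 can be sketched as follows. The 1 mm matching radius used to decide whether an individual calcification counts as "detected" is an assumption for this sketch, not taken from the text:

```python
import math

def centroid(points):
    """Centroid of a list of (x, y) coordinates in mm."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def centroid_score(detected_calcs, true_calcs, tol_mm=6.0, min_hits=2):
    """Centroid rule: the detected cluster is a true positive if its
    centroid lies within tol_mm of the true cluster's centroid and at
    least min_hits actual microcalcifications were detected (here,
    'detected' means a computer mark within 1 mm, an assumed radius)."""
    cd, ct = centroid(detected_calcs), centroid(true_calcs)
    if math.dist(cd, ct) > tol_mm:
        return False
    hits = sum(1 for t in true_calcs
               if any(math.dist(t, d) <= 1.0 for d in detected_calcs))
    return hits >= min_hits
```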

Fig. 6 shows the effect of the database on the measured FROC

curve. As expected, the easiest cases produce the highest per-

formance. Finally, bias can arise in the measured performance

depending on how the algorithm is trained and tested. If the

same cases are used for training and testing, there is a positive

bias, which can be very large. To avoid this, researchers often use

either bootstrapping or some type of jackknifing to train and test.

Recent studies show that the bootstrap method has advantages

over the jackknife method based on cross-validation [58]. However,

it has been shown that, if the same cases are used for select-

ing features and for training a classifier by use of those selected

features, there again will be a positive bias.

Whereas an FROC curve characterizes the performance of

a CADe scheme, when the scheme is used clinically a single operating point on the curve is chosen.

Fig. 6. Effect of database on the performance of CADe algorithms. The performance of the computer detection scheme was tested with three different databases: “easy”, “altered-easy”, and “difficult”. Each is a subset of 50 pairs of mammograms from a larger database of 90 pairs. Whereas the “easy” and “difficult” databases have only 10 pairs of images in common, the “easy” and “altered-easy” databases are identical except for 10 pairs.

There is no accepted method for

choosing the operating point. The choice is generally based on

the perceived tradeoff between sensitivity and the false-detection

rate. Usually, the operating point is chosen to give the highest

clinically acceptable false-detection rate. Because the highest

clinically acceptable false-detection rate is not known, there can

be differences in the selection of the operating point depending

on who is making the choice.

In general, clinical CADe systems have high sensitivity. Com-

mercial systems have reported sensitivities of 98% for clustered

microcalcifications and 85% for masses, which are comparable

to or exceed the sensitivity of most radiologists. Therefore, it is

possible for CADe systems to detect cancers. The difficulty is to

achieve this at a low false-detection rate. False detections reduce

radiologists’ productivity because the radiologists must spend

time reviewing all computer detections. If the false-detection

rate is high – greater than approximately one per image for an

exam with four images – then a radiologist who is trying to

read as efficiently as possible may choose to ignore all of the

computer prompts rather than spend the time to review multi-

ple prompts that are most likely all false (in screening,

the cancer prevalence is typically 0.5%). Radiologists typically

recall between 5 and 10% of women screened, which means that

there are approximately 0.012–0.05 false detections per image.

Current CADe schemes have between 0.1 and 0.5 false detec-

tions per image, an order of magnitude higher than those of

radiologists.
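One plausible reconstruction of the quoted 0.012–0.05 range is the following; the one-view/two-view assumption is mine, not stated in the text:

```python
# Reconstructing the radiologists' per-image false-detection rate
# quoted above (an assumption about how the bounds were derived).
images_per_exam = 4  # two views of each breast

# Lower bound: 5% recall rate, suspicious finding marked in one view only.
low = 0.05 * 1 / images_per_exam   # 0.0125 per image, approx. 0.012

# Upper bound: 10% recall rate, finding marked in both views of one breast.
high = 0.10 * 2 / images_per_exam  # 0.05 per image
```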

In the detection of clustered microcalcifications, most false

detections are caused by benign calcifications, most often cal-

cified vessels (see Fig. 7). It can be difficult to distinguish

calcified vessels from malignant calcifications that appear in

a linear pattern. In the detection of masses, most false detec-

tions are caused by superposition of tissue and benign lesions,

the same causes of false detections by radiologists. However,

even though the causes are the same, the same false detections

are not found in each image by the computer and the radiologist.

Fig. 7. A mammogram with calcified vessels, which lead to multiple false detections by the computer.

Thus, if the radiologist cannot reliably distinguish computer

false detections from computer-prompted cancers, CADe could

preferentially increase the radiologist’s recall rate (the fraction

of women considered to have an abnormal screening mam-

mogram). The difficulty radiologists have in distinguishing actual

cancers from false lesions implies that actual cancers and false

lesions appear visually similar. Therefore, it is also difficult for

the computer to accurately separate cancers from false detec-

tions. As a result, the false-detection rate for mass detection is

higher than that for detection of calcifications.

2.3. Clinical effectiveness of CADe

The goal of CADe is not the detection of cancer. The goal

is to help radiologists avoid overlooking a cancer that is visible

in a mammogram. Thus, whereas a high CADe performance is

good, in theory it is neither a necessary nor a sufficient condition

for CADe to be successful clinically. Therefore, it is possible for

a CADe scheme to have a sensitivity of less than 50% and still be

a useful aid. The computer, in theory, only needs to prompt those

cancers that the radiologist missed, because the radiologist gains

no advantage from the computer prompting cancers that he or

she has already detected. However, from a practical viewpoint,

if the computer misses too many cancers that the radiologist has

detected, the radiologist will lose confidence in the ability of the

computer to detect cancers and CADe will not be an effective

aid.

The two necessary conditions for CADe to be successful are:

(1) The computer is able to detect cancers that the radiologist

misses.

(2) The radiologist must be able to recognize when the computer

has detected a missed cancer.

There is good evidence in the literature that CADe can

detect clinically missed cancers. Several studies have shown

that between 50 and 77% of missed cancers can be detected by

CADe. In these studies, previous mammograms from women

with a screen-detected cancer are reviewed for signs that a can-

cer was visible in an earlier mammogram. These missed cancers

are collected and subjected to CADe. Whereas detecting 77% of

the missed cancers is a large fraction of the misses, not all com-

puter prompts are “actionable”. In women with mammograms

that appear “lumpy”, there are many areas that resemble cancer.

A small and subtle cancer cannot be reliably detected by a radi-

ologist in the presence of multiple similar lumps. Therefore, a

computer prompt in this situation is unlikely to prevent a radiol-

ogist from missing a cancer. Thus, it is important to determine

radiologists’ ability to distinguish computer prompts for cancer

from computer prompts for false lesions.

Four observer studies have been performed for measuring

the benefits of CADe. The first two studies showed a statistically

significant improvement in radiologists’ performance when they

used CADe [30,59–61]. These were small studies and were con-

ducted in such a manner as to produce a bias in favor of using

CADe. In the study by Chan et al., a time limit was given to

reading the images and radiologists were shown only a single

image, instead of the four images that are standard in most screen-

ing exams. Nevertheless, this is the seminal paper in the field,

and it launched renewed interest in computer analysis of mam-

mograms [30]. The study by Kegelmeyer et al. looked only at

spiculated lesions, and the CADe scheme had 100% sensitiv-

ity [61]. The two more recent studies were much larger than

the first. The study by Taylor et al. did not show a statistically

significant improvement in radiologists’ performance as mea-

sured in terms of sensitivity and specificity [59]. The sensitivity

increased from 0.78 to 0.81 with 95% CI for a difference of

[−0.003, 0.064], and the specificity increased from 0.86 to 0.87

with 95% CI for a difference of [−0.003, 0.034]. These values

are close to significant, and one can speculate whether, had data
been collected to allow an ROC analysis to be performed, there

would be a statistically significant increase in the area under the

ROC curves, since ROC experiments have substantially higher

statistical power than sensitivity and specificity calculations.
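The difficulty of declaring such small gains significant can be illustrated with a standard 95% confidence interval for a difference of proportions. This is a sketch only: Taylor et al. analyzed paired readings with sample sizes not given here, so the counts below are hypothetical, chosen merely to echo the 0.78 to 0.81 sensitivity change:

```python
import math

# Sketch: 95% Wald confidence interval for the difference of two independent
# proportions. The counts (78/100 vs. 81/100) are hypothetical.
def diff_ci(k1, n1, k2, n2, z=1.96):
    p1, p2 = k1 / n1, k2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p2 - p1
    return d - z * se, d + z * se

lo, hi = diff_ci(78, 100, 81, 100)
print(lo < 0 < hi)  # True: the interval spans zero, so the gain
                    # is not statistically significant at this sample size
```

With proportions this close, very large samples (or a more powerful endpoint such as ROC area) are needed to resolve the difference.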

The fourth observer study was performed by Gilbert et al.,

called the CADET study (computer aided detection evaluation

trial) [60]. This was a very large study (10,267 cases containing

236 cancers) with eight readers. They obtained several important

results. First, radiologists need to be trained with a large number


of cases, at least 400 in their study, before radiologists use CADe

consistently [62]. That is, their recall rate when CADe was used

decreased as the training increased up to 400 cases. They were

able to show that, compared to double reading by two radiolo-

gists, when using CADe radiologists detected 49.1% of cancer

cases, whereas only 42.6% were found by double reading. The

sensitivity was low in this study because, in addition to the 236

cancers detected in the time frame from which the cases were

collected, there were an additional 85 patients who developed

breast cancer after the study period.

2.4. Clinical requirements

There is good evidence that CADe can detect cancers missed

by radiologists and that radiologists can use CADe to find more

cancers, at least as noted in observer studies. These results can

be extrapolated to clinical effectiveness, but there are limita-

tions. In observer studies, the goal is to simulate clinical reading

conditions, and yet there are differences. The greatest differ-

ence is that, in an observer study, the radiologist’s interpretation

has no effect on patient management. Thus, radiologists are not

under the same clinical pressure in an observer study. Also, the

cancer prevalence is usually significantly higher in an observer

study, typically 25–50% compared to 0.5% in clinical practice.

Therefore, clinical studies must be performed to assess
the clinical effectiveness of CADe. Seven such studies have been

published; they are summarized in Table 1. Overall, the average

increase in cancers detected when using CADe is approximately

10%. This is comparable to the increase in the cancer detection

rate from double reading by two radiologists [63–65].

The first published clinical evaluation of CADe was done

by Freer and Ulissey [66]. They found a 19.6% increase in the

number of cancers detected when CADe was used. The second

published clinical evaluation was performed by Gur et al. [67]

who found that the cancer detection rate increased only from

3.49 to 3.55 per 1000 women screened. Feig et al. did a reanaly-

sis of the Gur data and found that the low-volume readers had a

19.7% increase in the cancer detection rate, but the high volume

readers had a 3.2% decrease [68]. These two studies are impor-

tant because they represent the two different methods used for

measuring the effectiveness of CADe in screening mammogra-

phy. The Freer study was a cross-sectional study, that is, data

is collected sequentially. The radiologist first reads the image

without CADe and renders an opinion. He or she then examines

the computer results and renders a new opinion if necessary.

As a result, the effectiveness of CADe is determined patient by

patient, and the number of extra cancers detected because CADe

was used can be computed. The Gur study, on the other hand,

was a longitudinal study, that is, historical or temporal compar-

isons were made. The cancer detection rate can be compared

between two time periods, one before CADe was implemented

clinically and the other after CADe was implemented. In this

method, the effectiveness of CADe is determined by the change

in the cancer detection rate.

Overall, there is a large range in the measured increase in cancers
detected by use of CADe, largely for two reasons. The first is

that two different methodologies were used for measuring the

clinical effectiveness of CADe. The second is that, although a

large number of women may have been screened in a given study,

the number of cancers in the population is small, typically 5 per

1000 women screened, and therefore, the statistical uncertainty

in the cancer detection rate is large, large enough to account for the

apparent variation.
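The size of this statistical uncertainty is easy to estimate if cancer detection is treated as a Poisson counting process. The rate of 5 per 1000 is from the text; the screening volume below is a hypothetical example:

```python
import math

# With ~5 cancers per 1000 women screened, even a sizable study yields few
# cancers, and Poisson counting noise rivals the ~10% CADe effect.
rate_per_1000 = 5
women_screened = 10_000                  # hypothetical study size
expected = rate_per_1000 * women_screened / 1000
sd = math.sqrt(expected)                 # Poisson standard deviation
print(expected, round(sd, 1), f"{sd / expected:.0%}")  # 50.0 7.1 14%
```

A relative uncertainty of roughly 14% on the cancer count is larger than the approximately 10% effect being measured, so individual studies can easily report anything from no benefit to a 20% gain.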

To examine these two effects, I previously developed a Monte

Carlo simulation of CADe in screening mammography [69].

The flowchart for the simulation model is shown in Fig. 8. Can-

cers are assumed to grow exponentially, with a volume doubling

time of 157 days. Once the cancer is greater than the detection

threshold, assumed to be 0.5 cm, it can be detected in one of

three ways. First, it can be detected by non-mammographic means,

such as palpation; these are considered to be interval cancers,

which are assumed to occur in 15% of cancers. Second, it can

be detected by the radiologist without the help of CADe. These

are assumed to constitute 85% of non-interval cancers. Third,

it can be detected by the radiologist with the help of CADe.

These are assumed to include 75% of the cancers missed by the

radiologist.
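The detection cascade described above can be sketched as follows. This is a simplified illustration using the stated probabilities, not the published simulation code; growth and screening-interval details are omitted:

```python
import random

# Simplified cascade from the text: 15% of detectable cancers are interval
# cancers; the radiologist unaided finds 85% of the rest; CADe recovers 75%
# of the radiologist's misses.
def detect_one(rng):
    if rng.random() < 0.15:
        return "interval"
    if rng.random() < 0.85:
        return "radiologist"
    if rng.random() < 0.75:
        return "radiologist+CADe"
    return "missed"

rng = random.Random(0)
outcomes = [detect_one(rng) for _ in range(100_000)]
extra = outcomes.count("radiologist+CADe") / outcomes.count("radiologist")
print(f"CADe adds ~{extra:.1%} more screen-detected cancers")
```

Under these parameters the expected relative gain is 0.15 x 0.75 / 0.85, about 13%, broadly consistent with the clinically observed increase of approximately 10%.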

Two different conditions were simulated. The first was an
idealized situation in which all cancers grow at the same rate.
There were 125 women who developed cancer each year. We
then repeated the simulation 100 times and averaged the results.
This produces outcomes with very little statistical variation.

Table 1
Summary of seven clinical studies of CADe

Study                            Total number screened     Number of cancers detected    % Change
                                 Unaided      Aided        Unaided      Aided

Longitudinal studies
Gur et al. [67]                  56,432       59,139       197          210                1.7
Feig et al. (high volume) [68]   44,629       37,500       161          131               −3.2
Feig et al. (low volume) [68]    11,803       21,639        36           79               19.7
Cupples et al. [70]               7,872       19,402        29           83               16.1

Cross-sectional studies
Freer and Ulissey [66]           N/A          12,860        41           49               19.5
Birdwell et al. [73]             N/A           8,692        27           29                7.4
Helvie et al. [74]               N/A           2,389        10           11               10.0
Khoo et al. [75]                 N/A           6,111       116          118                1.7

Fig. 8. Flowchart of simulation model.

The

second was a more realistic situation in which 125 women developed
cancer each year, there was a log-normal distribution of

growth rates with a median of 157 and a standard deviation

of 90 days, and there was only a single run (i.e., no averaging

over multiple repeated simulations). When the growth rate is

the same for each cancer, there is the same number of detectable

cancers in the patient population each year. With a spectrum of

growth rates, there is a large variation in the number of detectable

cancers present each year. As a result, this leads to a large vari-

ability in the number of screening-detected cancers from year

to year. This can be seen by comparing Fig. 9a and b. If the

cross-sectional method were used to assess the benefits
of CADe, the result would depend upon the year the data were

collected, because there is variation in the lower curve of Fig. 9a.

If we were to use the longitudinal method, it is likely that we

would measure only a very small change in the cancer detection

rate, because as shown in Fig. 9b, the actual difference is very

small.

Fig. 9. Results of simulation of CADe in screening mammography: number of
cancers detected per year in a screening population as a function of time. CADe
is introduced in year 20. (a) All cancers grow with the same doubling time,
and the curves are averaged over 100 trials. (b) A single trial result is shown,
and the cancers have a distribution of doubling times. The horizontal lines with
arrowheads indicate two time periods used to compare the benefits of CADe by
historical comparison (longitudinal method), and the vertical line indicates the
difference in the radiologist's cancer detection with and without computer aid
(cross-sectional method).

If we were to repeat the realistic simulation, we would
get a different result. This result can vary greatly from the one
shown because it is possible to measure a decrease in the cancer

detection rate when using the longitudinal method [69]. There-

fore, the large variation in the results of the clinical studies is

not unexpected.
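The year-to-year variability argument can be illustrated with a simplified version of the longitudinal comparison. This is my own sketch, not the published model [69]; it assumes Poisson incidence of 125 cancers per year and the cascade probabilities stated earlier, and ignores growth-rate dynamics:

```python
import math
import random

def poisson(rng, lam):
    # Knuth's algorithm; adequate for lam ~ 125
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def cancers_detected(rng, use_cade):
    found = 0
    for _ in range(poisson(rng, 125)):
        if rng.random() < 0.15:            # interval cancer
            continue
        if rng.random() < 0.85:            # found by radiologist unaided
            found += 1
        elif use_cade and rng.random() < 0.75:
            found += 1                     # found only via CADe prompt
    return found

def measured_change(rng, years=4):
    # Longitudinal comparison: detected cancers before vs. after CADe
    pre = sum(cancers_detected(rng, False) for _ in range(years))
    post = sum(cancers_detected(rng, True) for _ in range(years))
    return (post - pre) / pre

runs = [measured_change(random.Random(seed)) for seed in range(10)]
print(f"measured benefit ranges from {min(runs):.1%} to {max(runs):.1%}")
```

Although the true effect is fixed (about 13% under these parameters), the measured change swings widely from run to run, mirroring the spread seen across the clinical studies in Table 1.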

These simulation results may indicate that the cross-sectional

method is a better method to use. However, both methods have

strengths and weaknesses, as listed in Table 2. Although the list

Table 2
Strengths and weaknesses of two different methods for measuring the clinical effectiveness of CADe

Cross-sectional
  Method: Sequential reading of each patient without and with CADe
  Outcome: Change in number of cancers detected
  Strengths: Straightforward to implement; not subject to the variations apparent in the longitudinal method
  Weaknesses: Subject to potentially large positive and negative biases; not possible to determine which type of bias is present

Longitudinal
  Method: Temporal comparison of two groups of patients read without and with CADe
  Outcome: Change in cancer detection rate
  Strengths: Possible to conduct large studies; not subject to biases that may be present in the cross-sectional method
  Weaknesses: Subject to variation in the number of prevalence screens, which is difficult to control for; subject to variation in radiologists' ability to read mammograms; subject to variation in the number of cancers in the screening population from year to year; CADe effect on cancer detection rate is small


of weaknesses for the longitudinal method is longer, the potential
for large positive or negative biases in the cross-sectional method

makes it difficult to interpret the results from such studies. The

three sources of variation listed in Table 2 for the longitudi-

nal method imply that there can be large variations in the results

between studies (compare the Gur and Cupples studies). This makes

it extremely difficult to measure the small difference in the can-

cer detection rate that actually exists. Therefore, clinically, the

longitudinal method will not be effective in measuring the ben-

efits of using CADe. As discussed in another paper, the change

in the size of the cancer may be a better endpoint [69]. A change

in cancer stage when CADe was used was found in one of the

clinical studies [70].

2.5. Improving CADe performance

There is good evidence that radiologists often ignore CADe

prompts of actual cancers. The reason for this is not well under-

stood. One possibility is that the false-detection rate is too high

and radiologists tend to dismiss computer prompts or pay less

attention when an image has many prompts. Generally, radiol-

ogists like using CADe for clustered calcifications, but are less

enthusiastic about CADe for masses. The obvious difference

is that clustered calcification algorithms have a better perfor-

mance than do those for masses. However, there are less obvious

reasons.

There is very little structure in the breast that can mimic

clustered calcifications. Therefore, false calcification prompts

are either due to benign calcifications or are obviously not
calcifications. Radiologists, in general, can evaluate calcification

prompts quickly. On the other hand, the superposition of over-

lapping normal tissues can produce a pseudo-lesion in the

mammogram. This is a very common occurrence, and it is some-

times difficult for radiologists to determine whether a lesion is

real or is merely overlapping tissue. Approximately 30% of all

screening mammograms classified as abnormal are due to the

superposition of tissue. One technique that radiologists use to

determine whether an apparent lesion is real or not is to deter-

mine whether the lesion is visible in both views. If it is, then it

is highly likely to be an actual lesion. If it is not, it may or may

not be an actual lesion. Radiologists will then compare the area

containing the apparent lesions with other regions within the

mammogram. If the pattern where the apparent lesion is located

is similar to patterns elsewhere in the mammogram, then it is

likely that the apparent lesion is not real. The radiologist also

compares the current films to previous films to see whether the

apparent lesion is new or has changed over time.

CADe schemes need to use this approach to reduce the false-

detection rate. It has been shown that observers improved their

performance when using context information for classification

of false-positive and true-positive regions. That is, better dis-

crimination was achieved when radiologists looked at a whole

image rather than a small ROI around the lesion [71]; furthermore,
radiologists are better than computers at this task.

There are several approaches to correlating views either

within the same examinations or between examinations done

at different times. One approach is to transform one image to

match the corresponding image taken at a previous time, or an

image of the opposite breast. Using geometry based on patient

positioning, possible match pairs of detections are determined

[72]. Then, using features of the match pairs, radiologists devel-

oped a matching pair score. This score allowed corresponding

pairs to be determined.

Another approach is to extract features of CADe detections

and compare detections from different views of the same breast

or the same view taken at different times to match the detections

between views.
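A sketch of the general idea behind such matching follows. The features, thresholds, and scoring form are hypothetical, not the method of Paquerault et al. [72]: candidate pairs are first gated by a roughly view-invariant geometric cue, then ranked by similarity of lesion features:

```python
import math

# Hypothetical two-view matching: detections are gated by distance from the
# nipple (approximately preserved between views) and then scored by
# feature-space similarity. All values below are illustrative.
def match_score(det_a, det_b, tol_mm=15.0):
    """det = (nipple_distance_mm, feature_vector). Higher score = better match."""
    geo = abs(det_a[0] - det_b[0])
    if geo > tol_mm:                      # geometry rules the pair out
        return 0.0
    feat = math.dist(det_a[1], det_b[1])  # feature-space distance
    return 1.0 / (1.0 + geo + feat)

cc_view = [(42.0, (0.8, 0.3)), (71.0, (0.2, 0.9))]
mlo_view = [(45.0, (0.7, 0.35)), (120.0, (0.1, 0.2))]
pairs = [(i, j, match_score(a, b))
         for i, a in enumerate(cc_view) for j, b in enumerate(mlo_view)]
best = max(pairs, key=lambda t: t[2])
print(best[:2])  # (0, 0): the first CC detection pairs with the first MLO one
```

A detection with no plausible partner in the other view can then be down-weighted, mimicking the radiologist's two-view reasoning.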

2.6. CADe as a first reader

One of the most demanding and time-consuming aspects of

reading a screening mammogram is finding clustered microcalcifications.
This is because microcalcifications can be very small,

a few tenths of a millimeter. Radiologists typically use a magni-

fying glass and carefully examine all areas of each of four film

mammograms. On a digital system, electronic zoom is used.

Given that CADe for clustered calcifications has a high sen-

sitivity, approximately 98%, as radiologists gain confidence in

the computer’s ability to find clustered calcifications, the need

to search the image with a magnifying glass may be reduced

to the point where the radiologist relies on the computer to

detect these calcifications. This would allow radiologists just to

check the computer-detected clusters of calcification and then

read the mammograms for mass lesions. This should improve

radiologists’ productivity and reduce reading fatigue.

2.7. CADe and picture archiving and communication

system (PACS)

As mammography migrates from film to digital acquisition,

it becomes important for CADe to be integrated with PACS.

Proper integration of CADe and PACS is critical for CADe to

be used as a tool to increase productivity. Digital images need

to be sent to the CADe server, where the actual CADe algo-

rithms are run; then the output of the CADe schemes needs to

be stored with the images so that they are available for review.

If the digital images are printed and viewed on light boxes, then

the method used for reviewing the CADe output with film
mammography can still be used. If the digital images are viewed on

soft-copy monitors, the CADe prompts must be displayed as

an overlay on the digital mammogram. In either case, a mecha-

nism is needed for storing, transmitting, and displaying the CADe

marks. This can be done by use of the structured report feature

of DICOM. DICOM stands for Digital Imaging and Commu-

nications in Medicine (http://medical.nema.org/). DICOM is a

set of standards for handling, storing, printing, and transmitting

information in medical imaging. It is a global standard used by

virtually all medical imaging enterprises.
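As an illustration of the kind of content such a mechanism must carry, the sketch below uses ad-hoc JSON with hypothetical field names; a real system would encode the same information in a DICOM Mammography CAD structured report rather than JSON:

```python
import json

# Illustrative only: the minimum a CADe mark must carry for a workstation to
# overlay it on the correct image. Field names are hypothetical stand-ins for
# the corresponding DICOM SR content items.
mark = {
    "referenced_image_uid": "example-image-uid",  # hypothetical identifier
    "finding_type": "cluster of calcifications",
    "outline_xy_pixels": [[1024, 812], [1060, 812], [1060, 850], [1024, 850]],
}
payload = json.dumps({"cad_marks": [mark]})
print(len(json.loads(payload)["cad_marks"]))  # 1
```

The essential point is that the mark references a specific image and carries geometry in that image's coordinate system, which is exactly what breaks down when PACS and CAD systems interpret the standard differently.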

Although CAD companies have embraced DICOM structured

reports as a method for storing and retrieving information about

the output of CAD analyses, not all PACS companies currently

are able to utilize structured reports. As a result, on those systems it
is not possible to store and retrieve the CAD output. Further, one of the

strong features of DICOM is its flexibility, but this is also a drawback
in terms of integrating different systems. For example, a

woman may have a digital mammogram taken on system A,

but her previous mammograms were taken on system B. If the

review workstation that is being used cannot display mammo-

grams from both system A and system B, there will be a problem

for the radiologist. This can occur even though both systems

use DICOM because, in addition to the standard features that all

systems use, each system can also specify added features,

and these may differ from one system to another. Even if both

images can be displayed, they may be displayed differently (e.g.,

they may have different image-processing techniques applied to

them). This is very problematic. This is not a problem for CADe

display per se, but integration of CADe must occur in this envi-

ronment. Fortunately, a consortium of industry, radiologists, and

informaticists is developing a superset of standards for DICOM

that will allow full integration of all of the systems involved

in digital mammography (PACS, acquisition hardware, display

hardware, and CAD). This consortium is called IHE (Integrat-

ing the Healthcare Enterprise: http://www.ihe.net/). The goal of

IHE is to standardize many of the optional features of standard

DICOM. This should allow true compatibility among all of the

components necessary for digital mammography to work seam-

lessly in the clinic, allowing radiologists to work productively

and use CADe routinely.

3. Summary

The concept of computer-aided detection (CADe) was intro-

duced more than 50 years ago; however, only in the last 20 years

have there been serious and successful attempts at developing

CADe for mammography. CADe schemes have high sensitivity,

but poor specificity compared to radiologists. CADe has been

shown both in observer studies and in clinical evaluations to help

radiologists find more cancers. Recent clinical studies indicate

that CADe increases the number of cancers detected by approx-

imately 10%, which is comparable to double reading by two

radiologists. However, it is difficult to measure the clinical ben-

efits of CADe because of variability in the number of cancers

present in the screened population from year to year. Further-

more, the actual increase in the cancer detection rate is very

small, and yet a radiologist can reduce his or her missed cancer

rate by using CADe. Finally, one important goal of CADe is to

improve radiologists’ productivity. To accomplish this goal, it

is important to incorporate CADe seamlessly into the clinical

workflow. This goal can be achieved by careful integration of

CADe into the clinical PACS.

References

[1] American Cancer Society. Cancer facts and figures 2006. Atlanta, GA:

American Cancer Society; 2006.

[2] Anderson I, Aspegren K, Janzon L, et al. Mammographic screening and

mortality from breast cancer: the Malmo mammographic screening trial.

Br Med J 1988;297:943–8.

[3] Shapiro S, Venet W, Strax PH, Venet L, Roeset R. Ten to fourteen-year effect
of screening on breast cancer mortality. J Natl Cancer Inst 1982;69:349–55.

[4] Tabar L, Yen MF, Vitak B, Tony Chen HH, Smith RA, Duffy SW.

Mammography service screening and mortality in breast cancer patients:

20-year follow-up before and after introduction of screening. Lancet

2003;361(9367):1405–10.

[5] Berry DA, Cronin KA, Plevritis SK, et al. Effect of screening and

adjuvant therapy on mortality from breast cancer. N Engl J Med

2005;353(17):1784–92.

[6] Pisano ED, Gatsonis C, Hendrick E, et al. Diagnostic performance of dig-

ital versus film mammography for breast-cancer screening. N Engl J Med

2005;353(17):1773–83.

[7] Andersson I. What can we learn from interval carcinomas? Recent Results

Cancer Res 1984;90:161–3.

[8] Frisell J, Eklund G, Hellstrom L, Somell A. Analysis of interval breast

carcinomas in a randomized screening trial in Stockholm. Breast Cancer

Res Treat 1987;9:219–25.

[9] Harvey JA, Fajardo LL, Innis CA. Previous mammograms in patients with

impalpable breast carcinoma: retrospective versus blinded interpretation.

Am J Roentgenol 1993;161:1167–72.

[10] Holland T, Mrvunac M, Hendriks JHCL, Bekker BV. So-called inter-

val cancers of the breast. Pathologic and radiographic analysis. Cancer

1982;49:2527–33.

[11] Ma L, Fishell E, Wright B, Hanna W, Allen S, Boyd NF. A controlled

study of the factors associated with failure to detect breast cancer by

mammography. J Natl Cancer Inst 1992;84:781–5.

[12] Martin JE, Moskowitz M, Milbrath JR. Breast cancers missed by mam-

mography. Am J Roentgenol 1979;132:737–9.

[13] Peeters PHM, Verbeek ALM, Hendriks JHCL, Holland R, Mrvunac M,

Vooijs GP. The occurrence of interval cancers in the Nijmegen screening

programme. Br J Cancer 1989;59:929–32.

[14] Smith-Bindman R, Chu PW, Miglioretti DL, et al. Comparison of screening
mammography in the United States and the United Kingdom. J Am Med

Assoc 2003;290:2129–37.

[15] Boone JM, Kwan AL, Yang K, Burkett GW, Lindfors KK, Nelson TR.

Computed tomography for imaging the breast. J Mammary Gland Biol

Neoplasia; 2006.

[16] Boone JM, Nelson TR, Lindfors KK, Seibert JA. Dedicated breast CT:

radiation dose and image quality evaluation. Radiology 2001;221(3):

657–67.

[17] Chen B, Ning R. Cone-beam volume CT breast imaging: feasibility study.

Med Phys 2002;29(5):755–70.

[18] Niklason LT, Christian BT, Niklason LE, et al. Digital tomosynthesis in

breast imaging. Radiology 1997;205(2):399–406.

[19] Chan HP, Wei J, Sahiner B, et al. Computer-aided detection system for

breast masses on digital tomosynthesis mammograms: preliminary experi-

ence. Radiology 2005;237(3):1075–80.

[20] Reiser I, Nishikawa RM, Giger ML, et al. Computerized mass detection

for digital breast tomosynthesis directly from the projection images. Med

Phys 2006;33(2):482–91.

[21] Barlow WE, Chi C, Carney PA, et al. Accuracy of screening mammog-

raphy interpretation by characteristics of radiologists. J Natl Cancer Inst

2004;96(24):1840–50.

[22] Elmore JG, Nakano CY, Koepsell TD, Desnick LM, D’Orsi CJ,

Ransohoff DF. International variation in screening mammography inter-

pretations in community-based programs. J Natl Cancer Inst 2003;95(18):

1384–93.

[23] Beam CA, Layde PM, Sullivan DC. Variability in the interpretation of

screening mammograms by US radiologists. Findings from a national

sample. Arch Intern Med 1996;156(2):209–13.

[24] Elmore JG, Wells CK, Lee CH, Howard DH, Feinstein AR. Variabil-

ity in radiologists’ interpretations of mammograms. N Engl J Med

1994;331(22):1493–9.

[25] Getty DJ, Pickett RM, D’Orsi CJ, Swets JA. Enhanced interpretation of

diagnostic images. Invest Radiol 1988;23:240–52.

[26] Horsch K, Giger ML, Vyborny CJ, Lan L, Mendelson EB, Hendrick RE.

Classification of breast lesions with multimodality computer-aided diag-

nosis: observer study results on an independent clinical data set. Radiology

2006;240(2):357–68.

[27] Huo Z, Giger ML, Vyborny CJ, Metz CE. Effectiveness of computer-aided

diagnosis—observer study with independent database of mammograms.

Radiology 2002;224:560–8.


[28] Jiang Y, Nishikawa RM, Schmidt RA, Metz CE, Giger ML, Doi K. Improving
breast cancer diagnosis with computer-aided diagnosis. Acad Radiol

1999;6(1):22–33.

[29] Winsberg F, Elkin M, Macy J, Bordaz V, Weymouth W. Detection of
radiographic abnormalities in mammograms by means of optical scanning and

computer analysis. Radiology 1967;89:211–5.

[30] Chan H-P, Doi K, Vyborny CJ, et al. Improvement in radiologists’ detec-

tion of clustered microcalcifications on mammograms: the potential of

computer-aided diagnosis. Invest Radiol 1990;25(10):1102–10.

[31] Kobayashi T, Xu X-W, MacMahon H, Metz CE, Doi K. Effect of

a computer-aided diagnosis scheme on radiologists’ performance in

detection of lung nodules on radiographs. Radiology 1996;199:843–

8.

[32] Abe K, Doi K, MacMahon H, et al. Computer-aided diagnosis in chest

radiography: analysis of results in a large clinical series. Invest Radiol

1993;28:987–93.

[33] Abe H, MacMahon H, Engelmann R, et al. Computer-aided diagnosis in

chest radiography: results of large-scale observer tests at the 1996-2001

RSNA scientific assemblies. Radiographics 2003;23(1):255–65.

[34] Abe H, Ashizawa K, Li F, et al. Artificial neural networks (ANNs) for

differential diagnosis of interstitial lung disease: results of a simulation test

with actual clinical cases. Acad Radiol 2004;11(1):29–37.

[35] Shiraishi J, Abe H, Engelmann R, Aoyama M, MacMahon H, Doi K.

Computer-aided diagnosis for distinction between benign and malignant

solitary pulmonary nodules in chest radiographs: ROC analysis of radiol-

ogists’ performance. Radiology 2003;227:469–74.

[36] Malich A, Fischer DR, Bottcher J. CAD for mammography: the

technique, results, current role and further developments. Eur Radiol

2006;16:1449–60.

[37] Karssemeijer N. Detection of masses in mammograms. In: Strickland RN,

editor. Image-processing techniques in tumor detection. New York, NY:

Marcel Dekker Inc.; 2002. p. 187–212.

[38] Nishikawa RM. Detection of microcalcifications. In: Strickland RN, editor.

Image-processing techniques in tumor detection. New York, NY: Marcel

Dekker Inc.; 2002. p. 131–53.

[39] Giger ML, Huo Z, Kupinski MA, Vyborny CJ. Computer-aided diagno-

sis in mammography. In: Sonka M, Fitzpatrick JM, editors. Handbook of

medical imaging, vol. 2. Bellingham, WA: The Society of Photo-Optical

Instrumentation Engineers; 2000. p. 915–1004.

[40] Sampat MP, Markey MK, Bovik AC. Computer-aided detection and diag-

nosis in mammography. In: Bovik AC, editor. The handbook of image

and video processing. 2nd ed. New York: Elsevier; 2005. p. 1195–

217.

[41] Nishikawa RM. Computer-assisted detection and diagnosis. Wiley; 2005.

[42] Chan H-P, Niklason LT, Ikeda DM, Lam KL, Adler DD. Digitization

requirements in mammography: effects on computer-aided detection of

microcalcifications. Med Phys 1994;21(7):1203–11.

[43] Stafford RG, Beutel J, Mickewich DJ. Application of neural networks

to computer-aided pathology detection in mammography. Proc SPIE

1993:1898.

[44] El-Naqa I, Yang Y, Wernick MN, Galatsanos NP, Nishikawa RM. A support

vector machine approach for detection of microcalcifications. IEEE Trans

Med Imag 2002;21(12):1552–63.

[45] Campanini R, Dongiovanni D, Iampieri E, et al. A novel featureless

approach to mass detection in digital mammograms based on support vector

machines. Phys Med Biol 2004;49(6):961–75.

[46] Veldkamp WJH, Karssemeijer N. Improved correction for signal dependent

noise applied to automatic detection of microcalcifications. In: Karsse-

meijer N, Thijssen M, Hendriks J, van Erning L, editors. Digital mammography
Nijmegen 98. Amsterdam: Kluwer Academic Publishers; 1998.

p. 160–76.

[47] Chan HP, Doi K, Galhotra S, Vyborny CJ, MacMahon H, Jokich PM.

Image feature analysis and computer-aided diagnosis in digital radiography

I. Automated detection of microcalcifications in mammography. Med Phys

1987;14(4):538–48.

[48] Nagel RH, Nishikawa RM, Doi K. Analysis of methods for reducing false

positives in the automated detection of clustered microcalcifications in

mammograms. Med Phys 1998;25(8):1502–6.

[49] Davies DH, Dance DR. Automatic computer detection of clustered calci-

fications in digital mammograms. Phys Med Biol 1990;35:1111–8.

[50] Cheng H-D, Lui YM, Freimanis RI. A novel approach to microcalci-

fication detection using fuzzy logic technology. IEEE Trans Med Imag

1998;17(3):442–50.

[51] Cernadas E, Zwiggelaar R, Veldkamp W, et al. Detection of mammographic
microcalcifications using a statistical model. In: Karssemeijer N, Thijssen

M, Hendriks J, van Erning L, editors. Digital mammography. Amsterdam:
Kluwer; 1998. p. 205–8.

[52] Brown S, Li R, Brandt L, Wilson L, Kossoff G, Kossoff M. Development

of a multi-feature CAD system for mammography. In: Karssemeijer N,

Thijssen M, Hendriks J, van Erning L, editors. Digital mammography.
Amsterdam: Kluwer; 1998. p. 189–96.

[53] Bankman IN, Christens-Barry WA, Kim DW, Weinberg IN, Gatewood

OB, Brody WR. Automated recognition of microcalcification clusters in

mammograms. Proc SPIE 1993;1905:731–8.

[54] Anastasio MA, Yoshida H, Nagel R, Nishikawa RM, Doi K. A genetic

algorithm-based method for optimizing the performance of a computer-

aided diagnosis scheme for detection of clustered microcalcifications in

mammograms. Med Phys 1998;25(9):1613–20.

[55] Anastasio MA, Kupinski MA, Nishikawa RM. Optimization and FROC

analysis of rule-based detection schemes using a multiobjective approach.

IEEE Trans Med Imag 1998;17(6):1089–93.

[56] Nishikawa R, Edwards A, Schmidt R, Papaioannou J, Linver M. Can radi-

ologists recognize that a computer has identified cancers that they have

overlooked? Proc SPIE 2006;6146:1–8.

[57] Chakraborty DP, Berbaum KS. Observer studies involving detec-

tion and localization: modeling, analysis, and validation. Med Phys

2004;31(8):2313–30.

[58] Efron B, Tibshirani R. Improvements on cross-validation: the .632+
bootstrap method. J Am Stat Assoc 1997;92(438):548–60.

[59] Taylor P, Champness J, Given-Wilson R, Johnston K, Potts H. Impact

of computer-aided detection prompts on the sensitivity and specificity of

screening mammography. Health Technol Assess 2005;9(6):1–70.

[60] Gilbert FJ, Astley SM, McGee MA, et al. Single reading with computer-

aided detection and double reading of screening mammograms in

the United Kingdom National Breast Screening Program. Radiology

2006;241(1):47–53.

[61] Kegelmeyer Jr WP, Pruneda JM, Bourland PD, Hillis A, Riggs MW, Nip-

per ML. Computer-aided mammographic screening for spiculated lesions.

Radiology 1994;191(2):331–7.

[62] Astley S, Quarterman C, Al Nuaimi Y, et al. Computer-aided detection

in screening mammography: the impact of training on reader perfor-

mance. In: Pisano E, editor. Digital mammography 2004. Chapel Hill;

2004.

[63] Anderson ED, Muir BB, Walsh JS, Kirkpatrick AE. The efficacy of dou-

ble reading mammograms in breast screening. Clin Radiol 1994;49(4):

248–51.

[64] Harvey SC, Geller B, Oppenheimer RG, Pinet M, Riddell L, Garra

B. Increase in cancer detection and recall rates with independent

double interpretation of screening mammography. Am J Roentgenol

2003;180(5):1461–7.

[65] Thurfjell EL, Lernevall KA, Taube AA. Benefit of independent double
reading in a population-based mammography screening program. Radiology

1994;191(1):241–4.

[66] Freer TW, Ulissey MJ. Screening mammography with computer-aided

detection: prospective study of 12,860 patients in a community breast

center. Radiology 2001;220(3):781–6.

[67] Gur D, Sumkin JH, Rockette HE, et al. Changes in breast cancer detection

and mammography recall rates after the introduction of a computer-aided

detection system. J Natl Cancer Inst 2004;96(3):185–90.

[68] Feig SA, Sickles EA, Evans WP, Linver MN. Re: changes in breast can-

cer detection and mammography recall rates after the introduction of a

computer-aided detection system. J Natl Cancer Inst 2004;96(16):1260–1,

author reply 1261.

[69] Nishikawa RM. Modeling the effect of computer-aided detection on the

sensitivity of screeningmammography. In: Astley SM, editor. Digitalmam-

mography 2006. Berlin: Springer-Verlag; 2006. p. 136–42.

[70] Cupples TE, Cunningham JE, Reynolds JC. Impact of computer-aided detection in a regional screening mammography program. AJR Am J Roentgenol 2005;185(4):944–50.
[71] van Engeland S, Varela C, Timp S, Snoeren PR, Karssemeijer N. Using context for mass detection and classification in mammograms. Proc SPIE 2006;5749:94–102.
[72] Paquerault S, Petrick N, Chan HP, Sahiner B, Helvie MA. Improvement of computerized mass detection on mammograms: fusion of two-view information. Med Phys 2002;29(2):238–47.
[73] Birdwell RL, Bandodkar P, Ikeda DM. Computer-aided detection with screening mammography in a university hospital setting. Radiology 2005;236:451–7.
[74] Helvie MA, Hadjiiski L, Makariou E, et al. Sensitivity of noncommercial computer-aided detection system for mammographic breast cancer detection: pilot clinical trial. Radiology 2004;231(1):208–14.
[75] Khoo LA, Taylor P, Given-Wilson RM. Computer-aided detection in the United Kingdom National Breast Screening Programme: prospective study. Radiology 2005;237(2):444–9.

Robert M. Nishikawa received his B.Sc. in Physics in 1981 and his M.Sc. and Ph.D. in Medical Biophysics in 1984 and 1990, respectively, all from the University of Toronto. He is currently an Associate Professor in the Department of Radiology and the Committee on Medical Physics at the University of Chicago. He is director of the Carl J. Vyborny Translational Laboratory for Breast Imaging Research. He is also a fellow of the American Association of Physicists in Medicine (AAPM).

His research has three intertwining themes. The first is the development of computer-aided diagnosis (CAD) techniques for x-ray imaging of the breast, in particular for digital breast tomosynthesis and full-field digital mammography (FFDM). The second is the evaluation of CAD, principally its clinical effectiveness. The evaluations include Monte Carlo modeling of using computer-aided detection in screening mammography and observer studies to understand how effectively radiologists can use computers as aids when interpreting mammograms. The third is the investigation of the performance of new breast x-ray imaging systems. These studies include the evaluation of new clinical systems, such as FFDM and phase contrast mammography, and the optimization of digital breast tomosynthesis.