1,2 1 2 - arXiv
Transcript of 1,2 1 2 - arXiv
A Survey on Deep Learning of Small Sample in Biomedical Image Analysis
Pengyi Zhang1,2, Yulin Deng1,2, Xiaoying Tang1,2, Yunxin Zhong1,2, Xiaoqiong Li1,2*
1School of Life Science, Beijing Institute of Technology, Haidian District, Beijing, P.R. China
2Key Laboratory of Convergence Medical Engineering System and Healthcare Technology, Ministry of
Industry and Information Technology, Haidian District, Beijing, P.R. China
Abstract
The success of deep learning has been witnessed as a promising technique for computer-aided biomedical image analysis, due to end-to-end learning framework and availability of large-scale labelled samples. However, in many cases of biomedical image analysis, deep learning techniques suffer from the small sample learning (SSL) dilemma caused mainly by lack of annotations. To be more practical for biomedical image analysis, in this paper we survey the key SSL techniques that help relieve the suffering of deep learning by combining with the development of related techniques in computer vision applications. In order to accelerate the clinical usage of biomedical image analysis based on deep learning techniques, we intentionally expand this survey to include the explanation methods for deep models that are important to clinical decision making. We survey the key SSL techniques by dividing them into five categories: (1) explanation techniques, (2) weakly supervised learning techniques, (3) transfer learning techniques, (4) active learning techniques, and (5) miscellaneous techniques involving data augmentation, domain knowledge, traditional shallow methods and attention mechanism. These key techniques are expected to effectively support the application of deep learning in clinical biomedical image analysis, and furtherly improve the analysis performance, especially when large-scale annotated samples are not available. We bulid demos at https://github.com/PengyiZhang/MIADeepSSL.
Keywords: deep learning, biomedical imaging, survey, small sample, explanation
1. Introduction
Biomedical imaging techniques have been widely used in clinical practice. Specialists used tomanually analyze biomedical images and further fit all clinical clues together to make the final diagnosis based on their own experience. Nowadays, the manual biomedical image analysis is facing four major challenges: (1) Manual analysis is limited by individual experience and thus diagnosis might be different among individuals. (2) Training a qualified specialist requires lots of money and years of efforts. (3) Rapid growth of biomedical images in both number and modality puts enormous pressure on specialists. (4) Repetitive and tedious analysis work on unpleasing biomedical images makes specialists feel tiredeasily, which might lead to a delayed diagnosis or even a misdiagnosis and thus pose a great risk topatients. These challenges exacerbate shortage of healthcare resources to a certain extent especially inunder-developed regions. An alternative solution is computer-aided biomedical image analysis.
Driven by the growth of computing power and availability of large-scale labelled samples (e.g., ImageNet(Deng et al., 2009) and COCO(Lin et al., 2014)), deep neural network has been extensively studied due to its fast, scalable and end-to-end learning framework. Especially, Convolution Neural Network (CNN)(LeCun et al., 2015) models have achieved significant improvements compared with traditional shallow methods in image classification (e.g., ResNet(He et al., 2016) and DenseNet(G. Huang et al., 2017)), object detection (e.g., Faster R-CNN(Ren et al., 2017) and SSD(Liu et al., 2016)) and semantic segmentation (e.g., UNet(Ronneberger et al., 2015) and Mask R-CNN(He et al., 2017)), etc. As biomedical imaging equipment is increasingly applied and popularized in clinical practice, large numbers of biomedical images have been accumulated. Therefore, deep learning techniques have been gradually applied in computer-aided biomedical image analysis, such as diabetic retinopathy detection(Gulshan et al., 2016), skin cancer diagnosis(Esteva et al., 2017), pulmonary disease detection based on X-ray radiography(Rajpurkar et al., 2018) and heart disease risk prediction in fundus retinal images(Poplin et al., 2018), etc. Computer-aided biomedical image analysis using deep learning techniques has been one of the most active research topics. Some studies(Esteva et al., 2017; Rajpurkar et al., 2018) claim that their biomedical image analysis applications based on deep learning techniques have achieved expert-level or even beyond expert-level performance. However, in many cases of biomedical image analysis, deep learning techniques suffer from small sample learning (SSL) dilemma caused mainly by lack of annotations. Labeling multimodal biomedical images is a highly specialized work that experienced biomedical experts are required to perform fine-grained annotations, which puts extra enormous pressure on biomedical experts. This is a major reason why biomedical images usually lack adequate annotations in comparison of large-scale natural images. With inadequate training samples, deep models are prone to overfitting, thus leading to a big drop in performance. Therefore, how to apply
*Corresponding author
deep learning techniques effectively on a small biomedical image set becomes an urgent problem to be well studied.
Many surveys(Litjens et al., 2017; Ker et al., 2018; Anwar et al., 2018; Razzak et al., 2018; Esteva et al., 2019; Maier et al., 2019; Moen et al., 2019) review the progress of deep learning techniques applied in biomedical image analysis from four directions: (1) deep models; (2) analysis tasks, e.g., classification, detection and segmentation; (3) modalities of biomedical images; (4) anatomical structure, e.g., brain, chest and retina. This SSL dilemma is simply mentioned as a challenge and its solution is not surveyed comprehensively. Different from these studies, our survey aims to deal with the SSL dilemma that deep learning techniques generally suffer from in biomedical image analysis tasks. Shu et al.(2018) investigated SSL methods in big data era by dividing them into concept learning and experience learning. Their study theoretically clarified the rationality of the SSL regime by providing neuroscience evidences and furtherly presented a general view of SSL techniques for various application scenarios. To be more practical, in this paper we survey key SSL techniques for deep learning of small sample in biomedical image analysis by combining with the development of relevant techniques in computer vision applications. In order to support the clinical usage of biomedical image analysis based on deep learning techniques, we intentionally expand our survey to include explanation methods for deep models that are important to clinical decision making.
In this paper, we divide our surveyed key SSL techniques into five categories (Fig. 1): (1) Explanation techniques. The clinical applications require that the decision-making process of
deep models is explainable and transparent. We thus explore the potential procedure of computer-aided diagnosis system applied in clinical practice and survey the explanation techniques for deep models in Section 2.
(2) Weakly supervised learning techniques. Since the fine-grained annotations required by fine-grained biomedical image analysis tasks (e.g., lesion localization and gland segmentation) are costly, can we just use relatively low-cost and coarse-grained annotations to perform these fine-grained tasks in biomedical image analysis? Therefore, the weakly supervised learning techniques for dealing with such problem are discussed in Section 3.
(3) Transfer learning techniques. Deep learning has made significant progress in solving various domain related problems. Can the knowledge gained by deep models while solving problems in other domains be applied in biomedical image analysis and thus alleviate the demands for large-scale annotated biomedical images? We present transfer learning techniques for answering this question in Section 4.
(4) Active learning techniques. Not every annotated sample contributes equally to deep models. Can we just select the few most valuable samples for annotations and train deep models with them for biomedical image analysis? We introduce active learning techniques for solving this problem in Section 5.
(5) Miscellaneous techniques. Miscellaneous techniques that help improve the performance of deep models in biomedical image analysis are studied in Section 6, involving data augmentation, domain knowledge, shallow methods and attention mechanism.
Design of deep models
Clinical applications
Training of deep models
Dataset collection and
annotation
Procedure of deep learning
Enlarging training set with software-generated biomedical images
Adopting expertise as a design guide for deep networks
Combining with shallow methods to improve generalization
Motivating deep models to observe carefully
Selecting the few most valuable samples for annotations
Performing fine-grained tasks with weak annotations
Reusing the knowledge gained while solving problems
Providing visual explanation for deep models
Object classification
Object
localization/detection
Semantic/Instance
segmentation
Biomedical image analysis
5. Deep models heavily relies on
large-scale labeled samples.
1. Biomedical images present
multimodality.
3. Only experienced biomedical
experts can perform annotations.
4. Labeling unpleasing biomedical
images is tedious.
6. Clinical practice requires that
decision-making process of deep
models is transparent.
2. Biomedical image analysis
requires expertise.
Deep learning techniques
Explanation methods
Data Augmentation
Attention mechanism
Domain knowledge
Traditional methods
Active learning
Weakly supervised learning
Transfer learning
Surveyed SSL techniques
Figure 1. Overview of our surveyed SSL techniques for deep learning in clinical biomedical image analysis.
To our best knowledge, this is the first attempt to present a comprehensive survey on key SSL
techniques that are expected to effectively support the application of deep learning in clinical biomedical image analysis and furtherly improve analysis performance especially in small sample cases. Notably, our survey focusing on key SSL techniques can be mutually complementary with previous reviews on deep learning in biomedical image analysis.
2. Explanation techniques
Due to highly abstract feature representation and end-to-end learning framework, decision-making process of deep models remains largely unclear(Petsiuk et al., 2018). In clinical practice, an incorrect decision might do great harm to patient’s physical and mental health. Explaining decision-making process of deep models can provide decision-making evidences for doctors’ final diagnosis to enroll doctors in the decision-making process of computer-aided biomedical image analysis (Fig. 2a) and make
the clinical decision more reliable. Thus, clinical applications require that the decision-making process of deep models is explainable and transparent, which is vital to build a trusty computer-aided diagnosis system. Existing explanation methods for deep models can be divided into local or global explanation and white-box or black-box methods (Fig. 2b). Table 1 outlines some representative studies on explanation methods.
CAM-based methods. One of the well-known explanation methods is Class Activation Map (CAM)(Zhou et al., 2016). This method constructs CAMs by calculating the class-specific activation in each spatial location of the last convolutional feature maps and tries to clarify the classification decision-making process by providing visual explanation that highlights class-specific and discriminative regions. In particular, Rajpurkar et al. (2018)(2017) adopted this approach to explain deep models for abnormality detection in musculoskeletal X-rays (Fig. 3a) and pulmonary disease classification in chest X-rays. Compared with the captions provided by a board-certified radiologist, visual CAMs presented a high relevance to clinical abnormality. Essentially, CAM takes fully use of the last convolutional features and the class-specific weights of output layer, thus making it an efficient method for visual explanations. Other similar methods adopted different pooling methods (e.g., global max pooling, log-sum-exp pooling, and log-mean-exp pooling(B. Zhang et al., 2018)) to retrieve different visual CAMs. Besides, Selvaraju et al.(2017)proposed Grad-CAM by using gradients instead of weights of output layer to calculate CAMs. Grad-CAM greatly expands the applications of CAM without any modifications to deep models and without restriction to classification models. Table 1. Representative researches on explanation methods for deep models.
Reference Category Method & remark
Erhan et al. (2009)
Global explanation, White-box
Activation maximization; gradient ascent in the input space
Simonyan et al. (2013)
Global explanation, White-box
Activation maximization; gradient ascent in the input space
Local explanation, Black-box
Examine component importance; perturbing input image pixels
Zeiler et al. (2013)
Local explanation, White-box
Examine component importance; covering up different portions of input image with a gray square
Bach et al. (2015)
Local explanation, White-box
Backpropagation-based; class-specific error
Yosinski et al. (2015)
Global explanation, White-box
Activation maximization; a regularized optimization in image space
Zhou et al. (2016)
Local explanation, White-box
CAM-based; multiplying activation with the weights of output layer
Mahendran et al. (2016)
Local explanation, White-box
Backpropagation-based; visualizing features by mapping them back to the input pixel space
Zhang et al. (2016)
Local explanation, White-box
Backpropagation-based; class-specific error
Ribeiro et al. (2016)
Local explanation, Black-box
Examine component importance; perturbing the pixels of input image randomly to learn a linear explainable decision model
Selvaraju et al. (2017)
Local explanation, White-box
CAM-based; extending CAM by replacing the weights of output layer with the gradients
Fong et al. (2017)
Local explanation, Black-box
Examine component importance; learning meaningful perturbation on input image by gradient decent
Nguyen et al. (2017)
Global explanation, White-box
Activation maximization; a optimization regularized by a conditional generative adversarial network
Esteva et al. (2017)
Local explanation, White-box
Backpropagation-based; L1 norm of loss gradients to input layer
Rajpurkar et al. (2018)
Local explanation, White-box
CAM-based; explaining a deep model for abnormality detection in musculoskeletal X-rays
Rajpurkar et al. (2018)
Local explanation, White-box
CAM-based; explaining a deep model for abnormality detection in chest X-rays.
Zhang et al. (2018)
Local explanation, White-box
CAM-based; modifing CAM with different global pooling methods
Kermany et al. (2018)
Local explanation, White-box
Examine component importance; convolving an occluding kernel across the input image to find the clinically relevant regions of age-related macular degeneration and diabetic macular edema
Petsiuk et al. (2018)
Local explanation, Black-box
Examine component importance; perturbing input image with random masks to calculate the average mask weighted by class-specific output
Backpropagation-based methods. The other type of explanation methods is built on backpropagation
of class-specific gradients(Zeiler and Fergus, 2014; Mahendran and Vedaldi, 2016) or errors(Bach et al., 2015; Jianming Zhang et al., 2018). The visual explanation from backpropagation-based method is generally sharper than that from CAM-based methods. In particular, Esteva et al.(2017) adopted such a backpropagation-based approach to explain a deep model for skin cancer classification in dermoscopy images. The generated saliency maps demonstrated a highly clinical relevance to the skin cancer diagnosis (Fig. 3b).
Examining component importance. Another type of explanation methods(Petsiuk et al., 2018; Erhan et al., 2009; Simonyan et al., 2013; Ribeiro et al., 2016; Fong and Vedaldi, 2017; Kermany et al., 2018)
examine the importance of each component for decision-making by removing or modifying the components (e.g., pixel, image patch and segment) in an input image. The visual explanation is retrieved by combining each component weighted by its importance together. In particular, Kermany et al.(2018)
developed such an explanation method to explain deep models for the classification tasks of age-related macular degeneration and diabetic macular edema. The generated occlusion maps highlighted
pathological areas in retina OCT images, presenting a highly clinical relevance (Fig. 3c). Activation maximization. For global explanation, activation maximization(Erhan et al., 2009) is
generally performed as an optimization problem by gradient ascent algorithm, where the parameters required to be updated are image pixels instead of the weights of deep models. For better visualization, some studies(Yosinski et al., 2015; Nguyen et al., 2017) have tried to add various regularization constraints on activation maximization to make the optimization results more realistic.
Existing explanation methods for deep models are expected to help build trusty computer-aided diagnosis systems and accelerate such diagnosis systems to be implemented in clinical practice. The explanation results, i.e., visual attention map or saliency map, can also be used for weakly supervised object localization and instance segmentation to alleviate the demands for fine-grained annotations. However, a key limitation across these studies is that they cannot explain how robust a decision is, e.g., when deep models will fail in a specific biomedical image analysis task, which is very important to construct automatic and unattended diagnosis system. Besides, the majority of these methods are designed for CNN-based recognition tasks. When there exists multiple targets in one image, the produced visual explanation is prone to highlighting only the few most discriminative targets. The incompleteness of visual explanation might lead to a sub-optimal result. Future work likely focus on developing explanation methods for deep models in complex tasks to explain how robust the decisions are by digging deep into clinical expertise underlying biomedical images. Moreover, the cooperative mode how biomedical image analysis using deep learning techniques effectively help clinical specialists is also a research topic worthy of exploring.
Output
ActivationsGradientsWeights
Black-box methods
White-box methods
InputBlack-box model
White-box model
class 1
class 2
cN
0.02 0.01 0.88 0.54 0.05 0.00
0.07 0.38 0.17
lass
Localexplanation
Globalexplanation
Final diagnosisComputer-aided diagnosis system
Specialist
Computer-aided diagnosis
Visual explanation
Biomedical Images
Figure 2. Left-to-right: (a) A potential procedure of computer-aided diagnosis system applied in clinical practice. Explanation methods for the decision-making process are expected to help build trusty computer-aided diagnosis systems. (b) Taxonomy of explanation methods. For instance, the global explanation methods aims to explain a model by visualizing the model’s favorite “husky”, while the local explanation methods try to explain why a model think a input image is a “husky” by highlighting the regions that the model relies on to make the decision. The black-box models build the explanation with only the input and output of a model while the white-box explanation methods construct the explanation by using the input and output as well as internal information, such as activations, gradients and weights, etc.
Figure 3. Examples of visual explanation for deep models built on three different modalities of biomedical images. Left-to-right: (a) Visual CAMs. The visual CAMs(Rajpurkar et al., 2017) highlight the abnormal region in musculoskeletal X-rays, presenting a highly clinical relevance; (b) Saliency maps. In skin cancer classification based on dermoscopy images, the saliency maps(Esteva et al., 2017) describe the importance of each pixel for diagnosis, presenting a highly clinical relevance; (c) Occlusion maps. The occlusion maps(Kermany et al., 2018) highlight the clinically relevant areas of pathology for the classification task of age-related macular degeneration and diabetic macular edema in retina OCT images.
3. Weakly supervised learning techniques
With the popularity of high resolution biomedical images in clinical diagnosis, many fine-grained tasks have been introduced in biomedical image analysis, such as lesion localization and gland segmentation. Training deep models for these fine-grained tasks generally requires fine-grained annotations that are costly and difficult to acquire. A natural idea is to use relatively low-cost and coarse-grained annotations to perform these fine-grained tasks in biomedical image analysis. Weakly supervised learning is such a promising approach to alleviate the demands for fine-grained annotations.
Weakly supervised learning(Zhou, 2018) mainly addresses learning problems for incomplete, inaccurate and inexact supervision environments. In computer vision, the research frontier problems of weakly supervised deep learning have started a gradual transition from simple object localization tasks(Zhou et al., 2016; Fong and Vedaldi, 2017) to complex object detection tasks(Bilen and Vedaldi, 2016; Tang et al., 2017; Diba et al., 2017; Tang et al., 2018; Wan et al., 2018; Arun et al., 2019) and semantic segmentation tasks(Z. Huang et al., 2018; Li et al., 2018; Kervadec et al., 2019). The supervision modalities in weakly supervised learning also present a diversified development trend, involving image-level supervision(Bilen and Vedaldi, 2016), box supervision(Bonechi et al., 2018), scribble supervision(Lin et al., 2016; Can et al., 2018), point supervision(Bearman et al., 2016; Kozuka and Niebles, 2017), and webly supervision(Navarro et al., 2018; Jin et al., 2017), etc. The performance
of weakly supervised learning methods in various computer vision tasks has been significantly improved, revealing a tendency to catch up with the fully supervised learning algorithms. Taking object detection task as an example, the mean average precision (mAP) score of weakly supervised object detection has been increasingly improved on the PascalVOC 2007(Everingham et al., 2010) dataset (Fig. 4). The gap of mAP (i.e., mean of average precision) scores between weakly supervised object detection algorithms and a well-known fully supervised object detection algorithm, i.e., YOLO(Redmon et al., 2016), is less than 10%. Therefore, the weakly supervised learning techniques are expected to achieve excellent performance in fine-grained biomedical image analysis using weak annotations.
0.3483
0.412 0.4280.453
0.473
0.536
0.3
0.35
0.4
0.45
0.5
0.55
0.6
0.65
WSDDN (2016)
OICR (2017)
WCCN (2017)
WeakRPN (2018)
MELM (2018)
Pred Net (Ens) (2018)
mAP on PascalVOC 2007
9.4%
YOLO (2016) 0.630
Figure 4. Illustration of the progress of weakly supervised object detection methods, including WSDDN(Bilen and Vedaldi, 2016), OICR(Tang et al., 2017), WCCN(Diba et al., 2017), WeakRPN(Tang et al., 2018), MELM(Wan et al., 2018) and Pred Net (Ens)(Arun et al., 2019).
Table 2. Representative papers for weakly supervised deep learning techniques used in biomedical image analysis.
Reference Supervision modality Method Task Dataset
Gondal et al. (2017)
Image-level supervision
CAM-based attention map + thresholding method
Localization of Diabetic Retinopathy (DR) lesions
88,702 fundus images in Kaggle Dataset (https://www.kaggle.com/c/diabetic-retinopathy-detection) and 89 fundus images in DiaretDB1 Dataset(Kauppi et al., 2007)
Quellec et al. (2017)
Image-level supervision
Backpropagation-based attention map + thresholding method
Localization of Diabetic Retinopathy (DR) lesions
88,702 fundus images in Kaggle Dataset and 89 fundus images in DiaretDB1 Dataset
Rajchl et al. (2017)
Box supervision
CNN-based classifier + dense conditional random field (DenseCRF), iteratively updating the learning target classes for input patches
Brain and lung segmentation MR images of 55 fetal subjects(Damodaram et al., 2012)
Can et al. (2018)
Scribble supervision Region growing on scribble annotations + UNet + CRF, iteratively refining UNet
Cardiac segmentation and prostate segmentation
200 volumes in cardiac ACDC(Bernard et al., 2018) and 29 volumes in prostate NCI-ISBI(Bloch et al., 2015)
Fernando et al. (2018)
Webly-supervision
Augmenting dermoscopy images from the Internet and using transfer learning to deal with the natural noise of webly-supervision
Skin lesion classification 1300 skin lesion images(Ballerini et al., 2013)
Yang et al. (2018)
Box supervision
Combining fully convolutional network (FCN) for coarse segmentation and graph search (GS) for fine segmentation
Gland segmentation 14 whole-slide clinical H&E stained histology images of human intestinal tissues(Zhang et al., 2016a)
Lymph node segmentation 207 clinical ultrasound images of human neck lymph nodes(Zhang et al., 2016b)
Fungus segmentation 84 images captured by serial block-face scanning electron microscopy(Zhang et al., 2017)
Qu et al. (2019)
Point supervision
Combining Voronoi diagram and k-means clustering methods for coarse segmentation and modified UNet based on DenseCRF loss for fine segmentation
Nuclei Segmentation
40 images from 8 different lung cancer cases in Lung Cancer dataset and 30 image including a diversity of nuclear appearances from seven organs in MultiOrgan dataset(Kumar et al., 2017)
Kervadec et al. (2019)
Point supervision
ENet based on inequality (lower and upper bounds on size) constrained loss function
Cardiac segmentation
100 cine magnetic resonance (MR) exams covering: dilated cardiomyopathy, hypertrophic cardiomyopathy, myocardial infarction with altered left ventricular ejection fraction and abnormal right ventricle (https://www.creatis.insa-lyon.fr/Challenge/acdc/)
Vertebral body segmentation 23 3D T2-weighted turbo spin echo MR images from 23 patients (http://dx.doi.org/10.5281/zenodo.22304)
UNet based on inequality (lower and upper bounds on size) constrained loss function
Prostate segmentation Prostate transversal T2-weighted MR images of 50 patients (https://promise12.grand-challenge.org)
Many studies have started to apply weakly supervised learning techniques to train deep models for biomedical image analysis tasks (Table 2). These applications mainly include weakly supervised object localization(Gondal et al., 2017; Quellec et al., 2017) and weakly supervised semantic segmentation(Kervadec et al., 2019; Can et al., 2018; Rajchl et al., 2017; Yang et al., 2018; Qu et al., 2019) , where the supervision modalities involve image-level supervision(Gondal et al., 2017; Quellec et al., 2017), scribble supervision(Can et al., 2018), box supervision(Rajchl et al., 2017; Yang et al., 2018), and point supervision(Qu et al., 2019; Kervadec et al., 2019). Lesion localization is very important to clinical diagnosis. For instance, accurate localization of Diabetic Retinopathy (DR) lesions in fundus retina photos is of great help to improve the performance of DR lesion recognition. To implement weakly supervised localization, researchers(Gondal et al., 2017; Quellec et al., 2017) first train a CNN-based classifier with image-level annotations (e.g., whether a fundus photo contains DR lesions or not). Second, they retrieve attention maps or saliency maps that highlight the discriminative and lesion-related regions in biomedical images through the explanation methods for this CNN-based classifier. Finally, the lesion areas are localized by performing thresholding methods on attention maps or saliency maps (Fig. 5a). For weakly supervised segmentation tasks, scribble, box and point supervision modalities are generally used in biomedical image analysis. A representative solution is an iterative optimization with three steps (Fig. 5b), including (1) pre-processing for initial pseudo pixel-level labels (e.g., region growing(Can et al., 2018), Voronoi diagram and k-means clustering methods(Qu et al., 2019)), (2) training deep segmentation model (e.g., UNet(Kervadec et al., 2019; Can et al., 2018; Qu et al., 2019), FCN(Yang et al., 2018) and ENet(Kervadec et al., 2019)) with pseudo pixel-level labels, and (3) post-processing (e.g., GS(Yang et al., 2018) and DenseCRF(Can et al., 2018; Rajchl et al., 2017; Qu et al., 2019)) for refined pseudo pixel-level labels.
Refine
Wea
k s
uper
vis
ion
(box
, sc
ribble
or
poin
t) Pseudo
pixel-level
labels
Pre-processing
(region growing/
k-means clustering)
Post-processing(DenseCRF/
Graph Search)
Training deep segmentation networks(UNet/FCN/ENet )
Abnorm
al?
Gradients
Weakly supervised object localization
Gra
d-C
AM
Thresholding
method
Training CNN with image-level annotations
Figure 5. Illustration of weakly supervised learning techniques. Left-to-right: (a) A representative solution for weakly supervised object localization using image-level annotations. (b) A representative for weakly supervised semantic segmentation using box, scribble or point supervision.
Many successful applications have demonstrated that weakly supervised learning techniques are
expected to relieve the urgent demands of deep models for fine-grained annotations in many fine-grained biomedical image analysis tasks. Although the performance of weakly supervised learning has been notably improved, there still exists a gap between weakly supervised learning and fully supervised learning that requires to be furtherly studied. Besides, how to utilize the results of weakly supervised tasks to improve the performance of original tasks might be a potential and extended research topic.
4. Transfer learning techniques
Labeled biomedical images are scarce, while large-scale natural image sets with various levels of annotations are publicly available, such as ImageNet, COCO, and PascalVOC, etc. A natural idea is whether the annotated samples from other domains can help improve deep models in biomedical image analysis. Unfortunately, when the deep models trained in source domain (e.g., natural images and synthetic images) are applied to the target domain, i.e., biomedical images, the performance of deep models generally drops a lot. Transfer learning is a key technique to solve the problem of model adaptation across domains.
Task2 (Modality 2)
Tas
k-s
pec
ific
lay
ers
Lea
rn a
sh
ared
feat
ure
re
pre
sen
tati
on
s
Inh
eren
t co
rrel
atio
ns
bet
wee
n t
wo
task
s Task1 (Modality 1)
Biomedical images
Biomedical images
Tas
k-s
pec
ific
lay
ers
Fix
ed f
eatu
re e
xtr
acto
r
Pre-training
ImageNetCOCO
Biomedical images
Fine-tuning
Tas
k-s
pec
ific
lay
ers
Lea
rn d
om
ain-i
nvari
ant
featu
re r
epre
sen
tati
on
s
Do
mai
n d
iscr
imin
ato
r
source or target
Source domain
Labeled
Target domain
Unlabled
Bac
kbone
net
wo
rk
Figure 6. Illustration of transfer learning techniques. Left-to-right: (a) Fine-tuning. Deep models are initially pre-trained on ImageNet or COCO and sequentially fine-tuned on biomedical images by fixing some layers and updating weights of task-specific layers at a small learning rate. (b) Multi-task learning. Deep models for multi-task learning that share some layers and differ in task-specific layers are trained alternatively across different but related tasks. (c) Adversarial training. A domain discriminator branch network is added to standard deep model to motivate backbone network to learn domain invariant feature representation. The entire model is optimized alternatively similarly with multi-task learning.
Building every deep model from scratch is time-consuming and relies heavily on large-scale annotated
samples. Transfer learning techniques focuses on extracting the knowledge gained while solving problems in source domain and applying it to different but related problems in target domain. For convenience and efficiency, the image sets with large-scale annotated samples are prone to being selected as source domain in many computer vision applications. To sum up, the transfer learning methods have roughly experienced five stages development(Zhang, 2019), including instance re-weighting adaptation(Huang et al., 2007), classifier adaptation(Lixin Duan et al., 2012), feature adaptation(Xu et al., 2016), deep network adaptation(Long et al., 2015) and adversarial adaptation(Chen et al., 2018). The research frontier problems of transfer learning techniques have been gradually moved from simple image classification tasks(Lixin Duan et al., 2012) to complex and fine-grained tasks, such as object detection(Chen et al., 2018; Inoue et al., 2018) and semantic segmentation(Yang Zhang et al., 2017; Kamnitsas et al., 2017a), etc. Compared with deep models trained from scratch in target domain or trained in source domain but applied directly in target domain, deep models using transfer learning techniques generally achieve much better performance in various computer vision tasks. Therefore, transfer learning might be a promising technique for applying deep learning methods effectively to biomedical image analysis tasks, especially in the small sample cases.
Many studies have adopted transfer learning techniques to improve deep models in biomedical image analysis tasks (Table 3), mainly involving fine-tuning(Kermany et al., 2018; Tajbakhsh et al., 2016; Zhou et al., 2017; Zhou et al., 2019), multi-task learning(Moeskops et al., 2016; Liao and Luo, 2017; Samala et al., 2017) and adversarial training(Dong et al., 2018; Moeskops et al., 2017; Javanmardi and Tasdizen, 2018).
Fine-tuning might be one of the most popular transfer learning techniques. This method (Fig. 6a) generally adapts deep models (e.g., AlexNet(Tajbakhsh et al., 2016; Zhou et al., 2017; Zhou et al., 2019) and InceptionV3(Kermany et al., 2018)) pre-trained on ImageNet to biomedical image analysis by freezing some layers in pre-trained deep models while retraining the others (task-specific layers). The fixed layers are supposed to encode the knowledge gained while solving problems in source domain and act as a fixed feature extractor. During fine-tuning process, the weights of deep models are generally updated at a small learning rate nearby the values of pre-trained weights, equivalent to narrowing the search space of parameters. Thus, fine-tuning method is expected to reduce the dependency of deep models on large-scale annotated biomedical images. Existing studies(Kermany et al., 2018; Tajbakhsh et al., 2016; Zhou et al., 2017; Zhou et al., 2019) consistently reveal that deep models trained by fine-tuning method are more robust and accurate than counterparts trained from scratch in various biomedical image analysis tasks.
Multi-task learning aims to exploit the inherent correlations amongst different but related tasks, such as similar tasks in different biomedical imaging modalities(Moeskops et al., 2016; Samala et al., 2017), and different tasks in same biomedical imaging modalities(Liao and Luo, 2017). The deep models modified for multiple tasks (e.g., ImageNet DCNN(Samala et al., 2017) and ResNet-50(Liao and Luo, 2017)) generally share some layers to learn a same feature representation but differ in task-specific layers to perform multiple tasks (Fig. 6b). Thus, these multi-task deep models can be jointly trained on multiple image sets for multiple biomedical image analysis tasks. In comparison of respective single-task learning, multi-task learning techniques in these studies(Moeskops et al., 2016; Liao and Luo, 2017; Samala et al., 2017) are prone to achieving better results. It implies that multi-task learning techniques have great potential to effectively apply train deep models in biomedical image analysis when training samples from
a single modality are limited. Table 3. Representative papers using transfer learning techniques in biomedical image analysis based on deep models.
Reference Transfer learning techniques
Tasks Dataset Deep models
Tajbakhsh et al. (2016)
Fine-tuning
Polyp detection 40 short colonoscopy videos
AlexNet Pulmonary embolism detection
121 CT pulmonary angiography datasets with a total of 326 PEs
Colonoscopy frame classification 6 complete colonoscopy videos Intima-media boundary segmentation
92 carotid intima-media thickness videos
Moeskops et al. (2016)
Multi-task learning Anatomical structure segmentation 34 MR brain images, 34 MR breast images and 10 cardiac CT angiography scans
Customized CNN model
Zhou et al. (2017)
Fine-tuning
Colonoscopy frame classification 6 complete colonoscopy videos
AlexNet Polyp detection 38 short colonoscopy videos
Pulmonary embolism detection 121 CT pulmonary angiography datasets with a total of 326 PEs
Moeskops et al. (2017)
Adversarial training
Brain MRI segmentation 35 MR brain images from adult subjects and 20 axial MR brain images from elderly subjects
Customized CNN model
Samala et al. (2017)
Multi-task learning Classification of malignant and benign breast masses
1,655 SFM views and 310 DM views with 2454 masses (1057 malignant, 1397 benign)
ImageNet DCNN
Liao et al. (2017)
Multi-task learning Skin lesion classification 21657 dermoscopy images that contain both the skin lesion and body location labels
ResNet-50
Zhou et al. (2018)
Fine-tuning Carotid intima-media thickness video interpretation
92 carotid intima-media thickness videos from 23 patients
AlexNet
Kermany et al. (2018)
Fine-tuning
Age-related macular degeneration and diabetic macular edema classification
207,130 retina OCT images Inception V3 architecture
Pneumonia Detection 5,232 chest X-ray images from a total of 5,856 patients
Dong et al. (2018)
Adversarial training
Chest organ segmentation 247 grayscale chest X-rays and 221 grayscale chest X-rays
FCN-based model
Mehran et al. (2018)
Adversarial training
Eye vasculture segmentation 40 eye fundus images FCN-based model Neuron membrane detection
100 electron microscopy images of size 1024×1024 and 125 electron microscopy images of size 1250×1250
Domain adversarial training techniques try to enforce deep models to learn a domain invariant
representation (e.g., deep features) for related tasks across different domains (e.g., two biomedical image sets from different sources(Dong et al., 2018; Javanmardi and Tasdizen, 2018)) or motivate deep models to generate outputs close to ground-truth(Dong et al., 2018; Moeskops et al., 2017). In general, these techniques are implemented by adding a branch network of domain discriminator to a standard deep model, where domain discriminator and standard deep model share one backbone network (Fig. 6c). Thus, similar to multi-task learning, the entire model can be optimized alternatively across different domains. Domain adversarial training techniques are expected to reduce the distribution discrepancy of deep features(Dong et al., 2018; Javanmardi and Tasdizen, 2018) (or outputs(Dong et al., 2018; Moeskops et al., 2017)) across domains and thus improve the performance of deep models in biomedical image analysis even without the requirement of annotated samples in target domain.
This key technique for deep learning of small sample in biomedical image analysis mainly focuses on how to extract the knowledge gained while solving problems in source domain and apply it to different but related problems in target domain, thus making the deep models converge quickly, robustly and accurately especially in small sample cases. Although there exists various transfer learning techniques, fine-tuning a deep model that is pre-trained on ImageNet, COCO or PascalVOC might be a good start to perform biomedical image analysis tasks with inadequate annotated samples.
5. Active learning techniques
The performance of deep models is gradually improved with the increase of annotated samples before reaching an inflection point, while after that the increase of annotated samples will not improve deep models much (Fig. 7a). Not every annotated sample contributes equally to deep models. Selecting the few most valuable samples to accelerate the arrival of inflection point might be a potential solution to reduce the dependence of deep models on large-scale annotated samples, and that is what active learning techniques expect to do. Due to the high cost of labeling multimodal biomedical images, it is of great value to achieve the performance of deep models trained on small-scale biomedical images comparable to that of deep models trained on large-scale biomedical images. The active learning technique might be a promising approach to achieve this goal. The problem that active learning techniques expect to solve is how to train an effective model with the least annotation cost. There are generally four key components in active learning framework (Fig. 7b), including low-cost unlabeled samples, an oracle, a task-specific model and a querying strategy, where the oracle plays the role of an annotator (e.g., biomedical expert). At a given point, querying strategy selects a sample that is most valuable to train task-specific model from low-cost unlabeled samples. The selected sample is labeled by oracle and is subsequently added to training set to furtherly train task-specific model. The core problem that has been investigated over years in active learning is how to customize a querying strategy for a specific task. Many querying strategy frameworks have been
proposed, involving query by committee, expected model change, uncertainty sampling and expected error reduction, etc. The majority of these querying strategies are designed to query the labels for the most informative sample(Zhou et al., 2017; Zhou et al., 2019; K. Wang et al., 2017; Huang et al., 2014) or (and) the most representative sample(Zhou et al., 2019; Huang et al., 2014; S.-J. Huang et al., 2018; Wang et al., 2016). Compared with the random sampling strategy that is generally used to train deep models (also called passive learning), the querying strategy in active learning is able to use the selected and labeled samples to train deep models more efficiently and effectively. The current research of active learning techniques focus on solving complex problems in practical application scenarios, such as active learning with weak supervision(Hua et al., 2018; Xu et al., 2017) , cost-sensitive active learning(S.-J. Huang et al., 2017; Yan and Huang, 2018) and active learning for deep models(Hua et al., 2018; Sener and Savarese, 2017; K. Wang et al., 2017) , etc.
Oracle Labeled samples
Outputs
Features
Train
Unlabeled samples
Inference
Queryingstrategy
Entropy
Diversity
DistanceDeep model
Number of training samples with annotations
Per
form
an
ce o
f d
eep
mo
del
s
Inflection point
Active learningRandom sampling
Figure 7. Left-to-right: (a) The learning curve of model performance over training sample size; (b) illustration of active learning
framework. Table 4. Representative papers using active learning techniques in biomedical image analysis based on deep models.
Reference Querying strategy Tasks Dataset Deep models
Zhou et al. (2017)
Taking the diversity and entropy of model’s predictions for augmented unlabeled samples as a sampling criterion
Colonoscopy frame classification
6 complete colonoscopy videos
AlexNet Polyp detection 38 short colonoscopy videos
Pulmonary embolism detection
121 CT pulmonary angiography datasets with a total of 326 PEs
Yang et al. (2017)
Taking the similarity and uncertainty of unlabeled samples determined by multiple FCNs’ predictions as querying strategy for annotation suggestion
Gland segmentation 165 colon histology images(Sirinukunwattana et al., 2017) FCN
Lymph node segmentation
74 ultrasound images(Zhang et al., 2016b)
Chowdhury et al. (2017)
Uncertainty sampling based on model’s confidence in unlabeled patches
Cellular segmentation
NIH-3T3 (mouse embryonic fibroblasts), MCF10A (human breast epithelial cells) and HeLa-S3 (cervix adenocarcinoma)(Van Valen et al., 2016)
Customized CNN-based model
Gorriz et al. (2017)
Pixel-wise uncertainty built by using Monte Carlo dropout to deep model during testing on unlabeled samples
Melanoma segmentation
2000 RGB dermoscopy images(Gutman et al., 2016)
UNet
Folmsbee et al. (2018)
Uncertainty sampling based on the qualitative comparison of the model’s outputs for unlabeled samples by a trained pathologist
Tissue classification in oral cavity cancer
143 tissue pathology slides from oral cavity cancer tumor resections
Customized CNN-based model
Sourati et al. (2018)
Sampling criterion formulated by diversified Fisher information
Newborn and adolescent brain segmentation
66 brain MR images of adolescents and 25 brain MRI images of newborns(Makropoulos et al., 2018)
Customized CNN-based model
Ozdemir et al. (2018)
Uncertainty sampling based on prediction variance and representativeness sampling based on content distance
Bone and muscle tissue segmentation
36 MR images of patients diagnosed with rotator cuff tear (shoulders)
DCAN(Chen et al., 2017)
Zhou et al. (2018)
Taking the diversity and entropy of model’s predictions for augmented unlabeled samples as a sampling criterion
Carotid intima-media thickness video interpretation
92 carotid intima-media thickness videos from 23 patients
AlexNet
Smailagic et al. (2018)
Taking the maximum mean distance between unlabeled and labelled samples in the oriented FAST and rotated BIREF feature space as sampling criterion
Diabetic retinopathy detection
1200 eye fundus images from 654 diabetic and 546 healthy patients
Inception V3 Breast cancer classification
400 breast tissue cells images(Aresta et al., 2019)
Skin cancer classification
900 benign and malignant cell tissue images(Gutman et al., 2016)
In order to dramatically reduce annotation cost, many studies have adopted active learning techniques to train deep models in biomedical image analysis tasks (Table 4), mainly involving classification tasks(Zhou et al., 2017; Zhou et al., 2019; Chowdhury et al., 2017; Folmsbee et al., 2018; Smailagic et al., 2018) and segmentation tasks(Yang et al., 2017; Gorriz et al., 2017; Ozdemir et al., 2018; Sourati et al., 2018). For relatively simple biomedical image classification tasks, the predictions of deep classification models for unlabeled samples are generally useful for active learning techniques. Many querying strategies have formulated the criterion for uncertainty sampling based on model’s predictions, such as the symmetric Kullback Leibler divergence and entropy of model’s predictions for augmented samples(Zhou et al., 2017; Zhou et al., 2019), model’s confidence in unlabeled patches(Chowdhury et al., 2017) and subjective comparison of CNN’s predictions from an experienced pathologist(Folmsbee et al., 2018). Regarding to relatively complex biomedical image segmentation tasks, the uncertainty of model’s prediction mask for individual unlabeled sample and the representativeness amongst unlabeled samples based on the distance of deep features are generally combined to formulate the sampling criterion for querying strategy(Yang et al., 2017; Ozdemir et al., 2018; Sourati et al., 2018). To formulate a more effective criterion for uncertainty sampling, there generally exists three major approaches, where multiple predictions are required for the same candidate unlabeled sample. First, under the consideration that an uncertain sample is prone to making deep models sensitive to data augmentation, the sampling criterion can be formulated based on model’s predictions for multiple augmented samples of a candidate(Zhou et al., 2017; Zhou et al., 2019). Second, with the claim that an uncertain sample is prone to being sensitive to dropout operation applied to CNN while a confident sample is generally less sensitive to dropout, the sampling criterion can be formulated based on the inconsistency of model’s predictions at different random dropout conditions(Gorriz et al., 2017; Ozdemir et al., 2018). Third, the strategy of query by committee trains multiple deep models to construct a committee and thus adopt the variance of these models’ predictions as sampling criterion(Yang et al., 2017). Existing studies using active learning techniques to train deep models on multimodal biomedical images have achieved impressive results in reducing the required annotation cost. This key technique for deep learning of small sample data mainly focuses on how to customize a querying strategy for a specific biomedical image analysis task that can select the few most valuable samples for annotation, and how to effectively train deep models with these actively selected samples. However, querying for the few most valuable samples from unlabeled samples is computationally expensive in general and in practice, training deep models with selected samples effectively is very challenging. Combining active learning techniques and transfer learning techniques(Zhou et al., 2017; Zhou et al., 2019) might be one promising research method. Such combination has the potential to make computer-aided diagnosis system based on deep models incrementally evolve through online feedback of false instances corrected by biomedical experts, where the challenge is how to deal with catastrophic forgetting of false instances.
6. Miscellaneous techniques
6.1 Domain knowledge
Biomedical image analysis is a highly specialized work, which requires large expertise. In general, there is a large amount of prior domain knowledge in a specific biomedical image analysis task. The prior domain knowledge is able to regularize deep models in a certain sense, which can narrow the parameter space of deep models and thus relieve overfitting. Therefore, combining domain knowledge might be a promising approach to alleviate the dependence of deep learning on large-scale labeled samples in medical image analysis applications at the meantime of improving the robustness of deep models.
Table 5. Representative papers using domain knowledge in biomedical image analysis based on deep models.
There are two main approaches to combine domain knowledge and deep learning in biomedical image
analysis tasks (Table 5): (1) the domain knowledge is characterized and digitized by hand-crafted features, which is furtherly fused with deep features(Xie et al., 2019; Xie et al., 2018) , and (2) the domain knowledge is used as a design guide for deep networks(Yan et al., 2015; Fu et al., 2019). Deep models using prior domain knowledge in these studies(Xie et al., 2019; Xie et al., 2018; Yan et al., 2015; Fu et al., 2019) consistently achieved superior performance over standard methods.
Domain knowledge is important to train deep models for a specific biomedical image analysis task. However, the generality and end-to-end learning framework of deep models suffers due to the use of
Reference Knowledge & method Tasks
Xie et al. (2018)(2019)
The malignant nodule generally has a lobed contour, an unsmooth edge and an inhomogeneous density; characterized and digitized by hand-crafted features, and furtherly fused with deep features in decision-level.
Benign-malignant nodule diagnosis in pulmonary CT images
Yan et al. (2015)
Radiologists usually recognize the body part of a slice based on local discriminative regions rather than global image context; used as a design guide for multi-stage deep network.
CT slice-based body part recognition
Fu et al. (2019)
Global anterior segment structure, local iris region and anterior chamber angle (ACA) patch are clinically important to angle-closure detection; used as a design guide for multilevel deep network.
Closed-angle detection in anterior segment OCT (AS-OCT) images
task-specific domain knowledge. Besides, distilling prior domain knowledge requires years of clinical experience in biomedical image analysis, where only experienced biomedical experts can do that. Thus, digging deep into clinical expertise underlying biomedical images through collaboration with clinicians might be a good start to apply deep learning techniques in biomedical image analysis tasks with inadequate annotated samples.
6.2 Shallow methods
Compared with the traditional shallow methods based on hand-crafted features, deep learning techniques have made significant progress in various computer vision applications, which relies heavily on large-scale labeled samples. With inadequate training samples, the deep models are prone to suffering from overfitting, thus leading to a big drop in performance. In comparison of deep learning techniques, the shallow methods generally have lower dependency on large-scale labeled samples due to the more general hand-crafted features and lower model complexity. A natural research idea is to combine shallow methods and deep models to alleviate the dependence on large-scale labeled samples and improve performance. Shallow methods and deep models can be combined in four different levels for biomedical image analysis (Table 6), including input-level(Wang et al., 2014) (e.g., original image and edge image), feature-level(Lai and Deng, 2018) (e.g., deep features and hand-crafted features), decision-level(Xie et al., 2018; Wang et al., 2014; Jianpeng Zhang et al., 2018) (e.g., CNN-based classifier and SVM) and mixed-level(Wang et al., 2014) (e.g., SVM classifier based on deep features). The combined model is expected to inherit the advantages of both shallow methods and deep learning techniques to improve the analysis performance and generalization especially in small sample cases. Table 6. Representative papers combining shallow methods and deep models in biomedical image analysis.
In summary, combining traditional shallow methods and deep models is a promising approach to apply
deep learning of small sample in biomedical image analysis. This key technique mainly focuses on how to select appropriate hand-crafted features and design the fusion strategy for a specific biomedical image analysis task. However, a key limitation of these studies is that the fusion strategy is so tricky that an inappropriate combination might aggravate overfitting problem in small sample cases, thus leading to a big drop in performance. Prior domain knowledge might be a good design guide for hand-crafted features and fusion strategy.
6.3 Attention mechanism
Table 7. Representative papers using attention mechanism in biomedical image analysis based on deep models.
Deep models show an attention mechanism of background suppression and foreground enhancement
in feedforward operation. As mentioned earlier, this attention mechanism can be used for visual explanation and weakly supervised object detection and segmentation. The attention mechanism is a useful tool to refine the deep feature representations. Through regulating the attention of deep models, deep models can observe input image more “carefully”, which is expected to improve the performance
Reference Fusion level & method Tasks
Wang et al. (2014)
Input-level; edges based on a thresholding method and Laplacian of Gaussian + CNN-based classifier Mitosis detection in breast
cancer pathology images Mixed-level; random forest classifier + the deep features Decision-level; 2 random forest classifiers + a CNN-based classifier
Xie et al. (2018)
Decision-level; texture descriptor based on gray level co-occurrence matrix (GLCM), Fourier shape descriptor and CNN-based deep features are fused by three AdaBoosted backpropagation neural networks (BPNN)
Benign-malignant nodule diagnosis in pulmonary CT images
Zhang et al. (2018)
Decision-level; the bag of feature (BoF), local binary pattern (LBP) features and CNN-based deep features are fused by BPNN
ImageCLEF 2016 medical classification task(García Seco de Herrera et al., 2016)
Lai et al. (2018)
Feature-level; hand-crafted features based on texture and color statistics and CNN-based deep features are fused by the multilayer perceptron
Skin lesion classification and tissue classification tasks
Reference Method Tasks
Wang et al. (2018)
Adding an attention network branch based on Grad-CAM to a ResNet-152 network
Lung disease diagnosis in chest X-ray images
Schlemper et al. (2018)
Adding two designed attention-gated branches to a CNN-based network
Ultrasound scan plane detection
Schlemper et al. (2018)
Adding attention-gated skip connections to a UNet Pancreas segmentation in abdominal 3D CT scans
Zhang et al. (2019)
Using designed attention residual learning (ARL) block Skin lesion classification
of deep learning in small sample cases. In some biomedical image analysis tasks (Table 7), researchers have tried to regulate the attention of
deep models by adding attention branch network(Wang and Xia, 2018; Schlemper et al., 2018; Oktay et al., 2018) or self-attention blocks(Zhang et al., 2019) to standard deep models. These methods notably improve the analysis performance in comparison of counterparts without attention regulation. Properly regulating the attention of the deep models can be very helpful to biomedical image analysis. The attention mechanism enables deep models to focus on both global and local representation of input images simultaneously without using any extra localization modules. Such ability is equivalent to stimulating deep models to observe the input image more “carefully”, which is much more valuable for biomedical image analysis in small sample cases. Besides, this technique might be an available approach to extend weakly supervised learning techniques (e.g., weakly supervised object localization) by utilizing the (intermediate) results of weakly supervised tasks (e.g., attention map or lesion location) to improve the performance of original tasks (e.g., image classification).
6.4 Data Augmentation
Visual appearance of the object of interest generally experiences various variations, involving illumination variation, scale variation, ration, translation and deformation, etc. Machine learning theory claims that training a model on an image set with higher diversity is prone to getting a more robust and generalized model. However, collecting and labeling all the images that involve these variations seems to be an impossible task. Data augmentation techniques are thus proposed to increase both the amount and diversity of training samples by applying various transformations (e.g., horizontally flipping) to original samples. Intuitively, data augmentation is used to teach a model about invariances behind the variations in data domain(Cubuk et al., 2019). Many data augmentation techniques have been proposed with the development of deep learning, concisely categorized as classical transformations (Rajpurkar et al., 2018; Rajpurkar et al., 2017) , generative adversarial network (GAN)(S.-W. Huang et al., 2018; Pan et al., 2018) and learning to augment(Cubuk et al., 2019; Perez and Wang, 2017).
Classical transformations. Classical transformations mainly involve horizontal flipping, random cropping (extracting patches), random scaling, random rotating, translations, shearing and elastic deformations, which can be performed on-the-fly or off-line. In general, the classical transformations are widely adopted in deep learning techniques and the best augmentation strategies are dataset-specific and task-specific. To be more concise, we survey the classical transformation techniques based on some representative applications in biomedical image classification and segmentation tasks (Table 8). Random flipping, random rotating, and random cropping are the three most frequently-used classical transformations. Moreover, extracting patches from original images is prone to being used to augment training samples from high-resolution biomedical images, especially in small sample cases. Besides, elastic deformations seem to be important to biomedical image segmentation tasks since deformations are the most common variations in tissue as Ronneberger et al.(2015) pointed out.
GAN. GAN techniques have been widely used to synthesize biomedical images(Pan et al., 2018; Huo et al., 2018; Chen et al., 2019). The synthetic biomedical images show a remarkable competitiveness in data augmentations due to their variety and reality in appearance variations. Biomedical images are generally synthesized from one modality to another, such as synthesizing PET from MRI in diagnosis of Alzheimer’s disease (AD)(Pan et al., 2018) and synthesizing CT from MR in cross-modality medical image segmentation tasks(Huo et al., 2018; Chen et al., 2019). The synthetic images can be furtherly used for all kinds of biomedical image analysis with or without images from source modality. For further exploration, various applications of GANs in biomedical imaging can be found in Yi’s work(Yi et al., 2018). Although the GAN techniques have achieved many successes in biomedical image synthesis, it is worth noting that there still exists lots of risks involving non-convergence, mode collapse and highly sensitive hyper-parameter selections.
Learning to augment. Compared with classical transformations, learning to augment aims to find the most effective data augmentation strategy automatically, which is sort of similar to the active sampling strategy in active learning techniques. For convenience, the majority of existing studies(Cubuk et al., 2019; Perez and Wang, 2017) evaluate their methods of learning to augment on natural datasets. Inspired by these studies, Zhao et al. (2019) proposed an automated data augmentation method for one-shot brain MRI segmentation. Specifically, they designed two independent spatial and appearance transformation models based on the UNet architecture(Ronneberger et al., 2015) to learn how to implement natural variations in MRI scans. The two transformation models were sequentially used to generate realistic labeled MRI scans to train segmentation model. This method achieved superior performance over state-of-the-art methods for one-shot biomedical image segmentation.
Data augmentation is very important to deep learning techniques that are applied in biomedical image analysis, especially in small sample cases. The augmentation techniques are able to increase both the amount and diversity of training samples with lower cost in comparison of manual acquisition. Proper augmentation helps relieve the suffering of deep models from small sample learning and make deep models robust to variations. However, aggressive and improper augmentation might break the nature distribution of training samples, thus leading to a notable drop in the analysis performance of practical applications. Therefore, a soft and stepwise augmentation strategy is suggested to be used in biomedical image analysis.
Table 8. Classical transformations applied in some representative biomedical image classification and segmentation tasks based on deep models.
7. Conclusion
A comprehensive survey on key SSL techniques for deep learning of small sample in clinical biomedical image analysis is presented. First, we introduce explanation techniques for deep models that are expected to satisfy the demands of clinical diagnosis for explainable and transparent decision-making process. Second, we present weakly supervised learning techniques to deal with the problem how to effectively train deep models with coarse-grained annotations for fine-grained biomedical image analysis tasks. Third, transfer learning techniques are surveyed to answer this question how to retrieve the knowledge gained by deep models while solving problems in other domains and reuse it in biomedical
Task Reference Dataset Data augmentation strategy
Classification
Anthimopoulos et al. (2016) 135 CT scans with 512×512 pixels per slice Extracting patches, random flipping and
random rotating
Tajbakhsh et al. (2016)
40 short colonoscopy videos Extracting patches, scaling, translation, horizontal and vertical mirroring and flipping
121 CT pulmonary angiography datasets with a total of 326 PEs
Extracting patches, translation and rotating
6 complete colonoscopy videos Extracting patches
Esteva et al. (2017) 129,450 skin lesion images comprising 2,032 different diseases
Random rotating, cropping and vertical flipping
Gondal et al. (2017) 88,702 fundus images in Kaggle Dataset and 89 fundus images in DiaretDB1 Dataset(Kauppi et al., 2007, p. 1).
Rotating, horizontal and vertical flipping
Quellec et al. (2017) 88,702 fundus images in Kaggle Dataset and 89 fundus images in DiaretDB1 Dataset(Kauppi et al., 2007, p. 1).
Rotating, translation, scaling, horizontal flipping and cropping
Zhou et al. (2017)
6 complete colonoscopy videos Extracting patches
38 short colonoscopy videos Scaling, translation, mirroring and flipping
121 CT pulmonary angiography datasets with a total of 326 PEs
Extracting patches, scaling, translation and rotating
Samala et al. (2017) 1,655 SFM views and 310 DM views with 2454 masses (1057 malignant, 1397 benign)
Flipping and rotating
Rajpurkar et al. (2018) 40,561 multi-view musculoskeletal radiographs
Random horizontal flipping and rotating
Fernando et al. (2018) 1300 skin lesion images(Ballerini et al., 2013) Web-crawling
Smailagic et al. (2018)
1200 eye fundus images from 654 diabetic and 546 healthy patients
Cropping, flipping and translation 400 breast tissue cells images(Aresta et al., 2019) 900 benign and malignant cell tissue images(Gutman et al., 2016)
Xie et al. (2018) 1018 clinical chest CT scans with lung nodules(Armato et al., 2011)
Random translation, rotating and horizontal or vertical flipping
Lai et al. (2018)
2828 fundamental tissue images in HIS2828 dataset Randomly cropping, horizontal or
vertical flipping 2000 skin lesion images in ISIC2017 dataset(Codella et al., 2018)
Schlemper et al. (2018) 2694 2D ultrasound examinations of volunteers with gestational ages between 18 and 22 weeks
Horizontal and vertical translation, horizontal flipping, rotating and scaling
Zhang et al. (2019) 2750 skin dermoscopy images in ISIC2017 dataset(Codella et al., 2018)
Random rotation, scaling, horizontal and vertical flip
Segmentation
Ronneberger et al. (2015)
30 images (512x512 pixels) from serial section transmission electron microscopy
Elastic deformations and drop-out 35 partially annotated light microscopic images 20 partially annotated light microscopic images
Chen et al. (2016) 85 histopathological images (benign / malignant = 37 / 48)
Translation, rotating, and elastic deformations
Kamnitsas et al. (2016) 61 brain MRI sequences Adding images reflected along the
sagittal axis and intensity shifting
Isensee et al. (2017) 135 MRI scans of glioblastoma and 108 MRI scans of lower grade glioma
Random rotating, scaling, elastic deformations, gamma correction augmentation and mirroring
Schlemper et al. (2018)
150 gastric cancer 3D CT scans(Roth et al., 2017) Affine transformations, horizontal
flipping and random cropping 82 contrast enhanced 3D CT scans with pancreas(Roth et al., 2016)
Laves et al. (2018) 536 laryngeal endoscopic images Horizontal flipping and rotating
Kervadec et al. (2019) Prostate transversal T2-weighted MR images of 50 patients (https://promise12.grand-challenge.org)
Mirroring, flipping and rotating
Isensee et al. (2019)
100 cine-MRI images Elastic deformations, random scaling, random rotating, gamma augmentation, and spatial transformations
131 CT images 30 abdominal CT images 50 anisotropic MRI images
image analysis. Fourth, active learning techniques are discussed to solve the problem how to select the few most valuable samples for annotations and train deep models with them for optimal performance in biomedical image analysis. Finally, we study miscellaneous techniques that are also important to apply deep models for biomedical image analysis, involving data augmentation, domain knowledge, traditional shallow methods and attention mechanism. These key techniques are expected to effectively support the application of deep learning in clinical biomedical image analysis, and furtherly improve the analysis performance, especially when large-scale annotated samples are not available. To our best knowledge, this is the first attempt to present a comprehensive survey on key SSL techniques for deep learning of small sample in biomedical image analysis through combining with the development of related techniques in computer vision applications. Notably, our survey focusing on key SSL techniques can be mutually complementary with previous reviews on deep learning techniques in biomedical image analysis.
Acknowledgements
The authors would like to thank members of the Medical Image Analysis for discussions and suggestions. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
References
Anthimopoulos, M., Christodoulidis, S., Ebner, L., Christe, A., Mougiakakou, S., 2016. Lung Pattern
Classification for Interstitial Lung Diseases Using a Deep Convolutional Neural Network. IEEE
Trans. Med. Imaging 35, 1207–1216. https://doi.org/10.1109/TMI.2016.2535865
Anwar, S.M., Majid, M., Qayyum, A., Awais, M., Alnowami, M., Khan, M.K., 2018. Medical Image
Analysis using Convolutional Neural Networks: A Review. J Med Syst 42, 226.
https://doi.org/10.1007/s10916-018-1088-1
Aresta, G., Araújo, T., Kwok, S., Chennamsetty, S.S., Safwan, M., Alex, V., Marami, B., Prastawa, M.,
Chan, M., Donovan, M., Fernandez, G., Zeineh, J., Kohl, M., Walz, C., Ludwig, F., Braunewell,
S., Baust, M., Vu, Q.D., To, M.N.N., Kim, E., Kwak, J.T., Galal, S., Sanchez-Freire, V., Brancati,
N., Frucci, M., Riccio, D., Wang, Y., Sun, L., Ma, K., Fang, J., Kone, I., Boulmane, L., Campilho,
A., Eloy, C., Polónia, A., Aguiar, P., 2019. BACH: Grand challenge on breast cancer histology
images. Medical Image Analysis 56, 122–139. https://doi.org/10.1016/j.media.2019.05.010
Armato, S.G., McLennan, G., Bidaut, L., McNitt-Gray, M.F., Meyer, C.R., Reeves, A.P., Zhao, B., Aberle,
D.R., Henschke, C.I., Hoffman, E.A., Kazerooni, E.A., MacMahon, H., van Beek, E.J.R.,
Yankelevitz, D., Biancardi, A.M., Bland, P.H., Brown, M.S., Engelmann, R.M., Laderach, G.E.,
Max, D., Pais, R.C., Qing, D.P.-Y., Roberts, R.Y., Smith, A.R., Starkey, A., Batra, P., Caligiuri,
P., Farooqi, A., Gladish, G.W., Jude, C.M., Munden, R.F., Petkovska, I., Quint, L.E., Schwartz,
L.H., Sundaram, B., Dodd, L.E., Fenimore, C., Gur, D., Petrick, N., Freymann, J., Kirby, J.,
Hughes, B., Vande Casteele, A., Gupte, S., Sallam, M., Heath, M.D., Kuhn, M.H., Dharaiya, E.,
Burns, R., Fryd, D.S., Salganicoff, M., Anand, V., Shreter, U., Vastagh, S., Croft, B.Y., Clarke,
L.P., 2011. The Lung Image Database Consortium (LIDC) and Image Database Resource
Initiative (IDRI): A Completed Reference Database of Lung Nodules on CT Scans: The
LIDC/IDRI thoracic CT database of lung nodules. Med. Phys. 38, 915–931.
https://doi.org/10.1118/1.3528204
Arun, A., Jawahar, C.V., Kumar, M.P., 2019. Dissimilarity Coefficient Based Weakly Supervised Object
Detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 9432–9441.
Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., Samek, W., 2015. On Pixel-Wise
Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation. PLoS
ONE 10, e0130140. https://doi.org/10.1371/journal.pone.0130140
Ballerini, L., Fisher, R.B., Aldridge, B., Rees, J., 2013. A Color and Texture Based Hierarchical K-NN
Approach to the Classification of Non-melanoma Skin Lesions, in: Celebi, M.E., Schaefer, G.
(Eds.), Color Medical Image Analysis, Lecture Notes in Computational Vision and
Biomechanics. Springer Netherlands, Dordrecht, pp. 63–86. https://doi.org/10.1007/978-94-
007-5389-1_4
Bearman, A., Russakovsky, O., Ferrari, V., Fei-Fei, L., 2016. What’s the Point: Semantic Segmentation
with Point Supervision, in: Leibe, B., Matas, J., Sebe, N., Welling, M. (Eds.), Computer Vision
– ECCV 2016, Lecture Notes in Computer Science. Springer International Publishing, pp. 549–
565.
Bernard, O., Lalande, A., Zotti, C., Cervenansky, F., Yang, X., Heng, P.-A., Cetin, I., Lekadir, K., Camara,
O., Gonzalez Ballester, M.A., Sanroma, G., Napel, S., Petersen, S., Tziritas, G., Grinias, E.,
Khened, M., Kollerathu, V.A., Krishnamurthi, G., Rohe, M.-M., Pennec, X., Sermesant, M.,
Isensee, F., Jager, P., Maier-Hein, K.H., Full, P.M., Wolf, I., Engelhardt, S., Baumgartner, C.F.,
Koch, L.M., Wolterink, J.M., Isgum, I., Jang, Y., Hong, Y., Patravali, J., Jain, S., Humbert, O.,
Jodoin, P.-M., 2018. Deep Learning Techniques for Automatic MRI Cardiac Multi-Structures
Segmentation and Diagnosis: Is the Problem Solved? IEEE Trans. Med. Imaging 37, 2514–2525.
https://doi.org/10.1109/TMI.2018.2837502
Bilen, H., Vedaldi, A., 2016. Weakly Supervised Deep Detection Networks, in: 2016 IEEE Conference
on Computer Vision and Pattern Recognition (CVPR). Presented at the 2016 IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), IEEE, Las Vegas, NV, USA, pp. 2846–
2854. https://doi.org/10.1109/CVPR.2016.311
Bloch, N., Madabhushi, A., Huisman, H., 2015. NCI-ISBI 2013 Challenge: Automated Segmentation of
Prostate Structures. The Cancer Imaging Archive 370.
https://doi.org/10.7937/k9/tcia.2015.zf0vlopv
Bonechi, S., Andreini, P., Bianchini, M., Scarselli, F., 2018. Generating Bounding Box Supervision for
Semantic Segmentation with Deep Learning, in: Pancioni, L., Schwenker, F., Trentin, E. (Eds.),
Artificial Neural Networks in Pattern Recognition. Springer International Publishing, Cham, pp.
190–200. https://doi.org/10.1007/978-3-319-99978-4_15
Can, Y.B., Chaitanya, K., Mustafa, B., Koch, L.M., Konukoglu, E., Baumgartner, C.F., 2018. Learning
to Segment Medical Images with Scribble-Supervision Alone, in: Stoyanov, D., Taylor, Z.,
Carneiro, G., Syeda-Mahmood, T., Martel, A., Maier-Hein, L., Tavares, J.M.R.S., Bradley, A.,
Papa, J.P., Belagiannis, V., Nascimento, J.C., Lu, Z., Conjeti, S., Moradi, M., Greenspan, H.,
Madabhushi, A. (Eds.), Deep Learning in Medical Image Analysis and Multimodal Learning
for Clinical Decision Support, Lecture Notes in Computer Science. Springer International
Publishing, pp. 236–244.
Chen, C., Dou, Q., Chen, H., Qin, J., Heng, P.-A., 2019. Synergistic Image and Feature Adaptation:
Towards Cross-Modality Domain Adaptation for Medical Image Segmentation.
arXiv:1901.08211 [cs].
Chen, H., Qi, X., Yu, L., Dou, Q., Qin, J., Heng, P.-A., 2017. DCAN: Deep contour-aware networks for
object instance segmentation from histology images. Medical Image Analysis 36, 135–146.
https://doi.org/10.1016/j.media.2016.11.004
Chen, H., Qi, X., Yu, L., Heng, P.-A., 2016. DCAN: Deep Contour-Aware Networks for Accurate Gland
Segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition. Presented at the Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 2487–2496.
Chen, Y., Li, W., Sakaridis, C., Dai, D., Van Gool, L., 2018. Domain Adaptive Faster R-CNN for Object
Detection in the Wild, in: 2018 IEEE/CVF Conference on Computer Vision and Pattern
Recognition. Presented at the 2018 IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), IEEE, Salt Lake City, UT, USA, pp. 3339–3348.
https://doi.org/10.1109/CVPR.2018.00352
Chowdhury, A., Biswas, S., Bianco, S., 2017. Active deep learning reduces annotation burden in
automatic cell segmentation (preprint). Bioinformatics. https://doi.org/10.1101/211060
Codella, N.C.F., Gutman, D., Celebi, M.E., Helba, B., Marchetti, M.A., Dusza, S.W., Kalloo, A., Liopyris,
K., Mishra, N., Kittler, H., Halpern, A., 2018. Skin lesion analysis toward melanoma detection:
A challenge at the 2017 International symposium on biomedical imaging (ISBI), hosted by the
international skin imaging collaboration (ISIC), in: 2018 IEEE 15th International Symposium
on Biomedical Imaging (ISBI 2018). Presented at the 2018 IEEE 15th International Symposium
on Biomedical Imaging (ISBI 2018), pp. 168–172. https://doi.org/10.1109/ISBI.2018.8363547
Cubuk, E.D., Zoph, B., Mane, D., Vasudevan, V., Le, Q.V., 2019. AutoAugment: Learning Augmentation
Strategies From Data, in: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition. Presented at the Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 113–123.
Damodaram, M.S., Story, L., Eixarch, E., Patkee, P., Patel, A., Kumar, S., Rutherford, M., 2012. Foetal
volumetry using Magnetic Resonance Imaging in intrauterine growth restriction. Early Human
Development, 3rd International Conference on : “Nutrition and Care of the Preterm Infant:
Current Issues” 88, S35–S40. https://doi.org/10.1016/j.earlhumdev.2011.12.026
Deng, J., Dong, W., Socher, R., Li, L.-J., Kai Li, Li Fei-Fei, 2009. ImageNet: A large-scale hierarchical
image database, in: 2009 IEEE Conference on Computer Vision and Pattern Recognition.
Presented at the 2009 IEEE Computer Society Conference on Computer Vision and Pattern
Recognition Workshops (CVPR Workshops), IEEE, Miami, FL, pp. 248–255.
https://doi.org/10.1109/CVPR.2009.5206848
Diba, A., Sharma, V., Pazandeh, A., Pirsiavash, H., Gool, L.V., 2017. Weakly Supervised Cascaded
Convolutional Networks, in: 2017 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR). Presented at the 2017 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), IEEE, Honolulu, HI, pp. 5131–5139.
https://doi.org/10.1109/CVPR.2017.545
Dong, N., Kampffmeyer, M., Liang, X., Wang, Z., Dai, W., Xing, E., 2018. Unsupervised Domain
Adaptation for Automatic Estimation of Cardiothoracic Ratio, in: Frangi, A.F., Schnabel, J.A.,
Davatzikos, C., Alberola-López, C., Fichtinger, G. (Eds.), Medical Image Computing and
Computer Assisted Intervention – MICCAI 2018, Lecture Notes in Computer Science. Springer
International Publishing, pp. 544–552.
Erhan, D., Bengio, Y., Courville, A., Vincent, P., Box, P.O., 2009. Visualizing Higher-Layer Features of
a Deep Network. University of Montreal 1341, 1.
Esteva, A., Kuprel, B., Novoa, R.A., Ko, J., Swetter, S.M., Blau, H.M., Thrun, S., 2017. Dermatologist-
level classification of skin cancer with deep neural networks. Nature 542, 115–118.
https://doi.org/10.1038/nature21056
Esteva, A., Robicquet, A., Ramsundar, B., Kuleshov, V., DePristo, M., Chou, K., Cui, C., Corrado, G.,
Thrun, S., Dean, J., 2019. A guide to deep learning in healthcare. Nat Med 25, 24–29.
https://doi.org/10.1038/s41591-018-0316-z
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A., 2010. The Pascal Visual Object
Classes (VOC) Challenge. Int J Comput Vis 88, 303–338. https://doi.org/10.1007/s11263-009-
0275-4
Folmsbee, J., Liu, X., Brandwein-Weber, M., Doyle, S., 2018. Active deep learning: Improved training
efficiency of convolutional neural networks for tissue classification in oral cavity cancer, in:
2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). Presented at the
2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), IEEE,
Washington, DC, pp. 770–773. https://doi.org/10.1109/ISBI.2018.8363686
Fong, R.C., Vedaldi, A., 2017. Interpretable Explanations of Black Boxes by Meaningful Perturbation,
in: 2017 IEEE International Conference on Computer Vision (ICCV). Presented at the 2017
IEEE International Conference on Computer Vision (ICCV), IEEE, Venice, pp. 3449–3457.
https://doi.org/10.1109/ICCV.2017.371
Fu, H., Xu, Y., Lin, S., Wong, D.W.K., Baskaran, M., Mahesh, M., Aung, T., Liu, J., 2019. Angle-Closure
Detection in Anterior Segment OCT Based on Multilevel Deep Network. IEEE Trans. Cybern.
1–9. https://doi.org/10.1109/TCYB.2019.2897162
García Seco de Herrera, A., Schaer, R., Bromuri, S., Müller, H., 2016. Overview of the medical tasks in
ImageCLEF 2016. Proceedings of the 7th International Conference of CLEF Association (CLEF)
2016 219–232.
Gondal, W.M., Kohler, J.M., Grzeszick, R., Fink, G.A., Hirsch, M., 2017. Weakly-supervised localization
of diabetic retinopathy lesions in retinal fundus images, in: 2017 IEEE International Conference
on Image Processing (ICIP). Presented at the 2017 IEEE International Conference on Image
Processing (ICIP), IEEE, Beijing, pp. 2069–2073. https://doi.org/10.1109/ICIP.2017.8296646
Gorriz, M., Carlier, A., Faure, E., Giro-i-Nieto, X., 2017. Cost-Effective Active Learning for Melanoma
Segmentation. arXiv:1711.09168 [cs].
Gulshan, V., Peng, L., Coram, M., Stumpe, M.C., Wu, D., Narayanaswamy, A., Venugopalan, S., Widner,
K., Madams, T., Cuadros, J., Kim, R., Raman, R., Nelson, P.C., Mega, J.L., Webster, D.R., 2016.
Development and Validation of a Deep Learning Algorithm for Detection of Diabetic
Retinopathy in Retinal Fundus Photographs. JAMA 316, 2402.
https://doi.org/10.1001/jama.2016.17216
Gutman, D., Codella, N.C.F., Celebi, E., Helba, B., Marchetti, M., Mishra, N., Halpern, A., 2016. Skin
Lesion Analysis toward Melanoma Detection: A Challenge at the International Symposium on
Biomedical Imaging (ISBI) 2016, hosted by the International Skin Imaging Collaboration
(ISIC). arXiv:1605.01397 [cs].
He, K., Gkioxari, G., Dollar, P., Girshick, R., 2017. Mask R-CNN, in: Proceedings of the IEEE
International Conference on Computer Vision. Presented at the Proceedings of the IEEE
International Conference on Computer Vision, pp. 2961–2969.
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep Residual Learning for Image Recognition, in: 2016 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR). Presented at the 2016 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Las Vegas, NV, USA,
pp. 770–778. https://doi.org/10.1109/CVPR.2016.90
Hua, G., Long, C., Yang, M., Gao, Y., 2018. Collaborative Active Visual Recognition from Crowds: A
Distributed Ensemble Approach. IEEE Trans. Pattern Anal. Mach. Intell. 40, 582–594.
https://doi.org/10.1109/TPAMI.2017.2682082
Huang, G., Liu, Z., Maaten, L. van der, Weinberger, K.Q., 2017. Densely Connected Convolutional
Networks, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Presented at the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
IEEE, Honolulu, HI, pp. 2261–2269. https://doi.org/10.1109/CVPR.2017.243
Huang, J., Gretton, A., Borgwardt, K.M., Schölkopf, B., Smola, A.J., 2007. Correcting Sample Selection
Bias by Unlabeled Data, in: Advances in Neural Information Processing Systems. Presented at
the Advances in neural information processing systems, pp. 1–8.
Huang, S.-J., Chen, J.-L., Mu, X., Zhou, Z.-H., 2017. Cost-Effective Active Learning from Diverse
Labelers, in: Proceedings of the Twenty-Sixth International Joint Conference on Artificial
Intelligence. Presented at the Twenty-Sixth International Joint Conference on Artificial
Intelligence, International Joint Conferences on Artificial Intelligence Organization, Melbourne,
Australia, pp. 1879–1885. https://doi.org/10.24963/ijcai.2017/261
Huang, S.-J., Jin, R., Zhou, Z.-H., 2014. Active Learning by Querying Informative and Representative
Examples. IEEE Trans. Pattern Anal. Mach. Intell. 36, 1936–1949.
https://doi.org/10.1109/TPAMI.2014.2307881
Huang, S.-J., Zhao, J.-W., Liu, Z.-Y., 2018. Cost-Effective Training of Deep CNNs with Active Model
Adaptation, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge
Discovery & Data Mining, KDD ’18. ACM, New York, NY, USA, pp. 1580–1588.
https://doi.org/10.1145/3219819.3220026
Huang, S.-W., Lin, C.-T., Chen, S.-P., Wu, Y.-Y., Hsu, P.-H., Lai, S.-H., 2018. AugGAN: Cross Domain
Adaptation with GAN-Based Data Augmentation, in: Ferrari, V., Hebert, M., Sminchisescu, C.,
Weiss, Y. (Eds.), Computer Vision – ECCV 2018. Springer International Publishing, Cham, pp.
731–744. https://doi.org/10.1007/978-3-030-01240-3_44
Huang, Z., Wang, X., Wang, Jiasi, Liu, W., Wang, Jingdong, 2018. Weakly-Supervised Semantic
Segmentation Network with Deep Seeded Region Growing, in: 2018 IEEE/CVF Conference on
Computer Vision and Pattern Recognition. Presented at the 2018 IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR), IEEE, Salt Lake City, UT, USA, pp. 7014–
7023. https://doi.org/10.1109/CVPR.2018.00733
Huo, Y., Xu, Z., Bao, S., Assad, A., Abramson, R.G., Landman, B.A., 2018. Adversarial synthesis
learning enables segmentation without target modality ground truth, in: 2018 IEEE 15th
International Symposium on Biomedical Imaging (ISBI 2018). Presented at the 2018 IEEE 15th
International Symposium on Biomedical Imaging (ISBI 2018), IEEE, Washington, DC, pp.
1217–1220. https://doi.org/10.1109/ISBI.2018.8363790
Inoue, N., Furuta, R., Yamasaki, T., Aizawa, K., 2018. Cross-Domain Weakly-Supervised Object
Detection Through Progressive Domain Adaptation, in: 2018 IEEE/CVF Conference on
Computer Vision and Pattern Recognition. Presented at the 2018 IEEE/CVF Conference on
Computer Vision and Pattern Recognition (CVPR), IEEE, Salt Lake City, UT, pp. 5001–5009.
https://doi.org/10.1109/CVPR.2018.00525
Isensee, F., Kickingereder, P., Wick, W., Bendszus, M., Maier-Hein, K.H., 2018. Brain Tumor
Segmentation and Radiomics Survival Prediction: Contribution to the BRATS 2017 Challenge,
in: Crimi, A., Bakas, S., Kuijf, H., Menze, B., Reyes, M. (Eds.), Brainlesion: Glioma, Multiple
Sclerosis, Stroke and Traumatic Brain Injuries, Lecture Notes in Computer Science. Springer
International Publishing, pp. 287–297.
Isensee, F., Petersen, J., Kohl, S.A.A., Jäger, P.F., Maier-Hein, K.H., 2019. nnU-Net: Breaking the Spell
on Successful Medical Image Segmentation. arXiv:1904.08128 [cs].
Javanmardi, M., Tasdizen, T., 2018. Domain adaptation for biomedical image segmentation using
adversarial training, in: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI
2018). Presented at the 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI
2018), IEEE, Washington, DC, pp. 554–558. https://doi.org/10.1109/ISBI.2018.8363637
Jin, B., Segovia, M.V.O., Susstrunk, S., 2017. Webly Supervised Semantic Segmentation, in: 2017 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR). Presented at the 2017 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Honolulu, HI, pp.
1705–1714. https://doi.org/10.1109/CVPR.2017.185
K. Wang, D. Zhang, Y. Li, R. Zhang, L. Lin, 2017. Cost-Effective Active Learning for Deep Image
Classification. IEEE Transactions on Circuits and Systems for Video Technology 27, 2591–
2600. https://doi.org/10.1109/TCSVT.2016.2589879
Kamnitsas, K., Baumgartner, C., Ledig, C., Newcombe, V., Simpson, J., Kane, A., Menon, D., Nori, A.,
Criminisi, A., Rueckert, D., Glocker, B., 2017a. Unsupervised Domain Adaptation in Brain
Lesion Segmentation with Adversarial Networks, in: Niethammer, M., Styner, M., Aylward, S.,
Zhu, H., Oguz, I., Yap, P.-T., Shen, D. (Eds.), Information Processing in Medical Imaging,
Lecture Notes in Computer Science. Springer International Publishing, pp. 597–609.
Kamnitsas, K., Ledig, C., Newcombe, V.F.J., Simpson, J.P., Kane, A.D., Menon, D.K., Rueckert, D.,
Glocker, B., 2017b. Efficient Multi-Scale 3D CNN with Fully Connected CRF for Accurate
Brain Lesion Segmentation. Medical Image Analysis 36, 61–78.
https://doi.org/10.1016/j.media.2016.10.004
Kauppi, T., Kalesnykiene, V., Kamarainen, J.-K., Lensu, L., Sorri, I., Raninen, A., Voutilainen, R.,
Uusitalo, H., Kalviainen, H., Pietila, J., 2007. DIARETDB1 diabetic retinopathy database and
evaluation protocol, in: Medical Image Understanding and Analysis. Presented at the Medical
Image Understanding and Analysis, p. 61.
Ker, J., Wang, L., Rao, J., Lim, T., 2018. Deep Learning Applications in Medical Image Analysis. IEEE
Access 6, 9375–9389. https://doi.org/10.1109/ACCESS.2017.2788044
Kermany, D.S., Goldbaum, M., Cai, W., Valentim, C.C.S., Liang, H., Baxter, S.L., McKeown, A., Yang,
G., Wu, X., Yan, F., Dong, Justin, Prasadha, M.K., Pei, J., Ting, M.Y.L., Zhu, J., Li, C., Hewett,
S., Dong, Jason, Ziyar, I., Shi, A., Zhang, R., Zheng, L., Hou, R., Shi, W., Fu, X., Duan, Y., Huu,
V.A.N., Wen, C., Zhang, E.D., Zhang, C.L., Li, O., Wang, X., Singer, M.A., Sun, X., Xu, J.,
Tafreshi, A., Lewis, M.A., Xia, H., Zhang, K., 2018. Identifying Medical Diagnoses and
Treatable Diseases by Image-Based Deep Learning. Cell 172, 1122-1131.e9.
https://doi.org/10.1016/j.cell.2018.02.010
Kervadec, H., Dolz, J., Tang, M., Granger, E., Boykov, Y., Ben Ayed, I., 2019. Constrained-CNN losses
for weakly supervised segmentation. Medical Image Analysis 54, 88–99.
https://doi.org/10.1016/j.media.2019.02.009
Kozuka, K., Niebles, J.C., 2017. Risky Region Localization with Point Supervision, in: 2017 IEEE
International Conference on Computer Vision Workshops (ICCVW). Presented at the 2017
IEEE International Conference on Computer Vision Workshop (ICCVW), IEEE, Venice, pp.
246–253. https://doi.org/10.1109/ICCVW.2017.38
Kumar, N., Verma, R., Sharma, S., Bhargava, S., Vahadane, A., Sethi, A., 2017. A Dataset and a
Technique for Generalized Nuclear Segmentation for Computational Pathology. IEEE
Transactions on Medical Imaging 36, 1550–1560. https://doi.org/10.1109/TMI.2017.2677499
Lai, Z., Deng, H., 2018. Medical Image Classification Based on Deep Features Extracted by Deep Model
and Statistic Feature Fusion with Multilayer Perceptron . Computational Intelligence and
Neuroscience 2018, 1–13. https://doi.org/10.1155/2018/2061516
Laves, M.-H., Bicker, J., Kahrs, L.A., Ortmaier, T., 2019. A dataset of laryngeal endoscopic images with
comparative study on convolution neural network-based semantic segmentation. Int J CARS 14,
483–492. https://doi.org/10.1007/s11548-018-01910-0
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521, 436–444.
https://doi.org/10.1038/nature14539
Li, Q., Arnab, A., Torr, P.H.S., 2018. Weakly- and Semi-supervised Panoptic Segmentation, in: Ferrari,
V., Hebert, M., Sminchisescu, C., Weiss, Y. (Eds.), Computer Vision – ECCV 2018. Springer
International Publishing, Cham, pp. 106–124. https://doi.org/10.1007/978-3-030-01267-0_7
Liao, H., Luo, J., 2017. A Deep Multi-Task Learning Approach to Skin Lesion Classification, in:
Workshops at the Thirty-First AAAI Conference on Artificial Intelligence. Presented at the
Workshops at the Thirty-First AAAI Conference on Artificial Intelligence, p. 7.
Lin, D., Dai, J., Jia, J., He, K., Sun, J., 2016. ScribbleSup: Scribble-Supervised Convolutional Networks
for Semantic Segmentation, in: 2016 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR). Presented at the 2016 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), IEEE, Las Vegas, NV, USA, pp. 3159–3167.
https://doi.org/10.1109/CVPR.2016.344
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L., 2014.
Microsoft COCO: Common Objects in Context, in: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars,
T. (Eds.), Computer Vision – ECCV 2014, Lecture Notes in Computer Science. Springer
International Publishing, pp. 740–755.
Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., van der Laak, J.A.W.M.,
van Ginneken, B., Sánchez, C.I., 2017. A survey on deep learning in medical image analysis.
Medical Image Analysis 42, 60–88. https://doi.org/10.1016/j.media.2017.07.005
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C., 2016. SSD: Single Shot
MultiBox Detector. arXiv:1512.02325 [cs] 9905, 21–37. https://doi.org/10.1007/978-3-319-
46448-0_2
Lixin Duan, Tsang, I.W., Dong Xu, 2012. Domain Transfer Multiple Kernel Learning. IEEE Trans.
Pattern Anal. Mach. Intell. 34, 465–479. https://doi.org/10.1109/TPAMI.2011.114
Long, M., Cao, Y., Wang, J., Jordan, M.I., 2015. Learning Transferable Features with Deep Adaptation
Networks, in: Proceedings of International Conference on Machine Learning. Presented at the
Proceedings of International Conference on Machine Learning, Lille, France, pp. 97–105.
Mahendran, A., Vedaldi, A., 2016. Salient Deconvolutional Networks, in: Leibe, B., Matas, J., Sebe, N.,
Welling, M. (Eds.), Computer Vision – ECCV 2016. Springer International Publishing, Cham,
pp. 120–135. https://doi.org/10.1007/978-3-319-46466-4_8
Maier, A., Syben, C., Lasser, T., Riess, C., 2019. A gentle introduction to deep learning in medical image
processing. Zeitschrift für Medizinische Physik 29, 86–101.
https://doi.org/10.1016/j.zemedi.2018.12.003
Makropoulos, A., Robinson, E.C., Schuh, A., Wright, R., Fitzgibbon, S., Bozek, J., Counsell, S.J.,
Steinweg, J., Vecchiato, K., Passerat-Palmbach, J., Lenz, G., Mortari, F., Tenev, T., Duff, E.P.,
Bastiani, M., Cordero-Grande, L., Hughes, E., Tusor, N., Tournier, J.-D., Hutter, J., Price, A.N.,
Teixeira, R.P.A.G., Murgasova, M., Victor, S., Kelly, C., Rutherford, M.A., Smith, S.M.,
Edwards, A.D., Hajnal, J.V., Jenkinson, M., Rueckert, D., 2018. The developing human
connectome project: A minimal processing pipeline for neonatal cortical surface reconstruction.
NeuroImage 173, 88–112. https://doi.org/10.1016/j.neuroimage.2018.01.054
Moen, E., Bannon, D., Kudo, T., Graf, W., Covert, M., Van Valen, D., 2019. Deep learning for cellular
image analysis. Nat Methods. https://doi.org/10.1038/s41592-019-0403-1
Moeskops, P., Veta, M., Lafarge, M.W., Eppenhof, K.A.J., Pluim, J.P.W., 2017. Adversarial Training and
Dilated Convolutions for Brain MRI Segmentation, in: Cardoso, M.J., Arbel, T., Carneiro, G.,
Syeda-Mahmood, T., Tavares, J.M.R.S., Moradi, M., Bradley, A., Greenspan, H., Papa, J.P.,
Madabhushi, A., Nascimento, J.C., Cardoso, J.S., Belagiannis, V., Lu, Z. (Eds.), Deep Learning
in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Lecture
Notes in Computer Science. Springer International Publishing, pp. 56–64.
Moeskops, P., Wolterink, J.M., van der Velden, B.H.M., Gilhuijs, K.G.A., Leiner, T., Viergever, M.A.,
Išgum, I., 2016. Deep Learning for Multi-Task Medical Image Segmentation in Multiple
Modalities. arXiv:1704.03379 [cs] 9901, 478–486. https://doi.org/10.1007/978-3-319-46723-
8_55
Navarro, F., Conjeti, S., Tombari, F., Navab, N., 2018. Webly Supervised Learning for Skin Lesion
Classification, in: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger,
G. (Eds.), Medical Image Computing and Computer Assisted Intervention – MICCAI 2018,
Lecture Notes in Computer Science. Springer International Publishing, pp. 398–406.
Nguyen, A., Clune, J., Bengio, Y., Dosovitskiy, A., Yosinski, J., 2017. Plug & Play Generative Networks:
Conditional Iterative Generation of Images in Latent Space, in: 2017 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR). Presented at the 2017 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), IEEE, Honolulu, HI, pp. 3510–3520.
https://doi.org/10.1109/CVPR.2017.374
Oktay, O., Schlemper, J., Folgoc, L.L., Lee, M., Heinrich, M., Misawa, K., Mori, K., McDonagh, S.,
Hammerla, N.Y., Kainz, B., Glocker, B., Rueckert, D., 2018. Attention U-Net: Learning Where
to Look for the Pancreas, in: International Conference on Medical Imaging with Deep Learning.
Presented at the MIDL 2018 Conference Submission, p. 10.
Ozdemir, F., Peng, Z., Tanner, C., Fuernstahl, P., Goksel, O., 2018. Active Learning for Segmentation by
Optimizing Content Information for Maximal Entropy. arXiv:1807.06962 [cs, stat] 11045, 183–
191. https://doi.org/10.1007/978-3-030-00889-5_21
Pan, Y., Liu, M., Lian, C., Zhou, T., Xia, Y., Shen, D., 2018. Synthesizing Missing PET from MRI with
Cycle-consistent Generative Adversarial Networks for Alzheimer’s Disease Diagnosis, in:
Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (Eds.), Medical
Image Computing and Computer Assisted Intervention – MICCAI 2018. Springer International
Publishing, Cham, pp. 455–463. https://doi.org/10.1007/978-3-030-00931-1_52
Perez, L., Wang, J., 2017. The Effectiveness of Data Augmentation in Image Classification using Deep
Learning. arXiv:1712.04621 [cs].
Petsiuk, V., Das, A., Saenko, K., 2018. RISE: Randomized Input Sampling for Explanation of Black-box
Models. arXiv:1806.07421 13.
Poplin, R., Varadarajan, A.V., Blumer, K., Liu, Y., McConnell, M.V., Corrado, G.S., Peng, L., Webster,
D.R., 2018. Predicting Cardiovascular Risk Factors from Retinal Fundus Photographs using
Deep Learning. Nat Biomed Eng 2, 158–164. https://doi.org/10.1038/s41551-018-0195-0
Qu, H., Wu, P., Huang, Q., Yi, J., Riedlinger, G.M., De, S., Metaxas, D.N., 2019. Weakly Supervised
Deep Nuclei Segmentation using Points Annotation in Histopathology Images, in: International
Conference on Medical Imaging with Deep Learning. Presented at the International Conference
on Medical Imaging with Deep Learning, pp. 390–400.
Quellec, G., Charrière, K., Boudi, Y., Cochener, B., Lamard, M., 2017. Deep image mining for diabetic
retinopathy screening. Medical Image Analysis 39, 178–193.
https://doi.org/10.1016/j.media.2017.04.012
Rajchl, M., Lee, M.C.H., Oktay, O., Kamnitsas, K., Passerat-Palmbach, J., Bai, W., Damodaram, M.,
Rutherford, M.A., Hajnal, J.V., Kainz, B., Rueckert, D., 2017. DeepCut: Object Segmentation
From Bounding Box Annotations Using Convolutional Neural Networks. IEEE Trans. Med.
Imaging 36, 674–683. https://doi.org/10.1109/TMI.2016.2621185
Rajpurkar, P., Irvin, J., Bagul, A., Ding, D., Duan, T., Mehta, H., Yang, B., Zhu, K., Laird, D., Ball, R.L.,
Langlotz, C., Shpanskaya, K., Lungren, M.P., Ng, A.Y., 2017. MURA: Large Dataset for
Abnormality Detection in Musculoskeletal Radiographs. arXiv:1712.06957 [physics].
Rajpurkar, P., Irvin, J., Ball, R.L., Zhu, K., Yang, B., Mehta, H., Duan, T., Ding, D., Bagul, A., Langlotz,
C.P., Patel, B.N., Yeom, K.W., Shpanskaya, K., Blankenberg, F.G., Seekins, J., Amrhein, T.J.,
Mong, D.A., Halabi, S.S., Zucker, E.J., Ng, A.Y., Lungren, M.P., 2018. Deep learning for chest
radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing
radiologists. PLoS Med 15, e1002686. https://doi.org/10.1371/journal.pmed.1002686
Razzak, M.I., Naz, S., Zaib, A., 2018. Deep Learning for Medical Image Processing: Overview,
Challenges and the Future, in: Dey, N., Ashour, A.S., Borra, S. (Eds.), Classification in BioApps:
Automation of Decision Making, Lecture Notes in Computational Vision and Biomechanics.
Springer International Publishing, Cham, pp. 323–350. https://doi.org/10.1007/978-3-319-
65981-7_12
Redmon, J., Divvala, S., Girshick, R., Farhadi, A., 2016. You Only Look Once: Unified, Real-Time
Object Detection, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR). Presented at the 2016 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), IEEE, Las Vegas, NV, USA, pp. 779–788. https://doi.org/10.1109/CVPR.2016.91
Ren, S., He, K., Girshick, R., Sun, J., 2017. Faster R-CNN: Towards Real-Time Object Detection with
Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1137–1149.
https://doi.org/10.1109/TPAMI.2016.2577031
Ribeiro, M.T., Singh, S., Guestrin, C., 2016. “Why Should I Trust You?”: Explaining the Predictions of
Any Classifier, in: Proceedings of the 22nd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining - KDD ’16. Presented at the the 22nd ACM SIGKDD
International Conference, ACM Press, San Francisco, California, USA, pp. 1135–1144.
https://doi.org/10.1145/2939672.2939778
Ronneberger, O., Fischer, P., Brox, T., 2015. U-Net: Convolutional Networks for Biomedical Image
Segmentation. arXiv:1505.04597 [cs].
Roth, H., Farag, A., Turkbey, E.B., Lu, L., Liu, J., Summers, R.M., 2016. Data From Pancreas-CT.
https://doi.org/10.7937/k9/tcia.2016.tnb1kqbu
Roth, H.R., Oda, H., Hayashi, Y., Oda, M., Shimizu, N., Fujiwara, M., Misawa, K., Mori, K., 2017.
Hierarchical 3D fully convolutional networks for multi-organ segmentation. arXiv:1704.06382
[cs].
Samala, R.K., Chan, H.-P., Hadjiiski, L.M., Helvie, M.A., Cha, K., Richter, C., 2017. Multi-task transfer
learning deep convolutional neural network: application to computer-aided diagnosis of breast
cancer on mammograms. Phys. Med. Biol. https://doi.org/10.1088/1361-6560/aa93d4
Schlemper, J., Oktay, O., Chen, L., Matthew, J., Knight, C., Kainz, B., Glocker, B., Rueckert, D., 2018.
Attention-Gated Networks for Improving Ultrasound Scan Plane Detection. arXiv:1804.05338
[cs].
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D., 2017. Grad-CAM: Visual
Explanations from Deep Networks via Gradient-Based Localization, in: 2017 IEEE
International Conference on Computer Vision (ICCV). Presented at the 2017 IEEE International
Conference on Computer Vision (ICCV), IEEE, Venice, pp. 618–626.
https://doi.org/10.1109/ICCV.2017.74
Sener, O., Savarese, S., 2017. Active Learning for Convolutional Neural Networks: A Core-Set Approach.
arXiv:1708.00489 [cs, stat].
Shu, J., Xu, Z., Meng, D., 2018. Small Sample Learning in Big Data Era. arXiv:1808.04572 [cs, stat].
Simonyan, K., Vedaldi, A., Zisserman, A., 2013. Deep Inside Convolutional Networks: Visualising Image
Classification Models and Saliency Maps. arXiv:1312.6034 [cs].
Sirinukunwattana, K., Pluim, J.P.W., Chen, H., Qi, X., Heng, P.-A., Guo, Y.B., Wang, L.Y., Matuszewski,
B.J., Bruni, E., Sanchez, U., Böhm, A., Ronneberger, O., Cheikh, B.B., Racoceanu, D., Kainz,
P., Pfeiffer, M., Urschler, M., Snead, D.R.J., Rajpoot, N.M., 2017. Gland segmentation in colon
histology images: The glas challenge contest. Medical Image Analysis 35, 489–502.
https://doi.org/10.1016/j.media.2016.08.008
Smailagic, A., Costa, P., Young Noh, H., Walawalkar, D., Khandelwal, K., Galdran, A., Mirshekari, M.,
Fagert, J., Xu, S., Zhang, P., Campilho, A., 2018. MedAL: Accurate and Robust Deep Active
Learning for Medical Image Analysis, in: 2018 17th IEEE International Conference on Machine
Learning and Applications (ICMLA). Presented at the 2018 17th IEEE International Conference
on Machine Learning and Applications (ICMLA), IEEE, Orlando, FL, pp. 481–488.
https://doi.org/10.1109/ICMLA.2018.00078
Sourati, J., Gholipour, A., Dy, J.G., Kurugol, S., Warfield, S.K., 2018. Active Deep Learning with Fisher
Information for Patch-Wise Semantic Segmentation, in: Stoyanov, D., Taylor, Z., Carneiro, G.,
Syeda-Mahmood, T., Martel, A., Maier-Hein, L., Tavares, J.M.R.S., Bradley, A., Papa, J.P.,
Belagiannis, V., Nascimento, J.C., Lu, Z., Conjeti, S., Moradi, M., Greenspan, H., Madabhushi,
A. (Eds.), Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical
Decision Support. Springer International Publishing, Cham, pp. 83–91.
https://doi.org/10.1007/978-3-030-00889-5_10
Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J., 2016.
Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning?
IEEE Trans. Med. Imaging 35, 1299–1312. https://doi.org/10.1109/TMI.2016.2535302
Tang, P., Wang, X., Bai, X., Liu, W., 2017. Multiple Instance Detection Network with Online Instance
Classifier Refinement, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR). Presented at the 2017 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), IEEE, Honolulu, HI, pp. 3059–3067. https://doi.org/10.1109/CVPR.2017.326
Tang, P., Wang, X., Wang, A., Yan, Y., Liu, W., Huang, J., Yuille, A., 2018. Weakly Supervised Region
Proposal Network and Object Detection, in: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y.
(Eds.), Computer Vision – ECCV 2018. Springer International Publishing, Cham, pp. 370–386.
https://doi.org/10.1007/978-3-030-01252-6_22
Van Valen, D.A., Kudo, T., Lane, K.M., Macklin, D.N., Quach, N.T., DeFelice, M.M., Maayan, I.,
Tanouchi, Y., Ashley, E.A., Covert, M.W., 2016. Deep Learning Automates the Quantitative
Analysis of Individual Cells in Live-Cell Imaging Experiments. PLOS Computational Biology
12, e1005177. https://doi.org/10.1371/journal.pcbi.1005177
Wan, F., Wei, P., Jiao, J., Han, Z., Ye, Q., 2018. Min-Entropy Latent Model for Weakly Supervised Object
Detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 1297–1306.
Wang, H., Cruz-Roa, A., Basavanhally, A., Gilmore, H., Shih, N., Feldman, M., Tomaszewski, J.,
Gonzalez, F., Madabhushi, A., 2014. Mitosis detection in breast cancer pathology images by
combining handcrafted and convolutional neural network features. J. Med. Imag 1, 034003.
https://doi.org/10.1117/1.JMI.1.3.034003
Wang, H., Xia, Y., 2018. ChestNet: A Deep Neural Network for Classification of Thoracic Diseases on
Chest Radiography. arXiv:1807.03058.
Wang, Z., Du, B., Zhang, Lefei, Zhang, Liangpei, 2016. A batch-mode active learning framework by
querying discriminative and representative samples for hyperspectral image classification.
Neurocomputing 179, 88–100. https://doi.org/10.1016/j.neucom.2015.11.062
Xie, Y., Xia, Y., Zhang, J., Song, Y., Feng, D., Fulham, M., Cai, W., 2019. Knowledge-based
Collaborative Deep Learning for Benign-Malignant Lung Nodule Classification on Chest CT.
IEEE Trans. Med. Imaging 38, 991–1004. https://doi.org/10.1109/TMI.2018.2876510
Xie, Y., Zhang, J., Xia, Y., Fulham, M., Zhang, Y., 2018. Fusing texture, shape and deep model-learned
information at decision level for automated classification of lung nodules on chest CT.
Information Fusion 42, 102–110. https://doi.org/10.1016/j.inffus.2017.10.005
Xu, Y., Fang, X., Wu, J., Li, X., Zhang, D., 2016. Discriminative Transfer Subspace Learning via Low-
Rank and Sparse Representation. IEEE Trans. on Image Process. 25, 850–863.
https://doi.org/10.1109/TIP.2015.2510498
Xu, Y., Zhang, H., Miller, K., Singh, A., Dubrawski, A., 2017. Noise-Tolerant Interactive Learning Using
Pairwise Comparisons, in: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R.,
Vishwanathan, S., Garnett, R. (Eds.), Advances in Neural Information Processing Systems 30.
Curran Associates, Inc., pp. 2431–2440.
Yan, Y.-F., Huang, S.-J., 2018. Cost-Effective Active Learning for Hierarchical Multi-Label
Classification, in: Proceedings of the Twenty-Seventh International Joint Conference on
Artificial Intelligence. Presented at the Twenty-Seventh International Joint Conference on
Artificial Intelligence {IJCAI-18}, International Joint Conferences on Artificial Intelligence
Organization, Stockholm, Sweden, pp. 2962–2968. https://doi.org/10.24963/ijcai.2018/411
Yan, Z., Zhan, Y., Peng, Z., Liao, S., Shinagawa, Y., Metaxas, D.N., Zhou, X.S., 2015. Bodypart
Recognition Using Multi-stage Deep Learning, in: Ourselin, S., Alexander, D.C., Westin, C.-F.,
Cardoso, M.J. (Eds.), Information Processing in Medical Imaging. Springer International
Publishing, Cham, pp. 449–461. https://doi.org/10.1007/978-3-319-19992-4_35
Yang, L., Zhang, Y., Chen, J., Zhang, S., Chen, D.Z., 2017. Suggestive Annotation: A Deep Active
Learning Framework for Biomedical Image Segmentation, in: Descoteaux, M., Maier-Hein, L.,
Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (Eds.), Medical Image Computing and
Computer Assisted Intervention − MICCAI 2017, Lecture Notes in Computer Science. Springer
International Publishing, pp. 399–407.
Yang, L., Zhang, Y., Zhao, Z., Zheng, H., Liang, P., Ying, M.T.C., Ahuja, A.T., Chen, D.Z., 2018. BoxNet:
Deep Learning Based Biomedical Image Segmentation Using Boxes Only Annotation.
arXiv:1806.00593 [cs].
Yi, X., Walia, E., Babyn, P., 2018. Generative Adversarial Network in Medical Imaging: A Review.
arXiv:1809.07294 [cs].
Yosinski, J., Clune, J., Nguyen, A., Fuchs, T., Lipson, H., 2015. Understanding Neural Networks Through
Deep Visualization. arXiv:1506.06579 12.
Zeiler, M.D., Fergus, R., 2014. Visualizing and Understanding Convolutional Networks, in: Fleet, D.,
Pajdla, T., Schiele, B., Tuytelaars, T. (Eds.), Computer Vision – ECCV 2014, Lecture Notes in
Computer Science. Springer International Publishing, pp. 818–833.
Zhang, B., Zhao, Q., Feng, W., Lyu, S., 2018. AlphaMEX: A smarter global pooling method for
convolutional neural networks. Neurocomputing 321, 36–48.
https://doi.org/10.1016/j.neucom.2018.07.079
Zhang, Jianming, Bargal, S.A., Lin, Z., Brandt, J., Shen, X., Sclaroff, S., 2018. Top-Down Neural
Attention by Excitation Backprop. Int J Comput Vis 126, 1084–1102.
https://doi.org/10.1007/s11263-017-1059-x
Zhang, Jianpeng, Xia, Y., Xie, Y., Fulham, M., Feng, D.D., 2018. Classification of Medical Images in the
Biomedical Literature by Jointly Using Deep and Handcrafted Visual Features. IEEE J. Biomed.
Health Inform. 22, 1521–1530. https://doi.org/10.1109/JBHI.2017.2775662
Zhang, J., Xie, Y., Xia, Y., Shen, C., 2019. Attention Residual Learning for Skin Lesion Classification.
IEEE Trans. Med. Imaging 1–1. https://doi.org/10.1109/TMI.2019.2893944
Zhang, L., 2019. Transfer Adaptation Learning: A Decade Survey. arXiv:1903.04687 [cs].
Zhang, Yang, David, P., Gong, B., 2017. Curriculum Domain Adaptation for Semantic Segmentation of
Urban Scenes, in: 2017 IEEE International Conference on Computer Vision (ICCV). Presented
at the 2017 IEEE International Conference on Computer Vision (ICCV), IEEE, Venice, pp.
2039–2049. https://doi.org/10.1109/ICCV.2017.223
Zhang, Yizhe, Yang, L., Chen, J., Fredericksen, M., Hughes, D.P., Chen, D.Z., 2017. Deep Adversarial
Networks for Biomedical Image Segmentation Utilizing Unannotated Images, in: Descoteaux,
M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (Eds.), Medical Image
Computing and Computer Assisted Intervention − MICCAI 2017. Springer International
Publishing, pp. 408–416.
Zhang, Y., Yang, L., MacKenzie, J.D., Ramachandran, R., Chen, D.Z., 2016a. A seeding-searching-
ensemble method for gland segmentation in H&E-stained images. BMC Medical Informatics
and Decision Making 16, 80. https://doi.org/10.1186/s12911-016-0312-5
Zhang, Y., Ying, M.T.C., Yang, L., Ahuja, A.T., Chen, D.Z., 2016b. Coarse-to-Fine Stacked Fully
Convolutional Nets for lymph node segmentation in ultrasound images, in: 2016 IEEE
International Conference on Bioinformatics and Biomedicine (BIBM). Presented at the 2016
IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 443–448.
https://doi.org/10.1109/BIBM.2016.7822557
Zhao, A., Balakrishnan, G., Durand, F., Guttag, J.V., Dalca, A.V., 2019. Data Augmentation Using
Learned Transformations for One-Shot Medical Image Segmentation, in: Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition. Presented at the Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8543–8553.
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A., 2016. Learning Deep Features for
Discriminative Localization, in: 2016 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR). Presented at the 2016 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), IEEE, Las Vegas, NV, USA, pp. 2921–2929.
https://doi.org/10.1109/CVPR.2016.319
Zhou, Z., Shin, J., Feng, R., Hurst, R.T., Kendall, C.B., Liang, J., 2019. Integrating Active Learning and
Transfer Learning for Carotid Intima-Media Thickness Video Interpretation. J Digit Imaging 32,
290–299. https://doi.org/10.1007/s10278-018-0143-2
Zhou, Z., Shin, J., Zhang, L., Gurudu, S., Gotway, M., Liang, J., 2017. Fine-Tuning Convolutional Neural
Networks for Biomedical Image Analysis: Actively and Incrementally, in: 2017 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR). Presented at the 2017 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Honolulu, HI, pp.
4761–4772. https://doi.org/10.1109/CVPR.2017.506
Zhou, Z.-H., 2018. A brief introduction to weakly supervised learning. National Science Review 5, 44–
53. https://doi.org/10.1093/nsr/nwx106