1,2 1 2 - arXiv

23
A Survey on Deep Learning of Small Sample in Biomedical Image Analysis Pengyi Zhang 1,2 , Yulin Deng 1,2 , Xiaoying Tang 1,2 , Yunxin Zhong 1,2 , Xiaoqiong Li 1,2* 1 School of Life Science, Beijing Institute of Technology, Haidian District, Beijing, P.R. China 2 Key Laboratory of Convergence Medical Engineering System and Healthcare Technology, Ministry of Industry and Information Technology, Haidian District, Beijing, P.R. China Abstract The success of deep learning has been witnessed as a promising technique for computer-aided biomedical image analysis, due to end-to-end learning framework and availability of large-scale labelled samples. However, in many cases of biomedical image analysis, deep learning techniques suffer from the small sample learning (SSL) dilemma caused mainly by lack of annotations. To be more practical for biomedical image analysis, in this paper we survey the key SSL techniques that help relieve the suffering of deep learning by combining with the development of related techniques in computer vision applications. In order to accelerate the clinical usage of biomedical image analysis based on deep learning techniques, we intentionally expand this survey to include the explanation methods for deep models that are important to clinical decision making. We survey the key SSL techniques by dividing them into five categories: (1) explanation techniques, (2) weakly supervised learning techniques, (3) transfer learning techniques, (4) active learning techniques, and (5) miscellaneous techniques involving data augmentation, domain knowledge, traditional shallow methods and attention mechanism. These key techniques are expected to effectively support the application of deep learning in clinical biomedical image analysis, and furtherly improve the analysis performance, especially when large-scale annotated samples are not available. We bulid demos at https://github.com/PengyiZhang/MIADeepSSL. Keywords: deep learning, biomedical imaging, survey, small sample, explanation 1. Introduction Biomedical imaging techniques have been widely used in clinical practice. Specialists used to manually analyze biomedical images and further fit all clinical clues together to make the final diagnosis based on their own experience. Nowadays, the manual biomedical image analysis is facing four major challenges: (1) Manual analysis is limited by individual experience and thus diagnosis might be different among individuals. (2) Training a qualified specialist requires lots of money and years of efforts. (3) Rapid growth of biomedical images in both number and modality puts enormous pressure on specialists. (4) Repetitive and tedious analysis work on unpleasing biomedical images makes specialists feel tired easily, which might lead to a delayed diagnosis or even a misdiagnosis and thus pose a great risk to patients. These challenges exacerbate shortage of healthcare resources to a certain extent especially in under-developed regions. An alternative solution is computer-aided biomedical image analysis. Driven by the growth of computing power and availability of large-scale labelled samples (e.g., ImageNet(Deng et al., 2009) and COCO(Lin et al., 2014)), deep neural network has been extensively studied due to its fast, scalable and end-to-end learning framework. Especially, Convolution Neural Network (CNN)(LeCun et al., 2015) models have achieved significant improvements compared with traditional shallow methods in image classification (e.g., ResNet(He et al., 2016) and DenseNet(G. Huang et al., 2017)), object detection (e.g., Faster R-CNN(Ren et al., 2017) and SSD(Liu et al., 2016)) and semantic segmentation (e.g., UNet(Ronneberger et al., 2015) and Mask R-CNN(He et al., 2017)), etc. As biomedical imaging equipment is increasingly applied and popularized in clinical practice, large numbers of biomedical images have been accumulated. Therefore, deep learning techniques have been gradually applied in computer-aided biomedical image analysis, such as diabetic retinopathy detection(Gulshan et al., 2016), skin cancer diagnosis(Esteva et al., 2017), pulmonary disease detection based on X-ray radiography(Rajpurkar et al., 2018) and heart disease risk prediction in fundus retinal images(Poplin et al., 2018), etc. Computer-aided biomedical image analysis using deep learning techniques has been one of the most active research topics. Some studies(Esteva et al., 2017; Rajpurkar et al., 2018) claim that their biomedical image analysis applications based on deep learning techniques have achieved expert-level or even beyond expert-level performance. However, in many cases of biomedical image analysis, deep learning techniques suffer from small sample learning (SSL) dilemma caused mainly by lack of annotations. Labeling multimodal biomedical images is a highly specialized work that experienced biomedical experts are required to perform fine-grained annotations, which puts extra enormous pressure on biomedical experts. This is a major reason why biomedical images usually lack adequate annotations in comparison of large-scale natural images. With inadequate training samples, deep models are prone to overfitting, thus leading to a big drop in performance. Therefore, how to apply *Corresponding author

Transcript of 1,2 1 2 - arXiv

A Survey on Deep Learning of Small Sample in Biomedical Image Analysis

Pengyi Zhang1,2, Yulin Deng1,2, Xiaoying Tang1,2, Yunxin Zhong1,2, Xiaoqiong Li1,2*

1School of Life Science, Beijing Institute of Technology, Haidian District, Beijing, P.R. China

2Key Laboratory of Convergence Medical Engineering System and Healthcare Technology, Ministry of

Industry and Information Technology, Haidian District, Beijing, P.R. China

Abstract

The success of deep learning has been witnessed as a promising technique for computer-aided biomedical image analysis, due to end-to-end learning framework and availability of large-scale labelled samples. However, in many cases of biomedical image analysis, deep learning techniques suffer from the small sample learning (SSL) dilemma caused mainly by lack of annotations. To be more practical for biomedical image analysis, in this paper we survey the key SSL techniques that help relieve the suffering of deep learning by combining with the development of related techniques in computer vision applications. In order to accelerate the clinical usage of biomedical image analysis based on deep learning techniques, we intentionally expand this survey to include the explanation methods for deep models that are important to clinical decision making. We survey the key SSL techniques by dividing them into five categories: (1) explanation techniques, (2) weakly supervised learning techniques, (3) transfer learning techniques, (4) active learning techniques, and (5) miscellaneous techniques involving data augmentation, domain knowledge, traditional shallow methods and attention mechanism. These key techniques are expected to effectively support the application of deep learning in clinical biomedical image analysis, and furtherly improve the analysis performance, especially when large-scale annotated samples are not available. We bulid demos at https://github.com/PengyiZhang/MIADeepSSL.

Keywords: deep learning, biomedical imaging, survey, small sample, explanation

1. Introduction

Biomedical imaging techniques have been widely used in clinical practice. Specialists used tomanually analyze biomedical images and further fit all clinical clues together to make the final diagnosis based on their own experience. Nowadays, the manual biomedical image analysis is facing four major challenges: (1) Manual analysis is limited by individual experience and thus diagnosis might be different among individuals. (2) Training a qualified specialist requires lots of money and years of efforts. (3) Rapid growth of biomedical images in both number and modality puts enormous pressure on specialists. (4) Repetitive and tedious analysis work on unpleasing biomedical images makes specialists feel tiredeasily, which might lead to a delayed diagnosis or even a misdiagnosis and thus pose a great risk topatients. These challenges exacerbate shortage of healthcare resources to a certain extent especially inunder-developed regions. An alternative solution is computer-aided biomedical image analysis.

Driven by the growth of computing power and availability of large-scale labelled samples (e.g., ImageNet(Deng et al., 2009) and COCO(Lin et al., 2014)), deep neural network has been extensively studied due to its fast, scalable and end-to-end learning framework. Especially, Convolution Neural Network (CNN)(LeCun et al., 2015) models have achieved significant improvements compared with traditional shallow methods in image classification (e.g., ResNet(He et al., 2016) and DenseNet(G. Huang et al., 2017)), object detection (e.g., Faster R-CNN(Ren et al., 2017) and SSD(Liu et al., 2016)) and semantic segmentation (e.g., UNet(Ronneberger et al., 2015) and Mask R-CNN(He et al., 2017)), etc. As biomedical imaging equipment is increasingly applied and popularized in clinical practice, large numbers of biomedical images have been accumulated. Therefore, deep learning techniques have been gradually applied in computer-aided biomedical image analysis, such as diabetic retinopathy detection(Gulshan et al., 2016), skin cancer diagnosis(Esteva et al., 2017), pulmonary disease detection based on X-ray radiography(Rajpurkar et al., 2018) and heart disease risk prediction in fundus retinal images(Poplin et al., 2018), etc. Computer-aided biomedical image analysis using deep learning techniques has been one of the most active research topics. Some studies(Esteva et al., 2017; Rajpurkar et al., 2018) claim that their biomedical image analysis applications based on deep learning techniques have achieved expert-level or even beyond expert-level performance. However, in many cases of biomedical image analysis, deep learning techniques suffer from small sample learning (SSL) dilemma caused mainly by lack of annotations. Labeling multimodal biomedical images is a highly specialized work that experienced biomedical experts are required to perform fine-grained annotations, which puts extra enormous pressure on biomedical experts. This is a major reason why biomedical images usually lack adequate annotations in comparison of large-scale natural images. With inadequate training samples, deep models are prone to overfitting, thus leading to a big drop in performance. Therefore, how to apply

*Corresponding author

deep learning techniques effectively on a small biomedical image set becomes an urgent problem to be well studied.

Many surveys(Litjens et al., 2017; Ker et al., 2018; Anwar et al., 2018; Razzak et al., 2018; Esteva et al., 2019; Maier et al., 2019; Moen et al., 2019) review the progress of deep learning techniques applied in biomedical image analysis from four directions: (1) deep models; (2) analysis tasks, e.g., classification, detection and segmentation; (3) modalities of biomedical images; (4) anatomical structure, e.g., brain, chest and retina. This SSL dilemma is simply mentioned as a challenge and its solution is not surveyed comprehensively. Different from these studies, our survey aims to deal with the SSL dilemma that deep learning techniques generally suffer from in biomedical image analysis tasks. Shu et al.(2018) investigated SSL methods in big data era by dividing them into concept learning and experience learning. Their study theoretically clarified the rationality of the SSL regime by providing neuroscience evidences and furtherly presented a general view of SSL techniques for various application scenarios. To be more practical, in this paper we survey key SSL techniques for deep learning of small sample in biomedical image analysis by combining with the development of relevant techniques in computer vision applications. In order to support the clinical usage of biomedical image analysis based on deep learning techniques, we intentionally expand our survey to include explanation methods for deep models that are important to clinical decision making.

In this paper, we divide our surveyed key SSL techniques into five categories (Fig. 1): (1) Explanation techniques. The clinical applications require that the decision-making process of

deep models is explainable and transparent. We thus explore the potential procedure of computer-aided diagnosis system applied in clinical practice and survey the explanation techniques for deep models in Section 2.

(2) Weakly supervised learning techniques. Since the fine-grained annotations required by fine-grained biomedical image analysis tasks (e.g., lesion localization and gland segmentation) are costly, can we just use relatively low-cost and coarse-grained annotations to perform these fine-grained tasks in biomedical image analysis? Therefore, the weakly supervised learning techniques for dealing with such problem are discussed in Section 3.

(3) Transfer learning techniques. Deep learning has made significant progress in solving various domain related problems. Can the knowledge gained by deep models while solving problems in other domains be applied in biomedical image analysis and thus alleviate the demands for large-scale annotated biomedical images? We present transfer learning techniques for answering this question in Section 4.

(4) Active learning techniques. Not every annotated sample contributes equally to deep models. Can we just select the few most valuable samples for annotations and train deep models with them for biomedical image analysis? We introduce active learning techniques for solving this problem in Section 5.

(5) Miscellaneous techniques. Miscellaneous techniques that help improve the performance of deep models in biomedical image analysis are studied in Section 6, involving data augmentation, domain knowledge, shallow methods and attention mechanism.

Design of deep models

Clinical applications

Training of deep models

Dataset collection and

annotation

Procedure of deep learning

Enlarging training set with software-generated biomedical images

Adopting expertise as a design guide for deep networks

Combining with shallow methods to improve generalization

Motivating deep models to observe carefully

Selecting the few most valuable samples for annotations

Performing fine-grained tasks with weak annotations

Reusing the knowledge gained while solving problems

Providing visual explanation for deep models

Object classification

Object

localization/detection

Semantic/Instance

segmentation

Biomedical image analysis

5. Deep models heavily relies on

large-scale labeled samples.

1. Biomedical images present

multimodality.

3. Only experienced biomedical

experts can perform annotations.

4. Labeling unpleasing biomedical

images is tedious.

6. Clinical practice requires that

decision-making process of deep

models is transparent.

2. Biomedical image analysis

requires expertise.

Deep learning techniques

Explanation methods

Data Augmentation

Attention mechanism

Domain knowledge

Traditional methods

Active learning

Weakly supervised learning

Transfer learning

Surveyed SSL techniques

Figure 1. Overview of our surveyed SSL techniques for deep learning in clinical biomedical image analysis.

To our best knowledge, this is the first attempt to present a comprehensive survey on key SSL

techniques that are expected to effectively support the application of deep learning in clinical biomedical image analysis and furtherly improve analysis performance especially in small sample cases. Notably, our survey focusing on key SSL techniques can be mutually complementary with previous reviews on deep learning in biomedical image analysis.

2. Explanation techniques

Due to highly abstract feature representation and end-to-end learning framework, decision-making process of deep models remains largely unclear(Petsiuk et al., 2018). In clinical practice, an incorrect decision might do great harm to patient’s physical and mental health. Explaining decision-making process of deep models can provide decision-making evidences for doctors’ final diagnosis to enroll doctors in the decision-making process of computer-aided biomedical image analysis (Fig. 2a) and make

the clinical decision more reliable. Thus, clinical applications require that the decision-making process of deep models is explainable and transparent, which is vital to build a trusty computer-aided diagnosis system. Existing explanation methods for deep models can be divided into local or global explanation and white-box or black-box methods (Fig. 2b). Table 1 outlines some representative studies on explanation methods.

CAM-based methods. One of the well-known explanation methods is Class Activation Map (CAM)(Zhou et al., 2016). This method constructs CAMs by calculating the class-specific activation in each spatial location of the last convolutional feature maps and tries to clarify the classification decision-making process by providing visual explanation that highlights class-specific and discriminative regions. In particular, Rajpurkar et al. (2018)(2017) adopted this approach to explain deep models for abnormality detection in musculoskeletal X-rays (Fig. 3a) and pulmonary disease classification in chest X-rays. Compared with the captions provided by a board-certified radiologist, visual CAMs presented a high relevance to clinical abnormality. Essentially, CAM takes fully use of the last convolutional features and the class-specific weights of output layer, thus making it an efficient method for visual explanations. Other similar methods adopted different pooling methods (e.g., global max pooling, log-sum-exp pooling, and log-mean-exp pooling(B. Zhang et al., 2018)) to retrieve different visual CAMs. Besides, Selvaraju et al.(2017)proposed Grad-CAM by using gradients instead of weights of output layer to calculate CAMs. Grad-CAM greatly expands the applications of CAM without any modifications to deep models and without restriction to classification models. Table 1. Representative researches on explanation methods for deep models.

Reference Category Method & remark

Erhan et al. (2009)

Global explanation, White-box

Activation maximization; gradient ascent in the input space

Simonyan et al. (2013)

Global explanation, White-box

Activation maximization; gradient ascent in the input space

Local explanation, Black-box

Examine component importance; perturbing input image pixels

Zeiler et al. (2013)

Local explanation, White-box

Examine component importance; covering up different portions of input image with a gray square

Bach et al. (2015)

Local explanation, White-box

Backpropagation-based; class-specific error

Yosinski et al. (2015)

Global explanation, White-box

Activation maximization; a regularized optimization in image space

Zhou et al. (2016)

Local explanation, White-box

CAM-based; multiplying activation with the weights of output layer

Mahendran et al. (2016)

Local explanation, White-box

Backpropagation-based; visualizing features by mapping them back to the input pixel space

Zhang et al. (2016)

Local explanation, White-box

Backpropagation-based; class-specific error

Ribeiro et al. (2016)

Local explanation, Black-box

Examine component importance; perturbing the pixels of input image randomly to learn a linear explainable decision model

Selvaraju et al. (2017)

Local explanation, White-box

CAM-based; extending CAM by replacing the weights of output layer with the gradients

Fong et al. (2017)

Local explanation, Black-box

Examine component importance; learning meaningful perturbation on input image by gradient decent

Nguyen et al. (2017)

Global explanation, White-box

Activation maximization; a optimization regularized by a conditional generative adversarial network

Esteva et al. (2017)

Local explanation, White-box

Backpropagation-based; L1 norm of loss gradients to input layer

Rajpurkar et al. (2018)

Local explanation, White-box

CAM-based; explaining a deep model for abnormality detection in musculoskeletal X-rays

Rajpurkar et al. (2018)

Local explanation, White-box

CAM-based; explaining a deep model for abnormality detection in chest X-rays.

Zhang et al. (2018)

Local explanation, White-box

CAM-based; modifing CAM with different global pooling methods

Kermany et al. (2018)

Local explanation, White-box

Examine component importance; convolving an occluding kernel across the input image to find the clinically relevant regions of age-related macular degeneration and diabetic macular edema

Petsiuk et al. (2018)

Local explanation, Black-box

Examine component importance; perturbing input image with random masks to calculate the average mask weighted by class-specific output

Backpropagation-based methods. The other type of explanation methods is built on backpropagation

of class-specific gradients(Zeiler and Fergus, 2014; Mahendran and Vedaldi, 2016) or errors(Bach et al., 2015; Jianming Zhang et al., 2018). The visual explanation from backpropagation-based method is generally sharper than that from CAM-based methods. In particular, Esteva et al.(2017) adopted such a backpropagation-based approach to explain a deep model for skin cancer classification in dermoscopy images. The generated saliency maps demonstrated a highly clinical relevance to the skin cancer diagnosis (Fig. 3b).

Examining component importance. Another type of explanation methods(Petsiuk et al., 2018; Erhan et al., 2009; Simonyan et al., 2013; Ribeiro et al., 2016; Fong and Vedaldi, 2017; Kermany et al., 2018)

examine the importance of each component for decision-making by removing or modifying the components (e.g., pixel, image patch and segment) in an input image. The visual explanation is retrieved by combining each component weighted by its importance together. In particular, Kermany et al.(2018)

developed such an explanation method to explain deep models for the classification tasks of age-related macular degeneration and diabetic macular edema. The generated occlusion maps highlighted

pathological areas in retina OCT images, presenting a highly clinical relevance (Fig. 3c). Activation maximization. For global explanation, activation maximization(Erhan et al., 2009) is

generally performed as an optimization problem by gradient ascent algorithm, where the parameters required to be updated are image pixels instead of the weights of deep models. For better visualization, some studies(Yosinski et al., 2015; Nguyen et al., 2017) have tried to add various regularization constraints on activation maximization to make the optimization results more realistic.

Existing explanation methods for deep models are expected to help build trusty computer-aided diagnosis systems and accelerate such diagnosis systems to be implemented in clinical practice. The explanation results, i.e., visual attention map or saliency map, can also be used for weakly supervised object localization and instance segmentation to alleviate the demands for fine-grained annotations. However, a key limitation across these studies is that they cannot explain how robust a decision is, e.g., when deep models will fail in a specific biomedical image analysis task, which is very important to construct automatic and unattended diagnosis system. Besides, the majority of these methods are designed for CNN-based recognition tasks. When there exists multiple targets in one image, the produced visual explanation is prone to highlighting only the few most discriminative targets. The incompleteness of visual explanation might lead to a sub-optimal result. Future work likely focus on developing explanation methods for deep models in complex tasks to explain how robust the decisions are by digging deep into clinical expertise underlying biomedical images. Moreover, the cooperative mode how biomedical image analysis using deep learning techniques effectively help clinical specialists is also a research topic worthy of exploring.

Output

ActivationsGradientsWeights

Black-box methods

White-box methods

InputBlack-box model

White-box model

class 1

class 2

cN

0.02 0.01 0.88 0.54 0.05 0.00

0.07 0.38 0.17

lass

Localexplanation

Globalexplanation

Final diagnosisComputer-aided diagnosis system

Specialist

Computer-aided diagnosis

Visual explanation

Biomedical Images

Figure 2. Left-to-right: (a) A potential procedure of computer-aided diagnosis system applied in clinical practice. Explanation methods for the decision-making process are expected to help build trusty computer-aided diagnosis systems. (b) Taxonomy of explanation methods. For instance, the global explanation methods aims to explain a model by visualizing the model’s favorite “husky”, while the local explanation methods try to explain why a model think a input image is a “husky” by highlighting the regions that the model relies on to make the decision. The black-box models build the explanation with only the input and output of a model while the white-box explanation methods construct the explanation by using the input and output as well as internal information, such as activations, gradients and weights, etc.

Figure 3. Examples of visual explanation for deep models built on three different modalities of biomedical images. Left-to-right: (a) Visual CAMs. The visual CAMs(Rajpurkar et al., 2017) highlight the abnormal region in musculoskeletal X-rays, presenting a highly clinical relevance; (b) Saliency maps. In skin cancer classification based on dermoscopy images, the saliency maps(Esteva et al., 2017) describe the importance of each pixel for diagnosis, presenting a highly clinical relevance; (c) Occlusion maps. The occlusion maps(Kermany et al., 2018) highlight the clinically relevant areas of pathology for the classification task of age-related macular degeneration and diabetic macular edema in retina OCT images.

3. Weakly supervised learning techniques

With the popularity of high resolution biomedical images in clinical diagnosis, many fine-grained tasks have been introduced in biomedical image analysis, such as lesion localization and gland segmentation. Training deep models for these fine-grained tasks generally requires fine-grained annotations that are costly and difficult to acquire. A natural idea is to use relatively low-cost and coarse-grained annotations to perform these fine-grained tasks in biomedical image analysis. Weakly supervised learning is such a promising approach to alleviate the demands for fine-grained annotations.

Weakly supervised learning(Zhou, 2018) mainly addresses learning problems for incomplete, inaccurate and inexact supervision environments. In computer vision, the research frontier problems of weakly supervised deep learning have started a gradual transition from simple object localization tasks(Zhou et al., 2016; Fong and Vedaldi, 2017) to complex object detection tasks(Bilen and Vedaldi, 2016; Tang et al., 2017; Diba et al., 2017; Tang et al., 2018; Wan et al., 2018; Arun et al., 2019) and semantic segmentation tasks(Z. Huang et al., 2018; Li et al., 2018; Kervadec et al., 2019). The supervision modalities in weakly supervised learning also present a diversified development trend, involving image-level supervision(Bilen and Vedaldi, 2016), box supervision(Bonechi et al., 2018), scribble supervision(Lin et al., 2016; Can et al., 2018), point supervision(Bearman et al., 2016; Kozuka and Niebles, 2017), and webly supervision(Navarro et al., 2018; Jin et al., 2017), etc. The performance

of weakly supervised learning methods in various computer vision tasks has been significantly improved, revealing a tendency to catch up with the fully supervised learning algorithms. Taking object detection task as an example, the mean average precision (mAP) score of weakly supervised object detection has been increasingly improved on the PascalVOC 2007(Everingham et al., 2010) dataset (Fig. 4). The gap of mAP (i.e., mean of average precision) scores between weakly supervised object detection algorithms and a well-known fully supervised object detection algorithm, i.e., YOLO(Redmon et al., 2016), is less than 10%. Therefore, the weakly supervised learning techniques are expected to achieve excellent performance in fine-grained biomedical image analysis using weak annotations.

0.3483

0.412 0.4280.453

0.473

0.536

0.3

0.35

0.4

0.45

0.5

0.55

0.6

0.65

WSDDN (2016)

OICR (2017)

WCCN (2017)

WeakRPN (2018)

MELM (2018)

Pred Net (Ens) (2018)

mAP on PascalVOC 2007

9.4%

YOLO (2016) 0.630

Figure 4. Illustration of the progress of weakly supervised object detection methods, including WSDDN(Bilen and Vedaldi, 2016), OICR(Tang et al., 2017), WCCN(Diba et al., 2017), WeakRPN(Tang et al., 2018), MELM(Wan et al., 2018) and Pred Net (Ens)(Arun et al., 2019).

Table 2. Representative papers for weakly supervised deep learning techniques used in biomedical image analysis.

Reference Supervision modality Method Task Dataset

Gondal et al. (2017)

Image-level supervision

CAM-based attention map + thresholding method

Localization of Diabetic Retinopathy (DR) lesions

88,702 fundus images in Kaggle Dataset (https://www.kaggle.com/c/diabetic-retinopathy-detection) and 89 fundus images in DiaretDB1 Dataset(Kauppi et al., 2007)

Quellec et al. (2017)

Image-level supervision

Backpropagation-based attention map + thresholding method

Localization of Diabetic Retinopathy (DR) lesions

88,702 fundus images in Kaggle Dataset and 89 fundus images in DiaretDB1 Dataset

Rajchl et al. (2017)

Box supervision

CNN-based classifier + dense conditional random field (DenseCRF), iteratively updating the learning target classes for input patches

Brain and lung segmentation MR images of 55 fetal subjects(Damodaram et al., 2012)

Can et al. (2018)

Scribble supervision Region growing on scribble annotations + UNet + CRF, iteratively refining UNet

Cardiac segmentation and prostate segmentation

200 volumes in cardiac ACDC(Bernard et al., 2018) and 29 volumes in prostate NCI-ISBI(Bloch et al., 2015)

Fernando et al. (2018)

Webly-supervision

Augmenting dermoscopy images from the Internet and using transfer learning to deal with the natural noise of webly-supervision

Skin lesion classification 1300 skin lesion images(Ballerini et al., 2013)

Yang et al. (2018)

Box supervision

Combining fully convolutional network (FCN) for coarse segmentation and graph search (GS) for fine segmentation

Gland segmentation 14 whole-slide clinical H&E stained histology images of human intestinal tissues(Zhang et al., 2016a)

Lymph node segmentation 207 clinical ultrasound images of human neck lymph nodes(Zhang et al., 2016b)

Fungus segmentation 84 images captured by serial block-face scanning electron microscopy(Zhang et al., 2017)

Qu et al. (2019)

Point supervision

Combining Voronoi diagram and k-means clustering methods for coarse segmentation and modified UNet based on DenseCRF loss for fine segmentation

Nuclei Segmentation

40 images from 8 different lung cancer cases in Lung Cancer dataset and 30 image including a diversity of nuclear appearances from seven organs in MultiOrgan dataset(Kumar et al., 2017)

Kervadec et al. (2019)

Point supervision

ENet based on inequality (lower and upper bounds on size) constrained loss function

Cardiac segmentation

100 cine magnetic resonance (MR) exams covering: dilated cardiomyopathy, hypertrophic cardiomyopathy, myocardial infarction with altered left ventricular ejection fraction and abnormal right ventricle (https://www.creatis.insa-lyon.fr/Challenge/acdc/)

Vertebral body segmentation 23 3D T2-weighted turbo spin echo MR images from 23 patients (http://dx.doi.org/10.5281/zenodo.22304)

UNet based on inequality (lower and upper bounds on size) constrained loss function

Prostate segmentation Prostate transversal T2-weighted MR images of 50 patients (https://promise12.grand-challenge.org)

Many studies have started to apply weakly supervised learning techniques to train deep models for biomedical image analysis tasks (Table 2). These applications mainly include weakly supervised object localization(Gondal et al., 2017; Quellec et al., 2017) and weakly supervised semantic segmentation(Kervadec et al., 2019; Can et al., 2018; Rajchl et al., 2017; Yang et al., 2018; Qu et al., 2019) , where the supervision modalities involve image-level supervision(Gondal et al., 2017; Quellec et al., 2017), scribble supervision(Can et al., 2018), box supervision(Rajchl et al., 2017; Yang et al., 2018), and point supervision(Qu et al., 2019; Kervadec et al., 2019). Lesion localization is very important to clinical diagnosis. For instance, accurate localization of Diabetic Retinopathy (DR) lesions in fundus retina photos is of great help to improve the performance of DR lesion recognition. To implement weakly supervised localization, researchers(Gondal et al., 2017; Quellec et al., 2017) first train a CNN-based classifier with image-level annotations (e.g., whether a fundus photo contains DR lesions or not). Second, they retrieve attention maps or saliency maps that highlight the discriminative and lesion-related regions in biomedical images through the explanation methods for this CNN-based classifier. Finally, the lesion areas are localized by performing thresholding methods on attention maps or saliency maps (Fig. 5a). For weakly supervised segmentation tasks, scribble, box and point supervision modalities are generally used in biomedical image analysis. A representative solution is an iterative optimization with three steps (Fig. 5b), including (1) pre-processing for initial pseudo pixel-level labels (e.g., region growing(Can et al., 2018), Voronoi diagram and k-means clustering methods(Qu et al., 2019)), (2) training deep segmentation model (e.g., UNet(Kervadec et al., 2019; Can et al., 2018; Qu et al., 2019), FCN(Yang et al., 2018) and ENet(Kervadec et al., 2019)) with pseudo pixel-level labels, and (3) post-processing (e.g., GS(Yang et al., 2018) and DenseCRF(Can et al., 2018; Rajchl et al., 2017; Qu et al., 2019)) for refined pseudo pixel-level labels.

Refine

Wea

k s

uper

vis

ion

(box

, sc

ribble

or

poin

t) Pseudo

pixel-level

labels

Pre-processing

(region growing/

k-means clustering)

Post-processing(DenseCRF/

Graph Search)

Training deep segmentation networks(UNet/FCN/ENet )

Abnorm

al?

Gradients

Weakly supervised object localization

Gra

d-C

AM

Thresholding

method

Training CNN with image-level annotations

Figure 5. Illustration of weakly supervised learning techniques. Left-to-right: (a) A representative solution for weakly supervised object localization using image-level annotations. (b) A representative for weakly supervised semantic segmentation using box, scribble or point supervision.

Many successful applications have demonstrated that weakly supervised learning techniques are

expected to relieve the urgent demands of deep models for fine-grained annotations in many fine-grained biomedical image analysis tasks. Although the performance of weakly supervised learning has been notably improved, there still exists a gap between weakly supervised learning and fully supervised learning that requires to be furtherly studied. Besides, how to utilize the results of weakly supervised tasks to improve the performance of original tasks might be a potential and extended research topic.

4. Transfer learning techniques

Labeled biomedical images are scarce, while large-scale natural image sets with various levels of annotations are publicly available, such as ImageNet, COCO, and PascalVOC, etc. A natural idea is whether the annotated samples from other domains can help improve deep models in biomedical image analysis. Unfortunately, when the deep models trained in source domain (e.g., natural images and synthetic images) are applied to the target domain, i.e., biomedical images, the performance of deep models generally drops a lot. Transfer learning is a key technique to solve the problem of model adaptation across domains.

Task2 (Modality 2)

Tas

k-s

pec

ific

lay

ers

Lea

rn a

sh

ared

feat

ure

re

pre

sen

tati

on

s

Inh

eren

t co

rrel

atio

ns

bet

wee

n t

wo

task

s Task1 (Modality 1)

Biomedical images

Biomedical images

Tas

k-s

pec

ific

lay

ers

Fix

ed f

eatu

re e

xtr

acto

r

Pre-training

ImageNetCOCO

Biomedical images

Fine-tuning

Tas

k-s

pec

ific

lay

ers

Lea

rn d

om

ain-i

nvari

ant

featu

re r

epre

sen

tati

on

s

Do

mai

n d

iscr

imin

ato

r

source or target

Source domain

Labeled

Target domain

Unlabled

Bac

kbone

net

wo

rk

Figure 6. Illustration of transfer learning techniques. Left-to-right: (a) Fine-tuning. Deep models are initially pre-trained on ImageNet or COCO and sequentially fine-tuned on biomedical images by fixing some layers and updating weights of task-specific layers at a small learning rate. (b) Multi-task learning. Deep models for multi-task learning that share some layers and differ in task-specific layers are trained alternatively across different but related tasks. (c) Adversarial training. A domain discriminator branch network is added to standard deep model to motivate backbone network to learn domain invariant feature representation. The entire model is optimized alternatively similarly with multi-task learning.

Building every deep model from scratch is time-consuming and relies heavily on large-scale annotated

samples. Transfer learning techniques focuses on extracting the knowledge gained while solving problems in source domain and applying it to different but related problems in target domain. For convenience and efficiency, the image sets with large-scale annotated samples are prone to being selected as source domain in many computer vision applications. To sum up, the transfer learning methods have roughly experienced five stages development(Zhang, 2019), including instance re-weighting adaptation(Huang et al., 2007), classifier adaptation(Lixin Duan et al., 2012), feature adaptation(Xu et al., 2016), deep network adaptation(Long et al., 2015) and adversarial adaptation(Chen et al., 2018). The research frontier problems of transfer learning techniques have been gradually moved from simple image classification tasks(Lixin Duan et al., 2012) to complex and fine-grained tasks, such as object detection(Chen et al., 2018; Inoue et al., 2018) and semantic segmentation(Yang Zhang et al., 2017; Kamnitsas et al., 2017a), etc. Compared with deep models trained from scratch in target domain or trained in source domain but applied directly in target domain, deep models using transfer learning techniques generally achieve much better performance in various computer vision tasks. Therefore, transfer learning might be a promising technique for applying deep learning methods effectively to biomedical image analysis tasks, especially in the small sample cases.

Many studies have adopted transfer learning techniques to improve deep models in biomedical image analysis tasks (Table 3), mainly involving fine-tuning(Kermany et al., 2018; Tajbakhsh et al., 2016; Zhou et al., 2017; Zhou et al., 2019), multi-task learning(Moeskops et al., 2016; Liao and Luo, 2017; Samala et al., 2017) and adversarial training(Dong et al., 2018; Moeskops et al., 2017; Javanmardi and Tasdizen, 2018).

Fine-tuning might be one of the most popular transfer learning techniques. This method (Fig. 6a) generally adapts deep models (e.g., AlexNet(Tajbakhsh et al., 2016; Zhou et al., 2017; Zhou et al., 2019) and InceptionV3(Kermany et al., 2018)) pre-trained on ImageNet to biomedical image analysis by freezing some layers in pre-trained deep models while retraining the others (task-specific layers). The fixed layers are supposed to encode the knowledge gained while solving problems in source domain and act as a fixed feature extractor. During fine-tuning process, the weights of deep models are generally updated at a small learning rate nearby the values of pre-trained weights, equivalent to narrowing the search space of parameters. Thus, fine-tuning method is expected to reduce the dependency of deep models on large-scale annotated biomedical images. Existing studies(Kermany et al., 2018; Tajbakhsh et al., 2016; Zhou et al., 2017; Zhou et al., 2019) consistently reveal that deep models trained by fine-tuning method are more robust and accurate than counterparts trained from scratch in various biomedical image analysis tasks.

Multi-task learning aims to exploit the inherent correlations amongst different but related tasks, such as similar tasks in different biomedical imaging modalities(Moeskops et al., 2016; Samala et al., 2017), and different tasks in same biomedical imaging modalities(Liao and Luo, 2017). The deep models modified for multiple tasks (e.g., ImageNet DCNN(Samala et al., 2017) and ResNet-50(Liao and Luo, 2017)) generally share some layers to learn a same feature representation but differ in task-specific layers to perform multiple tasks (Fig. 6b). Thus, these multi-task deep models can be jointly trained on multiple image sets for multiple biomedical image analysis tasks. In comparison of respective single-task learning, multi-task learning techniques in these studies(Moeskops et al., 2016; Liao and Luo, 2017; Samala et al., 2017) are prone to achieving better results. It implies that multi-task learning techniques have great potential to effectively apply train deep models in biomedical image analysis when training samples from

a single modality are limited. Table 3. Representative papers using transfer learning techniques in biomedical image analysis based on deep models.

Reference Transfer learning techniques

Tasks Dataset Deep models

Tajbakhsh et al. (2016)

Fine-tuning

Polyp detection 40 short colonoscopy videos

AlexNet Pulmonary embolism detection

121 CT pulmonary angiography datasets with a total of 326 PEs

Colonoscopy frame classification 6 complete colonoscopy videos Intima-media boundary segmentation

92 carotid intima-media thickness videos

Moeskops et al. (2016)

Multi-task learning Anatomical structure segmentation 34 MR brain images, 34 MR breast images and 10 cardiac CT angiography scans

Customized CNN model

Zhou et al. (2017)

Fine-tuning

Colonoscopy frame classification 6 complete colonoscopy videos

AlexNet Polyp detection 38 short colonoscopy videos

Pulmonary embolism detection 121 CT pulmonary angiography datasets with a total of 326 PEs

Moeskops et al. (2017)

Adversarial training

Brain MRI segmentation 35 MR brain images from adult subjects and 20 axial MR brain images from elderly subjects

Customized CNN model

Samala et al. (2017)

Multi-task learning Classification of malignant and benign breast masses

1,655 SFM views and 310 DM views with 2454 masses (1057 malignant, 1397 benign)

ImageNet DCNN

Liao et al. (2017)

Multi-task learning Skin lesion classification 21657 dermoscopy images that contain both the skin lesion and body location labels

ResNet-50

Zhou et al. (2018)

Fine-tuning Carotid intima-media thickness video interpretation

92 carotid intima-media thickness videos from 23 patients

AlexNet

Kermany et al. (2018)

Fine-tuning

Age-related macular degeneration and diabetic macular edema classification

207,130 retina OCT images Inception V3 architecture

Pneumonia Detection 5,232 chest X-ray images from a total of 5,856 patients

Dong et al. (2018)

Adversarial training

Chest organ segmentation 247 grayscale chest X-rays and 221 grayscale chest X-rays

FCN-based model

Mehran et al. (2018)

Adversarial training

Eye vasculture segmentation 40 eye fundus images FCN-based model Neuron membrane detection

100 electron microscopy images of size 1024×1024 and 125 electron microscopy images of size 1250×1250

Domain adversarial training techniques try to enforce deep models to learn a domain invariant

representation (e.g., deep features) for related tasks across different domains (e.g., two biomedical image sets from different sources(Dong et al., 2018; Javanmardi and Tasdizen, 2018)) or motivate deep models to generate outputs close to ground-truth(Dong et al., 2018; Moeskops et al., 2017). In general, these techniques are implemented by adding a branch network of domain discriminator to a standard deep model, where domain discriminator and standard deep model share one backbone network (Fig. 6c). Thus, similar to multi-task learning, the entire model can be optimized alternatively across different domains. Domain adversarial training techniques are expected to reduce the distribution discrepancy of deep features(Dong et al., 2018; Javanmardi and Tasdizen, 2018) (or outputs(Dong et al., 2018; Moeskops et al., 2017)) across domains and thus improve the performance of deep models in biomedical image analysis even without the requirement of annotated samples in target domain.

This key technique for deep learning of small sample in biomedical image analysis mainly focuses on how to extract the knowledge gained while solving problems in source domain and apply it to different but related problems in target domain, thus making the deep models converge quickly, robustly and accurately especially in small sample cases. Although there exists various transfer learning techniques, fine-tuning a deep model that is pre-trained on ImageNet, COCO or PascalVOC might be a good start to perform biomedical image analysis tasks with inadequate annotated samples.

5. Active learning techniques

The performance of deep models is gradually improved with the increase of annotated samples before reaching an inflection point, while after that the increase of annotated samples will not improve deep models much (Fig. 7a). Not every annotated sample contributes equally to deep models. Selecting the few most valuable samples to accelerate the arrival of inflection point might be a potential solution to reduce the dependence of deep models on large-scale annotated samples, and that is what active learning techniques expect to do. Due to the high cost of labeling multimodal biomedical images, it is of great value to achieve the performance of deep models trained on small-scale biomedical images comparable to that of deep models trained on large-scale biomedical images. The active learning technique might be a promising approach to achieve this goal. The problem that active learning techniques expect to solve is how to train an effective model with the least annotation cost. There are generally four key components in active learning framework (Fig. 7b), including low-cost unlabeled samples, an oracle, a task-specific model and a querying strategy, where the oracle plays the role of an annotator (e.g., biomedical expert). At a given point, querying strategy selects a sample that is most valuable to train task-specific model from low-cost unlabeled samples. The selected sample is labeled by oracle and is subsequently added to training set to furtherly train task-specific model. The core problem that has been investigated over years in active learning is how to customize a querying strategy for a specific task. Many querying strategy frameworks have been

proposed, involving query by committee, expected model change, uncertainty sampling and expected error reduction, etc. The majority of these querying strategies are designed to query the labels for the most informative sample(Zhou et al., 2017; Zhou et al., 2019; K. Wang et al., 2017; Huang et al., 2014) or (and) the most representative sample(Zhou et al., 2019; Huang et al., 2014; S.-J. Huang et al., 2018; Wang et al., 2016). Compared with the random sampling strategy that is generally used to train deep models (also called passive learning), the querying strategy in active learning is able to use the selected and labeled samples to train deep models more efficiently and effectively. The current research of active learning techniques focus on solving complex problems in practical application scenarios, such as active learning with weak supervision(Hua et al., 2018; Xu et al., 2017) , cost-sensitive active learning(S.-J. Huang et al., 2017; Yan and Huang, 2018) and active learning for deep models(Hua et al., 2018; Sener and Savarese, 2017; K. Wang et al., 2017) , etc.

Oracle Labeled samples

Outputs

Features

Train

Unlabeled samples

Inference

Queryingstrategy

Entropy

Diversity

DistanceDeep model

Number of training samples with annotations

Per

form

an

ce o

f d

eep

mo

del

s

Inflection point

Active learningRandom sampling

Figure 7. Left-to-right: (a) The learning curve of model performance over training sample size; (b) illustration of active learning

framework. Table 4. Representative papers using active learning techniques in biomedical image analysis based on deep models.

Reference Querying strategy Tasks Dataset Deep models

Zhou et al. (2017)

Taking the diversity and entropy of model’s predictions for augmented unlabeled samples as a sampling criterion

Colonoscopy frame classification

6 complete colonoscopy videos

AlexNet Polyp detection 38 short colonoscopy videos

Pulmonary embolism detection

121 CT pulmonary angiography datasets with a total of 326 PEs

Yang et al. (2017)

Taking the similarity and uncertainty of unlabeled samples determined by multiple FCNs’ predictions as querying strategy for annotation suggestion

Gland segmentation 165 colon histology images(Sirinukunwattana et al., 2017) FCN

Lymph node segmentation

74 ultrasound images(Zhang et al., 2016b)

Chowdhury et al. (2017)

Uncertainty sampling based on model’s confidence in unlabeled patches

Cellular segmentation

NIH-3T3 (mouse embryonic fibroblasts), MCF10A (human breast epithelial cells) and HeLa-S3 (cervix adenocarcinoma)(Van Valen et al., 2016)

Customized CNN-based model

Gorriz et al. (2017)

Pixel-wise uncertainty built by using Monte Carlo dropout to deep model during testing on unlabeled samples

Melanoma segmentation

2000 RGB dermoscopy images(Gutman et al., 2016)

UNet

Folmsbee et al. (2018)

Uncertainty sampling based on the qualitative comparison of the model’s outputs for unlabeled samples by a trained pathologist

Tissue classification in oral cavity cancer

143 tissue pathology slides from oral cavity cancer tumor resections

Customized CNN-based model

Sourati et al. (2018)

Sampling criterion formulated by diversified Fisher information

Newborn and adolescent brain segmentation

66 brain MR images of adolescents and 25 brain MRI images of newborns(Makropoulos et al., 2018)

Customized CNN-based model

Ozdemir et al. (2018)

Uncertainty sampling based on prediction variance and representativeness sampling based on content distance

Bone and muscle tissue segmentation

36 MR images of patients diagnosed with rotator cuff tear (shoulders)

DCAN(Chen et al., 2017)

Zhou et al. (2018)

Taking the diversity and entropy of model’s predictions for augmented unlabeled samples as a sampling criterion

Carotid intima-media thickness video interpretation

92 carotid intima-media thickness videos from 23 patients

AlexNet

Smailagic et al. (2018)

Taking the maximum mean distance between unlabeled and labelled samples in the oriented FAST and rotated BIREF feature space as sampling criterion

Diabetic retinopathy detection

1200 eye fundus images from 654 diabetic and 546 healthy patients

Inception V3 Breast cancer classification

400 breast tissue cells images(Aresta et al., 2019)

Skin cancer classification

900 benign and malignant cell tissue images(Gutman et al., 2016)

In order to dramatically reduce annotation cost, many studies have adopted active learning techniques to train deep models in biomedical image analysis tasks (Table 4), mainly involving classification tasks(Zhou et al., 2017; Zhou et al., 2019; Chowdhury et al., 2017; Folmsbee et al., 2018; Smailagic et al., 2018) and segmentation tasks(Yang et al., 2017; Gorriz et al., 2017; Ozdemir et al., 2018; Sourati et al., 2018). For relatively simple biomedical image classification tasks, the predictions of deep classification models for unlabeled samples are generally useful for active learning techniques. Many querying strategies have formulated the criterion for uncertainty sampling based on model’s predictions, such as the symmetric Kullback Leibler divergence and entropy of model’s predictions for augmented samples(Zhou et al., 2017; Zhou et al., 2019), model’s confidence in unlabeled patches(Chowdhury et al., 2017) and subjective comparison of CNN’s predictions from an experienced pathologist(Folmsbee et al., 2018). Regarding to relatively complex biomedical image segmentation tasks, the uncertainty of model’s prediction mask for individual unlabeled sample and the representativeness amongst unlabeled samples based on the distance of deep features are generally combined to formulate the sampling criterion for querying strategy(Yang et al., 2017; Ozdemir et al., 2018; Sourati et al., 2018). To formulate a more effective criterion for uncertainty sampling, there generally exists three major approaches, where multiple predictions are required for the same candidate unlabeled sample. First, under the consideration that an uncertain sample is prone to making deep models sensitive to data augmentation, the sampling criterion can be formulated based on model’s predictions for multiple augmented samples of a candidate(Zhou et al., 2017; Zhou et al., 2019). Second, with the claim that an uncertain sample is prone to being sensitive to dropout operation applied to CNN while a confident sample is generally less sensitive to dropout, the sampling criterion can be formulated based on the inconsistency of model’s predictions at different random dropout conditions(Gorriz et al., 2017; Ozdemir et al., 2018). Third, the strategy of query by committee trains multiple deep models to construct a committee and thus adopt the variance of these models’ predictions as sampling criterion(Yang et al., 2017). Existing studies using active learning techniques to train deep models on multimodal biomedical images have achieved impressive results in reducing the required annotation cost. This key technique for deep learning of small sample data mainly focuses on how to customize a querying strategy for a specific biomedical image analysis task that can select the few most valuable samples for annotation, and how to effectively train deep models with these actively selected samples. However, querying for the few most valuable samples from unlabeled samples is computationally expensive in general and in practice, training deep models with selected samples effectively is very challenging. Combining active learning techniques and transfer learning techniques(Zhou et al., 2017; Zhou et al., 2019) might be one promising research method. Such combination has the potential to make computer-aided diagnosis system based on deep models incrementally evolve through online feedback of false instances corrected by biomedical experts, where the challenge is how to deal with catastrophic forgetting of false instances.

6. Miscellaneous techniques

6.1 Domain knowledge

Biomedical image analysis is a highly specialized work, which requires large expertise. In general, there is a large amount of prior domain knowledge in a specific biomedical image analysis task. The prior domain knowledge is able to regularize deep models in a certain sense, which can narrow the parameter space of deep models and thus relieve overfitting. Therefore, combining domain knowledge might be a promising approach to alleviate the dependence of deep learning on large-scale labeled samples in medical image analysis applications at the meantime of improving the robustness of deep models.

Table 5. Representative papers using domain knowledge in biomedical image analysis based on deep models.

There are two main approaches to combine domain knowledge and deep learning in biomedical image

analysis tasks (Table 5): (1) the domain knowledge is characterized and digitized by hand-crafted features, which is furtherly fused with deep features(Xie et al., 2019; Xie et al., 2018) , and (2) the domain knowledge is used as a design guide for deep networks(Yan et al., 2015; Fu et al., 2019). Deep models using prior domain knowledge in these studies(Xie et al., 2019; Xie et al., 2018; Yan et al., 2015; Fu et al., 2019) consistently achieved superior performance over standard methods.

Domain knowledge is important to train deep models for a specific biomedical image analysis task. However, the generality and end-to-end learning framework of deep models suffers due to the use of

Reference Knowledge & method Tasks

Xie et al. (2018)(2019)

The malignant nodule generally has a lobed contour, an unsmooth edge and an inhomogeneous density; characterized and digitized by hand-crafted features, and furtherly fused with deep features in decision-level.

Benign-malignant nodule diagnosis in pulmonary CT images

Yan et al. (2015)

Radiologists usually recognize the body part of a slice based on local discriminative regions rather than global image context; used as a design guide for multi-stage deep network.

CT slice-based body part recognition

Fu et al. (2019)

Global anterior segment structure, local iris region and anterior chamber angle (ACA) patch are clinically important to angle-closure detection; used as a design guide for multilevel deep network.

Closed-angle detection in anterior segment OCT (AS-OCT) images

task-specific domain knowledge. Besides, distilling prior domain knowledge requires years of clinical experience in biomedical image analysis, where only experienced biomedical experts can do that. Thus, digging deep into clinical expertise underlying biomedical images through collaboration with clinicians might be a good start to apply deep learning techniques in biomedical image analysis tasks with inadequate annotated samples.

6.2 Shallow methods

Compared with the traditional shallow methods based on hand-crafted features, deep learning techniques have made significant progress in various computer vision applications, which relies heavily on large-scale labeled samples. With inadequate training samples, the deep models are prone to suffering from overfitting, thus leading to a big drop in performance. In comparison of deep learning techniques, the shallow methods generally have lower dependency on large-scale labeled samples due to the more general hand-crafted features and lower model complexity. A natural research idea is to combine shallow methods and deep models to alleviate the dependence on large-scale labeled samples and improve performance. Shallow methods and deep models can be combined in four different levels for biomedical image analysis (Table 6), including input-level(Wang et al., 2014) (e.g., original image and edge image), feature-level(Lai and Deng, 2018) (e.g., deep features and hand-crafted features), decision-level(Xie et al., 2018; Wang et al., 2014; Jianpeng Zhang et al., 2018) (e.g., CNN-based classifier and SVM) and mixed-level(Wang et al., 2014) (e.g., SVM classifier based on deep features). The combined model is expected to inherit the advantages of both shallow methods and deep learning techniques to improve the analysis performance and generalization especially in small sample cases. Table 6. Representative papers combining shallow methods and deep models in biomedical image analysis.

In summary, combining traditional shallow methods and deep models is a promising approach to apply

deep learning of small sample in biomedical image analysis. This key technique mainly focuses on how to select appropriate hand-crafted features and design the fusion strategy for a specific biomedical image analysis task. However, a key limitation of these studies is that the fusion strategy is so tricky that an inappropriate combination might aggravate overfitting problem in small sample cases, thus leading to a big drop in performance. Prior domain knowledge might be a good design guide for hand-crafted features and fusion strategy.

6.3 Attention mechanism

Table 7. Representative papers using attention mechanism in biomedical image analysis based on deep models.

Deep models show an attention mechanism of background suppression and foreground enhancement

in feedforward operation. As mentioned earlier, this attention mechanism can be used for visual explanation and weakly supervised object detection and segmentation. The attention mechanism is a useful tool to refine the deep feature representations. Through regulating the attention of deep models, deep models can observe input image more “carefully”, which is expected to improve the performance

Reference Fusion level & method Tasks

Wang et al. (2014)

Input-level; edges based on a thresholding method and Laplacian of Gaussian + CNN-based classifier Mitosis detection in breast

cancer pathology images Mixed-level; random forest classifier + the deep features Decision-level; 2 random forest classifiers + a CNN-based classifier

Xie et al. (2018)

Decision-level; texture descriptor based on gray level co-occurrence matrix (GLCM), Fourier shape descriptor and CNN-based deep features are fused by three AdaBoosted backpropagation neural networks (BPNN)

Benign-malignant nodule diagnosis in pulmonary CT images

Zhang et al. (2018)

Decision-level; the bag of feature (BoF), local binary pattern (LBP) features and CNN-based deep features are fused by BPNN

ImageCLEF 2016 medical classification task(García Seco de Herrera et al., 2016)

Lai et al. (2018)

Feature-level; hand-crafted features based on texture and color statistics and CNN-based deep features are fused by the multilayer perceptron

Skin lesion classification and tissue classification tasks

Reference Method Tasks

Wang et al. (2018)

Adding an attention network branch based on Grad-CAM to a ResNet-152 network

Lung disease diagnosis in chest X-ray images

Schlemper et al. (2018)

Adding two designed attention-gated branches to a CNN-based network

Ultrasound scan plane detection

Schlemper et al. (2018)

Adding attention-gated skip connections to a UNet Pancreas segmentation in abdominal 3D CT scans

Zhang et al. (2019)

Using designed attention residual learning (ARL) block Skin lesion classification

of deep learning in small sample cases. In some biomedical image analysis tasks (Table 7), researchers have tried to regulate the attention of

deep models by adding attention branch network(Wang and Xia, 2018; Schlemper et al., 2018; Oktay et al., 2018) or self-attention blocks(Zhang et al., 2019) to standard deep models. These methods notably improve the analysis performance in comparison of counterparts without attention regulation. Properly regulating the attention of the deep models can be very helpful to biomedical image analysis. The attention mechanism enables deep models to focus on both global and local representation of input images simultaneously without using any extra localization modules. Such ability is equivalent to stimulating deep models to observe the input image more “carefully”, which is much more valuable for biomedical image analysis in small sample cases. Besides, this technique might be an available approach to extend weakly supervised learning techniques (e.g., weakly supervised object localization) by utilizing the (intermediate) results of weakly supervised tasks (e.g., attention map or lesion location) to improve the performance of original tasks (e.g., image classification).

6.4 Data Augmentation

Visual appearance of the object of interest generally experiences various variations, involving illumination variation, scale variation, ration, translation and deformation, etc. Machine learning theory claims that training a model on an image set with higher diversity is prone to getting a more robust and generalized model. However, collecting and labeling all the images that involve these variations seems to be an impossible task. Data augmentation techniques are thus proposed to increase both the amount and diversity of training samples by applying various transformations (e.g., horizontally flipping) to original samples. Intuitively, data augmentation is used to teach a model about invariances behind the variations in data domain(Cubuk et al., 2019). Many data augmentation techniques have been proposed with the development of deep learning, concisely categorized as classical transformations (Rajpurkar et al., 2018; Rajpurkar et al., 2017) , generative adversarial network (GAN)(S.-W. Huang et al., 2018; Pan et al., 2018) and learning to augment(Cubuk et al., 2019; Perez and Wang, 2017).

Classical transformations. Classical transformations mainly involve horizontal flipping, random cropping (extracting patches), random scaling, random rotating, translations, shearing and elastic deformations, which can be performed on-the-fly or off-line. In general, the classical transformations are widely adopted in deep learning techniques and the best augmentation strategies are dataset-specific and task-specific. To be more concise, we survey the classical transformation techniques based on some representative applications in biomedical image classification and segmentation tasks (Table 8). Random flipping, random rotating, and random cropping are the three most frequently-used classical transformations. Moreover, extracting patches from original images is prone to being used to augment training samples from high-resolution biomedical images, especially in small sample cases. Besides, elastic deformations seem to be important to biomedical image segmentation tasks since deformations are the most common variations in tissue as Ronneberger et al.(2015) pointed out.

GAN. GAN techniques have been widely used to synthesize biomedical images(Pan et al., 2018; Huo et al., 2018; Chen et al., 2019). The synthetic biomedical images show a remarkable competitiveness in data augmentations due to their variety and reality in appearance variations. Biomedical images are generally synthesized from one modality to another, such as synthesizing PET from MRI in diagnosis of Alzheimer’s disease (AD)(Pan et al., 2018) and synthesizing CT from MR in cross-modality medical image segmentation tasks(Huo et al., 2018; Chen et al., 2019). The synthetic images can be furtherly used for all kinds of biomedical image analysis with or without images from source modality. For further exploration, various applications of GANs in biomedical imaging can be found in Yi’s work(Yi et al., 2018). Although the GAN techniques have achieved many successes in biomedical image synthesis, it is worth noting that there still exists lots of risks involving non-convergence, mode collapse and highly sensitive hyper-parameter selections.

Learning to augment. Compared with classical transformations, learning to augment aims to find the most effective data augmentation strategy automatically, which is sort of similar to the active sampling strategy in active learning techniques. For convenience, the majority of existing studies(Cubuk et al., 2019; Perez and Wang, 2017) evaluate their methods of learning to augment on natural datasets. Inspired by these studies, Zhao et al. (2019) proposed an automated data augmentation method for one-shot brain MRI segmentation. Specifically, they designed two independent spatial and appearance transformation models based on the UNet architecture(Ronneberger et al., 2015) to learn how to implement natural variations in MRI scans. The two transformation models were sequentially used to generate realistic labeled MRI scans to train segmentation model. This method achieved superior performance over state-of-the-art methods for one-shot biomedical image segmentation.

Data augmentation is very important to deep learning techniques that are applied in biomedical image analysis, especially in small sample cases. The augmentation techniques are able to increase both the amount and diversity of training samples with lower cost in comparison of manual acquisition. Proper augmentation helps relieve the suffering of deep models from small sample learning and make deep models robust to variations. However, aggressive and improper augmentation might break the nature distribution of training samples, thus leading to a notable drop in the analysis performance of practical applications. Therefore, a soft and stepwise augmentation strategy is suggested to be used in biomedical image analysis.

Table 8. Classical transformations applied in some representative biomedical image classification and segmentation tasks based on deep models.

7. Conclusion

A comprehensive survey on key SSL techniques for deep learning of small sample in clinical biomedical image analysis is presented. First, we introduce explanation techniques for deep models that are expected to satisfy the demands of clinical diagnosis for explainable and transparent decision-making process. Second, we present weakly supervised learning techniques to deal with the problem how to effectively train deep models with coarse-grained annotations for fine-grained biomedical image analysis tasks. Third, transfer learning techniques are surveyed to answer this question how to retrieve the knowledge gained by deep models while solving problems in other domains and reuse it in biomedical

Task Reference Dataset Data augmentation strategy

Classification

Anthimopoulos et al. (2016) 135 CT scans with 512×512 pixels per slice Extracting patches, random flipping and

random rotating

Tajbakhsh et al. (2016)

40 short colonoscopy videos Extracting patches, scaling, translation, horizontal and vertical mirroring and flipping

121 CT pulmonary angiography datasets with a total of 326 PEs

Extracting patches, translation and rotating

6 complete colonoscopy videos Extracting patches

Esteva et al. (2017) 129,450 skin lesion images comprising 2,032 different diseases

Random rotating, cropping and vertical flipping

Gondal et al. (2017) 88,702 fundus images in Kaggle Dataset and 89 fundus images in DiaretDB1 Dataset(Kauppi et al., 2007, p. 1).

Rotating, horizontal and vertical flipping

Quellec et al. (2017) 88,702 fundus images in Kaggle Dataset and 89 fundus images in DiaretDB1 Dataset(Kauppi et al., 2007, p. 1).

Rotating, translation, scaling, horizontal flipping and cropping

Zhou et al. (2017)

6 complete colonoscopy videos Extracting patches

38 short colonoscopy videos Scaling, translation, mirroring and flipping

121 CT pulmonary angiography datasets with a total of 326 PEs

Extracting patches, scaling, translation and rotating

Samala et al. (2017) 1,655 SFM views and 310 DM views with 2454 masses (1057 malignant, 1397 benign)

Flipping and rotating

Rajpurkar et al. (2018) 40,561 multi-view musculoskeletal radiographs

Random horizontal flipping and rotating

Fernando et al. (2018) 1300 skin lesion images(Ballerini et al., 2013) Web-crawling

Smailagic et al. (2018)

1200 eye fundus images from 654 diabetic and 546 healthy patients

Cropping, flipping and translation 400 breast tissue cells images(Aresta et al., 2019) 900 benign and malignant cell tissue images(Gutman et al., 2016)

Xie et al. (2018) 1018 clinical chest CT scans with lung nodules(Armato et al., 2011)

Random translation, rotating and horizontal or vertical flipping

Lai et al. (2018)

2828 fundamental tissue images in HIS2828 dataset Randomly cropping, horizontal or

vertical flipping 2000 skin lesion images in ISIC2017 dataset(Codella et al., 2018)

Schlemper et al. (2018) 2694 2D ultrasound examinations of volunteers with gestational ages between 18 and 22 weeks

Horizontal and vertical translation, horizontal flipping, rotating and scaling

Zhang et al. (2019) 2750 skin dermoscopy images in ISIC2017 dataset(Codella et al., 2018)

Random rotation, scaling, horizontal and vertical flip

Segmentation

Ronneberger et al. (2015)

30 images (512x512 pixels) from serial section transmission electron microscopy

Elastic deformations and drop-out 35 partially annotated light microscopic images 20 partially annotated light microscopic images

Chen et al. (2016) 85 histopathological images (benign / malignant = 37 / 48)

Translation, rotating, and elastic deformations

Kamnitsas et al. (2016) 61 brain MRI sequences Adding images reflected along the

sagittal axis and intensity shifting

Isensee et al. (2017) 135 MRI scans of glioblastoma and 108 MRI scans of lower grade glioma

Random rotating, scaling, elastic deformations, gamma correction augmentation and mirroring

Schlemper et al. (2018)

150 gastric cancer 3D CT scans(Roth et al., 2017) Affine transformations, horizontal

flipping and random cropping 82 contrast enhanced 3D CT scans with pancreas(Roth et al., 2016)

Laves et al. (2018) 536 laryngeal endoscopic images Horizontal flipping and rotating

Kervadec et al. (2019) Prostate transversal T2-weighted MR images of 50 patients (https://promise12.grand-challenge.org)

Mirroring, flipping and rotating

Isensee et al. (2019)

100 cine-MRI images Elastic deformations, random scaling, random rotating, gamma augmentation, and spatial transformations

131 CT images 30 abdominal CT images 50 anisotropic MRI images

image analysis. Fourth, active learning techniques are discussed to solve the problem how to select the few most valuable samples for annotations and train deep models with them for optimal performance in biomedical image analysis. Finally, we study miscellaneous techniques that are also important to apply deep models for biomedical image analysis, involving data augmentation, domain knowledge, traditional shallow methods and attention mechanism. These key techniques are expected to effectively support the application of deep learning in clinical biomedical image analysis, and furtherly improve the analysis performance, especially when large-scale annotated samples are not available. To our best knowledge, this is the first attempt to present a comprehensive survey on key SSL techniques for deep learning of small sample in biomedical image analysis through combining with the development of related techniques in computer vision applications. Notably, our survey focusing on key SSL techniques can be mutually complementary with previous reviews on deep learning techniques in biomedical image analysis.

Acknowledgements

The authors would like to thank members of the Medical Image Analysis for discussions and suggestions. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

References

Anthimopoulos, M., Christodoulidis, S., Ebner, L., Christe, A., Mougiakakou, S., 2016. Lung Pattern

Classification for Interstitial Lung Diseases Using a Deep Convolutional Neural Network. IEEE

Trans. Med. Imaging 35, 1207–1216. https://doi.org/10.1109/TMI.2016.2535865

Anwar, S.M., Majid, M., Qayyum, A., Awais, M., Alnowami, M., Khan, M.K., 2018. Medical Image

Analysis using Convolutional Neural Networks: A Review. J Med Syst 42, 226.

https://doi.org/10.1007/s10916-018-1088-1

Aresta, G., Araújo, T., Kwok, S., Chennamsetty, S.S., Safwan, M., Alex, V., Marami, B., Prastawa, M.,

Chan, M., Donovan, M., Fernandez, G., Zeineh, J., Kohl, M., Walz, C., Ludwig, F., Braunewell,

S., Baust, M., Vu, Q.D., To, M.N.N., Kim, E., Kwak, J.T., Galal, S., Sanchez-Freire, V., Brancati,

N., Frucci, M., Riccio, D., Wang, Y., Sun, L., Ma, K., Fang, J., Kone, I., Boulmane, L., Campilho,

A., Eloy, C., Polónia, A., Aguiar, P., 2019. BACH: Grand challenge on breast cancer histology

images. Medical Image Analysis 56, 122–139. https://doi.org/10.1016/j.media.2019.05.010

Armato, S.G., McLennan, G., Bidaut, L., McNitt-Gray, M.F., Meyer, C.R., Reeves, A.P., Zhao, B., Aberle,

D.R., Henschke, C.I., Hoffman, E.A., Kazerooni, E.A., MacMahon, H., van Beek, E.J.R.,

Yankelevitz, D., Biancardi, A.M., Bland, P.H., Brown, M.S., Engelmann, R.M., Laderach, G.E.,

Max, D., Pais, R.C., Qing, D.P.-Y., Roberts, R.Y., Smith, A.R., Starkey, A., Batra, P., Caligiuri,

P., Farooqi, A., Gladish, G.W., Jude, C.M., Munden, R.F., Petkovska, I., Quint, L.E., Schwartz,

L.H., Sundaram, B., Dodd, L.E., Fenimore, C., Gur, D., Petrick, N., Freymann, J., Kirby, J.,

Hughes, B., Vande Casteele, A., Gupte, S., Sallam, M., Heath, M.D., Kuhn, M.H., Dharaiya, E.,

Burns, R., Fryd, D.S., Salganicoff, M., Anand, V., Shreter, U., Vastagh, S., Croft, B.Y., Clarke,

L.P., 2011. The Lung Image Database Consortium (LIDC) and Image Database Resource

Initiative (IDRI): A Completed Reference Database of Lung Nodules on CT Scans: The

LIDC/IDRI thoracic CT database of lung nodules. Med. Phys. 38, 915–931.

https://doi.org/10.1118/1.3528204

Arun, A., Jawahar, C.V., Kumar, M.P., 2019. Dissimilarity Coefficient Based Weakly Supervised Object

Detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern

Recognition, pp. 9432–9441.

Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.-R., Samek, W., 2015. On Pixel-Wise

Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation. PLoS

ONE 10, e0130140. https://doi.org/10.1371/journal.pone.0130140

Ballerini, L., Fisher, R.B., Aldridge, B., Rees, J., 2013. A Color and Texture Based Hierarchical K-NN

Approach to the Classification of Non-melanoma Skin Lesions, in: Celebi, M.E., Schaefer, G.

(Eds.), Color Medical Image Analysis, Lecture Notes in Computational Vision and

Biomechanics. Springer Netherlands, Dordrecht, pp. 63–86. https://doi.org/10.1007/978-94-

007-5389-1_4

Bearman, A., Russakovsky, O., Ferrari, V., Fei-Fei, L., 2016. What’s the Point: Semantic Segmentation

with Point Supervision, in: Leibe, B., Matas, J., Sebe, N., Welling, M. (Eds.), Computer Vision

– ECCV 2016, Lecture Notes in Computer Science. Springer International Publishing, pp. 549–

565.

Bernard, O., Lalande, A., Zotti, C., Cervenansky, F., Yang, X., Heng, P.-A., Cetin, I., Lekadir, K., Camara,

O., Gonzalez Ballester, M.A., Sanroma, G., Napel, S., Petersen, S., Tziritas, G., Grinias, E.,

Khened, M., Kollerathu, V.A., Krishnamurthi, G., Rohe, M.-M., Pennec, X., Sermesant, M.,

Isensee, F., Jager, P., Maier-Hein, K.H., Full, P.M., Wolf, I., Engelhardt, S., Baumgartner, C.F.,

Koch, L.M., Wolterink, J.M., Isgum, I., Jang, Y., Hong, Y., Patravali, J., Jain, S., Humbert, O.,

Jodoin, P.-M., 2018. Deep Learning Techniques for Automatic MRI Cardiac Multi-Structures

Segmentation and Diagnosis: Is the Problem Solved? IEEE Trans. Med. Imaging 37, 2514–2525.

https://doi.org/10.1109/TMI.2018.2837502

Bilen, H., Vedaldi, A., 2016. Weakly Supervised Deep Detection Networks, in: 2016 IEEE Conference

on Computer Vision and Pattern Recognition (CVPR). Presented at the 2016 IEEE Conference

on Computer Vision and Pattern Recognition (CVPR), IEEE, Las Vegas, NV, USA, pp. 2846–

2854. https://doi.org/10.1109/CVPR.2016.311

Bloch, N., Madabhushi, A., Huisman, H., 2015. NCI-ISBI 2013 Challenge: Automated Segmentation of

Prostate Structures. The Cancer Imaging Archive 370.

https://doi.org/10.7937/k9/tcia.2015.zf0vlopv

Bonechi, S., Andreini, P., Bianchini, M., Scarselli, F., 2018. Generating Bounding Box Supervision for

Semantic Segmentation with Deep Learning, in: Pancioni, L., Schwenker, F., Trentin, E. (Eds.),

Artificial Neural Networks in Pattern Recognition. Springer International Publishing, Cham, pp.

190–200. https://doi.org/10.1007/978-3-319-99978-4_15

Can, Y.B., Chaitanya, K., Mustafa, B., Koch, L.M., Konukoglu, E., Baumgartner, C.F., 2018. Learning

to Segment Medical Images with Scribble-Supervision Alone, in: Stoyanov, D., Taylor, Z.,

Carneiro, G., Syeda-Mahmood, T., Martel, A., Maier-Hein, L., Tavares, J.M.R.S., Bradley, A.,

Papa, J.P., Belagiannis, V., Nascimento, J.C., Lu, Z., Conjeti, S., Moradi, M., Greenspan, H.,

Madabhushi, A. (Eds.), Deep Learning in Medical Image Analysis and Multimodal Learning

for Clinical Decision Support, Lecture Notes in Computer Science. Springer International

Publishing, pp. 236–244.

Chen, C., Dou, Q., Chen, H., Qin, J., Heng, P.-A., 2019. Synergistic Image and Feature Adaptation:

Towards Cross-Modality Domain Adaptation for Medical Image Segmentation.

arXiv:1901.08211 [cs].

Chen, H., Qi, X., Yu, L., Dou, Q., Qin, J., Heng, P.-A., 2017. DCAN: Deep contour-aware networks for

object instance segmentation from histology images. Medical Image Analysis 36, 135–146.

https://doi.org/10.1016/j.media.2016.11.004

Chen, H., Qi, X., Yu, L., Heng, P.-A., 2016. DCAN: Deep Contour-Aware Networks for Accurate Gland

Segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern

Recognition. Presented at the Proceedings of the IEEE Conference on Computer Vision and

Pattern Recognition, pp. 2487–2496.

Chen, Y., Li, W., Sakaridis, C., Dai, D., Van Gool, L., 2018. Domain Adaptive Faster R-CNN for Object

Detection in the Wild, in: 2018 IEEE/CVF Conference on Computer Vision and Pattern

Recognition. Presented at the 2018 IEEE/CVF Conference on Computer Vision and Pattern

Recognition (CVPR), IEEE, Salt Lake City, UT, USA, pp. 3339–3348.

https://doi.org/10.1109/CVPR.2018.00352

Chowdhury, A., Biswas, S., Bianco, S., 2017. Active deep learning reduces annotation burden in

automatic cell segmentation (preprint). Bioinformatics. https://doi.org/10.1101/211060

Codella, N.C.F., Gutman, D., Celebi, M.E., Helba, B., Marchetti, M.A., Dusza, S.W., Kalloo, A., Liopyris,

K., Mishra, N., Kittler, H., Halpern, A., 2018. Skin lesion analysis toward melanoma detection:

A challenge at the 2017 International symposium on biomedical imaging (ISBI), hosted by the

international skin imaging collaboration (ISIC), in: 2018 IEEE 15th International Symposium

on Biomedical Imaging (ISBI 2018). Presented at the 2018 IEEE 15th International Symposium

on Biomedical Imaging (ISBI 2018), pp. 168–172. https://doi.org/10.1109/ISBI.2018.8363547

Cubuk, E.D., Zoph, B., Mane, D., Vasudevan, V., Le, Q.V., 2019. AutoAugment: Learning Augmentation

Strategies From Data, in: Proceedings of the IEEE Conference on Computer Vision and Pattern

Recognition. Presented at the Proceedings of the IEEE Conference on Computer Vision and

Pattern Recognition, pp. 113–123.

Damodaram, M.S., Story, L., Eixarch, E., Patkee, P., Patel, A., Kumar, S., Rutherford, M., 2012. Foetal

volumetry using Magnetic Resonance Imaging in intrauterine growth restriction. Early Human

Development, 3rd International Conference on : “Nutrition and Care of the Preterm Infant:

Current Issues” 88, S35–S40. https://doi.org/10.1016/j.earlhumdev.2011.12.026

Deng, J., Dong, W., Socher, R., Li, L.-J., Kai Li, Li Fei-Fei, 2009. ImageNet: A large-scale hierarchical

image database, in: 2009 IEEE Conference on Computer Vision and Pattern Recognition.

Presented at the 2009 IEEE Computer Society Conference on Computer Vision and Pattern

Recognition Workshops (CVPR Workshops), IEEE, Miami, FL, pp. 248–255.

https://doi.org/10.1109/CVPR.2009.5206848

Diba, A., Sharma, V., Pazandeh, A., Pirsiavash, H., Gool, L.V., 2017. Weakly Supervised Cascaded

Convolutional Networks, in: 2017 IEEE Conference on Computer Vision and Pattern

Recognition (CVPR). Presented at the 2017 IEEE Conference on Computer Vision and Pattern

Recognition (CVPR), IEEE, Honolulu, HI, pp. 5131–5139.

https://doi.org/10.1109/CVPR.2017.545

Dong, N., Kampffmeyer, M., Liang, X., Wang, Z., Dai, W., Xing, E., 2018. Unsupervised Domain

Adaptation for Automatic Estimation of Cardiothoracic Ratio, in: Frangi, A.F., Schnabel, J.A.,

Davatzikos, C., Alberola-López, C., Fichtinger, G. (Eds.), Medical Image Computing and

Computer Assisted Intervention – MICCAI 2018, Lecture Notes in Computer Science. Springer

International Publishing, pp. 544–552.

Erhan, D., Bengio, Y., Courville, A., Vincent, P., Box, P.O., 2009. Visualizing Higher-Layer Features of

a Deep Network. University of Montreal 1341, 1.

Esteva, A., Kuprel, B., Novoa, R.A., Ko, J., Swetter, S.M., Blau, H.M., Thrun, S., 2017. Dermatologist-

level classification of skin cancer with deep neural networks. Nature 542, 115–118.

https://doi.org/10.1038/nature21056

Esteva, A., Robicquet, A., Ramsundar, B., Kuleshov, V., DePristo, M., Chou, K., Cui, C., Corrado, G.,

Thrun, S., Dean, J., 2019. A guide to deep learning in healthcare. Nat Med 25, 24–29.

https://doi.org/10.1038/s41591-018-0316-z

Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A., 2010. The Pascal Visual Object

Classes (VOC) Challenge. Int J Comput Vis 88, 303–338. https://doi.org/10.1007/s11263-009-

0275-4

Folmsbee, J., Liu, X., Brandwein-Weber, M., Doyle, S., 2018. Active deep learning: Improved training

efficiency of convolutional neural networks for tissue classification in oral cavity cancer, in:

2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). Presented at the

2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), IEEE,

Washington, DC, pp. 770–773. https://doi.org/10.1109/ISBI.2018.8363686

Fong, R.C., Vedaldi, A., 2017. Interpretable Explanations of Black Boxes by Meaningful Perturbation,

in: 2017 IEEE International Conference on Computer Vision (ICCV). Presented at the 2017

IEEE International Conference on Computer Vision (ICCV), IEEE, Venice, pp. 3449–3457.

https://doi.org/10.1109/ICCV.2017.371

Fu, H., Xu, Y., Lin, S., Wong, D.W.K., Baskaran, M., Mahesh, M., Aung, T., Liu, J., 2019. Angle-Closure

Detection in Anterior Segment OCT Based on Multilevel Deep Network. IEEE Trans. Cybern.

1–9. https://doi.org/10.1109/TCYB.2019.2897162

García Seco de Herrera, A., Schaer, R., Bromuri, S., Müller, H., 2016. Overview of the medical tasks in

ImageCLEF 2016. Proceedings of the 7th International Conference of CLEF Association (CLEF)

2016 219–232.

Gondal, W.M., Kohler, J.M., Grzeszick, R., Fink, G.A., Hirsch, M., 2017. Weakly-supervised localization

of diabetic retinopathy lesions in retinal fundus images, in: 2017 IEEE International Conference

on Image Processing (ICIP). Presented at the 2017 IEEE International Conference on Image

Processing (ICIP), IEEE, Beijing, pp. 2069–2073. https://doi.org/10.1109/ICIP.2017.8296646

Gorriz, M., Carlier, A., Faure, E., Giro-i-Nieto, X., 2017. Cost-Effective Active Learning for Melanoma

Segmentation. arXiv:1711.09168 [cs].

Gulshan, V., Peng, L., Coram, M., Stumpe, M.C., Wu, D., Narayanaswamy, A., Venugopalan, S., Widner,

K., Madams, T., Cuadros, J., Kim, R., Raman, R., Nelson, P.C., Mega, J.L., Webster, D.R., 2016.

Development and Validation of a Deep Learning Algorithm for Detection of Diabetic

Retinopathy in Retinal Fundus Photographs. JAMA 316, 2402.

https://doi.org/10.1001/jama.2016.17216

Gutman, D., Codella, N.C.F., Celebi, E., Helba, B., Marchetti, M., Mishra, N., Halpern, A., 2016. Skin

Lesion Analysis toward Melanoma Detection: A Challenge at the International Symposium on

Biomedical Imaging (ISBI) 2016, hosted by the International Skin Imaging Collaboration

(ISIC). arXiv:1605.01397 [cs].

He, K., Gkioxari, G., Dollar, P., Girshick, R., 2017. Mask R-CNN, in: Proceedings of the IEEE

International Conference on Computer Vision. Presented at the Proceedings of the IEEE

International Conference on Computer Vision, pp. 2961–2969.

He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep Residual Learning for Image Recognition, in: 2016 IEEE

Conference on Computer Vision and Pattern Recognition (CVPR). Presented at the 2016 IEEE

Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Las Vegas, NV, USA,

pp. 770–778. https://doi.org/10.1109/CVPR.2016.90

Hua, G., Long, C., Yang, M., Gao, Y., 2018. Collaborative Active Visual Recognition from Crowds: A

Distributed Ensemble Approach. IEEE Trans. Pattern Anal. Mach. Intell. 40, 582–594.

https://doi.org/10.1109/TPAMI.2017.2682082

Huang, G., Liu, Z., Maaten, L. van der, Weinberger, K.Q., 2017. Densely Connected Convolutional

Networks, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Presented at the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR),

IEEE, Honolulu, HI, pp. 2261–2269. https://doi.org/10.1109/CVPR.2017.243

Huang, J., Gretton, A., Borgwardt, K.M., Schölkopf, B., Smola, A.J., 2007. Correcting Sample Selection

Bias by Unlabeled Data, in: Advances in Neural Information Processing Systems. Presented at

the Advances in neural information processing systems, pp. 1–8.

Huang, S.-J., Chen, J.-L., Mu, X., Zhou, Z.-H., 2017. Cost-Effective Active Learning from Diverse

Labelers, in: Proceedings of the Twenty-Sixth International Joint Conference on Artificial

Intelligence. Presented at the Twenty-Sixth International Joint Conference on Artificial

Intelligence, International Joint Conferences on Artificial Intelligence Organization, Melbourne,

Australia, pp. 1879–1885. https://doi.org/10.24963/ijcai.2017/261

Huang, S.-J., Jin, R., Zhou, Z.-H., 2014. Active Learning by Querying Informative and Representative

Examples. IEEE Trans. Pattern Anal. Mach. Intell. 36, 1936–1949.

https://doi.org/10.1109/TPAMI.2014.2307881

Huang, S.-J., Zhao, J.-W., Liu, Z.-Y., 2018. Cost-Effective Training of Deep CNNs with Active Model

Adaptation, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge

Discovery & Data Mining, KDD ’18. ACM, New York, NY, USA, pp. 1580–1588.

https://doi.org/10.1145/3219819.3220026

Huang, S.-W., Lin, C.-T., Chen, S.-P., Wu, Y.-Y., Hsu, P.-H., Lai, S.-H., 2018. AugGAN: Cross Domain

Adaptation with GAN-Based Data Augmentation, in: Ferrari, V., Hebert, M., Sminchisescu, C.,

Weiss, Y. (Eds.), Computer Vision – ECCV 2018. Springer International Publishing, Cham, pp.

731–744. https://doi.org/10.1007/978-3-030-01240-3_44

Huang, Z., Wang, X., Wang, Jiasi, Liu, W., Wang, Jingdong, 2018. Weakly-Supervised Semantic

Segmentation Network with Deep Seeded Region Growing, in: 2018 IEEE/CVF Conference on

Computer Vision and Pattern Recognition. Presented at the 2018 IEEE/CVF Conference on

Computer Vision and Pattern Recognition (CVPR), IEEE, Salt Lake City, UT, USA, pp. 7014–

7023. https://doi.org/10.1109/CVPR.2018.00733

Huo, Y., Xu, Z., Bao, S., Assad, A., Abramson, R.G., Landman, B.A., 2018. Adversarial synthesis

learning enables segmentation without target modality ground truth, in: 2018 IEEE 15th

International Symposium on Biomedical Imaging (ISBI 2018). Presented at the 2018 IEEE 15th

International Symposium on Biomedical Imaging (ISBI 2018), IEEE, Washington, DC, pp.

1217–1220. https://doi.org/10.1109/ISBI.2018.8363790

Inoue, N., Furuta, R., Yamasaki, T., Aizawa, K., 2018. Cross-Domain Weakly-Supervised Object

Detection Through Progressive Domain Adaptation, in: 2018 IEEE/CVF Conference on

Computer Vision and Pattern Recognition. Presented at the 2018 IEEE/CVF Conference on

Computer Vision and Pattern Recognition (CVPR), IEEE, Salt Lake City, UT, pp. 5001–5009.

https://doi.org/10.1109/CVPR.2018.00525

Isensee, F., Kickingereder, P., Wick, W., Bendszus, M., Maier-Hein, K.H., 2018. Brain Tumor

Segmentation and Radiomics Survival Prediction: Contribution to the BRATS 2017 Challenge,

in: Crimi, A., Bakas, S., Kuijf, H., Menze, B., Reyes, M. (Eds.), Brainlesion: Glioma, Multiple

Sclerosis, Stroke and Traumatic Brain Injuries, Lecture Notes in Computer Science. Springer

International Publishing, pp. 287–297.

Isensee, F., Petersen, J., Kohl, S.A.A., Jäger, P.F., Maier-Hein, K.H., 2019. nnU-Net: Breaking the Spell

on Successful Medical Image Segmentation. arXiv:1904.08128 [cs].

Javanmardi, M., Tasdizen, T., 2018. Domain adaptation for biomedical image segmentation using

adversarial training, in: 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI

2018). Presented at the 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI

2018), IEEE, Washington, DC, pp. 554–558. https://doi.org/10.1109/ISBI.2018.8363637

Jin, B., Segovia, M.V.O., Susstrunk, S., 2017. Webly Supervised Semantic Segmentation, in: 2017 IEEE

Conference on Computer Vision and Pattern Recognition (CVPR). Presented at the 2017 IEEE

Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Honolulu, HI, pp.

1705–1714. https://doi.org/10.1109/CVPR.2017.185

K. Wang, D. Zhang, Y. Li, R. Zhang, L. Lin, 2017. Cost-Effective Active Learning for Deep Image

Classification. IEEE Transactions on Circuits and Systems for Video Technology 27, 2591–

2600. https://doi.org/10.1109/TCSVT.2016.2589879

Kamnitsas, K., Baumgartner, C., Ledig, C., Newcombe, V., Simpson, J., Kane, A., Menon, D., Nori, A.,

Criminisi, A., Rueckert, D., Glocker, B., 2017a. Unsupervised Domain Adaptation in Brain

Lesion Segmentation with Adversarial Networks, in: Niethammer, M., Styner, M., Aylward, S.,

Zhu, H., Oguz, I., Yap, P.-T., Shen, D. (Eds.), Information Processing in Medical Imaging,

Lecture Notes in Computer Science. Springer International Publishing, pp. 597–609.

Kamnitsas, K., Ledig, C., Newcombe, V.F.J., Simpson, J.P., Kane, A.D., Menon, D.K., Rueckert, D.,

Glocker, B., 2017b. Efficient Multi-Scale 3D CNN with Fully Connected CRF for Accurate

Brain Lesion Segmentation. Medical Image Analysis 36, 61–78.

https://doi.org/10.1016/j.media.2016.10.004

Kauppi, T., Kalesnykiene, V., Kamarainen, J.-K., Lensu, L., Sorri, I., Raninen, A., Voutilainen, R.,

Uusitalo, H., Kalviainen, H., Pietila, J., 2007. DIARETDB1 diabetic retinopathy database and

evaluation protocol, in: Medical Image Understanding and Analysis. Presented at the Medical

Image Understanding and Analysis, p. 61.

Ker, J., Wang, L., Rao, J., Lim, T., 2018. Deep Learning Applications in Medical Image Analysis. IEEE

Access 6, 9375–9389. https://doi.org/10.1109/ACCESS.2017.2788044

Kermany, D.S., Goldbaum, M., Cai, W., Valentim, C.C.S., Liang, H., Baxter, S.L., McKeown, A., Yang,

G., Wu, X., Yan, F., Dong, Justin, Prasadha, M.K., Pei, J., Ting, M.Y.L., Zhu, J., Li, C., Hewett,

S., Dong, Jason, Ziyar, I., Shi, A., Zhang, R., Zheng, L., Hou, R., Shi, W., Fu, X., Duan, Y., Huu,

V.A.N., Wen, C., Zhang, E.D., Zhang, C.L., Li, O., Wang, X., Singer, M.A., Sun, X., Xu, J.,

Tafreshi, A., Lewis, M.A., Xia, H., Zhang, K., 2018. Identifying Medical Diagnoses and

Treatable Diseases by Image-Based Deep Learning. Cell 172, 1122-1131.e9.

https://doi.org/10.1016/j.cell.2018.02.010

Kervadec, H., Dolz, J., Tang, M., Granger, E., Boykov, Y., Ben Ayed, I., 2019. Constrained-CNN losses

for weakly supervised segmentation. Medical Image Analysis 54, 88–99.

https://doi.org/10.1016/j.media.2019.02.009

Kozuka, K., Niebles, J.C., 2017. Risky Region Localization with Point Supervision, in: 2017 IEEE

International Conference on Computer Vision Workshops (ICCVW). Presented at the 2017

IEEE International Conference on Computer Vision Workshop (ICCVW), IEEE, Venice, pp.

246–253. https://doi.org/10.1109/ICCVW.2017.38

Kumar, N., Verma, R., Sharma, S., Bhargava, S., Vahadane, A., Sethi, A., 2017. A Dataset and a

Technique for Generalized Nuclear Segmentation for Computational Pathology. IEEE

Transactions on Medical Imaging 36, 1550–1560. https://doi.org/10.1109/TMI.2017.2677499

Lai, Z., Deng, H., 2018. Medical Image Classification Based on Deep Features Extracted by Deep Model

and Statistic Feature Fusion with Multilayer Perceptron . Computational Intelligence and

Neuroscience 2018, 1–13. https://doi.org/10.1155/2018/2061516

Laves, M.-H., Bicker, J., Kahrs, L.A., Ortmaier, T., 2019. A dataset of laryngeal endoscopic images with

comparative study on convolution neural network-based semantic segmentation. Int J CARS 14,

483–492. https://doi.org/10.1007/s11548-018-01910-0

LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521, 436–444.

https://doi.org/10.1038/nature14539

Li, Q., Arnab, A., Torr, P.H.S., 2018. Weakly- and Semi-supervised Panoptic Segmentation, in: Ferrari,

V., Hebert, M., Sminchisescu, C., Weiss, Y. (Eds.), Computer Vision – ECCV 2018. Springer

International Publishing, Cham, pp. 106–124. https://doi.org/10.1007/978-3-030-01267-0_7

Liao, H., Luo, J., 2017. A Deep Multi-Task Learning Approach to Skin Lesion Classification, in:

Workshops at the Thirty-First AAAI Conference on Artificial Intelligence. Presented at the

Workshops at the Thirty-First AAAI Conference on Artificial Intelligence, p. 7.

Lin, D., Dai, J., Jia, J., He, K., Sun, J., 2016. ScribbleSup: Scribble-Supervised Convolutional Networks

for Semantic Segmentation, in: 2016 IEEE Conference on Computer Vision and Pattern

Recognition (CVPR). Presented at the 2016 IEEE Conference on Computer Vision and Pattern

Recognition (CVPR), IEEE, Las Vegas, NV, USA, pp. 3159–3167.

https://doi.org/10.1109/CVPR.2016.344

Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L., 2014.

Microsoft COCO: Common Objects in Context, in: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars,

T. (Eds.), Computer Vision – ECCV 2014, Lecture Notes in Computer Science. Springer

International Publishing, pp. 740–755.

Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., van der Laak, J.A.W.M.,

van Ginneken, B., Sánchez, C.I., 2017. A survey on deep learning in medical image analysis.

Medical Image Analysis 42, 60–88. https://doi.org/10.1016/j.media.2017.07.005

Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C., 2016. SSD: Single Shot

MultiBox Detector. arXiv:1512.02325 [cs] 9905, 21–37. https://doi.org/10.1007/978-3-319-

46448-0_2

Lixin Duan, Tsang, I.W., Dong Xu, 2012. Domain Transfer Multiple Kernel Learning. IEEE Trans.

Pattern Anal. Mach. Intell. 34, 465–479. https://doi.org/10.1109/TPAMI.2011.114

Long, M., Cao, Y., Wang, J., Jordan, M.I., 2015. Learning Transferable Features with Deep Adaptation

Networks, in: Proceedings of International Conference on Machine Learning. Presented at the

Proceedings of International Conference on Machine Learning, Lille, France, pp. 97–105.

Mahendran, A., Vedaldi, A., 2016. Salient Deconvolutional Networks, in: Leibe, B., Matas, J., Sebe, N.,

Welling, M. (Eds.), Computer Vision – ECCV 2016. Springer International Publishing, Cham,

pp. 120–135. https://doi.org/10.1007/978-3-319-46466-4_8

Maier, A., Syben, C., Lasser, T., Riess, C., 2019. A gentle introduction to deep learning in medical image

processing. Zeitschrift für Medizinische Physik 29, 86–101.

https://doi.org/10.1016/j.zemedi.2018.12.003

Makropoulos, A., Robinson, E.C., Schuh, A., Wright, R., Fitzgibbon, S., Bozek, J., Counsell, S.J.,

Steinweg, J., Vecchiato, K., Passerat-Palmbach, J., Lenz, G., Mortari, F., Tenev, T., Duff, E.P.,

Bastiani, M., Cordero-Grande, L., Hughes, E., Tusor, N., Tournier, J.-D., Hutter, J., Price, A.N.,

Teixeira, R.P.A.G., Murgasova, M., Victor, S., Kelly, C., Rutherford, M.A., Smith, S.M.,

Edwards, A.D., Hajnal, J.V., Jenkinson, M., Rueckert, D., 2018. The developing human

connectome project: A minimal processing pipeline for neonatal cortical surface reconstruction.

NeuroImage 173, 88–112. https://doi.org/10.1016/j.neuroimage.2018.01.054

Moen, E., Bannon, D., Kudo, T., Graf, W., Covert, M., Van Valen, D., 2019. Deep learning for cellular

image analysis. Nat Methods. https://doi.org/10.1038/s41592-019-0403-1

Moeskops, P., Veta, M., Lafarge, M.W., Eppenhof, K.A.J., Pluim, J.P.W., 2017. Adversarial Training and

Dilated Convolutions for Brain MRI Segmentation, in: Cardoso, M.J., Arbel, T., Carneiro, G.,

Syeda-Mahmood, T., Tavares, J.M.R.S., Moradi, M., Bradley, A., Greenspan, H., Papa, J.P.,

Madabhushi, A., Nascimento, J.C., Cardoso, J.S., Belagiannis, V., Lu, Z. (Eds.), Deep Learning

in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Lecture

Notes in Computer Science. Springer International Publishing, pp. 56–64.

Moeskops, P., Wolterink, J.M., van der Velden, B.H.M., Gilhuijs, K.G.A., Leiner, T., Viergever, M.A.,

Išgum, I., 2016. Deep Learning for Multi-Task Medical Image Segmentation in Multiple

Modalities. arXiv:1704.03379 [cs] 9901, 478–486. https://doi.org/10.1007/978-3-319-46723-

8_55

Navarro, F., Conjeti, S., Tombari, F., Navab, N., 2018. Webly Supervised Learning for Skin Lesion

Classification, in: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger,

G. (Eds.), Medical Image Computing and Computer Assisted Intervention – MICCAI 2018,

Lecture Notes in Computer Science. Springer International Publishing, pp. 398–406.

Nguyen, A., Clune, J., Bengio, Y., Dosovitskiy, A., Yosinski, J., 2017. Plug & Play Generative Networks:

Conditional Iterative Generation of Images in Latent Space, in: 2017 IEEE Conference on

Computer Vision and Pattern Recognition (CVPR). Presented at the 2017 IEEE Conference on

Computer Vision and Pattern Recognition (CVPR), IEEE, Honolulu, HI, pp. 3510–3520.

https://doi.org/10.1109/CVPR.2017.374

Oktay, O., Schlemper, J., Folgoc, L.L., Lee, M., Heinrich, M., Misawa, K., Mori, K., McDonagh, S.,

Hammerla, N.Y., Kainz, B., Glocker, B., Rueckert, D., 2018. Attention U-Net: Learning Where

to Look for the Pancreas, in: International Conference on Medical Imaging with Deep Learning.

Presented at the MIDL 2018 Conference Submission, p. 10.

Ozdemir, F., Peng, Z., Tanner, C., Fuernstahl, P., Goksel, O., 2018. Active Learning for Segmentation by

Optimizing Content Information for Maximal Entropy. arXiv:1807.06962 [cs, stat] 11045, 183–

191. https://doi.org/10.1007/978-3-030-00889-5_21

Pan, Y., Liu, M., Lian, C., Zhou, T., Xia, Y., Shen, D., 2018. Synthesizing Missing PET from MRI with

Cycle-consistent Generative Adversarial Networks for Alzheimer’s Disease Diagnosis, in:

Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (Eds.), Medical

Image Computing and Computer Assisted Intervention – MICCAI 2018. Springer International

Publishing, Cham, pp. 455–463. https://doi.org/10.1007/978-3-030-00931-1_52

Perez, L., Wang, J., 2017. The Effectiveness of Data Augmentation in Image Classification using Deep

Learning. arXiv:1712.04621 [cs].

Petsiuk, V., Das, A., Saenko, K., 2018. RISE: Randomized Input Sampling for Explanation of Black-box

Models. arXiv:1806.07421 13.

Poplin, R., Varadarajan, A.V., Blumer, K., Liu, Y., McConnell, M.V., Corrado, G.S., Peng, L., Webster,

D.R., 2018. Predicting Cardiovascular Risk Factors from Retinal Fundus Photographs using

Deep Learning. Nat Biomed Eng 2, 158–164. https://doi.org/10.1038/s41551-018-0195-0

Qu, H., Wu, P., Huang, Q., Yi, J., Riedlinger, G.M., De, S., Metaxas, D.N., 2019. Weakly Supervised

Deep Nuclei Segmentation using Points Annotation in Histopathology Images, in: International

Conference on Medical Imaging with Deep Learning. Presented at the International Conference

on Medical Imaging with Deep Learning, pp. 390–400.

Quellec, G., Charrière, K., Boudi, Y., Cochener, B., Lamard, M., 2017. Deep image mining for diabetic

retinopathy screening. Medical Image Analysis 39, 178–193.

https://doi.org/10.1016/j.media.2017.04.012

Rajchl, M., Lee, M.C.H., Oktay, O., Kamnitsas, K., Passerat-Palmbach, J., Bai, W., Damodaram, M.,

Rutherford, M.A., Hajnal, J.V., Kainz, B., Rueckert, D., 2017. DeepCut: Object Segmentation

From Bounding Box Annotations Using Convolutional Neural Networks. IEEE Trans. Med.

Imaging 36, 674–683. https://doi.org/10.1109/TMI.2016.2621185

Rajpurkar, P., Irvin, J., Bagul, A., Ding, D., Duan, T., Mehta, H., Yang, B., Zhu, K., Laird, D., Ball, R.L.,

Langlotz, C., Shpanskaya, K., Lungren, M.P., Ng, A.Y., 2017. MURA: Large Dataset for

Abnormality Detection in Musculoskeletal Radiographs. arXiv:1712.06957 [physics].

Rajpurkar, P., Irvin, J., Ball, R.L., Zhu, K., Yang, B., Mehta, H., Duan, T., Ding, D., Bagul, A., Langlotz,

C.P., Patel, B.N., Yeom, K.W., Shpanskaya, K., Blankenberg, F.G., Seekins, J., Amrhein, T.J.,

Mong, D.A., Halabi, S.S., Zucker, E.J., Ng, A.Y., Lungren, M.P., 2018. Deep learning for chest

radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing

radiologists. PLoS Med 15, e1002686. https://doi.org/10.1371/journal.pmed.1002686

Razzak, M.I., Naz, S., Zaib, A., 2018. Deep Learning for Medical Image Processing: Overview,

Challenges and the Future, in: Dey, N., Ashour, A.S., Borra, S. (Eds.), Classification in BioApps:

Automation of Decision Making, Lecture Notes in Computational Vision and Biomechanics.

Springer International Publishing, Cham, pp. 323–350. https://doi.org/10.1007/978-3-319-

65981-7_12

Redmon, J., Divvala, S., Girshick, R., Farhadi, A., 2016. You Only Look Once: Unified, Real-Time

Object Detection, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition

(CVPR). Presented at the 2016 IEEE Conference on Computer Vision and Pattern Recognition

(CVPR), IEEE, Las Vegas, NV, USA, pp. 779–788. https://doi.org/10.1109/CVPR.2016.91

Ren, S., He, K., Girshick, R., Sun, J., 2017. Faster R-CNN: Towards Real-Time Object Detection with

Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1137–1149.

https://doi.org/10.1109/TPAMI.2016.2577031

Ribeiro, M.T., Singh, S., Guestrin, C., 2016. “Why Should I Trust You?”: Explaining the Predictions of

Any Classifier, in: Proceedings of the 22nd ACM SIGKDD International Conference on

Knowledge Discovery and Data Mining - KDD ’16. Presented at the the 22nd ACM SIGKDD

International Conference, ACM Press, San Francisco, California, USA, pp. 1135–1144.

https://doi.org/10.1145/2939672.2939778

Ronneberger, O., Fischer, P., Brox, T., 2015. U-Net: Convolutional Networks for Biomedical Image

Segmentation. arXiv:1505.04597 [cs].

Roth, H., Farag, A., Turkbey, E.B., Lu, L., Liu, J., Summers, R.M., 2016. Data From Pancreas-CT.

https://doi.org/10.7937/k9/tcia.2016.tnb1kqbu

Roth, H.R., Oda, H., Hayashi, Y., Oda, M., Shimizu, N., Fujiwara, M., Misawa, K., Mori, K., 2017.

Hierarchical 3D fully convolutional networks for multi-organ segmentation. arXiv:1704.06382

[cs].

Samala, R.K., Chan, H.-P., Hadjiiski, L.M., Helvie, M.A., Cha, K., Richter, C., 2017. Multi-task transfer

learning deep convolutional neural network: application to computer-aided diagnosis of breast

cancer on mammograms. Phys. Med. Biol. https://doi.org/10.1088/1361-6560/aa93d4

Schlemper, J., Oktay, O., Chen, L., Matthew, J., Knight, C., Kainz, B., Glocker, B., Rueckert, D., 2018.

Attention-Gated Networks for Improving Ultrasound Scan Plane Detection. arXiv:1804.05338

[cs].

Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D., 2017. Grad-CAM: Visual

Explanations from Deep Networks via Gradient-Based Localization, in: 2017 IEEE

International Conference on Computer Vision (ICCV). Presented at the 2017 IEEE International

Conference on Computer Vision (ICCV), IEEE, Venice, pp. 618–626.

https://doi.org/10.1109/ICCV.2017.74

Sener, O., Savarese, S., 2017. Active Learning for Convolutional Neural Networks: A Core-Set Approach.

arXiv:1708.00489 [cs, stat].

Shu, J., Xu, Z., Meng, D., 2018. Small Sample Learning in Big Data Era. arXiv:1808.04572 [cs, stat].

Simonyan, K., Vedaldi, A., Zisserman, A., 2013. Deep Inside Convolutional Networks: Visualising Image

Classification Models and Saliency Maps. arXiv:1312.6034 [cs].

Sirinukunwattana, K., Pluim, J.P.W., Chen, H., Qi, X., Heng, P.-A., Guo, Y.B., Wang, L.Y., Matuszewski,

B.J., Bruni, E., Sanchez, U., Böhm, A., Ronneberger, O., Cheikh, B.B., Racoceanu, D., Kainz,

P., Pfeiffer, M., Urschler, M., Snead, D.R.J., Rajpoot, N.M., 2017. Gland segmentation in colon

histology images: The glas challenge contest. Medical Image Analysis 35, 489–502.

https://doi.org/10.1016/j.media.2016.08.008

Smailagic, A., Costa, P., Young Noh, H., Walawalkar, D., Khandelwal, K., Galdran, A., Mirshekari, M.,

Fagert, J., Xu, S., Zhang, P., Campilho, A., 2018. MedAL: Accurate and Robust Deep Active

Learning for Medical Image Analysis, in: 2018 17th IEEE International Conference on Machine

Learning and Applications (ICMLA). Presented at the 2018 17th IEEE International Conference

on Machine Learning and Applications (ICMLA), IEEE, Orlando, FL, pp. 481–488.

https://doi.org/10.1109/ICMLA.2018.00078

Sourati, J., Gholipour, A., Dy, J.G., Kurugol, S., Warfield, S.K., 2018. Active Deep Learning with Fisher

Information for Patch-Wise Semantic Segmentation, in: Stoyanov, D., Taylor, Z., Carneiro, G.,

Syeda-Mahmood, T., Martel, A., Maier-Hein, L., Tavares, J.M.R.S., Bradley, A., Papa, J.P.,

Belagiannis, V., Nascimento, J.C., Lu, Z., Conjeti, S., Moradi, M., Greenspan, H., Madabhushi,

A. (Eds.), Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical

Decision Support. Springer International Publishing, Cham, pp. 83–91.

https://doi.org/10.1007/978-3-030-00889-5_10

Tajbakhsh, N., Shin, J.Y., Gurudu, S.R., Hurst, R.T., Kendall, C.B., Gotway, M.B., Liang, J., 2016.

Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning?

IEEE Trans. Med. Imaging 35, 1299–1312. https://doi.org/10.1109/TMI.2016.2535302

Tang, P., Wang, X., Bai, X., Liu, W., 2017. Multiple Instance Detection Network with Online Instance

Classifier Refinement, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition

(CVPR). Presented at the 2017 IEEE Conference on Computer Vision and Pattern Recognition

(CVPR), IEEE, Honolulu, HI, pp. 3059–3067. https://doi.org/10.1109/CVPR.2017.326

Tang, P., Wang, X., Wang, A., Yan, Y., Liu, W., Huang, J., Yuille, A., 2018. Weakly Supervised Region

Proposal Network and Object Detection, in: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y.

(Eds.), Computer Vision – ECCV 2018. Springer International Publishing, Cham, pp. 370–386.

https://doi.org/10.1007/978-3-030-01252-6_22

Van Valen, D.A., Kudo, T., Lane, K.M., Macklin, D.N., Quach, N.T., DeFelice, M.M., Maayan, I.,

Tanouchi, Y., Ashley, E.A., Covert, M.W., 2016. Deep Learning Automates the Quantitative

Analysis of Individual Cells in Live-Cell Imaging Experiments. PLOS Computational Biology

12, e1005177. https://doi.org/10.1371/journal.pcbi.1005177

Wan, F., Wei, P., Jiao, J., Han, Z., Ye, Q., 2018. Min-Entropy Latent Model for Weakly Supervised Object

Detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Presented at the Proceedings of the IEEE Conference on Computer Vision and Pattern

Recognition, pp. 1297–1306.

Wang, H., Cruz-Roa, A., Basavanhally, A., Gilmore, H., Shih, N., Feldman, M., Tomaszewski, J.,

Gonzalez, F., Madabhushi, A., 2014. Mitosis detection in breast cancer pathology images by

combining handcrafted and convolutional neural network features. J. Med. Imag 1, 034003.

https://doi.org/10.1117/1.JMI.1.3.034003

Wang, H., Xia, Y., 2018. ChestNet: A Deep Neural Network for Classification of Thoracic Diseases on

Chest Radiography. arXiv:1807.03058.

Wang, Z., Du, B., Zhang, Lefei, Zhang, Liangpei, 2016. A batch-mode active learning framework by

querying discriminative and representative samples for hyperspectral image classification.

Neurocomputing 179, 88–100. https://doi.org/10.1016/j.neucom.2015.11.062

Xie, Y., Xia, Y., Zhang, J., Song, Y., Feng, D., Fulham, M., Cai, W., 2019. Knowledge-based

Collaborative Deep Learning for Benign-Malignant Lung Nodule Classification on Chest CT.

IEEE Trans. Med. Imaging 38, 991–1004. https://doi.org/10.1109/TMI.2018.2876510

Xie, Y., Zhang, J., Xia, Y., Fulham, M., Zhang, Y., 2018. Fusing texture, shape and deep model-learned

information at decision level for automated classification of lung nodules on chest CT.

Information Fusion 42, 102–110. https://doi.org/10.1016/j.inffus.2017.10.005

Xu, Y., Fang, X., Wu, J., Li, X., Zhang, D., 2016. Discriminative Transfer Subspace Learning via Low-

Rank and Sparse Representation. IEEE Trans. on Image Process. 25, 850–863.

https://doi.org/10.1109/TIP.2015.2510498

Xu, Y., Zhang, H., Miller, K., Singh, A., Dubrawski, A., 2017. Noise-Tolerant Interactive Learning Using

Pairwise Comparisons, in: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R.,

Vishwanathan, S., Garnett, R. (Eds.), Advances in Neural Information Processing Systems 30.

Curran Associates, Inc., pp. 2431–2440.

Yan, Y.-F., Huang, S.-J., 2018. Cost-Effective Active Learning for Hierarchical Multi-Label

Classification, in: Proceedings of the Twenty-Seventh International Joint Conference on

Artificial Intelligence. Presented at the Twenty-Seventh International Joint Conference on

Artificial Intelligence {IJCAI-18}, International Joint Conferences on Artificial Intelligence

Organization, Stockholm, Sweden, pp. 2962–2968. https://doi.org/10.24963/ijcai.2018/411

Yan, Z., Zhan, Y., Peng, Z., Liao, S., Shinagawa, Y., Metaxas, D.N., Zhou, X.S., 2015. Bodypart

Recognition Using Multi-stage Deep Learning, in: Ourselin, S., Alexander, D.C., Westin, C.-F.,

Cardoso, M.J. (Eds.), Information Processing in Medical Imaging. Springer International

Publishing, Cham, pp. 449–461. https://doi.org/10.1007/978-3-319-19992-4_35

Yang, L., Zhang, Y., Chen, J., Zhang, S., Chen, D.Z., 2017. Suggestive Annotation: A Deep Active

Learning Framework for Biomedical Image Segmentation, in: Descoteaux, M., Maier-Hein, L.,

Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (Eds.), Medical Image Computing and

Computer Assisted Intervention − MICCAI 2017, Lecture Notes in Computer Science. Springer

International Publishing, pp. 399–407.

Yang, L., Zhang, Y., Zhao, Z., Zheng, H., Liang, P., Ying, M.T.C., Ahuja, A.T., Chen, D.Z., 2018. BoxNet:

Deep Learning Based Biomedical Image Segmentation Using Boxes Only Annotation.

arXiv:1806.00593 [cs].

Yi, X., Walia, E., Babyn, P., 2018. Generative Adversarial Network in Medical Imaging: A Review.

arXiv:1809.07294 [cs].

Yosinski, J., Clune, J., Nguyen, A., Fuchs, T., Lipson, H., 2015. Understanding Neural Networks Through

Deep Visualization. arXiv:1506.06579 12.

Zeiler, M.D., Fergus, R., 2014. Visualizing and Understanding Convolutional Networks, in: Fleet, D.,

Pajdla, T., Schiele, B., Tuytelaars, T. (Eds.), Computer Vision – ECCV 2014, Lecture Notes in

Computer Science. Springer International Publishing, pp. 818–833.

Zhang, B., Zhao, Q., Feng, W., Lyu, S., 2018. AlphaMEX: A smarter global pooling method for

convolutional neural networks. Neurocomputing 321, 36–48.

https://doi.org/10.1016/j.neucom.2018.07.079

Zhang, Jianming, Bargal, S.A., Lin, Z., Brandt, J., Shen, X., Sclaroff, S., 2018. Top-Down Neural

Attention by Excitation Backprop. Int J Comput Vis 126, 1084–1102.

https://doi.org/10.1007/s11263-017-1059-x

Zhang, Jianpeng, Xia, Y., Xie, Y., Fulham, M., Feng, D.D., 2018. Classification of Medical Images in the

Biomedical Literature by Jointly Using Deep and Handcrafted Visual Features. IEEE J. Biomed.

Health Inform. 22, 1521–1530. https://doi.org/10.1109/JBHI.2017.2775662

Zhang, J., Xie, Y., Xia, Y., Shen, C., 2019. Attention Residual Learning for Skin Lesion Classification.

IEEE Trans. Med. Imaging 1–1. https://doi.org/10.1109/TMI.2019.2893944

Zhang, L., 2019. Transfer Adaptation Learning: A Decade Survey. arXiv:1903.04687 [cs].

Zhang, Yang, David, P., Gong, B., 2017. Curriculum Domain Adaptation for Semantic Segmentation of

Urban Scenes, in: 2017 IEEE International Conference on Computer Vision (ICCV). Presented

at the 2017 IEEE International Conference on Computer Vision (ICCV), IEEE, Venice, pp.

2039–2049. https://doi.org/10.1109/ICCV.2017.223

Zhang, Yizhe, Yang, L., Chen, J., Fredericksen, M., Hughes, D.P., Chen, D.Z., 2017. Deep Adversarial

Networks for Biomedical Image Segmentation Utilizing Unannotated Images, in: Descoteaux,

M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (Eds.), Medical Image

Computing and Computer Assisted Intervention − MICCAI 2017. Springer International

Publishing, pp. 408–416.

Zhang, Y., Yang, L., MacKenzie, J.D., Ramachandran, R., Chen, D.Z., 2016a. A seeding-searching-

ensemble method for gland segmentation in H&E-stained images. BMC Medical Informatics

and Decision Making 16, 80. https://doi.org/10.1186/s12911-016-0312-5

Zhang, Y., Ying, M.T.C., Yang, L., Ahuja, A.T., Chen, D.Z., 2016b. Coarse-to-Fine Stacked Fully

Convolutional Nets for lymph node segmentation in ultrasound images, in: 2016 IEEE

International Conference on Bioinformatics and Biomedicine (BIBM). Presented at the 2016

IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 443–448.

https://doi.org/10.1109/BIBM.2016.7822557

Zhao, A., Balakrishnan, G., Durand, F., Guttag, J.V., Dalca, A.V., 2019. Data Augmentation Using

Learned Transformations for One-Shot Medical Image Segmentation, in: Proceedings of the

IEEE Conference on Computer Vision and Pattern Recognition. Presented at the Proceedings

of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8543–8553.

Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A., 2016. Learning Deep Features for

Discriminative Localization, in: 2016 IEEE Conference on Computer Vision and Pattern

Recognition (CVPR). Presented at the 2016 IEEE Conference on Computer Vision and Pattern

Recognition (CVPR), IEEE, Las Vegas, NV, USA, pp. 2921–2929.

https://doi.org/10.1109/CVPR.2016.319

Zhou, Z., Shin, J., Feng, R., Hurst, R.T., Kendall, C.B., Liang, J., 2019. Integrating Active Learning and

Transfer Learning for Carotid Intima-Media Thickness Video Interpretation. J Digit Imaging 32,

290–299. https://doi.org/10.1007/s10278-018-0143-2

Zhou, Z., Shin, J., Zhang, L., Gurudu, S., Gotway, M., Liang, J., 2017. Fine-Tuning Convolutional Neural

Networks for Biomedical Image Analysis: Actively and Incrementally, in: 2017 IEEE

Conference on Computer Vision and Pattern Recognition (CVPR). Presented at the 2017 IEEE

Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, Honolulu, HI, pp.

4761–4772. https://doi.org/10.1109/CVPR.2017.506

Zhou, Z.-H., 2018. A brief introduction to weakly supervised learning. National Science Review 5, 44–

53. https://doi.org/10.1093/nsr/nwx106