
FINAL THESIS

Bachelor's degree in Biomedical Engineering

COMPARISON BETWEEN MACHINE LEARNING AND DEEP

LEARNING FOR THE CLASSIFICATION OF MAMMOGRAMS

IN BI-RADS

Report and Annexes

Author: Ignacio Moragues Rodríguez

Director: Christian Mata Miquel

Co-Director: Raul Benítez Iglesias

Call: June 2021


Resum

As statistics show, breast cancer is a serious health problem that entails a considerable economic burden when it comes to its treatment, which undoubtedly justifies the need to screen for this disease. However, at present, the way the diagnosis is made in clinical practice is prone to errors. Hence, the need arises to find a tool that helps professionals classify mammograms into the four BI-RADS categories.

This project presents two approaches: one based on machine learning and one based on deep learning. Mainly, beyond the comparison of the results, the aim is to analyze and break down in depth the process followed to achieve their respective development and subsequent implementation. Thus, the difficulties and drawbacks encountered are shown while the two models are evaluated and compared. For this purpose, three mammography databases that experts have already classified following the BI-RADS guidelines are used.

For the machine learning model, algorithms that extract texture features from the mammograms are developed and used. The dense area of the breast is segmented using the obtained texture information and Fuzzy C-means (an unsupervised soft clustering technique). Next, the segmented dense areas of the breast are classified, using the previously obtained and selected features, with the help of the k-nearest neighbors (k-NN) algorithm. This study also specifies the development strategies around the possibilities that were not fully implemented, explaining the reasons that determined their (partial) exclusion. In contrast, for the deep learning model, since the mammography database was insufficient for the adequate training of the model, data augmentation techniques are used. Different convolutional neural network (CNN) architectures are thus evaluated and trained.

Finally, the results obtained are presented and an exhaustive discussion of the results is proposed, demonstrating that the machine learning model requires great effort and expertise to obtain acceptable results, whereas the deep learning model shows much higher accuracy and, owing to its ease of implementation, can be considered a key tool for future work or research in this field.


Resumen

As statistics show, breast cancer is a serious health problem that entails a considerable economic burden when it comes to its treatment, which undoubtedly justifies the need to screen for this disease. However, at present, the way the diagnosis is made in clinical practice is prone to errors. Hence, the need arises to find a tool that helps professionals classify mammograms into the four BI-RADS categories.

This project presents two approaches: one based on machine learning and the other based on deep learning. Mainly, beyond the comparison of the results, the aim is to analyze and break down in depth the process followed to achieve their respective development and subsequent implementation. Thus, the difficulties and drawbacks encountered when evaluating and comparing the two models are shown. For this purpose, three mammography databases that experts have already classified following the BI-RADS guidelines are used.

In the case of the machine learning model, algorithms that extract texture features from the mammograms are developed and used. The dense area of the breast is segmented using the obtained texture information and Fuzzy C-means (an unsupervised soft clustering technique). Next, the segmented dense areas of the breast are classified, using the previously obtained and selected features, with the help of the k-nearest neighbors (k-NN) algorithm. This study also specifies the development strategies around the possibilities that were not fully implemented, explaining the reasons that determined their (partial) exclusion. In contrast, in the deep learning model, since the mammography database was insufficient for the adequate training of the model, data augmentation techniques are used. Different convolutional neural network (CNN) architectures are thus evaluated and trained.

Finally, the results obtained are presented and an exhaustive discussion of the results is put forward, demonstrating that the machine learning model requires great effort and experience to obtain acceptable results, whereas the deep learning model shows much higher accuracy and, due to its easy implementation, can be considered a key tool for future work or research in this field.


Abstract

Epidemiological statistics portray the fact that breast cancer is a significant health concern and

economic burden, undoubtedly justifying the need for breast cancer screening. Nevertheless, how the

current diagnosis is made in clinical practice is prone to errors. Hence, there is a necessity for a tool to

assist physicians when classifying mammographies into the four categories of BI-RADS.

In this project, two approaches are presented: one based on machine learning and the other one based

on deep learning. Mainly, beyond the comparison of the results, what is intended is to analyze and

discuss in-depth the process followed to achieve their respective developments and subsequent

implementation. Thus, the difficulties and drawbacks found when evaluating and comparing the two

models are shown. Consequently, three mammography databases are used that experts have already

classified following the BI-RADS guidelines.

In the case of the machine learning model, algorithms that extract texture features from mammograms

are developed and used. The dense area of the breast is segmented, with the information obtained

from texture, using Fuzzy C-means (an unsupervised soft clustering technique). Subsequently, a feature

selection process was carried out. The classification of the dense areas was performed using a k-

nearest neighbors algorithm (k-NN). The development strategy around other possibilities that were

not fully implemented is also explained, with reference to the motives behind these decisions. On the

other hand, in the deep learning model, the mammogram database was insufficient for the adequate

training of the model. Hence, data augmentation techniques are used. Different convolutional neural

network (CNN) architectures were assessed and trained.

Finally, the results obtained are presented and an exhaustive discussion is performed, demonstrating

that the machine learning model requires great effort and experience to obtain acceptable results. In

contrast, the deep learning model shows a much higher accuracy and can be considered as key for

future work or research in this area.


Acknowledgments I would first like to thank my supervisor, Christian Mata Miquel, for his guidance and advice. Your

insightful feedback pushed me to sharpen my thinking. Thank you for allowing me to work on this

project.

I would like to acknowledge the support given in the deep learning implementation by the Grupo de

Investigación de Modelos de Aprendizaje Computacional from the Tecnológico de Monterrey.

In addition, I would like to thank my family and workmates for always being there for me even though

they don't understand anything. I am also grateful for the support of my roommates. Finally, I want to

thank my life partner for his daily love and support in accompanying me on this journey.


Glossary

CNN: Convolutional neural network.

DL: Deep learning.

FC: Fully connected neural network.

FCM: Fuzzy C-Means.

GLCM: Grey level co-occurrence matrix.

k-NN: k-nearest neighbors algorithm, also known as KNN.

LAWS: Texture energy measures based on masks.

LBP: Local binary patterns.

ML: Machine learning.

ReLU: Rectified linear unit.

ROI: Region of interest.

SGD: Stochastic gradient descent.

SVM: Support vector machines.


Table of contents

RESUM ______________________________________________________________ I

RESUMEN __________________________________________________________ II

ABSTRACT __________________________________________________________ III

ACKNOWLEDGMENTS ________________________________________________ IV

GLOSSARY __________________________________________________________ V

1. INTRODUCTION _________________________________________________ 7

1.1. Cancer today ............................................................................................................ 7

1.2. Breast cancer ........................................................................................................... 7

1.3. The mammography ................................................................................................. 8

1.4. Breast Imaging Reporting and Data System (BI-RADS®) ....................................... 10

1.5. Origin of this project and motivation .................................................................... 11

1.6. Objectives .............................................................................................................. 11

2. STATE OF THE ART ______________________________________________ 13

3. PROJECT FRAMEWORK __________________________________________ 17

3.1. Texture ................................................................................................................... 17

3.1.1. Grey level co-occurrence matrix (GLCM) ............................................................. 18

3.1.2. Law's masks (LAWS) .............................................................................................. 21

3.1.3. Local binary patterns (LBP) ................................................................................... 28

3.2. Machine learning approach ................................................................................... 30

3.3. Deep learning approach ........................................................................................ 33

4. METHODOLOGY AND IMPLEMENTATION ___________________________ 39

4.1. Materials and preprocessing ................................................................................. 39

4.2. Machine learning implementation ........................................................................ 41

4.2.1. Extraction of GLCM features ................................................................................. 41

4.2.2. Extraction of LAWS features ................................................................................. 46

4.2.3. Extraction of LBP features ..................................................................................... 51

4.2.4. Creating the feature dataset ................................................................................. 53

4.2.5. Dense tissue segmentation ................................................................................... 56

4.2.6. Classification.......................................................................................................... 59

4.3. Deep learning implementation ............................................................................. 64


4.3.1. VGG-16 .................................................................................................................. 64

4.3.2. Data augmentation............................................................................................... 66

4.3.3. Training and Learning curves ............................................................................... 67

4.3.4. Interpreting the model performance ................................................................... 69

5. DISCUSSION ___________________________________________________ 71

6. ENVIRONMENTAL IMPACT _______________________________________ 75

CONCLUSIONS ______________________________________________________ 77

BUDGET ___________________________________________________________ 79

Personnel cost .................................................................................................................. 79

Materials cost ................................................................................................................... 80

BIBLIOGRAPHY _____________________________________________________ 81

ANNEX A __________________________________________________________ 89

ANNEX B __________________________________________________________ 91


List of Figures

Figure 1.1. Human breast anatomy [6]. ______________________________________________ 8

Figure 1.2. Representation of a mammography [3]. ____________________________________ 9

Figure 1.3. Tasks planning. _______________________________________________________ 12

Figure 3.1. Mathematical representation of a digital image. _____________________________ 17

Figure 3.2. Example of the obtention of the grey co-occurrence matrix. ___________________ 19

Figure 3.3. Possible combination for the outer product. ________________________________ 21

Figure 3.4. Sample picture of the EEBE's Building C. ___________________________________ 24

Figure 3.5. Representation of a convolution [45]. _____________________________________ 25

Figure 3.6. $I_{E5L5}$ and $I_{L5E5}$ with gray colormap. ____________________________________ 25

Figure 3.7. $I_{E5L5}$ and $I_{L5E5}$ with multicolor colormap. _______________________________ 26

Figure 3.8. Average of $I_{E5L5}$ and $I_{L5E5}$. ___________________________________________ 26

Figure 3.9. Local variance and mean from the image in Figure 3.8. _______________________ 27

Figure 3.10. Local absolute mean extracted from the image in Figure 3.8. __________________ 28

Figure 3.11. An example of how does LBP works. _____________________________________ 29

Figure 3.12. LBP examples using different radius and number of neighbors [50]. ____________ 29

Figure 3.13. LBP using a radius of 1 and 8 neighbors. __________________________________ 30

Figure 3.14. A visual example of unsupervised (left) and supervised (right) machine learning. __ 31

Figure 3.15. Predicting a new point. _______________________________________________ 31


Figure 3.16. Example of infinite clusterization. _______________________________________ 32

Figure 3.17. Machine learning vs. deep learning. _____________________________________ 33

Figure 3.18. Diagram of a neuron. ________________________________________________ 34

Figure 3.19. Deep neural network [55]. ____________________________________________ 34

Figure 3.20. The rectifier function. ________________________________________________ 35

Figure 3.21. A rectified linear unit. ________________________________________________ 35

Figure 3.22. Fully connected neural network. _______________________________________ 36

Figure 3.23. Learning curves. ____________________________________________________ 37

Figure 4.1. Raw mammogram. ___________________________________________________ 40

Figure 4.2. Breast profile segmentation of two mammograms using the algorithm of [67]. ____ 40

Figure 4.3. Steps followed by GLCM_extractor.py. ____________________________________ 41

Figure 4.4. Extraction of the statistical features of the first pixel from the GLCM. ____________ 42

Figure 4.5. Last step of the GLCM_extractor.py. ______________________________________ 42

Figure 4.6. GLCM feature reduction (homogeneity). __________________________________ 45

Figure 4.7. Diagram of the feature extraction using LAWS_extractor.py (Part 1). ____________ 46

Figure 4.8. Texture image from 𝑅5𝑅5 and its histogram. ______________________________ 47

Figure 4.9. Use of a colormap to improve visualization of 𝐼𝑅5𝑅5. ________________________ 47

Figure 4.10. Last step of the LAWS_extractor.py. _____________________________________ 48

Figure 4.11. Extraction of the features using LAWS_extractor.py (Part 2). __________________ 49


Figure 4.12. Texture images obtained using a 15x15 window. ___________________________ 50

Figure 4.13. Steps followed by GLCM_extractor.py. ___________________________________ 51

Figure 4.14. Combination of the LBP features with different parameters. __________________ 52

Figure 4.15. Binning process of the texture images. ___________________________________ 54

Figure 4.16. Conceptual dataset. __________________________________________________ 55

Figure 4.17. Example of a selection of an ROI [29]. ____________________________________ 56

Figure 4.18. Segmentation examples through FCM. ___________________________________ 57

Figure 4.19. Result of the segmentation test with all the features. ________________________ 57

Figure 4.20. Examples of features discarded. ________________________________________ 58

Figure 4.21. Segmented artifacts. _________________________________________________ 58

Figure 4.22. Data division sizes. ___________________________________________________ 59

Figure 4.23. Heatmap of the correlation.____________________________________________ 60

Figure 4.24. Classification process. ________________________________________________ 64

Figure 4.25. The architecture of VGG-16 [83]. ________________________________________ 65

Figure 4.26. Example of data augmentation. _________________________________________ 66

Figure 4.27. AUG1 Loss vs. Epoch. _________________________________________________ 68

Figure 4.28. AUG2 Loss vs. Epoch. _________________________________________________ 68

Figure 4.29. AUG3 Loss vs. Epoch. _________________________________________________ 69

Figure 4.30. Confusion matrix for training dataset (test 1, 25 epochs). Image and Grad-CAM. __ 70


Figure 5.1. Confusion matrix of the ML approach. ____________________________________ 71

Figure 5.2. Binned confusion matrix of the ML approach. ______________________________ 72

Figure 5.3. Confusion matrix of the DL approach (AUG3). ______________________________ 73

Figure 5.4. Binned Confusion Matrix of the DL approach (AUG3). ________________________ 74

Figure 0.1. Confusion matrix of the DL approach (AUG1). ______________________________ 91

Figure 0.2. Binned Confusion matrix of the DL approach (AUG1). ________________________ 91

Figure 0.3. Confusion matrix of the DL approach (AUG2). ______________________________ 92

Figure 0.4. Binned Confusion matrix of the DL approach (AUG2). ________________________ 92


List of tables

Table 1.1. BI-RADS categories [13]. ________________________________________________ 10

Table 2.1. Overview of the machine learning literature. ________________________________ 13

Table 2.2. Overview of the deep learning literature. ___________________________________ 14

Table 4.1. Dataset composition. __________________________________________________ 39

Table 4.2. Parameters used to extract the features on each image. _______________________ 43

Table 4.3. Breakdown of features extracted. _________________________________________ 53

Table 4.4. Breakdown of features extracted after combination. __________________________ 53

Table 4.5. Main data frame.______________________________________________________ 54

Table 4.6. Features with high correlation. ___________________________________________ 60

Table 4.7. Classified pixels of each image. ___________________________________________ 61

Table 4.8. Final classification._____________________________________________________ 63

Table 4.9. Data augmentation methods for each test. _________________________________ 67

Table 0.1. Cost for the personnel work. _____________________________________________ 79

Table 0.2. Cost for the materials used. _____________________________________________ 80


List of Equations

Eq. 3.1 ______________________________________________________________________ 19

Eq. 3.2 _____________________________________________________________________ 20

Eq. 3.3 _____________________________________________________________________ 20

Eq. 3.4 _____________________________________________________________________ 20

Eq. 3.5 _____________________________________________________________________ 20

Eq. 3.6 _____________________________________________________________________ 20

Eq. 3.7 _____________________________________________________________________ 20

Eq. 3.8 _____________________________________________________________________ 28

Eq. 3.9 _____________________________________________________________________ 28

Eq. 3.10 ____________________________________________________________________ 30

Eq. 4.1 _____________________________________________________________________ 42

Eq. 4.2 _____________________________________________________________________ 50

Eq. 4.3 _____________________________________________________________________ 52

Eq. 4.4 _____________________________________________________________________ 55

Eq. 4.5 _____________________________________________________________________ 62

Eq. 6.1 _____________________________________________________________________ 75

Eq. 6.2 _____________________________________________________________________ 75


1. Introduction

1.1. Cancer today

Worldwide, more than 19 million new cancer cases and almost 10 million cancer deaths occurred in

2020. Female breast cancer has surpassed lung cancer as the most frequently diagnosed cancer, with

2.3 million new cases (11.7%), followed by lung (11.4%), colorectal (10.0 %), prostate (7.3%), and

stomach (5.6%) cancers. In women, breast cancer is the most diagnosed cancer and the leading cause

of cancer death [1].

It has been shown that, thanks to the breast cancer screening performed in current clinical practice, mortality from this disease has significantly decreased when screening is performed in women over 50, the group with the highest incidence [2]. In medicine, screening means looking for signs of a disease, such as breast cancer, before a person shows any symptoms. The goal of screening tests is to find cancer at an early stage, when it can be treated and might be cured. Occasionally, a screening test finds a cancer that is very small or very slow-growing [3]. These cancers are not likely to cause illness or death during a person's lifetime. Therefore, screening often leads to overdiagnosis. Essentially, overdiagnosis turns healthy women into patients by detecting cancers that would otherwise never have become clinically apparent.

Overdiagnosed cancers remain asymptomatic throughout a woman's life [4]. However, screening is

needed and essential since breast cancer is a significant public health concern with considerable

medical and economic burden.

In Spain, it is estimated that early detection of cancer could reduce the total costs by around 9,000

million euros. Furthermore, on average, metastatic breast cancer costs almost 4 times more than

cancer detected in an early stage. The expenses of metastatic breast cancer can exceed 200,000 euros

per patient [2]. In metastasis, cancer cells break away from where they first formed (primary cancer),

travel through the blood or lymph system, and develop new tumors (metastatic tumors) in other parts

of the body. Many cancer deaths begin when cancer moves from the original tumor and spreads to

other tissues and organs, colonizing them [5].

1.2. Breast cancer

Breast cancer is a common disease in which cells in the breast begin to multiply and grow

uncontrollably. A breast has three main parts: lobules, ducts, and connective tissue. Ducts have the


function of collecting and transporting milk, which is produced by the lobules. These structures are surrounded and held together by connective tissue, made up primarily of fibrous and fatty tissue:

Figure 1.1. Human breast anatomy [6].

There exist different kinds of cancer depending on which cells in the breast have become cancerous.

In most cases, cancer begins in the lobules or ducts. These cancerous cells can spread through blood

and lymph vessels to other parts of the body (metastasis) [6].

As already mentioned, screening mammography is necessary because it has been demonstrated that it

significantly reduces breast cancer mortality [7] and positively affects the economic cost of healthcare.

Nevertheless, after 30 years of mammography screening, advanced and metastatic breast cancer

incidence rates have remained stable [8].

1.3. The mammography

During a radiographic test, an X-ray beam passes through a body part. The internal structures absorb

these X-rays at different rates, and the remaining X-ray pattern hits a detector. The recording of this

radiation can be done using film that reacts and is sensitive to X-rays or using electronic sensors.

A mammogram, or mammography, is essentially an X-ray image of the breast. It is an advantageous technique for detecting early signs of breast cancer before a lump can be felt (up to 3 years earlier) [9]. Hence, it

is a crucial test in breast cancer screening.


In a mammogram, the breast is pressed between two plates. Then, X-rays are used to take pictures of

breast tissue, as can be seen in Figure 1.2:

Figure 1.2. Representation of a mammography [3].

Several factors affect the variability of the resulting mammographic image. For example, there is much variation in the breast's glandularity, which affects the radiographic density and appearance of the mammogram. In general, breast glandularity decreases with increasing breast size, but again there

can be significant differences. The breast has to be compressed during mammography (Figure 1.2),

and the compressed thickness may vary from 20 mm to more than 110 mm. This variation in breast

composition and thickness leads to a significant challenge to the X-ray imaging system, which must

achieve adequate quality at a low dose for a wide range of conditions. Breast abnormalities may appear

on the mammogram as a soft tissue lesion that may be rounded or spiculated. However, sometimes

the only sign of an anomaly is one or more calcifications or distortion in the breast architecture.

Calcifications are deposits of calcium hydroxyapatite or phosphate, ranging from extremely small to

several millimeters. It is considered desirable to detect calcifications as small as 100 µm, which presents

a significant challenge to the imaging system [10].


Human readers evaluate screening mammograms. The reading process is monotonous, tiring, lengthy,

costly, and, most importantly, prone to errors. Multiple studies have shown that up to 30% of

diagnosed cancers could be found retrospectively on the previous negative screening exam by blinded

reviewers [11].

1.4. Breast Imaging Reporting and Data System (BI-RADS®)

Aiming to reduce discordance in interpreting mammographic findings and to homogenize the terms for characterization and reporting in a standardized way, the American College of Radiology published, in 1993, the Breast Imaging Reporting and Data System (BI-RADS®) [12].

This structured system aims to achieve consistency and reliability between different reports and

facilitates clear communication between the radiologist and other medical professionals by providing

a lexicon of descriptors. It is a reporting structure that relates assessment categories to management

recommendations and a framework for data collection and auditing. The BI-RADS lexicon classifies

breast imaging findings into different types:

Table 1.1. BI-RADS categories [13].

BI-RADS 1: No finding is present in the imaging modality (not even a benign finding). The breast is symmetrical, with no masses, no architectural distortion, and no suspicious calcifications.

BI-RADS 2: A finding in this category has a 100% chance of being benign. Both BI-RADS 1 and BI-RADS 2 represent an essentially zero probability of malignancy; BI-RADS 1 is used when the breast is unremarkable, whereas BI-RADS 2 is used when the radiologist wants to highlight a benign finding.

BI-RADS 3: A finding that is probably benign, with a low risk of malignancy between 0% and 2%. The density of the breast is higher than in the previous categories.

BI-RADS 4: Suspicious abnormality. Lesions may not have the typical morphology of breast cancer, but there is a high chance of malignancy. In these cases, a biopsy is recommended. The breast is very dense.


There are more categories apart from the ones in Table 1.1. For instance, BI-RADS 0 indicates that additional mammograms should be taken since no conclusions can be drawn (e.g., the image is blurred or incorrectly acquired). BI-RADS

category 5 indicates a higher chance of malignancy, and BI-RADS category 6 represents a biopsy-proven

malignancy. Hence, only the four levels in the table above are considered in this project.

1.5. Origin of this project and motivation

The origin of this project can be traced back to previous work from the MSc thesis of Christian Mata

[14], the project's supervisor. Hence, this study aims to be a continuation and improvement of his work, with an updated literature review and an autonomous learning journey on my side. It should be

mentioned that only one general subject in the Bachelor's degree in Biomedical Engineering introduces

the Python language [14], and only one specific subject presents image processing.

The previous project was developed in Matlab [15] and only used machine learning techniques to classify mammograms into the BI-RADS categories using texture descriptors. According to the future work suggested in that previous version, the development requires implementing new strategies, improving the performance, and optimizing the computational time. For this reason, an exhaustive study of previously published works is presented in section 2. It is a crucial step to construct and justify the purpose and methodology developed in this project.

My motivation to continue this project increased after reading the previous works and considering their suggestions, so I agreed to do my final bachelor's project in this research line. It is important to remark that, even though I had very little knowledge and experience in this topic before starting the project, I wanted to delve deeper into it, as I consider it key to advancing and progressing in healthcare. The main

objectives and planning schedule chart are detailed in the following section.

1.6. Objectives

The main objective of this BSc thesis is to assess which is the more suitable approach, machine learning or deep learning, for classifying digital mammograms within the 4 categories of the BI-RADS scale shown in Table 1.1. This objective is motivated by the following facts, already introduced in

section 1.1:

• Breast cancer is the most frequently diagnosed cancer worldwide.

• It is the leading cause of cancer death in women.

• Its early and accurate diagnosis would save many lives and reduce healthcare expenses.

• Up to 30% of diagnosed breast cancers could be found retrospectively on the previous

negative screening exam by blinded reviewers.


Therefore, in summary, breast cancer is a substantial public health concern with a significant medical

and economic burden. Furthermore, as already stated, the current diagnosis is error-prone. Thus, a

justified need exists to develop a computer-aided diagnosis (CAD) system capable of assisting physicians in the classification of mammograms. This project aims to find the best pipeline to follow for developing this CAD tool. Therefore, the steps and aims of this project are itemized as follows:

• Review the state of the art on mammogram classification.

• Conduct a literature review to build a foundation on the topic and to study the classification methods reported in the literature.

• Implement all codes in Python.

• Compare the machine learning and deep learning approaches.

• Discuss and find further improvements connected to this research work.

The following figure draws a roadmap of the project to organize the time and the tasks that need to be addressed. It is the final version, as it was modified during the project:

[Gantt chart, February to June (weeks 01–19), covering the following tasks: first meetings with the tutor; planning; introduction to the topic; delving deeper into Python; bibliographic research; implementation of the texture extractors; feature extraction; storing the information in the dataset; dense segmentation; model selection and classification; meeting for revision and improvements; writing; first contact with deep learning; literature review; meeting with ITM; implementation of DL on the dataset; obtaining results; discussion; writing; final review.]

Figure 1.3. Tasks planning.


2. State of the art

During the last years, different approaches have been proposed to deal with the classification of

radiological breast images. These works exploit both machine learning and deep learning techniques.

However, not many of them attempt to classify in the BI-RADS scale. In fact, the vast majority of works

are focused on finding regions of interest, the likelihood of being cancerous, or simply whether

mammograms are malignant or not. Both machine learning and deep learning are extensively

described in sections 3 and 4.

Many projects based on machine learning use texture (further explained in section 3.1) as a feature

extraction method. Having features that describe the data is crucial for classification, since they are the input that the different machine learning models require. Table 2.1 gathers the previous works that have

addressed classifying mammograms using texture and machine learning. The most employed texture

algorithms are LBP, GLCM, and LAWS:

Table 2.1. Overview of the machine learning literature.

Reference LBP GLCM LAWS

Pereira et al. 2014 [16] ✔

Mata et al. 2008 [17] ✔ ✔ ✔

Rabidas et al. 2016 [18] ✔

Sonar et al. 2018 [19] ✔

Mohanty et al. 2011 [20] ✔

Sadad et al. 2018 [21] ✔ ✔

Gardezi et al. 2015 [22] ✔

Phadke et al. 2016 [23] ✔ ✔

Wang et al. 2017 [24] ✔

Manduca et al. 2009 [25] ✔

Nithya et al. 2017 [26] ✔ ✔

Kriti et al. 2015 [27] ✔

Farhan et al. 2020 [28] ✔ ✔


On the other hand, even though it is not mandatory to manually extract features prior to classification

in deep learning, it has to be mentioned that texture extraction steps are sometimes included in the

pipeline of some studies (Setiawan et al. 2015 [29] and Gastounioti et al. 2018 [30]). These examples

and the ones depicted in Table 2.1 establish that texture is useful and currently employed in deep

learning and, especially, in machine learning. Nevertheless, in projects where deep learning is

implemented, the images are directly used to train the models in most cases. Table 2.2 gathers the

previous works that have addressed classifying mammograms using deep learning:

Table 2.2. Overview of the deep learning literature.

Reference Approach

Setiawan et al. 2015 [29] Mammogram classification using Law's texture

energy measure.

Gastounioti et al. 2018 [30] Breast patterns finder associated with breast

cancer risk.

Jadoon et al. 2017 [31] Three-class mammogram classification based on

descriptive CNN features.

Arora et al. 2020 [32] Benign and malignant classification.

Altan et al. 2020 [33] Three-class mammogram classification.

Suh et al. 2020 [34] Cancer detection in mammograms of various

densities.

Shen et al. 2019 [35] Classification of patches in benign or malignant

calcification or masses.

Mohamed et al. 2018 [36] Breast density three-class mammogram classifier.

Wang et al. 2016 [37] Identifying metastatic breast cancer.

Adedigba et al. 2019 [38] Deep learning-based classifier for small dataset.


As seen in Table 2.1 and Table 2.2, there is a wide variety of studies in the field of mammogram

classification. However, they rely on either machine learning or deep learning. Hence, even though

some utilize deep learning for feature extraction and the final classification is done through machine

learning [31], no comparative reviews between the two approaches for this specific topic were found.

Therefore, the novelty and uniqueness of this project is the direct comparison between two different

approaches of machine learning and deep learning for mammogram classification in BI-RADS. After this

review, texture techniques such as GLCM, Laws' masks, and LBP will be implemented in Python, following and improving upon the steps of previous work [39]. Moreover, the use of deep

learning algorithms could improve the traditional machine learning approaches. It is important to

remark that different approaches exist based on deep learning. In this sense, some of them will be

explored in order to choose the one that will eventually be implemented using our mammography

database. The strategies and the methodology used in this project will be explained in the following

sections.


3. Project framework

Different methodologies have been explored and addressed. Some of them are mentioned and

justified in this section, especially in section 4. However, for convenience, only the finalized versions of

the two developed approaches are fully described. The development strategy around other

possibilities that were not fully implemented is also explained, with reference to the motives behind

these decisions. This section gives an overview of concepts that will be utilized in the methodology.

3.1. Texture

As seen in the state of the art and literature reviewed, texture analysis and extraction have significant

importance in computer vision and especially in machine learning. Therefore, texture feature

extraction is considered a fundamental step for classifying mammograms in the machine learning approach, continuing in the same line as the previous work introduced in section 1.5.

The sense of touch allows living organisms to perceive qualities of objects such as pressure,

temperature, texture, and hardness. The skin has different receptors that transform the stimuli into

information that the brain can interpret [40]. Therefore, tactile texture refers to the tangible feel of a

surface; humans can also visually identify textures. Thus, visual texture can be defined as seeing shapes

or contents in an object and associate them with a tactile texture. However, in the computer vision

domain, identifying textures can be complex and challenging. As it can be seen in the figure below

(Figure 3.1), the mathematical representation of a digital 2D image consists of a matrix array in which

each position of the matrix represents the value of the pixel intensity:

Figure 3.1. Mathematical representation of a digital image.


In an 8-bit-grayscale standard image, the maximum value (white) is 255, while the minimum (black) is

0. In between, the remaining integers correspond to the shades of grey between black and white. The

concept is the same for color images except that each pixel has three components corresponding to

the red, green, and blue intensity.
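As a minimal illustration of this representation (the intensity values below are toy numbers, not taken from any mammogram), a grayscale image can be handled in Python simply as a 2D array of integers:

```python
# A minimal sketch: an 8-bit grayscale image is just a 2D array of intensities
# (0 = black, 255 = white). The values here are illustrative toy numbers.
import numpy as np

image = np.array([[  0,  64, 128],
                  [ 64, 128, 192],
                  [128, 192, 255]], dtype=np.uint8)

print(image.shape)  # (3, 3): rows x columns of pixels
print(image[0, 2])  # intensity of the pixel at row 0, column 2 -> 128
```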

Therefore, starting from the base that digital images are essentially matrix arrays, texture can be

defined, in image processing, as the spatial variation of pixel intensity. Texture analysis plays an important role in computer vision tasks such as object recognition, surface defect detection, pattern recognition, and medical image analysis. Evaluating the distribution of pixel intensities and dispersion characteristics such as smoothness, coarseness, and regularity in multiple directions can help in the diagnosis of certain diseases [41].

Texture detection methods are usually classified into four types: statistical, structural,

model-based, and transform-based methods. However, many methods that have been developed

cannot be classified in only one class since they are considered combinational methods [41].

The statistical methods perform a series of calculations on the lightness intensity distribution functions

of pixels. Two types of levels of statistical characteristics can be identified:

• First level: single-pixel statistics are calculated without taking into account the interaction with the other pixels of the image.

• Second and higher levels: the statistics of a particular pixel are calculated considering the dependence between two or more pixels.

When classifying anything, two things must be known beforehand: the classes and the features that

will be extracted. For example, if we wanted to classify humans into two classes (healthy and

unhealthy), we could ask them details such as age, weight, height, and the number of times they have

been hospitalized in the last years. These are called features and are characteristics that describe each

human. These descriptors must be wisely chosen so that they can discriminate the data into the classes.

3.1.1. Grey level co-occurrence matrix (GLCM)

One of the oldest statistical methods for extracting texture features is the co-occurrence matrix

introduced by Haralick in 1973 [42]. The grey level co-occurrence matrix (GLCM) of an image is created

based on the correlations between image pixels. Therefore, it is considered a second-level statistical

characteristic.

The GLCM is defined over an image as the distribution of co-occurring pixel values at a given offset.

The offset is a position operator that indicates the directions when computing the co-occurrence

matrix. For instance, an offset [2, 1] means looking at one pixel down and two pixels right on each step.


Moreover, if an image has p different pixel values, its co-occurrence matrix will be p x p for the given

offset [43]. In a standard 8-bit image, a 256 x 256 co-occurrence matrix is obtained.

The 𝐶Δx,Δy(𝑖, 𝑗) value of a GLCM gives the number of times in the image that 𝑖 and 𝑗 pixel values occur

in the relation conveyed by the offset. Therefore, depending on the offset, the matrix will change.

To sum up, the co-occurrence matrix can be parameterized as:

$$C_{\Delta x,\Delta y}(i,j)=\sum_{x=1}^{n}\sum_{y=1}^{m}\begin{cases}1, & \text{if } I(x,y)=i \text{ and } I(x+\Delta x,\,y+\Delta y)=j\\[2pt]0, & \text{otherwise}\end{cases} \qquad \text{Eq. 3.1}$$

where $(\Delta x, \Delta y)$ is the offset and $I(x,y)$ is the pixel value at position $(x,y)$.

The co-occurrence matrix can also be parametrized in terms of distance (𝑑) and angle (𝜃) instead of

offset, as seen in the example below (Figure 3.2). The co-occurrence matrix, in this case, is computed

with an angle of 0 degrees and a distance of 1:

Figure 3.2. Example of the obtention of the grey co-occurrence matrix.


In the example above (Figure 3.2), an image of only 4 levels of intensity is presented. Hence, the

resulting co-occurrence matrix is a 4x4 matrix. As previously mentioned, a 256x256 matrix is obtained

in an 8-bit image.

The number of times a pair of pixel values repeats may seem like unusable information. Nevertheless, it is possible to extract valuable data from it through some statistical operations. These operations yield the so-called Haralick features [42]. These features and the co-occurrence matrix can easily be obtained using the

scikit-image open-source image processing library for Python [44]. The module "greycoprops" of this

library is based on the Haralick features. It extracts features of a given grey-level co-occurrence matrix

to serve as a compact summary of the matrix. The properties are computed as follows, where the

parameter $P_{i,j}$ is the grey-level co-occurrence value at position $(i,j)$:

$$\text{Contrast} = \sum_{i,j=0}^{\text{levels}-1} P_{i,j}\,(i-j)^2 \qquad \text{Eq. 3.2}$$

$$\text{Dissimilarity} = \sum_{i,j=0}^{\text{levels}-1} P_{i,j}\,|i-j| \qquad \text{Eq. 3.3}$$

$$\text{Homogeneity} = \sum_{i,j=0}^{\text{levels}-1} \frac{P_{i,j}}{1+(i-j)^2} \qquad \text{Eq. 3.4}$$

$$\text{ASM} = \sum_{i,j=0}^{\text{levels}-1} P_{i,j}^{\,2} \qquad \text{Eq. 3.5}$$

$$\text{Energy} = \sqrt{\text{ASM}} \qquad \text{Eq. 3.6}$$

$$\text{Correlation} = \sum_{i,j=0}^{\text{levels}-1} P_{i,j}\left[\frac{(i-\mu_i)(j-\mu_j)}{\sqrt{\sigma_i^2\,\sigma_j^2}}\right] \qquad \text{Eq. 3.7}$$

where $\mu_i$, $\mu_j$, $\sigma_i$, and $\sigma_j$ are the means and standard deviations of $P_i$ and $P_j$.
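As a hedged sketch of how the co-occurrence matrix and these properties can be obtained in practice (the random patch, offset, and parameter values below are illustrative assumptions, not the settings of the GLCM_extractor.py script described later):

```python
# Sketch: GLCM and Haralick-style properties for one 8-bit grayscale patch
# using scikit-image. The patch and parameters are illustrative only.
import numpy as np
from skimage.feature import greycomatrix, greycoprops  # spelled graycomatrix/graycoprops in newer releases

patch = np.random.randint(0, 256, size=(15, 15), dtype=np.uint8)  # stand-in for a mammogram window

# Co-occurrence matrix for distance 1 and angle 0 degrees; shape (256, 256, 1, 1)
glcm = greycomatrix(patch, distances=[1], angles=[0], levels=256, symmetric=False, normed=True)

# Eq. 3.2 - Eq. 3.7 as implemented by scikit-image
features = {prop: greycoprops(glcm, prop)[0, 0]
            for prop in ("contrast", "dissimilarity", "homogeneity", "ASM", "energy", "correlation")}
print(features)
```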


3.1.2. Law's masks (LAWS)

One of the most relevant techniques used for extracting information from the textures of an image is

the Laws' masks. These masks are a group of predefined kernels proven to extract relevant texture

features effectively and without a high computational cost, since applying them is a simple convolution between the

image and the mask. The different types of Laws' masks are used for: level detection (L), edge detection

(E), spot detection (S), ripple detection (R), and wave detection (W). Each mask gives the user different

image data, making them more suitable depending on the application. The Laws' masks used in this

project are the following:

L5 = [ 1 4 6 4 1]

E5 = [−1 −2 0 2 1]

S5 = [−1 0 2 0 −1]

R5 = [ 1 −4 6 −4 1]

If the outer product is computed between pairs of these vectors, 5x5 masks are obtained. For example:

$$L5E5 = L5 \otimes E5 = L5^{T}\,E5 = \begin{bmatrix}1\\4\\6\\4\\1\end{bmatrix}\begin{bmatrix}-1 & -2 & 0 & 2 & 1\end{bmatrix} = \begin{bmatrix}-1 & -2 & 0 & 2 & 1\\-4 & -8 & 0 & 8 & 4\\-6 & -12 & 0 & 12 & 6\\-4 & -8 & 0 & 8 & 4\\-1 & -2 & 0 & 2 & 1\end{bmatrix}$$

The possible combinations of these vectors give a total of 16 masks:

       L5      E5      S5      R5
L5     L5L5    E5L5    S5L5    R5L5
E5     L5E5    E5E5    S5E5    R5E5
S5     L5S5    E5S5    S5S5    R5S5
R5     L5R5    E5R5    S5R5    R5R5

Figure 3.3. Possible combinations for the outer product.


$$L5L5 = \begin{bmatrix} 1&4&6&4&1 \\ 4&16&24&16&4 \\ 6&24&36&24&6 \\ 4&16&24&16&4 \\ 1&4&6&4&1 \end{bmatrix} \qquad L5E5 = \begin{bmatrix} -1&-2&0&2&1 \\ -4&-8&0&8&4 \\ -6&-12&0&12&6 \\ -4&-8&0&8&4 \\ -1&-2&0&2&1 \end{bmatrix}$$

$$L5S5 = \begin{bmatrix} -1&0&2&0&-1 \\ -4&0&8&0&-4 \\ -6&0&12&0&-6 \\ -4&0&8&0&-4 \\ -1&0&2&0&-1 \end{bmatrix} \qquad L5R5 = \begin{bmatrix} 1&-4&6&-4&1 \\ 4&-16&24&-16&4 \\ 6&-24&36&-24&6 \\ 4&-16&24&-16&4 \\ 1&-4&6&-4&1 \end{bmatrix}$$

$$E5L5 = \begin{bmatrix} -1&-4&-6&-4&-1 \\ -2&-8&-12&-8&-2 \\ 0&0&0&0&0 \\ 2&8&12&8&2 \\ 1&4&6&4&1 \end{bmatrix} \qquad E5E5 = \begin{bmatrix} 1&2&0&-2&-1 \\ 2&4&0&-4&-2 \\ 0&0&0&0&0 \\ -2&-4&0&4&2 \\ -1&-2&0&2&1 \end{bmatrix}$$

$$E5S5 = \begin{bmatrix} 1&0&-2&0&1 \\ 2&0&-4&0&2 \\ 0&0&0&0&0 \\ -2&0&4&0&-2 \\ -1&0&2&0&-1 \end{bmatrix} \qquad E5R5 = \begin{bmatrix} -1&4&-6&4&-1 \\ -2&8&-12&8&-2 \\ 0&0&0&0&0 \\ 2&-8&12&-8&2 \\ 1&-4&6&-4&1 \end{bmatrix}$$

$$S5L5 = \begin{bmatrix} -1&-4&-6&-4&-1 \\ 0&0&0&0&0 \\ 2&8&12&8&2 \\ 0&0&0&0&0 \\ -1&-4&-6&-4&-1 \end{bmatrix} \qquad S5E5 = \begin{bmatrix} 1&2&0&-2&-1 \\ 0&0&0&0&0 \\ -2&-4&0&4&2 \\ 0&0&0&0&0 \\ 1&2&0&-2&-1 \end{bmatrix}$$

$$S5S5 = \begin{bmatrix} 1&0&-2&0&1 \\ 0&0&0&0&0 \\ -2&0&4&0&-2 \\ 0&0&0&0&0 \\ 1&0&-2&0&1 \end{bmatrix} \qquad S5R5 = \begin{bmatrix} -1&4&-6&4&-1 \\ 0&0&0&0&0 \\ 2&-8&12&-8&2 \\ 0&0&0&0&0 \\ -1&4&-6&4&-1 \end{bmatrix}$$

$$R5L5 = \begin{bmatrix} 1&4&6&4&1 \\ -4&-16&-24&-16&-4 \\ 6&24&36&24&6 \\ -4&-16&-24&-16&-4 \\ 1&4&6&4&1 \end{bmatrix} \qquad R5E5 = \begin{bmatrix} -1&-2&0&2&1 \\ 4&8&0&-8&-4 \\ -6&-12&0&12&6 \\ 4&8&0&-8&-4 \\ -1&-2&0&2&1 \end{bmatrix}$$

$$R5S5 = \begin{bmatrix} -1&0&2&0&-1 \\ 4&0&-8&0&4 \\ -6&0&12&0&-6 \\ 4&0&-8&0&4 \\ -1&0&2&0&-1 \end{bmatrix} \qquad R5R5 = \begin{bmatrix} 1&-4&6&-4&1 \\ -4&16&-24&16&-4 \\ 6&-24&36&-24&6 \\ -4&16&-24&16&-4 \\ 1&-4&6&-4&1 \end{bmatrix}$$

Except for L5L5, the sum of all the values within each 2D mask equals zero. Hence, L5L5 is

sometimes excluded and is not used when extracting texture information [29].

Another interesting property of these masks is that, as mentioned before, each one reacts to a particular pixel distribution in the image. For instance, E5L5 measures horizontal edge content, while L5E5 measures vertical edge content. This can be intuited from the structure of the masks:

$$L5E5 = \begin{bmatrix} -1&-2&0&2&1 \\ -4&-8&0&8&4 \\ -6&-12&0&12&6 \\ -4&-8&0&8&4 \\ -1&-2&0&2&1 \end{bmatrix} \qquad E5L5 = \begin{bmatrix} -1&-4&-6&-4&-1 \\ -2&-8&-12&-8&-2 \\ 0&0&0&0&0 \\ 2&8&12&8&2 \\ 1&4&6&4&1 \end{bmatrix}$$

As a visual example, an image with very marked horizontal and vertical lines, such as the Building C

from the Barcelona East School of Engineering (EEBE), is convolved with the two kernels above.


Figure 3.4. Sample picture of the EEBE's Building C.

A convolution is simply the process of taking a small matrix (kernel or mask) and sliding it over all the

image's pixels. For each position, i.e., each pixel, the products of the mutually overlapping values are computed and summed.

Frequently, padding is added to the original image to avoid ending up with a smaller output image. The resulting sum becomes the value of the output pixel at that particular location, as seen in the conceptual representation in Figure 3.5.


Figure 3.5. Representation of a convolution [45].

The results of convolving Figure 3.4 with these two kernels are $I_{E5L5}$ and $I_{L5E5}$:

Figure 3.6. $I_{E5L5}$ and $I_{L5E5}$ with gray colormap.

In Figure 3.6, it can be seen that, as expected, the horizontal and vertical parts of the image present higher values when filtered by these masks.


Since the human eye can perceive no more than 900 levels of gray [46], a colormap can be applied to

distinguish better:

Figure 3.7. $I_{E5L5}$ and $I_{L5E5}$ with multicolor colormap.

The average of the two resulting images becomes the total edge content:

Figure 3.8. Average of $I_{E5L5}$ and $I_{L5E5}$.


The same idea is applied to the remaining resulting images. Hence, starting from one input image, 9 texture images are obtained:

$$\frac{I_{L5E5}+I_{E5L5}}{2}, \quad \frac{I_{L5S5}+I_{S5L5}}{2}, \quad \frac{I_{L5R5}+I_{R5L5}}{2}, \quad \frac{I_{E5S5}+I_{S5E5}}{2}, \quad \frac{I_{E5R5}+I_{R5E5}}{2}, \quad \frac{I_{R5S5}+I_{S5R5}}{2}, \quad I_{S5S5}, \quad I_{E5E5}, \quad I_{R5R5}$$

As with 3-channel color images (RGB), the 9 resulting images can be considered a single image with 9 texture features for each pixel.
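A minimal sketch of this Laws' pipeline is shown below, assuming NumPy and SciPy are available; it is an illustrative reconstruction, not the thesis' LAWS_extractor.py:

```python
# Sketch of the Laws'-mask pipeline described above: build the 5x5 masks from
# the 1D vectors, convolve the image with each, and average the symmetric
# pairs (dropping L5L5) to obtain the 9 texture images.
import numpy as np
from scipy.signal import convolve2d

VECTORS = {"L5": np.array([1, 4, 6, 4, 1]),
           "E5": np.array([-1, -2, 0, 2, 1]),
           "S5": np.array([-1, 0, 2, 0, -1]),
           "R5": np.array([1, -4, 6, -4, 1])}

def laws_texture_maps(image):
    image = np.asarray(image, dtype=float)
    # 16 masks from outer products, e.g. L5E5 = L5^T E5
    filtered = {a + b: convolve2d(image, np.outer(va, vb), mode="same")
                for a, va in VECTORS.items() for b, vb in VECTORS.items()}
    pairs = [("L5E5", "E5L5"), ("L5S5", "S5L5"), ("L5R5", "R5L5"),
             ("E5S5", "S5E5"), ("E5R5", "R5E5"), ("R5S5", "S5R5")]
    maps = [(filtered[a] + filtered[b]) / 2 for a, b in pairs]
    maps += [filtered[k] for k in ("S5S5", "E5E5", "R5R5")]
    return np.stack(maps, axis=-1)  # shape (H, W, 9): 9 texture features per pixel
```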

Similar to the GLCM feature extraction method, a window of size W is taken around each pixel. Statistical features such as the mean, absolute mean, and variance are extracted from the center pixel and its neighbors in that window. These are the features used in classification, and computing them is computationally heavy. The resulting features are shown below:

Figure 3.9. Local variance and mean from the image in Figure 3.8.


Figure 3.10. Local absolute mean extracted from the image in Figure 3.8.
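The local statistics just described (Figures 3.9 and 3.10) can be sketched as follows; the window size of 15 pixels and the use of scipy.ndimage are illustrative assumptions, not necessarily the exact implementation used in this project:

```python
# Sketch: sliding-window mean, absolute mean, and variance of a Laws texture map.
import numpy as np
from scipy.ndimage import uniform_filter

def local_statistics(texture_map, window=15):
    texture_map = np.asarray(texture_map, dtype=float)
    mean = uniform_filter(texture_map, size=window)
    abs_mean = uniform_filter(np.abs(texture_map), size=window)
    # var(X) = E[X^2] - E[X]^2 within the window
    variance = uniform_filter(texture_map ** 2, size=window) - mean ** 2
    return mean, abs_mean, variance
```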

3.1.3. Local binary patterns (LBP)

Local binary patterns (LBP), first described in 1994 [4], are a visual descriptor used for classification in computer vision. They describe the grayscale local texture of the image with low computational complexity by detecting local patterns between adjacent pixels [47]. Their capability to ignore uniform variations in the images (e.g., the lighting) and their low computational cost make this

technique very advantageous in many applications, for example, in face detection [48] and medical

images [47].

The implementation of the algorithm is essentially based on a sliding window that compares its center value with the neighbors' values at a specific radius or distance. The number of neighbors is also chosen, and values between actual pixels can be interpolated [49]. For each comparison, if the center value is greater than or equal to the neighbor's, a 1 is assigned to that position; otherwise, a 0 is allocated. The following expressions describe the LBP, where $s(x)$ is the threshold function, and $g_c$ and $g_p$ represent the grey-scale value of the center pixel and the $p$-th neighbor, respectively:

$$s(x) = \begin{cases}1, & \text{if } x \geq 0\\0, & \text{otherwise}\end{cases} \qquad \text{Eq. 3.8}$$

$$LBP = \sum_{p=0}^{P-1} s(g_p - g_c)\cdot 2^{p} \qquad \text{Eq. 3.9}$$


Afterward, all the ones and zeros are concatenated, creating a binary number whose decimal value is the LBP value, as shown in the following illustration:

Figure 3.11. An example of how LBP works.

As already mentioned, the radius and number of neighbors can also be chosen. In the diagram below,

the red dots represent the central pixel (𝑔𝑐) and the green dots represent the neighbors' pixels (𝑔𝑝):

Figure 3.12. LBP examples using different radius and number of neighbors [50].

Once the LBP value is calculated, the window slides to the next position until the image is fully processed. Sometimes it is useful to add padding to the picture before applying the LBP so that the resulting image is not smaller than the original one.

There are two types of patterns depending on the number of transitions between 0 and 1 and vice

versa. If there are two or fewer transitions, it is called a uniform pattern. Otherwise, it is a non-uniform

pattern. For example:

11111011 → Uniform (2 transitions)

10100011 → Non-uniform (4 transitions)

In practice, it is very infrequent to find non-uniform patterns. For that reason, there are 58 possible

LBP values defined for uniform patterns. For all the non-uniform patterns, there is only 1 LBP value

assigned to them. Hence, the possible combinations can be reduced from 256 to 59 characteristics in

an 8-bit grayscale image using this technique.


Furthermore, it is possible to extract rotationally invariant local binary patterns, so that the same pattern is not affected by orientation and is invariant to rotations. This is achieved by circularly shifting the binary pattern until its minimum value is found, which becomes the LBP. With this approach, there are only 36 possible values in an 8-bit grayscale image. In addition, grayscale invariance is also achievable if the center gray value is subtracted from the neighbors [51], reducing the number of possible local binary patterns by looking only at the intensity variation and considering the central pixel as the offset. This idea is represented in the figure and expression below:

Figure 3.13. LBP using a radius of 1 and 8 neighbors.

If grayscale invariance of the LBP is sought in Figure 3.13, the expression used would be:

$$LBP = [\,g_0-g_c,\; g_1-g_c,\; g_2-g_c,\; g_3-g_c,\; g_4-g_c,\; g_5-g_c,\; g_6-g_c,\; g_7-g_c\,] \qquad \text{Eq. 3.10}$$

Therefore, LBP is a powerful extraction method that is not as dependent as others on pixel intensity or

rotation.
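A minimal sketch of LBP extraction with scikit-image is given below; the random patch, the radius of 1, the 8 neighbors, and the histogram-based feature vector are illustrative assumptions rather than the exact settings used in this project:

```python
# Sketch: LBP codes and a normalized LBP histogram for an 8-bit grayscale patch.
import numpy as np
from skimage.feature import local_binary_pattern

patch = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)  # stand-in mammogram region

radius, n_neighbors = 1, 8
# "nri_uniform" keeps the 59 non-rotation-invariant uniform codes mentioned above;
# "uniform" would give the rotation-invariant variant instead.
lbp = local_binary_pattern(patch, P=n_neighbors, R=radius, method="nri_uniform")

# A normalized histogram of the codes is the usual per-region feature vector.
hist, _ = np.histogram(lbp, bins=59, range=(0, 59), density=True)
```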

3.2. Machine learning approach

Machine learning is one of the applications of artificial intelligence (AI) that provides systems with the

ability, without being explicitly programmed, to improve and learn from experience. The learning

process begins by fitting a model with training data. Then, the model will learn from it and make

decisions or predictions on new data based on what it has learned. Machine learning is

widely used in various computing tasks where designing and programming explicit algorithms with

good performance is complex or unfeasible. For example, some of its applications include email

filtering, network intrusion detection, product recommendations, speech recognition, optical

character recognition (OCR), and, most importantly, computer vision and medical diagnosis [52].

Machine learning algorithms can be classified as supervised or unsupervised:


• Supervised machine learning applies what has been learned in the past to new data in order to predict future events. Thanks to a known, labeled training dataset, the algorithm can produce an inferred function to make predictions, so the model can provide targets for any new input after training.

• On the other hand, unsupervised machine learning is used when the training dataset is neither

labeled nor classified. Unsupervised learning algorithms study how systems can infer a

function to describe a hidden structure from unlabeled data. Clustering is the most commonly

used unsupervised learning technique. Clustering refers to the process of automatically

grouping data points with similar characteristics and assigning them to "clusters".

To better explain these two types of machine learning, two conceptual plots of each class can be found

below. Each axis is any feature that describes a point (energy consumption, height, density, age, etc.):

Figure 3.14. A visual example of unsupervised (left) and supervised (right) machine learning.

In supervised learning, it is possible to check whether the data clusters or groups coincide with the actual class of the data. Hence, in the right plot of Figure 3.14, it can be seen that "A" and "B" are discriminable and do not overlap. Therefore, when a new point is given to this model, its class can be predicted by, for instance, checking the class of its closest point in the training dataset, as depicted on the right:

Figure 3.15. Predicting a new point.


In Figure 3.15, using the closest-neighbor criterion, the new point would be classified as "A". On the contrary, in the left plot of Figure 3.14 (unsupervised learning), it is clear that there are two clusters; nevertheless, some concerns might arise. For example, points assigned to Cluster 1 may actually belong to Cluster 2 and vice versa. Hence, one handicap of unsupervised learning is knowing whether the selected features can adequately discriminate the data, since obtaining two clusters does not necessarily mean that they do. Moreover, the number of clusters should be known beforehand. Otherwise, criteria to choose the number of clusters must be considered, since data can be grouped in many ways. For example, taking the unsupervised learning example, the second cluster can be divided not only once but many times:

Figure 3.16. Example of infinite clusterization.

Therefore, unsupervised learning is a powerful tool to separate and discriminate data by looking at its features, in two dimensions and in many more. However, the main challenge is ensuring that it separates the data into the classes of interest for a specific case. Moreover, having to know how many classes an unlabeled dataset contains is relevant and might bring some limitations. As an example, in a dataset of unlabeled pictures of cats and dogs, there are only two classes (cats and dogs), but it is unknown which image is which. In that case, it is possible to extract features from the pictures and try to separate them into two clusters using an unsupervised machine learning algorithm. On the other hand, with a dataset of generic animal pictures, we would need to know how many different animals there are. Otherwise, besides the problem of knowing whether the algorithm can discriminate the different classes with the chosen features, we would not even know how many classes or data types exist.


3.3. Deep learning approach

Deep learning evolves from machine learning to overcome the fact that the accuracy of most conventional classification algorithms demands solid feature engineering, which in turn requires previous expert knowledge of the data and a challenging manual process to build descriptive data features [53]. This fact is depicted in Figure 3.17: in machine learning, a human is needed to determine how and which features are going to be extracted, as well as how they are going to be classified. On the other hand, deep learning performs these steps directly, considerably reducing human intervention:

Figure 3.17. Machine learning vs. deep learning.

Briefly, deep learning focuses on modeling high-level abstractions of information using computational

architectures that support multiple and iterative nonlinear transformations expressed in matrices or

tensors [54]. Thanks to their potential and scalability, neural networks have become the defining model

of deep learning. The fundamental unit of neural networks is a neuron. Each neuron individually

performs only a simple computation.


Figure 3.18. Diagram of a neuron.

For instance, as exemplified in Figure 3.18, in a linear unit (neuron), the input x is connected to the neuron with a weight w. Every time a value x is driven through a connection, it is multiplied by the weight's value, so what reaches the neuron for the input x is w · x. A neural network is able to "learn" by modifying its weights. Additionally, to allow the neuron to alter the output independently of its inputs, a special weight is introduced: the bias, represented by b. It does not have any input associated; instead, a constant 1 is used in the diagram, so the value reaching the neuron is just b (1 · b = b).

Neural networks are usually organized into layers of neurons. In addition, when they are located

between the input and output layer, they are called hidden layers since their outputs are never seen

directly:

Figure 3.19. Deep neural network [55].


However, stacking two or more linear layers with nothing in between is equivalent to a single layer [56]. Hence, nonlinear operations, called activation functions, are needed. Essentially, they are functions applied to each of the layer's outputs. The most common is the rectifier function max(0, input), whose graph has the negative part rectified to zero:

Figure 3.20. The rectifier function.

When this function (Figure 3.20) is applied to a neuron (linear unit), we get a rectified linear unit or

ReLU. Hence, it is common to call the rectifier function the ReLU function [57]. Thus, the neuron’s

output becomes:

Figure 3.21. A rectified linear unit.
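As a minimal sketch of the ideas above, the following lines implement a single rectified linear unit in NumPy; the input, weight, and bias values are arbitrary.

import numpy as np

def relu(z):
    # Rectifier function: max(0, z).
    return np.maximum(0.0, z)

x = np.array([2.0, -1.5, 0.5])    # arbitrary inputs
w = np.array([0.4, 0.8, -0.2])    # one weight per connection
b = 0.1                           # bias: a weight attached to a constant 1

output = relu(np.dot(w, x) + b)   # output of the rectified linear unit
print(output)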

Therefore, thanks to the nonlinearity that ReLU allows, neural networks can perform complex data transformations, making regression and classification tasks possible [57]. For instance, a stack of neurons forming a fully connected (FC) layer consists of weights, biases, and activation functions, and it is where the classification process begins to take place. These layers typically form the last layers of a convolutional neural network (CNN) before the output layer. An example is shown in Figure 3.22.


Figure 3.22. Fully connected neural network.

Besides fully connected layers, there are other types: the convolutional layers and pooling layers. When

these three different types of layers are stacked, a convolutional neural network CNN architecture is

formed. The convolution layer is fundamentally the first layer and is used to extract features from the

input (usually images) by performing a convolution operation [58]. The output is named the feature

map, which feeds the following layers. Generally, a pooling layer follows a convolutional layer. The

principal aim of this layer is to decrease the size of the feature map to reduce computational costs. The

feature map can be reduced in many ways: by taking the maximum values of the feature map (max

pooling), averaging them (mean pooling), or summing (sum pooling) [59].

A neural network can be structured in many ways by using different types of neurons and varying the number of layers. Nevertheless, as with machine learning algorithms, it is indispensable to have a training dataset, where each training sample consists of inputs with an expected target (the output). Thus, training a neural network means adjusting the weights to transform the input into the expected output. However, two more things are needed for a neural network to learn and predict new data: a "loss function" and an "optimizer".

During training, the model will use the loss function as a guide for finding appropriate values of its weights. The loss function measures the disparity between the target's actual value and the predicted value. This setup is essentially supervised learning, which is the one used in this project. Therefore, the loss function tells the network its objective: in general, the lower the loss, the better the predictions.


On the other hand, the optimizer is an algorithm that adjusts the weights to minimize the loss. The algorithms used in deep learning belong to a family called stochastic gradient descent (SGD) [60]. They are iterative algorithms that train a network in steps, repeated until the loss is optimal or does not decrease any further. The subset of training samples used in each iteration is called a minibatch, while the number of epochs is how many times the network will see the entire training dataset. The learning rate is an SGD parameter that determines the step size at each iteration.
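The following is a schematic PyTorch sketch of such a minibatch SGD training loop; the toy model, the random data, and all the hyperparameters are placeholders used only to show the structure, not the configuration used in this project.

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: 256 samples, 10 features, 4 classes (e.g., BI-RADS 1-4).
X = torch.randn(256, 10)
y = torch.randint(0, 4, (256,))
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 4))
loss_fn = nn.CrossEntropyLoss()                              # loss function (the objective)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)     # learning rate = step size

for epoch in range(20):                  # one epoch = one pass over the whole dataset
    for xb, yb in loader:                # each (xb, yb) is one minibatch
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)    # disparity between predictions and targets
        loss.backward()                  # gradients of the loss w.r.t. the weights
        optimizer.step()                 # SGD step: adjust the weights to reduce the loss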

In order to assess a model's performance accurately, the model needs to be evaluated on a new set of

data called the validation dataset. For that, the learning curves can be evaluated:

Figure 3.23. Learning curves.

As depicted in the figure above, underfitting the training set occurs when the loss is not as low as it could be because the model has not learned enough. In contrast, a similar degradation can happen if the model is overfitted: it learns from noise and gives higher weights to nonrelevant details that have an undesirable impact on performance. To overcome the fitting problem, an early stop can be performed by interrupting the training (Figure 3.23). In practice, the weight values are recorded over the epochs and then reset back to the point where the minimum validation loss occurred. Another technique commonly used to prevent overfitting is dropout [57]. The idea is to randomly drop out part of a layer's input in every training step. Hence, the network ignores the nonrelevant patterns in the training data and is instead forced to search for general patterns, increasing robustness and avoiding overfitting.
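The two techniques can be sketched in PyTorch as follows, continuing the placeholder setup of the previous sketch; val_loader is a hypothetical validation DataLoader and the patience value is arbitrary.

import copy
import torch
from torch import nn

# Dropout randomly zeroes part of a layer's input at every training step.
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(16, 4))
loss_fn = nn.CrossEntropyLoss()

def validation_loss(model, loader, loss_fn):
    model.eval()
    with torch.no_grad():
        losses = [loss_fn(model(xb), yb) for xb, yb in loader]
    model.train()
    return torch.stack(losses).mean().item()

best_loss, best_state, patience, bad_epochs = float("inf"), None, 5, 0
for epoch in range(100):
    # ... run one training epoch here (see the previous sketch) ...
    val_loss = validation_loss(model, val_loader, loss_fn)   # val_loader: hypothetical
    if val_loss < best_loss:                                 # new minimum validation loss
        best_loss, bad_epochs = val_loss, 0
        best_state = copy.deepcopy(model.state_dict())       # record the weights
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break                                            # early stop
model.load_state_dict(best_state)                            # reset to the recorded minimum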

The use of large training datasets has shown promising capability in many artificial intelligence applications using deep learning [61] [62] and, more recently, in the biomedical imaging field, with performance comparable to, and in some cases surpassing, that of physicians.


4. Methodology and implementation

In this section, the set of procedures used to achieve the final objective, which is classification, are

described. All the coding has been developed in Python language, and the ones related to the machine

learning approach can be found in the GITHUB repository [63]. The code regarding the deep learning

approach is not included since it was developed by the Grupo de Investigación de Modelos de

Arendizaje Computacional from the Técnologico de Monterey [64].

4.1. Materials and preprocessing

In the present work, digital mammograms already classified and labeled are used to train and assess

the approaches developed. The database consists of the following image types:

Table 4.1. Dataset composition.

BI-RADS 1: 360
BI-RADS 2: 335
BI-RADS 3: 339
BI-RADS 4: 241
Total: 1275

The databases come from the Hospital Josep Trueta (Girona, Spain) [17] and the Mammographic Image Analysis Society (MIAS) [65]. A set of experts manually classified both databases following the guidelines of the Breast Imaging Reporting and Data System (BI-RADS). Additionally, since the deep learning approach required a higher number of mammograms, a third database was introduced only in that approach: the Iberian breast cancer digital repository (BCDR) [66]. Specifically, 20 BI-RADS 2, 20 BI-RADS 3, and 19 BI-RADS 4 mammograms were added. This is further explained in section 4.3.2.

After reviewing the state of the art, the final objective of this project is to discuss, compare, and assess the implementation of two different approaches for the classification: one based on machine learning and the second one based on deep learning.

The mammograms used in this project were already preprocessed [39]. In fact, two independent steps were performed. The first one was a segmentation of the background and annotations from the whole breast area, while the second one involved separating the pectoral muscle from the rest of the breast area. The following image shows how an initial mammogram looks before the preprocessing:

Figure 4.1. Raw mammogram.

The final result is the mammogram containing only the breast part.

Figure 4.2. Breast profile segmentation of two mammograms using the algorithm of [67].

In this project, the segmented images were also downscaled by a factor of 0.5 to reduce the feature extraction's computational time, as discussed later in section 4.2.4. The rescaling was performed using the rescale function provided by the scikit-image open-source image processing library for Python [68].
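A minimal sketch of this step with scikit-image could look as follows; the file name is a placeholder.

from skimage import io
from skimage.transform import rescale

# Placeholder file name for a preprocessed, breast-only mammogram.
mammogram = io.imread("mammogram_segmented.png", as_gray=True)

# Downscale by a factor of 0.5 to reduce the feature extraction time.
mammogram_small = rescale(mammogram, 0.5, anti_aliasing=True)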


4.2. Machine learning implementation

As seen in the literature review, the GLCM, LAWS, and LBP features are widely used. In this section, the functions developed in Python and the implementation steps up to the classification are described.

4.2.1. Extraction of GLCM features

The GLCM features are obtained with the developed function GLCM_extractor.py. The steps followed for each image are described in the following diagram:

Figure 4.3. Steps followed by GLCM_extractor.py.

It has to be mentioned that step 2 (Figure 4.3) is required to maintain the original image size. Step 3 is repeated so that a GLCM is computed for as many window positions as pixels in the original image (Figure 4.4).


Figure 4.4. Extraction of the statistical features of the first pixel from the GLCM.

Therefore, as a result, 5 matrix arrays are created and stored. The diagram below shows the calculation of the last GLCM, which yields the final pixel of the descriptors. They can be visualized as images:

Figure 4.5. Last step of the GLCM_extractor.py.

The 5 resulting images can be considered as a single image with 5 texture features for each pixel. From

this idea, a vector that defines each pixel 𝑖 of the original mammogram is obtained:

$GLCM[i] = [\,Contrast[i] \;\; Dissimilarity[i] \;\; Homogeneity[i] \;\; Energy[i] \;\; Energy[i]\,]$   (Eq. 4.1)


The function GLCM_extractor.py is executed 12 times with the parameters shown in Table 4.2. It has

to be mentioned that GLCMs can be defined in eight offsets (0º, 45º, 90º, 135º, 180º, 225º, 270º, and

315º). Nevertheless, in the original definition, Haralick proposed to use only four directions spaced at

intervals of 45º, as the others are symmetrical [42]. Furthermore, the distance could present a wide

range of values, but the most used are d = 1, 2, 3. In this case, the distance has been chosen according

to the window size.

Table 4.2. Parameters used to extract the features on each image.

     Window size (pixels)   Angle (rad)   Distance (pixels)   Number of bins   Step (pixels)
1.   5x5                    0             3                   32               1
2.   5x5                    π/4           3                   32               1
3.   5x5                    π/2           3                   32               1
4.   5x5                    3π/4          3                   32               1
5.   15x15                  0             5                   32               1
6.   15x15                  π/4           5                   32               1
7.   15x15                  π/2           5                   32               1
8.   15x15                  3π/4          5                   32               1
9.   25x25                  0             10                  32               1
10.  25x25                  π/4           10                  32               1
11.  25x25                  π/2           10                  32               1
12.  25x25                  3π/4          10                  32               1

Every time the function is executed, 5 texture images are obtained per mammogram. Since GLCM_extractor.py is run 12 times (Table 4.2), a total of 60 texture images are produced. Consequently, 60 components describe every pixel.


Initially, the first test was done with the parameters selected in Table 4.2 but using 256 bin levels, meaning that every computed GLCM had a size of 256x256 [69]. However, the computational time for a single execution of the function on one image was around 20 min. As this has to be done 12 times per image, and the number of images is 1275, the total execution time to extract the GLCM features would have been 212.5 days (20 min · 12 runs · 1275 images).

To overcome this concerning issue and to be able to finalize the project in the timeline estimated

initially, different considerations and actions were taken:

• As mentioned in the preprocessing section (section 4.1), the image resolution was reduced to

half. Therefore, the computation time per image is reduced to half as well.

• The step parameter represents the number of pixels the sliding window moves in each

iteration; it is directly related to the number of GLCMs computed. Nevertheless, for instance,

if the step selected is 2, that would mean that the resulting texture images obtained are half

the original size. It is the reason why modifying this parameter to improve the computation

time was rejected.

• The function was optimized so that it does not compute the GLCMs of the background slices.

Since the background represents around 50% in many images, the computation time is

reduced to half.

• Subsequently, the number of bins chosen was 32. The GLCMs of 256 bins (with a size of

256x256) were mainly empty. Even in the most extensive window (25x25 pixels), the repetition

of pairs within the 256 levels is unlikely. Therefore, and following the initial objective of

reducing the computational time, the statistics were extracted from GLCMs with a size 32x32.

As a result of the previous adjustments, the function's execution took less than a minute per image, making it viable to run the function 12 times per mammogram.
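A simplified sketch of this kind of sliding-window GLCM extraction with scikit-image is shown below. The window size, distance, and angle correspond to one of the configurations of Table 4.2; the five statistics listed are standard graycoprops properties, whereas the exact set used in GLCM_extractor.py is the one of Eq. 4.1, and background pixels are assumed to be zero. (Older scikit-image versions name these functions greycomatrix/greycoprops.)

import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(image, win=5, distance=3, angle=0.0, bins=32):
    # Quantize the grayscale image to `bins` levels so every GLCM is bins x bins.
    quantized = np.floor(image / image.max() * (bins - 1)).astype(np.uint8)
    pad = win // 2
    padded = np.pad(quantized, pad, mode="reflect")          # keep the original size
    props = ("contrast", "dissimilarity", "homogeneity", "energy", "correlation")
    out = np.zeros(image.shape + (len(props),))

    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            if image[r, c] == 0:                             # skip background pixels
                continue
            window = padded[r:r + win, c:c + win]
            glcm = graycomatrix(window, distances=[distance], angles=[angle],
                                levels=bins, symmetric=True, normed=True)
            out[r, c] = [graycoprops(glcm, p)[0, 0] for p in props]
    return out                                               # one feature vector per pixel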

After more than 4 days of computation, the 60 texture images that characterize each mammogram were obtained. Due to the reasons explained in section 4.2.4, these features are combined to reduce their number when building the feature dataset. This process is done using the developed function df_mother.py. The procedure followed can be found in Figure 4.6.


Figure 4.6. GLCM feature reduction (homogeneity).

The figure above points out the main idea of what has been done to reduce the number of features.

Ultimately, the images with the same window size and distance (thus different angles) are averaged.

As pointed out in the literature, this is a fast way to obtain rotationally invariant GLCM features [70]

without losing much information. As a consequence, the final GLCM features are reduced to 15 instead

of the original 60.


4.2.2. Extraction of LAWS features

The LAWS features have been extracted using the LAWS_extractor.py. The main steps of this function

are described in the diagram below:

Figure 4.7. Diagram of the feature extraction using LAWS_extractor.py (Part 1).

Essentially, from steps 1 to 3, the image is convolved with the 15 kernels. Hence, 15 texture images are obtained since, as mentioned in section 3.1.2, the L5L5 mask is not used. The original size is maintained thanks to the padding performed. These 15 images are combined following the expressions listed in step 4 of Figure 4.7. Eventually, 9 texture images are obtained.
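A generic sketch of how such Laws kernels can be built and applied is given below; the exact subset of kernels and the combination expressions of LAWS_extractor.py are those of Figure 4.7, so this only illustrates the standard construction (outer products of the 1D vectors, convolution, and averaging of symmetric pairs).

import numpy as np
from scipy.signal import convolve2d

# Laws' 1D vectors: Level, Edge, Spot, Ripple, Wave.
vectors = {
    "L5": np.array([1, 4, 6, 4, 1]),
    "E5": np.array([-1, -2, 0, 2, 1]),
    "S5": np.array([-1, 0, 2, 0, -1]),
    "R5": np.array([1, -4, 6, -4, 1]),
    "W5": np.array([-1, 2, 0, -2, 1]),
}

def laws_maps(image):
    filtered = {}
    for a, va in vectors.items():
        for b, vb in vectors.items():
            kernel = np.outer(va, vb)               # e.g., R5R5 = outer(R5, R5)
            filtered[a + b] = convolve2d(image, kernel, mode="same", boundary="symm")

    combined, names = {}, list(vectors)
    for i, a in enumerate(names):
        for b in names[i:]:
            if a == b == "L5":
                continue                            # the L5L5 mask is not used
            # Symmetric pairs (e.g., E5S5 and S5E5) are averaged into one map.
            combined[a + b] = (filtered[a + b] + filtered[b + a]) / 2
    return combined

# maps = laws_maps(mammogram_small)                 # one texture map per combined kernel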


The texture image resulting from the convolution with the kernel 𝑅5𝑅5 is displayed as an example:

Figure 4.8. Texture image from 𝑅5𝑅5 and its histogram.

As humans can only distinguish 900 shades of gray and given the low contrast in the image (the vast

majority of pixels have values between -100 and 100), a color map can be applied for visualization

purposes:


Figure 4.9. Use of a colormap to improve visualization of 𝐼𝑅5𝑅5.


Some parts of the inner breast are highlighted in Figure 4.9. Going back to the feature extraction process, the last step consists of sliding a 5x5 window pixel by pixel to extract the local mean, absolute mean, and standard deviation. The values obtained from these three statistical calculations are stored in three new images. The process is graphically described below using IR5R5 as an example:

Figure 4.10. Last step of the LAWS_extractor.py.
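These three local statistics can be sketched compactly with scipy; the 5x5 window matches the size finally kept in the project.

import numpy as np
from scipy.ndimage import uniform_filter

def local_statistics(texture_map, win=5):
    # Local mean, absolute mean, and standard deviation over a win x win window.
    mean = uniform_filter(texture_map, size=win)
    abs_mean = uniform_filter(np.abs(texture_map), size=win)
    var = uniform_filter(texture_map ** 2, size=win) - mean ** 2   # E[x^2] - E[x]^2
    std = np.sqrt(np.clip(var, 0, None))                           # clip rounding noise
    return mean, abs_mean, std                  # three new images per texture map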


This process is repeated for each of the 9 texture images obtained combining the 15 convolution results

(Step 4 in Figure 4.7). Hence, 27 texture images are obtained for each mammogram. The ones from the

mammogram used as an example in this section are displayed in the figure below:

Figure 4.11. Extraction of the features using LAWS_extractor.py (Part 2).


Different window sizes were tested: LAWS features were extracted for the whole dataset using 5x5, 15x15, and 25x25 windows. However, the windows larger than 5x5 were discarded. The main reason is that bigger windows distorted the images, since they were too big compared to the resolution of the mammograms after the preprocessing. These effects can be seen in the images below:

Figure 4.12. Texture images obtained using a 15x15 window.

Therefore, bigger window sizes were not used, as they would have introduced noise and artifacts into the feature dataset.

Eventually, 27 features that describe each pixel 𝑖 of every mammogram are obtained:

$LAWS[i] = [\,LAWS_1[i] \;\; LAWS_2[i] \;\; LAWS_3[i] \;\; \dots \;\; LAWS_{26}[i] \;\; LAWS_{27}[i]\,]$   (Eq. 4.2)

The computation time was around 12 seconds per image with the 5x5 window. Hence, extracting the features of the whole mammogram dataset took a bit more than 4 hours. The larger the window, the longer the execution time; therefore, this is a limitation that must be considered when choosing the window size.


4.2.3. Extraction of LBP features

Lastly, the LBP features were extracted with LBP_extractor.py. They can be easily obtained using the local_binary_pattern module available in the scikit-image open-source image processing library for Python [51]; a short sketch is given after the list below. The steps followed by the function are the following:

Figure 4.13. Steps followed by LBP_extractor.py.

The computation of the four methods is already implemented in the module. They represent the following [51]:

• Default: original local binary pattern, which is grayscale invariant but not rotation invariant.
• Ror: extension of the default implementation, which is grayscale and rotation invariant.
• Uniform: improved rotation invariance with uniform patterns and finer quantization of the angular space; grayscale and rotation invariant.
• Var: rotation-invariant variance measure of the contrast of local image texture; rotation invariant but not grayscale invariant.
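The sketch referenced above is the following; it continues from the rescaling sketch of section 4.1 (mammogram_small) and uses placeholder radius/neighbor values, while the combinations actually used are those listed in Figure 4.14.

from skimage.feature import local_binary_pattern

P, R = 8, 1        # placeholder number of neighbors and radius (see Figure 4.14)

lbp_default = local_binary_pattern(mammogram_small, P, R, method="default")
lbp_ror     = local_binary_pattern(mammogram_small, P, R, method="ror")
lbp_uniform = local_binary_pattern(mammogram_small, P, R, method="uniform")
lbp_var     = local_binary_pattern(mammogram_small, P, R, method="var")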

Similar to the GLCM process described in section 4.2.1, the amount of texture features is reduced by

combining them, as indicated in the following figure. It is done using the df_mother.py.


Figure 4.14. Combination of the LBP features with different parameters.

The choices of the number of neighbors and radius are diverse in the literature. However, the values selected for this project (listed in Figure 4.14) are similar to those used in other studies of mammogram classification, especially those using the BI-RADS scale [39] [17], and showed good accuracy.

It is essential to mention that the texture images obtained from the LBP (Figure 4.14) are not 8-bit images, since the local binary pattern values can grow with the number of neighbors. To display them here, they were converted to 8 bits. Therefore, some of the images above seem to have no contrast, and the averaged result appears inconsistent. For instance, the "default" ones result in an image with a black background because the colormap is adjusted to the intensity range.

The computation time for the LBP_extractor.py takes less than a second per image, making it the

fastest among the feature extraction functions developed in this project.

Eventually, 4 values that describe each pixel are obtained:

$LBP[i] = [\,LBP_1[i] \;\; LBP_2[i] \;\; LBP_3[i] \;\; LBP_4[i]\,]$   (Eq. 4.3)


4.2.4. Creating the feature dataset

Thanks to the three developed functions (GLCM_extractor.py, LAWS_extractor.py, and LBP_extractor.py), 99 texture images were extracted for every mammogram with the different parameters chosen. Henceforth, each pixel of any mammogram has 99 values describing the pixel itself and its local neighborhood (texture), which is essential for classification in machine learning. Specifically, the 99 features come from:

Table 4.3. Breakdown of features extracted.

EXTRACTOR   TIMES RUN (DIFFERENT PARAMETERS)   FEATURES PER RUN   FEATURES
GLCM        12                                 5                  60
LAWS        1                                  27                 27
LBP         3                                  4                  12
                                                                  Total = 99

After the feature combinations mentioned, the total number of features ends up being:

Table 4.4. Breakdown of features extracted after combination.

EXTRACTOR   TIMES RUN (DIFFERENT PARAMETERS)   FEATURES PER RUN   FEATURES   AFTER COMBINATION
GLCM        12                                 5                  60         15
LAWS        1                                  27                 27         27
LBP         3                                  4                  12         4
                                                                             Total = 46

Besides that, the dataset was still heavy, so actions were taken to prevent the subsequent steps from becoming too time-consuming. To overcome this issue, the 4 adjacent pixels of each texture image were averaged in blocks (pixel binning) into a super-pixel [71]. The idea is represented in Figure 4.15.


Figure 4.15. Binning process of the texture images.

Consequently, the dataset size was reduced by a factor of 4 without a relevant loss of information since

the super-pixels are essentially an average of 4 neighbor pixels.
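This binning step can be written as a one-function NumPy sketch; it assumes even image dimensions (otherwise a possible odd row or column is cropped first).

import numpy as np

def bin_2x2(texture_image):
    # Average non-overlapping 2x2 blocks into super-pixels.
    h, w = texture_image.shape
    cropped = texture_image[:h - h % 2, :w - w % 2]    # drop a possible odd row/column
    blocks = cropped.reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))                    # dataset size reduced by a factor of 4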

Afterward, all mammograms and texture images are stored in the data frame shown below. This data reorganization is done via the developed function df_mother.py. In the example below, only some rows are shown for viewing purposes, as the full data frame has thousands:

Table 4.5. Main data frame.

image_name label pixel_number image LBP0 LBP1 LBP2 LBP3 GLCM0 GLCM1 GLCM2 … GLCM14 LAWS0 LAWS1 LAWS2 … LAWS25 LAWS26

LOW_A_0357_1.LEFT_MLO_b4.png 4 122 0,20 7437,72 3,59 1,96 1289,77 59,31 0,39 5,92 -991,16 707376,79 991,16 1,6E-14 3,0E-17

LOW_A_0357_1.LEFT_MLO_b4.png 4 123 0,31 2765,65 1,72 1,36 1212,20 55,58 0,45 5,41 -1435,34 686458,71 1435,34 2,0E-14 3,7E-17

LOW_A_0357_1.LEFT_MLO_b4.png 4 124 0,29 9033,28 9032,21 4,00 1568,86 46,37 0,53 4,21 -523,69 1593386,62 1124,85 1,1E-14 2,7E-17

LOW_A_0357_1.LEFT_MLO_b4.png 4 155 0,05 19655,77 3876,17 3,92 806,76 60,43 0,33 4,77 -596,61 377708,89 596,61 3,3E-15 1,2E-17

LOW_A_0357_1.LEFT_MLO_b4.png 4 156 0,22 8236,32 3,77 2,14 1015,58 66,04 0,34 6,09 -1157,46 423724,79 1157,46 1,3E-14 2,9E-17

LOW_A_0357_1.LEFT_MLO_b4.png 4 157 0,32 2774,41 1,24 1,12 817,41 63,31 0,39 5,85 -1076,37 529554,77 1076,37 1,3E-14 2,9E-17

LOW_A_0357_1.LEFT_MLO_b4.png 4 158 0,43 164,08 0,62 0,43 1524,39 54,00 0,47 4,74 19,90 1366446,43 1004,14 2,7E-14 3,4E-17

LOW_A_0357_1.LEFT_MLO_b4.png 4 189 0,10 19648,95 3876,96 4,43 826,11 66,02 0,29 4,40 -557,67 229991,28 557,67 2,5E-15 9,8E-18

LOW_A_0357_1.LEFT_MLO_b4.png 4 190 0,18 8239,04 6,50 2,71 864,12 74,19 0,29 5,81 -757,24 110297,22 757,24 3,4E-15 1,4E-17

LOW_A_0357_1.LEFT_MLO_b4.png 4 191 0,31 3417,35 2,40 1,51 671,20 72,89 0,34 6,04 -518,36 22802,04 518,36 1,7E-15 1,1E-17

LOW_A_0357_1.LEFT_MLO_b4.png 4 192 0,38 1845,68 1,88 1,22 1192,86 63,64 0,41 5,14 76,30 553078,64 627,79 5,7E-14 4,6E-17

LOW_A_0357_1.LEFT_MLO_b4.png 4 193 0,03 20644,71 20644,71 8,78 1914,43 51,54 0,53 4,10 784,98 563471,80 864,14 1,2E-13 8,6E-17

LOW_A_0357_1.LEFT_MLO_b4.png 4 223 0,02 19653,15 3902,86 5,95 1004,26 70,11 0,28 4,02 -354,64 99404,46 354,64 2,3E-15 1,0E-17

LOW_A_0357_1.LEFT_MLO_b4.png 4 224 0,18 15307,21 17,57 5,00 956,25 81,09 0,27 5,42 -570,15 27832,92 570,15 4,5E-15 1,8E-17

LOW_A_0357_1.LEFT_MLO_b4.png 4 225 0,27 9084,50 7,28 3,30 968,15 82,07 0,30 6,15 -473,32 11070,82 473,32 3,1E-15 1,5E-17

LOW_A_0357_1.LEFT_MLO_b4.png 4 226 0,36 3486,49 3,64 2,82 1023,74 73,25 0,36 5,49 -155,15 249495,19 453,51 5,5E-14 5,0E-17

LOW_A_0357_1.LEFT_MLO_b4.png 4 227 0,32 6246,52 6133,61 2,64 1841,25 60,11 0,48 4,50 716,69 732987,62 909,97 1,7E-13 1,2E-16

LOW_A_0357_1.LEFT_MLO_b4.png 4 258 0,17 19165,89 50,13 7,67 1226,43 85,71 0,25 5,08 -484,64 41337,71 484,64 4,0E-15 1,6E-17

LOW_A_0357_1.LEFT_MLO_b4.png 4 259 0,25 9611,63 95,98 7,98 1160,94 89,96 0,27 6,14 -513,24 14499,90 513,24 2,8E-15 1,3E-17

LOW_A_0357_1.LEFT_MLO_b4.png 4 260 0,36 3677,49 53,78 6,12 1112,38 82,24 0,32 5,80 -354,64 69668,76 425,59 1,9E-14 2,8E-17

LOW_A_0357_1.LEFT_MLO_b4.png 4 261 0,41 2061,61 35,10 6,60 1909,59 68,87 0,43 4,94 448,03 744566,17 789,66 1,4E-13 1,0E-16

LOW_A_0357_1.LEFT_MLO_b4.png 4 292 0,07 20109,58 132,26 7,67 1298,55 88,83 0,23 4,83 -457,71 102240,29 457,71 1,2E-15 7,5E-18

LOW_A_0357_1.LEFT_MLO_b4.png 4 293 0,24 10506,96 101,86 7,61 1243,44 96,21 0,24 6,03 -620,80 27483,55 620,80 1,6E-15 1,0E-17



The reorganization of the images is done through a flattening process. It consists of converting a 2D

array to 1D:

$\begin{bmatrix} a_0 & a_1 & a_2 \\ a_3 & a_4 & a_5 \\ a_6 & a_7 & a_8 \end{bmatrix} \rightarrow \begin{bmatrix} a_0 & a_1 & a_2 & a_3 & a_4 & a_5 & a_6 & a_7 & a_8 \end{bmatrix}$   (Eq. 4.4)

Therefore, every mammogram and its texture images are flattened, so that each row of the table represents one pixel together with its 46 features. This is done for every mammogram, and all of them are concatenated into one single table, conceptually similar to the following figure:

Figure 4.16. Conceptual dataset.
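A minimal pandas sketch of this flattening and concatenation is given below; the variable and column names are placeholders, and the real column set is the one shown in Table 4.5.

import numpy as np
import pandas as pd

def image_to_rows(image_name, label, mammogram, texture_images):
    # One row per pixel: the pixel value plus all its texture features (Eq. 4.4).
    flat = {"image_name": image_name,
            "label": label,
            "pixel_number": np.arange(mammogram.size),
            "image": mammogram.ravel()}
    for name, tex in texture_images.items():           # e.g., {"LBP0": ..., "GLCM0": ...}
        flat[name] = tex.ravel()
    df = pd.DataFrame(flat)
    return df[df["image"] > 0]                          # background pixels are removed

# main_df = pd.concat(list_of_per_image_frames, ignore_index=True)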


The data contained in each column (Table 4.5) is described below:

• First column: mammogram file name.
• Second column: actual label assigned following the BI-RADS scale.
• Third column: mammogram pixel values.
• Fourth and onwards: texture image pixel values.

It needs to be highlighted that the background has been removed. This is why the column "pixel_number" begins at 122 instead of 0: 122 is the first pixel that contains breast tissue in that specific example. This data frame is the one used in the following steps.

4.2.5. Dense tissue segmentation

In some studies, manually selected ROIs (regions of interest) are widely used [29][18]. Essentially, this consists of cropping a section of the dense part of the mammogram (the part that varies among the different BI-RADS categories) and extracting features from it to classify the mammogram.

Figure 4.17. Example of a selection of an ROI [29].

However, to avoid manual manipulation of the images and to make the process as automated as possible, a method to segment the dense area was implemented. Different unsupervised classification methods for segmentation were tested (k-means, SVM, thresholding, and region growing). Nevertheless, the one that showed promising results with a good trade-off between performance and computational time, besides being easy to implement, was the Fuzzy C-Means algorithm. It has already been used in previous projects with good results [17][72][18]. Some segmentation outputs are shown in Figure 4.18.


Figure 4.18. Segmentation examples through FCM.

The features used for this segmentation were selected by discarding those in which the dense area was not clearly contrasted or easily identifiable from the rest of the breast. The results obtained using all the features were not appropriate:

Figure 4.19. Result of the segmentation test with all the features.


The breast was successfully divided into two separate categories (Figure 4.18, number of clusters = 2), fatty tissue and dense tissue, after removing some features that were mainly black, such as the ones below:

Figure 4.20. Examples of features discarded.

Nevertheless, in some situations, artifacts related to the preprocessing (next to where the pectoral muscle was) are present:

The case above occurred in a few images, and it is a minor error compared to the extent of the dense area. It needs to be mentioned that this is an unsupervised segmentation method intended to sample the relevant parts (the dense area) of the mammogram for the final classification. The Fuzzy C-Means (FCM) algorithm [14] is an extension of the well-known k-Means algorithm (Figure 3.15). Its main difference is that each image pattern is associated with every cluster through a fuzzy membership function [73]. In non-fuzzy clustering (also known as hard clustering), data is divided into distinct clusters, where each data point can only belong to exactly one group. In fuzzy clustering, data points can potentially belong to multiple clusters. For example, an apple can be red or green (hard clustering), but an apple can also be red and green (fuzzy clustering) [74]. Hence, FCM is a practical and concise segmentation algorithm [40] that allows checking the membership percentage of each cluster. This was valuable because the method is unsupervised: the percentage of belonging to a cluster was used as a parameter to check whether the selected features were significantly discriminating the fatty and the dense tissue. The algorithm used is the one provided by the official third-party software repository for Python [74].
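A minimal sketch of this clustering step is shown below, assuming the scikit-fuzzy package (apparently the library referenced in [74]); feature_matrix is a random placeholder standing in for the selected per-pixel texture features of one breast.

import numpy as np
import skfuzzy as fuzz

# Placeholder: rows = pixels of one breast, columns = selected texture features.
feature_matrix = np.random.rand(5000, 10)

# scikit-fuzzy expects the data with shape (n_features, n_samples).
data = feature_matrix.T

# 2 clusters (fatty vs. dense tissue), fuzziness exponent m = 2.
cntr, u, u0, d, jm, p, fpc = fuzz.cluster.cmeans(data, c=2, m=2,
                                                 error=0.005, maxiter=1000, seed=0)

membership = u.max(axis=0)     # degree of belonging to the winning cluster, per pixel
labels = u.argmax(axis=0)      # hard assignment: one cluster per pixel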

4.2.6. Classification

Right after the segmentation step, the pixels that do not belong to the dense tissue cluster are removed from the dataset (Table 4.5). Then, the remaining dense pixels can be classified according to their 46 features. The dataset was split into three parts, training, validation, and test, with the following percentages:

Figure 4.22. Data division sizes.

The accuracy using all the features was not ideal (<0.5), even though different classifiers with a good trade-off between computation and performance were tested. Therefore, a feature selection method was implemented. It consisted of correlating the 46 features of the training dataset with the actual labels and selecting only the most highly correlated ones. The following heatmap (Figure 4.23) shows the correlation between the labels and the features.


Figure 4.23. Heatmap of the correlation.

This feature selection process is done using the feature_selection.py code. The features that showed

the highest correlation of the training dataset were selected and are the following ones:

Table 4.6. Features with high correlation.

LBP2     0.078577
LBP3     0.058869
GLCM0    0.053648
GLCM2    0.086261
GLCM3    0.081330
GLCM4    0.093957
GLCM5    0.058481
GLCM7    0.109125
GLCM8    0.101990
GLCM9    0.106501
GLCM10   0.050399
GLCM12   0.106888
GLCM13   0.087769
GLCM14   0.097127
LAWS2    0.084407
LAWS5    0.054620
LAWS26   0.051825


It has to be taken into account that what is being classified are not the images but the pixels. Hence, the k-NN algorithm, already introduced in section 3.2, is used with the selected features (Table 4.6). The k-NN was chosen due to its low computational demand and its feasible training and prediction times. Furthermore, it is also recommended by scikit-learn [75] for the type of dataset used in this project. Moreover, it is one of the most used methods for texture-based image classification, not only for medical images [19] [76] but also in other fields that use texture features [77].
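A minimal scikit-learn sketch of this per-pixel classification is given below; it assumes the dense-pixel data frame built in section 4.2.4 (main_df from the earlier sketch), uses only a subset of the Table 4.6 feature names, and the split sizes are placeholders rather than the exact percentages of Figure 4.22.

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

selected = ["LBP2", "LBP3", "GLCM2", "GLCM4", "GLCM7", "LAWS2"]   # subset of Table 4.6
X = main_df[selected].values        # one row per dense pixel
y = main_df["label"].values         # BI-RADS label of the source mammogram

# Placeholder training/validation/test split (the real sizes follow Figure 4.22).
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.2, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)   # k = 3, the value chosen below
knn.fit(X_train, y_train)
val_pred = knn.predict(X_val)               # per-pixel predictions, aggregated per image later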

All the previous and subsequent steps are carried out with classifier.py. The k-NN algorithm was set with k = 3; this means that the prediction of new points is made according to the three closest neighbors (Figure 3.15). Usually, an odd k is chosen when the number of classes is even [78], which is the case here. Moreover, in studies with similar datasets and texture features, there was no significant difference in accuracy for k between 1 and 9 [79]. Since higher values mean higher computational demand, k = 3 was chosen. The predictions obtained with the trained k-NN can be seen in Table 4.7:

Table 4.7. Classified pixels of each image.

image name    actual label    dense pixels    % BI-RADS 1    % BI-RADS 2    % BI-RADS 3    % BI-RADS 4

LOW_D_4576_1.RIGHT_MLO_b1.png 1 811 71,76 18,99 8,01 1,23

LOW_pdb059ls_b1.png 1 5200 63,31 23,15 12,77 0,77

LOW_pdb070rl_b1.png 1 8014 66,47 17,43 14,99 1,11

LOW_pdb267ll_b1.png 1 7455 63,30 21,07 14,77 0,86

LOW_pdb301lm_b1.png 1 6352 66,33 23,76 9,01 0,91

LOW_pdb301lm_b1.png 1 6352 66,33 23,76 9,01 0,91

LOW_pdb305lm_b1.png 1 5656 68,00 21,53 9,95 0,51

LOW_pdb306rm_b1.png 1 5477 61,71 24,23 13,75 0,31

LOW_tdb052mlol_b1.png 1 7653 71,25 19,50 8,55 0,71

LOW_tdb052mlol_b1.png 1 7653 71,25 19,50 8,55 0,71

LOW_tdb087mlor_b1.png 1 8232 67,59 20,23 11,31 0,87

LOW_tdb087mlor_b1.png 1 8232 67,59 20,23 11,31 0,87

LOW_pdb128rm_b2.png 2 4077 48,15 30,88 18,91 2,06

LOW_pdb192rs_b2.png 2 4122 67,25 20,35 11,64 0,75

LOW_pdb202rl_b2.png 2 9450 51,21 24,85 22,77 1,17

LOW_pdb207lm_b2.png 2 3346 59,59 22,30 16,38 1,73

LOW_tdb020mlol_b3.png 3 8095 60,73 15,86 21,89 1,52

LOW_tdb050mlor_b3.png 3 3519 52,57 19,10 26,83 1,51

LOW_tdb050mlor_b3.png 3 3519 52,57 19,10 26,83 1,51

LOW_tdb020mlol_b3.png 3 8095 60,73 15,86 21,89 1,52

LOW_tdb038mlor_b3.png 3 5678 57,68 20,36 20,38 1,59

LOW_A_0254_1.RIGHT_MLO_b4.png 4 239 42,26 22,59 25,52 9,62

LOW_A_0261_1.LEFT_MLO_b4.png 4 653 43,64 17,76 29,10 9,49

LOW_B_3606_1.LEFT_MLO_b4.png 4 1176 47,36 19,13 25,77 7,74

LOW_D_4506_1.RIGHT_MLO_b4.png 4 294 48,98 25,17 18,37 7,48

LOW_pdb002rl_b4.png 4 4136 45,50 19,17 25,75 9,57

LOW_pdb172rl_b4.png 4 4045 29,62 21,11 34,78 14,49


In Table 4.7, the first column is the picture's file name, the second is the actual label (in BI-RADS), and the third is the number of dense pixels. This number depends on the size and resolution of the breast, which are not related to the BI-RADS scale. Columns four to seven represent the percentage of the dense pixels that ended up classified as BI-RADS 1, 2, 3, or 4, respectively.

Intuitively, it might seem that an image should be classified by looking at the column with the highest percentage. However, if that is done in this case, all the images end up as BI-RADS 1. Hence, to perform the final classification, two different methods were tested. The first one was to classify the images by choosing percentage thresholds that define each class. Nevertheless, the data available to find the tendency was insufficient, as the validation set was only 10% of the whole dataset, so this idea was discarded after the initial tests were not successful; moreover, it was a manual and arbitrary process.

On the other hand, optimization.py was developed. Essentially, it uses the scipy optimize module [80] to find four parameters that multiply each of the four columns containing the percentages (Table 4.7). The objective is to find the parameters that maximize the accuracy, and this was done using the classified validation data. Fundamentally, this weights the columns, giving more importance to some of them. The parameters are the following:

$w \approx [0.80,\ 2.32,\ 2.32,\ 10.91]$   (Eq. 4.5)

Therefore, the final classification can be obtained by multiplying each of the four columns by the four parameters above. The resulting label (the "Result" column in Table 4.8) corresponds to the maximum among the weighted columns. The correctly classified mammograms are highlighted in green.


Table 4.8. Final classification.

image name actual label dense W1 W2 W3 W4 Result

LOW_D_4576_1.RIGHT_MLO_b1.png 1 811 57,4 44,1 18,6 13,4 1

LOW_pdb059ls_b1.png 1 5200 50,6 51,6 29,6 8,4 2

LOW_pdb070rl_b1.png 1 8014 53,2 38,9 34,8 12,1 1

LOW_pdb267ll_b1.png 1 7455 50,6 47,0 34,3 9,4 1

LOW_pdb301lm_b1.png 1 6352 53,1 53,0 20,9 10,0 1

LOW_pdb301lm_b1.png 1 6352 53,1 53,0 20,9 10,0 1

LOW_pdb305lm_b1.png 1 5656 54,4 48,0 23,1 5,6 1

LOW_pdb306rm_b1.png 1 5477 49,4 54,0 31,9 3,4 2

LOW_tdb052mlol_b1.png 1 7653 57,0 43,5 19,8 7,7 1

LOW_tdb052mlol_b1.png 1 7653 57,0 43,5 19,8 7,7 1

LOW_tdb087mlor_b1.png 1 8232 54,1 45,1 26,2 9,5 1

LOW_tdb087mlor_b1.png 1 8232 54,1 45,1 26,2 9,5 1

LOW_pdb128rm_b2.png 2 4077 38,5 68,9 43,9 22,5 2

LOW_pdb192rs_b2.png 2 4122 53,8 45,4 27,0 8,2 1

LOW_pdb202rl_b2.png 2 9450 41,0 55,4 52,8 12,8 2

LOW_pdb207lm_b2.png 2 3346 47,7 49,7 38,0 18,9 2

LOW_D_4514_1.LEFT_MLO_b3.png 3 565 39,4 36,3 62,4 83,0 4

LOW_tdb020mlol_b3.png 3 8095 48,6 35,4 50,8 16,6 3

LOW_tdb050mlor_b3.png 3 3519 42,1 42,6 62,2 16,4 3

LOW_tdb050mlor_b3.png 3 3519 42,1 42,6 62,2 16,4 3

LOW_tdb020mlol_b3.png 3 8095 48,6 35,4 50,8 16,6 3

LOW_tdb038mlor_b3.png 3 5678 46,1 45,4 47,3 17,3 3

LOW_A_0254_1.RIGHT_MLO_b4.png 4 239 33,8 50,4 59,2 104,9 4

LOW_A_0261_1.LEFT_MLO_b4.png 4 653 34,9 39,6 67,5 103,5 4

LOW_B_3606_1.LEFT_MLO_b4.png 4 1176 37,9 42,7 59,8 84,3 4

LOW_D_4506_1.RIGHT_MLO_b4.png 4 294 39,2 56,1 42,6 81,6 4

LOW_pdb002rl_b4.png 4 4136 36,4 42,8 59,7 104,4 4

LOW_pdb172rl_b4.png 4 4045 23,7 47,1 80,7 157,9 4


The diagram below shows the steps followed to obtain the classification:

Figure 4.24. Classification process.

As shown in the diagram above, the predictions of the validation data are used to find the weights'

values. These weights are later used to classify the predictions of the test data. Hence, mammograms

are labeled depending on the maximum value (same idea shown in Table 4.8).
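One possible, hedged formulation of this weight search with scipy is sketched below; optimization.py uses the scipy optimize module, but since the exact routine is not detailed here, minimizing the negative validation accuracy with differential_evolution is only an illustrative choice on random placeholder data.

import numpy as np
from scipy.optimize import differential_evolution

# Placeholder validation data: per-class percentages (Table 4.7) and actual labels.
rng = np.random.default_rng(0)
val_percent = rng.random((30, 4)) * 100
val_labels = rng.integers(1, 5, size=30)

def negative_accuracy(w):
    # Weighted-maximum rule of Table 4.8, negated so it can be minimized.
    predicted = np.argmax(val_percent * w, axis=1) + 1
    return -np.mean(predicted == val_labels)

result = differential_evolution(negative_accuracy, bounds=[(0, 20)] * 4, seed=0)
w_opt = result.x                     # four weights playing the role of Eq. 4.5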

4.3. Deep learning implementation

Different deep learning architectures were tested, including AlexNet, Inception, ResNet50, and VGG-16. In addition, these models were pre-trained on ImageNet to mitigate overfitting. The ImageNet project is an enormous database with more than 14 million images, many of which were hand-annotated. This project has been extremely helpful in advancing computer vision and deep learning research, and the data is available for free to researchers for non-commercial use [81].

4.3.1. VGG-16

VGG-16 is a convolutional neural network model proposed by K. Simonyan and A. Zisserman from the

Visual Geometry Group (VGG) at the University of Oxford in 2014 [82], achieving 92.7% top-5 test

accuracy in ImageNet. Besides that, this architecture gave the best results among the four tested. The

preliminary performance obtained in the tests with different architectures can be found in Annex A.


The VGG-16 is depicted in the following figure:

Figure 4.25. The architecture of VGG-16 [83].

The architecture of VGG-16 is considerably simple. As represented above, it is composed of 2 contiguous blocks of 2 convolutional layers, each followed by a max-pooling. Subsequently, it has 3 contiguous blocks of 3 convolutional layers, each followed by another max-pooling. Finally, 3 FC dense layers are found before the output.

Therefore, when a mammogram is given, features are extracted in every convolutional layer, forming a feature map. Additionally, every time there is a max-pooling, the size of the data is reduced to half by downsampling the input along its spatial dimensions (height and width) and taking the maximum values. At the end, the fully connected layers are found, and the very last layer of this network has as many neurons as classes to predict.

Once the model is trained, each input mammogram will trigger and activate different neurons until one of the last ones is activated, corresponding to one of the 4 BI-RADS categories (Table 1.1).
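A brief PyTorch/torchvision sketch of this setup (an ImageNet-pretrained VGG-16 with its last layer adapted to the 4 BI-RADS classes) is shown below; it is a generic starting point, not the exact training code of [64].

import torch
from torch import nn
from torchvision import models

# ImageNet-pretrained VGG-16; recent torchvision versions use
# models.vgg16(weights="IMAGENET1K_V1") instead of pretrained=True.
model = models.vgg16(pretrained=True)

# Replace the last fully connected layer (1000 ImageNet classes) with 4 BI-RADS outputs.
model.classifier[6] = nn.Linear(in_features=4096, out_features=4)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)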


4.3.2. Data augmentation

It needs to be mentioned that an extra dataset of mammograms was needed since, typically, deep learning applications require big training datasets to achieve consistent accuracy. This additional dataset was only used in the deep learning approach. As already mentioned in section 4.1, the additional mammograms were obtained from the BCDR repository [84].

Even with the added breast images, the dataset was still insufficient for the model to train and perform correctly. Therefore, a technique known as data augmentation was implemented to enlarge the mammogram dataset. It consists of adding slightly modified copies of already existing mammograms, increasing the number of images in the dataset and reducing overfitting [85].

Figure 4.26. Example of data augmentation.

Different transformations were tested for the data augmentation, such as vertical and horizontal flipping, perspective distortions, rotations, blurs, or paddings. These data augmentation techniques can be easily implemented in Python [14] using the open-source transforms module from PyTorch [86]. Three tests were assessed, containing 6, 8, and 12 transformations each (Table 4.9), increasing the training dataset by 6, 8, and 12 times, respectively.


The transformations performed in each test are listed in the following table:

Table 4.9. Data augmentation methods for each test.

AUG1: 1. Horizontal flip; 2. Vertical flip; 3. Padding; 4. Random perspective distortion; 5. Random affine; 6. Random rotation.

AUG2: 1. Horizontal flip; 2. Vertical flip; 3. Padding; 4. Random affine; 5. Random perspective distortion; 6. Random rotation; 7. Color jitter; 8. Gaussian blur.

AUG3: 1. Horizontal flip; 2. Vertical flip; 3. Padding; 4. Random affine; 5. Random rotation; 6. Color jitter; 7. Random perspective distortion; 8. Random rotation with expansion; 9. Random resized crop; 10. Center crop; 11. Resize; 12. Gaussian blur.
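As an illustration, the AUG1 set could be expressed with torchvision.transforms as follows; the operations mirror Table 4.9, but the specific parameter values are placeholders, not the ones used in the project.

from torchvision import transforms

aug1 = [
    transforms.RandomHorizontalFlip(p=1.0),
    transforms.RandomVerticalFlip(p=1.0),
    transforms.Pad(padding=30),
    transforms.RandomPerspective(distortion_scale=0.3, p=1.0),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
    transforms.RandomRotation(degrees=15),
]

# Applying each transform to every training mammogram yields one extra copy,
# so the 6 operations of AUG1 enlarge the training set 6 times.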

4.3.3. Training and Learning curves

The model was evaluated with the three training datasets obtained using the different data augmentation methods (Table 4.9).

The training set was composed of 1200 mammograms (90% of the dataset). Hence, after the data augmentation step, test 1 had 7200 mammograms, test 2 had 9600, and test 3 had 14,400. The learning curves are depicted in the following three figures (Figure 4.27, Figure 4.28, and Figure 4.29); the validation was done with the remaining 134 mammograms (10% of the dataset).


Figure 4.27. AUG1 Loss vs. Epoch.

Figure 4.28. AUG2 Loss vs. Epoch.


Figure 4.29. AUG3 Loss vs. Epoch.

In Figure 4.27, Figure 4.28, and Figure 4.29, the red dotted line marks the point where the validation loss was minimal. It is the point where the early stop was performed (described in section 3.3), meaning that the error when classifying the mammograms of the validation dataset was minimal at that instant. Overall, the loss is similar across all the tests. However, it can be seen that the more extensive the dataset, the earlier (in fewer epochs) the minimum validation loss is reached.

4.3.4. Interpreting the model performance

In section 2, it was shown that deep learning has achieved unprecedented accuracy in medical image classification. However, one of the biggest problems humans encounter with deep learning is model interpretability; in other words, understanding the model as we do in machine learning, where it is possible to tear apart all the steps and comprehend them.

Therefore, in this section, a widely used technique that makes CNN-based models more understandable is presented: Gradient-weighted Class Activation Mapping (Grad-CAM) [87]. It takes the class-specific gradient (weight) information of the final convolutional layer to produce a localization map (an image) of the regions of the input mammogram that are most relevant for triggering the neurons that classify it into one category or another.
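A compact PyTorch sketch of the Grad-CAM mechanism for the VGG-16 defined earlier is given below; the hooked layer index (the last convolutional layer of torchvision's VGG-16) and the input tensor x are assumptions for illustration, not the exact implementation of [87] used in the project.

import torch
import torch.nn.functional as F

stored = {}

def forward_hook(module, inputs, output):
    stored["activations"] = output.detach()
    output.register_hook(lambda grad: stored.update(gradients=grad.detach()))

handle = model.features[28].register_forward_hook(forward_hook)   # last conv layer of VGG-16

model.eval()
logits = model(x)                          # x: preprocessed mammogram, shape (1, 3, H, W)
target_class = logits.argmax(dim=1).item()
model.zero_grad()
logits[0, target_class].backward()         # class-specific gradients

weights = stored["gradients"].mean(dim=(2, 3), keepdim=True)          # pooled gradients
cam = torch.relu((weights * stored["activations"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)              # heatmap in [0, 1]
handle.remove()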

The Grad-CAMs obtained after training the VGG-16 for 25 epochs with training dataset test 3 are shown in Figure 4.30. The heatmaps indicate that the dense areas of the breast are the ones to which the neural network has given more importance, confirming that the model is suitable, since mammograms are being classified depending on the dense area, similarly to the process physicians follow when classifying into the BI-RADS categories (section 1.4). Actually, for BI-RADS 3 and 4, the entire dense area is highlighted by the Grad-CAM, comparable to the segmentation done in the machine learning approach (section 4.2.5):

Figure 4.30. Confusion matrix for training dataset (test 1, 25 epochs). Image and Grad-CAM.


5. Discussion

In this section, the results obtained from the machine learning and deep learning approach are

compared and discussed. The tool used to assess the results and performance of the two approaches

is the confusion matrix. Each row of the matrix represents the relative occurrences in a real class, while

each column represents the relative occurrences in a predicted class. Therefore, the higher the

diagonal values of the confusion matrix, the more correct predictions [88].
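Such a row-normalized confusion matrix can be computed with scikit-learn as sketched below; the label arrays are placeholders.

import numpy as np
from sklearn.metrics import confusion_matrix

# Placeholder test labels and predictions (BI-RADS 1-4).
y_true = np.array([1, 2, 3, 4, 1, 2, 3, 4, 1, 3])
y_pred = np.array([1, 1, 3, 4, 2, 2, 4, 4, 1, 3])

# normalize="true" makes each row show the relative occurrences within the real class.
cm = confusion_matrix(y_true, y_pred, labels=[1, 2, 3, 4], normalize="true")
print(cm)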

Firstly, the machine learning approach is analyzed. The resulting confusion matrix is the following:

Figure 5.1. Confusion matrix of the ML approach.

By looking at Figure 5.1, it can be stated that the approach cannot efficiently differentiate between BI-RADS 1 and 2, nor between BI-RADS 3 and 4. However, the accuracy for the grouped classes is relevant (Figure 5.2).


Figure 5.2. Binned confusion matrix of the ML approach.

In fact, the ML model can differentiate between benign and malignant mammograms since, by definition, only BI-RADS 3 and 4 are likely to be cancerous (Table 1.1). In real clinical practice, out of the four BI-RADS breast density categories, it is relatively easy to discriminate between "entirely fatty" and "extremely dense" by visual assessment [53]. In those situations, physicians are comfortable and confident making decisions without needing assistance. However, it is challenging for them to differentiate between the two intermediate categories. Hence, this is an interesting approach that could help physicians generate a prediction and improve the determination of a BI-RADS breast density category. The results indicate that malignant breast tissue could be identified in 91% of the cases. However, 22% of the BI-RADS 1 and 2 mammograms are false positives, meaning that (without human validation) the model would classify almost a quarter of them wrongly. In any case, a low false-negative rate (only 0.9%) is the more reassuring figure: false positives are less concerning than false negatives, which would erroneously label patients as healthy.

On the other hand, three different training datasets with different rates of data augmentation were used for the deep learning approach (as explained in section 4.3.2). The one with the largest data augmentation (AUG3) gave the best results; the confusion matrices of the other two can be found in Annex B.
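For illustration, a data augmentation pipeline of this kind can be written with torchvision.transforms [86]; the specific transforms and parameters below are assumptions and do not necessarily match those used for AUG1, AUG2, and AUG3:

```python
from torchvision import transforms

# Hypothetical augmentation pipeline: random flips, small rotations and slight
# intensity changes applied on the fly to every training mammogram (RGB input).
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=10),
    transforms.ColorJitter(brightness=0.1, contrast=0.1),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# The validation images are only resized and converted, never augmented.
val_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
```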


The resulting confusion matrix for the deep learning approach is the following:

Figure 5.3. Confusion matrix of the DL approach (AUG3).

Comparing Figure 5.1 and Figure 5.3, it can be seen that the deep learning approach obtained far better results. Indeed, looking at the diagonal of the confusion matrix in Figure 5.3, the DL model discriminates significantly better between the four BI-RADS categories than the ML approach.

In addition, if the confusion matrix is binned (Figure 5.4), the results indicate that malignant breast tissue could be identified in 96% of the cases and benign tissue in 91% of the cases, considerably reducing the number of false positives with respect to the machine learning approach.


Figure 5.4. Binned Confusion Matrix of the DL approach (AUG3).

It should be mentioned that the deep learning models trained with the other two datasets (AUG1 and AUG2) had a lower global accuracy (Annex B). However, the one trained with AUG1 showed an outstanding accuracy of 99% for the grouped categories BI-RADS 3 and 4.

Therefore, even with a small mammogram dataset, deep learning has shown significantly better results than the machine learning approach developed in this project. Besides that, human intervention in the deep learning approach has been minimal, whereas in ML every step needed to be programmed and its correct performance assessed as well. Furthermore, the texture feature extraction process took 10 times longer than the training of the DL approach, not only with the VGG-16 but also with the other deep learning models tested. Fundamentally, this is because the architecture of a CNN allows the computation to be parallelized, performing in parallel simple calculations that would otherwise require an extensive amount of time.

In summary, the deep learning approach has shown promising potential: better results, superior computational performance and, most importantly, less human intervention, which makes the implementation more straightforward and less error-prone.


6. Environmental impact

The environmental impact could be debated at length by evaluating all the parts that have been involved, directly or indirectly, in the development of this project, resulting in many lines of discussion. However, this whole project is a set of codes covering the different steps, entirely developed in the Spyder [89] and Colab [90] environments in the programming language Python 3 [14]. Therefore, there is no tangible item manufactured or produced. Furthermore, even though the radiological images have a substantial associated energy cost and the acquisition system has a considerable environmental impact, their usage is ethically and economically justified (section 1). That said, the image acquisition process is out of the scope of this report; only the CO2 generated by the computer's electricity consumption is considered for the environmental impact.

Basically, as stated in section 1.6, approximately 18 weeks were employed to develop the project.

Considering an average of 35 hours of work per week using a computer, it can be estimated that

electricity was consumed for a total of 630 hours plus the almost 250 hours required for the extraction

of texture features, ML model training, and other tests carried out. This part was done using a Dell XPS

13 9300 with an average consumption of 33 W, according to the computer's specifications [91].

On the other hand, training the DL model and obtaining its results required an execution time of 30 hours in Colab [90], using an NVIDIA Tesla T4 GPU with a consumption of 70 W [92].

Therefore, the total consumption of the computer for the whole project is 31.1 kWh:

Energy consumption = (630 h + 250 h) · 33 W + 30 h · 70 W = 31,140 Wh ≈ 31.1 kWh        (Eq. 6.1)

In Catalonia, it is estimated that each kWh produced generates 321 g of CO2 [93]. Hence, assuming that the Colab [90] machine was located in Catalonia, the development of this project has an associated carbon footprint of 10 kg of CO2:

CO2 produced = 31.1 kWh · 321 g CO2/kWh ≈ 10 kg CO2        (Eq. 6.2)
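The figures in Eq. 6.1 and Eq. 6.2 can be reproduced with a few lines of Python:

```python
# Reproduce Eq. 6.1 and Eq. 6.2 with the figures given in this section.
laptop_hours = 630 + 250      # h of work plus feature extraction / ML training
laptop_power = 33             # W, Dell XPS 13 9300 average consumption
gpu_hours = 30                # h of training on Colab
gpu_power = 70                # W, NVIDIA Tesla T4

energy_kwh = (laptop_hours * laptop_power + gpu_hours * gpu_power) / 1000
co2_kg = energy_kwh * 321 / 1000   # 321 g CO2 per kWh in Catalonia

print(f"{energy_kwh:.1f} kWh, {co2_kg:.1f} kg CO2")   # -> 31.1 kWh, 10.0 kg CO2
```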


Conclusions

In this project, the inherent necessity to develop a tool to assist physicians in the BI-RADS classification task and, consequently, in breast cancer diagnosis has been explained. Briefly, this is a result of the current diagnosis and screening method being error-prone. This fact is very relevant to today's clinical practice, since preventing these errors could save many lives. Therefore, evaluating and comparing the suitability of implementing deep learning or machine learning for mammogram classification is this project's main objective. To achieve it, two independent models (one DL and one ML) were fully implemented. The final models presented were chosen after reviewing the literature, assessing the different possibilities, and testing them. Eventually, the two presented were the ones that were most suitable and gave the best preliminary results given the resources available (database, hardware, and time).

However, even though the two approaches were implemented and are functional, the most relevant

thing to examine is not the results obtained with each approach. Instead, the most thought-provoking

points to analyze here are the differences observed while implementing one or the other.

For the ML approach, the features needed to be extracted manually. In other words, the features to extract (texture) had to be chosen, and the extraction algorithm had to be programmed and developed, requiring human effort and knowledge. In addition, traditional techniques rely on many simple operations computed in series (one after the other), since parallelizing them requires an additional programming effort. Therefore, the feature extraction process in machine learning is time-consuming.

Moreover, besides extracting the features, it is also fundamental to check whether they actually differentiate the categories efficiently. Many of the features can introduce noise or useless information, increasing the computational cost and affecting the accuracy. Hence, another step that assesses the feature performance must be included, possibly leading to another manual process of feature selection. Finally, after choosing the classification algorithm, the feature dataset needs to be arranged in a form the classification algorithm can interpret; only then is the model trained and ready. A minimal sketch of this kind of pipeline is given below.
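For illustration, the sketch below outlines a pipeline of that kind (GLCM texture features, a feature-selection step, and a k-NN classifier) using scikit-image and scikit-learn; the feature set, the selection criterion, the value of k, and the random stand-in data are assumptions, not the exact configuration developed in this project:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def glcm_features(image, levels=64):
    """Haralick-style texture features from a grey-level co-occurrence matrix."""
    img = (image / image.max() * (levels - 1)).astype(np.uint8)
    glcm = graycomatrix(img, distances=[1], angles=[0, np.pi / 2],
                        levels=levels, symmetric=True, normed=True)
    props = ["contrast", "homogeneity", "energy", "correlation"]
    return np.hstack([graycoprops(glcm, p).ravel() for p in props])

# X: one feature vector per (segmented) mammogram region, y: BI-RADS label (1-4).
# Random data stands in here for the real feature dataset.
rng = np.random.default_rng(0)
images = rng.random((40, 128, 128))
y = rng.integers(1, 5, size=40)
X = np.array([glcm_features(im) for im in images])

clf = make_pipeline(StandardScaler(),
                    SelectKBest(f_classif, k=4),       # keep the most discriminative features
                    KNeighborsClassifier(n_neighbors=5))
clf.fit(X, y)
print(clf.predict(X[:3]))
```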

On the other hand, the main drawback of deep learning models is that they usually need more data for efficient training. Nevertheless, this was overcome using data augmentation techniques, a quick and simple step. Afterwards, the mammograms were used directly to train the model which, connected to a GPU, took less than 30 min to train. The short time is due to the fact that the computation is parallelized, optimizing the time. This allowed the complete assessment and testing of different architectures and data augmentation methods, along the lines of the sketch shown below.
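As a minimal sketch of this workflow (not the exact training script of the project), a pretrained VGG-16 can be adapted to the four BI-RADS classes and trained on a GPU as follows; the optimizer, learning rate, and the `loader` of augmented mammograms are assumed placeholders:

```python
import torch
from torch import nn, optim
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"

# Start from a VGG-16 pretrained on ImageNet and replace the last fully connected
# layer by a 4-class head for the BI-RADS categories.
model = models.vgg16(pretrained=True)
model.classifier[6] = nn.Linear(4096, 4)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)   # illustrative learning rate

def train_one_epoch(loader):
    """One pass over a DataLoader yielding (augmented mammogram, label 0-3) batches."""
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```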


In summary, the results were considerably better for the deep learning approach and outstanding when the categories were grouped. However, the truly significant added value that DL offers is that the implementation is far more straightforward: it is as simple as inputting the mammograms to train the model. Additionally, the training and automatic feature extraction were done in less than 30 min, while in the machine learning approach the extraction step took more than 200 hours. This is not a direct limitation in itself, since once the model is trained it is ready to be used, were it ever implemented in an application. However, every time a new mammogram had to be classified, its texture features would need to be extracted, taking more than 5 minutes. In contrast, the deep learning approach can classify a new mammogram in a matter of seconds, making it more commercially viable.

In conclusion, with the results obtained, the fast and straightforward implementation, and its viability, the deep learning approach is a promising candidate for further improvement. Therefore, the main objective of this project is achieved, concluding and justifying that the path to follow in future works is deep learning.

Nevertheless, in the approach presented in this project, the accuracy in distinguishing between the four classes was not optimal; therefore, there is still scope for improvement. Data augmentation, dropout, and varying the learning rate were attempted to increase the accuracy, but the overfitting was not reduced. That said, in future works, adding more images to the training could be key. With a larger dataset, not only would the accuracy improve, but deeper architectures could also be implemented and tested (deeper architectures overfit easily if the dataset is not large enough). Finally, an intuitive graphical user interface (GUI) for the deep learning model could be developed, making its implementation and actual use in clinical practice possible.


Budget

In this section, the hypothetical cost of developing this project is described. The budget is divided into two primary sources of expenses: the cost of the personnel involved and the cost of the materials used.

Personnel cost

The cost of hiring a junior engineer to carry out the tasks listed in Figure 1.3, corresponding to the project's development, is given in Table 0.1. It is estimated that the average salary for that position is 13 € per hour [90]. Considering a 7-hour workday and 18 weeks, this results in 630 hours of work, adding up to a cost of 8190 €.

Table 0.1. Cost for the personnel work.

Task                                     Working hours   Cost (€)
First meetings with the tutor                        5         65
Planning                                            12        156
Introduction to the topic                           18        234
Delve deeper into Python                            15        195
Bibliographic research                              50        650
Implementation of the extractors                    40        520
Feature extraction                                  80       1040
Store information in the dataset                    40        520
Dense segmentation                                  55        715
Model selection and classification                  70        910
Meeting: revision and improvements                   5         65
Writing                                             30        390
First contact with deep learning                    14        182
Literature review                                   10        130
Meeting with ITM                                     1         13
Implementation of DL to the dataset                 70        910
Obtaining results                                   20        260
Discussion                                          15        195
Writing                                             45        585
Final review                                        35        455
Total                                            630 h     8190 €


Materials cost

The expenses related to the hardware and software licenses used throughout the development of the

project are estimated in the following table:

Table 0.2. Cost for the materials used.

Item                       Cost (€)
Hardware
  Laptop                      1,499
  External hard drive            70
Software
  Microsoft Office               35
  Colab Pro                       9
Total                       1,613 €

To summarize, the project's final cost amounts to 8190 + 1613 = 9803 €, without considering the cost of having someone supervise the project and its actual implementation.


Bibliography

[1] H. Sung et al., “Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,” CA. Cancer J. Clin., p. caac.21660, Feb. 2021, doi: 10.3322/caac.21660.

[2] O. Wyman, “El impacto económico y social del cáncer en España,” 2020. [Online]. Available: https://www.aecc.es/sites/default/files/content-file/Informe-Los-costes-cancer.pdf.

[3] T. Winslow, “National Cancer Institute - Breast Cancer Screening (PDQ®)–Patient Version,” 2013. https://www.cancer.gov/types/breast/patient/breast-screening-pdq (accessed May 03, 2021).

[4] E. Morris, S. A. Feig, M. Drexler, and C. Lehman, “Implications of Overdiagnosis: Impact on Screening Mammography Practices,” Population health management, vol. 18, no. Suppl 1. Mary Ann Liebert, Inc., pp. S3–S11, Sep. 01, 2015, doi: 10.1089/pop.2015.29023.mor.

[5] “Definition of metastatic - NCI Dictionary of Cancer Terms - National Cancer Institute.” https://www.cancer.gov/publications/dictionaries/cancer-terms/def/metastatic (accessed May 01, 2021).

[6] “What Is Breast Cancer? | CDC,” September 14, 2020. https://www.cdc.gov/cancer/breast/basic_info/what-is-breast-cancer.htm (accessed Jan. 23, 2021).

[7] M. Broeders et al., “The impact of mammographic screening on breast cancer mortality in Europe: A review of observational studies,” J. Med. Screen., vol. 19, no. SUPPL. 1, pp. 14–25, Sep. 2012, doi: 10.1258/jms.2012.012078.

[8] P. Autier and M. Boniol, “Mammography screening: A major issue in medicine,” Eur. J. Cancer, vol. 90, pp. 34–62, Feb. 2018, doi: 10.1016/j.ejca.2017.11.002.

[9] “What Is a Mammogram? | CDC,” Centers for Disease Control and Prevention. https://www.cdc.gov/cancer/breast/basic_info/mammograms.htm (accessed Apr. 24, 2021).

[10] D. R. DANCE, “PHYSICAL PRINCIPLES OF MAMMOGRAPHY,” in Physics for Medical Imaging Applications, Springer Netherlands, 2007, pp. 355–365.

[11] D. Ribli, A. Horváth, Z. Unger, P. Pollner, and I. Csabai, “Detecting and classifying lesions in mammograms with Deep Learning,” doi: 10.1038/s41598-018-22437-z.

[12] F. P. Kestelman et al., “BREAST IMAGING REPORTING AND DATA SYSTEM-BI-RADS®: POSITIVE PREDICTIVE VALUE OF CATEGORIES 3, 4 AND 5. A SYSTEMATIC LITERATURE REVIEW*,” 2007.

[13] K. Pesce, M. B. Orruma, C. Hadad, Y. B. Cano, R. Secco, and A. Cernadas, “BI-RADS terminology for mammography reports: What residents need to know,” Radiographics, vol. 39, no. 2. Radiological Society of North America Inc., pp. 319–320, Mar. 01, 2019, doi: 10.1148/rg.2019180068.


[14] “Welcome to Python.org.” https://www.python.org/ (accessed Jun. 08, 2021).

[15] “MATLAB - El lenguaje del cálculo técnico - MATLAB & Simulink.” https://es.mathworks.com/products/matlab.html (accessed Jun. 08, 2021).

[16] E. T. Pereira, S. P. Eleutério, and J. M. Carvalho, “Local Binary Patterns Applied to Breast Cancer Classification in Mammographies,” Rev. Informática Teórica e Apl., vol. 21, no. 2, p. 32, Nov. 2014, doi: 10.22456/2175-2745.46848.

[17] C. Mata, J. Freixenet, X. Lladó, and A. Oliver, “Texture Descriptors applied to Digital Mammography,” 2008.

[18] R. Rabidas, A. Midya, J. Chakraborty, and W. Arif, “A Study of Different Texture Features Based on Local Operator for Benign-malignant Mass Classification,” Procedia Comput. Sci., vol. 93, no. September, pp. 389–395, 2016, doi: 10.1016/j.procs.2016.07.225.

[19] P. Sonar, U. Bhosle, and C. Choudhury, “Mammography classification using modified hybrid SVM-KNN,” in Proceedings of IEEE International Conference on Signal Processing and Communication, ICSPC 2017, Mar. 2018, vol. 2018-January, pp. 305–311, doi: 10.1109/CSPC.2017.8305858.

[20] A. K. Mohanty, S. Beberta, and S. K. Lenka, “Classifying Benign and Malignant Mass using GLCM and GLRLM based Texture Features from Mammogram,” Int. J. Eng. Res. Appl., vol. 1, no. 3, pp. 687–693, 2011.

[21] T. Sadad, A. Munir, T. Saba, and A. Hussain, “Fuzzy C-means and region growing based classification of tumor from mammograms using hybrid texture feature,” J. Comput. Sci., vol. 29, pp. 34–45, 2018, doi: 10.1016/j.jocs.2018.09.015.

[22] S. J. S. Gardezi and I. Faye, “Fusion of completed local binary pattern features with curvelet features for mammogram classification,” Appl. Math. Inf. Sci., vol. 9, no. 6, pp. 3037–3048, 2015, doi: 10.12785/amis/090633.

[23] A. C. Phadke and P. P. Rege, “Fusion of local and global features for classification of abnormality in mammograms,” Sadhana - Academy Proceedings in Engineering Sciences, vol. 41, no. 4. pp. 385–395, 2016, doi: 10.1007/s12046-016-0482-y.

[24] C. Wang, A. R. Brentnall, J. Cuzick, E. F. Harkness, D. G. Evans, and S. Astley, “A novel and fully automated mammographic texture analysis for risk prediction: Results from two case-control studies,” Breast Cancer Res., vol. 19, no. 1, pp. 1–13, Oct. 2017, doi: 10.1186/s13058-017-0906-6.

[25] A. Manduca et al., “Texture features from mammographic images and risk of breast cancer,” Cancer Epidemiol. Biomarkers Prev., vol. 18, no. 3, pp. 837–845, Mar. 2009, doi: 10.1158/1055-9965.EPI-08-0631.

[26] R. Nithya and B. Santhi, “Application of texture analysis method for mammogram density classification,” J. Instrum., vol. 12, no. 07, pp. P07009--P07009, Jul. 2017, doi: 10.1088/1748-0221/12/07/p07009.


[27] Kriti and J. Virmani, “Breast density classification using Laws’ mask texture features,” Int. J. Biomed. Eng. Technol., vol. 19, no. 3, 2015, doi: 10.1504/IJBET.2015.072999.

[28] A. H. Farhan and M. Y. Kamil, “Texture Analysis of Breast Cancer via LBP, HOG, and GLCM techniques,” IOP Conf. Ser. Mater. Sci. Eng., vol. 928, no. 7, 2020, doi: 10.1088/1757-899X/928/7/072098.

[29] A. S. Setiawan, Elysia, J. Wesley, and Y. Purnama, “Mammogram Classification using Law’s Texture Energy Measure and Neural Networks,” Procedia Comput. Sci., vol. 59, no. Iccsci, pp. 92–97, 2015, doi: 10.1016/j.procs.2015.07.341.

[30] A. Gastounioti, A. Oustimov, M. K. Hsieh, L. Pantalone, E. F. Conant, and D. Kontos, “Using Convolutional Neural Networks for Enhanced Capture of Breast Parenchymal Complexity Patterns Associated with Breast Cancer Risk,” Acad. Radiol., vol. 25, no. 8, pp. 977–984, Aug. 2018, doi: 10.1016/j.acra.2017.12.025.

[31] M. M. Jadoon, Q. Zhang, I. U. Haq, S. Butt, and A. Jadoon, “Three-Class Mammogram Classification Based on Descriptive CNN Features,” 2017, doi: 10.1155/2017/3640901.

[32] R. Arora, P. K. Rai, and B. Raman, “Deep feature–based automatic classification of mammograms,” Med. Biol. Eng. Comput., vol. 58, no. 6, pp. 1199–1211, Jun. 2020, doi: 10.1007/s11517-020-02150-8.

[33] G. Altan, “Deep learning-based mammogram classification for breast cancer,” Int. J. Intell. Syst. Appl. Eng., vol. 8, no. 4, pp. 171–176, Dec. 2020, doi: 10.18201/ijisae.2020466308.

[34] Y. J. Suh, J. Jung, and B. J. Cho, “Automated breast cancer detection in digital mammograms of various densities via deep learning,” J. Pers. Med., vol. 10, no. 4, pp. 1–11, 2020, doi: 10.3390/jpm10040211.

[35] L. Shen, L. R. Margolies, J. H. Rothstein, E. Fluder, R. McBride, and W. Sieh, “Deep Learning to Improve Breast Cancer Detection on Screening Mammography,” Sci. Rep., vol. 9, no. 1, pp. 1–12, Dec. 2019, doi: 10.1038/s41598-019-48995-4.

[36] A. A. Mohamed, W. A. Berg, H. Peng, Y. Luo, R. C. Jankowitz, and S. Wu, “A deep learning method for classifying mammographic breast density categories,” Med. Phys., vol. 45, no. 1, pp. 314–321, Jan. 2018, doi: 10.1002/mp.12683.

[37] D. Wang, A. Khosla, R. Gargeya, H. Irshad, and A. H. Beck, “Deep Learning for Identifying Metastatic Breast Cancer,” Jun. 2016, Accessed: Jun. 08, 2021. [Online]. Available: http://arxiv.org/abs/1606.05718.

[38] A. P. Adedigba, S. A. Adeshinat, and A. M. Aibinu, “Deep learning-based mammogram classification using small dataset,” in 2019 15th International Conference on Electronics, Computer and Computation, ICECCO 2019, Dec. 2019, doi: 10.1109/ICECCO48375.2019.9043186.

[39] C. M. Miquel, S. J. Freixenet, and X. Lladó, “MSc. Thesis VIBOT: Texture Descriptors applied to Digital Mammography,” 2009.


[40] B. A. Jenkins and E. A. Lumpkin, “Developing a sense of touch,” Dev., vol. 144, no. 22, pp. 4048–4090, Nov. 2017, doi: 10.1242/dev.120402.

[41] L. Armi and S. Fekri-Ershad, “Texture image analysis and texture classification methods - A review,” arXiv, vol. 2, no. 1, pp. 1–29, 2019.

[42] R. M. Haralick, I. Dinstein, and K. Shanmugam, “Textural Features for Image Classification,” IEEE Trans. Syst. Man Cybern., vol. SMC-3, no. 6, pp. 610–621, 1973, doi: 10.1109/TSMC.1973.4309314.

[43] “Co-occurrence matrix - Wikipedia.” https://en.wikipedia.org/wiki/Co-occurrence_matrix (accessed May 16, 2020).

[44] S. Van Der Walt et al., “Scikit-image: Image processing in python,” PeerJ, vol. 2014, no. 1, 2014, doi: 10.7717/peerj.453.

[45] Apple, “Blurring an Image | Apple Developer Documentation.” https://developer.apple.com/documentation/accelerate/blurring_an_image (accessed May 31, 2021).

[46] T. Kimpe and T. Tuytschaever, “Increasing the number of gray shades in medical display systems - How much is enough?,” J. Digit. Imaging, vol. 20, no. 4, pp. 422–432, Dec. 2007, doi: 10.1007/s10278-006-1052-3.

[47] S. H. Kim, J. H. Lee, B. Ko, and J. Y. Nam, “X-ray image classification using Random Forests with Local Binary Patterns,” in 2010 International Conference on Machine Learning and Cybernetics, ICMLC 2010, 2010, vol. 6, pp. 3190–3194, doi: 10.1109/ICMLC.2010.5580711.

[48] Z. Jun, H. Jizhao, T. Zhenglan, and W. Feng, “Face detection based on LBP,” in ICEMI 2017 - Proceedings of IEEE 13th International Conference on Electronic Measurement and Instruments, Jul. 2017, vol. 2018-January, pp. 421–425, doi: 10.1109/ICEMI.2017.8265841.

[49] L. Armi and S. Fekri-Ershad, “Texture image analysis and texture classification methods - A review,” no. April, 2019, [Online]. Available: http://arxiv.org/abs/1904.06554.

[50] “Local binary patterns - Wikipedia.” https://en.wikipedia.org/wiki/Local_binary_patterns (accessed May 24, 2021).

[51] T. Ojala, M. Pietikäinen, and T. Mäenpää, “Gray scale and rotation invariant texture classification with local binary patterns,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2000, vol. 1842, pp. 404–420, doi: 10.1007/3-540-45054-8_27.

[52] P. Ongsulee, “Artificial intelligence, machine learning and deep learning,” in International Conference on ICT and Knowledge Engineering, Jan. 2018, pp. 1–6, doi: 10.1109/ICTKE.2017.8259629.

[53] A. A. Mohamed, W. A. Berg, H. Peng, Y. Luo, R. C. Jankowitz, and S. Wu, “A deep learning method for classifying mammographic breast density categories,” Med. Phys., vol. 45, no. 1, pp. 314–321, Jan. 2018, doi: 10.1002/mp.12683.


[54] Y. Bengio, A. Courville, and P. Vincent, “Representation Learning: A Review and New Perspectives,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1798–1828, Jun. 2012, Accessed: Jun. 11, 2021. [Online]. Available: http://arxiv.org/abs/1206.5538.

[55] IBM Cloud Education, “What are Neural Networks? | IBM,” IBM, 2020. https://www.ibm.com/cloud/learn/neural-networks (accessed Jun. 12, 2021).

[56] “Learn Intro to Deep Learning Tutorials | Kaggle.” https://www.kaggle.com/learn/intro-to-deep-learning (accessed Jun. 13, 2021).

[57] S. Hahn and H. Choi, “Understanding dropout as an optimization trick,” Neurocomputing, vol. 398, pp. 64–70, Jul. 2020, doi: 10.1016/j.neucom.2020.02.067.

[58] J. Ren, M. Green, and X. Huang, “From traditional to deep learning: Fault diagnosis for autonomous vehicles,” in Learning Control, Elsevier, 2021, pp. 205–219.

[59] “Basic CNN Architecture: Explaining 5 Layers of Convolutional Neural Network | upGrad blog.” https://www.upgrad.com/blog/basic-cnn-architecture/ (accessed Jun. 13, 2021).

[60] “1.5. Stochastic Gradient Descent — scikit-learn 0.24.2 documentation.” https://scikit-learn.org/stable/modules/sgd.html (accessed Jun. 12, 2021).

[61] Y. Lecun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553. Nature Publishing Group, pp. 436–444, May 27, 2015, doi: 10.1038/nature14539.

[62] L. Deng and D. Yu, “Deep learning: Methods and applications,” Foundations and Trends in Signal Processing, vol. 7, no. 3–4. Now Publishers Inc, pp. 197–387, Jun. 30, 2013, doi: 10.1561/2000000039.

[63] “Ignaciomoragues/TFE — Repository.” https://github.com/Ignaciomoragues/TFE (accessed Jun. 16, 2021).

[64] “EIC Faculty | Tecnológico de Monterrey en Guadalajara.” https://gda.itesm.mx/faculty/en/professors/gilberto-ochoa-ruiz (accessed Jun. 13, 2021).

[65] “Mammographic Image Analysis Homepage - Databases.” https://www.mammoimage.org/databases/ (accessed Jun. 08, 2021).

[66] “Breast Cancer Digital Repository.” https://bcdr.eu/ (accessed Jun. 08, 2021).

[67] A. Oliver, “Automatic mass segmentation in mammographic images, PhD Thesis,” University of Girona, 2008.

[68] “Module: transform — skimage v0.19.0.dev0 docs.” https://scikit-image.org/docs/dev/api/skimage.transform.html#skimage.transform.rescale (accessed May 26, 2021).

[69] “Module: feature — skimage v0.19.0.dev0 docs.” https://scikit-image.org/docs/dev/api/skimage.feature.html#skimage.feature.graycomatrix (accessed May 26, 2021).


[70] L. Putzu and C. Di Ruberto, “Rotation invariant co-occurrence matrix features,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2017, vol. 10484 LNCS, pp. 391–401, doi: 10.1007/978-3-319-68560-1_35.

[71] Z. W. Pan, H. L. Shen, C. Li, S. J. Chen, and J. H. Xin, “Fast Multispectral Imaging by Spatial Pixel-Binning and Spectral Unmixing,” IEEE Trans. Image Process., vol. 25, no. 8, pp. 3612–3625, Aug. 2016, doi: 10.1109/TIP.2016.2576401.

[72] A. Torrent et al., “Breast Density Segmentation: A Comparison of Clustering and Region Based Techniques,” in Digital Mammography, 2008, pp. 9–16.

[73] A. Torrent et al., “Breast density segmentation: A comparison of clustering and region based techniques,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2008, vol. 5116 LNCS, pp. 9–16, doi: 10.1007/978-3-540-70538-3_2.

[74] M. Dias, A. Florêncio, and dirk, “omadson/fuzzy-c-means: v1.4.0,” May 2021, doi: 10.5281/ZENODO.4747689.

[75] “Fuzzy clustering - Wikipedia.” https://en.wikipedia.org/wiki/Fuzzy_clustering (accessed Jun. 04, 2021).

[76] V. Wasule and P. Sonar, “Classification of brain MRI using SVM and KNN classifier,” in Proceedings of 2017 3rd IEEE International Conference on Sensing, Signal Processing and Security, ICSSS 2017, Oct. 2017, pp. 218–223, doi: 10.1109/SSPS.2017.8071594.

[77] D. S. Guru and S. Manjunath, “Texture Features and KNN in Classification of Flower Images,” 2010.

[78] “KNN Classification using Scikit-learn - DataCamp.” https://www.datacamp.com/community/tutorials/k-nearest-neighbor-classification-scikit-learn (accessed Jun. 05, 2021).

[79] A. C. Nusantara, E. Purwanti, and S. Soelistiono, “Classification of digital mammogram based on nearest-neighbor method for breast cancer detection,” Int. J. Technol., vol. 7, no. 1, pp. 71–77, 2016, doi: 10.14716/ijtech.v7i1.1393.

[80] “scipy.optimize.minimize — SciPy v1.6.3 Reference Guide.” https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html (accessed Jun. 06, 2021).

[81] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” 2012. Accessed: Jun. 13, 2021. [Online]. Available: http://code.google.com/p/cuda-convnet/.

[82] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, Sep. 2015, Accessed: Jun. 13, 2021. [Online]. Available: http://www.robots.ox.ac.uk/.


[83] “The Architecture and Implementation of VGG-16 – Towards AI — The Best of Tech, Science, and Engineering.” https://towardsai.net/p/machine-learning/the-architecture-and-implementation-of-vgg-16 (accessed Jun. 13, 2021).

[84] “Breast Cancer Digital Repository.” https://bcdr.eu/information/about (accessed Jun. 13, 2021).

[85] C. Shorten and T. M. Khoshgoftaar, “A survey on Image Data Augmentation for Deep Learning,” J. Big Data, vol. 6, no. 1, pp. 1–48, Dec. 2019, doi: 10.1186/s40537-019-0197-0.

[86] “torchvision.transforms — Torchvision master documentation.” https://pytorch.org/vision/stable/transforms.html# (accessed Jun. 13, 2021).

[87] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization,” Int. J. Comput. Vis., vol. 128, no. 2, pp. 336–359, Oct. 2016, doi: 10.1007/s11263-019-01228-7.

[88] “Confusion matrix — scikit-learn 0.24.2 documentation.” https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html (accessed Jun. 14, 2021).

[89] “Home — Spyder IDE.” https://www.spyder-ide.org/ (accessed Jun. 10, 2021).

[90] “Sueldo: Ingeniero Junior | Glassdoor.” https://www.glassdoor.es/Sueldos/ingeniero-junior-sueldo-SRCH_KO0,16.htm (accessed Jun. 10, 2021).

[91] “Dell XPS 13 9300 Laptop Review.” https://www.notebookcheck.net/Dell-XPS-13-9300-4K-UHD-Laptop-Review-16-10-is-the-New-16-9.464337.0.html (accessed Jun. 10, 2021).

[92] “GPU NVIDIA Tesla T4 con núcleos Tensor para inferencias de IA | NVIDIA Data Center.” https://www.nvidia.com/es-es/data-center/tesla-t4/ (accessed Jun. 11, 2021).

[93] “Guia pràctica per al càlcul d’emissions de gasos amb efecte d’hivernacle (GEH).”


Annex A

In this annex, the preliminary results obtained with VGG-16 and the other CNN architectures tested

are attached. The best result of each one is highlighted in yellow.

AlexNet

Configurations tested: No-DropOut; dropout on the FC layers only (0.2/0.2/0.2, 0.5/0.5/0.0, 0.5/0.5/0.2, 0.5/0.5/0.5, 0.8/0.8/0.5, 0.8/0.8/0.8).

Weighted Avg Precision   0.72  0.62  0.70  0.66  0.70  0.62
Weighted Avg Recall      0.71  0.62  0.64  0.65  0.64  0.61
Weighted Avg F1-Score    0.71  0.61  0.63  0.64  0.63  0.59
Macro Avg ROC            0.80  0.74  0.75  0.76  0.75  0.74

VGG-16

Configurations tested: No-DropOut; dropout on the FC layers only (original, 0.2/0.2/0.2, 0.0/0.5/0.5, 0.5/0.5/0.2, 0.5/0.5/0.5, 0.8/0.8/0.2, 0.8/0.8/0.8); dropout on the FC layers plus 25% or 50% on the convolutional layers.

Weighted Avg Precision   0.79  0.72  0.71  0.70  0.66
Weighted Avg Recall      0.74  0.69  0.65  0.69  0.64
Weighted Avg F1-Score    0.73  0.68  0.64  0.69  0.61
Macro Avg ROC            0.81  0.78  0.75  0.79  0.74

Inception

Configurations tested: No-DropOut; dropout on the FC layers only (0.2/0.2/0.2, 0.0/0.5/0.0, 0.5/0.5/0.2, 0.5/0.5/0.5, 0.8/0.8/0.2, 0.8/0.8/0.8).

Weighted Avg Precision   0.65  0.66  0.64  0.61  0.69
Weighted Avg Recall      0.61  0.63  0.63  0.60  0.67
Weighted Avg F1-Score    0.58  0.62  0.63  0.58  0.67
Macro Avg ROC            0.73  0.75  0.75  0.73  0.77


ResNet50

Configurations tested: No-DropOut; dropout on the FC layers only (0.2/0.2/0.2, 0.0/0.5/0.5, 0.5/0.5/0.2, 0.5/0.5/0.5, 0.8/0.8/0.5, 0.8/0.8/0.8).

Weighted Avg Precision   0.73  0.64  0.62  0.78  0.66
Weighted Avg Recall      0.63  0.60  0.62  0.73  0.61
Weighted Avg F1-Score    0.59  0.58  0.60  0.72  0.60
Macro Avg ROC            0.73  0.72  0.74  0.80  0.75

DenseNet121

Configurations tested: No-DropOut; dropout on the FC layers only (0.2/0.2/0.2, 0.5/0.5/0.0, 0.5/0.5/0.2, 0.5/0.5/0.5, 0.8/0.8/0.5, 0.8/0.8/0.8).

Weighted Avg Precision   0.69  0.70  0.67  0.71  0.59
Weighted Avg Recall      0.60  0.67  0.62  0.71  0.59
Weighted Avg F1-Score    0.58  0.67  0.62  0.70  0.57
Macro Avg ROC            0.72  0.77  0.74  0.81  0.71


Annex B

The global confusion matrices and the binned ones of the deep learning approach trained with AUG1

and AUG2 are attached in this annex.

Figure 0.1. Confusion matrix of the DL approach (AUG1).

Figure 0.2. Binned Confusion matrix of the DL approach (AUG1).


Figure 0.3. Confusion matrix of the DL approach (AUG2).

Figure 0.4. Binned Confusion matrix of the DL approach (AUG2).