Anomaly Detection in Images - polimi.it

246
Anomaly Detection in Images Giacomo Boracchi, Politecnico di Milano, DEIB. http://home.deib.polimi.it/boracchi/ Diego Carrera , System Research and Applications, STMicroelectronics, Agrate Brianza October 25 th , 2020 ICIP 2020, Abu Dhabi, United Arab Emirates

Transcript of Anomaly Detection in Images - polimi.it

Page 1: Anomaly Detection in Images - polimi.it

Anomaly Detection in ImagesGiacomo Boracchi,

Politecnico di Milano, DEIB.

http://home.deib.polimi.it/boracchi/

Diego Carrera,

System Research and Applications, STMicroelectronics, Agrate Brianza

October 25th, 2020

ICIP 2020, Abu Dhabi, United Arab Emirates

Page 2: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

GIACOMO BORACCHI

Mathematician (UniversitΓ  Statale degli Studi di Milano 2004),

PhD in Information Technology (DEIB, Politecnico di Milano 2008)

Associate Professor since 2019 at DEIB (Computer Science), Polimi

Research Interests are mathematical and statistical methods for:

β€’ Image / Signal analysis and processing

β€’ Unsupervised learning, change / anomaly detection

Page 3: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

DIEGO CARRERA

Mathematician (UniversitΓ  Statale degli Studi di Milano 2013),

PhD in Information Technology (DEIB, Politecnico di Milano 2019)

Researcher at STMicroelectronics since 2019

Research Interests are mainly focus on:

β€’ Change detection in high dimensional datastreams

β€’ Anomaly detection in signal and images

β€’ Unsupervised learning algorithms

Page 4: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

ANOMALY DETECTION IN HEALTH

Mammography

Mam

mog

ram

s

James Heilman, MD / CC BY-SA (https://creativecommons.org/licenses/by-sa/4.0)

Sato et al, A primitive study on unsupervised anomaly detection with an autoencoder in emergency head CT volumes, SPIE Medical Imaging, 2018

Page 5: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020https://www.mvtec.com/company/research/datasets/mvtec-ad/

ANOMALY DETECTION FOR AUTOMATIC QUALITY CONTROL

Page 6: Anomaly Detection in Images - polimi.it

Carrera D., Manganini F., Boracchi G., Lanzarone E. "Defect Detection in SEM Images of Nanofibrous Materials", IEEE Transactions on Industrial Informatics 2017, 11 pages, doi:10.1109/TII.2016.2641472

Our Running Example: Monitoring Nanofiber Production

Page 7: Anomaly Detection in Images - polimi.it

Our Running Example: Monitoring Nanofiber Production

Carrera D., Manganini F., Boracchi G., Lanzarone E. "Defect Detection in SEM Images of Nanofibrous Materials", IEEE Transactions on Industrial Informatics 2017, 11 pages, doi:10.1109/TII.2016.2641472

Page 8: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SYLICON WAFER MANUFACTURING

Defects detected as anomalies in microscope images

Page 9: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

Not only images

Page 10: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Di Bella, Carrera, Rossi, Fragneto, Boracchi Wafer Defect Map Classification Using Sparse Convolutional Neural Networks ICIAP09

DETECTION OF ANOMALOUS PATTERNS

Detect/Identify patterns in wafer defect maps

These might indicate faults,problems or malfunctioning

in the chip production.

Page 11: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

AUTOMATIC AND LONG TERM EGC MONITORING

Health monitoring / wearable devices:

Automatically analyze EGC tracings to detect arrhythmias or incorrect device positioning

D. Carrera, B. Rossi, D. Zambon, P. Fragneto, and G. Boracchi "ECG Monitoring in Wearable Devices by Sparse Models" in Proceedings of ECML-PKDD 2016, 16 pages

Page 12: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020USCD Anomaly Dataset http://www.svcl.ucsd.edu/projects/anomaly/dataset.html

ANOMALOUS ACTIVITIES DETECTION IN VIDEOS

Page 13: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

FRAUD DETECTION IN CREDIT CARD TRANSACTIONS

Dal Pozzolo A., Boracchi G., Caelen O., Alippi C. and Bontempi G., β€œCredit Card Fraud Detection: a Realistic Modeling and a Novel Learning Strategy” , IEEE TNNL 2017, 14 pages

Page 14: Anomaly Detection in Images - polimi.it

Tutorial Outline

Page 15: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

PRESENTATION OUTLINE

Part 1, Problem Formulation and Statistical solutions:

β€’ Problem formulation

β€’ Performance measures

β€’ Most relevant approaches

Part 2, Anomaly detection in images:

β€’ The general approach

β€’ Subspace / Feature extraction methods

β€’ Reference-based methods

Part 3, Anomaly detection by deep learning models:

β€’ Transfer Learning / Self-supervised

β€’ Autoencoders

β€’ Domain based, Generative models

Page 16: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

DISCLAIMERS

We will refer to either anomalies/changes according to our personal experience with a primary focus on the applications we have been addressing.

For a complete overview on anomaly algorithms please refer to surveys below.

We will consider unsupervised and semi-supervised approaches, as these better conform with anomaly detection settings

T. Ehret, A. Davy, JM Morel, M. Delbracio "Image Anomalies: A Review and Synthesis of Detection Methods", Journal of Mathematical Imaging and Vision, 1-34

V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (July 2009), 58 pages.

Pimentel, M. A., Clifton, D. A., Clifton, L., Tarassenko, L. β€œA review of novelty detection” Signal Processing, 99, 215-249 (2014)

A. Zimek, E. Schubert, H.P. Kriegel. β€œA survey on unsupervised outlier detection in high-dimensional numerical data”Statistical Analysis and Data Mining: The ASA Data Science Journal, 5(5), 2012.

Page 17: Anomaly Detection in Images - polimi.it

The Problem FormulationAnomaly Detection in Images

Page 18: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

ANOMALIES

β€œAnomalies are patterns in data that do not conform to a well defined notion of normal behavior”

Thus:

β€’ Normal data are generated from a stationary process 𝒫𝒫𝑁𝑁‒ Anomalies are from a different process 𝒫𝒫𝐴𝐴 β‰  𝒫𝒫𝑁𝑁

Examples:

β€’ Frauds in the stream of all the credit card transactions

β€’ Arrhythmias in ECG tracings

β€’ Defective regions in an image, which do not conform a reference pattern

Anomalies might appear as spurious elements, and are typically the most informative samples in the stream

V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (July 2009), 58 pages.

Page 19: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

PROBLEM FORMULATION: ANOMALY DETECTION IN IMAGES

Let 𝑠𝑠 be an image defined over the pixel domain 𝒳𝒳 βŠ‚ β„€2,

let 𝑐𝑐 ∈ 𝒳𝒳 be a pixel and 𝑠𝑠 𝑐𝑐 the corresponding intesnity.

Our goal is to locate any anomalous region in 𝑠𝑠, i.e. estimating the unknown anomaly mask 𝛀𝛀 defined as

if 𝑐𝑐 falls in a normal regionif 𝑐𝑐 falls in an anomalous region

Ξ© 𝑐𝑐 = οΏ½01

𝑠𝑠

Page 20: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

PROBLEM FORMULATION: ANOMALY DETECTION IN IMAGES

Let 𝑠𝑠 be an image defined over the pixel domain 𝒳𝒳 βŠ‚ β„€2,

let 𝑐𝑐 ∈ 𝒳𝒳 be a pixel and 𝑠𝑠 𝑐𝑐 the corresponding intesnity.

Our goal is to locate any anomalous region in 𝑠𝑠, i.e. estimating the unknown anomaly mask 𝛀𝛀 defined as

if 𝑐𝑐 falls in a normal regionif 𝑐𝑐 falls in an anomalous region

Ξ© 𝑐𝑐 = οΏ½01

Ω𝑠𝑠

Page 21: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

PATCH-WISE ANOMALY DETECTION

The goal not determining whether the whole image is normal or anomalous, but locate/segment possible anomalies

Therefore, it is convenient to

1. Analyze the image patch-wise

2. Isolate regions containingpatches that are detected asas anomalies

Page 22: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

PATCH-WISE ANOMALY DETECTION

The goal not determining whether the whole image is normal or anomalous, but locate/segment possible anomalies

Therefore, it is convenient to

1. Analyze the image patch-wise

2. Isolate regions containingpatches that are detected asas anomalies

Page 23: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

TYPICAL ASSUMPTIONS

A training set 𝑇𝑇𝑇𝑇 is provided for configuring the AD algorithm

Depending on the algorithm

β€’ Only normal images -> semi-supervised methods

β€’ Unlabeled images -> unsupervised methods

β€’ Annotated images -> supervised methods

𝑇𝑇𝑇𝑇

Page 24: Anomaly Detection in Images - polimi.it

Anomaly Detection in StatisticsAnomaly Detection Problem where observations are i.i.d.

realizations of a random variable

Page 25: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

Forget about images for a while and look for anomalies in a set of

random vectors (or in a datastream)

… these techniques will come veryhandy also for images

V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.

Page 26: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

ANOMALY-DETECTION IN A STATISTICAL FRAMEWORK

Often, the anomaly-detection problem boils down to:

Monitor a set of data (not necessarily a stream)

𝒙𝒙 𝑑𝑑 , 𝑑𝑑 = 𝑑𝑑0, … , 𝒙𝒙 𝑑𝑑 ∈ ℝ𝑑𝑑

where π‘₯π‘₯(𝑑𝑑) are realizations of a random variable having pdf πœ™πœ™π‘œπ‘œ,and detect outliers i.e., those points that do not conform with πœ™πœ™π‘œπ‘œ

𝒙𝒙 𝑑𝑑 ∼ οΏ½πœ™πœ™0 normal dataπœ™πœ™1 anomalies ,

𝑑𝑑

𝒙𝒙(𝑑𝑑) ……

πœ™πœ™1πœ™πœ™0 πœ™πœ™0

Page 27: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

ANOMALY-DETECTION IN A STATISTICAL FRAMEWORK

Often, the anomaly-detection problem boils down to:

Monitor a set of data (not necessarily a stream)

𝒙𝒙 𝑑𝑑 , 𝑑𝑑 = 𝑑𝑑0, … , 𝒙𝒙 𝑑𝑑 ∈ ℝ𝑑𝑑

where π‘₯π‘₯(𝑑𝑑) are realizations of a random variable having pdf πœ™πœ™π‘œπ‘œ,and detect outliers i.e., those points that do not conform with πœ™πœ™π‘œπ‘œ

𝒙𝒙 𝑑𝑑 ∼ οΏ½πœ™πœ™0 normal dataπœ™πœ™1 anomalies ,

𝑑𝑑

𝒙𝒙(𝑑𝑑) ……

πœ™πœ™1πœ™πœ™0 πœ™πœ™0

Page 28: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

ANOMALY-DETECTION IN A STATISTICAL FRAMEWORK

Often, the anomaly-detection problem boils down to:

Monitor a set of data (not necessarily a stream)

𝒙𝒙 𝑑𝑑 , 𝑑𝑑 = 𝑑𝑑0, … , 𝒙𝒙 𝑑𝑑 ∈ ℝ𝑑𝑑

where π‘₯π‘₯(𝑑𝑑) are realizations of a random variable having pdf πœ™πœ™π‘œπ‘œ,and detect outliers i.e., those points that do not conform with πœ™πœ™π‘œπ‘œ

𝒙𝒙 𝑑𝑑 ∼ οΏ½πœ™πœ™0 normal dataπœ™πœ™1 anomalies ,

𝑑𝑑

𝒙𝒙(𝑑𝑑) ……

πœ™πœ™1πœ™πœ™0 πœ™πœ™0

Locate those samples that do not conform the normal ones or a model explaining normal ones

Page 29: Anomaly Detection in Images - polimi.it

Statistical Approaches..to detect anomalies

Page 30: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

THE TYPICAL SOLUTIONS

Most algorithms are composed of:

β€’ A statistic that has a known response to normal data (e.g., the average, the sample variance, the log-likelihood, the confidence of a classifier, an β€œanomaly score”…)

β€’ A decision rule to analyze the statistic (e.g., an adaptive threshold, a confidence region)

Page 31: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

THE TYPICAL SOLUTIONS

𝑑𝑑

𝒙𝒙(𝑑𝑑) ……

data

Page 32: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

THE TYPICAL SOLUTIONS

𝑑𝑑

𝒙𝒙(𝑑𝑑) ……

𝑑𝑑

𝑆𝑆(𝒙𝒙)

…

𝛾𝛾statistic

data

Page 33: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

THE TYPICAL SOLUTIONS

𝑑𝑑

𝒙𝒙(𝑑𝑑) ……

𝑑𝑑

𝑆𝑆(𝒙𝒙)

…

𝛾𝛾statistic

decision rule: 𝑆𝑆 𝒙𝒙 > 𝛾𝛾

data

Page 34: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

THE TYPICAL SOLUTIONS

𝑑𝑑

𝒙𝒙(𝑑𝑑) ……

𝑑𝑑

𝑆𝑆(𝒙𝒙)

…

𝛾𝛾statistic

decision rule: 𝑆𝑆 𝒙𝒙 > 𝛾𝛾

data

Statistics and decision rules are β€œone-shot”, analyzing a set of historical data or each new data (or chunk) independently

Page 35: Anomaly Detection in Images - polimi.it

Performance MeasuresAssessing performance of anomaly detection algorithms

Page 36: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

ANOMALY-DETECTION PERFORMANCE

Anomaly detection performance:

β€’ True positive rate: 𝑇𝑇𝑇𝑇𝑇𝑇 = # anomalies detected# anomalie𝑠𝑠

β€’ False positive rate: 𝐹𝐹𝑇𝑇𝑇𝑇 = # normal samples detected# normal samples

You have probably also heard of

β€’ False negative rate (or miss-rate): 𝐹𝐹𝐹𝐹𝑇𝑇 = 1 βˆ’ 𝑇𝑇𝑇𝑇𝑇𝑇‒ True negative rate (or specificity): 𝑇𝑇𝐹𝐹𝑇𝑇 = 1 βˆ’ 𝐹𝐹𝑇𝑇𝑇𝑇

β€’ Precision on anomalies: # anomalies detected

# detections

β€’ Recall on anomalies (or sensitivity, hit-rate): 𝑇𝑇𝑇𝑇𝑇𝑇

Page 37: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

TPR / FPR TRADE-OFF

There is always a trade-off between 𝑻𝑻𝑻𝑻𝑻𝑻 and 𝑭𝑭𝑻𝑻𝑻𝑻 (and similarly for derived quantities), which is ruled by algorithm parameters

By changing 𝛾𝛾 performance changes (e.g. true positive increases but also false positives do)

𝑑𝑑

𝑆𝑆(𝒙𝒙)

…

𝛾𝛾statistic

decision rule: 𝑆𝑆 𝒙𝒙 > 𝛾𝛾

Page 38: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

TPR / FPR TRADE-OFF

There is always a trade-off between 𝑻𝑻𝑻𝑻𝑻𝑻 and 𝑭𝑭𝑻𝑻𝑻𝑻 (and similarly for derived quantities), which is ruled by algorithm parameters

By changing 𝛾𝛾 performance changes (e.g. true positive increases but also false positives do)

𝑑𝑑

𝑆𝑆(𝒙𝒙)

…𝛾𝛾

statisticdecision rule: 𝑆𝑆 𝒙𝒙 > 𝛾𝛾

Page 39: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

ANOMALY-DETECTION PERFORMANCE

There is always a trade-off between 𝑻𝑻𝑻𝑻𝑻𝑻 and 𝑭𝑭𝑻𝑻𝑻𝑻 (and similarly for derived quantities), which is ruled by algorithm parameters

Thus, to correctly assess performance it is necessary to consider at least two indicators (e.g., 𝑇𝑇𝑇𝑇𝑇𝑇,𝐹𝐹𝑇𝑇𝑇𝑇)

Indicators combining both 𝑇𝑇𝑇𝑇𝑇𝑇 and 𝐹𝐹𝑇𝑇𝑇𝑇:

Accuracy = # anomalies detected + # normal samples not detected

# samples

F1 score = 2# anomalies detected

# detections + # anomalies

These equal 1 in case of β€œideal detector” which detects all the anomalies and has no false positives

Page 40: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

ANOMALY-DETECTION PERFORMANCE

Comparing different methods might be tricky since we have to make sure that both have been configured in their best conditions

Testing a large number of parameters lead to the ROC (receiver operating characteristic) curve

The ideal detector would achieve:

β€’ 𝐹𝐹𝑇𝑇𝑇𝑇 = 0%,β€’ 𝑇𝑇𝑇𝑇𝑇𝑇 = 100%

Thus, the closer to (0,1) the better

The largest the Area Under the Curve (AUC), the better

The optimal parameter is the oneyielding the point closest to (0,1)

(𝐹𝐹𝑇𝑇𝑇𝑇,𝑇𝑇𝑇𝑇𝑇𝑇) for a specific parameter

Page 41: Anomaly Detection in Images - polimi.it

Statitsical Approaches to Detect Anomalies

…when πœ™πœ™0 and πœ™πœ™1 are unknown

Page 42: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

ANOMALY DETECTION WHEN π“π“πŸŽπŸŽ AND π“π“πŸπŸ ARE UNKNOWN

Most often, only a training set 𝑇𝑇𝑇𝑇 is provided:

There are three scenarios:

β€’ Semi-Supervised: Only normal training data are provided, i.e. no anomalies in 𝑇𝑇𝑇𝑇.

β€’ Unsupervised: 𝑇𝑇𝑇𝑇 is provided without label.

β€’ Supervised: Both normal and anomalous training data are provided in 𝑇𝑇𝑇𝑇.

V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.

Page 43: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

ANOMALY DETECTION WHEN π“π“πŸŽπŸŽ AND π“π“πŸπŸ ARE UNKNOWN

Most often, only a training set 𝑇𝑇𝑇𝑇 is provided:

There are three scenarios:

β€’ Semi-Supervised: Only normal training data are provided, i.e. no anomalies in 𝑇𝑇𝑇𝑇.

β€’ Unsupervised: 𝑇𝑇𝑇𝑇 is provided without label.

β€’ Supervised: Both normal and anomalous training data are provided in 𝑇𝑇𝑇𝑇.

V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.

Page 44: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SEMI-SUPERVISED ANOMALY DETECTION

In semi-supervised methods the 𝑇𝑇𝑇𝑇 is composed of normal data

𝑇𝑇𝑇𝑇 = π‘₯π‘₯ 𝑑𝑑 , π‘₯π‘₯ ∼ πœ™πœ™0 and 𝑑𝑑 < 𝑑𝑑0,

Very practical assumptions:

β€’ Normal data are easy to gather and the vast majority

β€’ Anomalous data are difficult/costly to collect/select and it would be difficult to gather a representative training set

β€’ Training examples in 𝑇𝑇𝑇𝑇 might not be representative of all the possible anomalies that can occur

All in all, it is often safer to detect any data departing from the normal conditions

Semi-supervised anomaly-detection methods are also referred to as novelty-detection methods

V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.

Pimentel, M. A., Clifton, D. A., Clifton, L., Tarassenko, L. β€œA review of novelty detection” Signal Processing, 99, 215-249 (2014)

Page 45: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

DENSITY-BASED METHODS

Density-Based Methods: Normal data occur in high probability regions of a stochastic model, while anomalies occur in the low probability regions of the model

During training: οΏ½πœ™πœ™0 can be estimated from the training set

𝑇𝑇𝑇𝑇 = π‘₯π‘₯ 𝑑𝑑 , π‘₯π‘₯ ∼ πœ™πœ™0 and 𝑑𝑑 < 𝑑𝑑0,β€’ parametric models (e.g., Gaussian mixture models)

β€’ nonparametric models (e.g. KDE, histograms)

During testing:

β€’ Anomalies are detected as data yielding οΏ½πœ™πœ™0 𝒙𝒙 < πœ‚πœ‚

V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.

Page 46: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

DENSITY-BASED METHODS: MONITORING THE LOG-LIKELIHOOD

Monitoring the log-likelihood of data w.r.t οΏ½πœ™πœ™0 allow to address anomaly-detection problems in multivariate data

1. During training, estimate οΏ½πœ™πœ™0 from 𝑇𝑇𝑇𝑇

2. During testing, compute

β„’ 𝒙𝒙 𝑑𝑑 = log( οΏ½πœ™πœ™0(𝒙𝒙(𝑑𝑑)))

3. Monitor β„’ 𝒙𝒙 𝑑𝑑 , 𝑑𝑑 = 1, …

𝑑𝑑

𝒙𝒙(𝑑𝑑)

ℒ𝒙𝒙𝑑𝑑

………

Page 47: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

DENSITY-BASED METHODS: MONITORING THE LOG-LIKELIHOOD

Monitoring the log-likelihood of data w.r.t οΏ½πœ™πœ™0 allow to address anomaly-detection problems in multivariate data

1. During training, estimate οΏ½πœ™πœ™0 from 𝑇𝑇𝑇𝑇

2. During testing, compute

β„’ 𝒙𝒙 𝑑𝑑 = log( οΏ½πœ™πœ™0(𝒙𝒙(𝑑𝑑)))

3. Monitor β„’ 𝒙𝒙 𝑑𝑑 , 𝑑𝑑 = 1, …

This is quite a popular approach in either anomaly and change detection algorithms

L. I. Kuncheva, β€œChange detection in streaming multivariate data using likelihood detectors," IEEE Transactions on Knowledgeand Data Engineering, vol. 25, no. 5, 2013.

X. Song, M. Wu, C. Jermaine, and S. Ranka, β€œStatistical change detection for multidimensional data," in Proceedings ofInternational Conference on Knowledge Discovery and Data Mining (KDD), 2007.

J. H. Sullivan and W. H. Woodall, β€œChange-point detection of mean vector or covariance matrix shifts using multivariateindividual observations," IIE transactions, vol. 32, no. 6, 2000.

C. Alippi, G. Boracchi, D. Carrera, M. Roveri, "Change Detection in Multivariate Datastreams: Likelihood and DetectabilityLoss" IJCAI 2016, New York, USA, July 9 - 13

Page 48: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

DENSITY-BASED METHODS

Advantages:

β€’ οΏ½πœ™πœ™0 𝒙𝒙 indicates how safe a detection is (like a p-value)

β€’ If the density estimation process is robust to outliers, it is possible to tolerate few anomalous samples in 𝑇𝑇𝑇𝑇

β€’ in relatively small dimensions, you might use non-parametric models like histograms

Challenges:

β€’ It is challenging to fit models for high-dimensional data

β€’ Histograms traditionally suffer of curse of dimensionality when 𝑑𝑑 increases

β€’ Often the 1D histograms of the marginals are monitored, ignoring the correlations among components

V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.

Page 49: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

DOMAIN-BASED METHODS

Domain-based methods: Estimate a boundary around normal data, rather than the density of normal data.

A drawback of density-estimation methods is that they are meant to be accurate in high-density regions, while anomalies live in low-density ones.

One-Class SVM are domain-based methods defined by the normal samples at the periphery of the distribution.

SchΓΆlkopf, B., Williamson, R. C., Smola, A. J., Shawe-Taylor, J., Platt, J. C. "Support Vector Method for Novelty Detection". In NIPS 1999 (Vol. 12, pp. 582-588).

Tax, D. M., Duin, R. P. "Support vector domain description". Pattern recognition letters, 20(11), 1191-1199 (1999)

Pimentel, M. A., Clifton, D. A., Clifton, L., Tarassenko, L. β€œA review of novelty detection” Signal Processing, 99, 215-249 (2014)

Page 50: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SchΓΆlkopf, B., Williamson, R. C., Smola, A. J., Shawe-Taylor, J., Platt, J. C. "Support Vector Method for Novelty Detection". In NIPS 1999 (Vol. 12, pp. 582-588).

ONE-CLASS SVM (SCHΓ–LKOPF ET AL. 1999)

Idea: define boundaries by estimating a binary function 𝑓𝑓 that captures regions of the input space where density is higher.

As in support vector methods, 𝑓𝑓 is defined in the feature space 𝐹𝐹 and decision boundaries are defined by a few support vectors (i.e., a few normal data).

Let πœ“πœ“(𝒙𝒙) the feature associated to 𝒙𝒙, 𝑓𝑓 is defined as

𝑓𝑓 𝒙𝒙 = sign(< 𝑀𝑀,πœ“πœ“ 𝒙𝒙 > βˆ’πœŒπœŒ)

Where the hyperplane parameters 𝑀𝑀,𝜌𝜌 are optimized to yield a function that is positive on most training samples. Thus in the feature space, normal points can be separated from the origin.

A linear separation in the feature space corresponds to a variety of nonlinear boundaries in the space of 𝒙𝒙.

Page 51: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Tax, D. M., Duin, R. P. "Support vector domain description". Pattern recognition letters, 20(11), 1191-1199 (1999)

ONE-CLASS SVM (TAX AND DUIN 1999)

Boundaries of normal region can be also defined by an hypersphere that, in the feature space, encloses most of the normal data i.e., πœ“πœ“(𝒙𝒙) for 𝒙𝒙 ∈ 𝑇𝑇𝑇𝑇.

Similar detection formulas hold, measuring the distance in the feature space from the sphere center.

The sphere center can be defined in terms of support vectors.

Remarks: In both one-class approaches, the amount of samples that falls within the margin (outliers) is controlled by regularization parameters.

This parameter regulates the number of outliers in the training set and the detector sensitivity.

Page 52: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

ANOMALY DETECTION WHEN π“π“πŸŽπŸŽ AND π“π“πŸπŸ ARE UNKNOWN

Most often, only a training set 𝑇𝑇𝑇𝑇 is provided:

There are three scenarios:

β€’ Semi-Supervised: Only normal training data are provided, i.e. no anomalies in 𝑇𝑇𝑇𝑇.

β€’ Unsupervised: 𝑇𝑇𝑇𝑇 is provided without label.

β€’ Supervised: Both normal and anomalous training data are provided in 𝑇𝑇𝑇𝑇.

V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.

Page 53: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

UNSUPERVISED ANOMALY-DETECTION

The training set 𝑇𝑇𝑇𝑇 might contain both normal and anomalous data. However, no labels are provided

𝑇𝑇𝑇𝑇 = π‘₯π‘₯ 𝑑𝑑 , 𝑑𝑑 < 𝑑𝑑0Underlying assumption: anomalies are rare w.r.t. normal data 𝑇𝑇𝑇𝑇

In principle:

β€’ Density/Domain based methods that are robust to outliers can be applied in an unsupervised scenario

β€’ Unsupervised methods can be improved whenever labels are available

Page 54: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

DISTANCE-BASED METHODS

Distance-based methods: normal data fall in dense neighborhoods, while anomalies are far from their closest neighbors.

A critical aspect is the choice of the similarity measure to use.

Anomalies are detected by monitoring:

β€’ distance between each data and its π’Œπ’Œ βˆ’nearest neighbor

β€’ the density of each data relatively to its neighbors

V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages

Zhao, M., Saligrama, V. β€œAnomaly detection with score functions based on nearest neighbor graphs”. NIPS 2009

A. Zimek, E. Schubert, H. Kriegel. β€œA survey on unsupervised outlier detection in high-dimensional numerical data” SADM 2012

Page 55: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

DISTANCE-BASED METHODS

Distance-based methods: normal data fall in dense neighborhoods, while anomalies are far from their closest neighbors.

A critical aspect is the choice of the similarity measure to use.

Anomalies are detected by monitoring:

β€’ distance between each data and its π’Œπ’Œ βˆ’nearest neighbor

β€’ the above distance considered relatively to neighbors

β€’ whether they do not belong to clusters, or are at the cluster periphery, or belong to small and sparse clusters

V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.

Page 56: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

DISTANCE-BASED METHODS

Distance-based methods: normal data fall in dense neighborhoods, while anomalies are far from their closest neighbors.

A critical aspect is the choice of the similarity measure to use.

Anomalies are detected by monitoring:

β€’ distance between each data and its π’Œπ’Œ βˆ’nearest neighbor

β€’ the above distance considered relatively to neighbors

β€’ whether they do not belong to clusters, or are at the cluster periphery, or belong to small and sparse clusters

V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.

Page 57: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou, Isolation Forest, ICDM 2008

ISOLATION FOREST

Builds upon the rationale that"anomalies are easier to separate from the rest of normal data"

This idea is implemented very efficiently through a forest of binarytrees that are constructed via an iterative procedure

Page 58: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou, Isolation Forest, ICDM 2008

ISOLATION FOREST

Builds upon the rationale that"anomalies are easier to separate from the rest of normal data"

This idea is implemented very efficiently through a forest of binarytrees that are constructed via an iterative procedure

Randomly choose1. a component π‘₯π‘₯𝑖𝑖

Page 59: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou, Isolation Forest, ICDM 2008

ISOLATION FOREST

Builds upon the rationale that"anomalies are easier to separate from the rest of normal data"

This idea is implemented very efficiently through a forest of binarytrees that are constructed via an iterative procedure

Randomly choose1. a component π‘₯π‘₯𝑖𝑖

Page 60: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou, Isolation Forest, ICDM 2008

ISOLATION FOREST

Builds upon the rationale that"anomalies are easier to separate from the rest of normal data"

This idea is implemented very efficiently through a forest of binarytrees that are constructed via an iterative procedure

Randomly choose1. a component π‘₯π‘₯𝑖𝑖2. a value in the range of

projections of 𝑇𝑇𝑇𝑇 over the 𝑖𝑖-th component

This yields a splitting

Page 61: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou, Isolation Forest, ICDM 2008

ISOLATION FOREST

Builds upon the rationale that"anomalies are easier to separate from the rest of normal data"

This idea is implemented very efficiently through a forest of binarytrees that are constructed via an iterative procedure

This yields a splittingcriteria

Page 62: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou, Isolation Forest, ICDM 2008

ISOLATION FOREST

Builds upon the rationale that"anomalies are easier to separate from the rest of normal data"

This idea is implemented very efficiently through a forest of binarytrees that are constructed via an iterative procedure

Repeat the procedure on each node:

Page 63: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou, Isolation Forest, ICDM 2008

ISOLATION FOREST

Builds upon the rationale that"anomalies are easier to separate from the rest of normal data"

This idea is implemented very efficiently through a forest of binarytrees that are constructed via an iterative procedure

Repeat the procedure on each node:

Randomly selecta component

Page 64: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou, Isolation Forest, ICDM 2008

ISOLATION FOREST

Builds upon the rationale that"anomalies are easier to separate from the rest of normal data"

This idea is implemented very efficiently through a forest of binarytrees that are constructed via an iterative procedure

Repeat the procedure on each node:

Randomly selecta component and

a cut point

Page 65: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou, Isolation Forest, ICDM 2008

ISOLATION FOREST

Builds upon the rationale that"anomalies are easier to separate from the rest of normal data"

This idea is implemented very efficiently through a forest of binarytrees that are constructed via an iterative procedure

Randomly choosea component and a value within the range and definea splitting criteria

Page 66: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou, Isolation Forest, ICDM 2008

ISOLATION FOREST

Builds upon the rationale that"anomalies are easier to separate from the rest of normal data"

This idea is implemented very efficiently through a forest of binarytrees that are constructed via an iterative procedure

Repeat the procedure on the

nodes:Randomly selecta component and

a cut point

Page 67: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou, Isolation Forest, ICDM 2008

ISOLATION FOREST

Builds upon the rationale that"anomalies are easier to separate from the rest of normal data"

This idea is implemented very efficiently through a forest of binarytrees that are constructed via an iterative procedure

Anomalies lies in leaves close to

the root.

Page 68: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou, Isolation Forest, ICDM 2008

ISOLATION FOREST

An anomalous point (π‘₯π‘₯0) can be easily isolated

Genuine points (π‘₯π‘₯𝑖𝑖) are instead difficult to isolate.

Page 69: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou, Isolation Forest, ICDM 2008

ISOLATION FOREST

Anomalies

{IFOR [ … ]

π‘₯π‘₯0

Page 70: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou, Isolation Forest, ICDM 2008

ISOLATION FOREST

Normal data

{IFOR [ … ]

π‘₯π‘₯𝑖𝑖

Page 71: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou, Isolation Forest, ICDM 2008

ISOLATION FOREST: TESTING

Compute 𝐸𝐸 β„Ž 𝒙𝒙 , the average path length among all the trees in the forest, of a test sample 𝒙𝒙

𝒙𝒙

Page 72: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou, Isolation Forest, ICDM 2008

ISOLATION FOREST: TESTING

A test sample is identified as anomalous when:

π’œπ’œ 𝒙𝒙 = 2βˆ’πΈπΈ β„Ž 𝒙𝒙𝑐𝑐 𝑛𝑛 > 𝛾𝛾

β€’ 𝑛𝑛 : number of sessions in 𝑇𝑇𝑇𝑇

β€’ 𝑐𝑐 𝑛𝑛 : average path length of unsuccessful search in Binary

Page 73: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

ANOMALY DETECTION WHEN π“π“πŸŽπŸŽ AND π“π“πŸπŸ ARE UNKNOWN

Most often, only a training set 𝑇𝑇𝑇𝑇 is provided:

There are three scenarios:

β€’ Semi-Supervised: Only normal training data are provided, i.e. no anomalies in 𝑇𝑇𝑇𝑇.

β€’ Unsupervised: 𝑇𝑇𝑇𝑇 is provided without label.

β€’ Supervised: Both normal and anomalous training data are provided in 𝑇𝑇𝑇𝑇.

V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.

Page 74: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020T. Ehret, A. Davy, JM Morel, M. Delbracio "Image Anomalies: A Review and Synthesis of Detection Methods", Journal of Mathematical Imaging and Vision, 1-34

SUPERVISED ANOMALY DETECTION – DISCLAIMER

Most papers and reviews agree that supervised methods have not to be considered part of anomaly detection, because:

β€’ Anomalies in general lacks of a statistical coherence

β€’ Not (enough) training samples are provided for anomalies

However,

β€’ Some supervised problems are often referred to as Β«detectionΒ», in case of severe class imbalance (e.g. fraud detection)

β€’ Supervised models can be transferred in unsupervised settings, in particular for deep learning

Page 75: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SUPERVISED ANOMALY DETECTION - SOLUTIONS

In supervised methods training data are annotated and divided in normal (+) and anomalies (βˆ’) :

𝑇𝑇𝑇𝑇 = 𝒙𝒙(𝑑𝑑),𝑦𝑦(𝑑𝑑) ,𝒙𝒙 ∈ ℝ𝑑𝑑 ,𝑦𝑦 ∈ +,βˆ’ and 𝑑𝑑 < 𝑑𝑑0Solution:

β€’ Train a two-class classifier to distinguish normal vs anomalous data.

During training:

β€’ Train a classifier 𝒦𝒦 from 𝑇𝑇𝑇𝑇.

During testing:

β€’ Compute the classifier output 𝒦𝒦 𝒙𝒙 , or

β€’ Set a threshold on the posterior 𝑝𝑝𝒦𝒦 βˆ’ 𝒙𝒙 , or

β€’ Select the π‘˜π‘˜ βˆ’most likely anomalies

Page 76: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SUPERVISED ANOMALY DETECTION – CHALLENGES

These classification problems are challenging because these anomaly-detection settings typically imply:

β€’ Class Imbalance: Normal data far outnumber anomalies

β€’ Concept Drift: Anomalies might evolve over time, thus the few annotated anomalies might not be representative of anomalies occurring during operations

β€’ Selection Bias: Training samples are typically selected through a closed-loop and biased procedure. Often only detected anomalies are annotated, and the vast majority of the stream remain unsupervised. This biases the selection of training samples.

Dal Pozzolo A., Boracchi G., Caelen O., Alippi C. and Bontempi G., β€œCredit Card Fraud Detection: a Realistic Modeling and a Novel Learning Strategy” , IEEE TNNL 2017, 14 pages

Page 77: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

… ANOMALY DETECTION IN CREDIT CARD TRANSACTIONS

Fraud detection in credit card transactions/web sessions

Dal Pozzolo A., Boracchi G., Caelen O., Alippi C. and Bontempi G., β€œCredit Card Fraud Detection: a Realistic Modeling and a Novel Learning Strategy” , IEEE TNNL 2017, 14 pages

Page 78: Anomaly Detection in Images - polimi.it

Let’s go back to images now…Applying statistical methods to image patches

Page 79: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

OUR RUNNING EXAMPLE

Goal: Automatically measure area covered by defects

Page 80: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

ANOMALY DETECTION IN IMAGES

The goal not determining whether the whole image is normal or anomalous, but locate/segment possible anomalies

Therefore, it is convenient to

1. Analyze the image patch-wise

2. Isolate regions containingpatches that are detected asas anomalies

Normal patches

Anomalous patches

Page 81: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

Du, B., Zhang, L.: Random-selection-based anomaly detector for hyperspectral imagery. IEEE Transactions on Geoscience and Remote sensing

DENSITY-BASED APPROACH ON IMAGE PATCHES

A density-based approach to AD would be:

Training

i. Split the normal image in patches 𝒔𝒔

ii. Fit a statistical model οΏ½πœ™πœ™0 = 𝒩𝒩 πœ‡πœ‡, Ξ£ describing normal patches.

Testing

i. Split the test image in patches

ii. Compute οΏ½πœ™πœ™0(𝒔𝒔) the likelihood of each test patch 𝒔𝒔

iii. Detect anomalies by thresholding the likelihood

Page 82: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

Du, B., Zhang, L.: Random-selection-based anomaly detector for hyperspectral imagery. IEEE Transactions on Geoscience and Remote sensing

X Xie, M Mirmehdi β€œTexture exemplars for defect detection on random texturesβ€œ – ICPR 2005

THE LIMITATIONS OF THE RANDOM VARIABLE MODEL

This model is rarely accurate on natural images.

Small patches (e.g. 2 Γ— 2 or 5 Γ— 5) are typically preferred

The model becomes even more unfit as the patch size

increases

Page 83: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

REAL WORLD DETECTION PROBLEMS

Random variable model does not successfully apply to signals or images (not even small portions)

Stacking each patch/signal 𝒔𝒔 ∈ ℝ𝑑𝑑 in a vector 𝒙𝒙 is not convenient:

β€’ Data dimension 𝑑𝑑 becomes huge

β€’ Strong correlations among components, difficult to directly model by a probability density function πœ™πœ™0

Page 84: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

THE LIMITATIONS OF THE RANDOM VARIABLE MODEL

Distribution of adjacent pixel values inside a patch:

Page 85: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

THE LIMITATIONS OF THE RANDOM VARIABLE MODEL

Distribution of adjacent pixel values inside a patch:

Data are clearly correlated in space, and are difficult to model by a smooth density function (e.g., Gaussians)

The random variable model is not very appropriate for describing images

Page 86: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Bengio, Courville, Vincent, Representation Learning: A Review and New Perspectives, IEEE Pattern Analysis and Machine Intelligence β€˜ 2013

THE LIMITATIONS OF THE RANDOM VARIABLE MODEL

Patches from natural images leave close to a low dimensional manifold

These means that patches can be well described by few latent variables

Page 87: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

A SIMPLE EXPERIMENT

Let’s approximate this manifold with the simplest one: a linear subspace

In practice, we compute the PCA of training patches and consider the PCA score as latent variables

Page 88: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

A SIMPLE EXPERIMENT

Distribution of first 6 PCA coefficients:

Page 89: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

A SIMPLE EXPERIMENT

We fit a οΏ½πœ™πœ™0 = 𝒩𝒩 πœ‡πœ‡, Ξ£ on PCA components to describe normal patches, and perform anomaly detection

Page 90: Anomaly Detection in Images - polimi.it

Part 2: Anomaly detection in imagesOut of the β€œRandom Variable” World:

signal-based models for images

Page 91: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

THE TYPICAL APPROACH

Most of the considered methods

1. Estimate a model describing normal data (background model)

2. Use the background model to provide, for each test patch, an anomaly score, or measure of rareness

3. Apply a decision rule to the anomaly score to detect anomalies (typically thresholding)

4. [optional] Perform post-processing operations to enforce smooth detections and avoid isolated pixels that are not consistent with neighbourhoods

Remark: Statistical-based approaches seen before uses as background model the statistical distribution οΏ½πœ™πœ™0 and a statistic as anomaly score

Page 92: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

THE TYPICAL APPROACH

Most of the considered methods

1. Estimate a model describing normal data (background model)

2. Use the background model to provide, for each test patch, an anomaly score, or measure of rareness

3. Apply a decision rule to the anomaly score to detect anomalies (typically thresholding)

4. [optional] Perform post-processing operations to enforce smooth detections and avoid isolated pixels that are not consistent with neighbourhoods

Remark: Statistical-based approaches seen before uses as background model the statistical distribution οΏ½πœ™πœ™0 and a statistic as anomaly score

The background model is used to bring an image patch into the

β€œrandom variable world”

Page 93: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

THE TYPICAL APPROACH

Most of the considered methods

1. Estimate a model describing normal data (background model)

2. Use the background model to provide, for each test patch, an anomaly score, or measure of rareness

3. Apply a decision rule to the anomaly score to detect anomalies (typically thresholding)

4. [optional] Perform post-processing operations to enforce smooth detections and avoid isolated pixels that are not consistent with neighbourhoods

Remark: Statistical-based approaches seen before uses as background model the statistical distribution οΏ½πœ™πœ™0 and a statistic as anomaly score

Once β€œhaving appliedβ€œ the background model, one can use anomaly detection

methods for the β€œrandom variable world”.This might require fitting an

additional model

Page 94: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SEMI-SUPERVISED ANOMALY-DETECTION IN IMAGES

Out of the "Random Variable" world

β€’ Reconstruction-based methods

β€’ Subspace methods

β€’ Feature-based monitoring

β€’ Expert-driven Features

β€’ Data-driven Features

Page 95: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SEMI-SUPERVISED ANOMALY-DETECTION IN IMAGES

Out of the "Random Variable" world

β€’ Reconstruction-based methods

β€’ Subspace methods

β€’ Feature-based monitoring

β€’ Expert-driven Features

β€’ Data-driven Features

Page 96: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

RECONSTRUCTION-BASED METHODS

Fit a statistical model to the observation to describe dependence, apply anomaly detection on the independent residuals.

Detection is performed by using a model β„³ which can encode and reconstruct normal data:

β€’ During training: learn the model β„³ from training set 𝑆𝑆‒ During testing:

βˆ’ Encode and reconstruct each test signal 𝒔𝒔 through β„³.

βˆ’ Assess err(𝒔𝒔), namely the residual between 𝒔𝒔 and its reconstruction through β„³

The rationale is that β„³ can reconstruct only normal data, thus anomalies are expected to yield large reconstruction errors.

Page 97: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

MONITORING THE RECONSTRUCTION ERROR

Normal data are expected to yield values of err(𝒔𝒔) that are low, while anomalies do not. This holds when the model β„³ was specifically learned to describe normal data

Outliers can be detected by thresholding err(𝒔𝒔)

𝑑𝑑

err(𝒔𝒔)

……

πœ™πœ™1πœ™πœ™0 πœ™πœ™0

𝛾𝛾

Page 98: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

RECONSTRUCTION-BASED METHODS

Popular models are:

β€’ neural networks, in particular auto-encoders, for higher dimensional data

β€’ projection on subspaces / manifolds

β€’ dictionaries yielding sparse-representations

β€’ autoregressive models for time series (ARMA, ARIMA…)

Methods based on projections and dictionaries can be also interpreted as subspace methods

Page 99: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

RECONSTRUCTION-BASED METHODS

Autoencoders are neural networks used for data reconstruction (they learn the identity function)

The typical structure of an autoencoder is:

…

𝑠𝑠1

𝑠𝑠𝑑𝑑

𝑠𝑠2

𝑠𝑠3

𝑠𝑠4

…

𝑠𝑠1

𝑠𝑠𝑑𝑑

𝑠𝑠2

𝑠𝑠3

𝑠𝑠4

Input layer, 𝑑𝑑 neurons

Output layer, 𝑑𝑑 neurons

Hidden layer, π‘šπ‘š neuronsπ‘šπ‘š β‰ͺ 𝑑𝑑

Encoder β„° Decoder π’Ÿπ’Ÿ

Page 100: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

RECONSTRUCTION-BASED METHODS

Autoencoders are trained to reconstruct all the samples in the training set. The reconstruction loss over the training set 𝑆𝑆 is

β„’ 𝑆𝑆 = οΏ½π’”π’”βˆˆπ‘†π‘†

𝒔𝒔 βˆ’ π’Ÿπ’Ÿ β„° 𝒔𝒔 2

and training of π’Ÿπ’Ÿ β„° β‹… is performed through standard backpropagation algorithms (e.g. SGD)

Remarks

β€’ Typically π’Ÿπ’Ÿ β„° β‹… does not provide perfect reconstruction, since π‘šπ‘š β‰ͺ 𝑑𝑑.

β€’ Regularization terms might be included in the loss function for the latent representation β„° 𝒔𝒔 to feature specific properties

Bengio, Y., Courville, A., Vincent, P. "Representation learning: A review and new perspectives". IEEE TPAMI 2013

Mishne, G., Shaham, U., Cloninger, A., & Cohen, I. Diffusion nets. Applied and Computational Harmonic Analysis (2017).

Page 101: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

MONITORING THE RECONSTRUCTION ERROR

Detection by reconstruction error monitoring (AE notation)

Training (Monitoring the Reconstruction Error):

1. Train the model π’Ÿπ’Ÿ(β„°(β‹…)) from the training set 𝑆𝑆

2. Learn the distribution of reconstruction errors

err 𝒔𝒔 = 𝒔𝒔 βˆ’ π’Ÿπ’Ÿ β„° 𝒔𝒔 2, 𝒔𝒔 ∈ 𝑉𝑉

over a validation set 𝑉𝑉, such that 𝑉𝑉 ∩ 𝑆𝑆 = βˆ…, and define a suitable threshold 𝛾𝛾

Testing (Monitoring the Reconstruction Error):

1. Perform encoding and compute the reconstruction error

err 𝒔𝒔 = 𝒔𝒔 βˆ’ π’Ÿπ’Ÿ β„° 𝒔𝒔 2

2. Consider 𝒔𝒔 anomalous when err 𝒔𝒔 > 𝛾𝛾

Page 102: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

OUTLINE ON SEMI-SUPERVISED APPROACHES

Out of the "Random Variable" world

β€’ Reconstruction-based methods

β€’ Subspace methods

β€’ Feature-based monitoring

β€’ Expert-driven Features

β€’ Data-driven Features

β€’ Extended models

Page 103: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SUBSPACE METHODS

The underlying assumption is that

β€’ normal patches live in a subspace that can be identified by 𝑆𝑆

β€’ anomalies can be detected by projecting test patches in such subspace and by monitoring the reconstruction error (distance with the projection)

NormalDataData

V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.

Page 104: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SUBSPACE METHODS

A few example of models used for describing normal patches:

β€’ Orthogonal basis: normal patches can be expressed by a few selected basis elements (Fourier, Wavelets..)

β€’ PCA: normal patches live in the linear subspace of the first components

β€’ Robust PCA: defined on the β„“1 distance to be insensitive to outliers in normal data

β€’ Kernel PCA: normal patches live in a non-linear manifold

β€’ Dictionaries yielding sparse representations

β€’ Random projectionsNormalDataData

V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.

Page 105: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SUBSPACE METHODS: PCA-BASED MONITORING

Anomaly detection based on PCA (and similar techniques):

1. Compute the projection on the subspace,𝒔𝒔′ = 𝑇𝑇𝑇𝑇𝒔𝒔, 𝑇𝑇 ∈ β„π‘‘π‘‘Γ—π‘šπ‘š, π‘šπ‘š β‰ͺ 𝑑𝑑

which is the projection over the first π‘šπ‘š principal components and a way to reduce data-dimensionality.

2. Monitor the reconstruction error:err 𝐬𝐬 = 𝒔𝒔 βˆ’ 𝑇𝑇𝑇𝑇𝑇𝑇𝒔𝒔 2

which is the distance between 𝒔𝒔 and its projection 𝑇𝑇𝑇𝑇𝑇𝑇𝒔𝒔 over the subspace of normal patches

2. [bis] The projection along the last principal component, is also a good anomaly score, as it becomes large at anomalies.

NormalDataData 𝒔𝒔𝒔𝒔𝒔

Page 106: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

M. Elad "Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing",Springer, 2010

SUBSPACE METHODS: SPARSE REPRESENTATIONS

Basic assumption: normal data live in a union of low-dimensional subspaces of the input space

β€’ The model learned from 𝑆𝑆 is a matrix: the dictionary 𝐷𝐷.

β€’ Each signal is decomposed as the sum of a few dictionary atoms (representation is constrained to be sparse).

β€’ Atoms represent the many building blocks that can be used to reconstruct normal signals.

β€’ There are typically more atoms than the signal dimension (redundant dictionaries).

β€’ Effective as long as the learned dictionary 𝐷𝐷 is very specific for normal data

Page 107: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

DICTIONARIES YIELDING SPARSE REPRESENTATIONS

Dictionaries are just matrices: 𝐷𝐷 ∈ β„π‘‘π‘‘Γ—π‘šπ‘š

𝐷𝐷 =

Page 108: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

DICTIONARIES YIELDING SPARSE REPRESENTATIONS

Dictionaries are just matrices: 𝐷𝐷 ∈ β„π‘‘π‘‘Γ—π‘šπ‘š

Each column of 𝐷𝐷 is an atom:

β€’ lives in the input space ℝ𝑑𝑑

β€’ it is one of the building blocks that have been learned to reconstruct the input signal in the training set 𝑆𝑆

𝒔𝒔

𝐷𝐷 =

Page 109: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SPARSE REPRESENTATIONS

Let 𝒔𝒔 ∈ ℝ𝑑𝑑 be the input signal, a sparse representation is

𝒔𝒔 = �𝑖𝑖=1

π‘šπ‘š

𝛼𝛼𝑖𝑖 π’…π’…π’Šπ’Š

a linear combination of few dictionary atoms {π’…π’…π’Šπ’Š}, i.e., most of coefficients are such that 𝛼𝛼𝑖𝑖 = 0

An illustrative example in case of our patches

= 0.7 βˆ— +0.1 βˆ— βˆ’0.2 βˆ—

Page 110: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SPARSE REPRESENTATIONS IN MATRIX EXPRESSION

Let 𝒔𝒔 ∈ ℝ𝑑𝑑 be the input signal, a sparse representation is

𝒔𝒔 = �𝑖𝑖=1

π‘šπ‘š

𝛼𝛼𝑖𝑖 π’…π’…π’Šπ’Š = 𝐷𝐷𝜢𝜢, 𝐷𝐷 ∈ β„π‘‘π‘‘Γ—π‘šπ‘š

a linear combination of few dictionary atoms {π’…π’…π’Šπ’Š} and 𝜢𝜢 0 < 𝐿𝐿, i.e. only a few coefficients are nonzero, i.e. 𝜢𝜢 is sparse.

𝒔𝒔 𝜢𝜢𝐷𝐷

This vector𝜢𝜢 = [𝛼𝛼1, … , π›Όπ›Όπ‘šπ‘š]

is sparse

= βˆ—

Overcomplete / Redundant dictionary

Page 111: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SPARSE REPRESENTATIONS IN MATRIX EXPRESSION

Let 𝒔𝒔 ∈ ℝ𝑑𝑑 be the input signal, a sparse representation is

𝒔𝒔 = �𝑖𝑖=1

π‘šπ‘š

𝛼𝛼𝑖𝑖 π’…π’…π’Šπ’Š = 𝐷𝐷𝜢𝜢, 𝐷𝐷 ∈ β„π‘‘π‘‘Γ—π‘šπ‘š

a linear combination of few dictionary atoms {π’…π’…π’Šπ’Š} and 𝜢𝜢 0 < 𝐿𝐿, i.e. only a few coefficients are nonzero, i.e. 𝜢𝜢 is sparse.

𝒔𝒔 𝜢𝜢𝐷𝐷This vector

𝜢𝜢 = [𝛼𝛼1, … , π›Όπ›Όπ‘šπ‘š]is sparse

Undercompletedictionary

Carrera D., Rossi B., Fragneto P., Boracchi G. "Online Anomaly Detection for Long-Term ECG Monitoring using Wearable Devices", Pattern Recognition 2019

Page 112: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Pati, Y.; Rezaiifar, R.; Krishnaprasad, P. Orthogonal Matching Pursuit: recursive function approximation with application to wavelet decomposition. Asilomar Conf. on Signals, Systems and Comput. 1993

SPARSE CODING…

Sprase Coding: computing the sparse representation for an input signal 𝒔𝒔 w.r.t. 𝐷𝐷

It is solved as the following optimization problem, (e.g. via the Orthogonal Matching Pursuit, OMP)

𝜢𝜢 = argminπ’‚π’‚βˆˆβ„π‘šπ‘š

𝐷𝐷𝒂𝒂 βˆ’ 𝑠𝑠 2 s. t. 𝒂𝒂 0 < 𝐿𝐿

In this illustration 𝜢𝜢 = [0.7, 0, 0, 0.1, 0, 0, 0,βˆ’0.2]

𝒔𝒔 ∈ ℝ𝑑𝑑 𝜢𝜢 ∈ β„π‘šπ‘š

𝒔𝒔

0.7 0 0 0.1 0 0 0 βˆ’0.2𝜢𝜢 =

Page 113: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Pati, Y.; Rezaiifar, R.; Krishnaprasad, P. Orthogonal Matching Pursuit: recursive function approximation with application to wavelet decomposition. Asilomar Conf. on Signals, Systems and Comput. 1993

SPARSE CODING…

Sprase Coding: computing the sparse representation for an input signal 𝒔𝒔 w.r.t. 𝐷𝐷

Or through an alternative formulation based on β„“1 minimization

𝜢𝜢 = argminπ’‚π’‚βˆˆβ„π‘›π‘›

𝐷𝐷𝒂𝒂 βˆ’ 𝐬𝐬 𝟐𝟐𝟐𝟐 + πœ†πœ† 𝒂𝒂 1, πœ†πœ† > 0

𝒔𝒔 ∈ ℝ𝑑𝑑 𝜢𝜢 ∈ β„π‘šπ‘š

Page 114: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

Aharon, M.; Elad, M. Bruckstein, A. K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation IEEE TSP, 2006

… AND DICTIONARY LEARNING

Dictionary Learning: estimate 𝐷𝐷 from a training set of normal signals 𝑆𝑆 βŠ‚ ℝ𝑑𝑑

It is solved as the following optimization problem typically via block-coordinates descent (e.g. KSVD algorithm)

𝐷𝐷,𝑋𝑋 = argmin𝐴𝐴∈ ℝ𝑑𝑑×𝑛𝑛, π‘Œπ‘Œβˆˆ β„π‘›π‘›Γ—π‘šπ‘š

𝐴𝐴𝐴𝐴 βˆ’ 𝑆𝑆 2 s. t. π’šπ’šπ’Šπ’Š 0 < 𝐿𝐿, βˆ€π’šπ’šπ’Šπ’Š

𝑆𝑆 = {π’”π’”πŸπŸ, … 𝒔𝒔𝒏𝒏} 𝐷𝐷 ∈ β„π‘‘π‘‘Γ—π‘šπ‘š

… .𝑇𝑇𝑇𝑇 𝐷𝐷… .

Page 115: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SPARSE REPRESENTATION: RECONSTRUCTION ERROR

Anomalies can be directly detected by performing the sparse coding of test signals and then analysing the reconstruction error

err(𝒔𝒔) = 𝐷𝐷𝜢𝜢 βˆ’ 𝒔𝒔 2

As in all reconstruction-based techniques

Carrera D., Rossi B., Zambon D., Fragneto P., Boracchi G. "ECG Monitoring in Wearable Devices by Sparse Models" in Proceedings of ECML-PKDD 2016, 16 pages

Carrera D., Rossi B., Fragneto P., Boracchi G. "Online Anomaly Detection for Long-Term ECG Monitoring using Wearable Devices" Pattern Recognition 2019

Low err 𝒔𝒔 High err 𝒔𝒔

Page 116: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

ANOMALY DETECTION DURING SPARSE CODING

Anomalies can be directly detected during the sparse coding stage, by adopting a special loss during optimization.

A set of test signals is modeled as:

𝑆𝑆 = 𝐷𝐷𝑋𝑋 + 𝐸𝐸 + 𝑉𝑉

where 𝑋𝑋 is sparse, 𝑉𝑉 is a noise term, and 𝐸𝐸 is a matrix having most columns set to zero. Columns 𝒆𝒆𝑖𝑖 β‰  𝟎𝟎 indicate anomalies, as they do not admit a sparse representation w.r.t. 𝐷𝐷

A. Adler, M. Elad, Y. Hel-Or, and E. Rivlin, β€œSparse coding with anomaly detection” Journal of Signal Processing Systems, vol.79, no. 2, pp. 179–188, 2015.

Page 117: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

ANOMALY DETECTION DURING SPARSE CODING

Anomalies can be detected by solving (through ADMM) the following sparse coding problem

argmin𝑋𝑋,𝐸𝐸

12

𝑆𝑆 βˆ’ 𝐷𝐷𝑋𝑋 βˆ’ 𝐸𝐸 𝐹𝐹2 + πœ†πœ† 𝑋𝑋 1 + πœ‡πœ‡ 𝐸𝐸 2,1

.. and identifying as anomalies the signals corresponding to columns of 𝐸𝐸 that are nonzero.

A. Adler, M. Elad, Y. Hel-Or, and E. Rivlin, β€œSparse coding with anomaly detection” Journal of Signal Processing Systems, vol.79, no. 2, pp. 179–188, 2015.

Data-fidelity for normal data Sparsity Group sparsityregularization, only a fewcolumns can be nonzero

Page 118: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

OUTLINE ON SEMI-SUPERVISED APPROACHES

β€’ Detrending/Filtering for time-series

β€’ Reconstruction-based methods

β€’ Subspace methods

β€’ Feature-based monitoring

β€’ Expert-driven Features

β€’ Data-driven Features

β€’ Extended models

Page 119: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

TYPICAL APPROACH: MONITORING FEATURES

Feature extraction: meaningful indicators to be monitored which have a known / controlled response w.r.t. normal data

V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.

Input signal

FeatureExtraction

Featurevector

Anomaly/Changedetection

𝒙𝒙 𝑑𝑑 ∈ β„π‘šπ‘š

π‘šπ‘š β‰ͺ 𝑑𝑑𝒔𝒔𝑑𝑑 ∈ ℝ𝑑𝑑

οΏ½πœ™πœ™0 𝒙𝒙 𝑑𝑑 β‰Ά πœ‚πœ‚

The customary statisticalframework for anomaly detection

Feature Extraction: signal processing, a priori information, learning methods

Signals Random variables

Page 120: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

MONITORING FEATURE DISTRIBUTION

Normal data are expected to yield features 𝒙𝒙 that are i.i.d. and follow an unknown distribution πœ™πœ™0.

Anomalous data do not, as they follow πœ™πœ™1 β‰  πœ™πœ™0.

We are back to our statistical framework and we can

β€’ learn οΏ½πœ™πœ™0 from a set features extracted from normal data

β€’ detect anomalous data by extracting features 𝒙𝒙 associated to each input 𝑠𝑠, and then testing whether οΏ½πœ™πœ™0 π‘₯π‘₯ < 𝛾𝛾

𝑑𝑑

𝒙𝒙(𝑑𝑑)

οΏ½ πœ™πœ™0𝒙𝒙

………

𝛾𝛾

Page 121: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

MONITORING FEATURE DISTRIBUTION

Normal data are expected to yield features 𝒙𝒙 that are i.i.d. and follow an unknown distribution πœ™πœ™0.

Anomalous data do not, as they follow πœ™πœ™1 β‰  πœ™πœ™0.

We are back to our statistical framework and we can

β€’ learn οΏ½πœ™πœ™0 from a set features extracted from normal data

β€’ detect anomalous data by extracting features 𝒙𝒙 associated to each input 𝑠𝑠, and then testing whether οΏ½πœ™πœ™0 π‘₯π‘₯ < 𝛾𝛾

𝑑𝑑

𝒙𝒙(𝑑𝑑)

οΏ½ πœ™πœ™0𝒙𝒙

………

𝛾𝛾

Or by adopting any other statistical toolto detect anomalies in 𝒙𝒙

Page 122: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

FEATURE EXTRACTION

Data dimensionality can be reduced by extracting features

Good features should:

β€’ Yield a stable response w.r.t. normal data

β€’ Yield unusual response on anomalies

Examples of features seen so far:

β€’ Reconstruction error err(𝒔𝒔)

β€’ representation coefficients (𝑇𝑇𝑇𝑇𝒔𝒔,𝜢𝜢, … )

… but these are not the only

V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.

Page 123: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

FEATURE EXTRACTION APPROACHES

There are two major approaches for extracting features:

Expert-driven (hand-crafted) features: computational expressions that are manually designed by experts to distinguish between normal and anomalous data

Data-driven features: features characterizing normal data are automatically learned from training set of normal samples 𝑆𝑆

Page 124: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

OUTLINE ON SEMI-SUPERVISED APPROACHES

β€’ Detrending/Filtering for time-series

β€’ Reconstruction-based methods

β€’ Subspace methods

β€’ Feature-based monitoring

β€’ Expert-driven Features

β€’ Data-driven Features

β€’ Data-driven Features: extended models

Page 125: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

EXAMPLES OF EXPERT-DRIVEN FEATURES

Expert-driven features: each patch of an image 𝑠𝑠𝐬𝐬𝑐𝑐 = {𝑠𝑠 𝑐𝑐 + 𝑒𝑒 ,𝑒𝑒 ∈ 𝒰𝒰}

Example of features are:

β€’ the average,

β€’ the variance,

β€’ the total variation (the energy of gradients)

These can hopefully distinguish normal and anomalous patches, considering also how anomalous regions will be (e.g. flat or without edges)

Carrera D., Manganini F., Boracchi G., Lanzarone E. "Defect Detection in SEM Images of Nanofibrous Materials", IEEETransactions on Industrial Informatics 2017, 11 pages, doi:10.1109/TII.2016.2641472

Page 126: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

OUTLINE ON SEMI-SUPERVISED APPROACHES

β€’ Detrending/Filtering for time-series

β€’ Reconstruction-based methods

β€’ Subspace methods

β€’ Feature-based monitoring

β€’ Expert-driven Features

β€’ Data-driven Features

β€’ Data-driven Features: extended models

Page 127: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

EXAMPLES OF DATA-DRIVEN FEATURES

Analyze each patch of an image 𝑠𝑠𝐬𝐬𝑐𝑐 = {𝑠𝑠 𝑐𝑐 + 𝑒𝑒 ,𝑒𝑒 ∈ 𝒰𝒰}

and determine whether it is normal or anomalous.

Data driven features: expressions to quantitatively assess whether test patches conform or not with the model, learned from normal data.

Carrera D., Manganini F., Boracchi G., Lanzarone E. "Defect Detection in SEM Images of Nanofibrous Materials", IEEETransactions on Industrial Informatics 2017, 11 pages, doi:10.1109/TII.2016.2641472

Page 128: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

AUTOENCODERS AS FEATURE EXTRACTORS

Autoencoders can be also used in feature-based monitoring schemes, monitoring as feature the hidden/latent representation of the input signal

…

𝑠𝑠1

𝑠𝑠𝑑𝑑

𝑠𝑠2

𝑠𝑠3

𝑠𝑠4

…

𝑠𝑠1

𝑠𝑠𝑑𝑑

𝑠𝑠2

𝑠𝑠3

𝑠𝑠4

Input layer, 𝑑𝑑 neurons

Output layer, 𝑑𝑑 neurons

Hidden layer, π‘šπ‘š neuronsπ‘šπ‘š β‰ͺ 𝑑𝑑

Encoder β„° Decoder π’Ÿπ’Ÿ

Page 129: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

ANOMALY DETECTION BY MONITORING FEATURE DISTRIBUTION

Detection by feature monitoring (AE notation)

Training (Monitoring Feature Distribution):

β€’ Learn the autoencoder π’Ÿπ’Ÿ(β„°(β‹…)) from the training set 𝑆𝑆

β€’ Fit a density model οΏ½πœ™πœ™0 to the encoded features

{β„°(𝒔𝒔), 𝒔𝒔 ∈ 𝑉𝑉}over a validation set 𝑉𝑉, such that 𝑉𝑉 ∩ 𝑆𝑆 = βˆ…

β€’ Define a suitable threshold 𝛾𝛾 for οΏ½πœ™πœ™0(𝒔𝒔)

Testing (Monitoring Feature Distribution):

β€’ Encode each incoming signal 𝒔𝒔 through β„°

β€’ Detect anomalies if οΏ½πœ™πœ™0 β„°(𝒔𝒔) < 𝛾𝛾

Page 130: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

ANOMALY DETECTION BY MONITORING PCA PROJECTIONS

Compute the projection on the subspace,𝒔𝒔′ = 𝑇𝑇𝑇𝑇𝒔𝒔, 𝑇𝑇 ∈ β„π‘‘π‘‘Γ—π‘šπ‘š, π‘šπ‘š β‰ͺ 𝑑𝑑

which is the projection over the first π‘šπ‘š principal components and a way to reduce data-dimensionality.

Monitor the projections 𝑇𝑇𝑇𝑇𝒔𝒔 by a suitable statistical technique (e.g. density based), as when monitoring β„°(𝒔𝒔)

NormalDataData 𝒔𝒔𝑇𝑇𝑇𝑇𝒔𝒔

Page 131: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SPARSE REPRESENTATIONS AS FEATURE EXTRACTORS

To assess the conformance of 𝒔𝒔𝑐𝑐 with 𝐷𝐷 we solve the following

Sparse coding:

𝜢𝜢 = argminπ’‚π’‚βˆˆβ„π‘›π‘›

𝐷𝐷𝒂𝒂 βˆ’ 𝐬𝐬 𝟐𝟐𝟐𝟐 + πœ†πœ† 𝒂𝒂 1, πœ†πœ† > 0

which is the BPDN formulation and we solve using ADMM.

The penalized β„“1 formulation has more degrees of freedom in the reconstruction, the conformance of 𝒔𝒔 with 𝑫𝑫 have to be assessed monitoring both terms of the functional

Boyd S., Parikh N., Chu E., Peleato B., Eckstein J., "Distributed optimization and statistical learning via the alternatingdirection method of multipliers" 2011

Page 132: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

FEATURES EXTRACTED FROM SPARSE CODING

Features then include both the reconstruction error

err 𝒔𝒔 = 𝐷𝐷𝜢𝜢 βˆ’ 𝐬𝐬 𝟐𝟐𝟐𝟐

and the sparsity of the representation

𝜢𝜢 𝟏𝟏

Thus obtaining a data-driven feature vector 𝒙𝒙 = 𝐷𝐷𝜢𝜢 βˆ’ 𝐬𝐬 𝟐𝟐𝟐𝟐

𝜢𝜢 1

Page 133: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

DENSITY-BASED MONITORING

Normal patches:

Anomalies

Page 134: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Carrera D., Manganini F., Boracchi G., Lanzarone E. "Defect Detection in SEM Images of Nanofibrous Materials", IEEE Transactions on Industrial Informatics 2017, 11 pages, doi:10.1109/TII.2016.2641472

FEATURES EXTRACTED FROM SPARSE CODING

Training:

β€’ Learn from 𝑆𝑆 the dictionary 𝐷𝐷

β€’ Compute the sparse representation w.r.t. 𝐷𝐷, thus features 𝒙𝒙over the validation set 𝑉𝑉, such that 𝑉𝑉 ∩ 𝑆𝑆 = βˆ…

β€’ Learn from 𝑉𝑉, the distribution οΏ½πœ™πœ™0 of normal features vectors 𝒙𝒙and the threshold 𝛾𝛾.

The model for anomaly detection is (𝐷𝐷, οΏ½πœ™πœ™0, 𝛾𝛾)

Testing:

β€’ Perform sparse coding of a test signal 𝒔𝒔, thus get the feature vector 𝒙𝒙

β€’ Detect anomalies when οΏ½πœ™πœ™0 𝒙𝒙 < 𝛾𝛾

Page 135: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Carrera D., Manganini F., Boracchi G., Lanzarone E. "Defect Detection in SEM Images of Nanofibrous Materials", IEEE Transactions on Industrial Informatics 2017, 11 pages, doi:10.1109/TII.2016.2641472

FEATURES EXTRACTED FROM SPARSE CODING

Training:

β€’ Learn from 𝑆𝑆 the dictionary 𝐷𝐷

β€’ Compute the sparse representation w.r.t. 𝐷𝐷, thus features 𝒙𝒙over the validation set 𝑉𝑉, such that 𝑉𝑉 ∩ 𝑆𝑆 = βˆ…

β€’ Learn from 𝑉𝑉, the distribution οΏ½πœ™πœ™0 of normal features vectors 𝒙𝒙and the threshold 𝛾𝛾.

The model for anomaly detection is (𝐷𝐷, οΏ½πœ™πœ™0, 𝛾𝛾)

Testing:

β€’ Perform sparse coding of a test signal 𝒔𝒔, thus get the feature vector 𝒙𝒙

β€’ Detect anomalies when οΏ½πœ™πœ™0 𝒙𝒙 < 𝛾𝛾

This is rather a flexible solution and can be adapted whenoperating conditions changes (e.g. different zooming level)

Page 136: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

Page 137: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Detections

Page 138: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

OUTLINE ON SEMI-SUPERVISED APPROACHES

β€’ Detrending/Filtering for time-series

β€’ Reconstruction-based methods

β€’ Subspace methods

β€’ Feature-based monitoring

β€’ Expert-driven Features

β€’ Data-driven Features

β€’ Data-driven Features: extended models

Page 139: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Zeiler, M. D., Krishnan, D., Taylor, G. W., & Fergus, R. β€œDeconvolutional networks” CVPR 2010

CONVOLUTIONAL SPARSITY

Data-driven features from convolutional sparse models

𝒔𝒔 β‰ˆοΏ½π‘–π‘–=1

π‘šπ‘š

π’…π’…π’Šπ’Š βŠ› πœΆπœΆπ’Šπ’Š , s. t. πœΆπœΆπ’Šπ’Š is sparse

where the whole image 𝒔𝒔 is entirely encoded as the sum of 𝑛𝑛convolutions between a filter π’…π’…π’Šπ’Š and a coefficient map πœΆπœΆπ’Šπ’Šβ€’ Translation invariant representation

β€’ Few small filters are typically required

β€’ Filters exhibit very specific image structures

β€’ Easy to use filters having different size

Page 140: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

EXAMPLE OF LEARNED FILTERS

Training Image Learned Filters

Page 141: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

EXAMPLE OF FEATURE MAPS

Coefficient maps FiltersTest Image

Page 142: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

FEATURES FROM CONVOLUTIONAL SPARSE MODEL

The standard convolutional sparse coding

�𝜢𝜢 = argmin𝜢𝜢 𝑛𝑛

�𝑖𝑖=1

𝑛𝑛

π’…π’…π’Šπ’Š βŠ› πœΆπœΆπ’Šπ’Š βˆ’ 𝐬𝐬𝟐𝟐

𝟐𝟐

+ πœ†πœ†οΏ½π‘–π‘–=1

𝑛𝑛

𝜢𝜢 1

leads to the following feature vector for each image region in 𝑐𝑐:

𝒙𝒙𝑐𝑐 =

�𝒄𝒄

�𝑖𝑖=1

𝑛𝑛

π’…π’…π’Šπ’Š βŠ› οΏ½πœΆπœΆπ’Šπ’Š βˆ’ 𝐬𝐬𝟐𝟐

𝟐𝟐

�𝑖𝑖=1

𝑛𝑛

�𝒄𝒄

�𝜢𝜢𝟏𝟏

…but, anomaly detection performance are rather poor

Page 143: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SPARSITY IS TOO LOOSE A CRITERION FOR DETECTION

These two (normal and anomalous) patches exhibit same sparsity and

reconstruction error

Coefficientm

apsnorm

alpatchCoefficient

maps

anomalous

patch

Page 144: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020D. Carrera, G. Boracchi, A. Foi and B. Wohlberg , β€œDetecting Anomalous Structures by Convolutional Sparse Models β€œ IEEE IJCNN 2015

FEATURES FROM CONVOLUTIONAL SPARSE MODEL

Add the group sparsity of the maps on the patch support as an additional feature

π‘₯π‘₯𝑐𝑐 =

�𝒄𝒄

�𝑖𝑖=1

π‘šπ‘š

π’…π’…π’Šπ’Š βŠ› οΏ½πœΆπœΆπ’Šπ’Š βˆ’ 𝐬𝐬𝟐𝟐

𝟐𝟐

�𝑖𝑖=1

π‘šπ‘š

�𝒄𝒄

�𝜢𝜢1

�𝑖𝑖=1

π‘šπ‘š

�𝒄𝒄

�𝜢𝜢2

Page 145: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

ANOMALY-DETECTION PERFORMANCE

On 25 different textures and 600 test images (pair of textures to mimic normal/anomalous regions)

Best performance achieved by the 3-dimensional feature indicators

Achieve similar performance than steerable pyramid specifically designed for texture classification

Page 146: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

OUTLINE ON SEMI-SUPERVISED APPROACHES

β€’ Detrending/Filtering for time-series

β€’ Reconstruction-based methods

β€’ Subspace methods

β€’ Feature-based monitoring

β€’ Expert-driven Features

β€’ Data-driven Features

β€’ Data-driven Features: extended models

Dictionary Adaptation: Counteracting Domain Shift in Anomaly Detection

Page 147: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

DATA-DRIVEN MODELS AND ONLINE MONITORING

A challenge often occurring when performing online monitoring

Test data might differ from training data: need of adaptation

Defects have to be detected at different zooming levels, that might not be present in the training set.

Page 148: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

Pan, S. J., and Yang, Q. "A survey on transfer learningβ€œ IEEE TKDE 2010

S. Shekhar, V. M. Patel, H. V. Nguyen, & R. Chellappa, β€œGeneralized domain-adaptive dictionaries,” CVPR 2013

MODEL ADAPTATION

In the machine-learning literature these problems go under the name of transfer learning / domain adaptation

Transfer Learning (TL): adapt a model learned in the source domain (e.g. heartbeats at a given heartrate / fibers at a certain zoom level) to a target domain (e.g. heartbeats at an higher heartrate / fibers zoomed in or out)

Many TL methods have been designed for supervised / semi-supervised / unsupervised methods, depending on the availability of (annotated) data in the source and target domains.

In our settings, no labels in the target data are provided

Page 149: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Carrera D., Boracchi G., Foi A. and Wohlberg B. "Scale-invariant Anomaly Detection With multiscale Group-sparse Models" ICIP 2016

DOMAIN ADAPTATION ON QUALITY INSPECTION

SEM images can be acquired at different zooming levels

Solution:

β€’ Synthetically generate training images at different zooming levels

β€’ Learn a dictionary 𝐷𝐷𝑖𝑖 at each scale

β€’ Combine the learned dictionaries in a multiscale dictionary 𝐷𝐷

𝐷𝐷 = [ 𝐷𝐷1 𝐷𝐷2 𝐷𝐷3 𝐷𝐷4 ]

Page 150: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Carrera D., Boracchi G., Foi A. and Wohlberg B. "Scale-invariant Anomaly Detection With multiscale Group-sparse Models" ICIP 2016

DOMAIN ADAPTATION ON QUALITY INSPECTION

SEM images can be acquired at different zooming levels

Solution:

β€’ Synthetically generate training images at different zooming levels

β€’ Learn a dictionary 𝐷𝐷𝑖𝑖 at each scale

β€’ Combine the learned dictionaries in a multiscale dictionary 𝐷𝐷

β€’ Sparse-coding including a penalized, group sparsity term

𝜢𝜢 = argminπ’‚π’‚βˆˆβ„π‘›π‘›

12

𝒔𝒔 βˆ’ 𝐷𝐷𝒂𝒂2

2+ πœ†πœ† 𝒂𝒂 𝟏𝟏 + πœ‡πœ‡οΏ½

π’Šπ’Š

𝒂𝒂 2

Page 151: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Carrera D., Boracchi G., Foi A. and Wohlberg B. "Scale-invariant Anomaly Detection With multiscale Group-sparse Models" ICIP 2016

DOMAIN ADAPTATION ON QUALITY INSPECTION

SEM images can be acquired at different zooming levels

Solution:

β€’ Synthetically generate training images at different zooming levels

β€’ Learn a dictionary 𝐷𝐷𝑖𝑖 at each scale

β€’ Combine the learned dictionaries in a multiscale dictionary 𝐷𝐷

β€’ Sparse-coding including a penalized, group sparsity term

β€’ Monitor a tri-variate feature vector

𝒙𝒙 =

𝒔𝒔 βˆ’ 𝐷𝐷𝜢𝜢 𝟐𝟐𝟐𝟐

𝜢𝜢 1

οΏ½π’Šπ’Š

πœΆπœΆπ’Šπ’Š 2

Page 152: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Carrera D., Boracchi G., Foi A. and Wohlberg B. "Scale-invariant Anomaly Detection With multiscale Group-sparse Models" ICIP 2016

DOMAIN ADAPTATION ON QUALITY INSPECTION

Performance on SEM image dataset acquired at 4 different zooming levels (A,B,C,D). It is important to include group-sparsity regularization also in the sparse coding stage

Page 153: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

DOMAIN ADAPTATION FOR ONLINE ECG MONITORING

β€’ Dictionary has to be learned from each user

β€’ Training set 𝑆𝑆 of ECG tracings for training can be only acquired in resting conditions

β€’ During daily activities heart-rate changes and do not match the learned dictionary

The heartbeats get transformed when the heart rate changes: learned models have to be adapted according to the heart rate.

Page 154: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

DOMAIN ADAPTATION FOR ONLINE ECG MONITORING

We propose to design linear transformations πΉπΉπ‘Ÿπ‘Ÿ1,π‘Ÿπ‘Ÿ0 to adapt user-specific dictionaries

𝐷𝐷𝑒𝑒,π‘Ÿπ‘Ÿ1 = πΉπΉπ‘Ÿπ‘Ÿ1,π‘Ÿπ‘Ÿ0 β‹… 𝐷𝐷𝑒𝑒,π‘Ÿπ‘Ÿ0 , πΉπΉπ‘Ÿπ‘Ÿ0,π‘Ÿπ‘Ÿ1 ∈ β„π‘šπ‘šΓ—π‘šπ‘š

Surprisingly these transformations can be learned from a publicly available dataset containing ECG recordings at different heart rates from several users

User-independent transformations enable accurate mapping of user-specific dictionaries

Carrera D., Rossi B., Fragneto P., and Boracchi G. "Domain Adaptation for Online ECG Monitoring” ICDM 2017,

�𝐷𝐷𝑒𝑒,π‘Ÿπ‘Ÿ1

= οΏ½

𝐷𝐷𝑒𝑒,π‘Ÿπ‘Ÿ0πΉπΉπ‘Ÿπ‘Ÿ1,π‘Ÿπ‘Ÿ0

Page 155: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

DOMAIN ADAPTATION FOR ONLINE ECG MONITORING

We propose to design linear transformations πΉπΉπ‘Ÿπ‘Ÿ1,π‘Ÿπ‘Ÿ0 to adapt user-specific dictionaries

𝐷𝐷𝑒𝑒,π‘Ÿπ‘Ÿ1 = πΉπΉπ‘Ÿπ‘Ÿ1,π‘Ÿπ‘Ÿ0 β‹… 𝐷𝐷𝑒𝑒,π‘Ÿπ‘Ÿ0 , πΉπΉπ‘Ÿπ‘Ÿ0,π‘Ÿπ‘Ÿ1 ∈ β„π‘šπ‘šΓ—π‘šπ‘š

Surprisingly these transformations can be learned from a publicly available dataset containing ECG recordings at different heart rates from several users

User-independent transformations enable accurate mapping of user-specific dictionaries

Carrera D., Rossi B., Fragneto P., and Boracchi G. "Domain Adaptation for Online ECG Monitoring” ICDM 2017,

�𝐷𝐷𝑒𝑒,π‘Ÿπ‘Ÿ1

= οΏ½

𝐷𝐷𝑒𝑒,π‘Ÿπ‘Ÿ0πΉπΉπ‘Ÿπ‘Ÿ1,π‘Ÿπ‘Ÿ0

Page 156: Anomaly Detection in Images - polimi.it

Reference-based approaches

Page 157: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

Zontak, M., Cohen, I.: Defect detection in patterned wafers using anisotropic kernels." Machine Vision and Applications 21(2), 129{141 (2010)

REFERENCE BASED-METHODS IN QUALITY INSPECETION

In some cases anomalies can be detected by comparing

β€’ the target, namely the image to be tested

β€’ against a reference, namely an anomaly-free image

refe

renc

e

targ

et

Page 158: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

Zontak, M., Cohen, I.: Defect detection in patterned wafers using anisotropic kernels." Machine Vision and Applications 21(2), 129{141 (2010)

REFERENCE BASED-METHODS IN QUALITY INSPECETION

In some cases anomalies can be detected by comparing

β€’ the target, namely the image to be tested

β€’ against a reference, namely an anomaly-free image

refe

renc

e

targ

et

Page 159: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

REFERENCE BASED-METHODS: E-CHUCKING

The e-chuck is adopted to safely maneuver, position and block the silicon wafer during production.

The device is configured by means of markingmaterials and the markers should conformto a reference template

Electrodes

Template

Good (normal) Bad (anomalous)

Page 160: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

CHANGE DETECTION IN REMOTE SENSING

A key problem in remote sensing is to detect changes, due to manmade or natural phenomenon, by comparing multiple (satellite) images at different times.

In the remote sensing literature this is typically referred to as change detection

reference target

Page 161: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

L. T. Luppino, F. M. Bianchi, G. Moser, S. N. Anfinsen, β€œUnsupervised Image Regression for Heterogeneous Change Detection” IEEE Transactions on Geoscience and Remote Sensing (2019)

MULTIMODAL, REFERENCE BASED ANOMALY DETECTION

Non trivial when direct comparison is prevented:

β€’ Reference and target might not be aligned nor easy to register with a global transformation

β€’ Reference and target might be from different modalities / resolution / view

Multispectral vs SAR images California flood 2017

Temperature Map from CFD

IR

RGB

Page 162: Anomaly Detection in Images - polimi.it

Part3: Deep learning for anomaly Detection in images

Page 163: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

IMAGE CLASSIFICATION

The problem: assigning to an input image 𝒔𝒔 one label 𝑙𝑙 from a fixed set of 𝐿𝐿 categories Ξ›

𝒔𝒔

𝒔𝒔

β€œwheel” 65%, β€œtyre” 30%..

β€œcastle” 55%, β€œtower” 43%..

Ξ› = {"wheel", "cars" … . .……"castle", "baboon", … }

Page 164: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

DEEP LEARNING AND IMAGE CLASSIFICATION

Since 2010 ImageNet organizes ILSVRC (ImageNet Large Scale Visual Recognition Challenge)

Classification error rate (top 5 accuracy):

β€’ In 2011: 25%

β€’ In 2012: 16% (achieved by a CNN)

β€’ In 2017: < 5% (for 29 of 38 competing teams, deep learning)

Deep learning

Page 165: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

DEEP LEARNING AND IMAGE CLASSIFICATION

Deep Learning boasted image classification performance, thanks to

β€’ Advances in parallel hardware (e.g. GPU)

β€’ Availability of large annotated dataset (e.g. the ImageNet project is a large database visual recognition over 14M hand-annotated images in more than 20K categories)

Page 166: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

CONVOLUTIONAL NEURAL NETWORKS (CNN)

The typical architecture of a convolutional neural network

Convolution filters are learned for the

classification task at hand

Thresholding + Downsampling

(ReLu + Maxpooling)

Fully connected NeuralNetwork providing as

output the classscores

Page 167: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

THE OUTPUT OF A CNN

0.02

0.01

0.11

0.81

…

…

…

0.1

TrainedCNN

"Castle" probability

"Wheel" probability

Page 168: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

CNN AS DATA-DRIVEN FEATURE EXTRACTOR

Extract high-level features from pixel data Classify

Page 169: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

CNN AS DATA-DRIVEN FEATURE EXTRACTOR

Extract high-level features from pixel data Classify

The feature extractor and the classifier are jointly learned in a end-to-end fashion

Page 170: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SEMI-SUPERVISED APPROACHES

β€’ CNN as data-driven feature extractor

β€’ Transfer learning

β€’ Self-supervised learning

β€’ Autoencoders

β€’ Domain-based

β€’ Generative models

Page 171: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SEMI-SUPERVISED APPROACHES

β€’ CNN as data-driven feature extractor

β€’ Transfer learning

β€’ Self-supervised learning

β€’ Autoencoders

β€’ Domain-based

β€’ Generative models

Page 172: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

THE THREE STEP IN ANOMALY DETECTION IN IMAGES

Recall the three major ingredients

β€’ Feature extraction

β€’ Anomaly score

β€’ Decision rule

Page 173: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

CNN AS DATA-DRIVEN FEATURE EXTRACTOR

Extract high-level features from pixel data Classify

The feature vector extracted from the last layer can be modeled as a random vector

Page 174: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

TRANSFER LEARNING

Idea:

β€’ Use a pretrained network 𝐢𝐢𝐹𝐹𝐹𝐹 (e.g. AlexNet), that was trained for a different task and on a different dataset

β€’ Throw away the last layer(s)

β€’ Use the reduced 𝐢𝐢𝐹𝐹𝐹𝐹 πœ“πœ“ to build a new dataset 𝑇𝑇𝑇𝑇𝒔 from 𝑇𝑇𝑇𝑇:

𝑇𝑇𝑇𝑇′ = {πœ“πœ“ π’”π’”π’Šπ’Š , π’”π’”π’Šπ’Š ∈ 𝑇𝑇𝑇𝑇}

β€’ Train your favorite anomaly detector on 𝑇𝑇𝑇𝑇𝒔 to define the anomaly score and the decision rule

Page 175: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Napoletano P., Piccoli F., Schettini R., "Anomaly Detection in Nanofibrous Materials by CNN-Based Self-Similarity", Sensors 2018

TRANSFER LEARNING

β€’ Features extracted from a 𝐢𝐢𝐹𝐹𝐹𝐹, i.e., πœ“πœ“(𝒔𝒔) is typically very large for deep networks (e.g. ResNET). Reduce data-dimensionality by PCA defined on a set of normal features

β€’ Anomalies can be detected by measuring distance w.r.t. normal features, possibly using clustering to speed up performance.

Page 176: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

TRANSFER LEARNING

Pros: pretrained networks are very powerful models, since theyusually trained on datasets with million of images

Cons:

β€’ the network is not trained on normal data. Meaningfulstructures in normal images might not be successfully capturedby network trained on images from a different domain (e.g. medical vs natural images)

β€’ The anomaly score and the CNN are not jointly learned, whilethe end-to-end learning strategy is the key to achieveimpressive results in supervised tasks.

Page 177: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SEMI-SUPERVISED APPROACHES

β€’ CNN as data-driven feature extractor

β€’ Transfer learning

β€’ Self-supervised learning

β€’ Autoencoders

β€’ Domain-based

β€’ Generative models

Page 178: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SELF-SUPERVISED LEARNING

With transfer learning we use a model trained on a different dataset (e.g., Imagenet) to address a different task (e.g., classification).

The idea of self-supervised learning is to train a model on the target dataset but solving a different task, for which we can easily obtain the labels.

Page 179: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SELF-SUPERVISED LEARNING

We can build a labeled dataset for multiclass classification from normal data

β€’ Consider a set of 𝑇𝑇 transformation 𝒯𝒯 = {𝜏𝜏1, … , πœπœπ‘‡π‘‡}

β€’ Apply each transformation πœπœπ‘–π‘– to every 𝐬𝐬 ∈ 𝑇𝑇𝑇𝑇:

𝑇𝑇𝑇𝑇𝑛𝑛𝑛𝑛𝑛𝑛 = πœπœπ‘–π‘– 𝒔𝒔 , 𝑖𝑖 𝒔𝒔 ∈ 𝑇𝑇𝑇𝑇, 𝑖𝑖 = 1, … ,𝑇𝑇}

β€’ Train a 𝐢𝐢𝐹𝐹𝐹𝐹 on 𝑇𝑇𝑇𝑇𝑛𝑛𝑛𝑛𝑛𝑛‒ The output of the last layer of the 𝐢𝐢𝐹𝐹𝐹𝐹 is used as feature

vector

Page 180: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SELF-SUPERVISED LEARNING

We can build a labeled dataset for multiclass classification from normal data

β€’ Consider a set of 𝑇𝑇 transformation 𝒯𝒯 = {𝜏𝜏1, … , πœπœπ‘‡π‘‡}

β€’ Apply each transformation πœπœπ‘–π‘– to every 𝐬𝐬 ∈ 𝑇𝑇𝑇𝑇:

𝑇𝑇𝑇𝑇𝑛𝑛𝑛𝑛𝑛𝑛 = πœπœπ‘–π‘– 𝒔𝒔 , 𝑖𝑖 𝒔𝒔 ∈ 𝑇𝑇𝑇𝑇, 𝑖𝑖 = 1, … ,𝑇𝑇}

β€’ Train a 𝐢𝐢𝐹𝐹𝐹𝐹 on 𝑇𝑇𝑇𝑇𝑛𝑛𝑛𝑛𝑛𝑛‒ The output of the last layer of the 𝐢𝐢𝐹𝐹𝐹𝐹 is used as feature

vector

We can avoid using a pre-trained model, and train a CNN directly on our dataset

Page 181: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SELF-SUPERVISED LEARNING

𝜏𝜏2 𝜏𝜏3𝜏𝜏4

𝜏𝜏1 𝜏𝜏2 𝜏𝜏3 𝜏𝜏4

Example:

β€’ 𝑇𝑇𝑇𝑇 contains only images representing digit 3

β€’ 𝒯𝒯 contains rotations and horizontal/vertical flips

𝜏𝜏1

Page 182: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SELF-SUPERVISED LEARNING

Example:

β€’ 𝑇𝑇𝑇𝑇 contains only images representing digit 3

β€’ 𝒯𝒯 contains rotations and horizontal/vertical flips

𝜏𝜏1𝜏𝜏2 𝜏𝜏3

𝜏𝜏4

𝜏𝜏1 𝜏𝜏2 𝜏𝜏3 𝜏𝜏4

Training set of normal data

Labeled training set with 𝑇𝑇 classes

Page 183: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SELF-SUPERVISED LEARNING

We can use the second last layer as feature vector and train an anomaly detector

Problem: we have to split our training set in two sets. The first set is used to train the classifier, the second one to train the anomaly detector

SoftmaxConvolution Pooling Fully connected

𝒔𝒔

Page 184: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Golan, El-Yaniv, β€œDeep Anomaly Detection Using Geometric Transformations”, NeurIPS 2018

SELF-SUPERVISED LEARNING

Another approach is to compute feed the network with the all the trasformed versions πœπœπ‘–π‘–(𝒔𝒔) of the test sample 𝒔𝒔 to obtain {πœΆπœΆπ‘–π‘–}𝑖𝑖=1𝑇𝑇

SoftmaxConvolution Pooling Fully connected

πœπœπ‘–π‘–(𝒔𝒔)πœΆπœΆπ‘–π‘–

Page 185: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Golan, El-Yaniv, β€œDeep Anomaly Detection Using Geometric Transformations”, NeurIPS 2018

SELF-SUPERVISED LEARNING

Another approach is to compute feed the network with the all the trasformed versions πœπœπ‘–π‘–(𝒔𝒔) of the test sample 𝒔𝒔 to obtain {πœΆπœΆπ‘–π‘–}𝑖𝑖=1𝑇𝑇

𝑠𝑠𝑐𝑐𝑠𝑠𝑠𝑠𝑠𝑠 𝒔𝒔 = 1 βˆ’1𝑇𝑇�𝑖𝑖=1

𝑇𝑇

πœΆπœΆπ‘–π‘– 𝑖𝑖

Where πœΆπœΆπ‘–π‘– 𝑖𝑖 is the posterior probability of label 𝑖𝑖 given πœπœπ‘–π‘–(𝒔𝒔)

SoftmaxConvolution Pooling Fully connected

πœπœπ‘–π‘–(𝒔𝒔)πœΆπœΆπ‘–π‘–

Page 186: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Golan, El-Yaniv, β€œDeep Anomaly Detection Using Geometric Transformations”, NeurIPS 2018

SELF-SUPERVISED LEARNING

The set of transformation has to be properly chosen:

β€’ if during training the trained classifier cannot discriminate the transformed samples, it does not extract meaningful feature for anomaly detection

β€’ Non-geometric transformations (Gaussian blur, gamma correction, sharpening) might eliminate important features and are less performing than geometric ones

Page 187: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SEMI-SUPERVISED APPROACHES

β€’ CNN as data-driven feature extractor

β€’ Transfer learning

β€’ Self-supervised learning

β€’ Autoencoders

β€’ Domain-based

β€’ Generative models

Page 188: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

AUTOENCODERS (REVISITED)

Autoencoders can be trained directly on normal data by minimizing the reconstruction loss:

Encoder β„° Decoder π’Ÿπ’Ÿ

οΏ½π’”π’”βˆˆπ‘‡π‘‡π‘‡π‘‡

𝒔𝒔 βˆ’ π’Ÿπ’Ÿ β„° 𝒔𝒔 2

𝒔𝒔 𝜢𝜢 𝒔𝒔𝒔

Page 189: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

AUTOENCODERS (REVISITED)

We can fit a density model (e.g. Gaussian Mixture) on 𝜢𝜢 = β„°(𝒔𝒔):

𝜢𝜢 βˆΌοΏ½π‘–π‘–

πœ‹πœ‹π‘–π‘–πœ‘πœ‘πππ‘–π‘–,Σ𝑖𝑖 ,

Where πœ‘πœ‘πππ‘–π‘–,Σ𝑖𝑖 is the pdf of 𝒩𝒩 πππ’Šπ’Š, Σ𝑖𝑖

Encoder β„° Decoder π’Ÿπ’Ÿ

𝒔𝒔 𝜢𝜢 𝒔𝒔𝒔

Page 190: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

EM-ALGORITHM FOR GAUSSIAN MIXTURES

Estimation of Gaussian Mixture parameters {πœ‹πœ‹π‘–π‘– ,𝝁𝝁𝑖𝑖 , Σ𝑖𝑖} from a training set πœΆπœΆπ‘›π‘› 𝑛𝑛is typically performed via EM-algorithm, that iterates the E and M steps

β€’ E-step: compute the membership weights 𝛾𝛾𝑛𝑛,𝑖𝑖 for each training sample πœΆπœΆπ‘›π‘›

𝛾𝛾𝑛𝑛,𝑖𝑖 =πœ‹πœ‹π‘–π‘–πœ‘πœ‘πππ‘–π‘–,Σ𝑖𝑖(πœΆπœΆπ‘›π‘›)

βˆ‘π‘˜π‘˜ πœ‹πœ‹π‘˜π‘˜πœ‘πœ‘πππ‘˜π‘˜,Ξ£π‘˜π‘˜(πœΆπœΆπ‘›π‘›)

Bishop, β€œPattern recognition and machine learningβ€πœΆπœΆ

𝛾𝛾1 ∼ 1𝛾𝛾2 ∼ 0

Page 191: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

EM-ALGORITHM FOR GAUSSIAN MIXTURES

Estimation of Gaussian Mixture parameters {πœ‹πœ‹π‘–π‘– ,𝝁𝝁𝑖𝑖 , Σ𝑖𝑖} from a training set πœΆπœΆπ‘›π‘› 𝑛𝑛is typically performed via EM-algorithm, that iterates the E and M steps

β€’ E-step: compute the membership weights 𝛾𝛾𝑛𝑛,𝑖𝑖 for each training sample πœΆπœΆπ‘›π‘›

𝛾𝛾𝑛𝑛,𝑖𝑖 =πœ‹πœ‹π‘–π‘–πœ‘πœ‘πππ‘–π‘–,Σ𝑖𝑖(πœΆπœΆπ‘›π‘›)

βˆ‘π‘˜π‘˜ πœ‹πœ‹π‘˜π‘˜πœ‘πœ‘πππ‘˜π‘˜,Ξ£π‘˜π‘˜(πœΆπœΆπ‘›π‘›)

Bishop, β€œPattern recognition and machine learningβ€πœΆπœΆ

𝛾𝛾1 ∼ 0𝛾𝛾2 ∼ 1

Page 192: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

EM-ALGORITHM FOR GAUSSIAN MIXTURES

Estimation of Gaussian Mixture parameters {πœ‹πœ‹π‘–π‘– ,𝝁𝝁𝑖𝑖 , Σ𝑖𝑖} from a training set πœΆπœΆπ‘›π‘› 𝑛𝑛is typically performed via EM-algorithm, that iterates the E and M steps

β€’ E-step: compute the membership weights 𝛾𝛾𝑛𝑛,𝑖𝑖 for each training sample πœΆπœΆπ‘›π‘›

𝛾𝛾𝑛𝑛,𝑖𝑖 =πœ‹πœ‹π‘–π‘–πœ‘πœ‘πππ‘–π‘–,Σ𝑖𝑖(πœΆπœΆπ‘›π‘›)

βˆ‘π‘˜π‘˜ πœ‹πœ‹π‘˜π‘˜πœ‘πœ‘πππ‘˜π‘˜,Ξ£π‘˜π‘˜(πœΆπœΆπ‘›π‘›)

Bishop, β€œPattern recognition and machine learningβ€πœΆπœΆ

𝛾𝛾1 ∼12

𝛾𝛾2 ∼12

Page 193: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

EM-ALGORITHM FOR GAUSSIAN MIXTURES

Estimation of Gaussian Mixture parameters {πœ‹πœ‹π‘–π‘– ,𝝁𝝁𝑖𝑖 , Σ𝑖𝑖} from a training set πœΆπœΆπ‘›π‘› 𝑛𝑛is typically performed via EM-algorithm, that iterates the E and M steps

β€’ E-step: compute the membership weights 𝛾𝛾𝑛𝑛,𝑖𝑖 for each training sample πœΆπœΆπ‘›π‘›

𝛾𝛾𝑛𝑛,𝑖𝑖 =πœ‹πœ‹π‘–π‘–πœ‘πœ‘πππ‘–π‘–,Σ𝑖𝑖(πœΆπœΆπ‘›π‘›)

βˆ‘π‘˜π‘˜ πœ‹πœ‹π‘˜π‘˜πœ‘πœ‘πππ‘˜π‘˜,Ξ£π‘˜π‘˜(πœΆπœΆπ‘›π‘›)

β€’ M-step: update the parameters of the Gaussian Mixture

πœ‹πœ‹π‘–π‘– = 1π‘π‘βˆ‘π‘›π‘› 𝛾𝛾𝑛𝑛,𝑖𝑖

πœ‡πœ‡π‘–π‘– = βˆ‘π‘›π‘› 𝛾𝛾𝑛𝑛,π‘–π‘–πœΆπœΆπ‘›π‘›Ξ£π‘›π‘›π›Ύπ›Ύπ‘›π‘›,𝑖𝑖

Σ𝑖𝑖 = βˆ‘π‘›π‘› 𝛾𝛾𝑛𝑛,𝑖𝑖 πœΆπœΆπ‘›π‘›βˆ’πππ‘–π‘– πœΆπœΆπ‘›π‘›βˆ’πππ‘–π‘– 𝑇𝑇

Σ𝑛𝑛𝛾𝛾𝑛𝑛,𝑖𝑖

Bishop, β€œPattern recognition and machine learning”

Page 194: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

AUTOENCODERS (REVISITED)

We can compute the likelihood of a test sample 𝒔𝒔 as:

β„’ 𝒔𝒔 = �𝑖𝑖

πœ‹πœ‹π‘–π‘–πœ‘πœ‘πππ‘–π‘–,Σ𝑖𝑖(β„° 𝒔𝒔 ) ,

Encoder β„° Decoder π’Ÿπ’Ÿ

𝒔𝒔 𝜢𝜢 𝒔𝒔𝒔

Page 195: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

AUTOENCODERS (REVISITED)

We can compute the likelihood of a test sample 𝒔𝒔 as:

β„’ 𝒔𝒔 = �𝑖𝑖

πœ‹πœ‹π‘–π‘–πœ‘πœ‘πππ‘–π‘–,Σ𝑖𝑖(β„° 𝒔𝒔 ) ,

Encoder β„° Decoder π’Ÿπ’Ÿ

𝒔𝒔 𝜢𝜢 𝒔𝒔𝒔

The autoencoder and the Gaussian Mixture are not jointly learned!

Page 196: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

JOINT LEARNING OF AUTOENCODER AND DENSITY MODEL

Idea: given a training set of N samples use a NN to predict the membership weights of each sample

𝒔𝒔 𝜢𝜢 𝒔𝒔𝒔

𝜢𝜢 𝜸𝜸

Zong et al, β€œDeep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection”, ICLR 2018

Estimation Network

Page 197: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

JOINT LEARNING OF AUTOENCODER AND DENSITY MODEL

Idea: given a training set of N samples use a NN to predict the membership weights of each sample

𝒔𝒔 𝜢𝜢 𝒔𝒔𝒔

𝜢𝜢 𝜸𝜸

Zong et al, β€œDeep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection”, ICLR 2018

Estimate the GM parameters as:

πœ‹πœ‹π‘–π‘– =1𝐹𝐹�𝑛𝑛

𝛾𝛾𝑛𝑛,𝑖𝑖

πœ‡πœ‡π‘–π‘– =βˆ‘π‘›π‘› 𝛾𝛾𝑛𝑛,π‘–π‘–πœΆπœΆπ‘›π‘›

Σ𝑛𝑛𝛾𝛾𝑛𝑛,𝑖𝑖

Σ𝑖𝑖 =βˆ‘π‘›π‘› 𝛾𝛾𝑛𝑛,𝑖𝑖 πœΆπœΆπ‘›π‘› βˆ’ 𝝁𝝁𝑖𝑖 πœΆπœΆπ‘›π‘› βˆ’ 𝝁𝝁𝑖𝑖 𝑇𝑇

Σ𝑛𝑛𝛾𝛾𝑛𝑛,𝑖𝑖 M-stepE-step

Estimation Network

Page 198: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

DEEP AUTOENCODING GAUSSIAN MIXTURE MODEL

Minimize the loss:

min�𝒔𝒔

𝒔𝒔 βˆ’ π’Ÿπ’Ÿ β„° 𝒔𝒔2

2+ πœ†πœ†β„›(β„° 𝒔𝒔 )

Where

β„› 𝜢𝜢 = βˆ’ log�𝑖𝑖

πœ‹πœ‹π‘–π‘–πœ‘πœ‘πππ‘–π‘–,Σ𝑖𝑖(𝜢𝜢)

Additional regularizations has to be imposed on Σ𝑖𝑖 to avoid trivial solution

β„›(β„° 𝒔𝒔 ) can be used an anomaly score for a sample 𝒔𝒔

Zong et al, β€œDeep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection”, ICLR 2018

Page 199: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SOME REMARKS

β€’ The estimation network introduces a regularization that helps to avoid local optima of the recontruction error

β€’ The autoencoder is then able to extract meaningful feature from normal data

β€’ Density estimation enables anomaly detection, but it is a more complicated task

Page 200: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SEMI-SUPERVISED APPROACHES

β€’ CNN as data-driven feature extractor

β€’ Transfer learning

β€’ Self-supervised learning

β€’ Autoencoders

β€’ Domain-based

β€’ Generative models

Page 201: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SUPPORT VECTOR DATA DESCRIPTION (SVDD) REVISITED

We want to find an hypersphere that, in the feature space, encloses most of the normal data

Tax, D. M., Duin, R. P. "Support vector domain description". Pattern recognition letters, 20(11), 1191-1199 (1999)

Page 202: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SUPPORT VECTOR DATA DESCRIPTION (SVDD) REVISITED

We want to find an hypersphere that, in the feature space, encloses most of the normal data

We expect that anomalous data lie outside the sphere

Tax, D. M., Duin, R. P. "Support vector domain description". Pattern recognition letters, 20(11), 1191-1199 (1999)

Page 203: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SUPPORT VECTOR DATA DESCRIPTION (SVDD) REVISITED

Typically the sphere is computed in a high (possibly infinite) dimensional feature space

Feature are defined using kernels

β€’ Polinomial kernel

β€’ Gaussian kernel

πœ“πœ“

Page 204: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Ruff et al, β€œDeep One-Class Classification”, ICML 2018

SUPPORT VECTOR DATA DESCRIPTION (SVDD) REVISITED

Idea: can we learn the feature from normal data using a neural network?

πœ“πœ“

Page 205: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Ruff et al, β€œDeep One-Class Classification”, ICML 2018

SOFT-BOUNDARY DEEP SVDD

Minimize the loss:

min𝑇𝑇,𝜽𝜽

𝑇𝑇2 +1𝜈𝜈𝐹𝐹

�𝑛𝑛=1

𝑁𝑁

max{0, πœ“πœ“πœ½πœ½ 𝒔𝒔𝑛𝑛 βˆ’ 𝒄𝒄 𝟐𝟐 βˆ’ 𝑇𝑇2} + πœ†πœ† 𝜽𝜽 2

β€’ The samples 𝒔𝒔𝑛𝑛 such that πœ“πœ“πœ½πœ½ 𝒔𝒔𝑛𝑛 is inside the sphere do not contribute to the loss

β€’ 𝜈𝜈 provides a bound on the False Positive Rate

β€’ πœ†πœ† 𝜽𝜽 2is a regularization term

A test sample 𝒔𝒔 is anomalous if πœ“πœ“πœ½πœ½ 𝒔𝒔 βˆ’ 𝒄𝒄 > 𝑇𝑇

𝑇𝑇

𝒄𝒄

Page 206: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Ruff et al, β€œDeep One-Class Classification”, ICML 2018

SOFT-BOUNDARY DEEP SVDD

Minimize the loss:

min𝑇𝑇,𝜽𝜽

𝑇𝑇2 +1𝜈𝜈𝐹𝐹

�𝑛𝑛=1

𝑁𝑁

max{0, πœ“πœ“πœ½πœ½ 𝒔𝒔𝑛𝑛 βˆ’ 𝒄𝒄 𝟐𝟐 βˆ’ 𝑇𝑇2} + πœ†πœ† 𝜽𝜽 2

Remarks:

β€’ Some contraints must be imposed on the network πœ“πœ“πœ½πœ½ to avoid trivial solutions:

β€’ No bias terms

β€’ Unbounded activations

β€’ 𝒄𝒄 is not optimized but has to be precomputed from data

β€’ 𝒄𝒄 must be different from 𝒄𝒄0 = πœ“πœ“0 𝒔𝒔

Page 207: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Ruff et al, β€œDeep One-Class Classification”, ICML 2018

A SIMPLER FORMULATION: DEEP SVDD

min𝜽𝜽

+1𝐹𝐹�𝑛𝑛=1

𝑁𝑁

πœ“πœ“πœ½πœ½ 𝒔𝒔𝑛𝑛 βˆ’ 𝒄𝒄 2 + πœ†πœ† 𝜽𝜽 2

Cons:

β€’ No bound on the FPR provided by 𝜈𝜈

β€’ A threshold has to be chosen for the anomaly score:

πœ“πœ“πœ½πœ½ 𝒔𝒔 βˆ’ 𝒄𝒄 2

Page 208: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

SEMI-SUPERVISED APPROACHES

β€’ CNN as data-driven feature extractor

β€’ Transfer learning

β€’ Self-supervised learning

β€’ Autoencoders

β€’ Domain-based

β€’ Generative models

Page 209: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

GENERATIVE MODELS

Goal:

generative models generate, given a training set of images (data) 𝑆𝑆, other images (data) that are similar to those in 𝑆𝑆

Page 210: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

WHAT FOR GENERATIVE MODELS?

β€’ Generative models can be used for data augmentation, simulation and planning

β€’ Realistic samples for artwork, super-resolution, colorization, etc.

β€’ You are getting close to the β€œholy grail” of modeling the distribution of natural images

Radford, et al β€œUnsupervised representation learning with deep convolutional generative adversarial networks. ICLR 2016

Page 211: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

GENERATIVE ADVERSARIAL NETWORKS (GAN)

The GAN approach:

Do not look for an explicit density model πœ™πœ™π‘†π‘† describing the manifold of natural images.

Just find out a model able to generate samples that looks liketraining samples 𝑆𝑆 βŠ‚ ℝ𝑛𝑛

Instead of sampling from πœ™πœ™π‘†π‘†, just use:

β€’ Sample a seed from a known distribution πœ™πœ™π‘§π‘§β€’ Feed this seed to a learned transformation that generates

realistic samples, as if they were drawn from πœ™πœ™π‘†π‘†

Use a neural network to learn this transformation

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, ... & Bengio, Y. Generative adversarial nets NIPS 2014

Page 212: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

GENERATIVE ADVERSARIAL NETWORKS (GAN)

The GAN approach:

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, ... & Bengio, Y. Generative adversarial nets NIPS 2014

𝑧𝑧 ∼ πœ™πœ™π‘§π‘§

Generative

Network

Draw a sample from the noise distribution

Page 213: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

GENERATIVE ADVERSARIAL NETWORKS (GAN)

The GAN approach:

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, ... & Bengio, Y. Generative adversarial nets NIPS 2014

𝑧𝑧 ∼ πœ™πœ™π‘§π‘§

Generative

Network

Draw a sample from the noise distribution

Page 214: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

GENERATIVE ADVERSARIAL NETWORKS (GAN)

The GAN solution: Train a pair of neural networks with differenttasks that compete in a sort of two player game.

These models are:

β€’ Generator 𝒒𝒒 that produces realistic samples e.g. taking as input some random noise. 𝒒𝒒 tries to fool the discriminator

β€’ Discriminator π’Ÿπ’Ÿ that takes as input an image and assess whether it is real or generated by 𝒒𝒒

Train the two and at the end, keep only 𝒒𝒒

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, ... & Bengio, Y. Generative adversarial nets NIPS 2014

Page 215: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

GAN Architecture

𝑧𝑧 ∼ πœ™πœ™π‘§π‘§

𝒒𝒒

π’Ÿπ’Ÿ Real/Fake

Generated image

Real image fromthe training set 𝑆𝑆

Page 216: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

Using GAN

𝑧𝑧 ∼ πœ™πœ™π‘§π‘§

𝒒𝒒

π’Ÿπ’Ÿ Real/Fake

Generated image

Real image fromthe training set 𝑆𝑆

Discriminator π’Ÿπ’Ÿ is completely useless and as such dropped. After a successful GAN training, π’Ÿπ’Ÿ is not able to distinguish the real/fake

Page 217: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

GAN

Both π’Ÿπ’Ÿ and 𝒒𝒒 are conveniently chosen as Neural Networks

Setting up the stage

Our networks take as input:

β€’ π’Ÿπ’Ÿ = π’Ÿπ’Ÿ(𝒔𝒔)β€’ 𝒒𝒒 = 𝒒𝒒 𝒛𝒛

𝒔𝒔 ∈ ℝ𝑛𝑛 is an input image (either real or generated by 𝒒𝒒) and 𝒛𝒛 ∈ ℝ𝑑𝑑 issome random noise to be fed to the generator.

Our network give as output:

π’Ÿπ’Ÿ β‹… :ℝ𝑛𝑛 β†’ [0,1]

the posteriori for an input to be a true image (1)

𝒒𝒒 β‹… :ℝ𝑑𝑑 β†’ ℝ𝑛𝑛

the generated imageGoodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, ... & Bengio, Y. Generative adversarial nets NIPS 2014

Page 218: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

GAN TRAINING

A good discriminator is such:

β€’ π’Ÿπ’Ÿ 𝒔𝒔 is maximum when 𝑠𝑠 ∈ 𝑆𝑆

β€’ 1 βˆ’π’Ÿπ’Ÿ 𝒔𝒔 is maximum when 𝑠𝑠 was generated from 𝒒𝒒

β€’ 1 βˆ’π’Ÿπ’Ÿ 𝒒𝒒 𝒛𝒛 is maximum when 𝒛𝒛 ∼ πœ™πœ™π‘π‘

Training π’Ÿπ’Ÿ consists in maximizing the binary cross-entroy

maxπ’Ÿπ’Ÿ

Eπ‘ π‘ βˆΌπœ™πœ™π‘†π‘† logπ’Ÿπ’Ÿ 𝒔𝒔 + Eπ‘§π‘§βˆΌπœ™πœ™π‘π‘ log(1 βˆ’π’Ÿπ’Ÿ 𝒒𝒒 𝒛𝒛 )

Page 219: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

GAN TRAINING

A good discriminator is such:

β€’ π’Ÿπ’Ÿ 𝒔𝒔 is maximum when 𝑠𝑠 ∈ 𝑆𝑆

β€’ 1 βˆ’π’Ÿπ’Ÿ 𝒔𝒔 is maximum when 𝑠𝑠 was generated from 𝒒𝒒

β€’ 1 βˆ’π’Ÿπ’Ÿ 𝒒𝒒 𝒛𝒛 is maximum when 𝒛𝒛 ∼ πœ™πœ™π‘π‘

Training π’Ÿπ’Ÿ consists in maximizing the binary cross-entroy

maxπ’Ÿπ’Ÿ

Eπ‘ π‘ βˆΌπœ™πœ™π‘†π‘† logπ’Ÿπ’Ÿ 𝒔𝒔 + Eπ‘§π‘§βˆΌπœ™πœ™π‘π‘ log(1 βˆ’π’Ÿπ’Ÿ 𝒒𝒒 𝒛𝒛 )

This has to be 1 since 𝑠𝑠 ∼ πœ™πœ™π‘†π‘†, thus images are real

This has to be 0 since 𝒒𝒒 𝒛𝒛is a generated (fake) image

Page 220: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

GAN TRAINING

A good discriminator is such:

β€’ π’Ÿπ’Ÿ 𝒔𝒔 is maximum when 𝑠𝑠 ∈ 𝑆𝑆

β€’ 1 βˆ’π’Ÿπ’Ÿ 𝒔𝒔 is maximum when 𝑠𝑠 was generated from 𝒒𝒒

β€’ 1 βˆ’π’Ÿπ’Ÿ 𝒒𝒒 𝒛𝒛 is maximum when 𝒛𝒛 ∼ πœ™πœ™π‘π‘

Training π’Ÿπ’Ÿ consists in maximizing the binary cross-entroy

maxπ’Ÿπ’Ÿ

Eπ‘ π‘ βˆΌπœ™πœ™π‘†π‘† logπ’Ÿπ’Ÿ 𝒔𝒔 + Eπ‘§π‘§βˆΌπœ™πœ™π‘π‘ log(1 βˆ’π’Ÿπ’Ÿ 𝒒𝒒 𝒛𝒛 )

A good generator 𝒒𝒒 is the one which makes π’Ÿπ’Ÿ to fail

min𝒒𝒒

maxπ’Ÿπ’Ÿ

Eπ‘ π‘ βˆΌπœ™πœ™π‘†π‘† logπ’Ÿπ’Ÿ 𝒔𝒔 + Eπ‘§π‘§βˆΌπœ™πœ™π‘π‘ log(1 βˆ’π’Ÿπ’Ÿ 𝒒𝒒 𝒛𝒛 )

Page 221: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

ILLUSTRATION

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, ... & Bengio, Y. Generative adversarial nets NIPS 2014

𝒔𝒔 real𝒔𝒔 fakeπ’Ÿπ’Ÿ(𝒔𝒔)

𝒔𝒔

𝒛𝒛

Page 222: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

MNIST

Generated samples

nearest sample to the second-last

column

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, ... & Bengio, Y. Generative adversarial nets NIPS 2014

Page 223: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

INTERPOLATION IN THE LATENT SPACE

We can interpolate between two points in the latent space and obtain smooth transitions from a digit to another one

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, ... & Bengio, Y. Generative adversarial nets NIPS 2014

Page 224: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

GAN FOR ANOMALY DETECTION

Idea: let us train a GAN on normal data. We expect that the generator 𝒒𝒒 cannot generate any anomalous sample 𝒔𝒔.

Problem: Given a test sample 𝒔𝒔 how can we determine if it could be generated by 𝒒𝒒?

𝒒𝒒

Latent space Manifold β„³ of normal data

Anomalous data

Page 225: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

ANOGAN

Project the test sample 𝒔𝒔 on the manifold β„³ by solving the optimization problem:

�𝒛𝒛 = min𝒛𝒛

𝒒𝒒 𝒛𝒛 βˆ’ 𝒔𝒔 + πœ†πœ† log(1 βˆ’π’Ÿπ’Ÿ 𝒒𝒒 𝒛𝒛 )

β€’ 𝒒𝒒 𝒛𝒛 βˆ’ 𝒔𝒔 ensures that 𝒔𝒔 is well approximated by the generator

β€’ log(1 βˆ’π’Ÿπ’Ÿ 𝒒𝒒 𝒛𝒛 ) ensures that the projection 𝒒𝒒 �𝒛𝒛 is similar to a real (normal) sample

𝒒𝒒

Latent space Manifold β„³ of normal data

�𝒛𝒛

Page 226: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Schlegl et al, "Unsupervised anomaly detection with generative adversarial networks to guide marker discovery", IPMI 2017

ANOGAN

Project the test sample 𝒔𝒔 on the manifold β„³ by solving the optimization problem:

�𝒛𝒛 = min𝒛𝒛

𝒒𝒒 𝒛𝒛 βˆ’ 𝒔𝒔 + πœ†πœ† log(1 βˆ’π’Ÿπ’Ÿ 𝒒𝒒 𝒛𝒛 )

β€’ 𝒒𝒒 𝒛𝒛 βˆ’ 𝒔𝒔 ensures that 𝒔𝒔 is well approximated by the generator

β€’ log(1 βˆ’π’Ÿπ’Ÿ 𝒒𝒒 𝒛𝒛 ) ensures that the projection 𝒒𝒒 �𝒛𝒛 is similar to a real (normal) sample (since 𝒒𝒒 fools π’Ÿπ’Ÿ)

Anomaly score:

𝑠𝑠𝑐𝑐𝑠𝑠𝑠𝑠𝑠𝑠 𝒔𝒔 = 𝒒𝒒 �𝒛𝒛 βˆ’ 𝒔𝒔 + πœ†πœ† log(1 βˆ’π’Ÿπ’Ÿ 𝒒𝒒 �𝒛𝒛 )

Page 227: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

ANOGAN

Project the test sample 𝒔𝒔 on the manifold β„³ by solving the optimization problem:

�𝒛𝒛 = min𝒛𝒛

𝒒𝒒 𝒛𝒛 βˆ’ 𝒔𝒔 + πœ†πœ† log(1 βˆ’π’Ÿπ’Ÿ 𝒒𝒒 𝒛𝒛 )

β€’ 𝒒𝒒 𝒛𝒛 βˆ’ 𝒔𝒔 ensures that 𝒔𝒔 is well approximated by the generator

β€’ log(1 βˆ’π’Ÿπ’Ÿ 𝒒𝒒 𝒛𝒛 ) ensures that the projection 𝒒𝒒 �𝒛𝒛 is similar to a real (normal) sample (since 𝒒𝒒 fools π’Ÿπ’Ÿ)

Anomaly score:

𝑠𝑠𝑐𝑐𝑠𝑠𝑠𝑠𝑠𝑠 𝒔𝒔 = 𝒒𝒒 �𝒛𝒛 βˆ’ 𝒔𝒔 + πœ†πœ† log(1 βˆ’π’Ÿπ’Ÿ 𝒒𝒒 �𝒛𝒛 )

We need to solve an optimization problem for each test sample!

Schlegl et al, "Unsupervised anomaly detection with generative adversarial networks to guide marker discovery", IPMI 2017

Page 228: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Donahue et al, "Adversarial feature learning", ICLR 2017

BIDIRECTIONAL GAN

𝒛𝒛

β„°(𝒔𝒔)

𝒒𝒒

β„°

𝒒𝒒(𝒛𝒛)

𝒔𝒔

π’Ÿπ’Ÿπ’’π’’ 𝒛𝒛 , 𝒛𝒛

𝒔𝒔,β„° 𝒔𝒔

Latent Space Data SpaceGenerator

Encoder

Discriminator

Page 229: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Donahue et al, "Adversarial feature learning", ICLR 2017

BIDIRECTIONAL GAN

𝒛𝒛

β„°(𝒔𝒔)

𝒒𝒒

β„°

𝒒𝒒(𝒛𝒛)

𝒔𝒔

π’Ÿπ’Ÿπ’’π’’ 𝒛𝒛 , 𝒛𝒛

𝒔𝒔,β„° 𝒔𝒔

Latent Space Data SpaceGenerator

Encoder

Discriminator

min𝒒𝒒,β„°

maxπ’Ÿπ’Ÿ

β„’(π’Ÿπ’Ÿ,β„°,𝒒𝒒)

β„’ π’Ÿπ’Ÿ,β„°,𝒒𝒒 = Eπ‘ π‘ βˆΌπœ™πœ™π‘†π‘† logπ’Ÿπ’Ÿ 𝒔𝒔,β„°(𝒔𝒔 + Eπ‘§π‘§βˆΌπœ™πœ™π‘π‘ log(1 βˆ’π’Ÿπ’Ÿ 𝒒𝒒 𝒛𝒛 , 𝒛𝒛 )

Page 230: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Donahue et al, "Adversarial feature learning", ICLR 2017

BIDIRECTIONAL GAN

𝒛𝒛

β„°(𝒔𝒔)

𝒒𝒒

β„°

𝒒𝒒(𝒛𝒛)

𝒔𝒔

π’Ÿπ’Ÿπ’’π’’ 𝒛𝒛 , 𝒛𝒛

𝒔𝒔,β„° 𝒔𝒔

Latent Space Data SpaceGenerator

Encoder

Discriminator

It can be proved that on the manifold β„³ (i.e. on normal data):

β„° = π’’π’’βˆ’1

Page 231: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Zenati et al, "Efficient GAN-based anomaly detection", Workshop ICLR 2018

ANOGAN IMPROVED

We can use BiGAN to efficiently invert the generator in AnoGAN:

�𝒛𝒛 = min𝒛𝒛

𝒒𝒒 𝒛𝒛 βˆ’ 𝒔𝒔 + πœ†πœ† log(1 βˆ’π’Ÿπ’Ÿ 𝒒𝒒 𝒛𝒛 )

�𝒛𝒛 = β„°(𝒔𝒔)

Page 232: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Zenati et al, "Efficient GAN-based anomaly detection", Workshop ICLR 2018

ANOGAN IMPROVED

We can use BiGAN to efficiently invert the generator in AnoGAN:

�𝒛𝒛 = min𝒛𝒛

𝒒𝒒 𝒛𝒛 βˆ’ 𝒔𝒔 + πœ†πœ† log(1 βˆ’π’Ÿπ’Ÿ 𝒒𝒒 𝒛𝒛 )

�𝒛𝒛 = β„°(𝒔𝒔)

𝑠𝑠𝑐𝑐𝑠𝑠𝑠𝑠𝑠𝑠 𝒔𝒔 = 𝒒𝒒 �𝒛𝒛 βˆ’ 𝒔𝒔 + πœ†πœ† log 1 βˆ’π’Ÿπ’Ÿ 𝒒𝒒 �𝒛𝒛 =

= 𝒒𝒒(β„° 𝒔𝒔 ) βˆ’ 𝒔𝒔 + πœ†πœ† log 1 βˆ’ π’Ÿπ’Ÿ 𝒒𝒒 β„°(𝒔𝒔)

Page 233: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Sabokrou et al, "Adversarially learned one-class classifier for novelty detection", CVPR 2018

AUTOENCODER AS GENERATIVE MODELS

Denoising Autoencoder

Discriminatorβ„›

π’Ÿπ’Ÿ

𝒔𝒔 + 𝒩𝒩(0,𝜎𝜎2) 𝒛𝒛 𝒔𝒔𝒔 [0,1]

β€’ β„› tries to reconstruct sample 𝒔𝒔 from its noisy version �𝒔𝒔=𝒔𝒔 +𝒩𝒩(0,𝜎𝜎2)

β€’ π’Ÿπ’Ÿ tries to discriminate between noise-free and reconstructed samples

𝒒𝒒ℰ

Page 234: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Sabokrou et al, "Adversarially learned one-class classifier for novelty detection", CVPR 2018

AUTOENCODER AS GENERATIVE MODELS

Denoising Autoencoder

Discriminatorβ„›

π’Ÿπ’Ÿ

𝒔𝒔 + 𝒩𝒩(0,𝜎𝜎2) 𝒛𝒛 𝒔𝒔𝒔 [0,1]

β€’ β„› tries to reconstruct sample 𝒔𝒔 from its noisy version �𝒔𝒔=𝒔𝒔 +𝒩𝒩(0,𝜎𝜎2)

β€’ π’Ÿπ’Ÿ tries to discriminate between noise-free and reconstructed samples

𝒒𝒒ℰ

Page 235: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Sabokrou et al, "Adversarially learned one-class classifier for novelty detection", CVPR 2018

AUTOENCODER AS GENERATIVE MODELS

Denoising Autoencoder

Discriminatorβ„›

𝒔𝒔 + 𝒩𝒩(0,𝜎𝜎2) 𝒛𝒛 𝒔𝒔𝒔 [0,1]

minβ„›

maxπ’Ÿπ’Ÿ

Eπ’”π’”βˆΌπœ™πœ™π‘†π‘† logπ’Ÿπ’Ÿ 𝒔𝒔 + EοΏ½π’”π’”βˆΌπœ™πœ™π‘†π‘†+𝒩𝒩(0,𝜎𝜎2) log(1 βˆ’π’Ÿπ’Ÿ β„› �𝒔𝒔 )

π’Ÿπ’Ÿπ’’π’’β„°

Page 236: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Sabokrou et al, "Adversarially learned one-class classifier for novelty detection", CVPR 2018

AUTOENCODER AS GENERATIVE MODELS

Denoising Autoencoder

Discriminatorβ„›

𝒔𝒔 + 𝒩𝒩(0,𝜎𝜎2) 𝒛𝒛 𝒔𝒔𝒔 [0,1]

minβ„›

maxπ’Ÿπ’Ÿ

Eπ’”π’”βˆΌπœ™πœ™π‘†π‘† logπ’Ÿπ’Ÿ 𝒔𝒔 + EοΏ½π’”π’”βˆΌπœ™πœ™π‘†π‘†+𝒩𝒩(0,𝜎𝜎2) log(1 βˆ’ π’Ÿπ’Ÿ β„› �𝒔𝒔 )

We expect that β„› can successfully reconstruct (thus, fool π’Ÿπ’Ÿ) only normal sample:

𝑠𝑠𝑐𝑐𝑠𝑠𝑠𝑠𝑠𝑠 𝒔𝒔 = 1 βˆ’π’Ÿπ’Ÿ(β„› 𝒔𝒔 )

π’Ÿπ’Ÿπ’’π’’β„°

Page 237: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Perera et al, "OCGAN: One-class novelty detection using GANs with constrained latent representations", CVPR 2019

AUTOENCODER AS GENERATIVE MODELS

The generator 𝒒𝒒 may be able to generate samples also of anomalous class

In this case it would be impossible to use 𝒒𝒒 to discriminate between normal and anomalous samples

𝒒𝒒

β„°This may happen if the β„° maps anomalous samples in region of the latent space that were not explored during training

Page 238: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Perera et al, "OCGAN: One-class novelty detection using GANs with constrained latent representations", CVPR 2019

AUTOENCODER AS GENERATIVE MODELS

The generator 𝒒𝒒 may be able to generate samples also of anomalous class

In this case it would be impossible to use 𝒒𝒒 to discriminate between normal and anomalous samples

𝒒𝒒

β„°This may happen if the β„° maps anomalous samples in region of the latent space that were not explored during training

Idea: we can enforce a known distribution on the latent space

Page 239: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020Zong et al, β€œDeep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection”, ICLR 2018

AUTOENCODER AS GENERATIVE MODELS

𝒔𝒔 𝜢𝜢 𝒔𝒔𝒔

𝜢𝜢 𝜸𝜸

Estimate the GM parameters as:

πœ‹πœ‹π‘–π‘– =1𝐹𝐹�𝑛𝑛

𝛾𝛾𝑛𝑛,𝑖𝑖

πœ‡πœ‡π‘–π‘– =βˆ‘π‘›π‘› 𝛾𝛾𝑛𝑛,π‘–π‘–πœΆπœΆπ‘›π‘›

Σ𝑛𝑛𝛾𝛾𝑛𝑛,𝑖𝑖

Σ𝑖𝑖 =βˆ‘π‘›π‘› 𝛾𝛾𝑛𝑛,𝑖𝑖 πœΆπœΆπ‘›π‘› βˆ’ 𝝁𝝁𝑖𝑖 πœΆπœΆπ‘›π‘› βˆ’ 𝝁𝝁𝑖𝑖 𝑇𝑇

Σ𝑛𝑛𝛾𝛾𝑛𝑛,𝑖𝑖

Estimation Network

Page 240: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

AUTOENCODER AS GENERATIVE MODELS

The generator 𝒒𝒒 may be able to generate samples also of anomalous class

In this case it would be impossible to use 𝒒𝒒 to discriminate between normal and anomalous samples

Idea: we can enforce a known distribution on the latent space

This can be done using a latentdiscriminator on the latent space 𝒒𝒒

β„°

Perera et al, "OCGAN: One-class novelty detection using GANs with constrained latent representations", CVPR 2019

Page 241: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

MUCH MORE COMPLICATED ARCHITECTURE

Perera et al, "OCGAN: One-class novelty detection using GANs with constrained latent representations", CVPR 2019

Page 242: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

MUCH MORE COMPLICATED ARCHITECTURE

Perera et al, "OCGAN: One-class novelty detection using GANs with constrained latent representations", CVPR 2019

The posterior probability of the classifier is used as anomaly score

Page 243: Anomaly Detection in Images - polimi.it

Concluding Remarks

Page 244: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

A FEW CONCLUDING REMARKS

Nowadays, anomaly detection problems are ubiquitous in engineering and applied sciences.

The presented general framework encompasses most of algorithms in the literature, which often boil down to

β€’ Feature extraction

β€’ Definition of suitable statistics (anomaly score)

β€’ Applying decision rules to a set of random variables.

Page 245: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

A FEW CONCLUDING REMARKS

When data are characterized by complex structures, as in case of images and signals, the feature extraction phase is the most critical one.

Data-driven models provide meaningful representations of images, and these can be used to find good feature for anomaly detection.

Nowadays the most powerful algorithms for feature extraction are based on deep learning, and in particular Convolutional Neural Networks

Page 246: Anomaly Detection in Images - polimi.it

Boracchi, Carrera, ICIP 2020

A FEW CONCLUDING REMARKS

The key of the success of CNNs is the end-to-end learning, and in our case the feature extractor and the anomaly score are jointly learned.

Still, anomaly-detection methods based on deep learning rely on decision rule, which has to be defined according to some statistical criteria, e.g. to guarantee a fixed false positive rate.