Anomaly Detection in Images
Giacomo Boracchi,
Politecnico di Milano, DEIB.
http://home.deib.polimi.it/boracchi/
Diego Carrera,
System Research and Applications, STMicroelectronics, Agrate Brianza
October 25th, 2020
ICIP 2020, Abu Dhabi, United Arab Emirates
Boracchi, Carrera, ICIP 2020
GIACOMO BORACCHI
Mathematician (Università Statale degli Studi di Milano, 2004),
PhD in Information Technology (DEIB, Politecnico di Milano 2008)
Associate Professor since 2019 at DEIB (Computer Science), Polimi
Research interests are mathematical and statistical methods for:
• Image/signal analysis and processing
• Unsupervised learning, change/anomaly detection
DIEGO CARRERA
Mathematician (Università Statale degli Studi di Milano, 2013),
PhD in Information Technology (DEIB, Politecnico di Milano 2019)
Researcher at STMicroelectronics since 2019
Research interests mainly focus on:
• Change detection in high-dimensional datastreams
• Anomaly detection in signals and images
• Unsupervised learning algorithms
ANOMALY DETECTION IN HEALTH
Mammography (mammogram images)
James Heilman, MD / CC BY-SA (https://creativecommons.org/licenses/by-sa/4.0)
Sato et al, A primitive study on unsupervised anomaly detection with an autoencoder in emergency head CT volumes, SPIE Medical Imaging, 2018
https://www.mvtec.com/company/research/datasets/mvtec-ad/
ANOMALY DETECTION FOR AUTOMATIC QUALITY CONTROL
Carrera D., Manganini F., Boracchi G., Lanzarone E. "Defect Detection in SEM Images of Nanofibrous Materials", IEEE Transactions on Industrial Informatics 2017, 11 pages, doi:10.1109/TII.2016.2641472
Our Running Example: Monitoring Nanofiber Production
SILICON WAFER MANUFACTURING
Defects detected as anomalies in microscope images
Not only images
Di Bella, Carrera, Rossi, Fragneto, Boracchi, "Wafer Defect Map Classification Using Sparse Convolutional Neural Networks", ICIAP 2019
DETECTION OF ANOMALOUS PATTERNS
Detect/identify patterns in wafer defect maps.
These might indicate faults, problems, or malfunctioning in the chip production.
AUTOMATIC AND LONG-TERM ECG MONITORING
Health monitoring / wearable devices:
Automatically analyze ECG tracings to detect arrhythmias or incorrect device positioning
D. Carrera, B. Rossi, D. Zambon, P. Fragneto, and G. Boracchi, "ECG Monitoring in Wearable Devices by Sparse Models", in Proceedings of ECML-PKDD 2016, 16 pages
UCSD Anomaly Dataset http://www.svcl.ucsd.edu/projects/anomaly/dataset.html
ANOMALOUS ACTIVITIES DETECTION IN VIDEOS
FRAUD DETECTION IN CREDIT CARD TRANSACTIONS
Dal Pozzolo A., Boracchi G., Caelen O., Alippi C., and Bontempi G., "Credit Card Fraud Detection: a Realistic Modeling and a Novel Learning Strategy", IEEE TNNLS 2017, 14 pages
Tutorial Outline
PRESENTATION OUTLINE
Part 1, Problem Formulation and Statistical Solutions:
• Problem formulation
• Performance measures
• Most relevant approaches
Part 2, Anomaly Detection in Images:
• The general approach
• Subspace / feature extraction methods
• Reference-based methods
Part 3, Anomaly Detection by Deep Learning Models:
• Transfer learning / self-supervised methods
• Autoencoders
• Domain-based and generative models
DISCLAIMERS
We will refer to anomalies/changes according to our personal experience, with a primary focus on the applications we have been addressing.
For a complete overview of anomaly-detection algorithms, please refer to the surveys below.
We will consider unsupervised and semi-supervised approaches, as these better conform to anomaly-detection settings.
T. Ehret, A. Davy, J.M. Morel, M. Delbracio, "Image Anomalies: A Review and Synthesis of Detection Methods", Journal of Mathematical Imaging and Vision, 1-34
V. Chandola, A. Banerjee, V. Kumar, "Anomaly detection: A survey", ACM Comput. Surv. 41, 3, Article 15 (July 2009), 58 pages.
Pimentel, M. A., Clifton, D. A., Clifton, L., Tarassenko, L., "A review of novelty detection", Signal Processing, 99, 215-249 (2014)
A. Zimek, E. Schubert, H.P. Kriegel, "A survey on unsupervised outlier detection in high-dimensional numerical data", Statistical Analysis and Data Mining: The ASA Data Science Journal, 5(5), 2012.
The Problem Formulation
Anomaly Detection in Images
ANOMALIES
"Anomalies are patterns in data that do not conform to a well defined notion of normal behavior"
Thus:
• Normal data are generated from a stationary process P_N
• Anomalies are generated from a different process P_A ≠ P_N
Examples:
• Frauds in the stream of all the credit card transactions
• Arrhythmias in ECG tracings
• Defective regions in an image, which do not conform to a reference pattern
Anomalies might appear as spurious elements, and are typically the most informative samples in the stream.
V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (July 2009), 58 pages.
PROBLEM FORMULATION: ANOMALY DETECTION IN IMAGES
Let s be an image defined over the pixel domain X ⊂ Z^2, let c ∈ X be a pixel and s(c) the corresponding intensity.
Our goal is to locate any anomalous region in s, i.e., to estimate the unknown anomaly mask Ω defined as

Ω(c) = 0 if c falls in a normal region, 1 if c falls in an anomalous region.
PATCH-WISE ANOMALY DETECTION
The goal is not to determine whether the whole image is normal or anomalous, but to locate/segment possible anomalies.
Therefore, it is convenient to:
1. Analyze the image patch-wise
2. Isolate regions containing patches that are detected as anomalies
TYPICAL ASSUMPTIONS
A training set TR is provided for configuring the AD algorithm.
Depending on the algorithm:
• Only normal images -> semi-supervised methods
• Unlabeled images -> unsupervised methods
• Annotated images -> supervised methods
Anomaly Detection in Statistics
Anomaly detection problems where observations are i.i.d. realizations of a random variable
Forget about images for a while and look for anomalies in a set of random vectors (or in a datastream)…
These techniques will come in very handy also for images.
V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.
ANOMALY DETECTION IN A STATISTICAL FRAMEWORK
Often, the anomaly-detection problem boils down to:
Monitor a set of data (not necessarily a stream)
x(t), t = t_0, …, with x(t) ∈ R^d
where x(t) are realizations of a random variable having pdf p_x, and detect outliers, i.e., those points that do not conform with p_x:

x(t) ∼ p_0 for normal data, p_1 for anomalies

Locate those samples that do not conform to the normal ones, or to a model explaining the normal ones.
Statistical Approaches to Detect Anomalies
THE TYPICAL SOLUTIONS
Most algorithms are composed of:
• A statistic that has a known response to normal data (e.g., the average, the sample variance, the log-likelihood, the confidence of a classifier, an "anomaly score", …)
• A decision rule to analyze the statistic (e.g., an adaptive threshold, a confidence region)
(Illustration: the raw data x(t) are mapped to a statistic g(x), which the decision rule compares against a threshold γ; a detection is raised when g(x) > γ.)
Statistics and decision rules are "one-shot", analyzing a set of historical data, or each new datum (or chunk), independently.
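As a minimal sketch (not from the slides), the statistic-plus-decision-rule pattern can be illustrated with an absolute z-score statistic and a fixed threshold; all data and the value γ = 3 are hypothetical:

```python
import random

random.seed(0)

# Training data from the normal process; test data with a few injected anomalies.
normal = [random.gauss(0, 1) for _ in range(1000)]
test = [random.gauss(0, 1) for _ in range(95)] + [random.gauss(6, 1) for _ in range(5)]

# Statistic with a known response to normal data: the absolute z-score,
# built from training estimates of the mean and standard deviation.
mu = sum(normal) / len(normal)
sigma = (sum((v - mu) ** 2 for v in normal) / len(normal)) ** 0.5

def statistic(x):
    return abs(x - mu) / sigma

# Decision rule: raise a detection when the statistic exceeds the threshold.
gamma = 3.0
detections = [x for x in test if statistic(x) > gamma]
```

The five injected anomalies sit far in the tail of the normal model, so they cross the threshold, while the false-positive rate on normal data is controlled by γ.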
Performance Measures
Assessing the performance of anomaly-detection algorithms
ANOMALY-DETECTION PERFORMANCE
• True positive rate: TPR = (# anomalies detected) / (# anomalies)
• False positive rate: FPR = (# normal samples detected) / (# normal samples)
You have probably also heard of:
• False negative rate (or miss rate): FNR = 1 − TPR
• True negative rate (or specificity): TNR = 1 − FPR
• Precision on anomalies: (# anomalies detected) / (# detections)
• Recall on anomalies (or sensitivity, hit rate): TPR
TPR / FPR TRADE-OFF
There is always a trade-off between TPR and FPR (and similarly for derived quantities), which is ruled by the algorithm parameters.
By changing γ the performance changes (e.g., true positives increase, but so do false positives).
ANOMALY-DETECTION PERFORMANCE
Thus, to correctly assess performance it is necessary to consider at least two indicators (e.g., TPR and FPR).
Indicators combining both TPR and FPR:
Accuracy = (# anomalies detected + # normal samples not detected) / (# samples)
F1 score = (2 · # anomalies detected) / (# detections + # anomalies)
These equal 1 in the case of an "ideal detector", which detects all the anomalies and has no false positives.
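A quick numeric check of these definitions on toy labels (the data below are hypothetical, not from the tutorial):

```python
# Hypothetical ground-truth labels (1 = anomaly) and detector outputs.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 0, 0, 1, 1, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # anomalies detected
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarms
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # missed anomalies
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

tpr = tp / (tp + fn)                  # recall / sensitivity
fpr = fp / (fp + tn)
precision = tp / (tp + fp)
accuracy = (tp + tn) / len(y_true)
f1 = 2 * tp / (2 * tp + fp + fn)      # = 2·TP / (# detections + # anomalies)
```

With these labels: TPR = 0.75, FPR = 1/6, precision = 0.75, accuracy = 0.8, F1 = 0.75.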
ANOMALY-DETECTION PERFORMANCE
Comparing different methods might be tricky, since we have to make sure that both have been configured in their best conditions.
Testing a large number of parameters leads to the ROC (receiver operating characteristic) curve: each parameter value yields a point (FPR, TPR).
The ideal detector would achieve:
• FPR = 0%
• TPR = 100%
Thus, the closer the curve gets to (0, 1), the better; the larger the Area Under the Curve (AUC), the better.
The optimal parameter is the one yielding the point closest to (0, 1).
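The ROC/AUC construction can be sketched by sweeping γ over anomaly scores (the scores and labels below are illustrative, not from the tutorial):

```python
# Hypothetical anomaly scores and ground-truth labels (1 = anomaly).
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9, 0.95]
labels = [0,   0,   0,   0,    1,   0,   1,   1,   1,   1]

def roc_point(gamma):
    # (FPR, TPR) for the decision rule score > gamma.
    tp = sum(1 for s, y in zip(scores, labels) if s > gamma and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > gamma and y == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return fp / neg, tp / pos

# Sweep gamma over all score values (plus one below the minimum).
curve = sorted(roc_point(g) for g in [-1.0] + scores)

# Area under the curve by the trapezoidal rule.
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(curve, curve[1:]))
```

The curve runs from (0, 0) (everything declared normal) to (1, 1) (everything declared anomalous); here the detector is nearly ideal, with AUC = 0.96.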
Statistical Approaches to Detect Anomalies
…when p_0 and p_1 are unknown
ANOMALY DETECTION WHEN p_0 AND p_1 ARE UNKNOWN
Most often, only a training set TR is provided.
There are three scenarios:
• Semi-supervised: only normal training data are provided, i.e., no anomalies in TR.
• Unsupervised: TR is provided without labels.
• Supervised: both normal and anomalous training data are provided in TR.
V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.
SEMI-SUPERVISED ANOMALY DETECTION
In semi-supervised methods the training set TR is composed of normal data only:
TR = { x(t) : x ∼ p_0, t < t_0 }
Very practical assumptions:
• Normal data are easy to gather and represent the vast majority
• Anomalous data are difficult/costly to collect/select, and it would be difficult to gather a representative training set
• Training examples in TR might not be representative of all the possible anomalies that can occur
All in all, it is often safer to detect any data departing from the normal conditions.
Semi-supervised anomaly-detection methods are also referred to as novelty-detection methods.
V. Chandola, A. Banerjee, V. Kumar, "Anomaly detection: A survey", ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.
Pimentel, M. A., Clifton, D. A., Clifton, L., Tarassenko, L., "A review of novelty detection", Signal Processing, 99, 215-249 (2014)
DENSITY-BASED METHODS
Density-based methods: normal data occur in high-probability regions of a stochastic model, while anomalies occur in the low-probability regions of the model.
During training: p̂_0 can be estimated from the training set TR = { x(t) : x ∼ p_0, t < t_0 } using
• parametric models (e.g., Gaussian mixture models)
• nonparametric models (e.g., KDE, histograms)
During testing:
• Anomalies are detected as data yielding p̂_0(x) < γ
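A minimal sketch of this training/testing procedure, assuming a single 1D Gaussian as the parametric model (all data and the 3σ threshold choice are hypothetical):

```python
import math
import random

random.seed(1)

# Semi-supervised setting: the training set contains only normal data, N(5, 2).
train = [random.gauss(5, 2) for _ in range(2000)]

# Parametric model for p0_hat: a single Gaussian fit by moment matching.
mu = sum(train) / len(train)
var = sum((v - mu) ** 2 for v in train) / len(train)

def p0_hat(x):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Detection threshold gamma: here, the density value three standard deviations out.
gamma = p0_hat(mu + 3 * math.sqrt(var))

def is_anomalous(x):
    return p0_hat(x) < gamma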
V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.
DENSITY-BASED METHODS: MONITORING THE LOG-LIKELIHOOD
Monitoring the log-likelihood of data w.r.t. p̂_0 allows us to address anomaly-detection problems in multivariate data:
1. During training, estimate p̂_0 from TR
2. During testing, compute L(x(t)) = log(p̂_0(x(t)))
3. Monitor L(x(t)), t = 1, …
This is quite a popular approach in both anomaly- and change-detection algorithms.
L. I. Kuncheva, "Change detection in streaming multivariate data using likelihood detectors", IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 5, 2013.
X. Song, M. Wu, C. Jermaine, and S. Ranka, "Statistical change detection for multidimensional data", in Proceedings of International Conference on Knowledge Discovery and Data Mining (KDD), 2007.
J. H. Sullivan and W. H. Woodall, "Change-point detection of mean vector or covariance matrix shifts using multivariate individual observations", IIE Transactions, vol. 32, no. 6, 2000.
C. Alippi, G. Boracchi, D. Carrera, M. Roveri, "Change Detection in Multivariate Datastreams: Likelihood and Detectability Loss", IJCAI 2016, New York, USA, July 9-13
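A sketch of the three steps, using a diagonal Gaussian as p̂_0 (the independence assumption and all data are hypothetical):

```python
import math
import random

random.seed(2)

d = 3  # data dimension

# Step 1 (training): estimate p0_hat from normal data; here a 3D Gaussian,
# modeled with a diagonal covariance (independent components assumed).
train = [[random.gauss(0, 1) for _ in range(d)] for _ in range(5000)]
mu = [sum(x[i] for x in train) / len(train) for i in range(d)]
var = [sum((x[i] - mu[i]) ** 2 for x in train) / len(train) for i in range(d)]

# Step 2 (testing): the monitored statistic is the log-likelihood of each sample.
def log_likelihood(x):
    return sum(-0.5 * math.log(2 * math.pi * var[i])
               - (x[i] - mu[i]) ** 2 / (2 * var[i]) for i in range(d))

# Step 3: monitor log_likelihood(x(t)) over time; unusually low values
# indicate possible anomalies.
```

A sample near the estimated mean scores a much higher log-likelihood than one far from it, which is exactly what the monitoring rule exploits.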
DENSITY-BASED METHODS
Advantages:
• p̂_0(x) indicates how safe a detection is (like a p-value)
• If the density-estimation process is robust to outliers, it is possible to tolerate a few anomalous samples in TR
• In relatively small dimensions, you might use nonparametric models like histograms
Challenges:
• It is challenging to fit models for high-dimensional data
• Histograms traditionally suffer from the curse of dimensionality as the dimension d increases
• Often the 1D histograms of the marginals are monitored, ignoring the correlations among components
V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.
DOMAIN-BASED METHODS
Domain-based methods: estimate a boundary around normal data, rather than the density of normal data.
A drawback of density-estimation methods is that they are meant to be accurate in high-density regions, while anomalies live in low-density ones.
One-class SVMs are domain-based methods defined by the normal samples at the periphery of the distribution.
SchΓΆlkopf, B., Williamson, R. C., Smola, A. J., Shawe-Taylor, J., Platt, J. C. "Support Vector Method for Novelty Detection". In NIPS 1999 (Vol. 12, pp. 582-588).
Tax, D. M., Duin, R. P. "Support vector domain description". Pattern recognition letters, 20(11), 1191-1199 (1999)
Pimentel, M. A., Clifton, D. A., Clifton, L., Tarassenko, L., "A review of novelty detection", Signal Processing, 99, 215-249 (2014)
ONE-CLASS SVM (SCHÖLKOPF ET AL. 1999)
Idea: define boundaries by estimating a binary function f that captures regions of the input space where the density is higher.
As in support vector methods, f is defined in the feature space F, and the decision boundaries are defined by a few support vectors (i.e., a few normal data).
Let φ(x) be the feature associated to x; f is defined as
f(x) = sign(⟨w, φ(x)⟩ − ρ)
where the hyperplane parameters (w, ρ) are optimized to yield a function that is positive on most training samples. Thus, in the feature space, normal points can be separated from the origin.
A linear separation in the feature space corresponds to a variety of nonlinear boundaries in the space of x.
Tax, D. M., Duin, R. P. "Support vector domain description". Pattern Recognition Letters, 20(11), 1191-1199 (1999)
ONE-CLASS SVM (TAX AND DUIN 1999)
The boundary of the normal region can also be defined by a hypersphere that, in the feature space, encloses most of the normal data, i.e., φ(x) for x ∈ TR.
Similar detection formulas hold, measuring the distance in the feature space from the sphere center.
The sphere center can be defined in terms of support vectors.
Remarks: in both one-class approaches, the amount of samples that fall within the margin (outliers) is controlled by a regularization parameter.
This parameter regulates the number of outliers in the training set and the detector sensitivity.
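The hypersphere idea can be caricatured directly in the input space, taking the mean as the center and a distance quantile as the radius. This skips the kernel map and the actual SVDD optimization entirely, so it is only an illustration of the boundary/regularization trade-off, on hypothetical data:

```python
import math
import random

random.seed(3)

# Normal training data clustered around (2, 2).
train = [(random.gauss(2, 1), random.gauss(2, 1)) for _ in range(1000)]

# Sphere center: the mean of the training data (a crude stand-in for the
# support-vector expansion of the true SVDD center).
center = (sum(p[0] for p in train) / len(train),
          sum(p[1] for p in train) / len(train))

def dist(p):
    return math.hypot(p[0] - center[0], p[1] - center[1])

# Radius chosen so that ~5% of training points fall outside the sphere; this
# fraction plays the role of the regularization parameter controlling outliers.
radii = sorted(dist(p) for p in train)
radius = radii[int(0.95 * len(radii))]

def is_anomalous(p):
    return dist(p) > radius
```

Shrinking the quantile tightens the boundary (more training outliers, higher sensitivity); enlarging it loosens the boundary, mirroring the role of the regularization parameter in the remark above.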
UNSUPERVISED ANOMALY DETECTION
The training set TR might contain both normal and anomalous data; however, no labels are provided:
TR = { x(t) : t < t_0 }
Underlying assumption: anomalies are rare w.r.t. normal data.
In principle:
• Density/domain-based methods that are robust to outliers can be applied in an unsupervised scenario
• Unsupervised methods can be improved whenever labels are available
DISTANCE-BASED METHODS
Distance-based methods: normal data fall in dense neighborhoods, while anomalies are far from their closest neighbors.
A critical aspect is the choice of the similarity measure to use.
Anomalies are detected by monitoring:
• the distance between each sample and its k-th nearest neighbor
• the above distance considered relative to the neighbors (i.e., the local density)
• whether samples do not belong to clusters, are at the cluster periphery, or belong to small and sparse clusters
V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.
Zhao, M., Saligrama, V. "Anomaly detection with score functions based on nearest neighbor graphs". NIPS 2009
A. Zimek, E. Schubert, H. Kriegel. "A survey on unsupervised outlier detection in high-dimensional numerical data". SADM 2012
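A sketch of the first criterion, using the k-NN distance as the anomaly score, on hypothetical 2D data with one injected outlier:

```python
import math
import random

random.seed(4)

# Unlabeled 2D data: mostly normal points near the origin plus one far outlier.
data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
data.append((8.0, 8.0))

def knn_distance(p, k=5):
    # Distance from p to its k-th nearest neighbor among the other points.
    dists = sorted(math.hypot(p[0] - q[0], p[1] - q[1])
                   for q in data if q is not p)
    return dists[k - 1]

# The anomaly score of each sample is its k-NN distance: the outlier sits far
# from every neighbor, so its score dominates.
scores = [knn_distance(p) for p in data]
top = max(range(len(data)), key=lambda i: scores[i])
```

The injected outlier receives by far the largest score; ranking by this score (or thresholding it) implements the distance-based detector.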
Fei Tony Liu, Kai Ming Ting and Zhi-Hua Zhou, Isolation Forest, ICDM 2008
ISOLATION FOREST
Builds upon the rationale that "anomalies are easier to separate from the rest of normal data".
This idea is implemented very efficiently through a forest of binary trees that are constructed via an iterative procedure:
1. Randomly choose a component x_i and a value in the range of the projections of TR over the i-th component; this yields a splitting criterion.
2. Repeat the procedure on each node: randomly select a component and a cut point.
Anomalies lie in leaves close to the root: an anomalous point x_0 can be easily isolated, while genuine points are difficult to isolate.
ISOLATION FOREST: TESTING
Compute E[h(x)], the average path length of a test sample x over all the trees in the forest.
A test sample is identified as anomalous when
s(x) = 2^(−E[h(x)] / c(n)) > γ
where:
• n: number of samples in TR
• c(n): average path length of an unsuccessful search in a Binary Search Tree with n nodes.
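A compact sketch of this construction (random component, random cut, path-length score with the c(n) normalization); subsampling and other refinements of the original algorithm are omitted, and all data are hypothetical:

```python
import math
import random

random.seed(5)

def build_tree(points, depth, max_depth):
    # Stop at the depth limit or when the node holds at most one point;
    # a leaf stores the number of training points that reached it.
    if depth >= max_depth or len(points) <= 1:
        return len(points)
    i = random.randrange(len(points[0]))   # random component
    lo = min(p[i] for p in points)
    hi = max(p[i] for p in points)
    if lo == hi:
        return len(points)
    cut = random.uniform(lo, hi)           # random cut point within the range
    left = [p for p in points if p[i] < cut]
    right = [p for p in points if p[i] >= cut]
    return (i, cut, build_tree(left, depth + 1, max_depth),
            build_tree(right, depth + 1, max_depth))

def c(n):
    # Average path length of an unsuccessful search in a BST with n nodes.
    if n <= 1:
        return 0.0
    harmonic = math.log(n - 1) + 0.5772156649   # H(n-1) via Euler-Mascheroni
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def path_length(tree, x, depth=0):
    if not isinstance(tree, tuple):        # leaf: adjust by c(leaf size)
        return depth + c(tree)
    i, cut, left, right = tree
    return path_length(left if x[i] < cut else right, x, depth + 1)

def anomaly_score(forest, x, n):
    # s(x) = 2^(-E[h(x)] / c(n)): close to 1 for anomalies, lower for normal data.
    e = sum(path_length(t, x) for t in forest) / len(forest)
    return 2.0 ** (-e / c(n))

data = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(256)]
forest = [build_tree(data, 0, max_depth=8) for _ in range(100)]
```

A point far from the cloud is isolated after a few random splits, so its average path length is short and its score is high; points inside the cloud need many splits and score lower.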
T. Ehret, A. Davy, J.M. Morel, M. Delbracio "Image Anomalies: A Review and Synthesis of Detection Methods", Journal of Mathematical Imaging and Vision, 1-34
SUPERVISED ANOMALY DETECTION - DISCLAIMER
Most papers and reviews agree that supervised methods are not to be considered part of anomaly detection, because:
• Anomalies in general lack statistical coherence
• Not enough training samples are provided for anomalies
However:
• Some supervised problems are often referred to as "detection" in case of severe class imbalance (e.g., fraud detection)
• Supervised models can be transferred to unsupervised settings, in particular for deep learning
SUPERVISED ANOMALY DETECTION - SOLUTIONS
In supervised methods, training data are annotated and divided into normal (+) and anomalous (−) samples:
TR = { (x(t), y(t)) : x ∈ R^d, y ∈ {+, −}, t < t_0 }
Solution: train a two-class classifier to distinguish normal vs. anomalous data.
During training:
• Train a classifier K from TR.
During testing:
• Compute the classifier output K(x), or
• Set a threshold on the posterior P(y = − | x), or
• Select the k most likely anomalies
SUPERVISED ANOMALY DETECTION - CHALLENGES
These classification problems are challenging because anomaly-detection settings typically imply:
• Class imbalance: normal data far outnumber anomalies
• Concept drift: anomalies might evolve over time, thus the few annotated anomalies might not be representative of the anomalies occurring during operation
• Selection bias: training samples are typically selected through a closed-loop and biased procedure; often only detected anomalies are annotated, while the vast majority of the stream remains unsupervised. This biases the selection of training samples.
Dal Pozzolo A., Boracchi G., Caelen O., Alippi C. and Bontempi G., "Credit Card Fraud Detection: a Realistic Modeling and a Novel Learning Strategy", IEEE TNNLS 2017, 14 pages
… ANOMALY DETECTION IN CREDIT CARD TRANSACTIONS
Fraud detection in credit card transactions/web sessions
Dal Pozzolo A., Boracchi G., Caelen O., Alippi C. and Bontempi G., "Credit Card Fraud Detection: a Realistic Modeling and a Novel Learning Strategy", IEEE TNNLS 2017, 14 pages
Let's go back to images now…
Applying statistical methods to image patches
OUR RUNNING EXAMPLE
Goal: Automatically measure area covered by defects
ANOMALY DETECTION IN IMAGES
The goal is not to determine whether the whole image is normal or anomalous, but to locate/segment possible anomalies.
Therefore, it is convenient to:
1. Analyze the image patch-wise
2. Isolate regions containing patches that are detected as anomalies
(Figure: examples of normal and anomalous patches.)
Du, B., Zhang, L.: Random-selection-based anomaly detector for hyperspectral imagery. IEEE Transactions on Geoscience and Remote sensing
DENSITY-BASED APPROACH ON IMAGE PATCHES
A density-based approach to AD would be:
Training
i. Split the normal image into patches s_c
ii. Fit a statistical model p̂_0 = N(μ, Σ) describing normal patches.
Testing
i. Split the test image into patches
ii. Compute p̂_0(s_c), the likelihood of each test patch s_c
iii. Detect anomalies by thresholding the likelihood
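A sketch of this training/testing procedure on a synthetic "normal" image, scoring patches by the squared Mahalanobis distance, which is monotonically related to the Gaussian negative log-likelihood; the image, patch size, and defect below are all made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical "normal" training image: a flat texture with mild noise.
img = rng.normal(0.5, 0.05, size=(64, 64))

def patches(image, p=4):
    # Split the image into non-overlapping p x p patches, flattened to vectors.
    h, w = image.shape
    return np.array([image[r:r + p, c:c + p].ravel()
                     for r in range(0, h - p + 1, p)
                     for c in range(0, w - p + 1, p)])

# Training: fit p0_hat = N(mu, Sigma) to the normal patches.
train = patches(img)
mu = train.mean(axis=0)
cov = np.cov(train, rowvar=False) + 1e-6 * np.eye(train.shape[1])
cov_inv = np.linalg.inv(cov)

def score(s):
    # Squared Mahalanobis distance: monotone in -log p0_hat(s) for a Gaussian,
    # so thresholding it is equivalent to thresholding the likelihood.
    d = s - mu
    return float(d @ cov_inv @ d)

# Testing: a defective patch (a uniformly bright region) that breaks the texture.
defect = np.full(16, 0.95)
```

The defective patch scores far above every normal training patch, so any threshold between the two separates them.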
X. Xie, M. Mirmehdi, "Texture exemplars for defect detection on random textures", ICPR 2005
THE LIMITATIONS OF THE RANDOM VARIABLE MODEL
This model is rarely accurate on natural images.
Small patches (e.g. 2 × 2 or 5 × 5) are typically preferred
The model becomes even more unfit as the patch size increases
REAL WORLD DETECTION PROBLEMS
Random variable model does not successfully apply to signals or images (not even small portions)
Stacking each patch/signal s ∈ ℝ^d into a vector is not convenient:
• the data dimension d becomes huge
• strong correlations among the components are difficult to model directly by a probability density function φ0
THE LIMITATIONS OF THE RANDOM VARIABLE MODEL
Distribution of adjacent pixel values inside a patch:
THE LIMITATIONS OF THE RANDOM VARIABLE MODEL
Distribution of adjacent pixel values inside a patch:
Data are clearly correlated in space, and are difficult to model by a smooth density function (e.g., Gaussians)
The random variable model is not very appropriate for describing images
Bengio, Courville, Vincent, "Representation Learning: A Review and New Perspectives", IEEE TPAMI 2013
THE LIMITATIONS OF THE RANDOM VARIABLE MODEL
Patches from natural images live close to a low-dimensional manifold
This means that patches can be well described by a few latent variables
A SIMPLE EXPERIMENT
Let's approximate this manifold with the simplest one: a linear subspace
In practice, we compute the PCA of the training patches and consider the PCA scores as latent variables
A SIMPLE EXPERIMENT
Distribution of first 6 PCA coefficients:
A SIMPLE EXPERIMENT
We fit φ̂0 = 𝒩(μ, Σ) on the PCA components to describe normal patches, and perform anomaly detection
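A minimal sketch of this experiment, on toy data assumed to lie near a 2-dimensional subspace (the data and the number of components are illustrative):

```python
import numpy as np

def pca_fit(X, q):
    """Return the mean and the first q principal directions of the rows of X."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:q]

def pca_scores(X, mu, W):
    """PCA scores: projection of the centered data on the principal directions."""
    return (X - mu) @ W.T

rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 2))                        # 2 latent variables
X = latent @ rng.normal(size=(2, 16)) + 0.01 * rng.normal(size=(300, 16))

mu, W = pca_fit(X, q=2)
Z = pca_scores(X, mu, W)      # latent description of the patches
X_rec = Z @ W + mu            # back-projection onto the linear subspace
```

The Gaussian model of the previous slides can then be fit on the low-dimensional scores Z instead of the raw patches.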
Part 2: Anomaly detection in images
Out of the "Random Variable" World: signal-based models for images
THE TYPICAL APPROACH
Most of the considered methods
1. Estimate a model describing normal data (background model)
2. Use the background model to provide, for each test patch, an anomaly score, or measure of rareness
3. Apply a decision rule to the anomaly score to detect anomalies (typically thresholding)
4. [optional] Perform post-processing operations to enforce smooth detections and avoid isolated pixels that are not consistent with neighbourhoods
Remark: the statistical approaches seen before use the distribution φ̂0 as background model and a statistic as anomaly score
THE TYPICAL APPROACH
The background model is used to bring an image patch into the "random variable world"
THE TYPICAL APPROACH
Once the background model has been applied, one can use anomaly detection methods for the "random variable world". This might require fitting an additional model
SEMI-SUPERVISED ANOMALY-DETECTION IN IMAGES
Out of the "Random Variable" world
• Reconstruction-based methods
• Subspace methods
• Feature-based monitoring
• Expert-driven Features
• Data-driven Features
RECONSTRUCTION-BASED METHODS
Fit a statistical model to the observations to describe their dependence, then apply anomaly detection on the (independent) residuals.
Detection is performed by using a model ℳ which can encode and reconstruct normal data:
• During training: learn the model ℳ from the training set T
• During testing:
– encode and reconstruct each test signal s through ℳ
– assess err(s), namely the residual between s and its reconstruction through ℳ
The rationale is that ℳ can reconstruct only normal data, thus anomalies are expected to yield large reconstruction errors.
MONITORING THE RECONSTRUCTION ERROR
Normal data are expected to yield low values of err(s), while anomalies do not. This holds when the model ℳ was specifically learned to describe normal data
Outliers can be detected by thresholding err(s)
[Figure: err(s) over time t for a stream of signals; anomalies exceed the threshold γ]
RECONSTRUCTION-BASED METHODS
Popular models are:
• neural networks, in particular autoencoders, for higher-dimensional data
• projections on subspaces / manifolds
• dictionaries yielding sparse representations
• autoregressive models for time series (ARMA, ARIMA, …)
Methods based on projections and dictionaries can also be interpreted as subspace methods
RECONSTRUCTION-BASED METHODS
Autoencoders are neural networks used for data reconstruction (they learn the identity function)
The typical structure of an autoencoder is:
[Diagram: input layer with d neurons (s_1, …, s_d), hidden layer with k neurons, k ≪ d, output layer with d neurons; the Encoder ℰ maps the input to the hidden layer, the Decoder 𝒟 maps the hidden layer to the output]
RECONSTRUCTION-BASED METHODS
Autoencoders are trained to reconstruct all the samples in the training set. The reconstruction loss over the training set T is

ℓ(T) = Σ_{s ∈ T} ‖s − 𝒟(ℰ(s))‖²

and the training of 𝒟(ℰ(·)) is performed through standard backpropagation algorithms (e.g. SGD)
Remarks
• Typically 𝒟(ℰ(·)) does not provide perfect reconstruction, since k ≪ d.
• Regularization terms might be included in the loss function for the latent representation ℰ(s) to feature specific properties
Bengio, Y., Courville, A., Vincent, P. "Representation learning: A review and new perspectives". IEEE TPAMI 2013
Mishne, G., Shaham, U., Cloninger, A., & Cohen, I. Diffusion nets. Applied and Computational Harmonic Analysis (2017).
MONITORING THE RECONSTRUCTION ERROR
Detection by reconstruction-error monitoring (AE notation)
Training (Monitoring the Reconstruction Error):
1. Train the model 𝒟(ℰ(·)) from the training set T
2. Learn the distribution of the reconstruction errors err(s) = ‖s − 𝒟(ℰ(s))‖², s ∈ V, over a validation set V, such that T ∩ V = ∅, and define a suitable threshold γ
Testing (Monitoring the Reconstruction Error):
1. Encode the test signal s and compute the reconstruction error err(s) = ‖s − 𝒟(ℰ(s))‖²
2. Consider s anomalous when err(s) > γ
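The same train/validate/test protocol can be sketched with any encoder/decoder pair. Here a fixed linear pair stands in for a trained autoencoder, and the 99th-percentile threshold is an illustrative assumption:

```python
import numpy as np

def reconstruction_error(x, encode, decode):
    """err(s) = ||s - D(E(s))||^2, computed row-wise."""
    return np.sum((x - decode(encode(x))) ** 2, axis=-1)

# stand-in for a trained autoencoder: projection on a 1-D subspace spanned by w
w = np.array([0.6, 0.8])                 # unit vector
encode = lambda x: x @ w
decode = lambda z: np.outer(z, w)

rng = np.random.default_rng(2)
# validation set V of normal data: on the subspace, plus small noise
V = rng.normal(size=(200, 1)) * w + 0.01 * rng.normal(size=(200, 2))
gamma = np.quantile(reconstruction_error(V, encode, decode), 0.99)

s_normal = np.array([[1.3 * 0.6, 1.3 * 0.8]])   # lies on the subspace
s_anomaly = np.array([[1.0, -1.0]])             # far from the subspace
```

Calibrating γ on a held-out validation set (rather than on the training set) avoids an optimistically small threshold.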
OUTLINE ON SEMI-SUPERVISED APPROACHES
Out of the "Random Variable" world
• Reconstruction-based methods
• Subspace methods
• Feature-based monitoring
• Expert-driven Features
• Data-driven Features
• Extended models
SUBSPACE METHODS
The underlying assumption is that
• normal patches live in a subspace that can be identified from the training set T
• anomalies can be detected by projecting test patches onto this subspace and monitoring the reconstruction error (the distance from the projection)
V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.
SUBSPACE METHODS
A few examples of models used for describing normal patches:
• Orthogonal bases: normal patches can be expressed by a few selected basis elements (Fourier, Wavelets, …)
• PCA: normal patches live in the linear subspace of the first principal components
• Robust PCA: defined on the ℓ1 distance to be insensitive to outliers in the normal data
• Kernel PCA: normal patches live in a non-linear manifold
• Dictionaries yielding sparse representations
• Random projections
V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.
SUBSPACE METHODS: PCA-BASED MONITORING
Anomaly detection based on PCA (and similar techniques):
1. Compute the projection on the subspace, x′ = Wᵀx, W ∈ ℝ^{d×q}, q ≪ d, i.e. the projection over the first q principal components, which also reduces data dimensionality.
2. Monitor the reconstruction error err(x) = ‖x − W Wᵀ x‖², the distance between x and its projection W Wᵀ x onto the subspace of normal patches.
2.[bis] The projection along the last principal component is also a good anomaly score, as it becomes large at anomalies.
M. Elad, "Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing", Springer, 2010
SUBSPACE METHODS: SPARSE REPRESENTATIONS
Basic assumption: normal data live in a union of low-dimensional subspaces of the input space
• The model learned from T is a matrix: the dictionary D.
• Each signal is decomposed as the sum of a few dictionary atoms (the representation is constrained to be sparse).
• Atoms represent the many building blocks that can be used to reconstruct normal signals.
• There are typically more atoms than the signal dimension (redundant dictionaries).
• Effective as long as the learned dictionary D is very specific for normal data
DICTIONARIES YIELDING SPARSE REPRESENTATIONS
Dictionaries are just matrices: D ∈ ℝ^{d×n}
[Figure: the dictionary D as a matrix]
DICTIONARIES YIELDING SPARSE REPRESENTATIONS
Dictionaries are just matrices: D ∈ ℝ^{d×n}
Each column of D is an atom:
• it lives in the input space ℝ^d
• it is one of the building blocks learned to reconstruct the input signals in the training set T
[Figure: the dictionary D with one highlighted column (atom)]
SPARSE REPRESENTATIONS
Let s ∈ ℝ^d be the input signal; a sparse representation is

s = Σ_{i=1}^{n} α_i d_i

a linear combination of few dictionary atoms {d_i}, i.e., most coefficients are such that α_i = 0
An illustrative example in the case of our patches:
[Figure: a patch expressed as 0.7 · atom + 0.1 · atom − 0.2 · atom]
SPARSE REPRESENTATIONS IN MATRIX EXPRESSION
Let s ∈ ℝ^d be the input signal; a sparse representation is

s = Σ_{i=1}^{n} α_i d_i = D α, D ∈ ℝ^{d×n}

a linear combination of few dictionary atoms {d_i} with ‖α‖_0 < L, i.e. only a few coefficients are nonzero, i.e. α is sparse.
[Figure: s = D α with an overcomplete / redundant dictionary D; the vector α = [α_1, …, α_n] is sparse]
SPARSE REPRESENTATIONS IN MATRIX EXPRESSION
Let s ∈ ℝ^d be the input signal; a sparse representation is

s = Σ_{i=1}^{n} α_i d_i = D α, D ∈ ℝ^{d×n}

a linear combination of few dictionary atoms {d_i} with ‖α‖_0 < L, i.e. only a few coefficients are nonzero, i.e. α is sparse.
[Figure: s = D α with an undercomplete dictionary; the vector α = [α_1, …, α_n] is sparse]
Carrera D., Rossi B., Fragneto P., Boracchi G. "Online Anomaly Detection for Long-Term ECG Monitoring using Wearable Devices", Pattern Recognition 2019
Pati, Y.; Rezaiifar, R.; Krishnaprasad, P. "Orthogonal Matching Pursuit: recursive function approximation with application to wavelet decomposition", Asilomar Conf. on Signals, Systems and Comput. 1993
SPARSE CODING…
Sparse Coding: computing the sparse representation of an input signal s ∈ ℝ^d w.r.t. D
It is solved as the following optimization problem (e.g. via Orthogonal Matching Pursuit, OMP)

α = argmin_{a ∈ ℝ^n} ‖D a − s‖_2 s.t. ‖a‖_0 < L

In this illustration α = [0.7, 0, 0, 0.1, 0, 0, 0, −0.2] ∈ ℝ^n
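A minimal greedy OMP can be sketched as follows. For verifiability the toy dictionary has orthonormal atoms (so recovery is exact), whereas a real dictionary would be redundant; the indices and coefficients are illustrative:

```python
import numpy as np

def omp(D, s, L):
    """Greedy OMP: approximately solve argmin ||D a - s||_2 s.t. ||a||_0 <= L."""
    residual, support = s.copy(), []
    for _ in range(L):
        # pick the atom most correlated with the current residual
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        # re-fit all selected atoms by least squares (the "orthogonal" step)
        coef, *_ = np.linalg.lstsq(D[:, support], s, rcond=None)
        residual = s - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

rng = np.random.default_rng(3)
D, _ = np.linalg.qr(rng.normal(size=(16, 16)))   # orthonormal atoms (toy choice)
alpha_true = np.zeros(16)
alpha_true[[3, 11]] = [0.7, -0.2]
s = D @ alpha_true                               # a 2-sparse "normal" signal
alpha = omp(D, s, L=2)
```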
Pati, Y.; Rezaiifar, R.; Krishnaprasad, P. "Orthogonal Matching Pursuit: recursive function approximation with application to wavelet decomposition", Asilomar Conf. on Signals, Systems and Comput. 1993
SPARSE CODING…
Sparse Coding: computing the sparse representation of an input signal s ∈ ℝ^d w.r.t. D
Or through an alternative formulation based on ℓ1 minimization:

α = argmin_{a ∈ ℝ^n} ‖D a − s‖²_2 + λ ‖a‖_1, λ > 0
Aharon, M.; Elad, M. Bruckstein, A. K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation IEEE TSP, 2006
… AND DICTIONARY LEARNING
Dictionary Learning: estimate D from a training set of normal signals in ℝ^d
It is solved as the following optimization problem, typically via block-coordinate descent (e.g. the K-SVD algorithm)

D, A = argmin_{D̃ ∈ ℝ^{d×n}, Ã ∈ ℝ^{n×m}} ‖D̃ Ã − S‖² s.t. ‖α_i‖_0 < L ∀i

where the columns of S = [s_1, …, s_m] are the training signals and the columns α_i of Ã are their sparse representations
SPARSE REPRESENTATION: RECONSTRUCTION ERROR
Anomalies can be directly detected by performing the sparse coding of test signals and then analysing the reconstruction error

err(s) = ‖D α − s‖²

as in all reconstruction-based techniques
Carrera D., Rossi B., Zambon D., Fragneto P., Boracchi G. "ECG Monitoring in Wearable Devices by Sparse Models", in Proceedings of ECML-PKDD 2016, 16 pages
Carrera D., Rossi B., Fragneto P., Boracchi G. "Online Anomaly Detection for Long-Term ECG Monitoring using Wearable Devices", Pattern Recognition 2019
[Figure: test signals with low err(s) vs high err(s)]
ANOMALY DETECTION DURING SPARSE CODING
Anomalies can be directly detected during the sparse coding stage, by adopting a special loss during optimization.
A set of test signals is modeled as:

S = D A + E + N

where A is sparse, N is a noise term, and E is a matrix having most columns set to zero. Nonzero columns of E indicate anomalies, as the corresponding signals do not admit a sparse representation w.r.t. D
A. Adler, M. Elad, Y. Hel-Or, and E. Rivlin, "Sparse coding with anomaly detection", Journal of Signal Processing Systems, vol. 79, no. 2, pp. 179–188, 2015.
ANOMALY DETECTION DURING SPARSE CODING
Anomalies can be detected by solving (through ADMM) the following sparse coding problem

argmin_{A,E} (1/2) ‖S − D A − E‖²_F + λ ‖A‖_1 + μ ‖E‖_{2,1}

and identifying as anomalies the signals corresponding to nonzero columns of E.
The three terms are: data fidelity for normal data; sparsity; group-sparsity regularization (only a few columns of E can be nonzero)
A. Adler, M. Elad, Y. Hel-Or, and E. Rivlin, "Sparse coding with anomaly detection", Journal of Signal Processing Systems, vol. 79, no. 2, pp. 179–188, 2015.
OUTLINE ON SEMI-SUPERVISED APPROACHES
• Detrending/Filtering for time-series
• Reconstruction-based methods
• Subspace methods
• Feature-based monitoring
• Expert-driven Features
• Data-driven Features
• Extended models
TYPICAL APPROACH: MONITORING FEATURES
Feature extraction: meaningful indicators to be monitored, which have a known / controlled response w.r.t. normal data
V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.
[Diagram: input signal s(t) ∈ ℝ^d → Feature Extraction → feature vector x(t) ∈ ℝ^q, q ≪ d → Anomaly/Change detection via φ̂0, the customary statistical framework for anomaly detection]
Feature Extraction (signal processing, a priori information, learning methods) maps signals into random variables
MONITORING FEATURE DISTRIBUTION
Normal data are expected to yield features x that are i.i.d. and follow an unknown distribution φ0.
Anomalous data do not, as they follow φ1 ≠ φ0.
We are back to our statistical framework and we can
• learn φ̂0 from a set of features extracted from normal data
• detect anomalous data by extracting the feature x associated to each input s, and then testing whether φ̂0(x) < γ
[Figure: feature x(t) over time; anomalies fall where φ̂0(x) drops below the threshold γ]
MONITORING FEATURE DISTRIBUTION
Or by adopting any other statistical tool to detect anomalies in x
FEATURE EXTRACTION
Data dimensionality can be reduced by extracting features
Good features should:
• yield a stable response w.r.t. normal data
• yield an unusual response on anomalies
Examples of features seen so far:
• the reconstruction error err(s)
• representation coefficients (Wᵀx, α, …)
… but these are not the only ones
V. Chandola, A. Banerjee, V. Kumar. "Anomaly detection: A survey". ACM Comput. Surv. 41, 3, Article 15 (2009), 58 pages.
FEATURE EXTRACTION APPROACHES
There are two major approaches to extracting features:
Expert-driven (hand-crafted) features: computational expressions manually designed by experts to distinguish between normal and anomalous data
Data-driven features: features characterizing normal data, automatically learned from a training set of normal samples T
OUTLINE ON SEMI-SUPERVISED APPROACHES
• Detrending/Filtering for time-series
• Reconstruction-based methods
• Subspace methods
• Feature-based monitoring
• Expert-driven Features
• Data-driven Features
• Data-driven Features: extended models
EXAMPLES OF EXPERT-DRIVEN FEATURES
Expert-driven features are computed on each patch of an image s, 𝒮_c = {s(c + u), u ∈ 𝒰}
Examples of features are:
• the average
• the variance
• the total variation (the energy of the gradients)
These can hopefully distinguish normal and anomalous patches, considering how anomalous regions are expected to look (e.g. flat or without edges)
Carrera D., Manganini F., Boracchi G., Lanzarone E. "Defect Detection in SEM Images of Nanofibrous Materials", IEEE Transactions on Industrial Informatics 2017, 11 pages, doi:10.1109/TII.2016.2641472
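The three expert-driven features above can be sketched for a 2-D patch as follows (the flat and textured toy patches are illustrative assumptions):

```python
import numpy as np

def patch_features(patch):
    """Expert-driven features of a 2-D patch: average, variance, total variation."""
    gy, gx = np.gradient(patch.astype(float))
    total_variation = np.sum(gx ** 2 + gy ** 2)   # energy of the gradients
    return np.array([patch.mean(), patch.var(), total_variation])

flat = np.full((5, 5), 3.0)                                   # e.g. a flat, fiber-free region
textured = np.add.outer(np.arange(5.0), np.arange(5.0)) % 2   # checkerboard-like texture
f_flat = patch_features(flat)
f_textured = patch_features(textured)
```

A flat (possibly anomalous) region yields zero variance and zero gradient energy, while a textured normal region does not.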
OUTLINE ON SEMI-SUPERVISED APPROACHES
• Detrending/Filtering for time-series
• Reconstruction-based methods
• Subspace methods
• Feature-based monitoring
• Expert-driven Features
• Data-driven Features
• Data-driven Features: extended models
EXAMPLES OF DATA-DRIVEN FEATURES
Analyze each patch of an image s, 𝒮_c = {s(c + u), u ∈ 𝒰}, and determine whether it is normal or anomalous.
Data-driven features: expressions, learned from normal data, that quantitatively assess whether test patches conform with the model.
Carrera D., Manganini F., Boracchi G., Lanzarone E. "Defect Detection in SEM Images of Nanofibrous Materials", IEEE Transactions on Industrial Informatics 2017, 11 pages, doi:10.1109/TII.2016.2641472
AUTOENCODERS AS FEATURE EXTRACTORS
Autoencoders can also be used in feature-based monitoring schemes, by monitoring as feature the hidden/latent representation of the input signal
[Diagram: the same encoder-decoder structure shown earlier; the monitored feature is the hidden representation ℰ(s) ∈ ℝ^k, k ≪ d]
ANOMALY DETECTION BY MONITORING FEATURE DISTRIBUTION
Detection by feature monitoring (AE notation)
Training (Monitoring Feature Distribution):
• Learn the autoencoder 𝒟(ℰ(·)) from the training set T
• Fit a density model φ̂0 to the encoded features {ℰ(s), s ∈ V} over a validation set V, such that T ∩ V = ∅
• Define a suitable threshold γ for φ̂0(·)
Testing (Monitoring Feature Distribution):
• Encode each incoming signal s through ℰ
• Detect anomalies when φ̂0(ℰ(s)) < γ
ANOMALY DETECTION BY MONITORING PCA PROJECTIONS
Compute the projection on the subspace, x′ = Wᵀx, W ∈ ℝ^{d×q}, q ≪ d, i.e. the projection over the first q principal components, which also reduces data dimensionality.
Monitor the projections Wᵀx by a suitable statistical technique (e.g. density-based), as when monitoring ℰ(s)
SPARSE REPRESENTATIONS AS FEATURE EXTRACTORS
To assess the conformance of a test signal s with D we solve the following sparse coding problem:

α = argmin_{a ∈ ℝ^n} ‖D a − s‖²_2 + λ ‖a‖_1, λ > 0

which is the BPDN formulation, solved using ADMM.
The penalized ℓ1 formulation has more degrees of freedom in the reconstruction: the conformance of s with D has to be assessed by monitoring both terms of the functional
Boyd S., Parikh N., Chu E., Peleato B., Eckstein J., "Distributed optimization and statistical learning via the alternating direction method of multipliers", 2011
FEATURES EXTRACTED FROM SPARSE CODING
Features then include both the reconstruction error

err(s) = ‖D α − s‖²_2

and the sparsity of the representation, ‖α‖_1,
thus obtaining a data-driven feature vector x = [ ‖D α − s‖²_2 , ‖α‖_1 ]
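A sketch of how this bivariate feature could be computed, using ISTA for the ℓ1 sparse coding. The orthonormal toy dictionary and the two test signals are illustrative assumptions (a real dictionary would be learned and redundant):

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(D, s, lam, n_iter=200):
    """l1 sparse coding: argmin 0.5 ||D a - s||^2 + lam ||a||_1, via ISTA."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2      # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a - step * (D.T @ (D @ a - s)), lam * step)
    return a

def sparse_features(D, s, a):
    """Monitored feature vector: [reconstruction error, l1 norm of the coefficients]."""
    return np.array([np.linalg.norm(D @ a - s), np.abs(a).sum()])

rng = np.random.default_rng(4)
D, _ = np.linalg.qr(rng.normal(size=(16, 16)))   # orthonormal toy dictionary
a_true = np.zeros(16)
a_true[[2, 9]] = [2.0, -1.5]
s_normal = D @ a_true                            # sparse w.r.t. D
s_anomaly = D @ np.full(16, 0.5)                 # dense w.r.t. D

lam = 0.1
f_normal = sparse_features(D, s_normal, ista(D, s_normal, lam))
f_anomaly = sparse_features(D, s_anomaly, ista(D, s_anomaly, lam))
```

Signals without a sparse representation w.r.t. D yield a larger value in both components of the feature vector.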
DENSITY-BASED MONITORING
[Figure: feature distributions of normal patches vs anomalies]
Carrera D., Manganini F., Boracchi G., Lanzarone E. "Defect Detection in SEM Images of Nanofibrous Materials", IEEE Transactions on Industrial Informatics 2017, 11 pages, doi:10.1109/TII.2016.2641472
FEATURES EXTRACTED FROM SPARSE CODING
Training:
• Learn from T the dictionary D
• Compute the sparse representations w.r.t. D, thus the features x, over the validation set V, such that T ∩ V = ∅
• Learn from V the distribution φ̂0 of normal feature vectors x and the threshold γ.
The model for anomaly detection is (D, φ̂0, γ)
Testing:
• Perform the sparse coding of a test signal s, thus get the feature vector x
• Detect anomalies when φ̂0(x) < γ
Carrera D., Manganini F., Boracchi G., Lanzarone E. "Defect Detection in SEM Images of Nanofibrous Materials", IEEE Transactions on Industrial Informatics 2017, 11 pages, doi:10.1109/TII.2016.2641472
FEATURES EXTRACTED FROM SPARSE CODING
This is a rather flexible solution and can be adapted when operating conditions change (e.g. different zooming levels)
[Figure: Detections]
OUTLINE ON SEMI-SUPERVISED APPROACHES
• Detrending/Filtering for time-series
• Reconstruction-based methods
• Subspace methods
• Feature-based monitoring
• Expert-driven Features
• Data-driven Features
• Data-driven Features: extended models
Zeiler, M. D., Krishnan, D., Taylor, G. W., & Fergus, R. "Deconvolutional networks", CVPR 2010
CONVOLUTIONAL SPARSITY
Data-driven features from convolutional sparse models

s ≈ Σ_{i=1}^{M} d_i ∗ α_i, s.t. each α_i is sparse

where the whole image s is entirely encoded as the sum of M convolutions between a filter d_i and a coefficient map α_i
• Translation-invariant representation
• Few small filters are typically required
• Filters exhibit very specific image structures
• Easy to use filters having different sizes
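A 1-D analogue of the convolutional model can make the notation concrete (the 2-D case is identical with 2-D convolutions; the filters and sparse maps below are illustrative):

```python
import numpy as np

def conv_reconstruct(filters, maps):
    """Convolutional sparse model: signal ~ sum_i d_i * alpha_i (full convolution)."""
    return sum(np.convolve(c, f) for f, c in zip(filters, maps))

filters = [np.array([1.0, 1.0]), np.array([1.0, -1.0])]   # two small filters d_i
maps = [np.zeros(4), np.zeros(4)]                          # sparse coefficient maps alpha_i
maps[0][1] = 2.0                                           # a single active coefficient
s = conv_reconstruct(filters, maps)
```

Shifting the active coefficient shifts the reconstructed structure: this is the translation invariance mentioned above.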
EXAMPLE OF LEARNED FILTERS
[Figure: Training Image and Learned Filters]
EXAMPLE OF FEATURE MAPS
[Figure: Test Image, Filters, and Coefficient maps]
FEATURES FROM CONVOLUTIONAL SPARSE MODEL
The standard convolutional sparse coding

α̂ = argmin_α ‖ Σ_{i=1}^{M} d_i ∗ α_i − s ‖² + λ Σ_{i=1}^{M} ‖α_i‖_1

leads to the following feature vector for each image region c in s:

x_c = [ ‖ Σ_{i=1}^{M} (d_i ∗ α̂_i − s)_c ‖² , Σ_{i=1}^{M} ‖(α̂_i)_c‖_1 ]

… but anomaly-detection performance is rather poor
SPARSITY IS TOO LOOSE A CRITERION FOR DETECTION
These two (normal and anomalous) patches exhibit the same sparsity and reconstruction error
[Figure: coefficient maps of a normal patch and of an anomalous patch]
D. Carrera, G. Boracchi, A. Foi and B. Wohlberg, "Detecting Anomalous Structures by Convolutional Sparse Models", IEEE IJCNN 2015
FEATURES FROM CONVOLUTIONAL SPARSE MODEL
Add the group sparsity of the maps on the patch support as an additional feature:

x_c = [ ‖ Σ_{i=1}^{M} (d_i ∗ α̂_i − s)_c ‖² , Σ_{i=1}^{M} ‖(α̂_i)_c‖_1 , Σ_{i=1}^{M} ‖(α̂_i)_c‖_2 ]
ANOMALY-DETECTION PERFORMANCE
On 25 different textures and 600 test images (pairs of textures to mimic normal/anomalous regions)
Best performance is achieved by the 3-dimensional feature indicator
It achieves performance similar to steerable pyramids specifically designed for texture classification
OUTLINE ON SEMI-SUPERVISED APPROACHES
• Detrending/Filtering for time-series
• Reconstruction-based methods
• Subspace methods
• Feature-based monitoring
• Expert-driven Features
• Data-driven Features
• Data-driven Features: extended models
Dictionary Adaptation: Counteracting Domain Shift in Anomaly Detection
DATA-DRIVEN MODELS AND ONLINE MONITORING
A challenge often occurring when performing online monitoring:
test data might differ from training data, hence the need for adaptation.
Defects have to be detected at different zooming levels, which might not be present in the training set.
Pan, S. J., and Yang, Q. "A survey on transfer learning", IEEE TKDE 2010
S. Shekhar, V. M. Patel, H. V. Nguyen, & R. Chellappa, "Generalized domain-adaptive dictionaries", CVPR 2013
MODEL ADAPTATION
In the machine-learning literature these problems go under the name of transfer learning / domain adaptation
Transfer Learning (TL): adapt a model learned in the source domain (e.g. heartbeats at a given heart rate / fibers at a certain zoom level) to a target domain (e.g. heartbeats at a higher heart rate / fibers zoomed in or out)
Many TL methods have been designed for supervised / semi-supervised / unsupervised settings, depending on the availability of (annotated) data in the source and target domains.
In our settings, no labels in the target domain are provided
Carrera D., Boracchi G., Foi A. and Wohlberg B. "Scale-invariant Anomaly Detection With Multiscale Group-sparse Models", ICIP 2016
DOMAIN ADAPTATION ON QUALITY INSPECTION
SEM images can be acquired at different zooming levels
Solution:
• Synthetically generate training images at different zooming levels
• Learn a dictionary D_i at each scale
• Combine the learned dictionaries in a multiscale dictionary D

D = [ D_1 D_2 D_3 D_4 ]

Carrera D., Boracchi G., Foi A. and Wohlberg B. "Scale-invariant Anomaly Detection With Multiscale Group-sparse Models", ICIP 2016
DOMAIN ADAPTATION ON QUALITY INSPECTION
SEM images can be acquired at different zooming levels
Solution:
• Synthetically generate training images at different zooming levels
• Learn a dictionary D_i at each scale
• Combine the learned dictionaries in a multiscale dictionary D
• Sparse coding including a penalized group-sparsity term

α = argmin_{a ∈ ℝ^n} (1/2) ‖s − D a‖²_2 + λ ‖a‖_1 + μ Σ_i ‖a_i‖_2

Carrera D., Boracchi G., Foi A. and Wohlberg B. "Scale-invariant Anomaly Detection With Multiscale Group-sparse Models", ICIP 2016
DOMAIN ADAPTATION ON QUALITY INSPECTION
SEM images can be acquired at different zooming levels
Solution:
• Synthetically generate training images at different zooming levels
• Learn a dictionary D_i at each scale
• Combine the learned dictionaries in a multiscale dictionary D
• Sparse coding including a penalized group-sparsity term
• Monitor a tri-variate feature vector

x = [ ‖s − D α‖²_2 , ‖α‖_1 , Σ_i ‖α_i‖_2 ]

Carrera D., Boracchi G., Foi A. and Wohlberg B. "Scale-invariant Anomaly Detection With Multiscale Group-sparse Models", ICIP 2016
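Given the per-scale blocks of the multiscale representation, the tri-variate feature can be sketched as follows (the trivial per-scale dictionaries and coefficients are chosen so the values are checkable by hand):

```python
import numpy as np

def multiscale_features(D_blocks, alpha_blocks, s):
    """x = [reconstruction error, l1 norm, sum of per-scale group l2 norms]."""
    D = np.hstack(D_blocks)              # multiscale dictionary D = [D_1 D_2 ...]
    a = np.concatenate(alpha_blocks)     # stacked coefficients
    err = np.linalg.norm(D @ a - s) ** 2
    l1 = np.abs(a).sum()
    group = sum(np.linalg.norm(ai) for ai in alpha_blocks)
    return np.array([err, l1, group])

D_blocks = [np.eye(4), np.eye(4)]        # one (trivial) dictionary per scale
alpha_blocks = [np.array([1.0, 0.0, 0.0, 0.0]), np.zeros(4)]
s = np.array([1.0, 0.0, 0.0, 0.0])
x = multiscale_features(D_blocks, alpha_blocks, s)
```

Representations spread across many scales inflate the third component, which is what the group-sparsity term penalizes.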
DOMAIN ADAPTATION ON QUALITY INSPECTION
Performance on a SEM image dataset acquired at 4 different zooming levels (A, B, C, D). It is important to include the group-sparsity regularization also in the sparse coding stage
DOMAIN ADAPTATION FOR ONLINE ECG MONITORING
• The dictionary has to be learned for each user
• A training set T of ECG tracings can only be acquired in resting conditions
• During daily activities the heart rate changes and the heartbeats no longer match the learned dictionary
Heartbeats get transformed when the heart rate changes: learned models have to be adapted according to the heart rate.
DOMAIN ADAPTATION FOR ONLINE ECG MONITORING
We propose to design linear transformations F_{r1,r0} to adapt user-specific dictionaries:

D_{u,r1} = F_{r1,r0} · D_{u,r0}, F_{r1,r0} ∈ ℝ^{d×d}

Surprisingly, these transformations can be learned from a publicly available dataset containing ECG recordings at different heart rates from several users
User-independent transformations enable accurate mapping of user-specific dictionaries
Carrera D., Rossi B., Fragneto P., and Boracchi G. "Domain Adaptation for Online ECG Monitoring", ICDM 2017
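One possible way to fit such a linear transformation from paired dictionaries is a plain least-squares sketch (an illustrative assumption, not necessarily the estimator of the cited paper):

```python
import numpy as np

def fit_transformation(D0_list, D1_list):
    """Least-squares F such that F @ D0 ~ D1 over all users' dictionary pairs."""
    A = np.hstack(D0_list)        # stack source-rate dictionaries column-wise
    B = np.hstack(D1_list)        # stack target-rate dictionaries column-wise
    return B @ np.linalg.pinv(A)

rng = np.random.default_rng(5)
F_true = np.eye(3) + 0.1 * rng.normal(size=(3, 3))    # ground-truth transformation
D0s = [rng.normal(size=(3, 5)) for _ in range(4)]     # user dictionaries at rate r0
D1s = [F_true @ D for D in D0s]                       # the same users at rate r1
F = fit_transformation(D0s, D1s)
```

Because F is fit jointly over several users, it can then be applied to a new user's dictionary without target-rate data from that user.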
Reference-based approaches
Zontak, M., Cohen, I.: "Defect detection in patterned wafers using anisotropic kernels", Machine Vision and Applications 21(2), 129–141 (2010)
REFERENCE-BASED METHODS IN QUALITY INSPECTION
In some cases anomalies can be detected by comparing
• the target, namely the image to be tested
• against a reference, namely an anomaly-free image
[Figure: reference and target images]
REFERENCE BASED-METHODS: E-CHUCKING
The e-chuck is adopted to safely maneuver, position and block the silicon wafer during production.
The device is configured by means of marking materials, and the markers should conform to a reference template
[Figure: electrodes and template; good (normal) vs bad (anomalous) configurations]
CHANGE DETECTION IN REMOTE SENSING

A key problem in remote sensing is to detect changes, due to man-made or natural phenomena, by comparing multiple (satellite) images acquired at different times.

In the remote sensing literature this is typically referred to as change detection.
L. T. Luppino, F. M. Bianchi, G. Moser, S. N. Anfinsen, βUnsupervised Image Regression for Heterogeneous Change Detectionβ IEEE Transactions on Geoscience and Remote Sensing (2019)
MULTIMODAL, REFERENCE-BASED ANOMALY DETECTION

Nontrivial when direct comparison is prevented:
• Reference and target might not be aligned, nor easy to register with a global transformation
• Reference and target might come from different modalities / resolutions / views

[Figure: multispectral vs. SAR images of the 2017 California flood; temperature map from CFD; IR vs. RGB]

Part 3: Deep Learning for Anomaly Detection in Images
IMAGE CLASSIFICATION

The problem: assign to an input image x one label y from a fixed set of L categories Λ:

  Λ = {"wheel", "car", …, "castle", "baboon", …}

Example outputs: "wheel" 65%, "tyre" 30%, …; "castle" 55%, "tower" 43%, …
DEEP LEARNING AND IMAGE CLASSIFICATION

Since 2010, ImageNet organizes the ILSVRC (ImageNet Large Scale Visual Recognition Challenge).

Top-5 classification error rate:
• In 2011: 25%
• In 2012: 16% (achieved by a CNN)
• In 2017: < 5% (for 29 of 38 competing teams, all using deep learning)
DEEP LEARNING AND IMAGE CLASSIFICATION

Deep learning boosted image classification performance, thanks to:
• Advances in parallel hardware (e.g., GPUs)
• Availability of large annotated datasets (e.g., the ImageNet project, a large database for visual recognition with over 14M hand-annotated images in more than 20K categories)
CONVOLUTIONAL NEURAL NETWORKS (CNN)

The typical architecture of a convolutional neural network:
• Convolution filters, learned for the classification task at hand
• Thresholding + downsampling (ReLU + max-pooling)
• A fully connected neural network providing the class scores as output
THE OUTPUT OF A CNN
0.02
0.01
0.11
0.81
β¦
β¦
β¦
0.1
TrainedCNN
"Castle" probability
"Wheel" probability
Boracchi, Carrera, ICIP 2020
CNN AS DATA-DRIVEN FEATURE EXTRACTOR

The convolutional layers extract high-level features from pixel data; the final layers classify.

The feature extractor and the classifier are jointly learned in an end-to-end fashion.
SEMI-SUPERVISED APPROACHES
β’ CNN as data-driven feature extractor
β’ Transfer learning
β’ Self-supervised learning
β’ Autoencoders
β’ Domain-based
β’ Generative models
THE THREE STEPS IN ANOMALY DETECTION IN IMAGES

Recall the three major ingredients:
• Feature extraction
• Anomaly score
• Decision rule
CNN AS DATA-DRIVEN FEATURE EXTRACTOR

The feature vector extracted from the last layer can be modeled as a random vector.
TRANSFER LEARNING

Idea:
• Use a pretrained network (e.g., AlexNet) that was trained for a different task and on a different dataset
• Throw away the last layer(s)
• Use the reduced CNN g to build a new dataset TR′ from the training set TR:

  TR′ = {g(s), s ∈ TR}

• Train your favorite anomaly detector on TR′ to define the anomaly score and the decision rule

Napoletano P., Piccoli F., Schettini R., "Anomaly Detection in Nanofibrous Materials by CNN-Based Self-Similarity", Sensors 2018
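A minimal sketch of this pipeline, with the truncated pretrained CNN replaced by a hypothetical hand-crafted feature extractor g and the anomaly detector by a simple Gaussian model with a Mahalanobis-style score (all names and numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def g(s):
    """Stand-in for a truncated pretrained CNN: image -> feature vector."""
    return np.array([s.mean(), s.std(), np.abs(np.diff(s, axis=0)).mean()])

# Normal training images: smooth toy textures
TR = [rng.normal(0.5, 0.05, size=(16, 16)) for _ in range(200)]
TR_feat = np.stack([g(s) for s in TR])            # TR' = {g(s), s in TR}

# Simple anomaly detector on TR': Gaussian model of the normal features
mu = TR_feat.mean(axis=0)
cov = np.cov(TR_feat.T) + 1e-6 * np.eye(3)        # small ridge for stability
cov_inv = np.linalg.inv(cov)

def anomaly_score(s):
    d = g(s) - mu
    return float(d @ cov_inv @ d)                  # squared Mahalanobis distance

normal_score = anomaly_score(rng.normal(0.5, 0.05, size=(16, 16)))
anomalous_score = anomaly_score(rng.normal(0.5, 0.4, size=(16, 16)))
```

The decision rule can then be a threshold on the score, e.g., a high quantile of the scores computed on the normal training set.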
TRANSFER LEARNING

• Features extracted from a CNN, i.e., g(s), are typically very high-dimensional for deep networks (e.g., ResNet). Data dimensionality can be reduced by PCA fitted on a set of normal features.
• Anomalies can be detected by measuring distances w.r.t. normal features, possibly using clustering to speed up computation.
TRANSFER LEARNING

Pros: pretrained networks are very powerful models, since they are usually trained on datasets with millions of images.

Cons:
• The network is not trained on normal data. Meaningful structures in normal images might not be successfully captured by a network trained on images from a different domain (e.g., medical vs. natural images).
• The anomaly score and the CNN are not jointly learned, while the end-to-end learning strategy is the key to achieving impressive results in supervised tasks.
SEMI-SUPERVISED APPROACHES
β’ CNN as data-driven feature extractor
β’ Transfer learning
β’ Self-supervised learning
β’ Autoencoders
β’ Domain-based
β’ Generative models
SELF-SUPERVISED LEARNING

With transfer learning we use a model trained on a different dataset (e.g., ImageNet) and for a different task (e.g., classification).

The idea of self-supervised learning is to train a model on the target dataset, but solving a different task for which we can easily obtain the labels.
SELF-SUPERVISED LEARNING

We can build a labeled dataset for multiclass classification from normal data:
• Consider a set of M transformations T = {T_1, …, T_M}
• Apply each transformation T_j to every s ∈ TR:

  TR_self = {(T_j(s), j) : s ∈ TR, j = 1, …, M}

• Train a CNN on TR_self
• The output of the last layer of the CNN is used as feature vector

This way we can avoid using a pretrained model, and train a CNN directly on our dataset.
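The construction of TR_self can be sketched as follows; here the transformations are rotations by multiples of 90° and the "images" are toy arrays (a sketch of the dataset construction only, not of the CNN training):

```python
import numpy as np

rng = np.random.default_rng(0)

# Normal training set: N toy grayscale images
TR = [rng.random((8, 8)) for _ in range(100)]

# M = 4 geometric transformations: rotations by 0, 90, 180, 270 degrees
transformations = [lambda s, k=k: np.rot90(s, k) for k in range(4)]

# Self-supervised dataset: each transformed image is labeled with the
# index j of the transformation that produced it
TR_self = [(T_j(s), j) for s in TR
                       for j, T_j in enumerate(transformations)]

images, labels = zip(*TR_self)
# TR_self contains len(TR) * M labeled samples, with labels in {0, ..., M-1}
```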
SELF-SUPERVISED LEARNING

Example:
• TR contains only images representing the digit 3
• T contains rotations and horizontal/vertical flips

[Figure: the training set of normal data (digit 3) becomes a labeled training set with M classes, one per transformation T_1, …, T_4]
SELF-SUPERVISED LEARNING

We can use the second-last layer as feature vector and train an anomaly detector on it.

Problem: we have to split our training set in two: the first part is used to train the classifier, the second one to train the anomaly detector.

Golan, El-Yaniv, "Deep Anomaly Detection Using Geometric Transformations", NeurIPS 2018
SELF-SUPERVISED LEARNING

Another approach is to feed the network with all the transformed versions T_j(s) of the test sample s, obtain the posteriors {C_j}_{j=1}^M, and compute:

  score(s) = 1 − (1/M) Σ_{j=1}^M C_j(j)

where C_j(j) is the posterior probability of label j given T_j(s).

Golan, El-Yaniv, "Deep Anomaly Detection Using Geometric Transformations", NeurIPS 2018
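Given the posteriors, the score is a one-liner; below, the trained CNN is replaced by hypothetical posterior matrices whose j-th row stands for the softmax output on T_j(s):

```python
import numpy as np

def golan_score(C):
    """Anomaly score: 1 minus the average posterior assigned to the correct
    transformation label. C is an (M, M) array whose j-th row is the
    (hypothetical) softmax output of the network for T_j(s)."""
    M = C.shape[0]
    return 1.0 - np.trace(C) / M

# Hypothetical posteriors for a normal sample: the network confidently
# recognizes each of the M = 4 transformations
C_normal = np.full((4, 4), 0.04) + np.diag([0.84] * 4)
# Hypothetical posteriors for an anomalous sample: close to uniform
C_anom = np.full((4, 4), 0.25)

s_normal = golan_score(C_normal)   # low score
s_anom = golan_score(C_anom)       # high score
```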
SELF-SUPERVISED LEARNING

The set of transformations has to be chosen carefully:
• If the trained classifier cannot discriminate the transformed samples, it does not extract meaningful features for anomaly detection
• Non-geometric transformations (Gaussian blur, gamma correction, sharpening) might eliminate important features, and perform worse than geometric ones
SEMI-SUPERVISED APPROACHES
β’ CNN as data-driven feature extractor
β’ Transfer learning
β’ Self-supervised learning
β’ Autoencoders
β’ Domain-based
β’ Generative models
AUTOENCODERS (REVISITED)

Autoencoders can be trained directly on normal data by minimizing the reconstruction loss:

  min_{E,D} Σ_{x∈TR} ||x − D(E(x))||²

[Figure: the encoder E maps the input x to the latent code z; the decoder D maps z to the reconstruction x̂]
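A linear autoencoder (equivalent to PCA) already illustrates the reconstruction-error anomaly score; a hedged sketch in which the encoder/decoder are obtained in closed form from an SVD rather than by gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Normal data lives (approximately) on a 2-D subspace of R^5
basis = rng.standard_normal((5, 2))
TR = (rng.standard_normal((500, 2)) @ basis.T
      + 0.01 * rng.standard_normal((500, 5)))
mean = TR.mean(axis=0)

# Linear autoencoder in closed form: E(x) = W.T (x - mean), D(z) = mean + W z,
# with W the top-2 principal directions of the normal training data
_, _, Vt = np.linalg.svd(TR - mean, full_matrices=False)
W = Vt[:2].T                                  # (5, 2)

def reconstruction_score(x):
    z = W.T @ (x - mean)                      # encoder
    x_hat = mean + W @ z                      # decoder
    return float(np.sum((x - x_hat) ** 2))    # reconstruction error

x_normal = basis @ np.array([0.3, -0.7])      # on the normal subspace

# Anomalous sample: x_normal pushed off the subspace, orthogonally to W
e = rng.standard_normal(5)
e_perp = e - W @ (W.T @ e)
e_perp /= np.linalg.norm(e_perp)
x_anom = x_normal + 3.0 * e_perp

score_normal = reconstruction_score(x_normal)
score_anom = reconstruction_score(x_anom)
```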
AUTOENCODERS (REVISITED)

We can fit a density model (e.g., a Gaussian Mixture) on z = E(x):

  z ∼ Σ_k π_k φ_{μ_k,Σ_k}

where φ_{μ_k,Σ_k} is the pdf of 𝒩(μ_k, Σ_k).
EM-ALGORITHM FOR GAUSSIAN MIXTURES

Estimation of the Gaussian Mixture parameters {π_k, μ_k, Σ_k} from a training set {z_i} is typically performed via the EM algorithm, which iterates the E and M steps.

• E-step: compute the membership weights γ_{i,k} for each training sample z_i:

  γ_{i,k} = π_k φ_{μ_k,Σ_k}(z_i) / Σ_j π_j φ_{μ_j,Σ_j}(z_i)

Bishop, "Pattern Recognition and Machine Learning"
EM-ALGORITHM FOR GAUSSIAN MIXTURES

• M-step: update the parameters of the Gaussian Mixture:

  π_k = (1/N) Σ_i γ_{i,k}
  μ_k = Σ_i γ_{i,k} z_i / Σ_i γ_{i,k}
  Σ_k = Σ_i γ_{i,k} (z_i − μ_k)(z_i − μ_k)^T / Σ_i γ_{i,k}

Bishop, "Pattern Recognition and Machine Learning"
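The two steps, iterated on a toy 1-D dataset with scalar variances (a minimal sketch of EM, not a production implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data from two well-separated clusters
z = np.concatenate([rng.normal(-5, 1, 300), rng.normal(5, 1, 300)])
N, K = len(z), 2

# Initialization of pi_k, mu_k and the (scalar) variances
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def gauss_pdf(z, mu, var):
    return np.exp(-0.5 * (z - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

for _ in range(30):
    # E-step: membership weights gamma[i, k]
    dens = np.stack([pi[k] * gauss_pdf(z, mu[k], var[k]) for k in range(K)],
                    axis=1)
    gamma = dens / dens.sum(axis=1, keepdims=True)
    # M-step: update pi, mu, var from the memberships
    Nk = gamma.sum(axis=0)
    pi = Nk / N
    mu = (gamma * z[:, None]).sum(axis=0) / Nk
    var = (gamma * (z[:, None] - mu) ** 2).sum(axis=0) / Nk
```

After a few iterations the estimated means approach the true cluster centers ±5.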
AUTOENCODERS (REVISITED)

We can compute the likelihood of a test sample x as:

  ℓ(x) = Σ_k π_k φ_{μ_k,Σ_k}(E(x))

The autoencoder and the Gaussian Mixture are not jointly learned!
JOINT LEARNING OF AUTOENCODER AND DENSITY MODEL

Idea: given a training set of N samples, use a neural network (the estimation network) to predict the membership weights γ of each latent code z = E(x).

Zong et al., "Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection", ICLR 2018
JOINT LEARNING OF AUTOENCODER AND DENSITY MODEL

The estimation network plays the role of the E-step; the GM parameters are then estimated as in the M-step:

  π_k = (1/N) Σ_i γ_{i,k}
  μ_k = Σ_i γ_{i,k} z_i / Σ_i γ_{i,k}
  Σ_k = Σ_i γ_{i,k} (z_i − μ_k)(z_i − μ_k)^T / Σ_i γ_{i,k}

Zong et al., "Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection", ICLR 2018
DEEP AUTOENCODING GAUSSIAN MIXTURE MODEL

Minimize the loss:

  min Σ_{x∈TR} ||x − D(E(x))||₂² + λ ℓ(E(x))

where

  ℓ(z) = −log Σ_k π_k φ_{μ_k,Σ_k}(z)

Additional regularization has to be imposed on Σ_k to avoid trivial solutions.

ℓ(E(x)) can be used as anomaly score for a sample x.

Zong et al., "Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection", ICLR 2018
SOME REMARKS

• The estimation network introduces a regularization that helps to avoid local optima of the reconstruction error
• The autoencoder is then able to extract meaningful features from normal data
• Density estimation enables anomaly detection, but it is a more complicated task
SEMI-SUPERVISED APPROACHES
β’ CNN as data-driven feature extractor
β’ Transfer learning
β’ Self-supervised learning
β’ Autoencoders
β’ Domain-based
β’ Generative models
SUPPORT VECTOR DATA DESCRIPTION (SVDD) REVISITED

We want to find a hypersphere that, in the feature space, encloses most of the normal data.

We expect anomalous data to lie outside the sphere.

Tax, D. M., Duin, R. P. "Support vector domain description". Pattern Recognition Letters, 20(11), 1191–1199 (1999)
SUPPORT VECTOR DATA DESCRIPTION (SVDD) REVISITED

Typically the sphere is computed in a high- (possibly infinite-) dimensional feature space.

Features are implicitly defined using kernels:
• Polynomial kernel
• Gaussian kernel

Idea: can we instead learn the features from normal data using a neural network?

Ruff et al., "Deep One-Class Classification", ICML 2018
SOFT-BOUNDARY DEEP SVDD

Minimize the loss:

  min_{R,W}  R² + (1/(νN)) Σ_{i=1}^N max{0, ||φ_W(s_i) − c||² − R²} + λ||W||²

• Samples s_i such that φ_W(s_i) falls inside the sphere do not contribute to the loss
• ν provides a bound on the false positive rate
• λ||W||² is a regularization term

A test sample s is anomalous if ||φ_W(s) − c|| > R.

Ruff et al., "Deep One-Class Classification", ICML 2018
SOFT-BOUNDARY DEEP SVDD

Remarks:
• Some constraints must be imposed on the network φ_W to avoid trivial solutions:
  • No bias terms
  • Unbounded activations
• The center c is not optimized, but has to be precomputed from data
• c must differ from the trivial constant output c₀ of the network

Ruff et al., "Deep One-Class Classification", ICML 2018
A SIMPLER FORMULATION: DEEP SVDD

  min_W  (1/N) Σ_{i=1}^N ||φ_W(s_i) − c||² + λ||W||²

Cons:
• No bound on the FPR provided by ν
• A threshold has to be chosen for the anomaly score ||φ_W(s) − c||²
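A hedged sketch of Deep SVDD scoring, with the learned network φ_W replaced by a fixed random linear map and no training loop (illustration of the score only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the learned network phi_W: a fixed random linear feature map
# (a real Deep SVDD would optimize W on the normal data)
W = rng.standard_normal((4, 2))

def phi(s):
    return W @ s

# Normal training data: points concentrated around the origin
TR = rng.normal(0.0, 0.1, size=(300, 2))

# The center c is not optimized: it is precomputed from the normal features
c = np.mean([phi(s) for s in TR], axis=0)

def svdd_score(s):
    return float(np.sum((phi(s) - c) ** 2))   # ||phi_W(s) - c||^2

score_normal = svdd_score(np.array([0.05, -0.02]))
score_anom = svdd_score(np.array([3.0, -2.5]))
```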
SEMI-SUPERVISED APPROACHES
β’ CNN as data-driven feature extractor
β’ Transfer learning
β’ Self-supervised learning
β’ Autoencoders
β’ Domain-based
β’ Generative models
GENERATIVE MODELS

Goal: given a training set of images TR, generative models generate other images that are similar to those in TR.
WHAT ARE GENERATIVE MODELS FOR?

• Generative models can be used for data augmentation, simulation and planning
• Realistic samples for artwork, super-resolution, colorization, etc.
• They get close to the "holy grail" of modeling the distribution of natural images

Radford et al., "Unsupervised representation learning with deep convolutional generative adversarial networks", ICLR 2016
GENERATIVE ADVERSARIAL NETWORKS (GAN)

The GAN approach: do not look for an explicit density model p_x describing the manifold of natural images; just find a model able to generate samples that look like the training samples x ∈ ℝ^d.

Instead of sampling from p_x:
• Sample a seed z from a known distribution p_z
• Feed this seed to a learned transformation that generates realistic samples, as if they were drawn from p_x

Use a neural network to learn this transformation.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, ... & Bengio, Y. "Generative adversarial nets", NIPS 2014
GENERATIVE ADVERSARIAL NETWORKS (GAN)

[Figure: a sample z ∼ p_z is drawn from the noise distribution and fed to the generative network]

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, ... & Bengio, Y. "Generative adversarial nets", NIPS 2014
GENERATIVE ADVERSARIAL NETWORKS (GAN)

The GAN solution: train a pair of neural networks with different tasks that compete in a sort of two-player game:
• The generator G produces realistic samples, e.g., taking some random noise as input; G tries to fool the discriminator
• The discriminator D takes an image as input and assesses whether it is real or generated by G

Train the two and, at the end, keep only G.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, ... & Bengio, Y. "Generative adversarial nets", NIPS 2014
GAN Architecture

[Figure: z ∼ p_z feeds the generator G; the discriminator D receives either the generated image G(z) or a real image from the training set TR, and outputs real/fake]

Using GAN

After a successful GAN training, D is not able to distinguish real from fake anymore: the discriminator is therefore useless and is dropped.
GAN

Both D and G are conveniently chosen as neural networks.

Setting up the stage. Our networks take as input:
• D = D(x), where x ∈ ℝ^d is an input image (either real or generated by G)
• G = G(z), where z ∈ ℝ^m is random noise fed to the generator

Our networks give as output:
• D : ℝ^d → [0,1], the posterior probability for the input to be a true image (1)
• G : ℝ^m → ℝ^d, the generated image

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, ... & Bengio, Y. "Generative adversarial nets", NIPS 2014
GAN TRAINING

A good discriminator is such that:
• D(x) is maximum when x is a real image (x ∈ TR)
• 1 − D(x) is maximum when x was generated by G
• 1 − D(G(z)) is maximum when z ∼ p_z

Training D consists in maximizing the binary cross-entropy:

  max_D  E_{x∼p_x}[log D(x)] + E_{z∼p_z}[log(1 − D(G(z)))]

In the first term, D(x) should be 1 since x ∼ p_x, thus the images are real; in the second, D(G(z)) should be 0 since G(z) is a generated (fake) image.
A good generator G is one which makes D fail:

  min_G max_D  E_{x∼p_x}[log D(x)] + E_{z∼p_z}[log(1 − D(G(z)))]
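The value of this objective can be estimated from samples; below, D and G are hypothetical fixed functions and the two expectations are Monte Carlo averages (a sketch of the loss computation only, not of the alternating training loop):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D setting: real data ~ N(2, 0.5), noise z ~ N(0, 1)
x_real = rng.normal(2.0, 0.5, size=1000)
z = rng.normal(0.0, 1.0, size=1000)

def G(z):
    # hypothetical generator: its outputs are centered at 0, far from p_x
    return 0.5 * z

def D(x):
    # hypothetical discriminator: logistic score, high for x near 2
    return 1.0 / (1.0 + np.exp(-4.0 * (x - 1.0)))

# GAN objective: E_x[log D(x)] + E_z[log(1 - D(G(z)))]
value = np.mean(np.log(D(x_real))) + np.mean(np.log(1.0 - D(G(z))))
```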
ILLUSTRATION

[Figure from Goodfellow et al.: the real and fake densities and the discriminator output D(x)]

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, ... & Bengio, Y. "Generative adversarial nets", NIPS 2014
MNIST

Generated samples; the rightmost column shows the nearest training sample to the second-last column.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, ... & Bengio, Y. "Generative adversarial nets", NIPS 2014
INTERPOLATION IN THE LATENT SPACE

We can interpolate between two points in the latent space and obtain smooth transitions from one digit to another.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, ... & Bengio, Y. "Generative adversarial nets", NIPS 2014
GAN FOR ANOMALY DETECTION

Idea: let us train a GAN on normal data. We expect that the generator G cannot generate any anomalous sample.

Problem: given a test sample x, how can we determine whether it could have been generated by G?

[Figure: G maps the latent space onto the manifold ℳ of normal data; anomalous data lie off the manifold]
ANOGAN

Project the test sample x onto the manifold ℳ by solving the optimization problem:

  ẑ = argmin_z ||G(z) − x|| + λ log(1 − D(G(z)))

• ||G(z) − x|| ensures that x is well approximated by the generator
• log(1 − D(G(z))) ensures that the projection G(ẑ) is similar to a real (normal) sample

[Figure: G maps the latent point ẑ to the closest point on the manifold ℳ of normal data]

Schlegl et al., "Unsupervised anomaly detection with generative adversarial networks to guide marker discovery", IPMI 2017
ANOGAN

Anomaly score:

  score(x) = ||G(ẑ) − x|| + λ log(1 − D(G(ẑ)))

We need to solve an optimization problem for each test sample!

Schlegl et al., "Unsupervised anomaly detection with generative adversarial networks to guide marker discovery", IPMI 2017

Donahue et al., "Adversarial feature learning", ICLR 2017
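The per-sample projection can be sketched with a toy linear generator G(z) = A z, minimizing ||G(z) − x||² by gradient descent (the discriminator term of the actual AnoGAN loss is omitted; everything here is illustrative):

```python
import numpy as np

# Toy linear generator G(z) = A z mapping a 2-D latent space into R^5
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0],
              [2.0, 1.0]])

def G(z):
    return A @ z

x = G(np.array([1.0, -2.0]))       # a "normal" test sample, on the manifold

# AnoGAN-style projection: gradient descent on ||G(z) - x||^2
z = np.zeros(2)
for _ in range(200):
    residual = G(z) - x
    z -= 0.05 * (2.0 * A.T @ residual)   # gradient of the squared residual

anomaly_score = float(np.linalg.norm(G(z) - x))   # ~0: x lies on the manifold
```

For a sample off the range of G, the residual cannot be driven to zero and the score stays large.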
BIDIRECTIONAL GAN

[Figure: the encoder E maps data x to the latent code E(x); the generator G maps latent z to data G(z); the discriminator D receives pairs, either (G(z), z) or (x, E(x))]

Donahue et al., "Adversarial feature learning", ICLR 2017
BIDIRECTIONAL GAN

The three networks are trained by solving:

  min_{G,E} max_D  ℒ(D, E, G)

  ℒ(D, E, G) = E_{x∼p_x}[log D(x, E(x))] + E_{z∼p_z}[log(1 − D(G(z), z))]

Donahue et al., "Adversarial feature learning", ICLR 2017
BIDIRECTIONAL GAN

It can be proved that on the manifold ℳ (i.e., on normal data):

  E = G⁻¹

Zenati et al., "Efficient GAN-based anomaly detection", Workshop ICLR 2018
ANOGAN IMPROVED

We can use a BiGAN to efficiently invert the generator in AnoGAN: instead of solving

  ẑ = argmin_z ||G(z) − x|| + λ log(1 − D(G(z)))

simply set ẑ = E(x). Then:

  score(x) = ||G(ẑ) − x|| + λ log(1 − D(G(ẑ)))
           = ||G(E(x)) − x|| + λ log(1 − D(G(E(x))))
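A hedged sketch of this shortcut with toy linear stand-ins: the generator is G(z) = A z and the encoder is the pseudo-inverse of A, which inverts G exactly on the manifold (the discriminator term is omitted):

```python
import numpy as np

# Toy linear generator G(z) = A z and encoder E = pseudo-inverse of A,
# which inverts G exactly on the manifold (the range of A)
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [1.0, -1.0]])
A_pinv = np.linalg.pinv(A)

def G(z):
    return A @ z

def E(x):
    return A_pinv @ x

def score(x):
    # BiGAN-style AnoGAN score, without per-sample optimization
    return float(np.linalg.norm(G(E(x)) - x))

x_normal = G(np.array([0.7, -1.2]))       # on the manifold
x_anom = np.array([1.0, 1.0, -1.0, 0.0])  # off the manifold

s_normal = score(x_normal)   # ~0: E inverts G on normal data
s_anom = score(x_anom)       # strictly positive
```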
Sabokrou et al., "Adversarially learned one-class classifier for novelty detection", CVPR 2018

AUTOENCODER AS GENERATIVE MODELS

A denoising autoencoder R and a discriminator D are trained adversarially:
• R tries to reconstruct the sample x from its noisy version x̃ = x + 𝒩(0, σ²)
• D tries to discriminate between noise-free and reconstructed samples
AUTOENCODER AS GENERATIVE MODELS

Training solves:

  min_R max_D  E_{x∼p_x}[log D(x)] + E_{x̃}[log(1 − D(R(x̃)))],   x̃ = x + 𝒩(0, σ²)

We expect that R can successfully reconstruct (and thus fool D) only normal samples:

  score(x) = 1 − D(R(x))

Sabokrou et al., "Adversarially learned one-class classifier for novelty detection", CVPR 2018

Perera et al., "OCGAN: One-class novelty detection using GANs with constrained latent representations", CVPR 2019
AUTOENCODER AS GENERATIVE MODELS

The generator G may be able to generate samples of the anomalous class as well; in this case it would be impossible to use G to discriminate between normal and anomalous samples.

This may happen if the encoder E maps anomalous samples to regions of the latent space that were not explored during training.

Idea: we can enforce a known distribution on the latent space.
AUTOENCODER AS GENERATIVE MODELS

The known distribution on the latent space can be enforced using a latent discriminator operating on the latent space.

Perera et al., "OCGAN: One-class novelty detection using GANs with constrained latent representations", CVPR 2019
MUCH MORE COMPLICATED ARCHITECTURE
Perera et al, "OCGAN: One-class novelty detection using GANs with constrained latent representations", CVPR 2019
The posterior probability of the classifier is used as anomaly score
Concluding Remarks
A FEW CONCLUDING REMARKS

Nowadays, anomaly-detection problems are ubiquitous in engineering and the applied sciences.

The presented general framework encompasses most algorithms in the literature, which often boil down to:
• Feature extraction
• Definition of suitable statistics (the anomaly score)
• Applying decision rules to a set of random variables
A FEW CONCLUDING REMARKS

When data are characterized by complex structures, as in the case of images and signals, the feature-extraction phase is the most critical one.

Data-driven models provide meaningful representations of images, and these can be used to find good features for anomaly detection.

Nowadays the most powerful algorithms for feature extraction are based on deep learning, and in particular on Convolutional Neural Networks.
A FEW CONCLUDING REMARKS

The key to the success of CNNs is end-to-end learning: in our case, the feature extractor and the anomaly score are jointly learned.

Still, anomaly-detection methods based on deep learning rely on a decision rule, which has to be defined according to some statistical criterion, e.g., to guarantee a fixed false-positive rate.