Anomaly Detection and Automatic Labeling with Deep Learning
Uploaded by adam-gibson
Anomaly Detection with Deep Learning
November 2017
SKYMIND OVERVIEW
Founded: 2014 | Funding: $6.3M | OSS: 3,700 GitHub forks, 300,000+ downloads/mo. | Team: 35 employees; 25 engineers; 7 PhDs
OUR BOOK GIVEAWAY! (pub. Aug. 2017)
SKIL: Our Production Deep Learning Solution
confidential 4
Anomaly Detection Approaches

There are five main approaches to anomaly detection:
• Probabilistic
• Distance-based
• Domain-based
• Reconstruction-based
• Information-theoretic

All of these methods have drawbacks that prevent them from being applicable to every type of data. They either:
• make built-in assumptions about the data (like Gaussian mixture models)
• require specific domain knowledge
• detect only certain patterns of anomalies
• are not suitable for data with strong temporal dependencies
• are not suitable for multivariate data, or are computationally infeasible at our scale

Multiple approaches are necessary for a comprehensive detection pipeline.
Anomaly Detection Example
Cluster-Based Methods 1

Cluster-based methods work by creating a dictionary of non-anomalous data and finding the entry that best matches the actual data at inference time.

If the pattern was never seen before, the reconstruction will be very different from the actual data.
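A minimal sketch of this dictionary idea, using k-means centroids as the dictionary of non-anomalous patterns (the choice of clustering algorithm is an assumption; the talk does not name one). The anomaly score is the distance to the best-matching dictionary entry:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Dictionary" of non-anomalous behavior: k-means centroids fit on normal data.
normal = rng.normal(0.0, 1.0, size=(500, 8))

def fit_dictionary(X, k=4, iters=20):
    cents = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        assign = np.linalg.norm(X[:, None] - cents[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):  # keep the old centroid if a cluster empties
                cents[j] = X[assign == j].mean(axis=0)
    return cents

dictionary = fit_dictionary(normal)

def anomaly_score(x, dictionary):
    # Reconstruction = best-matching dictionary entry; score = distance to it.
    return np.linalg.norm(dictionary - x, axis=1).min()

print(anomaly_score(np.zeros(8), dictionary))      # small: pattern seen before
print(anomaly_score(np.full(8, 6.0), dictionary))  # large: never-seen pattern
```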
Cluster-Based Methods 2

[Figure: the input data, its best reconstruction from the dictionary, and the difference between them]
Example Workflow for Anomaly Detection

Join the raw data, transform it, feed the groups into an autoencoder, and save the reconstruction error of the center.

[Figure: input data vs. reconstruction]
Example of a VAE (Variational Autoencoder) Detecting Anomalies

[Figure: samples with low reconstruction error vs. samples with high reconstruction error]
Data Processing Step 3: Training

The autoencoder will be trained to recreate the input data as closely as possible (one row at a time).
Data Processing Step 4: Ranking

The trained autoencoder will then be run on the new data, and we will store the error (sum of mean squared difference per column, per row) in a ranking engine. Unusual patterns will have a high reconstruction error.
[Figure: (Reconstruction − Input)² → Error Table]
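The train-then-rank steps above can be sketched as follows. As a stand-in for the trained autoencoder, this uses a projection onto the top principal components (the optimum a linear autoencoder converges to), which keeps the example short and deterministic; the per-row score is the mean squared difference across columns, as described above. All data here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "normal" data living near a 2-D subspace of an 8-D space.
A = rng.normal(size=(2, 8))
X = rng.normal(size=(400, 2)) @ A + 0.05 * rng.normal(size=(400, 8))

# Linear stand-in for the trained autoencoder: project each centered row
# onto the top-2 principal directions and back.
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
V = Vt[:2]

def reconstruction_error(rows):
    """Per-row score: mean squared difference per column, per row."""
    centered = rows - mu
    recon = centered @ V.T @ V
    return ((recon - centered) ** 2).mean(axis=1)

# Rank new rows: on-subspace rows score low, the off-subspace row scores high.
normal_rows = rng.normal(size=(5, 2)) @ A
odd_row = rng.normal(size=(1, 8)) * 3.0
errs = reconstruction_error(np.vstack([normal_rows, odd_row]))
ranking = np.argsort(-errs)  # highest error first, ready for a ranking engine
```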
Problem: No Free Lunch

Wolpert's no free lunch theorem states that there is no single machine learning algorithm that performs well on every task. Deep learning is itself a set of very different techniques that are good or bad at various problems.

The system will have to use various algorithms to detect different types of anomalies and possibly different root causes.

Class imbalance will be a problem at the beginning of the system's lifetime. Labeled data will be overshadowed by unlabeled data, and the anomaly detectors will not be able to improve for some time. One possible solution is pseudo-labeling: use a trained classifier to label all the data. These labels will be very noisy at first, so they might have to be deployed in stages.
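The pseudo-labeling idea can be sketched as follows. The classifier here is a deliberately simple nearest-centroid model on synthetic 2-D data (the talk does not specify one), and the staged deployment is approximated by accepting only the most confident pseudo-labels:

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny labeled set, much larger unlabeled set (the imbalance described above).
X_lab = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(5, 1, (10, 2))])
y_lab = np.array([0] * 10 + [1] * 10)
X_unlab = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])

def centroids(X, y):
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict_with_margin(X, cents):
    d = np.linalg.norm(X[:, None] - cents[None], axis=2)
    pred = d.argmin(axis=1)
    margin = np.abs(d[:, 0] - d[:, 1])  # crude confidence proxy
    return pred, margin

# Stage 1: pseudo-label only the most confident unlabeled points.
cents = centroids(X_lab, y_lab)
pseudo, margin = predict_with_margin(X_unlab, cents)
keep = margin > np.quantile(margin, 0.5)  # deploy the noisy labels in stages

# Stage 2: retrain on labeled + confidently pseudo-labeled data.
X2 = np.vstack([X_lab, X_unlab[keep]])
y2 = np.concatenate([y_lab, pseudo[keep]])
cents2 = centroids(X2, y2)
```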
System Requirements

• The system needs to handle terabytes of data at minimum.
• It must train a large number of neural networks within a short time frame; GPU servers are a cost-effective way to obtain the necessary compute.
• GPU servers are storage-inefficient, so the system will use the Hadoop File System (HDFS) and Spark on commodity servers to meet the storage requirements.
• The system must scale to larger problems.
Automatic Labeling

• Use k-means and t-SNE cluster highlighting to label data points.
• Uses the representation from the autoencoder to automatically group data.
• t-SNE visualization allows highlighting and automatic labeling.
• Use k-NN and VP-trees to sample the hidden activations learned from the neural net and label interactively.
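The cluster-then-label idea above can be sketched like this: cluster the hidden activations with k-means, then let one human-provided label per cluster propagate to every point in it. The activations are synthetic stand-ins here, and t-SNE (used only for the visual highlighting step) is omitted:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-ins for autoencoder hidden activations of three behaviors.
H = np.vstack([rng.normal(c, 0.3, (100, 4)) for c in (0.0, 3.0, 6.0)])

def kmeans(X, k, iters=25):
    cents = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        assign = np.linalg.norm(X[:, None] - cents[None], axis=2).argmin(axis=1)
        cents = np.stack([X[assign == j].mean(axis=0) if np.any(assign == j)
                          else cents[j] for j in range(k)])
    return cents, assign

cents, assign = kmeans(H, k=3)

# One human-provided label per highlighted cluster propagates to all members.
cluster_labels = {j: f"behavior_{j}" for j in range(3)}
labels = [cluster_labels[j] for j in assign]
```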
Root Cause Analysis

• Autoencoders can be trained to identify causes of certain kinds of behavior.
• "Spikes" in reconstruction error on time series can be used to detect problems in infrastructure as well as in network monitoring (dropped connections, unusually high latency).
• Use k-NN and VP-trees to sample the hidden activations learned from the neural net and label interactively.
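A minimal sketch of spike detection on a reconstruction-error time series, using a rolling z-score against a trailing window (the window size and threshold are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(3)

# Reconstruction error over time, with an injected spike at t=150.
err = rng.normal(1.0, 0.1, 300)
err[150] = 3.0

def spikes(series, window=30, z_thresh=5.0):
    """Flag points far above the trailing window's mean (rolling z-score)."""
    flagged = []
    for t in range(window, len(series)):
        past = series[t - window:t]
        z = (series[t] - past.mean()) / (past.std() + 1e-9)
        if z > z_thresh:
            flagged.append(t)
    return flagged

print(spikes(err))  # expected to include the injected spike at t=150
```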
Design Considerations for Production

• The system will not use redundant environments, since disaster recovery is not a requirement.
• Within an environment, the system will have redundant hardware and can tolerate the loss of one GPU node and one app node without service degradation.
• The system does not employ a remote backup strategy, because all data is ephemeral and can be recreated from the data on S3.
THANK YOU