ATHMoS: A Satellite Anomaly Detection Framework – Microservice Architecture

Corey O'Meara, German Space Operations Center (GSOC)

ESAW 2017

June 21st, 2017

> O'Meara • ATHMoS > 21.06.17 • www.DLR.de/rb • Slide 1


General Idea of Anomaly Detection

We want to compare new telemetry data with past data to see how its behavior has changed.

[Figure: Past Telemetry vs. Today's Telemetry]


Why Automated Telemetry Checking?


Big Data, Big Computations…

1. Noise Extraction and De-Trending


2. Sub-Cluster Dimension Approximation


3. Anomaly Score Computation

[Figure labels: Raw Data, Smoothed Data, Noise Data]
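The raw / smoothed / noise split named in step 1 can be sketched as follows. The slides do not say which smoothing method ATHMoS uses, so the centered moving average, the window size, and the sample values here are assumptions chosen only for illustration.

```python
from statistics import mean

def moving_average(values, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed

def split_trend_and_noise(values, window=5):
    """Return (smoothed, noise) such that raw = smoothed + noise."""
    smoothed = moving_average(values, window)
    noise = [v - s for v, s in zip(values, smoothed)]
    return smoothed, noise

# Hypothetical telemetry samples with one spike at index 4.
raw = [10.0, 10.2, 9.9, 10.1, 14.0, 10.0, 9.8, 10.1]
smoothed, noise = split_trend_and_noise(raw, window=3)
spike_index = max(range(len(noise)), key=lambda i: abs(noise[i]))
```

The noise series is what remains after the slow trend is removed, so a sudden spike stands out in it even when the smoothed curve changes only slightly.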


4. Deep Learning NN: Time Series Forecasting

90-min Prediction

45-min Recent Data
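To make the 45-minute input and 90-minute prediction horizons concrete, the sketch below cuts a time series into (recent data, future data) training pairs. It assumes one sample per minute and a plain Python list as input; neither the sampling rate nor the network architecture is given in the slides, and `make_windows` is a hypothetical helper name.

```python
def make_windows(series, n_in, n_out, step=1):
    """Slice a series into (input, target) pairs for forecasting.

    n_in  : number of past samples fed to the model
    n_out : number of future samples the model should predict
    """
    pairs = []
    last_start = len(series) - n_in - n_out
    for start in range(0, last_start + 1, step):
        x = series[start:start + n_in]
        y = series[start + n_in:start + n_in + n_out]
        pairs.append((x, y))
    return pairs

# Assumption: one telemetry sample per minute, so 45 inputs and 90 targets.
series = list(range(200))
pairs = make_windows(series, n_in=45, n_out=90)
```

Each pair could then serve as one training example for a forecasting network; a series shorter than 135 samples yields no pairs.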

ATHMoS: System Architecture Requirements

High Availability:
Necessary to ensure 24/7 monitoring in the operational environment.

Scalability:
The resulting system should be easily scalable to add more computational power.

Performance:
The complete chain, from import through processing to storing the data, should complete in a few minutes for a day's worth of telemetry data.

Modifiability:
The processing chain needs to be easily modifiable to allow for individual mission-specific adjustments.


What Are Microservices? (a.k.a. "fine-grained" SOA)


TelemetryDB: Apache Cassandra


Apache Cassandra

• Distributed database
• Supports multiple datacenters
• Highly available and scalable
• Developed for Big Data at Facebook

Data Processing: Apache Spark

Apache Spark

• Distributed computational engine
• Highly scalable, asynchronous calculations
• "Next-generation" Hadoop (up to ~100 times faster than Hadoop MapReduce for in-memory workloads)
• Allows real-time processing (data streams)


Language: Scala

Akka (Scala)

• Framework to develop fault-tolerant services
• Both OOP and functional programming
• Asynchronous communication
• Type-safe, better version of Python (imo)

Message Broker: Apache Kafka

Apache Kafka

• Originally developed at LinkedIn
• Open-sourced in 2011
• Distributed messaging (pub/sub) system
• By far the fastest w.r.t. messages/sec
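The pub/sub idea behind the message broker can be illustrated with a small in-memory model. This is not the Kafka client API: `InMemoryBroker`, the topic name, and the group names are hypothetical, and the sketch only mimics two concepts, an append-only log per topic and an independent read offset per consumer group.

```python
from collections import defaultdict

class InMemoryBroker:
    """Toy pub/sub broker: one append-only log per topic."""

    def __init__(self):
        self._logs = defaultdict(list)    # topic -> list of messages
        self._offsets = defaultdict(int)  # (group, topic) -> next offset

    def publish(self, topic, message):
        """Append a message and return its offset in the topic log."""
        self._logs[topic].append(message)
        return len(self._logs[topic]) - 1

    def poll(self, group, topic, max_messages=10):
        """Return unread messages for this group and advance its offset."""
        start = self._offsets[(group, topic)]
        batch = self._logs[topic][start:start + max_messages]
        self._offsets[(group, topic)] = start + len(batch)
        return batch

broker = InMemoryBroker()
broker.publish("telemetry.raw", {"param": "battery_voltage", "value": 28.1})
broker.publish("telemetry.raw", {"param": "battery_voltage", "value": 27.9})

# Two independent consumer groups each receive every message.
scoring = broker.poll("anomaly-scoring", "telemetry.raw")
archive = broker.poll("archiver", "telemetry.raw")
```

Because producers only append and each group tracks its own offset, services can be added or restarted without the publisher knowing about them, which is the decoupling a microservice chain relies on.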


ATHMoS Microservices Overview

Microservice Orchestration: DC/OS


Deployment Infrastructure


[Diagram: node groups labeled x2, x6, x2]

Cluster Details


App Scheduler

DC/OS Dashboard


App Launcher and Scheduler


Cluster Details


Allows for a Reactive System

App Resource Monitoring

• Application instance CPU usage
• Application instance RAM usage
• Application instance cached memory usage
• Application instance network usage
• Total system usage statistics

ATHMoS as a Reactive System


The distributed nature of the system allows us to define limits for CPU, RAM, and network connections/traffic for each component of the software, so that a component under heavy load is scaled elastically on its own.
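A per-component scaling rule of this kind might look like the sketch below. The thresholds (scale up above 80% of a limit, scale down below 30%), the `Limits` fields, and the function name are assumptions made for illustration; the slides do not describe how the scaling decision is actually configured in DC/OS.

```python
from dataclasses import dataclass

@dataclass
class Limits:
    cpu: float             # CPU cores allowed per instance
    ram_mb: float          # RAM allowed per instance, in MB
    min_instances: int = 1
    max_instances: int = 8

def desired_instances(current, cpu_load, ram_used_mb, limits):
    """Return the instance count for ONE component given its average load."""
    # Use the most constrained resource as the utilization figure.
    utilization = max(cpu_load / limits.cpu, ram_used_mb / limits.ram_mb)
    if utilization > 0.8:      # assumed scale-up threshold
        target = current + 1
    elif utilization < 0.3:    # assumed scale-down threshold
        target = current - 1
    else:
        target = current
    # Never leave the configured instance range.
    return max(limits.min_instances, min(limits.max_instances, target))

limits = Limits(cpu=1.0, ram_mb=1024)
scaled_up = desired_instances(2, cpu_load=0.9, ram_used_mb=100, limits=limits)
scaled_down = desired_instances(2, cpu_load=0.1, ram_used_mb=100, limits=limits)
```

Evaluating the rule per component, rather than for the whole system, is what lets a single overloaded service grow without touching the others.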


Outlook: Microservice Template

[Diagram layers: Infrastructure Layer, Domain Layer, Application Layer]


Outlook: Continuous Integration and Delivery


In Conclusion…read this!

Thanks for your attention!

@cpomeara


References

1. O'Meara, C., Schlag, L., Faltenbacher, L., Wickler, M., "ATHMoS: Automated Telemetry Health Monitoring System at GSOC using Outlier Detection and Supervised Machine Learning," Proceedings of the AIAA SpaceOps 2016 Conference, May 2016.

2. Newman, S., "Building Microservices," O'Reilly Media, Inc., 2015.

3. http://martinfowler.com/articles/microservices.html

4. https://www.linkedin.com/pulse/20141128054428-13516803-monolithic-vs-microservice-architecture

5. http://www.lab41.org/transformers-rdd-in-disguise/

6. https://www.infoq.com/articles/apache-kafka


Backup Slides


Frontend Prototype


Example: Battery Voltage Drop


[Plot annotation: Triggered OOL]

The battery voltage anomaly was first detected when an out-of-limit (OOL) alarm was triggered during an image acquisition datatake in eclipse.


ATHMoS correctly identified not only the anomaly where the OOL was triggered but also anomalies of the same type that occurred more than 1 month in advance.
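The difference between a fixed out-of-limit check and a comparison against past behavior can be sketched as follows. The z-score used here is a simple stand-in and not the ATHMoS outlier score; the voltage values and the limits of 25.0 V and 32.0 V are invented purely for illustration.

```python
from statistics import mean, stdev

def out_of_limit(value, low, high):
    """Classic OOL check against fixed limits."""
    return value < low or value > high

def deviation_score(value, history):
    """Distance from past behavior, in standard deviations (z-score)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return 0.0 if value == mu else float("inf")
    return abs(value - mu) / sigma

# Hypothetical past readings and one new, lower reading.
history = [28.0, 28.1, 27.9, 28.0, 28.2, 27.8, 28.0, 28.1]
reading = 26.5

ool_triggered = out_of_limit(reading, low=25.0, high=32.0)
score = deviation_score(reading, history)
```

In this made-up case the reading stays inside the fixed limits, so no OOL alarm fires, yet it lies many standard deviations away from the historical values. A behavior-based score can therefore flag a drift before a hard limit is reached.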

[Plot: Outlier Score (0-100%), July 2015, with the OOL event marked]


[Plot: Outlier Score (0-100%), June 2015]
