Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB...

22
Institute of Flight System Dynamics MATLAB EXPO 2016 May 10 th , 2016 Analysis of Operational Flight Data in Hadoop using MapReduce and the MATLAB Distributed Computing Server (MDCS) MATLAB EXPO 2016 Lukas Höhndorf Javensius Sembiring, Robin Karpstein, and Florian Holzapfel

Transcript of Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB...

Page 1: Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB EXPO 2016 May 10th, 2016 Analysis of Operational Flight Data in Hadoop using MapReduce

1

Institute of

Flight System Dynamics

MATLAB EXPO 2016

May 10th, 2016

Analysis of Operational Flight Data

in Hadoop using MapReduce

and the MATLAB Distributed Computing Server

(MDCS)

MATLAB EXPO 2016

Lukas Höhndorf

Javensius Sembiring, Robin Karpstein, and Florian Holzapfel

Page 2: Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB EXPO 2016 May 10th, 2016 Analysis of Operational Flight Data in Hadoop using MapReduce

2

Institute of

Flight System Dynamics

MATLAB EXPO 2016

May 10th, 2016

Outline

• Mission of the Flight Safety Group

• Big Data Concepts Available in MATLAB

• MATLAB Distributed Computing Server (MDCS)

• Application

• Summary

quick access recorder data

Lat 6.000

Lon 100.345

Page 3: Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB EXPO 2016 May 10th, 2016 Analysis of Operational Flight Data in Hadoop using MapReduce

3

Institute of

Flight System Dynamics

MATLAB EXPO 2016

May 10th, 2016

Europe aims at less than one accident per ten million

flights (i.e. accident probability of 10-7 per flight).

ICAO DOC 9859

Flightpath 2050

Airlines are required to implement a Safety Management

System (SMS)

SMS requires operators also to define their own

Acceptable Level of Safety (ALoS).1

21 accident10 million

commercial flights

Definition of ALoS within ICAO DOC 9859:

“The minimum level of safety performance […] of a service provider, as defined

in its safety management […] .”

Background

Page 4: Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB EXPO 2016 May 10th, 2016 Analysis of Operational Flight Data in Hadoop using MapReduce

4

Institute of

Flight System Dynamics

MATLAB EXPO 2016

May 10th, 2016

Predictive Analysis:

Making quantitative statements about the future state

based on previous experience and knowledge.

Classical statistical approach

𝑷 𝑨𝒄𝒄𝒊𝒅𝒆𝒏𝒕 =𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑎𝑐𝑐𝑖𝑑𝑒𝑛𝑡𝑠

𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑓𝑙𝑖𝑔ℎ𝑡𝑠

Runway overrun example

𝑷 𝑨𝒄𝒄𝒊𝒅𝒆𝒏𝒕 =0

400 000= 0

?

General Concept

Page 5: Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB EXPO 2016 May 10th, 2016 Analysis of Operational Flight Data in Hadoop using MapReduce

5

Institute of

Flight System Dynamics

MATLAB EXPO 2016

May 10th, 2016

General Concept

Contributing Factors

Weight

Speed

Wind

Flaps

Incident Probability

i.e. Overrun Probability

Fre

qu

en

cy

e.g. OverrunIncident Model

Stop margin

Image: IATA

Start of

Braking

… …

Advanced

Statistical

Methods

Model and Method Goal

Page 6: Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB EXPO 2016 May 10th, 2016 Analysis of Operational Flight Data in Hadoop using MapReduce

6

Institute of

Flight System Dynamics

MATLAB EXPO 2016

May 10th, 2016

General Concept

Page 7: Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB EXPO 2016 May 10th, 2016 Analysis of Operational Flight Data in Hadoop using MapReduce

7

Institute of

Flight System Dynamics

MATLAB EXPO 2016

May 10th, 2016

General Concept

To quantify accident probabilities in aviation, there

are a lot of computations and interactions with a

huge volume and different type of data necessary.

Page 8: Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB EXPO 2016 May 10th, 2016 Analysis of Operational Flight Data in Hadoop using MapReduce

8

Institute of

Flight System Dynamics

MATLAB EXPO 2016

May 10th, 2016

Flight Data

Source: http://www.sagem.com/aerospace/commercial-aircraft/information-system/aircraft-condition-monitoring-system-acms

• Up to 2500 variables (depending on the aircraft

and the airline) are recorded.

• Frequency usually between ¼ Hertz and 8 Hertz,

depending on the variable.

• Number of the recorded variables increased

significantly in the last years.

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5

x 104

-0.5

0

0.5

1

1.5

2

2.5

3

3.5

4x 10

4 Altitude

Time [s]D

ata

[ft

]

Page 9: Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB EXPO 2016 May 10th, 2016 Analysis of Operational Flight Data in Hadoop using MapReduce

9

Institute of

Flight System Dynamics

MATLAB EXPO 2016

May 10th, 2016

Big Data

• Nowadays, many people talk about Big Data.

To describe Big Data, the 4 V’s are common:

• Volume

• Velocity

• Variety

• Veracity

• Goal of the project:

• Learn about existing Big Data concepts available in MATLAB

• Apply these concepts to flight data analysis

Source: https://www.ucl.ac.uk/big-data/bdi/images/data-head.jpg

Page 10: Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB EXPO 2016 May 10th, 2016 Analysis of Operational Flight Data in Hadoop using MapReduce

10

Institute of

Flight System Dynamics

MATLAB EXPO 2016

May 10th, 2016

Hadoop

Datastore

HDFS

Reduce

Node

Node

Node Data

Data

Data

Map

ReduceMap

ReduceMap

Map Reduce

Map

Map

Reduce

Reduce

Source: MATHWORKS

Page 11: Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB EXPO 2016 May 10th, 2016 Analysis of Operational Flight Data in Hadoop using MapReduce

11

Institute of

Flight System Dynamics

MATLAB EXPO 2016

May 10th, 2016

Hadoop - Ecosystem

Additional software packages for Hadoop:

• Flume - Efficiently collecting, aggregating, and moving large amounts of log data

• Sqoop - Transfering data between relational databases and Hadoop

• Hbase - Open source, non-relational, distributed database

• Hive - Data Warehouse for providing data summarization, query, and analysis

• Oozie - Server-based workflow scheduling system to manage Hadoop jobs

• Pig - High-level platform for creating MapReduce programs used with Hadoop

• Spark - Open source cluster computing framework

• …

Page 12: Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB EXPO 2016 May 10th, 2016 Analysis of Operational Flight Data in Hadoop using MapReduce

12

Institute of

Flight System Dynamics

MATLAB EXPO 2016

May 10th, 2016

HDFS

• A distributed file system designed to run on commodity hardware.

• HDFS provides high throughput access to application data and is

suitable for applications that have large data sets.

• Files are cut apart in chunks and distributed among various data nodes.

Source: http://www.cloudera.com/content/dam/cloudera/product-assets/hdfs-data-distribution.png

Page 13: Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB EXPO 2016 May 10th, 2016 Analysis of Operational Flight Data in Hadoop using MapReduce

13

Institute of

Flight System Dynamics

MATLAB EXPO 2016

May 10th, 2016

MATLAB Interface to HDFS

• From MATLAB, direct access to HDFS using datastore objects is very

convenient.

Source: http://hadoop.apache.org/

MATLAB datastore – Object

Access data with MATLAB read

(observe analogy between HDFS chunk size and

MATLAB ReadSize)

Page 14: Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB EXPO 2016 May 10th, 2016 Analysis of Operational Flight Data in Hadoop using MapReduce

14

Institute of

Flight System Dynamics

MATLAB EXPO 2016

May 10th, 2016

MapReduce

• MapReduce is a programming technique for analyzing data sets that do

not fit in memory

• MapReduce consists of two parts: Map and Reduce

• Map: The Map algorithms are applied to individual chunks of the big data

file (compare chunk structure of HDFS)

• Reduce: The results off all the applications of Map are combined in an

application of a Reduce function.

• Typical example: Finding minimum value of a (very big) array

outds = mapreduce(ds,mapfun,reducefun,mr);

Page 15: Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB EXPO 2016 May 10th, 2016 Analysis of Operational Flight Data in Hadoop using MapReduce

15

Institute of

Flight System Dynamics

MATLAB EXPO 2016

May 10th, 2016

MATLAB Distributed Computing Server (MDCS)

• MDCS lets you run computationally

intensive MATLAB programs and

Simulink models on computer

clusters, clouds, and grids, enabling

you to speed up computations and

solve large problems.

• Since R2014b, MDCS can be directly

connected to Hadoop (using the

Hadoop Job Scheduler)

• However, configure the MATLAB Job

Scheduler (MJS) proved to be more

convenient in our situation.Source: http://www.mathworks.com/help/mdce/index.html,

http://de.mathworks.com/cmsimages/62006_wl_mdcs_fig1_wl.png

Page 16: Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB EXPO 2016 May 10th, 2016 Analysis of Operational Flight Data in Hadoop using MapReduce

16

Institute of

Flight System Dynamics

MATLAB EXPO 2016

May 10th, 2016

MATLAB Distributed Computing Server (MDCS)

Datastore

map.m

reduce.m

HDFS

Node Data

MATLAB

Distributed

Computing

Server

Node Data

Node Data

Map Reduce

Map Reduce

Map Reduce

Hadoop

Source: MATHWORKS

Page 17: Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB EXPO 2016 May 10th, 2016 Analysis of Operational Flight Data in Hadoop using MapReduce

17

Institute of

Flight System Dynamics

MATLAB EXPO 2016

May 10th, 2016

Application

Given: Flight data as time series and stored in Hadoop

(File size depends on many factors; currently up to 5 GB)

Goal: Detect departure and arrival airports and runways using MapReduce

and the MDCS

Source: Google Earth, Data SIO, NOAA, U.S. Navy, NGA, GEBCO, Image Landsat

Flight Data

Page 18: Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB EXPO 2016 May 10th, 2016 Analysis of Operational Flight Data in Hadoop using MapReduce

18

Institute of

Flight System Dynamics

MATLAB EXPO 2016

May 10th, 2016

Application

• 1st Step: “Roughly” detect location of lift

off and touchdown

→ Map

• 2nd Step: Compare coordinates with a

navigation database to obtain airports

and runways

→ Reduce

Source: Google Earth

Page 19: Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB EXPO 2016 May 10th, 2016 Analysis of Operational Flight Data in Hadoop using MapReduce

19

Institute of

Flight System Dynamics

MATLAB EXPO 2016

May 10th, 2016

Air /Ground

Latitude Longitude

… … …

1 48°21'57.62"N 11°48'35.73"E

1 48°21'56.26"N 11°48'4.15"E

1 48°21'53.38"N 11°47'34.21"E

0 48°21'51.12"N 11°47'0.36"E

0 48°21'48.34"N 11°46'27.18"E

0 48°21'46.50"N 11°46'4.12"E

… … …

Application

Chunk 1

Chunk 2

Map

Is there a lift off or

touchdown in the

chunk?

Chunk 1: No

Chunk 2: Yes

Save Coordinates

for Lift Off /

Touchdown

Chunk 3 …

Reduce

• Aggregate

information from

all chunks

• Compare

coordinates with

navigation

database and

retrieve airports

and runways

Page 20: Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB EXPO 2016 May 10th, 2016 Analysis of Operational Flight Data in Hadoop using MapReduce

20

Institute of

Flight System Dynamics

MATLAB EXPO 2016

May 10th, 2016

Results

Location Duration per flight

Local ~ 36 seconds

Hadoop Cluster using Hadoop Job Scheduler ~ 48 seconds

Hadoop Cluster using MATLAB Job Scheduler ~ 6 seconds

Following scenarios were tested with MapReduce:1. Local (Intel Core i5 4x3.1 GHZ, 16 GB DDR RAM, MATLAB R2015a)

2. Hadoop Cluster using Hadoop Job Scheduler (due to complicated

configuration of the Hadoop Job Scheduler, MapReduce was conducted on

only 1 machine, Intel Core i7 4x3.4 GHZ, 32 GB DDR RAM, MATLAB

R2014b)

3. Hadoop Cluster using MATLAB Job Scheduler (whole cluster available, 2 x

Intel Core i7 4x3.4 GHZ, 1x Intel Core i7 2x3.4 GHZ, 32/16/4 GB DDR

RAM, MATLAB R2014b)

Page 21: Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB EXPO 2016 May 10th, 2016 Analysis of Operational Flight Data in Hadoop using MapReduce

21

Institute of

Flight System Dynamics

MATLAB EXPO 2016

May 10th, 2016

Our Experiences with the Big Data

Concepts Available in MATLAB

Summary

Combination of MapReduce and

MDCS for Parallelization is Possible

Hadoop Storage (HDFS) Accessible

from MATLAB Using Datastore

Hadoop Storage (HDFS) Beneficial for

Storing Huge Data Files

MapReduce Concept Available in

MATLAB Allows to Analyze Big Data

Files

Big Data

Concepts and

MATLAB

Page 22: Analysis of Operational Flight Data in Hadoop using ... Institute of Flight System Dynamics MATLAB EXPO 2016 May 10th, 2016 Analysis of Operational Flight Data in Hadoop using MapReduce

22

Institute of

Flight System Dynamics

MATLAB EXPO 2016

May 10th, 2016

Thank you for your attention

Lukas Höhndorf, [email protected]

Javensius Sembiring, [email protected]

Robin Karpstein, [email protected]

Florian Holzapfel, [email protected]

Ludwig Drees, [email protected]

Chong Wang, [email protected]

Phillip Koppitz, [email protected]

Joachim Siegel, [email protected]

Institute of Flight System Dynamics

Technische Universität München

Boltzmannstraße 15

D-85748 Garching bei München

Deutschland / Germany

Phone: +49 89 289-16080

Fax: +49 89 289-16058

Source: Google Earth, 2016 DigitalGlobe