Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan,...
Transcript of Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan,...
![Page 1: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/1.jpg)
1
![Page 2: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/2.jpg)
Data Analytics practice
[press space bar]2
![Page 3: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/3.jpg)
Agenda
1· Data Analytics
2· SafeClouds
3· Conclusions & challenges
What's new and what's not?
Quick wins
The Data Science practice
The learning problem
The project
The partners
The work programme
Scenarios, outcomes
3
![Page 4: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/4.jpg)
Data Analytics
4 . 1
![Page 5: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/5.jpg)
Data Analytics
What's new and what's not18th century
1920's
1980's
1990's
2000's
Future in aviation
Bayesian statistics
Parametric models
Highly non-linear relationships in real complex datasets
New analytical techniques, large data sets, high non-linearity
Machine learning concepts; Storage, Computing, Communications
Focus on processes that provide actionable analytics
4 . 2
![Page 6: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/6.jpg)
The Data Science practice in aviation
??Data Analytics
Individualisation trumps universalsIndividualisation trumps universals
Intangibles that appear to be completely intractable can be measured andIntangibles that appear to be completely intractable can be measured andpredictedpredicted
4 . 3
![Page 7: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/7.jpg)
The Data Science practice
4 . 4
![Page 8: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/8.jpg)
What's the learning problem?Data Analytics
4 . 5
![Page 9: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/9.jpg)
What's the learning problem?Data Analytics
4 . 6
![Page 10: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/10.jpg)
Building models with massive data
The data models and solving the inference problem have challenges:
· Multi-dimensionality, heterogeneity and incompleteness of data, volume of data, velocity,...
The discipline: Knowledge Discovery on massive dataThe discipline: Knowledge Discovery on massive data
· Model selection, including complexity/over-Ztting trade-offs
· Model running, including selection of training data, validation and testing
· Model deployment, including stability and trade-offs precision-accuracy-recall
Data Analytics
4 . 7
![Page 11: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/11.jpg)
Building KDD models with massive dataData Analytics
4 . 8
![Page 12: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/12.jpg)
5 . 1
![Page 13: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/13.jpg)
SafeClouds
data management, infrastructure, data protection, data mining tools, visualisation
Aviation safety knowledge discovery
Systematic identiZcation of hazards
Applied research - laboratory validation (TRL5)
5 . 2
![Page 14: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/14.jpg)
5 . 3
![Page 15: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/15.jpg)
SafeClouds research project
5 . 4
![Page 16: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/16.jpg)
Some scenarios of interest: Some scenarios of interest:
SafeClouds research project
Real time approach congestion monitoring
Proper separation with terrain
Level busts
Runway performance
Runway excursions
Unstable approaches
5 . 5
![Page 17: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/17.jpg)
EASASafeClouds research project
5 . 6
![Page 18: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/18.jpg)
SafeClouds outcomes
Questions
Scenarios descriptionSafeClouds platform
Datasets
Case Studies
Tools
Case Studies analyticsCase Studies analytics
Agile analyticsAgile analyticsmethodologymethodology
Outputs
5 . 7
![Page 19: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/19.jpg)
Next stepsNext steps
Consortium Agreement sign. inc. data protection & sharing - Sept '16
Grant Agreement signature - Sept '16
Project starts - early Oct '16
Consortium Coordinator - Paula López-Catalá, [email protected]
SafeClouds research project
5 . 8
![Page 20: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/20.jpg)
Conclusions & challenges
6 . 1
![Page 21: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/21.jpg)
ConclusionsData ingest
Cleanse
Fuse
Build Models
Build infrastructure
Secure
Enable the data Build/govern theplatform
Engage the business
Discover
Monitor
Deploy
Data sources
Complexity
Costs
Skill gap in ML-aviation
Reliance on IT
Trust / Privacy
Agile methodologies
ROI metrics
Change processes
Challenges
6 . 2
![Page 22: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/22.jpg)
Some thoughts on challenges
· Analytics Center of Excellence is not an IT organisation
· Data Science agile management is a must
· Reusable data & logic for governance and consistency
· Great tools for collaboration, visual tools.
6 . 3
![Page 23: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/23.jpg)
Closing thoughts
DifZcult to see "quick wins" or "low-hanging fruits"
Your model is not what your data scientists design,it’s what your engineers implement - translation business totechnical is key
Data Science is a craft - there is no Excel+++
6 . 4
![Page 24: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/24.jpg)
Thank you!
References
Annual Safety Review, EASA, 2016
Data, information and analytics as services, Delen & Demirkan, 2012
Data Science for business, Provost & Fawcett, 2013
European Big Data Value Strategic Research Agenda, 2015
Frontiers in Massive Data Analytics, National Academy of Sciences, 2013
Network analysis reveals patterns behind air safety events, 2014
The irrational effectiveness of mathematics in natural sciences, Wigner, 1960
The irrational effectiveness of data, Norwig, 2009; youtube.com/watch?v=yvDCzhbjYWs
SafeClouds documentation - to be published from October 2016 in www.SafeClouds.eu
Synchronisation likelihood in aircraft trajectories, Zanin, 2013
David Pérez - [email protected]
www.SafeClouds.eu
this presentation - slides.innaxis.org/2016.09.08.SafeClouds
6 . 5
![Page 25: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/25.jpg)
BackUp
7
![Page 26: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/26.jpg)
Hazards
A hazard can be considered as a dormant potential for harm
which is present in one form or another within the aviation system or its environment.
This potential for harm may be in the form of
- a natural hazard such as terrain, or
- a technical hazard such as wrong runway markings
8
![Page 27: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/27.jpg)
Data Analytics
Building KDD models with massive data
9
![Page 28: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/28.jpg)
The SafeClouds initiative
The SafeClouds research initiative is promoted by a complete spectrum of Aviation
and ICT European stakeholders to develop big data, data protection and data mining
tools for the improvement of aviation safety.
SafeClouds presents a project to develop aviation safety knowledge discovery
techniques from a large set of distributed datasets.
Novel systematic identiZcation of hazards and handling of data and processes
tailored to the requirements of aviation that are efZcient, effective and acceptable by
all the relevant parties in the aviation value-chain.
10
![Page 29: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/29.jpg)
Addressing the learning problem
11 . 1
![Page 30: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/30.jpg)
I - Feature extraction
II - Feature combinationMostly data management
Domain knowledge
Mostly mathDomain knowledge
Addressing the learning problem
Safety KDD research model
11 . 2
![Page 31: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/31.jpg)
Addressing the learning problem
Safety KDD research model
I - Feature extraction
II - Feature combination
Hazards and
Leading indicators
11 . 3
![Page 32: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/32.jpg)
Eurocontrol trafZc data - 10 months ECAC trafZc, 2min resolution
Low frequency of aviation safety events
Medium term data-driven prediction on LoS events?
KDD study on prediction of separation
1 Classical features describing the status of airspace
2 Complex network features
3 Historical trajectory likelihood-based features
Data Analytics
Building KDD models with massive data
11 . 4
![Page 33: Data Analytics practice - Amazon S3...Data, information and analytics as services, Delen & Demirkan, 2012 Data Science for business, Provost & Fawcett, 2013 European Big Data Value](https://reader034.fdocuments.us/reader034/viewer/2022042218/5ec4619117d06d7cdf35baea/html5/thumbnails/33.jpg)
Concepts
Recall literally is how many of the how many of the truetrue positives were positives were recalledrecalled, i.e. how manyof the correct hits were also found.
Precision is how many of the how many of the returnedreturned hits were hits were truetrue positive positive i.e. how many ofthe found were correct hits.
Accuracy is how many of the times the algorithms were correct, i.e. total truepositives plus true negatives
recall = TP / (TP + FN)precision = TP / (TP + FP)accuracy = (TP+TN)/ ALL
12