HPC meets Big Data - NECTEC · 2018-05-30 · Store indefinitely Analyze See results Gather data...
HPC meets Big Data
Dr. Putchong Uthayopas
Department of Computer Engineering,
Faculty of Engineering, Kasetsart University
Email: [email protected]
Brief History of HPC Platform
• HPC on single processor
• HPC using vector machines
• HPC with SMP, SIMD
• MPP and Cluster Computing
• GPU computing
• Heterogeneous computing
Mostly built for compute-intensive applications!
We Are Using Big Data All the Time
• How does Google Maps know about traffic conditions?
Facebook Usage Statistics (March 2016)
• 1.09 billion daily active users
• 989 million mobile daily active users
• 1.65 billion monthly active users
• 1.51 billion mobile monthly active users
Source: http://newsroom.fb.com/company-info/
Google Open Datasets
• The Open Images Dataset
• YouTube-8M Dataset
• Google Books Ngrams
• Google Trends Datastore
https://www.infoworld.com/article/3131515/artificial-intelligence/4-google-data-sets-to-kickstart-machine-learning.html
Gather data from all sources → Store indefinitely → Analyze → See results → Iterate
New big data thinking: All data has value
• All data has potential value
• Data hoarding
• No defined schema: data is stored in its native format
• Schema is imposed and transformations are done at query time (schema-on-read)
• Apps and users interpret the data as they see fit
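The schema-on-read idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the behaviour of any particular data-lake product: the raw records, field names, and the `read_with_schema` helper are all hypothetical. Records are stored untouched as JSON strings, and each reader imposes its own schema (field selection plus type casting) only at query time.

```python
import json

# Hypothetical raw zone: records kept exactly as they arrived (no schema on write).
raw_store = [
    '{"user": "alice", "temp_c": "21.5", "city": "Bangkok"}',
    '{"user": "bob", "temp_c": 19}',
    '{"sensor": "s-7", "reading": "n/a"}',
]

def read_with_schema(store, schema):
    """Impose a schema at query time: keep only records that contain every
    field in `schema`, casting each field with the given converter."""
    rows = []
    for line in store:
        record = json.loads(line)
        try:
            rows.append({field: cast(record[field]) for field, cast in schema.items()})
        except (KeyError, ValueError, TypeError):
            continue  # record does not fit this reader's schema; skip it
    return rows

# Two hypothetical "apps" interpret the same raw data through different schemas.
weather_view = read_with_schema(raw_store, {"user": str, "temp_c": float})
sensor_view = read_with_schema(raw_store, {"sensor": str})
```

Nothing is rejected when data is written; a record only "fails" for a reader whose schema it does not match, which is the trade-off schema-on-read makes.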
What is Data Science?
• Data Science is the extraction of knowledge from large volumes of data that are structured or unstructured.
Source: Wikipedia
K-Means on Iris Data
Iris setosa
Iris versicolor
Iris virginica
Spark MLlib k-means example app: https://github.com/apache/spark/tree/branch-1.5/examples/src/main/python/mllib
Iris data: https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
Ref: https://en.wikipedia.org/wiki/Iris_flower_data_set
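The slide runs Spark MLlib's k-means example on the Iris data. As a rough sketch of what the algorithm does (plain Python, no Spark), here is Lloyd's algorithm with random initial centroids drawn from the data; the `kmeans` function, its parameters, and the random initialisation are illustrative choices, not MLlib's implementation (MLlib uses a k-means|| initialisation by default).

```python
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's algorithm: alternate between assigning each point to its
    nearest centroid and moving each centroid to the mean of its points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(max_iters):
        # Assignment step: index of the nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

For Iris, each point would be the four numeric columns of a line in `iris.data` and `k=3`, one cluster per species ideally. Because k-means can settle in a local optimum, running it with several seeds and keeping the lowest-cost result is common practice.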
Example: Iris k-means
Source: http://stackoverflow.com/questions/6645895/calculating-the-percentage-of-variance-measure-for-k-means
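The linked question is about the "percentage of variance explained" used to judge a k-means result (for example, in the elbow method for choosing k). A common definition is between-cluster sum of squares divided by total sum of squares, i.e. 1 minus within-cluster SS over total SS. The sketch below assumes points are equal-length numeric tuples and `labels` are cluster indices; the `variance_explained` name is hypothetical.

```python
def variance_explained(points, labels):
    """Fraction of total variance explained by a clustering:
    1 - (within-cluster sum of squares) / (total sum of squares),
    which equals between-cluster SS / total SS."""
    dims = len(points[0])
    n = len(points)
    grand_mean = [sum(p[d] for p in points) / n for d in range(dims)]
    total_ss = sum(sum((p[d] - grand_mean[d]) ** 2 for d in range(dims)) for p in points)
    within_ss = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        mean = [sum(p[d] for p in members) / len(members) for d in range(dims)]
        within_ss += sum(sum((p[d] - mean[d]) ** 2 for d in range(dims)) for p in members)
    return 1.0 - within_ss / total_ss
```

Computing this for k = 1, 2, 3, ... and plotting it against k gives the elbow curve: the value rises quickly at first and then flattens, and the "elbow" is a reasonable choice of k.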
Kaggle
• Titanic competition
– Which factors were involved in surviving the Titanic?
• A data set of the passengers is provided
https://www.kaggle.com/c/titanic
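A natural first step on the Titanic question is to compare survival rates across groups of passengers. The sketch below assumes each passenger is a dict with a 0/1 `Survived` field, matching column names in the competition's `train.csv` (`Survived`, `Sex`, `Pclass`). The `survival_rate_by` helper and the five rows are made up for illustration and are not real passenger data.

```python
from collections import defaultdict

def survival_rate_by(rows, feature):
    """Group passengers by one feature and return the survival rate per group.
    Each row is a dict with a 0/1 'Survived' field."""
    totals = defaultdict(int)
    survived = defaultdict(int)
    for row in rows:
        key = row[feature]
        totals[key] += 1
        survived[key] += row["Survived"]
    return {key: survived[key] / totals[key] for key in totals}

# Made-up rows for illustration only (NOT real Titanic data).
toy_rows = [
    {"Survived": 1, "Sex": "female", "Pclass": 1},
    {"Survived": 1, "Sex": "female", "Pclass": 3},
    {"Survived": 0, "Sex": "male", "Pclass": 1},
    {"Survived": 0, "Sex": "male", "Pclass": 3},
    {"Survived": 1, "Sex": "male", "Pclass": 1},
]

by_sex = survival_rate_by(toy_rows, "Sex")
by_class = survival_rate_by(toy_rows, "Pclass")
```

With the real file, rows loaded via `csv.DictReader` hold strings, so `Survived` would need `int(...)` before calling the helper.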
Deep Learning
New Technologies to Handle Big Data and Machine Learning
• Hadoop/Spark ecosystem
• GPU systems
• GPU clusters
• AI supercomputers using dense GPUs
Summary
• Scientific research is rapidly changing to data-intensive research
– Driven by big data analytics and machine learning
• An innovative platform is needed that puts data storage and very high computing power in one place
– Hadoop moves computing to the data, rather than the traditional approach of moving data to the computing
• Everything is going to the CLOUD
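The "move computing to the data" point can be illustrated with a plain-Python simulation of a MapReduce word count. This is not Hadoop's API: the node names, the text blocks, and the `map_on_node`/`reduce_counts` helpers are hypothetical. The idea shown is that the map (with a local combine) runs where each block is stored, so only small tables of partial counts travel to the reducer instead of the raw data.

```python
from collections import Counter

# Hypothetical cluster: each node stores its own block of the data set.
blocks_on_nodes = {
    "node-1": ["big data meets hpc", "hpc clusters"],
    "node-2": ["big data on hadoop", "move compute to data"],
}

def map_on_node(lines):
    """Map plus local combine, run on the node that stores the block:
    only this small table of partial counts leaves the node."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_counts(partials):
    """Reduce: merge the partial counts shipped from every node."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total

# In a real cluster each map_on_node call would execute on its own node in parallel.
partials = [map_on_node(lines) for lines in blocks_on_nodes.values()]
word_counts = reduce_counts(partials)
```

When the raw blocks are large and the intermediate results are small, shipping the code to the data saves most of the network traffic, which is the design choice the summary attributes to Hadoop.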