BigData in Cloud computing
Viet-Trung Tran @Vietstack
Sunday 1 February 15
Bio
Viet-Trung Tran
https://www.facebook.com/groups/BigDataStartUp/
SoICT, Trendiction S.A Luxembourg, Microsoft Research Cambridge, INRIA France, BKAV
Google Trends
Google MapReduce paper
2014
BigData in science
The Data Science: The 4th Paradigm for Scientific Discovery
Thousand years ago: description of natural phenomena
Last few hundred years: Newton’s laws, Maxwell’s equations…
$\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$
Last few decades: simulation of complex phenomena
Today and the Future
Credits: Dennis Gannon
What’s BigData
Data has always been big. What differs now is the sheer scale and accessibility of data, a direct result of the speed at which data can now be processed. Big Data is therefore an umbrella term for any collection of data sets so large that they were once difficult to process.
Big data requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times.
Data mining -> BigData mining?
Simplified BigData stack
Data analytics & visualization
Data processing frameworks (Streaming, MapReduce, BSP model)
Data management systems (BlobSeer)
BigData management
NoSQL
The last 25 years of commercial DBMS development can be summed up in a single phrase: "one size fits all". This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimized for business data processing) has been used to support many data-centric applications with widely varying characteristics and requirements. In this paper, we argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end…
(M. Stonebraker and U. Çetintemel, "'One Size Fits All': An Idea Whose Time Has Come and Gone", ICDE 2005)
Why NoSQL
“The whole point of seeking alternatives [to RDBMS systems] is that you need to solve a problem that relational databases are a bad fit for.” (Eric Evans, Rackspace)
ACID does not scale
Web applications have different needs:
Scalability, elasticity
Flexible schema / semi-structured data
Geographically distributed
Web applications do not always need:
Transactions
Strong consistency
Complex queries
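To make "flexible schema / semi-structured data" concrete, here is a minimal sketch of a schema-less document store, assuming a single in-memory process. The class `DocumentStore` and its `put`/`get`/`find` methods are hypothetical illustrations, not the API of any real NoSQL product; the point is only that documents under the same collection may carry different fields without a schema migration.

```python
from __future__ import annotations

from typing import Any, Callable


class DocumentStore:
    """Toy in-memory, schema-less document store (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self._docs: dict[str, dict[str, Any]] = {}

    def put(self, key: str, doc: dict[str, Any]) -> None:
        # No schema is enforced: each document may carry a different set of fields.
        self._docs[key] = dict(doc)

    def get(self, key: str) -> dict[str, Any] | None:
        return self._docs.get(key)

    def find(self, predicate: Callable[[dict[str, Any]], bool]) -> list[str]:
        # Full scan over all documents; a real store would use secondary indexes.
        return sorted(k for k, d in self._docs.items() if predicate(d))


store = DocumentStore()
store.put("u1", {"name": "An", "email": "an@example.com"})
# A new field ("tags") appears without any ALTER TABLE-style migration.
store.put("u2", {"name": "Binh", "tags": ["hadoop", "nosql"]})
print(store.find(lambda d: "tags" in d))
```

The trade-off is that the application, not the database, now has to cope with documents whose fields are missing.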
Big Data processing engines
MapReduce
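The MapReduce programming model splits a job into a map phase that emits key/value pairs, a shuffle that groups values by key, and a reduce phase that combines each group. The sketch below simulates those three phases in a single Python process on the classic word-count example; the function names are illustrative assumptions, not Hadoop's Java API, and in a real cluster the map and reduce tasks would run in parallel on many nodes.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all intermediate values by key (done by the framework).
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: combine all values that share one key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["big data in the cloud", "the cloud enables big data"]))
```

Because each mapper only sees its own record and each reducer only sees one key, both phases can be partitioned across machines without coordination beyond the shuffle.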
Stream processing
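Unlike MapReduce over a finished data set, a stream processor updates its results incrementally as each event arrives, commonly over a time window. The following is a minimal single-process sketch of a sliding-window count, assuming events arrive as `(timestamp, key)` pairs in non-decreasing timestamp order; `sliding_window_counts` is a hypothetical helper, not part of any particular streaming engine.

```python
from __future__ import annotations

from collections import Counter, deque
from typing import Iterable, Iterator


def sliding_window_counts(
    events: Iterable[tuple[float, str]], window: float
) -> Iterator[tuple[float, Counter]]:
    """Yield (timestamp, per-key counts over the last `window` seconds) after each event.

    Assumes events arrive in non-decreasing timestamp order and window > 0.
    """
    buffer: deque[tuple[float, str]] = deque()
    counts: Counter = Counter()
    for ts, key in events:
        buffer.append((ts, key))
        counts[key] += 1
        # Evict events that fall outside the window (ts - window, ts].
        while buffer and buffer[0][0] <= ts - window:
            _, old_key = buffer.popleft()
            counts[old_key] -= 1
            if counts[old_key] == 0:
                del counts[old_key]
        yield ts, Counter(counts)  # snapshot, so later updates do not mutate it


clicks = [(0.0, "login"), (1.0, "click"), (2.5, "click"), (4.0, "login")]
for ts, snapshot in sliding_window_counts(clicks, window=3.0):
    print(ts, dict(snapshot))
```

Each event is added once and evicted once, so the per-event cost stays constant regardless of how long the stream runs.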
Large scale graph processing
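Large-scale graph engines often follow the BSP model listed in the stack above: computation proceeds in supersteps, each vertex sends messages along its edges, and a global barrier delivers all messages before the next superstep. Below is a single-process sketch of vertex-centric PageRank in that style, assuming every vertex appears as a key in `edges` and has at least one out-edge (no dangling vertices); `pagerank_bsp`, the damping factor `d = 0.85`, and the fixed number of supersteps are illustrative choices, not a specific engine's API.

```python
from __future__ import annotations

from collections import defaultdict


def pagerank_bsp(
    edges: dict[str, list[str]], supersteps: int = 30, d: float = 0.85
) -> dict[str, float]:
    """Vertex-centric PageRank run in bulk-synchronous supersteps (sketch)."""
    n = len(edges)
    rank = {v: 1.0 / n for v in edges}
    for _ in range(supersteps):
        # Compute phase: every vertex sends rank / out_degree along its out-edges.
        inbox: dict[str, list[float]] = defaultdict(list)
        for v, targets in edges.items():
            share = rank[v] / len(targets)
            for t in targets:
                inbox[t].append(share)
        # Barrier: all messages are delivered, then every vertex updates its rank.
        rank = {v: (1 - d) / n + d * sum(inbox[v]) for v in edges}
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank_bsp(graph)
print({v: round(r, 3) for v, r in ranks.items()})
```

With no dangling vertices the ranks keep summing to 1 after every superstep, which is a convenient sanity check when the vertices are spread over many workers.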
2012
2014
Vanilla Hadoop ecosystem
Hortonworks Data Platform
Hadoop ecosystem: Microsoft HDInsight
BigData & Cloud: A match made in heaven?
Cloud features
Data in the Clouds
IDC estimates that by 2020, about 40% of the world's data will be touched by cloud computing.
Cloud adoption is accelerating: the number of objects stored in Amazon Web Services (AWS) S3 jumped from 262 billion in 2010 to over 1 trillion by mid-2012.
While enterprises often keep their most sensitive data in-house, huge volumes of data such as social media data may be located externally.
Data that is too big to process is often also too big to transfer, so it is the analytics program that should be moved, not the data.
"You don't want to be shipping terabytes and petabytes around." "Keep the data where it is, and then you move the analytics … to that data."
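"Move the analytics to the data" is what locality-aware schedulers do: when a task is placed on a node that already stores its input block, no bulk transfer is needed. The sketch below is a simplified, hypothetical greedy scheduler in that spirit (it is not Hadoop's actual scheduler); `block_locations` mapping each block to the nodes holding a replica is an assumed input.

```python
from __future__ import annotations


def schedule_tasks(block_locations: dict[str, list[str]], nodes: list[str]) -> dict[str, str]:
    """Assign one task per data block, preferring a node that already stores the block.

    `block_locations` maps block id -> nodes holding a replica (hypothetical input).
    """
    load = {n: 0 for n in nodes}
    assignment: dict[str, str] = {}
    for block, replicas in sorted(block_locations.items()):
        local = [n for n in replicas if n in load]
        # Prefer data-local nodes; fall back to any node (the data must then be shipped).
        candidates = local or nodes
        # Among candidates, pick the least-loaded node (ties broken by name).
        chosen = min(candidates, key=lambda n: (load[n], n))
        assignment[block] = chosen
        load[chosen] += 1
    return assignment


blocks = {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n3"], "b4": ["n9"]}
print(schedule_tasks(blocks, ["n1", "n2", "n3"]))
```

Here only `b4`, whose replica lives on a node outside the cluster, forces a data transfer; every other task runs where its data already is. A production scheduler would also weigh load balance against locality rather than always preferring the local node.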
Cloud enables BigData
Some of the first adopters of big data in cloud computing were users who deployed Hadoop clusters on highly scalable and elastic clouds: IBM, Azure, AWS.
Cloud computing democratizes big data: any enterprise can now work with unstructured data at a huge scale.
Analytics-as-a-service (AaaS) models for cloud-based big data analytics
Drivers for adopting big data on the cloud
Cost reduction
Managing cloud-based big data is cost-effective, scalable, and fast to build.
Rapid provisioning/time to market
Faster provisioning matters for big data applications because the value of data decreases quickly over time.
Flexibility/scalability
Big data analysis, especially in the life sciences industry, requires huge compute power for a brief amount of time. For this type of analysis, servers need to be provisioned in minutes.
BigData is not always Cloud-appropriate
Low-latency, real-time data
Virtualization overhead
Multi-tenancy overhead
Scalability
Lack of cloud computing features to support RDBMS
Availability
“Rain cloud” incorporates clouds
Data integrity/privacy
Data can only be accessed by authorized users
Currently, encryption is utilized by most researchers to ensure data privacy in the cloud
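Encryption addresses privacy; integrity (detecting that data stored in the cloud was altered) is a separate concern. Python's standard library has no block cipher, so this sketch covers only the integrity half, using HMAC-SHA256 from the stdlib `hmac` module: a tag is computed in-house with a secret key before upload and checked after download. The `sign`/`verify` helpers and the sample data are hypothetical; confidentiality would additionally require real encryption (for example AES-GCM from a third-party library), which is not shown.

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, data: bytes) -> bytes:
    # HMAC-SHA256 tag computed on-premises before the data leaves for the cloud.
    return hmac.new(key, data, hashlib.sha256).digest()


def verify(key: bytes, data: bytes, tag: bytes) -> bool:
    # Constant-time comparison avoids leaking how many bytes of the tag matched.
    return hmac.compare_digest(sign(key, data), tag)


key = secrets.token_bytes(32)  # kept in-house, never stored alongside the data
blob = b"patient_id,result\n42,negative\n"
tag = sign(key, blob)

print(verify(key, blob, tag))                                    # True
print(verify(key, blob.replace(b"negative", b"positive"), tag))  # False
```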
NoSQL vs SQL in the Cloud
Data security/performance trade-offs
Distributed nodes
Distributed data
Internode communication
RPC over TCP/IP?
Encrypted IO?
Security/performance trade-offs
Cloud Architecture for Big Data
Resource scheduling and SLA for Big Data on Cloud
Storage and computation management in Cloud for Big Data
Large-scale data intensive workflow in support of Big Data processing on Cloud
Multiple source data processing and integration on Cloud
Virtualisation and visualisation of Big Data on Cloud
Fault tolerance and reliability for Big Data processing on Cloud
MapReduce with Cloud for Big Data processing
Distributed file storage system with Cloud for Big Data
Inter-cloud technology for Big Data
Security, privacy and trust in Big Data processing on Cloud
Green, energy-efficient models and sustainability issues in Cloud for Big Data processing
Cloud infrastructure for social networking with Big Data
User friendly Cloud access for Big Data processing
Innovative Cloud data centre networking for Big Data
Wireless and mobility support in Cloud data centre for Big Data
BigData use cases
Security Analytics
Thank you for your attention
8 big trends in big data analytics
http://www.computerworld.com/article/2690856/8-big-trends-in-big-data-analytics.html
Classification of BigData
Relationship between Cloud and BigData
Open research issues
Data staging
Distributed storage systems: NoSQL, NewSQL
Data analysis
Data security
Unfortunately, in practice, it’s not all good news.
DB administrators don’t have an easy ride. The NoSQL databases that have appeared in the last few years, with their key-value pairs, document stores, and missing schemas,