Technology Beyond the Trends: The role of research for applications in Cloud Computing
Thiago S Mosqueiro University of California San Diego
thmosqueiro.vandroiy.com
February 3rd, 2017 @ Rady School of Management (UCSD)
Jaqueline J Brito University of California San Diego
github.com/jaquejbrito
Download slides at http://bit.ly/2kaeH57
Cloud computing
• Commodity hardware is now cheap
• There is more data than we can handle
• New technologies are proposed every day
• Easy access to state-of-the-art analytics & machine learning tools
Leading frameworks
• Hadoop MapReduce (MR) is a batch-oriented framework
• Processing is divided into two steps: the Map and the Reduce
• Thus, MR is not well suited to Machine Learning or interactive analysis
• However, it is mature software and widely supported
• Research and user support are widely available
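The Map and Reduce steps named in the bullets above can be sketched in plain Python. This is only an illustration of the programming model, not Hadoop code: the function names (`map_step`, `shuffle`, `reduce_step`, `run_job`) and the sample records are hypothetical, and a real framework would run each step in parallel across machines.

```python
from collections import defaultdict

def map_step(record):
    """Map: emit one (key, value) pair per input record."""
    product, revenue = record
    yield product, revenue

def shuffle(pairs):
    """Group values by key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_step(key, values):
    """Reduce: collapse all values that share a key into one result."""
    return key, sum(values)

def run_job(records):
    """Run the whole batch: Map every record, shuffle, then Reduce every key."""
    pairs = (pair for record in records for pair in map_step(record))
    return dict(reduce_step(key, values) for key, values in shuffle(pairs).items())

# Hypothetical input: (product name, revenue) records.
records = [("laptop", 1200), ("phone", 800), ("laptop", 300)]
totals = run_job(records)
```

The job only produces output once the whole batch has gone through both steps, which is why this model fits batch workloads better than interactive ones.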
Leading frameworks
• Spark is more flexible than MR and follows a functional design
• Because of that, it gives much more freedom to programmers (e.g. Data Scientists), while performing low-level optimizations itself
• It is also multi-purpose, intended for a wide range of use cases (investigative, operational, analytics)
When launched, it was a big hit in the “Data Science” community
Take home message
• Startups and companies tend to follow trending technologies
• And almost all businesses depend heavily on tech
• Switching services consumes resources
• To foster a discussion about these topics, we show a case study in which simply changing from MapReduce to Spark does not give much gain
Example of join
SELECT SUM(S.revenue), P.Name
FROM sales AS S, product AS P, costumer AS C, date AS D
WHERE D.year = 2016
  AND C.address NOT LIKE '%USA%'
  AND S.product_id = P.product_id
  AND S.costumer_id = C.costumer_id
  AND S.date_id = D.date_id
GROUP BY P.Name
What was the total revenue per product in 2016 sold to clients outside the USA?
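To make the shape of this join concrete, here is a minimal sketch of what the query computes, written in plain Python. The table contents are made up for illustration, the function name `revenue_per_product` is hypothetical, and the sketch assumes every foreign key in `sales` has a match in its dimension table. The substring test stands in for `NOT LIKE '%USA%'`.

```python
from collections import defaultdict

# Hypothetical dimension tables, keyed by their id columns.
product = {1: "laptop", 2: "phone"}                        # product_id -> Name
costumer = {10: "San Diego, USA", 11: "Sao Paulo, Brazil"}  # costumer_id -> address
date = {100: 2016, 101: 2015}                              # date_id -> year

# Hypothetical fact table rows: (product_id, costumer_id, date_id, revenue).
sales = [
    (1, 11, 100, 1200.0),
    (2, 11, 100, 800.0),
    (1, 10, 100, 500.0),  # address contains "USA": filtered out
    (1, 11, 101, 999.0),  # year is 2015: filtered out
    (1, 11, 100, 300.0),
]

def revenue_per_product(sales, product, costumer, date):
    """Join each fact row to the three dimensions, filter, and sum by product name."""
    totals = defaultdict(float)
    for product_id, costumer_id, date_id, revenue in sales:
        if date[date_id] != 2016:
            continue
        if "USA" in costumer[costumer_id]:
            continue
        totals[product[product_id]] += revenue
    return dict(totals)

result = revenue_per_product(sales, product, costumer, date)
```

The fact table is scanned once and each row is matched against three small lookups. In a distributed setting the expensive part is moving those fact rows between machines.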
Example: Sensor data
Huerta et al. Chem Intl Lab Systems, 2016
Picture by Kim S. Mosqueiro (Apr 2015)
Comparing MapReduce and Spark
[Figure panels: MapReduce, Spark]
Brito et al. ICCS, 2016
Therefore, although Spark’s functional design is flexible, it still generates too much disk spill and shuffled data
Enhancing Spark solution
Brito et al. ICCS, 2016
[Figure panels: MapReduce, Spark]
We employed two techniques (Bloom filters and broadcast) to reduce the amount of shuffled data and disk spill
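The two techniques named above can be illustrated with a small, single-machine sketch. This is not the implementation from Brito et al.; the `BloomFilter` class, its parameters, and the sample tables are all hypothetical. The idea it shows: a Bloom filter built from the dimension keys that pass a predicate lets each worker discard fact rows before they are shuffled, and a broadcast copy of a small dimension table lets the join happen locally.

```python
import hashlib

class BloomFilter:
    """Compact set summary: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

# Hypothetical small dimension table and fact rows (costumer_id, revenue).
costumer = {10: "San Diego, USA", 11: "Sao Paulo, Brazil", 12: "Madrid, Spain"}
sales = [(10, 500.0), (11, 1200.0), (12, 300.0), (11, 800.0)]

# Keys of the dimension rows that satisfy the predicate.
wanted = {cid for cid, address in costumer.items() if "USA" not in address}

# Bloom filter: a few bits per key, cheap to send to every worker.
bloom = BloomFilter()
for cid in wanted:
    bloom.add(cid)

# Pre-filter fact rows before any shuffle; a false positive may slip through.
candidates = [row for row in sales if row[0] in bloom]

# Broadcast join: every worker holds the small table and joins locally.
# The exact membership check removes any Bloom false positives.
joined = [(costumer[cid], revenue) for cid, revenue in candidates if cid in wanted]
```

The filter trades a small false-positive rate for a much smaller footprint than the key set itself, which is why an exact check still follows it.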
Resilience of these solutions
Brito et al. ICCS, 2016
In collaboration with
Jaqueline Joice Brito
Cristina Dutra Ciferri
Ricardo Rodrigues Ciferri
Irene Lujan-Rodriguez
Jordi Fonollosa
Ramon Huerta