Technology Beyond the Trends: The role of research for applications in Cloud Computing

Thiago S Mosqueiro, University of California San Diego, thmosqueiro.vandroiy.com

Jaqueline J Brito, University of California San Diego, github.com/jaquejbrito

February 3rd, 2017 @ Rady School of Management (UCSD)

Download slides at http://bit.ly/2kaeH57


Cloud computing

• Commodity hardware is now cheap

• There is more data than we can handle

• New technologies are proposed every day

• Easy access to state-of-the-art analytics & machine learning tools


Leading frameworks

• Hadoop MapReduce (MR) is a batch-oriented framework

• Processing is divided into two steps: the Map and the Reduce

• Thus, MR is not suitable for Machine Learning or interactive analysis

• However, it is mature software and widely supported

• Research and user support are widely available
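The two-step model described in these bullets can be sketched in plain Python. This is only an illustration of the Map and Reduce steps, not Hadoop's API: the `map_reduce` helper, the mapper and reducer functions, and the sample sales records are all hypothetical.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run a single in-process Map step followed by a Reduce step."""
    groups = defaultdict(list)
    for record in records:
        # Map: each input record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Grouping by key stands in for the shuffle between the two steps.
            groups[key].append(value)
    # Reduce: every key is collapsed to one result from its grouped values.
    return {key: reducer(key, values) for key, values in groups.items()}


def revenue_mapper(record):
    product, revenue = record
    yield product, revenue


def revenue_reducer(product, revenues):
    return sum(revenues)


# Hypothetical (product, revenue) records.
sales = [("widget", 40.0), ("gadget", 70.0), ("widget", 10.0)]
totals = map_reduce(sales, revenue_mapper, revenue_reducer)
```

Anything that does not fit in one Map and one Reduce has to be chained as a further job, which is where the batch orientation shows.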

Leading frameworks

• Spark is more flexible, and is functional

• Because of that, it gives much less freedom to programmers (e.g. Data Scientists), since Spark itself performs the low-level optimizations

• It is also multi-purpose, intended for a wide range of use cases (investigative, operational, analytics)

• When launched, it was a big hit in the “Data Science” community

Take home message

• Switching services consumes resources

• And almost all businesses depend heavily on technology

• Startups and companies tend to follow trending technologies

• To foster a discussion about these topics, we show a case study in which simply switching from MapReduce to Spark does not yield much gain

Example with Star Joins

Star schemas


Example: Business

Example of join

What was the total revenue in 2016 per product sold to clients outside the USA?


SELECT SUM(S.revenue), P.Name
FROM sales AS S, product AS P, customer AS C, date AS D
WHERE D.year = 2016
  AND C.address NOT LIKE '%USA%'
  AND S.product_id = P.product_id
  AND S.customer_id = C.customer_id
  AND S.date_id = D.date_id
GROUP BY P.Name
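To make the star join concrete, the query can be tried on a toy star schema with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the rows are invented for illustration, the column types are guesses, the customer dimension is spelled `customer`, and the `date` table name is quoted to stay safe across SQL dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One fact table (sales) surrounded by three dimension tables.
conn.executescript("""
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, address TEXT);
CREATE TABLE "date"   (date_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE sales    (product_id INTEGER, customer_id INTEGER,
                       date_id INTEGER, revenue REAL);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [(1, "widget"), (2, "gadget")])
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "San Diego, USA"), (2, "Sao Paulo, Brazil"), (3, "Madrid, Spain")])
conn.executemany('INSERT INTO "date" VALUES (?, ?)',
                 [(1, 2015), (2, 2016)])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (1, 1, 2, 100.0),  # 2016, but sold inside the USA: filtered out
    (1, 2, 2, 40.0),
    (1, 3, 2, 10.0),
    (2, 2, 2, 70.0),
    (2, 3, 1, 500.0),  # outside the USA, but sold in 2015: filtered out
])

query = """
SELECT SUM(S.revenue), P.name
FROM sales AS S, product AS P, customer AS C, "date" AS D
WHERE D.year = 2016
  AND C.address NOT LIKE '%USA%'
  AND S.product_id = P.product_id
  AND S.customer_id = C.customer_id
  AND S.date_id = D.date_id
GROUP BY P.name
"""
# Map each product name to its total revenue.
result = {name: total for total, name in conn.execute(query)}
conn.close()
```

Only the fact rows that satisfy every dimension predicate contribute to the totals. In a real star schema the fact table is far larger than the dimensions.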


Example: Sensor data

Huerta et al. Chem Intl Lab Systems, 2016

Picture by Kim S. Mosqueiro (Apr 2015)

Comparing solutions from Spark and MapReduce

MapReduce’s pipeline

New job starting…

Spark’s pipeline

In-memory computation

Therefore, although Spark’s functional design is flexible, it still generates too much disk spill and shuffled data

Enhancing Spark solution

Brito et al. ICCS, 2016

Figure panels: MapReduce, Spark

We employed two different techniques (Bloom Filters and Broadcast) to reduce shuffled data and disk spill
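The general idea behind both techniques can be sketched in plain Python. This is not the implementation from Brito et al., and it does not use Spark: the `BloomFilter` class, its sizes, and the toy tables are assumptions made for illustration. A Bloom filter built from the qualifying dimension keys discards most non-matching fact rows before they would be shuffled. The small filtered dimensions are then held in memory for the join, as broadcast copies would be on every worker.

```python
import hashlib
from collections import defaultdict


class BloomFilter:
    """Compact set summary: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


# Hypothetical dimension tables (small) and fact table (large in practice).
products = {1: "widget", 2: "gadget"}
customers = {1: "San Diego, USA", 2: "Sao Paulo, Brazil", 3: "Madrid, Spain"}
dates = {1: 2015, 2: 2016}
sales = [  # (product_id, customer_id, date_id, revenue)
    (1, 1, 2, 100.0), (1, 2, 2, 40.0), (1, 3, 2, 10.0),
    (2, 2, 2, 70.0), (2, 3, 1, 500.0),
]

# Apply the predicates on the dimensions first; only key sets remain.
date_ok = {d for d, year in dates.items() if year == 2016}
customer_ok = {c for c, address in customers.items() if "USA" not in address}

date_filter, customer_filter = BloomFilter(), BloomFilter()
for key in date_ok:
    date_filter.add(key)
for key in customer_ok:
    customer_filter.add(key)

# Pre-filter the fact rows; only the survivors would need to be shuffled.
survivors = [row for row in sales
             if date_filter.might_contain(row[2])
             and customer_filter.might_contain(row[1])]

# Broadcast-style join: exact lookups against the in-memory dimension data
# also remove any false positives the Bloom filters let through.
totals = defaultdict(float)
for product_id, customer_id, date_id, revenue in survivors:
    if date_id in date_ok and customer_id in customer_ok:
        totals[products[product_id]] += revenue
totals = dict(totals)
```

Because a Bloom filter never produces false negatives, the pre-filter cannot drop a row that belongs in the answer. It can only let a few extra rows through, and the exact lookups remove those.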

Concluding remarks

In collaboration with

Jaqueline Joice Brito

Cristina Dutra Ciferri

Ricardo Rodrigues Ciferri

Irene Lujan-Rodriguez

Jordi Fonollosa

Ramon Huerta

We acknowledge support from

FAPESP grant 2012/13158-9

CNPq grant 234817/2014-3

Microsoft Azure grant MS-AZR-0036P

3ª Convocatoria de Proyectos de Cooperacion Interuniversitaria UAM–Banco Santander con EEUU (2015/EEUU/15)