The Stratosphere Platform: Big Data Analytics at...

Partners and Collaborations

Funding Organisations

The Stratosphere Platform: Big Data Analytics at Scale

Open Source Platform for Big Data analytics in massively parallel cluster and cloud environments. Combines key technologies of MapReduce, Compilers, Distributed Systems and Parallel Database Systems.

Enables scalable data management, machine learning and deep analysis of Big Data.

Declarative Analytics- Empower data analysts to use Big Data by unifying data andprogramming models in a declarative abstraction- Cross-optimize data extraction, querying and modeling

Data Streaming- Streaming semantics for advanced analytical functions- Modeling and managing distributed operator state- Optimizing data analysis program workloads

Iterative Processing- Fault tolerance and numerical stability for iterative algorithms- Advanced optimization techniques for iterative jobs

Software Stack

Enterprise Data Hub

Enterprise Cloud

Advanced Big Data Analytics

JDBC Local Files HDFS NoSQL Storage

YARN Direct EC2 Cluster Manager

Stratosphere Runtime

Stratosphere Optimizer

Java API

Scala API

MeteorScripting Language

Hadoop MR

Hive and others

SpargelGraph

Processing

Programming Model/Operators

reducemap crossjoin

cogroupiterateunion iteratedelta

Stratosphere extends the well-known MapReduce model with new operators. These operators represent many common data analysis tasks more naturally and efficiently. All operators will start working in memory and gracefully go out of core under memorypressure.

People with Big DataAnalytics Skills Today

Data Analysts

Systems Experts

Future Big Data Users

Example: KMeans Clustering (Scala) val dataPoints = DataSource(dataPointInput, DelimitedInputFormat(parseInput)) val clusterPoints = DataSource(clusterInput, DelimitedInputFormat(parseInput))

def computeNewCenters(centers: DataSet[(Int, Point)]) = { val distances = dataPoints.cross(centers). map computeDistance val nearestCenters = distances.groupBy { case (pid, _) => pid } .reduceGroup { ds => ds.minBy(_._2.distance) } .map asPointSum tupled val newCenters = nearestCenters groupBy { case (cid, _) => cid } .reduceGroup(sumPointSums) .map { case (cid, pSum) => cid -> pSum.toPoint() } return newCenters } val finalCenters = clusterPoints.iterate(numIterations, computeNewCenters)

Incremental Iterations can exploit sparse computation dependencies without sacrificing dataflow programming abstraction. The performance of incremental iterations in Stratosphere matches that of specialized engines.

Iterative Algorithms

Contact & Further Information

Current and Future Work

More Information:Project: http://stratosphere.euSource Code: http://github.com/stratosphere

Contact Us:

Database Systems and Information Management, Technische Universität Berlin

Participating in the Google Summer of Code 2014

- Cost-based optimizer choice of operators and shipping strategies.- In-memory pipelining of operators- Reduction of shipped and written data volume- Input Sampling to determine cardinalities

Built-In Optimizer

Big Data Analysts Still Hard to Find

e-mail: contact@stratosphere.eu

REDUCEaggregate

lineitem

supplier

output

MAPfilter

MAPproject

MATCHjoin

0 1 2 3 44 5 6 7 8

0 1 2 3 4 5 6 7 80 1 2 3 4 5 6 7 8

0 1 2 3 44 5 6 7 80 1 2 3 44 5 6 7 8

0 1 2 3 44 5 6 7 8

0 1 2 3 4 5 6 7 8

0 1 2 3 44 5 6 7 8

0 1 2 3 4 5 6 7 8

0 1 2 3 44 5 6 7 8

0 1 2 3 4 5 6 7 8

0 1 2 3 44 5 6 7 8

0 1 2 3 4 5 6 7 8

0 1 2 3 4 5 6 7 80 1 2 3 4 5 6 7 8

0 1 2 3 4 5 6 7 8

0 1 2 3 4 5 6 7 80 1 2 3 4 5 6 7 8

0 1 2 3 4 5 6 7 8

0 1 2 3 4 5 6 7 80 1 2 3 4 5 6 7 8

REDUCEaggregate

lineitem

supplier

output

MAPfilter

MAPproject

MATCHjoin

0 1 2 3 44 5 6 7 8

0 1 2 3 4 5 6 7 80 1 2 3 4 5 6 7 8

0 1 2 3 44 5 6 7 80 1 2 3 44 5 6 7 8

0 1 2 3 44 5 6 7 8

0 1 2 3 4 5 6 7 8

0 1 2 3 44 5 6 7 8

0 1 2 3 4 5 6 7 8

0 1 2 3 44 5 6 7 8

REDUCEaggregate

lineitem

supplier

output

MAPfilter

MAPproject

MATCHjoin

0 1 2 3 44 5 6 7 8

0 1 2 3 4 5 6 7 80 1 2 3 4 5 6 7 8

0 1 2 3 44 5 6 7 80 1 2 3 44 5 6 7 8

0 1 2 3 44 5 6 7 8

0 1 2 3 4 5 6 7 8

0 1 2 3 44 5 6 7 8

0 1 2 3 4 5 6 7 8

0 1 2 3 44 5 6 7 8

0 1 2 3 4 5 6 7 8

0 1 2 3 44 5 6 7 8

Our goal: empower data analysts to focus on their domain problem without requiring systems programming.

Machine Learning,

Statistics, Data analysis M

Applications

Big Data Analytics

Iterative Algorithms

Error Estimation

Active Sampling

Sketches

Curse of Dimensionality

Isolation

Convergence

Monte Carlo

Mathematical Programming

Linear Algebra

Stochastic Gradient Descent

Regression

Statistics

Hashing

Control flow

Parallelization

Query Optimization

Fault Tolerance

Relational Algebra/SQL

Scalability

Data Analysis Languages

Compilers

Memory Management

Memory Hierarchy

Dataflow

Hardware Adaption

Indexing

Resource Management Data Warehouses/OLAP/XQuery

The Stratosphere Platform: Big Data Analytics at...

Documents

Transcript of The Stratosphere Platform: Big Data Analytics at...

Mountain Waves entering the Stratosphere

Mesos: A Platform for Fine-Grained Resource Sharing in the Data … · 2019-02-25 · ations, and shown that Spark can outperform Hadoop by 10x in iterative machine learning workloads.

Synthetic Enterprise Application workloads · Enterprise Application Workloads . Test Profile – IOPS & RT levels of different synthetic workloads . 6 . Composite Enterprise . Workloads

Stratosphere Las Vegas

Stratosphere Hotel - Pegasus...Stratosphere Hotel CASE STUDY STRATOSPHERE HOTEL TRAVEL TRIPPER 370 LEXINGTON AVE SUITE 1601 NEW YORK, NY 10017 T: +1 212 683 6161 HELLO@TRAVERTRIPPER.COMCASE

Big Data Ecosystem & The Stratosphere Project

apesnerds.weebly.comapesnerds.weebly.com/.../climate_change_and_ozone_… · Web viewOzone Layer Depletion in Stratosphere Ozone is found naturally in the stratosphere! GOOD in stratosphere!

Stratosphere in Las Vegas

Release Candiate V 0 - Stratospherestratosphere.eu/assets/slides/stephan_ewen-stratosphere...New Features in a Nutshell • Declarative Scala Programming API • Iterative Programs

Chemistry of the Stratosphere

Troposphere stratosphere mesosphere. What is Ozone? Chemical Formula is O 3 Found in the Stratosphere (good) and Troposphere (bad). Stratosphere ozone.

1. Ozone in the stratosphere

Iterative development and agile methodsse.ewi.tudelft.nl/ti3115tu-2019/resources/02-iterative-development.pdf · Incremental and iterative development process •Iterative: we develop

Observations of Stratosphere-Troposphere Coupling … · Observations of Stratosphere-Troposphere Coupling During Major Solar Eclipses from FORMOSAT-3/COSMIC ... acteristic stratosphere-troposphere

Stratosphere - Next Generation Big Data Analytics Platform from … · 2014-05-14 · The 10 commandments for Big Data Analytics Stratosphere in one slide Stratosphere in one slide

I LI . Stratosphere I L

Temperature variations in the low stratosphere (50–200 hPa) …personalpages.to.infn.it/~ferrares/2008_temperature_variations.pdf · TEMPERATURE VARIATIONS IN THE LOW STRATOSPHERE

Stratellites - Satellites in Stratosphere

Stratosphere Woven Quilt - FreeSpirit Fabrics · Stratosphere collection create a woven rainbow look in this easy quilt. Collection: Stratosphere by FreeSpirit Fabrics Technique:

STRATOSPHERE TOWER Las Vegas