Big Data and Containers

Charles Smith / @charles_s_smith

Netflix / Lead of the big data platform architecture team

Spend my time / Thinking about how to make it easy and efficient to work with big data

University of Florida / PhD in Computer Science

Who am I?

“It is important that we know where we come from, because if you do not know where you come from, then you don't know where you are, and if you don't know where you are, you don't know where you're going. And if you don't know where you're going, you're probably going wrong.”

Terry Pratchett

Database → Distributed Database → Distributed Storage → Distributed Processing → ???

Why do we care about containers?

Containers ~= Virtual Machines

Virtual Machines ~= Servers

Lightweight

fast to start (see the timing sketch below)

low memory use

Secure

Process isolation

Data isolation

Portable

Composable

Reproducible
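
To make "lightweight / fast to start" concrete, here is a minimal timing sketch (not from the talk): it launches a throwaway container with the Docker CLI and measures the full lifecycle. It assumes Docker is installed and the alpine image is already pulled, so it measures start-up, not image download.

import subprocess
import time

# Time a full container lifecycle: start, run a no-op, tear down.
# Assumes the Docker CLI is installed and the alpine image is
# already pulled, so we measure start-up, not image download.
start = time.monotonic()
subprocess.run(["docker", "run", "--rm", "alpine", "true"], check=True)
elapsed = time.monotonic() - start
print(f"container start + exit: {elapsed:.2f}s")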

Everything old is new

Microservices and large architectures

Data storage (Cassandra, MySQL, MongoDB, etc.)

Operational (Mesos, Kubernetes, etc.)

Discovery/Routing

What’s different about big data?

Data at rest

Data in motion

Customer Facing

Minimize latency

Maximize reliability

Data Analytics

Minimize I/O

Maximize processing

Ship computation to data
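
"Ship computation to data" is the key inversion: instead of pulling terabytes across the network to a worker, place the task on a machine that already holds a replica of its input. A minimal sketch of that placement rule follows; all block and node names are hypothetical, and this is not any real scheduler's logic.

# Locality-aware task placement sketch (hypothetical names):
# prefer a worker that already holds a replica of the task's
# input block; among those, pick the least loaded.

# Which hosts hold a replica of each input block.
replicas = {
    "block-001": {"node-a", "node-b"},
    "block-002": {"node-b", "node-c"},
}

# Current task count per worker (lower means less loaded).
load = {"node-a": 3, "node-b": 1, "node-c": 1}

def place(block_id):
    """Pick a worker for the task reading block_id."""
    local = replicas.get(block_id, set())
    # Fall back to all workers (and pay the network cost)
    # only when no local replica exists.
    candidates = local if local else load.keys()
    return min(sorted(candidates), key=lambda node: load[node])

for block in ("block-001", "block-002"):
    node = place(block)
    load[node] += 1
    print(block, "->", node)  # block-001 -> node-b, block-002 -> node-c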

The questions you can answer aren’t predefined

Hive/Pig/MR

Presto

Metacat

Hive Metastore

That doesn’t look very container-y (or microservice-y, for that matter)

Data storage - HDFS (or, in our case, S3)

Operational - YARN

Containers - JVM

So what happens when you want to do something else?

But is that really the way we want to approach containers?

What’s different about big data?

Running many different short-lived processes

Efficient container construction, allocation, and movement

Groups of processes having meaning

How we observe processes needs to be holistic

Processes need to be scheduled by data locality (and not just data locality for data at rest)

A special case of affinity (although possibly over time)

but...

We do need a data discovery service. (Kind of… maybe… a namenode?)
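
A data discovery service in this sense is essentially what a namenode already does for HDFS blocks: answer "which machines hold this data?" so processes can be scheduled next to it. A toy sketch of that interface, with hypothetical dataset, partition, and host names:

# Toy "data discovery" lookup, in the spirit of a namenode's
# block map: (dataset, partition) -> replica hosts.
# All names here are hypothetical.
from collections import defaultdict

class DataDiscovery:
    def __init__(self):
        # (dataset, partition) -> set of hosts holding a replica
        self._locations = defaultdict(set)

    def register(self, dataset, partition, host):
        """A storage node announces that it holds a replica."""
        self._locations[(dataset, partition)].add(host)

    def locate(self, dataset, partition):
        """Where can a process be scheduled to read this partition locally?"""
        return self._locations.get((dataset, partition), set())

disco = DataDiscovery()
disco.register("view_history", "dateint=20150101", "node-a")
disco.register("view_history", "dateint=20150101", "node-b")
print(disco.locate("view_history", "dateint=20150101"))  # {'node-a', 'node-b'}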

SELECT
  t.title_id,
  t.title_desc,
  SUM(v.view_secs)
FROM
  view_history AS v
  JOIN title_d AS t
    ON v.title_id = t.title_id
WHERE
  v.view_dateint > 20150101
GROUP BY 1, 2;

[Diagram: the query compiles to a DAG of stages (LOAD, LOAD → JOIN → GROUP), coordinated by services labeled Data Discovery, Query Compiler, Query Planner, Metadata, and DAG Watcher]
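
That plan is recoverable from the query itself: one LOAD per table, a JOIN on title_id, then the GROUP. A minimal sketch (stage names are hypothetical, and this is not Presto's or Hive's actual plan format) of the DAG as data, so a scheduler can launch each stage in its own container once its inputs finish:

# The example query as a stage DAG (hypothetical stage names).
# Each stage lists the stages it depends on; a scheduler can
# launch a stage, in its own container, once its inputs are done.
dag = {
    "load_view_history": [],
    "load_title_d": [],
    "join_on_title_id": ["load_view_history", "load_title_d"],
    "group_by_title": ["join_on_title_id"],
}

def runnable(done):
    """Stages whose dependencies are all complete."""
    return [s for s, deps in dag.items()
            if s not in done and all(d in done for d in deps)]

done = set()
while len(done) < len(dag):
    wave = runnable(done)
    print("launch:", wave)  # both LOADs first, then the JOIN, then the GROUP
    done.update(wave)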

Bottom line

Containers provide process level security

The goal should be to minimize monoliths

This isn’t different from what we are doing already

Our languages are abstractions of composable, distributed processing

Different big data projects should share services

No matter what we do, joining is going to be a big problem
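
Joins are hard because both sides have to meet on the join key: rows with the same key must end up in the same place, which forces a shuffle across the network no matter how the services are sliced up. A toy partitioned hash join showing that movement (the data and worker count are hypothetical):

# Why joins force data movement: rows from both tables must be
# re-partitioned by join key so matching rows land on the same
# worker (the "shuffle"). Data here is hypothetical.
from collections import defaultdict

views  = [("t1", 120), ("t2", 45), ("t1", 300)]   # (title_id, view_secs)
titles = [("t1", "Show A"), ("t2", "Show B")]     # (title_id, title_desc)

WORKERS = 2
shuffled = defaultdict(lambda: {"views": [], "titles": []})

# Shuffle phase: every row is sent to hash(key) % WORKERS.
# This is the network-heavy step that no architecture avoids.
for title_id, secs in views:
    shuffled[hash(title_id) % WORKERS]["views"].append((title_id, secs))
for title_id, desc in titles:
    shuffled[hash(title_id) % WORKERS]["titles"].append((title_id, desc))

# Join phase: each worker joins only its own partition, locally.
for worker, part in shuffled.items():
    lookup = dict(part["titles"])
    for title_id, secs in part["views"]:
        print("worker", worker, ":", title_id, lookup[title_id], secs)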

Questions?