Let's Talk Operations! (Hadoop Summit 2014)

Let’s Talk Operations! Allen Wittenauer


These are the introductory slides I used (in some form or another) for the Let's Talk Operations! sessions at the 2014 Hadoop Summits. No video for this one!

Transcript of Let's Talk Operations! (Hadoop Summit 2014)

Page 1: Let's Talk Operations! (Hadoop Summit 2014)

Let’s Talk Operations!
Allen Wittenauer

Page 2

Page 3

Twitter: @_a__w_
Email: aw @ apache.org

Page 4

How many individual grids should I have?

Page 5

One big grid
• Pros
  • Lower ops overhead
  • One location for all data
• Cons
  • Dev and Prod on one system

Grid per project
• Pros
  • Capacity planning per project
• Cons
  • More headcount to maintain
  • Multiple copies of data
  • Data ingress is a mess

Page 6

Diagram: one data center containing Production, ETL, and Development grids.

Page 7

Diagram: ETL, Dev, and Prod grids, with flows labeled Event Feeds, Database Feeds, Base ETL Pull (three times), and Post-Processed Data.

Page 8

Diagram: two data centers, DC1 and DC2, with Production, ETL, and Development grids.

Page 9

How do I solve some common distcp issues?

Page 10

• Common issues
  • Version incompatibilities
  • Network bandwidth consumption
• Some tricks
  • Use WebHDFS
    • All modern versions support it
    • Read and write in both directions
  • Create a separate queue with hard limits
  • Pull from larger, push from smaller
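The WebHDFS and separate-queue tricks above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a tested recipe: the host names, paths, the queue name `distcp`, and the 10% share are hypothetical, and the ports (50070 for NameNode HTTP/WebHDFS, 8020 for NameNode RPC) are the usual Hadoop 2.x defaults. It assumes the Capacity Scheduler with a single existing `default` queue; setting a queue's `maximum-capacity` equal to its `capacity` keeps it from borrowing idle resources, which is one way to make the limit hard. distcp's `-m` and `-bandwidth` options further cap how many copy tasks run at once and how many MB/s each task moves.

```python
"""Sketch: a distcp job that reads over WebHDFS and runs in a dedicated,
hard-limited YARN queue. Hosts, ports, paths, queue name, and shares are
hypothetical placeholders."""

from typing import Dict, List


def hard_limited_queue_props(queue: str, share_pct: int) -> Dict[str, str]:
    """Capacity Scheduler properties giving `queue` a fixed slice of the grid.

    maximum-capacity == capacity stops the queue from growing into idle
    capacity, so copy jobs never take more than share_pct of the cluster.
    Assumes the only other top-level queue is 'default'.
    """
    if not 0 < share_pct < 100:
        raise ValueError("share_pct must be between 1 and 99")
    prefix = "yarn.scheduler.capacity.root"
    return {
        f"{prefix}.queues": f"default,{queue}",
        f"{prefix}.default.capacity": str(100 - share_pct),
        f"{prefix}.{queue}.capacity": str(share_pct),
        f"{prefix}.{queue}.maximum-capacity": str(share_pct),
    }


def distcp_argv(src: str, dst: str, queue: str, maps: int, mb_per_map: int) -> List[str]:
    """Command line for a distcp job confined to `queue`.

    -m caps the number of simultaneous copy tasks and -bandwidth caps each
    task's throughput in MB/s, so total traffic is roughly maps * mb_per_map.
    The -D generic option must come before distcp's own options.
    """
    return [
        "hadoop", "distcp",
        f"-Dmapreduce.job.queuename={queue}",
        "-m", str(maps),
        "-bandwidth", str(mb_per_map),
        src, dst,
    ]


if __name__ == "__main__":
    # Hypothetical grids: read the source over WebHDFS (plain HTTP, so it is
    # tolerant of version differences) and write over the local grid's RPC.
    src = "webhdfs://old-nn.example.com:50070/data/events"
    dst = "hdfs://new-nn.example.com:8020/data/events"
    for key, value in hard_limited_queue_props("distcp", 10).items():
        print(f"{key}={value}")
    print(" ".join(distcp_argv(src, dst, "distcp", maps=20, mb_per_map=10)))
```

The printed properties would go into capacity-scheduler.xml; once the placeholders are replaced, the argv list could be handed to `subprocess.run` on a gateway node.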

Page 11

Q&A

Allen Wittenauer
Twitter: @_a__w_
Email: aw @ apache.org

Page 12

Bonus Slide!

Page 13

• root partitioning: 20 GB /, ..., 200 GB task space, (rest) HDFS
• non-root partitioning: 5 GB swap, 200 GB task space, (rest) HDFS
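The bonus slide's layouts come down to simple arithmetic: fixed partitions come off the top and HDFS gets whatever is left. The Python sketch below assumes each layout describes a single disk (the root disk versus every other data disk). The 2000 GB disk size, the 12-disk node, and `other_os_gb`, a hypothetical parameter standing in for the partitions the slide elides with "...", are illustrative assumptions, not figures from the talk.

```python
"""Per-disk space budget for the root and non-root layouts on the bonus slide.

All sizes are in GB. Assumption: each layout describes one physical disk.
"""

from typing import Dict

ROOT_FS_GB = 20       # "20 GB /" on the slide
SWAP_GB = 5           # "5 GB swap" on the slide
TASK_SPACE_GB = 200   # "200 GB task space" on both layouts


def _with_hdfs_remainder(disk_gb: int, fixed: Dict[str, int]) -> Dict[str, int]:
    """Return the fixed partitions plus an 'HDFS' entry holding the rest."""
    used = sum(fixed.values())
    if used >= disk_gb:
        raise ValueError(f"{disk_gb} GB disk cannot hold {used} GB of fixed partitions")
    layout = dict(fixed)
    layout["HDFS"] = disk_gb - used
    return layout


def root_disk_layout(disk_gb: int, other_os_gb: int) -> Dict[str, int]:
    """Root disk: 20 GB /, other OS partitions, 200 GB task space, rest HDFS.

    other_os_gb is hypothetical: it covers the partitions shown as "...".
    """
    return _with_hdfs_remainder(disk_gb, {
        "/": ROOT_FS_GB,
        "other OS partitions": other_os_gb,
        "task space": TASK_SPACE_GB,
    })


def non_root_disk_layout(disk_gb: int) -> Dict[str, int]:
    """Non-root disk: 5 GB swap, 200 GB task space, rest HDFS."""
    return _with_hdfs_remainder(disk_gb, {"swap": SWAP_GB, "task space": TASK_SPACE_GB})


if __name__ == "__main__":
    # Hypothetical node: 12 disks of 2000 GB each, the first holding the OS.
    disk_gb, disk_count = 2000, 12
    root = root_disk_layout(disk_gb, other_os_gb=30)
    data = non_root_disk_layout(disk_gb)
    print("root disk:", root)
    print("each non-root disk:", data)
    print("HDFS GB per node:", root["HDFS"] + (disk_count - 1) * data["HDFS"])
```

With those hypothetical numbers the root disk keeps 1750 GB for HDFS and each non-root disk keeps 1795 GB.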