Building a Hadoop Connector

This presentation was given at the HUG London Meetup "SQL and NoSQL on Hadoop – A look at performance". Speakers: Alex Bordei, Techie Product Manager at Bigstep; Calin Burloiu, Big Data Engineer at Avira; and Radu Pastia, Big Data Team Leader at Avira. We worked with Avira to show how much throughput can be squeezed out of a Hadoop connector. Together we benchmarked Couchdoop and talked about the behavior you can expect and the tweaks that can improve the performance of your big data setup. If you have any questions, we will be glad to provide additional information.

Transcript of Building a Hadoop Connector

Page 1: Building a Hadoop Connector
Page 2: Building a Hadoop Connector

pastiaro.wordpress.com

@rpastia

Page 3: Building a Hadoop Connector
Page 4: Building a Hadoop Connector

Building a connector – The Wrong Way

Mapper Reducer

Page 5: Building a Hadoop Connector
Page 6: Building a Hadoop Connector

Building a connector – The Right Way

Mapper Reducer Partitioner

InputSplit

InputFormat

RecordReader

RecordWriter

OutputFormat
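
The transcript carries only the component labels for this slide. One reading, which is an assumption here, is that access to the external store belongs in the reader and writer roles, so the map step only transforms pairs. The sketch below models that separation in plain Java. `SketchRecordReader`, `SketchRecordWriter`, `ListRecordReader` and `runMapTask` are hypothetical stand-ins so the example runs without Hadoop on the classpath; they are not the Hadoop classes, although the reader's method names mirror `nextKeyValue`, `getCurrentKey` and `getCurrentValue` from `org.apache.hadoop.mapreduce.RecordReader`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java model of the reader/writer roles; NOT the Hadoop API.
class ConnectorRolesSketch {

    /** Hypothetical reader role: turns one split into (key, value) pairs. */
    interface SketchRecordReader {
        boolean nextKeyValue();
        String getCurrentKey();
        String getCurrentValue();
    }

    /** Hypothetical writer role: receives the pairs the job emits. */
    interface SketchRecordWriter {
        void write(String key, String value);
    }

    /** Toy reader over an in-memory list of keys standing in for one split. */
    static final class ListRecordReader implements SketchRecordReader {
        private final Iterator<String> keys;
        private String current;

        ListRecordReader(List<String> splitKeys) {
            this.keys = splitKeys.iterator();
        }

        @Override public boolean nextKeyValue() {
            if (!keys.hasNext()) return false;
            current = keys.next();
            return true;
        }

        @Override public String getCurrentKey() { return current; }

        // A real connector would fetch the record from the external store here.
        @Override public String getCurrentValue() { return "record " + current; }
    }

    /** The map step only transforms pairs; all I/O stays in reader and writer. */
    static void runMapTask(SketchRecordReader reader, SketchRecordWriter writer) {
        while (reader.nextKeyValue()) {
            writer.write(reader.getCurrentKey(), reader.getCurrentValue().toUpperCase());
        }
    }

    public static void main(String[] args) {
        List<String> written = new ArrayList<>();
        runMapTask(new ListRecordReader(Arrays.asList("a", "b")),
                   (k, v) -> written.add(k + "=" + v));
        System.out.println(written); // [a=RECORD A, b=RECORD B]
    }
}
```

Because the map step sees only the two interfaces, the same logic can be exercised against an in-memory list, as in `main`, or against a real store by swapping the reader and writer.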

Page 7: Building a Hadoop Connector
Page 8: Building a Hadoop Connector
Page 9: Building a Hadoop Connector
Page 10: Building a Hadoop Connector

The InputFormat: From Input to Mapper

--range 2014-09-01;2014-09-20
--number_of_mappers 4

Input dates: 2014-09-01, 2014-09-02, 2014-09-03, 2014-09-04, 2014-09-05, 2014-09-06, …, 2014-09-20

Input Split 1: 2014-09-01, 2014-09-02, 2014-09-05, …

Record Reader 1:

(2014-09-01-A; record A)
(2014-09-01-B; record B)
(2014-09-01-…; record …)
(2014-09-02-A; record A)
(2014-09-02-B; record B)
(2014-09-02-…; record …)
(2014-09-05-A; record A)
(2014-09-05-B; record B)
(2014-09-05-…; record …)

Mapper
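
The flow on this slide (a date range and a mapper count become input splits, and each split becomes keyed records) can be sketched in plain Java. The slide does not state how dates are assigned to splits, so the round-robin rule here is an assumption and does not reproduce the slide's exact grouping for Input Split 1. `DateRangeSplitter`, `expand`, `split` and `keysFor` are hypothetical names. In a real connector the splitting logic would typically live in `getSplits` of a class extending Hadoop's `InputFormat`.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model of the slide's flow; it does not use the Hadoop API.
class DateRangeSplitter {

    /** Expands "start;end" (inclusive ISO dates) into one entry per day. */
    static List<LocalDate> expand(String range) {
        String[] parts = range.split(";");
        LocalDate start = LocalDate.parse(parts[0].trim());
        LocalDate end = LocalDate.parse(parts[1].trim());
        List<LocalDate> days = new ArrayList<>();
        for (LocalDate d = start; !d.isAfter(end); d = d.plusDays(1)) {
            days.add(d);
        }
        return days;
    }

    /** One list of days per mapper. ASSUMPTION: days are dealt out round-robin. */
    static List<List<LocalDate>> split(String range, int numberOfMappers) {
        List<List<LocalDate>> splits = new ArrayList<>();
        for (int i = 0; i < numberOfMappers; i++) {
            splits.add(new ArrayList<>());
        }
        List<LocalDate> days = expand(range);
        for (int i = 0; i < days.size(); i++) {
            splits.get(i % numberOfMappers).add(days.get(i));
        }
        return splits;
    }

    /** Record-reader step: builds keys shaped like "2014-09-01-A" for one split. */
    static List<String> keysFor(List<LocalDate> split, List<String> suffixes) {
        List<String> keys = new ArrayList<>();
        for (LocalDate day : split) {
            for (String suffix : suffixes) {
                keys.add(day + "-" + suffix);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        List<List<LocalDate>> splits = split("2014-09-01;2014-09-20", 4);
        System.out.println(splits.size()); // 4
        System.out.println(splits.get(0));
        // [2014-09-01, 2014-09-05, 2014-09-09, 2014-09-13, 2014-09-17]
        System.out.println(keysFor(splits.get(0).subList(0, 1), Arrays.asList("A", "B")));
        // [2014-09-01-A, 2014-09-01-B]
    }
}
```

With 20 days and 4 mappers each split holds 5 days. A different assignment rule, such as contiguous chunks, would only change the body of `split`.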

Page 11: Building a Hadoop Connector
Page 12: Building a Hadoop Connector
Page 13: Building a Hadoop Connector
Page 14: Building a Hadoop Connector
Page 15: Building a Hadoop Connector
Page 16: Building a Hadoop Connector

Page 17: Building a Hadoop Connector
Page 18: Building a Hadoop Connector
Page 19: Building a Hadoop Connector
Page 20: Building a Hadoop Connector