Thomas schreiter Insight

Ingestion Comparison Thomas Schreiter Insight Data Engineering Fellow

Transcript of Thomas schreiter Insight

Page 1: Thomas schreiter Insight

Ingestion Comparison

Thomas Schreiter, Insight Data Engineering Fellow

Page 2: Thomas schreiter Insight

Ingestion = Message Queuing System

[Diagram: Producers, Consumers]

Page 3: Thomas schreiter Insight

Research Question: How fast can data be produced into Kinesis/Kafka if all producers run on only one node?

[Diagram: several producers running on 1x m3.medium]
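A minimal sketch of how such a single-node measurement could be set up, assuming the producers are modeled as threads in one process and that `send` is a stand-in for the real Kafka or Kinesis client call. The names `run_producers` and `send` are hypothetical and do not come from the slides.

```python
import threading
import time


def run_producers(num_producers, messages_each, send):
    """Run several producers on this one machine and measure throughput.

    `send` is a placeholder callable for the real client call (assumption).
    Returns (total_messages_sent, messages_per_second).
    """
    counts = [0] * num_producers  # one counter per producer, no sharing

    def worker(idx):
        for i in range(messages_each):
            send(f"producer {idx} message {i}")
            counts[idx] += 1

    threads = [threading.Thread(target=worker, args=(k,))
               for k in range(num_producers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start

    total = sum(counts)
    rate = (total / elapsed) if elapsed > 0 else float("inf")
    return total, rate
```

Repeating the call with different values of `num_producers` would give one data point per producer count. Separate processes instead of threads would be an equally plausible setup.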

Page 5: Thomas schreiter Insight

Throughput over #producers

Page 6: Thomas schreiter Insight

[Figure: Throughput over Bulk Size. x-axis: Bulk Size [msg] (1, 2, 5, 10, 20, 50, 100, 200, 500); y-axis: Throughput [msg/sec] (0 to 35000); series: Kafka, Kinesis]
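The bulk size on the x-axis is the number of messages sent per request. A minimal sketch of that batching step, assuming a hypothetical `send_bulk` callable that stands in for the real bulk call of the Kafka or Kinesis client:

```python
def chunked(messages, bulk_size):
    """Yield consecutive lists of at most `bulk_size` messages."""
    if bulk_size < 1:
        raise ValueError("bulk_size must be at least 1")
    bulk = []
    for message in messages:
        bulk.append(message)
        if len(bulk) == bulk_size:
            yield bulk
            bulk = []
    if bulk:  # last, possibly shorter, bulk
        yield bulk


def send_in_bulks(messages, bulk_size, send_bulk):
    """Send all messages in bulks; return the number of requests made.

    `send_bulk` is a placeholder for the real client call (assumption).
    """
    requests = 0
    for bulk in chunked(messages, bulk_size):
        send_bulk(bulk)
        requests += 1
    return requests
```

With a bulk size of 1 every message costs one request. Larger bulks spread the per-request overhead over more messages, which is the effect the chart measures.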

Page 7: Thomas schreiter Insight

[Diagram: two groups of Producer.py processes, each group on 1x m3.medium; further labels: 4x m3.large, 1 stream]

Sample messages:

“Message #0 to Kafka @ 12:39:04.300”
“Message #1 to Kafka @ 12:39:04.310”
…

“Message #0 to Kinesis @ 13:00:05.700”
“Message #1 to Kinesis @ 13:00:05.702”
…
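The actual Producer.py is not shown on the slides. As a sketch only, a message in the format of the samples could be built like this; `format_message` and its parameters are hypothetical names.

```python
from datetime import datetime


def format_message(seq, target, now=None):
    """Build a payload such as 'Message #0 to Kafka @ 12:39:04.300'.

    `target` is the system name ("Kafka" or "Kinesis"); `now` defaults to
    the current local time and is truncated to milliseconds.
    """
    if now is None:
        now = datetime.now()
    millis = now.microsecond // 1000
    return f"Message #{seq} to {target} @ {now:%H:%M:%S}.{millis:03d}"
```

Putting the sequence number and send time into the payload lets a downstream component reconstruct the send rate from the messages alone.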

Page 8: Thomas schreiter Insight

[Diagram: the Page 7 setup with logger and metrics components added; same sample messages as on Page 7]

Page 9: Thomas schreiter Insight

[Diagram: the Page 7 setup with logger and metrics components; instance labels: 4x m3.large, three times 1x m3.medium, 1x t2.micro, 1 stream; same sample messages as on Page 7]
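How the metrics component computes throughput is not shown on the slides. One plausible sketch, assuming the logger writes lines in the sample format and that throughput is taken as the number of message intervals divided by the time between the first and last timestamp:

```python
import re
from datetime import datetime

# Matches e.g.: Message #0 to Kafka @ 12:39:04.300
LINE_RE = re.compile(r"Message #(\d+) to (\w+) @ (\d{2}:\d{2}:\d{2}\.\d{3})")


def throughput_per_target(lines):
    """Return {target: messages per second} from logged message lines.

    Assumes all timestamps fall on the same day (no midnight rollover).
    Targets with fewer than two messages or zero time span are skipped.
    """
    stamps_by_target = {}
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # ignore lines that are not message records
        target = match.group(2)
        stamp = datetime.strptime(match.group(3), "%H:%M:%S.%f")
        stamps_by_target.setdefault(target, []).append(stamp)

    result = {}
    for target, stamps in stamps_by_target.items():
        span = (max(stamps) - min(stamps)).total_seconds()
        if len(stamps) >= 2 and span > 0:
            result[target] = (len(stamps) - 1) / span
    return result
```

Applied to only the two sample lines per system, this gives about 100 msg/sec for Kafka (10 ms apart) and about 500 msg/sec for Kinesis (2 ms apart). These values illustrate the arithmetic and are not results from the talk.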

Page 10: Thomas schreiter Insight

Engineering Challenges

Install scripts: tried to automate everything ☺

Page 11: Thomas schreiter Insight

Engineering Challenges

Install scripts: tried to automate everything ☺

broke Kafka installation in Week 2 ☹

Page 12: Thomas schreiter Insight

Engineering Challenges

Install scripts: tried to automate everything ☺

broke Kafka installation in Week 2 ☹

and again in Week 4 ☹ ☹ ☹

Page 13: Thomas schreiter Insight

but engineering puzzles are really fun ☺☺☺

Page 14: Thomas schreiter Insight

And I read Kafka for the first time

Page 15: Thomas schreiter Insight

Thomas Schreiter [[email protected]]

M.Sc. + B.Sc. in Computer Science @Karlsruhe Institute of Technology, Germany
Ph.D. in Transportation @Delft University of Technology, The Netherlands

Before Insight: Research Engineer in Transportation @UC Berkeley

Page 16: Thomas schreiter Insight
Page 17: Thomas schreiter Insight

AWS Costs

Page 18: Thomas schreiter Insight
Page 19: Thomas schreiter Insight

Throughput over #partitions

[Figure: y-axis: Throughput [#msg/sec] (0 to 1200); x-axis: 1 partition, 2 partitions, 3 partitions, 4 partitions; series: Kafka, Kinesis]

Page 20: Thomas schreiter Insight

Older results

[Figure: y-axis: Throughput [#msg/sec] (0 to 2000); x-axis: 1 partition, 2 partitions, 3 partitions, 4 partitions; series: Kafka, Kinesis]