Thomas Schreiter Insight
Ingestion Comparison
Thomas Schreiter Insight Data Engineering Fellow
Ingestion = Message Queuing System
[Diagram: Producers, Consumers]
Research Question: How fast can data be produced into Kinesis/Kafka if all producers run on only one node?
[Diagram: Producers, 1x m3.medium]
DEMO ….
Throughput over #producers
[Figure: Throughput over Bulk Size. x-axis: Bulk Size [msg], 1 to 500; y-axis: Throughput [msg/sec], 0 to 35000; series: Kafka, Kinesis]
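Bulk size presumably refers to the number of messages handed to the queue in one send call. The following is a minimal sketch of that batching, not the code used for the measurements. The names `chunked` and `produce` are hypothetical, and `send_bulk` is a stand-in for whatever Kafka or Kinesis client call does the actual sending (the slides do not show it).

```python
from typing import Callable, Iterable, Iterator, List


def chunked(messages: Iterable[str], bulk_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists holding at most bulk_size messages each."""
    if bulk_size < 1:
        raise ValueError("bulk_size must be at least 1")
    bulk: List[str] = []
    for message in messages:
        bulk.append(message)
        if len(bulk) == bulk_size:
            yield bulk
            bulk = []
    if bulk:
        # The last bulk may be smaller than bulk_size.
        yield bulk


def produce(
    messages: Iterable[str],
    bulk_size: int,
    send_bulk: Callable[[List[str]], None],
) -> int:
    """Send all messages in bulks and return the number of send calls.

    send_bulk is a hypothetical placeholder for the real client call.
    """
    calls = 0
    for bulk in chunked(messages, bulk_size):
        send_bulk(bulk)
        calls += 1
    return calls
```

With a larger bulk size, the same number of messages needs fewer send calls, which is one plausible reason throughput depends on bulk size.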
[Diagram: test setup. Labels: Producer.py (multiple producers), 4x m3.large, 1x m3.medium (3 instances), 1x t2.micro, 1 stream, logger, metrics]
“Message #0 to Kafka @ 12:39:04.300” “Message #1 to Kafka @ 12:39:04.310” …
“Message #0 to Kinesis @ 13:00:05.700” “Message #1 to Kinesis @ 13:00:05.702” …
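One way to turn logged messages of the form shown in the diagram into a throughput figure is sketched below. This is an illustration under assumptions, not the metrics code from the talk: the regular expression is inferred from the two sample messages, the function names `parse_line` and `throughput` are hypothetical, and all timestamps are assumed to fall on the same day.

```python
import re
from datetime import datetime
from typing import Iterable, Tuple

# Pattern inferred from the sample messages, e.g.
# "Message #0 to Kafka @ 12:39:04.300"
LOG_PATTERN = re.compile(
    r"Message #(\d+) to (\w+) @ (\d{2}:\d{2}:\d{2}\.\d{3})"
)


def parse_line(line: str) -> Tuple[int, str, datetime]:
    """Extract message number, target system, and timestamp from a log line."""
    match = LOG_PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    number, target, stamp = match.groups()
    # Only the time of day is logged, so the date part is a dummy value.
    return int(number), target, datetime.strptime(stamp, "%H:%M:%S.%f")


def throughput(lines: Iterable[str]) -> float:
    """Return messages per second between the first and last timestamp.

    Counts intervals (n - 1) rather than messages, since n timestamps
    span n - 1 gaps. Does not handle runs that cross midnight.
    """
    times = [parse_line(line)[2] for line in lines]
    if len(times) < 2:
        raise ValueError("need at least two messages")
    elapsed = (max(times) - min(times)).total_seconds()
    if elapsed <= 0:
        raise ValueError("timestamps do not span a positive interval")
    return (len(times) - 1) / elapsed
```

Calling `throughput` on the lines logged for one system (Kafka or Kinesis) gives a single msg/sec value for that run. Lines for the two systems should be evaluated separately, because they come from separate runs.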
Engineering Challenges
Install scripts: tried to automate everything ☺
broke Kafka installation in Week 2 ☹
and again in Week 4 ☹ ☹ ☹
but engineering puzzles are really fun ☺☺☺
And I read Kafka for the first time
Thomas Schreiter
M.Sc. + B.Sc. in Computer Science @Karlsruhe Institute of Technology, Germany
Ph.D. in Transportation @Delft University of Technology, The Netherlands
Before Insight: Research Engineer in Transportation @UC Berkeley
AWS Costs
[Figure: Throughput over #partitions. x-axis: 1 to 4 partitions; y-axis: Throughput [#msg/sec], 0 to 1200; series: Kafka, Kinesis]
[Figure: Older results. x-axis: 1 to 4 partitions; y-axis: Throughput [#msg/sec], 0 to 2000; series: Kafka, Kinesis]