Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot:...

21
Pinot: Realtime OLAP for 530 Million Users Presented By: Sushant Raikar

Transcript of Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot:...

Page 1: Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot: Realtime OLAP for 530 Million Users Presented By: Sushant Raikar . Linkedin. Challenge

Pinot: Realtime OLAP for 530 Million Users

Presented By: Sushant Raikar

Page 2: Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot: Realtime OLAP for 530 Million Users Presented By: Sushant Raikar . Linkedin. Challenge

Linkedin

Page 3: Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot: Realtime OLAP for 530 Million Users Presented By: Sushant Raikar . Linkedin. Challenge

Challenge

100 QPS -> 10000 QPS

Page 4: Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot: Realtime OLAP for 530 Million Users Presented By: Sushant Raikar . Linkedin. Challenge

Pinot

● Single system for all OLAP needs

● Ability to serve upto 50000QPS

● Flexible

● Near-Realtime data

Page 5: Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot: Realtime OLAP for 530 Million Users Presented By: Sushant Raikar . Linkedin. Challenge

History

RDBMS

Page 6: Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot: Realtime OLAP for 530 Million Users Presented By: Sushant Raikar . Linkedin. Challenge

History

RDBMS

Cubing

Offline

Page 7: Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot: Realtime OLAP for 530 Million Users Presented By: Sushant Raikar . Linkedin. Challenge

History

RDBMS

Cubing

Offline Lambda

Page 8: Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot: Realtime OLAP for 530 Million Users Presented By: Sushant Raikar . Linkedin. Challenge

History

RDBMS

Cubing

Offline Lambda

Pinot

Page 9: Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot: Realtime OLAP for 530 Million Users Presented By: Sushant Raikar . Linkedin. Challenge

Data in Pinot

Page 10: Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot: Realtime OLAP for 530 Million Users Presented By: Sushant Raikar . Linkedin. Challenge

Architecture

Controller

Page 11: Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot: Realtime OLAP for 530 Million Users Presented By: Sushant Raikar . Linkedin. Challenge

Architecture

Controller Object Store

Page 12: Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot: Realtime OLAP for 530 Million Users Presented By: Sushant Raikar . Linkedin. Challenge

Architecture

Controller Object Store

Servers

Page 13: Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot: Realtime OLAP for 530 Million Users Presented By: Sushant Raikar . Linkedin. Challenge

Architecture

Controller Object Store

Servers

Brokers

Queries

Page 14: Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot: Realtime OLAP for 530 Million Users Presented By: Sushant Raikar . Linkedin. Challenge

Uploading Data

1. HTTP POST Data Segment to Controller

2. Controller uploads it to persistent object store

3. Controller asks servers to download segment

4. Server downloads data from Object Store

5. Tells Controller that it finished downloading

6. Brokers update routing Tables

Controller Object Store

Servers

Brokers

1 2

3

4

5

6

Page 15: Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot: Realtime OLAP for 530 Million Users Presented By: Sushant Raikar . Linkedin. Challenge

Running Queries

1. Queries arrive at broker get analyzed

2. Routing table is picked

3. Multiple servers are contacted

4. Servers generate logical and physical query plans

and execute the query.

5. Broker merges data

Routing is important, especially for large clusters

● Multiple routing tables for same table

● Randomly select one routing table for a query

● Load balancingController Object

Store

Servers

Brokers

12

3

4

5

Page 16: Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot: Realtime OLAP for 530 Million Users Presented By: Sushant Raikar . Linkedin. Challenge

Hybrid Queries

To handle near real time queries

● "Online" Server○ Consumes from Kafka○ Stops based on time or number○ Contacts controller for

consensus ○ Forms indexes similar to offline

● Broker splits query○ Based on Time○ Into offline and online○ Merges result

SELECT SUM(foo) FROM T WHERE date > 'Jul 31'

SELECT SUM(foo) WHERE date > 'Jul 31' AND date < 'Aug 2'

Page 17: Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot: Realtime OLAP for 530 Million Users Presented By: Sushant Raikar . Linkedin. Challenge

Optimization: Star Tree Index

Page 18: Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot: Realtime OLAP for 530 Million Users Presented By: Sushant Raikar . Linkedin. Challenge

Optimization: Star Tree Query

Page 19: Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot: Realtime OLAP for 530 Million Users Presented By: Sushant Raikar . Linkedin. Challenge

Optimization: Star Tree Query

Page 20: Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot: Realtime OLAP for 530 Million Users Presented By: Sushant Raikar . Linkedin. Challenge

Results

● Single infrastructure for all OLAP needs

● Support for user facing analytical queries○ 50000 Queries per Second

● Query Flexibility○ High throughput, low complexity○ Low throughput, complex queries

● Pluggable Indexing○ Star Tree Index

Page 21: Million Users Pinot: Realtime OLAP for 530tozsu/courses/CS848/W19/presentations/Su… · Pinot: Realtime OLAP for 530 Million Users Presented By: Sushant Raikar . Linkedin. Challenge

Discussion

● Limitations of Pinot?

● How does Pinot differ from in-memory OLAP like Hyper or Hekaton?

● Pinot is becoming Apache, with support from companies like Uber, Slack.

What does Linkedin gain or lose from this move?

● Is Pinot basically druid with better optimizations?

● Comments about the paper