Data Storage Infra at LinkedIn
![Page 1: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/1.jpg)
Data Storage Infra at LinkedIn
Yan Yan, Staff software engineer
![Page 2: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/2.jpg)
Today’s Agenda
1. LinkedIn Overview
2. Data Infra at LinkedIn
3. Espresso – Distributed Document Store
4. Ambry – Distributed Object Store
5. Venice – Derived Data Platform
6. Summary
![Page 3: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/3.jpg)
546 million users
> 100 million MAU
Over 200 countries
![Page 4: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/4.jpg)
ADVANCE MY CAREER
Get the right job
![Page 5: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/5.jpg)
ADVANCE MY CAREER
Build meaningful relationships
![Page 6: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/6.jpg)
ADVANCE MY CAREER
Establish & manage my reputation
![Page 7: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/7.jpg)
ADVANCE MY CAREER
Research & contact people
![Page 8: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/8.jpg)
ADVANCE MY CAREER
Stay well informed
![Page 9: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/9.jpg)
Data Infra at LinkedIn
![Page 10: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/10.jpg)
Common Data Patterns
• Online: activity that should be reflected immediately
• Nearline: activity that should be reflected “soon”
• Offline: ETL processing, generally updated in batches
![Page 11: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/11.jpg)
UI
Business Service Layer
Data Service Layer
Event Buffer
Offline Storage
Online Data Storage
Streaming Pipeline
Offline Pipeline ETL
Nearline Data Storage
CDC
Data Analytics Platform
![Page 12: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/12.jpg)
Espresso: Distributed Document Store
LinkedIn Online Data Solution
![Page 13: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/13.jpg)
Why Espresso
Scalability vs. feature set: the gap
• Consistency is important
• K-V model does not align with full application needs
• Full Oracle data model and query complexity not needed
• Build a new data store that is consistent, scalable, indexed, and richer than K-V
[Chart: feature set vs. scalability. Oracle and Voldemort are plotted, with Espresso placed in the gap between them.]
![Page 14: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/14.jpg)
Espresso’s Design Goals
• Scalable and elastic
• Read-after-write consistency
• Structured data with schema
• Secondary indexes
• Transactional updates to inter-related data
• Multi-datacenter support
• Seamless integration with nearline and offline systems
![Page 15: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/15.jpg)
Espresso - Architecture
[Architecture diagram. Components: Client; Routers; Storage Nodes, each with an API Server and MySQL; ZK and Helix; Kafka; Data Replicator; Snapshot Service; Backup Storage; Streaming Process; Hadoop. Areas shown: Data Center, Remote Data Center, Offline Data Center. Legend: Online, Nearline, Offline, Control.]
![Page 16: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/16.jpg)
Espresso – RESTful API
Get: GET /database/table/resource_id
Create: PUT /database/table/resource_id {record}
Update: POST /database/table/resource_id {field:value}
Hierarchy Get: GET /database/table/resource_id/sub-resource_id
Query: GET /database/table/resource_id?query="field:pattern"
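The URL patterns above can be sketched as a small request builder. This is only an illustration of the path layout on the slide: the function name `espresso_request`, its parameters, and the example database and table names are hypothetical, and no real Espresso client library is assumed.

```python
from urllib.parse import quote


def espresso_request(op, database, table, resource_id,
                     sub_resource_id=None, body=None, query=None):
    """Return (HTTP method, path, body) for one operation from the slide.

    Hypothetical helper: it only assembles the URL shapes shown above.
    """
    methods = {"get": "GET", "create": "PUT", "update": "POST", "query": "GET"}
    path = f"/{database}/{table}/{resource_id}"
    if sub_resource_id is not None:
        # Hierarchy Get: address a sub-resource under the resource_id.
        path += f"/{sub_resource_id}"
    if op == "query":
        # Keep ':' readable because the slide writes the query as field:pattern.
        path += "?query=" + quote(query, safe=":")
    return methods[op], path, body


print(espresso_request("get", "MailboxDB", "Messages", "100"))
print(espresso_request("update", "MailboxDB", "Messages", "100",
                       body={"isRead": True}))
```

Create maps to PUT and Update to POST, as on the slide; the caller would still need an HTTP client and a router address, which are left out here.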
![Page 17: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/17.jpg)
Transactional Updates
Update records sharing the same resource_id in different tables
• Multipost
• /database/table1/id1 {field:value}
• /database/table2/id1/sub-id1 {field:value}
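A minimal sketch of the constraint stated above: every path in a multi-post must carry the same resource_id. The helper name `multipost_resource_id` is hypothetical, and the assumption that records with the same resource_id live in one partition (so they can be updated in one transaction) is an inference from the partitioning slides, not something this slide states.

```python
def multipost_resource_id(paths):
    """Return the resource_id shared by all paths, or raise ValueError.

    Paths are expected to look like /database/table/resource_id[/sub-id].
    """
    ids = set()
    for path in paths:
        parts = path.strip("/").split("/")
        if len(parts) < 3:
            raise ValueError(f"expected /database/table/resource_id, got {path!r}")
        ids.add(parts[2])
    if len(ids) != 1:
        raise ValueError(f"multi-post must target one resource_id, got {sorted(ids)}")
    return ids.pop()


print(multipost_resource_id([
    "/database/table1/id1",
    "/database/table2/id1/sub-id1",
]))
```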
![Page 18: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/18.jpg)
Espresso – MySQL Mapping
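[Diagram: an Espresso DB with tables Table1 and Table2 is distributed by partition key into partitions es_identity_1, es_identity_2, es_identity_3, …, each shown with Table1 and Table2. MySQL Instance 1 holds es_identity_1 to es_identity_3, …; MySQL Instance 2 holds es_identity_4 to es_identity_6, ….]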
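The mapping from a partition key to a partition and then to a MySQL instance might look like the sketch below. The slide only says data is distributed by partition key; the CRC32 hash, the function names, and the fixed number of partitions per instance are assumptions made for illustration.

```python
import zlib


def partition_for(resource_id: str, num_partitions: int) -> int:
    """Map a partition key to a 1-based partition number (assumed hash: CRC32)."""
    return zlib.crc32(resource_id.encode("utf-8")) % num_partitions + 1


def instance_for(partition: int, partitions_per_instance: int) -> int:
    """Map a 1-based partition to a 1-based MySQL instance, in contiguous ranges."""
    return (partition - 1) // partitions_per_instance + 1


# With 3 partitions per instance, partitions 1-3 land on instance 1 and
# 4-6 on instance 2, matching the layout in the diagram.
p = partition_for("member:42", num_partitions=6)
print(f"es_identity_{p} -> MySQLInstance{instance_for(p, 3)}")
```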
![Page 19: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/19.jpg)
Espresso – Data Distribution
[Diagram: Node 1, Node 2 and Node 3 each host replicas of partitions P1 to P4; each replica is in Master, Slave, or Offline state. Helix tracks the live instances (Node1, Node2, Node3) and the external view, for example P4: Master: Node1; Slave: Node2, Node3.]
![Page 20: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/20.jpg)
Espresso – Cluster Expansion
[Diagram: Node4 joins Node1, Node2 and Node3 and is assigned replicas of P1, P2 and P3. Helix live instances: Node1, Node2, Node3, Node4. External view for P1: Master: Node1; Slave: Node2, Node3; Offline: Node4.]
![Page 21: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/21.jpg)
Espresso – Node Failover
[Diagram: Node1 hosts P2, P3, P4; Node2 hosts P1, P2, P4; Node3 hosts P1, P3, P4; Node4 hosts P1, P2, P3. Helix live instances: Node1, Node2, Node3, Node4. External view for P1: Master: Node4; Slave: Node2, Node3.]
![Page 22: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/22.jpg)
Espresso – Node Failover
[Diagram: Node1 hosts P2, P3, P4; Node2 hosts P1, P2, P4; Node3 hosts P1, P3, P4; Node4 hosts P1, P2, P3. Helix live instances: Node1, Node2, Node3, Node4. External view for P1: Master: Node4; Slave: Node2, Node3.]
![Page 23: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/23.jpg)
Espresso – Node Failover
[Diagram: Node3 (P1, P3, P4) drops out of the Helix live instances, which are now Node1, Node2, Node4. Node1 hosts P2, P3, P4; Node2 hosts P1, P2, P4; Node4 hosts P1, P2, P3. External view for P1: Master: Node4; Slave: Node2.]
![Page 24: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/24.jpg)
Espresso – Node Failover
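[Diagram: Node3 is gone from the cluster. Node1 hosts P2, P3, P4; Node2 hosts P1, P2, P4; Node4 hosts P1, P2, P3. Helix live instances: Node1, Node2, Node4. External view for P1: Master: Node4; Slave: Node2. P1, P4 and P3, the partitions Node3 hosted, are shown next to Kafka.]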
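The failover sequence above can be sketched as a pure function over the external view and the set of live instances. This is not the Helix API: the `failover` function, the dictionary layout, and the "promote the first live slave" rule are assumptions chosen to keep the example small.

```python
def failover(external_view, live_instances):
    """Drop dead nodes from each partition and promote a slave if the master died.

    external_view: {partition: {"master": node, "slaves": [node, ...]}}
    live_instances: set of node names currently alive.
    """
    new_view = {}
    for partition, state in external_view.items():
        slaves = [n for n in state["slaves"] if n in live_instances]
        master = state["master"]
        if master not in live_instances:
            # Assumed policy: promote the first surviving slave, if any.
            master = slaves.pop(0) if slaves else None
        new_view[partition] = {"master": master, "slaves": slaves}
    return new_view


view = {
    "P1": {"master": "Node4", "slaves": ["Node2", "Node3"]},  # from the slides
    "P3": {"master": "Node3", "slaves": ["Node1", "Node4"]},  # hypothetical
}
print(failover(view, {"Node1", "Node2", "Node4"}))
```

For P1 the result matches the slide after Node3 leaves: Node4 stays master and Node2 is the only slave. The P3 entry is made up to show a master being replaced.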
![Page 25: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/25.jpg)
Ambry: Distributed Object Storage
LinkedIn Online Data Solution
![Page 26: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/26.jpg)
Object Storage Use Cases
• Media: image, video, audio
• Documents: docs, spreadsheets, slides
• Backup: database backup
• Static content: JS, CSS, template
![Page 27: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/27.jpg)
Before Ambry
Media Server
• Monolithic
• Not scalable
• No full control
• Expensive
![Page 28: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/28.jpg)
Ambry
Distributed object storage system
• Immutable blobs
• Geo-distributed, horizontally scalable
• Unstructured data
• Multi-master
• Cost effective
![Page 29: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/29.jpg)
Ambry - Architecture
[Architecture diagram: DataCenter1 and DataCenter2 have the same layout. Each shows three kinds of callers (Ambry client, CDNs, HTTP client), three frontend nodes (each an HTTP service with a routing service), a Cluster Manager, and three Storage Services. The two data centers are linked by cross-DC replication.]
![Page 30: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/30.jpg)
Ambry - PUT Operation
[Diagram: Client, HTTP service, Routing service, and three Storage Services, with arrows numbered 1 to 5 for the following steps.]
1. PUT data
2. Choose partition and generate blob id
3. Write data to 3 replicas
4. Wait for at least 2 nodes to respond successfully
5. Reply blob id to client
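The PUT steps above amount to a quorum write: send to 3 replicas and succeed once 2 acknowledge. The sketch below assumes an in-memory `Replica` class, a UUID as a stand-in for the real blob id format, and sequential rather than parallel writes; none of these come from Ambry itself.

```python
import uuid


class Replica:
    """Hypothetical in-memory stand-in for one storage-service replica."""

    def __init__(self, healthy=True):
        self.healthy = healthy
        self.blobs = {}

    def put(self, blob_id, data):
        if not self.healthy:
            return False
        self.blobs[blob_id] = data
        return True


def put_blob(replicas, data, required_acks=2):
    """Write to every replica and succeed if at least required_acks accept."""
    blob_id = uuid.uuid4().hex  # assumption: real blob ids encode more than this
    acks = sum(1 for replica in replicas if replica.put(blob_id, data))
    if acks < required_acks:
        raise IOError(f"only {acks} of {len(replicas)} replicas acknowledged")
    return blob_id


replicas = [Replica(), Replica(), Replica(healthy=False)]
print(put_blob(replicas, b"profile-photo-bytes"))
```

With one replica down the write still succeeds because two acknowledgements arrive; with two down it raises.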
![Page 31: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/31.jpg)
Ambry - GET Operation
[Diagram: Client, HTTP service, Routing service, and three Storage Services, with arrows numbered 1 to 5 for the following steps.]
1. GET blob_id
2. Choose partition based on blob_id
3. Read from 2 replicas
4. Wait for at least 1 node to respond successfully
5. Reply data to client
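A minimal sketch of the GET steps above, assuming each replica is just a dictionary from blob id to bytes. The function name `get_blob` and the `fanout` parameter are hypothetical, and the replicas are queried one after another here rather than in parallel.

```python
def get_blob(replica_stores, blob_id, fanout=2):
    """Ask up to `fanout` replicas and return the first successful answer."""
    for store in replica_stores[:fanout]:
        data = store.get(blob_id)
        if data is not None:
            return data
    raise KeyError(blob_id)


# The first replica is missing the blob, the second one has it.
stores = [{}, {"b1": b"payload"}, {"b1": b"payload"}]
print(get_blob(stores, "b1"))
```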
![Page 32: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/32.jpg)
Ambry - Large Blob PUT Operation
[Diagram: the Client sends a large blob to the Routing service, which splits it into blob1, blob2, …, blobN and stores the chunks on Storage Services. A meta blob listing blob_id1, blob_id2, …, blob_idN is stored as well, and a blob id is returned to the client. Steps are numbered 1, 2, 3, …, N+1, N+2, N+3.]
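The chunk-plus-meta-blob idea above can be sketched with a dictionary standing in for the storage layer. The function names, the UUID-based ids, and storing the meta blob as a plain list are assumptions for illustration only.

```python
import uuid


def put_large_blob(store, data, chunk_size):
    """Split data into chunks, store each one, then store a meta blob of chunk ids."""
    chunk_ids = []
    for start in range(0, len(data), chunk_size):
        chunk_id = uuid.uuid4().hex
        store[chunk_id] = data[start:start + chunk_size]
        chunk_ids.append(chunk_id)
    meta_id = uuid.uuid4().hex
    store[meta_id] = chunk_ids  # the meta blob only lists the chunk ids, in order
    return meta_id              # this is the id handed back to the client


def get_large_blob(store, meta_id):
    """Read the meta blob, then fetch and concatenate the chunks."""
    return b"".join(store[chunk_id] for chunk_id in store[meta_id])


store = {}
meta_id = put_large_blob(store, b"0123456789", chunk_size=4)
print(get_large_blob(store, meta_id))
```

A 10-byte blob with 4-byte chunks produces 3 data chunks plus 1 meta blob, so the store holds N+1 entries for N chunks.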
![Page 33: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/33.jpg)
Replication
• Multi-master replication
• Asynchronous
• Pull based
• Intra-colo and cross-colo replication
![Page 34: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/34.jpg)
Ambry – Replication
Log: …, BlobId:50 at offset 640, BlobId:30 at 700, BlobId:70 at 770, BlobId:40 at 850; the log ends at offset 900.
Journal:
Offset  Blob id
640     50
700     30
770     70
850     40
![Page 35: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/35.jpg)
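Ambry – Replication
Node 1 log: …, BlobId:50 (640), BlobId:30 (700), BlobId:70 (770), BlobId:40 (850); end offset 900.
Node 2 log: …, BlobId:50 (640), BlobId:30 (700), BlobId:80 (770), BlobId:90 (890); end offset 940.
Exchange between Node 1 and Node 2:
• Request: blobs from offset 700
• Response: blob ids 30, 80, 90
• Request: get blobs 80, 90
• Response: blob data 80, 90
Result: BlobId:80 and BlobId:90 are appended to Node 1's log.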
![Page 36: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/36.jpg)
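Ambry – Replication
Node1 log: …, BlobId:50 (640), BlobId:30 (700), BlobId:70 (770), BlobId:40 (850), followed by the replicated BlobId:80 and BlobId:90.
Node2 log: …, BlobId:50 (640), BlobId:30 (700), BlobId:80 (770), BlobId:90 (890); end offset 940.
Exchange between Node2 and Node1:
• Request: blobs from offset 700
• Response: blob ids 30, 70, 40, 80, 90
• Request: get blobs 70, 40
• Response: blob data 70, 40
Result: BlobId:70 and BlobId:40 are appended to Node2's log.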
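The two exchanges above follow the same pull pattern: ask the peer for blob ids written since some point, work out which ones are missing locally, then fetch only those. The sketch below simplifies the journal to a list position instead of a byte offset; the `Node` class and its method names are hypothetical.

```python
class Node:
    """Hypothetical replica holding an append-only log of (blob_id, data)."""

    def __init__(self, entries):
        self.log = list(entries)

    def ids_since(self, position):
        """Journal lookup: blob ids written at or after a log position."""
        return [blob_id for blob_id, _ in self.log[position:]]

    def read(self, blob_ids):
        wanted = set(blob_ids)
        return [(b, d) for b, d in self.log if b in wanted]

    def pull_from(self, peer, position):
        """Fetch from the peer only the blobs this node does not have yet."""
        have = {blob_id for blob_id, _ in self.log}
        missing = [b for b in peer.ids_since(position) if b not in have]
        self.log.extend(peer.read(missing))
        return missing


node1 = Node([(50, b"a"), (30, b"b"), (70, b"c"), (40, b"d")])
node2 = Node([(50, b"a"), (30, b"b"), (80, b"e"), (90, b"f")])
print(node1.pull_from(node2, 1))  # position 1 is blob 30, offset 700 on the slide
print(node2.pull_from(node1, 1))
```

After both pulls each node holds all six blobs, although in a different log order, which is acceptable for multi-master replication of immutable blobs.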
![Page 37: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/37.jpg)
Venice: Derived Data Platform
LinkedIn Nearline + Offline Data Solution
![Page 38: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/38.jpg)
Kinds of Data
Primary Data
• Source of Truth
• Example use case: Profile
• Example systems: SQL, Document Stores, K-V Stores
Derived Data
• Derived from computing primary data
• Example use case: People You May Know
• Example systems: Search Indices, Graph Databases, K-V Stores
![Page 39: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/39.jpg)
Derived Data Lifecycle Today
Apps
Events Buffer
Offline Storage
Batch Jobs
Online Storage
Stream Processing
![Page 40: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/40.jpg)
Lambda Architecture + Venice
Stream Processing
Batch Processing
App
Kafka
Hadoop
![Page 41: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/41.jpg)
Venice Features
• Dataset versioning
• High throughput ingestion from Hadoop and Samza
• Automatic cluster management
• Multi-DC, Multi-Cluster, Multi-Tenant
• Run as a service
![Page 42: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/42.jpg)
Venice Data Model
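[Diagram: Store A holds versions V1, V2 and Version 3. A version is split into partitions (Partition 1, Partition 2, …), and a partition has replicas R1, R2, R3. Store B, … are further stores.]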
• Store
• Version
• Partition
• Replica
• Record
• Avro
![Page 43: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/43.jpg)
Venice Components
[Diagram: Client, Router, Storage Node, Controller, Samza, and a Hadoop Push Job, connected by three kinds of arrows: push data flow, metadata operation, and read data flow.]
![Page 44: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/44.jpg)
Venice Batch Mode
[Diagram with steps numbered 1 to 8: an Azkaban job on Hadoop (Map, Reduce), Venice Controllers, a Kafka cluster, Storage Nodes, and a Venice Router.]
![Page 45: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/45.jpg)
Venice Version Swapping
[Diagram with three columns: Data Source, Kafka Topics, Venice Processes. A Hadoop Push Job is the data source; store versions v6, v7 and v8 are shown along with the Router.]
![Page 46: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/46.jpg)
Venice Version Swapping
[Diagram with three columns: Data Source, Kafka Topics, Venice Processes. A Hadoop Push Job is the data source; store versions v6, v7 and v8 are shown along with the Router.]
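Version swapping can be sketched as follows: a push fills a new version that readers cannot see yet, and a single pointer flip makes it current. The `VersionedStore` class and its methods are hypothetical and ignore Kafka, partitions, and replicas; the version numbers 7 and 8 echo the diagram.

```python
class VersionedStore:
    """Hypothetical store where reads always go to one 'current' version."""

    def __init__(self):
        self.versions = {}
        self.current = None

    def push(self, version, records):
        # The new version is fully loaded before any reader can see it.
        self.versions[version] = dict(records)

    def swap(self, version):
        # Switching the pointer makes the whole dataset visible at once.
        self.current = version

    def get(self, key):
        return self.versions[self.current].get(key)


store = VersionedStore()
store.push(7, {"member:1": "old recommendations"})
store.swap(7)
store.push(8, {"member:1": "new recommendations"})
print(store.get("member:1"))  # still served from v7 while v8 is loading
store.swap(8)
print(store.get("member:1"))  # now served from v8
```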
![Page 47: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/47.jpg)
Venice Hybrid Mode
Goals
• Merge batch and streaming data
• Minimize application complexity
• Multi-version support
Write-time merge
• Hadoop writes into store-version topics
• Samza writes into a Real-Time Buffer topic (RTB)
• The RTB gets replayed into store-version topics
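A minimal sketch of write-time merge, assuming topics are plain lists of (key, value) records. The function name `build_version` is hypothetical, and the whole real-time buffer is replayed here, whereas a real system would presumably replay only from some rewind point.

```python
def build_version(batch_records, rtb_records):
    """Replay the batch push, then the real-time buffer, into one key-value view."""
    # The store-version topic receives the Hadoop records first,
    # followed by the records replayed from the real-time buffer.
    version_topic = list(batch_records) + list(rtb_records)
    state = {}
    for key, value in version_topic:
        state[key] = value  # a later write to the same key overwrites the earlier one
    return state


batch = [("a", 1), ("b", 2)]  # written by Hadoop
rtb = [("b", 3), ("c", 4)]    # written by Samza
print(build_version(batch, rtb))
```

Because the merge happens when data is written, a reader does a single lookup and never has to combine batch and streaming results itself.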
![Page 48: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/48.jpg)
Venice Hybrid Mode
[Diagram with three columns: Data Sources, Kafka Topics, Venice Processes. Samza and a Hadoop Push Job are the data sources; store versions v7 and v8 are shown along with the Router.]
![Page 49: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/49.jpg)
Venice Hybrid Mode
[Diagram with three columns: Data Sources, Kafka Topics, Venice Processes. Samza and Hadoop are the data sources; store versions v7 and v8 are shown along with the Router.]
![Page 50: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/50.jpg)
Summary
Espresso
• Document store
• Online data
• Get/Put/Transactional
• Expansion and failover
Ambry
• Object store
• Online immutable data
• Get/Put/Large blob PUT
• Multi-master replication
Venice
• K-V store
• Derived data
• Get/Push
• Batch + streaming
![Page 51: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/51.jpg)
Learn more: engineering.linkedin.com/blog
![Page 52: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/52.jpg)
Backup slides
![Page 53: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/53.jpg)
Online Data
• Member Profile Update
• Post to a Group
• Social Gestures (Comment/Like/Share)
![Page 54: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/54.jpg)
Nearline Data
• Standardization
• Search Index Update
• Network Update Stream
![Page 55: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/55.jpg)
Offline Data
• People You May Know
• Who Viewed My Profile
• Jobs You May Be Interested In
![Page 56: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/56.jpg)
Why Espresso
Oracle
• Difficult/expensive to run at Internet scale
• Structured data schema
• Strong consistency support
Voldemort
• Simpler data model (K,V)
• Write availability
• Eventual consistency
• Scales well and cheaply
![Page 57: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/57.jpg)
Espresso - Architecture
![Page 58: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/58.jpg)
Espresso – Cross-DC Replication
[Diagram: Datacenter 1 and Datacenter 2 have the same layout. Each shows Clients, a Router, three Storage Nodes, a Kafka cluster, and a Data Replicator.]
![Page 59: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/59.jpg)
Cross-DC Replication
• Boomerang elimination
• Conflict resolution
  • Last write wins
• Unique id generation
  • User-selectable options
• Data consistency checker
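Last-write-wins conflict resolution can be sketched as picking the record with the newest timestamp. The record layout (timestamp, origin data center, value) and the tie-break on data center name are assumptions; the slide does not say how ties are handled.

```python
def last_write_wins(local, remote):
    """Pick the winning record; each record is (timestamp, origin_dc, value)."""
    # Assumed tie-break: on equal timestamps, order by data center name so
    # that every data center reaches the same decision.
    return max(local, remote, key=lambda record: (record[0], record[1]))


a = (1700000010, "dc1", {"headline": "Engineer"})
b = (1700000012, "dc2", {"headline": "Staff Engineer"})
print(last_write_wins(a, b))
```

The result is the same whichever side is passed as `local`, which is what lets two data centers converge without coordinating.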
![Page 60: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/60.jpg)
Ambry – Data Distribution
Node1: P1 P2 P3
Node2: P3 P1 P2
Node3: P2 P3 P1
Partition  Status
1          Read-write
2          Read-write
3          Read-write
![Page 61: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/61.jpg)
Ambry – Cluster Expansion
Node1: P1 P2 P3
Node2: P3 P1 P2
Node3: P2 P3 P1
Node4: P4 P5 P6
Node5: P6 P4 P5
Node6: P5 P6 P4
Partition  Status
1          Read-only
2          Read-only
3          Read-write
4          Read-write
5          Read-write
6          Read-write
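Under this layout a PUT can only land on a read-write partition, so new nodes absorb new writes while full partitions keep serving reads. The sketch below assumes the router picks uniformly at random among writable partitions; the function name and the selection policy are not taken from Ambry.

```python
import random


def choose_partition(partition_status, rng=random):
    """Pick a partition for a new blob among those still accepting writes."""
    writable = [p for p, status in partition_status.items() if status == "Read-write"]
    if not writable:
        raise RuntimeError("no writable partition available")
    return rng.choice(writable)  # assumed policy: uniform random choice


status = {
    1: "Read-only", 2: "Read-only",
    3: "Read-write", 4: "Read-write", 5: "Read-write", 6: "Read-write",
}
print(choose_partition(status))
```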
![Page 62: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/62.jpg)
Ambry – Storage Layout
Log (400GB): …, BlobId:50 at offset 640, BlobId:30 at 700, BlobId:70 at 770, BlobId:40 at 850; log end offset 900.
Index: Index segment 1, Index segment 2, Index segment 3. Entries within a segment are sorted by blob id.
Index segment 3 (start offset: 700, the start offset in the current index segment; end offset: 900):
blob id  offset  TTL
id 30    700     ∞
id 40    850     1/1/16
id 70    770     ∞
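Because entries in an index segment are sorted by blob id, finding a blob's log offset can be a binary search. The sketch below uses the three entries from index segment 3 and leaves out the TTL column; the function name `find_offset` and the tuple layout are assumptions.

```python
import bisect

# (blob id, log offset) pairs from index segment 3, sorted by blob id.
segment3 = [(30, 700), (40, 850), (70, 770)]


def find_offset(segment, blob_id):
    """Binary-search a sorted index segment; return the log offset or None."""
    # (blob_id,) sorts just before any (blob_id, offset) pair with the same id.
    i = bisect.bisect_left(segment, (blob_id,))
    if i < len(segment) and segment[i][0] == blob_id:
        return segment[i][1]
    return None


print(find_offset(segment3, 70))
print(find_offset(segment3, 50))  # blob 50 is not in this segment
```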
![Page 63: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/63.jpg)
Storage Optimization
• O(1) I/O for writes
• Bloom filter for index segments
• Rely on OS page cache
• Zero copy for gets
![Page 64: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/64.jpg)
Venice Read/Write API
• Derived data K-V store
  • Single Get
  • Batch Get
• High throughput ingestion from:
  • Hadoop
  • Samza
  • Or both (hybrid)
![Page 65: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/65.jpg)
Venice Scale
• Large scale
  • Multi-Datacenter
  • Multi-Cluster
• Run “as a service”
  • Self-service onboarding
  • Each cluster is multi-tenant
  • Resource isolation
![Page 66: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/66.jpg)
Venice Tradeoffs
All writes go through Kafka
• Scalable
• Burst tolerant
• Asynchronous
• No native “read your writes” semantics
![Page 67: Data Storage Infra at LinkedIn](https://reader035.fdocuments.us/reader035/viewer/2022081419/62a0bc8d3653f628202199e9/html5/thumbnails/67.jpg)
Global Replication
Push Job
Controller
Hadoop
Mirror Maker
Parent Controller
Datacenter Boundary
Storage Nodes
Mirror Maker
…
Mirror Maker
…