CouchbasetoHadoop_Matt_Michael_Justin v4
Couchbase to Hadoop at LinkedIn
Kafka is Enabling the Big Data Pipeline
Agenda
• Define Problem Domain: Justin Michaels | Solution Architect, Couchbase
• Use case at LinkedIn: Michael Kehoe | Site Reliability Engineer, LinkedIn
• Supporting Technology Overview and Demo: Matt Ingenthron | Senior Director, Couchbase
• Q&A
Lambda Architecture
[Figure: Lambda Architecture diagram with numbered stages 1 to 5; labels: DATA, BATCH, SPEED, SERVE, QUERY]
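Lambda Architecture
Interactive and Real Time Applications
[Figure: Lambda Architecture diagram with numbered stages 1 to 5, annotated with technologies; labels: DATA, BATCH, SPEED, SERVE, QUERY, HADOOP, COUCHBASE, STORM, COUCHBASE, Broker Cluster, Spout for Topic, Kafka Producers, Ordered Subscriptions]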
• Hadoop … an open-source framework written for distributed storage and distributed processing of very large data sets on commodity hardware
• Kafka … append only write-ahead log that records messages to a persistent store and allows subscribers to read and apply these changes to their own stores in an appropriate time-frame
• Storm … distributed framework that uses custom created "spouts" and "bolts" to define information sources and manipulations for processing of streaming data
• Couchbase … an open source, distributed NoSQL document-oriented database that is optimized for interactive applications with an integrated data cache and incremental map reduce facility
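The Kafka description above (an append-only log that subscribers read at their own pace and apply to their own stores) can be illustrated with a minimal in-memory sketch. This is not Kafka's API: the class names `AppendOnlyLog` and `Subscriber`, the method names, and the sample messages are all hypothetical, chosen only to show the idea of independent per-subscriber offsets.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyLog:
    """Toy stand-in for one log partition (hypothetical, not Kafka's API)."""
    records: list = field(default_factory=list)

    def append(self, message):
        # Records are only ever added at the end; the offset is returned.
        self.records.append(message)
        return len(self.records) - 1

    def read_from(self, offset, max_records=100):
        # Reading never removes data, so each subscriber can read when it likes.
        return self.records[offset:offset + max_records]


@dataclass
class Subscriber:
    """Tracks its own offset and applies changes to its own store."""
    log: AppendOnlyLog
    offset: int = 0
    store: dict = field(default_factory=dict)

    def poll(self, max_records=100):
        batch = self.log.read_from(self.offset, max_records)
        for key, value in batch:
            self.store[key] = value  # apply the change locally
        self.offset += len(batch)
        return len(batch)


log = AppendOnlyLog()
log.append(("member:1", "viewed profile"))
log.append(("member:2", "clicked job"))
log.append(("member:1", "clicked job"))

fast = Subscriber(log)
slow = Subscriber(log)
fast.poll()                # reads everything available
slow.poll(max_records=1)   # lags behind, catches up on a later poll
```

Because the log itself is never mutated by readers, the slow subscriber reaches the same state as the fast one simply by polling again from its saved offset.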
[Figure: pipeline diagram; labels: COMPLEX EVENT PROCESSING, Real Time REPOSITORY, PERPETUAL STORE, ANALYTICAL DB, BUSINESS INTELLIGENCE, MONITORING, CHAT/VOICE SYSTEM, BATCH TRACK, REAL-TIME TRACK, DASHBOARD, TRACKING and COLLECTION, ANALYSIS AND VISUALIZATION, REST, FILTER, METRICS]
Use Case at LinkedIn
Michael Kehoe
• Site Reliability Engineer (SRE) at LinkedIn
• SRE for Profile & Higher-Education
• Member of LinkedIn’s CBVT
• B.E. (Electrical Engineering) from the University of Queensland, Australia
Kafka @ LinkedIn
• Kafka was created by LinkedIn
• Kafka is a publish-subscribe system built as a distributed commit log
• Processes 500+ TB/day (~500 billion messages) @ LinkedIn
LinkedIn’s uses of Kafka
• Monitoring
  • InGraphs
• Traditional Messaging (Pub-Sub)
• Analytics
  • Who Viewed my Profile
  • Experiment reports
  • Executive reports
• Building block for (log) distributed applications
  • Pinot
  • Espresso
Use Case: Kafka to Hadoop (Analytics)
• LinkedIn tracks data to better understand how members use our products
• Information such as which page was viewed and which content was clicked on is sent into a Kafka cluster in each data center
• Some of these events are centrally collected and pushed onto our Hadoop grid for analysis and daily report generation
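The flow described above (tracking events land in a cluster per data center, a subset is collected centrally, and daily reports are generated) can be sketched in a few lines. Everything here is an assumption made for illustration: the event fields, the data center names, the sample dates, and the helper names `collect` and `daily_report`. It does not represent LinkedIn's actual schema or tooling.

```python
from collections import Counter

# Hypothetical tracking events, grouped by the data center that received them.
events_by_datacenter = {
    "dc-a": [
        {"type": "page_view", "page": "profile", "day": "day-1"},
        {"type": "click", "page": "jobs", "day": "day-1"},
    ],
    "dc-b": [
        {"type": "page_view", "page": "profile", "day": "day-1"},
        {"type": "page_view", "page": "jobs", "day": "day-2"},
    ],
}


def collect(events_by_dc, wanted_types):
    """Centrally gather only the event types selected for analysis."""
    collected = []
    for datacenter, events in sorted(events_by_dc.items()):
        for event in events:
            if event["type"] in wanted_types:
                collected.append({**event, "datacenter": datacenter})
    return collected


def daily_report(events):
    """Count events per (day, page) across all data centers."""
    return Counter((e["day"], e["page"]) for e in events)


central = collect(events_by_datacenter, wanted_types={"page_view"})
report = daily_report(central)
```

Filtering at collection time mirrors the point that only some events are pushed to the central grid; the counting step stands in for the batch job that would run there.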
Couchbase @ LinkedIn
• About 25 separate services with one or more clusters in multiple data centers
• Up to 100 servers in a cluster
• Single and Multi-tenant clusters
Use Case: Jobs Cluster
• Read scaling, Couchbase ~80k QPS, 24 server cluster(s)
• Hadoop to pre-build data by partition
• Couchbase 99 percentile latencies
Hadoop to Couchbase
• Our primary use case for Hadoop to Couchbase is building (warming) and recovering Couchbase buckets
• LinkedIn built its own in-house solution to work with our ETL processes, cache invalidation procedures, etc.
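The idea of pre-building data by partition and then warming a bucket from it can be sketched as follows. This is a toy illustration under stated assumptions, not LinkedIn's in-house solution and not Couchbase's own key-to-partition mapping: the CRC32-modulo scheme, the function names `partition_for`, `prebuild`, and `warm`, the sample records, and the plain `dict` standing in for a bucket are all hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    # Assumption: a stable hash modulo the partition count, for illustration only.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def prebuild(records, num_partitions):
    """Batch step: group (key, value) records into per-partition lists."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def warm(bucket, partition):
    """Load one pre-built partition into the bucket (a dict stands in here)."""
    for key, value in partition:
        bucket[key] = value
    return len(partition)


# Hypothetical sample data.
records = [(f"job:{i}", {"title": f"Job {i}"}) for i in range(10)]
partitions = prebuild(records, num_partitions=4)

bucket = {}
loaded = sum(warm(bucket, p) for p in partitions)
```

Because each partition is self-contained, the same pre-built output could be replayed to recover a bucket as well as to warm an empty one, and partitions can be loaded independently of each other.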