Rebuilding Web Tracking Infrastructure for Scale
Uploaded by hadoop-summit
Stephen Oakley
Principal Engineer, Marketo
What is Marketo?
Marketo Proprietary and Confidential | © Marketo, Inc. 05/02/2023
What is Web Tracking at Marketo?
• Ingest web page visits and clicks on customers' websites
• Trigger campaigns in response to web activity
• Trigger real-time personalization of the web experience
• Provide lead-level analytics for known leads
• Provide aggregate analytics for all lead activity
• Known leads are typically < 10% of all traffic
![Page 4: Rebuilding Web Tracking Infrastructure for Scale](https://reader035.fdocuments.us/reader035/viewer/2022070516/586fde251a28ab18428b6af3/html5/thumbnails/4.jpg)
Legacy Web Tracking Infrastructure
![Page 5: Rebuilding Web Tracking Infrastructure for Scale](https://reader035.fdocuments.us/reader035/viewer/2022070516/586fde251a28ab18428b6af3/html5/thumbnails/5.jpg)
Legacy Web Tracking Infrastructure
Legacy Problems
• Throughput limitations: 2 million activities per day
• Processing delays can be on the order of hours
• Large customers cause web server brownouts
• Web reporting does not scale
• Fixed-sized clusters prohibit horizontal scaling
• Brittle infrastructure prevents feature development
The Vision
Orion Initiative
• Increase scale to support IoT for Marketers
• Support billions of marketing activities each day
• Trigger on activities in near real time (< 2 minutes at the 99th percentile)
• Reduce operational costs
• Improve multitenancy and QoS
Requirements
Business Requirements
• 200 MM activities per customer per day
• Near real-time web activity processing (SLA of < 1 minute lag)
• Improve cost efficiency
• Improve flexibility for feature enhancements
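
To put the volume target in perspective, the daily figures can be converted to average per-second rates. This is only a back-of-envelope sketch: the 2 million/day and 200 MM/day figures come from the slides, while the peak factor of 5 is an illustrative assumption and not a number from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(activities_per_day: int) -> float:
    """Average sustained rate implied by a daily activity count."""
    return activities_per_day / SECONDS_PER_DAY


legacy = average_rate_per_second(2_000_000)    # legacy throughput limit
target = average_rate_per_second(200_000_000)  # per-customer business requirement

# Hypothetical peak-to-average ratio, used only to illustrate burst sizing.
ASSUMED_PEAK_FACTOR = 5

print(f"legacy average: {legacy:.1f} activities/s")
print(f"target average: {target:.1f} activities/s per customer")
print(f"target at assumed {ASSUMED_PEAK_FACTOR}x peak: "
      f"{target * ASSUMED_PEAK_FACTOR:.0f} activities/s")
```

The per-customer target works out to roughly 2,300 activities per second on average, about 100 times the legacy limit.
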
Technical Requirements
• Multitenancy support with brownout protections
• Infrastructure must scale horizontally
• Decouple web processing from downstream processing
• Anonymous leads should cost next to nothing to track
Architecture & Design
![Page 13: Rebuilding Web Tracking Infrastructure for Scale](https://reader035.fdocuments.us/reader035/viewer/2022070516/586fde251a28ab18428b6af3/html5/thumbnails/13.jpg)
![Page 14: Rebuilding Web Tracking Infrastructure for Scale](https://reader035.fdocuments.us/reader035/viewer/2022070516/586fde251a28ab18428b6af3/html5/thumbnails/14.jpg)
Why HBase + Phoenix?
• Horizontally scalable
• Leverages the Hadoop cluster for storage and scaling
• Provides secondary indices for query patterns through Phoenix
• Natural integration with JDBC and Spark JDBC RDDs
![Page 16: Rebuilding Web Tracking Infrastructure for Scale](https://reader035.fdocuments.us/reader035/viewer/2022070516/586fde251a28ab18428b6af3/html5/thumbnails/16.jpg)
![Page 17: Rebuilding Web Tracking Infrastructure for Scale](https://reader035.fdocuments.us/reader035/viewer/2022070516/586fde251a28ab18428b6af3/html5/thumbnails/17.jpg)
Why Spark Streaming?
• Micro-batching provides sink-side efficiencies
• This is especially important with MySQL touchpoints
• Great integration with Kafka
• No strict real-time processing requirements
• Great community and industry adoption
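
The sink-side benefit of micro-batching can be illustrated with a small sketch: instead of one statement and one commit per event, each micro-batch is written with a single bulk statement. This is not Marketo's code. It uses Python's built-in `sqlite3` as a stand-in for the MySQL sink, and the `activity` table, the event shape, and the batch size of 500 are all hypothetical.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Activity = Tuple[str, str, str]  # (customer_id, lead_id, url): hypothetical shape


def micro_batches(events: Iterable[Activity], size: int) -> Iterator[List[Activity]]:
    """Group a stream of events into lists of at most `size` items."""
    it = iter(events)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def write_batched(conn: sqlite3.Connection, events: Iterable[Activity], size: int = 500) -> int:
    """Write events with one bulk statement and one commit per micro-batch."""
    commits = 0
    for batch in micro_batches(events, size):
        conn.executemany("INSERT INTO activity VALUES (?, ?, ?)", batch)
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")  # stand-in for the real MySQL sink
conn.execute("CREATE TABLE activity (customer_id TEXT, lead_id TEXT, url TEXT)")

events = [("acme", f"lead-{i}", "/pricing") for i in range(1200)]
commits = write_batched(conn, events, size=500)
row_count = conn.execute("SELECT COUNT(*) FROM activity").fetchone()[0]
print(f"{row_count} rows written in {commits} commits")
```

Here 1,200 events reach the database in three commits rather than 1,200, which is the kind of saving that matters most when the sink is a relational database.
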
Multitenancy
• One topic per customer (sized by volume)
• Traffic storms are isolated to a single customer
• Fairness/throttling is easy to control
• Spark Streaming job consumes from many topics
• Allows us to turn a customer off under error conditions
• See “Elastic Streaming” by Neelesh Shastry, Spark Summit
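
A rough sketch of why one topic per customer makes fairness and isolation simple: each batch takes at most a fixed quota from every enabled topic, and a misbehaving customer can be skipped entirely. This is a plain-Python simulation with in-memory queues rather than Kafka; the topic naming scheme, the quota, and the `disabled` set are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple


def topic_for(customer_id: str) -> str:
    """Hypothetical naming scheme: one topic per customer."""
    return f"web-activity-{customer_id}"


def drain_fairly(
    topics: Dict[str, Deque[str]],
    per_topic_quota: int,
    disabled: Set[str],
) -> List[Tuple[str, str]]:
    """Take at most `per_topic_quota` events from each enabled topic for one batch."""
    batch: List[Tuple[str, str]] = []
    for topic, queue in topics.items():
        if topic in disabled:
            continue  # customer switched off; its backlog stays queued
        for _ in range(min(per_topic_quota, len(queue))):
            batch.append((topic, queue.popleft()))
    return batch


topics = {
    topic_for("small"): deque(f"s{i}" for i in range(3)),
    topic_for("stormy"): deque(f"x{i}" for i in range(1000)),  # traffic storm
    topic_for("broken"): deque(f"b{i}" for i in range(10)),    # in an error state
}

batch = drain_fairly(topics, per_topic_quota=100, disabled={topic_for("broken")})
print(len(batch), len(topics[topic_for("stormy")]), len(topics[topic_for("broken")]))
```

The storm only delays the customer that caused it: the small customer's three events are all processed in the same batch, while the disabled customer's backlog waits untouched until it is re-enabled.
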
Making Spark Streaming Performant
• Coalesce small partitions for the same customer
• Aggressive caching of metadata (mostly from MySQL)
• Heavily leverage Scala future composition for parallelism
• Persist RDDs that are used for multiple outputs (e.g. writing to both Kafka and the Activity Service)
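
Two of these ideas, metadata caching and running independent outputs in parallel, can be sketched outside Spark. The example below uses Python's `functools.lru_cache` and `concurrent.futures` as stand-ins for the metadata cache and for Scala future composition; it is not the talk's implementation, and the function names, the metadata format, and the placeholder sinks are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache
from typing import Dict, List

METADATA_LOOKUPS = 0  # counts simulated round trips to the metadata store


@lru_cache(maxsize=10_000)
def customer_metadata(customer_id: str) -> str:
    """Simulated metadata lookup (MySQL in the talk); cached after the first call."""
    global METADATA_LOOKUPS
    METADATA_LOOKUPS += 1
    return f"config-for-{customer_id}"


def enrich(event: Dict[str, str]) -> Dict[str, str]:
    """Attach cached customer metadata to an event."""
    return {**event, "config": customer_metadata(event["customer"])}


def write_to_kafka(batch: List[Dict[str, str]]) -> int:
    return len(batch)  # placeholder sink: pretend every event was written


def write_to_activity_service(batch: List[Dict[str, str]]) -> int:
    return len(batch)  # placeholder sink


events = [{"customer": "acme" if i % 2 else "globex", "url": "/"} for i in range(100)]

# Computed once and reused by both sinks, analogous to persisting an RDD.
enriched = [enrich(e) for e in events]

# The two outputs are independent, so they can run concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    kafka_future = pool.submit(write_to_kafka, enriched)
    service_future = pool.submit(write_to_activity_service, enriched)
    written = (kafka_future.result(), service_future.result())

print(written, METADATA_LOOKUPS)
```

With 100 events from two customers, the metadata store is consulted only twice, and the enrichment work is done once even though its result feeds two outputs.
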
Making Anonymous Traffic Cheap
• High costs of web traffic in legacy system
• MySQL storage for all traffic
• Downstream processing of all events (even anonymous)
• V2 only processes and stores known traffic in MySQL
• Defer triggering for anonymous data until promotion
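
One way to picture deferred triggering is a tracker that only buffers anonymous visits and runs trigger logic once the visitor is promoted to a known lead. The slides state only that triggering is deferred until promotion; replaying the buffered history at promotion time, the cookie-based identity, and the class and method names below are assumptions made for this sketch.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple


class DeferredTriggerTracker:
    """Buffers anonymous activity cheaply; triggers only for known leads."""

    def __init__(self, trigger: Callable[[str, str], None]) -> None:
        self._trigger = trigger
        self._known: Dict[str, str] = {}  # cookie -> lead id
        self._anonymous: DefaultDict[str, List[str]] = defaultdict(list)

    def record(self, cookie: str, url: str) -> None:
        lead_id = self._known.get(cookie)
        if lead_id is None:
            self._anonymous[cookie].append(url)  # cheap path: store only
        else:
            self._trigger(lead_id, url)          # known lead: process now

    def promote(self, cookie: str, lead_id: str) -> None:
        """Mark a visitor as known and replay the buffered visits (assumed behavior)."""
        self._known[cookie] = lead_id
        for url in self._anonymous.pop(cookie, []):
            self._trigger(lead_id, url)


fired: List[Tuple[str, str]] = []
tracker = DeferredTriggerTracker(lambda lead, url: fired.append((lead, url)))

tracker.record("c1", "/home")
tracker.record("c1", "/pricing")
fired_before_promotion = list(fired)  # nothing has triggered yet

tracker.promote("c1", "lead-42")
tracker.record("c1", "/contact")
print(fired_before_promotion, fired)
```

Since known leads are typically under 10% of traffic, most events would take the store-only path in a design like this and never reach the trigger logic.
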
Impact and Results
• Rolled out to our highest volume customers
• Processing latencies < 30s (at 99.9th %)
• Allowed key customers to scale from ~2 MM/day to > 20 MM/day
Future Work
• Mitigations of straggler effects on processing delays
• Adding sessionization for web reporting
• Scaling Kafka topics as customers increase volume
• Globally distributed ingestion for a single customer
We’re Hiring! Http://Marketo.Jobs
Q & A