RealTime AdTech reporting & targeting with Apache Apex
Every ad. Every sales channel. Every screen. One platform.
Ashish Tadose, Senior Data Architect @ PubMatic, Apache Apex committer
RealTime AdTech Reporting & Targeting with Apache Apex
Agenda
• About PubMatic
• Reporting & Targeting use cases
• Apache Apex overview
• PubMatic's Streaming use cases with Apache Apex
• Comparing Apache Apex
• Roadmap
Proprietary and Confidential
About PubMatic
• PubMatic is a leading marketing automation software company for publishers.
• Through real-time analytics, yield management, and workflow automation, PubMatic enables publishers to make smarter inventory decisions and improve revenue performance.
• Drives innovation in AdTech
Scale That Drives Results
• 40B+ ad impressions served daily
• 350B+ bids processed daily
• 50TB data processed daily
• 10PB data under management
• 6 data centers across geographies
Use cases
Reporting & Analytics Platform:
• General Analytics Dashboard
• Guaranteed Ad Server - Pacing engine
• Ad Serving - Impression capping report - Floor recommendation
• Inventory Discovery
• Machine Learning
• Ad-hoc Reports
• Audience Reports
• Brand Control Reporting
Lambda Architecture - Velocity & Volume
• Data
• Data sink
  - Batch, e.g. Hadoop: batch write, random read
  - Real time, e.g. Storm: random read & write
• Query & Merge
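The "Query & Merge" step can be illustrated with a minimal sketch. It assumes that both the batch layer and the real-time layer expose per-key counters as plain dictionaries; the function name `merge_views` and the campaign keys are hypothetical and do not come from the slides.

```python
from collections import Counter


def merge_views(batch_view, realtime_view):
    """Answer a query by summing per-key counters from both layers.

    batch_view: totals computed by the batch layer (complete but stale).
    realtime_view: totals for events that arrived after the last batch run.
    """
    merged = Counter(batch_view)
    merged.update(realtime_view)  # Counter.update adds counts, it does not overwrite
    return dict(merged)


# Hypothetical impression counts per campaign.
batch_view = {"campaign_a": 1000, "campaign_b": 250}
realtime_view = {"campaign_a": 40, "campaign_c": 7}
print(merge_views(batch_view, realtime_view))
# {'campaign_a': 1040, 'campaign_b': 250, 'campaign_c': 7}
```

The sketch only works when the two views cover disjoint time ranges; otherwise events would be counted twice.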
Use cases for a real-time solution
• Real-time reporting - Reporting of critical metrics around campaign monetization
  - Revenue, impression & click info
  - Aggregate counters & reporting on top-N metrics
  - Low-latency querying using Kafka in pub-sub model
• Real-time monitoring
  - Alerts on deal tracking & monetization
  - Campaign & deal health
• Real-time learning
  - Using the lost-bid insights for price recommendations
• Allocation engine
  - Feedback to ad serving for guaranteed delivery & line item pacing
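As an illustration of "aggregate counters & reporting on top-N metrics", here is a minimal in-memory sketch. The class `CampaignCounters`, the metric names, and the choice to store revenue as integer micro-units are all assumptions made for the example, not details from the talk.

```python
import heapq
from collections import defaultdict


class CampaignCounters:
    """Running per-campaign totals (hypothetical metric names)."""

    def __init__(self):
        self._totals = defaultdict(
            lambda: {"impressions": 0, "clicks": 0, "revenue_micros": 0}
        )

    def record(self, campaign, impressions=0, clicks=0, revenue_micros=0):
        """Add one event's contribution to the campaign's running totals."""
        totals = self._totals[campaign]
        totals["impressions"] += impressions
        totals["clicks"] += clicks
        totals["revenue_micros"] += revenue_micros

    def top_n(self, metric, n):
        """Return the n campaigns with the largest value of the given metric."""
        ranked = heapq.nlargest(
            n, self._totals.items(), key=lambda item: item[1][metric]
        )
        return [(campaign, totals[metric]) for campaign, totals in ranked]


counters = CampaignCounters()
counters.record("campaign_a", impressions=3, clicks=1, revenue_micros=4500)
counters.record("campaign_b", impressions=4, revenue_micros=2000)
counters.record("campaign_a", impressions=2, revenue_micros=1500)
counters.record("campaign_c", impressions=1, clicks=1, revenue_micros=9000)
print(counters.top_n("revenue_micros", 2))
# [('campaign_c', 9000), ('campaign_a', 6000)]
```

In a streaming engine the counters would be partitioned and checkpointed by the platform; this sketch keeps everything in one process to show only the aggregation logic.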
Diagram: AdServers → Kafka Cluster → RealTime reporting data processing; processing for AdServer feedback
Apache Apex Overview
PubMatic's streaming use case with Apache Apex
Apache Apex @ PubMatic
Real-time architecture (diagram)
• Sources: User Browser, AdServer
• Ingestion: REST proxies; CDN (caching of logs); Kafka clusters, on-prem and in AWS
• Log types: Auction logs, Client logs
• Apex pipeline, one per log type (Kafka Input for auction logs, Kafka Input for client logs):
  - Kafka Input: Kafka messages
  - ETL operator: decompress & flatten
  - Filter operator: filtered events
  - Dimensions Aggregator: aggregates
  - Dimensions Store: query / query results
• Query path: Middleware (MW) queries the Dimensions Store through a Kafka cluster and receives query results
• Consumers: Middleware, AdServer
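The operator chain in the diagram (decompress & flatten, filter, dimensional aggregation, query) can be sketched in plain Python. This is not Apex operator code. The message format (zlib-compressed JSON with a nested `impressions` list), the `is_test` flag, and all function and class names are assumptions made only for illustration.

```python
import json
import zlib
from collections import defaultdict


def decompress_and_flatten(message):
    """ETL step: inflate one message and emit one flat event per impression."""
    record = json.loads(zlib.decompress(message))
    for impression in record["impressions"]:
        yield {
            "publisher": record["publisher"],
            "ad_size": impression["ad_size"],
            "revenue_micros": impression["revenue_micros"],
            "is_test": impression.get("is_test", False),
        }


def is_reportable(event):
    """Filter step: drop events that should not reach reporting (assumed rule)."""
    return not event["is_test"]


class DimensionStore:
    """Aggregates events by a fixed dimension combination and answers queries."""

    def __init__(self, dimensions):
        self.dimensions = tuple(dimensions)
        self.aggregates = defaultdict(lambda: {"impressions": 0, "revenue_micros": 0})

    def add(self, event):
        key = tuple(event[name] for name in self.dimensions)
        bucket = self.aggregates[key]
        bucket["impressions"] += 1
        bucket["revenue_micros"] += event["revenue_micros"]

    def query(self, **filters):
        """Sum the aggregates whose dimension values match every filter."""
        result = {"impressions": 0, "revenue_micros": 0}
        for key, bucket in self.aggregates.items():
            row = dict(zip(self.dimensions, key))
            if all(row[name] == value for name, value in filters.items()):
                result["impressions"] += bucket["impressions"]
                result["revenue_micros"] += bucket["revenue_micros"]
        return result


def run_pipeline(messages, store):
    """Wire the steps together in the order shown in the diagram."""
    for message in messages:
        for event in decompress_and_flatten(message):
            if is_reportable(event):
                store.add(event)


# Hypothetical auction-log message.
payload = {
    "publisher": "pub_1",
    "impressions": [
        {"ad_size": "300x250", "revenue_micros": 1200},
        {"ad_size": "728x90", "revenue_micros": 800},
        {"ad_size": "300x250", "revenue_micros": 500, "is_test": True},
    ],
}
message = zlib.compress(json.dumps(payload).encode("utf-8"))

store = DimensionStore(["publisher", "ad_size"])
run_pipeline([message], store)
print(store.query(publisher="pub_1"))
# {'impressions': 2, 'revenue_micros': 2000}
```

Keeping the aggregates keyed by the full dimension combination is what lets a single store answer queries on any subset of those dimensions.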
RealTime Dashboard @ PubMatic
Comparing with other Streaming platforms
Apex vs Spark Streaming vs Flink

Release
• Apache Apex: Recently graduated from Apache incubation
• Spark Streaming: Spark Streaming Feb 2014; Spark major 1.0 release since July 2014
• Apache Flink: Graduated in 2015; major 1.0 release in March 2016

Commercial support
• Apache Apex: DataTorrent
• Spark Streaming: Databricks, Hortonworks, Cloudera, MapR
• Apache Flink: data Artisans

Companies using
• Apache Apex (http://apex.apache.org/powered-by-apex.html): GE, Capital One, Silver Spring Networks, PubMatic, ThreatMetrix, Facilities Supplies, Royal Bank of Canada, Infosys, Tech Mahindra, Mammoth Data, CloudWick, Synerzip, Trace3, LeadFerret, Target
• Spark Streaming (https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark): A large set of companies use Spark; however, only a few of them use it for Spark Streaming, as listed here: AsiaInfo, Big Industries, Baidu, Faimdata, Kelkoo, Localytics, OpenTable
• Apache Flink (https://flink.apache.org/poweredby.html): Alibaba.com, Bouygues, Capital One, Ericsson, King, Otto Group, ResearchGate, Zalando
Apex vs Spark Streaming vs Flink

Streaming model
• Apache Apex: Native - event-based stream processing
• Spark Streaming: Micro-batching
• Apache Flink: Native - event-driven stream processing

API
• Apache Apex: Declarative APIs; lower-level compositional API
• Spark Streaming: Declarative - higher-order functions; system optimizes topology itself
• Apache Flink: Declarative - higher-order functions; system optimizes topology itself

Latency
• Apache Apex: Very low
• Spark Streaming: High
• Apache Flink: Very low

Throughput
• Apache Apex: High
• Spark Streaming: High
• Apache Flink: High

Queryable in-memory aggregate store
• Apache Apex: In-memory Dimension Store operator
• Spark Streaming: No native support; can be achieved through DataFrames - not efficient
• Apache Flink: Framework-managed distributed in-memory store

Aggregate store snapshotting
• Apache Apex: HDHT - good
• Spark Streaming: No native support; can be achieved by saving DataFrames in Parquet format - not efficient
• Apache Flink: State backend holds in-flight data in the TaskManager's memory

Apache community
• Apache Apex: OK
• Spark Streaming: Good
• Apache Flink: Good
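The latency difference between micro-batching and event-at-a-time processing can be made concrete with a small calculation. The sketch below is a simplified model, not a benchmark: it assumes fixed batch boundaries and ignores processing and scheduling time, and the arrival times and the 1000 ms interval are made-up values.

```python
def micro_batch_wait_ms(arrival_ms, batch_interval_ms):
    """Time each event waits until the micro-batch containing it closes."""
    waits = []
    for arrival in arrival_ms:
        # Ceiling division: end of the batch window in which the event arrives.
        batch_end = -(-arrival // batch_interval_ms) * batch_interval_ms
        waits.append(batch_end - arrival)
    return waits


def event_at_a_time_wait_ms(arrival_ms):
    """In this model, a native streaming engine hands each event on immediately."""
    return [0 for _ in arrival_ms]


arrivals = [10, 250, 990, 1400]  # hypothetical arrival times in milliseconds
print(micro_batch_wait_ms(arrivals, batch_interval_ms=1000))
# [990, 750, 10, 600]
print(event_at_a_time_wait_ms(arrivals))
# [0, 0, 0, 0]
```

Under this model the added wait of a micro-batch engine is bounded by the batch interval, so a shorter interval lowers latency at the cost of more scheduling overhead per batch.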
Resources
• PubMatic - https://pubmatic.com/
• PubMatic blog - http://pubmaticblog.com/
• Apache Apex - http://apex.apache.org/
• Subscribe to forums
  - Apex - http://apex.apache.org/community.html
  - DataTorrent - https://groups.google.com/forum/#!forum/dt-users