![Page 1: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/1.jpg)
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Warren Paull, Solution Architect, AWS
Patricio Rocca, Chief Technology Officer, Jampp
July 2016
Getting Started with Amazon Kinesis
![Page 2: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/2.jpg)
What to expect from this session
• Streaming scenarios
• Amazon Kinesis Streams overview
• Amazon Kinesis Firehose overview
• Firehose experience for Amazon S3 and Amazon Redshift
• Jampp – Our Journey with Amazon Kinesis
![Page 3: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/3.jpg)
Streaming Data Use Cases
1. Accelerated Ingest-Transform-Load
2. Continual Metric Generation
3. Responsive Data Analysis
![Page 4: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/4.jpg)
Amazon Kinesis: Streaming data made easy
Services make it easy to capture, deliver, and process streams on AWS
• Amazon Kinesis Streams: Build your own custom applications that process or analyze streaming data
• Amazon Kinesis Firehose: Easily load massive volumes of streaming data into Amazon S3 and Amazon Redshift
• Amazon Kinesis Analytics (in preview): Easily analyze data streams using standard SQL queries
![Page 5: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/5.jpg)
Amazon Kinesis: Streaming data done the AWS way
Makes it easy to capture, deliver, and process real-time data streams
Pay as you go, no upfront costs
Elastically scalable
Right services for your specific use cases
Real-time latencies
Easy to provision, deploy, and manage
![Page 6: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/6.jpg)
Amazon Kinesis Streams: Build your own data streaming applications
• Easy administration: Simply create a new stream and set the desired level of capacity with shards. Scale to match your data throughput rate and volume.
• Build real-time applications: Perform continual processing on streaming big data using the Amazon Kinesis Client Library (KCL), Apache Spark/Storm, AWS Lambda, and more.
• Low cost: Cost-efficient for workloads of any scale.
![Page 7: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/7.jpg)
Reading and Writing with Streams
Sending:
• AWS SDK
• LOG4J
• Flume
• Fluentd
• AWS Mobile SDK
• Kinesis Producer Library
Consuming:
• Get* APIs
• Kinesis Client Library + Connector Library
• Apache Storm
• Amazon Elastic MapReduce
• AWS Lambda
• Apache Spark
![Page 8: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/8.jpg)
Amazon Kinesis Streams: Managed service for real-time processing
• Real-time streaming data ingestion
• Custom-built streaming applications
• Inexpensive: $0.014 per 1,000,000 PUT payload units
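PUT payload unit pricing is easier to reason about with a small calculator. This is a sketch under stated assumptions, not AWS's billing logic: it assumes a PUT payload unit is a 25 KB chunk of one record's payload (so a 5 KB record uses one unit and a 45 KB record uses two), takes 1 KB as 1,024 bytes, uses the $0.014 price quoted above, and ignores shard-hour charges, which are billed separately. The function names are illustrative.

```python
import math

# Assumptions (not from the slide): a PUT payload unit is a 25 KB chunk of one
# record's payload, and 1 KB is taken as 1,024 bytes.
PAYLOAD_UNIT_BYTES = 25 * 1024
PRICE_PER_MILLION_UNITS_USD = 0.014  # price quoted on the slide (July 2016)


def put_payload_units(record_bytes: int) -> int:
    """PUT payload units consumed by one record; every record costs at least one."""
    return max(1, math.ceil(record_bytes / PAYLOAD_UNIT_BYTES))


def put_cost_usd(record_count: int, record_bytes: int) -> float:
    """PUT payload cost for `record_count` records of `record_bytes` bytes each."""
    units = record_count * put_payload_units(record_bytes)
    return units / 1_000_000 * PRICE_PER_MILLION_UNITS_USD


if __name__ == "__main__":
    print(put_payload_units(5 * 1024))   # 1
    print(put_payload_units(45 * 1024))  # 2
    print(f"${put_cost_usd(10_000_000, 2 * 1024):.2f}")  # $0.14
```

Because every record costs at least one unit, a stream of tiny events pays for far more than its byte volume; packing several events into one record reduces the unit count.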
![Page 9: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/9.jpg)
We listened to our customers…
![Page 10: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/10.jpg)
Amazon Kinesis Firehose: Load massive volumes of streaming data into Amazon S3 and Amazon Redshift
• Zero administration: Capture and deliver streaming data into Amazon S3, Amazon Redshift, and other destinations without writing an application or managing infrastructure.
• Direct-to-data store integration: Batch, compress, and encrypt streaming data for delivery into data destinations in as little as 60 seconds using simple configurations.
• Seamless elasticity: Seamlessly scales to match data throughput without intervention.
1. Capture and submit streaming data to Firehose
2. Firehose loads streaming data continuously into S3 and Amazon Redshift
3. Analyze streaming data using your favorite BI tools
![Page 11: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/11.jpg)
Amazon Kinesis Firehose
Capture IT and app logs, device and sensor data, and more. Enable near-real-time analytics using existing tools.
Sources: AWS Platform SDKs, Mobile SDKs, Kinesis Agent, AWS IoT
Destinations: Amazon S3, Amazon Redshift, Amazon Elasticsearch
• Send data from IT infrastructure, mobile devices, and sensors
• Integrated with AWS SDK, agents, and AWS IoT
• Batch, compress, and encrypt data before loads
• Loads data into Amazon Redshift tables by using the COPY command
• Pay-as-you-go: 3.5 cents / GB transferred
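The per-GB price interacts with record size. The sketch below assumes, as something to verify against current AWS pricing, that Firehose bills each record rounded up to the next 5 KB, and it uses binary KB and GB; only the 3.5 cents/GB figure comes from the slide, and the helper names are hypothetical.

```python
import math

PRICE_PER_GB_USD = 0.035  # "3.5 cents / GB" from the slide
# Assumptions (not stated on the slide): each record is billed rounded up to the
# next 5 KB, and KB/GB are binary units (1,024 bytes / 1,024**3 bytes).
BILLING_INCREMENT_BYTES = 5 * 1024
GB_BYTES = 1024 ** 3


def billed_bytes(record_sizes):
    """Bytes billed after rounding every record up to the 5 KB increment."""
    return sum(
        math.ceil(size / BILLING_INCREMENT_BYTES) * BILLING_INCREMENT_BYTES
        for size in record_sizes
    )


def firehose_cost_usd(record_sizes):
    """Ingestion cost for the given record sizes under the assumptions above."""
    return billed_bytes(record_sizes) / GB_BYTES * PRICE_PER_GB_USD


if __name__ == "__main__":
    small = [500] * 1_000_000        # one million 500-byte events, one per record
    batched = [500 * 100] * 10_000   # the same bytes packed 100 events per record
    print(f"${firehose_cost_usd(small):.3f} vs ${firehose_cost_usd(batched):.3f}")
```

Under those assumptions, one million 500-byte events sent as separate records are billed as 5 KB each, ten times the bytes billed for the same data packed 100 events per record.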
![Page 12: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/12.jpg)
Amazon Kinesis Firehose: Three simple concepts
1. Delivery stream: The underlying entity of Firehose. Use Firehose by creating a delivery stream to a specified destination and sending data to it.
   • You do not have to create a stream or provision shards.
   • You do not have to specify partition keys.
2. Records: The data producer sends data blobs as large as 1,000 KB to a delivery stream. That data blob is called a record.
3. Data producers: Producers send records to a delivery stream. For example, a web server that sends log data to a delivery stream is a data producer.
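The 1,000 KB record limit is one of several limits a producer has to respect when sending in bulk. As a sketch, assuming the PutRecordBatch limits AWS documented at the time (at most 500 records and 4 MiB per call, each record at most 1,000 KiB; check the current quotas before relying on them), a producer could group records as below. The helper name is hypothetical. Each batch could then be passed to boto3's `put_record_batch(DeliveryStreamName=..., Records=[{"Data": r} for r in batch])`, retrying the failed entries whenever the response reports a non-zero `FailedPutCount`.

```python
from typing import Iterable, Iterator, List

MAX_RECORDS_PER_BATCH = 500            # PutRecordBatch: at most 500 records per call
MAX_BYTES_PER_BATCH = 4 * 1024 * 1024  # ... and at most 4 MiB per call
MAX_RECORD_BYTES = 1000 * 1024         # each record at most 1,000 KiB (the slide's 1,000 KB)


def batch_records(records: Iterable[bytes]) -> Iterator[List[bytes]]:
    """Yield lists of records that each fit in a single PutRecordBatch call."""
    batch: List[bytes] = []
    batch_bytes = 0
    for record in records:
        if len(record) > MAX_RECORD_BYTES:
            raise ValueError(f"record of {len(record)} bytes exceeds {MAX_RECORD_BYTES}")
        full = len(batch) == MAX_RECORDS_PER_BATCH
        too_big = batch_bytes + len(record) > MAX_BYTES_PER_BATCH
        if batch and (full or too_big):
            yield batch  # flush before this record would break a limit
            batch, batch_bytes = [], 0
        batch.append(record)
        batch_bytes += len(record)
    if batch:
        yield batch


if __name__ == "__main__":
    log_lines = [f"GET /index.html 200 {i}\n".encode() for i in range(1_200)]
    print([len(b) for b in batch_records(log_lines)])  # [500, 500, 200]
```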
![Page 13: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/13.jpg)
Amazon Kinesis Firehose console experience: Unified console experience for Firehose and Streams
![Page 14: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/14.jpg)
Amazon Kinesis Firehose console (S3): Create fully managed resources for delivery without building an app
![Page 15: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/15.jpg)
Amazon Kinesis Firehose console (S3): Configure data delivery options simply using the console
![Page 16: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/16.jpg)
Amazon Kinesis Firehose console (Amazon Redshift): Configure data delivery to Amazon Redshift simply using the console
![Page 17: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/17.jpg)
Amazon Kinesis Agent: Software agent makes submitting data easy
• Monitors files and sends new data records to your delivery stream
• Handles file rotation, checkpointing, and retry upon failures
• Preprocessing capabilities such as format conversion and log parsing
• Delivers all data in a reliable, timely, and simple manner
• Emits Amazon CloudWatch metrics to help you better monitor and troubleshoot the streaming process
• Supported on Amazon Linux AMI version 2015.09 or later, or Red Hat Enterprise Linux version 7 or later; install on Linux-based server environments such as web servers, front ends, log servers, and more
• Also enabled for Streams
![Page 18: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/18.jpg)
Amazon Kinesis Firehose or Amazon Kinesis Streams?
![Page 19: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/19.jpg)
Amazon Kinesis Streams is a service for workloads that require custom processing per incoming record, with sub-1-second processing latency and a choice of stream processing frameworks.
Amazon Kinesis Firehose is a service for workloads that require zero administration and the ability to use existing analytics tools based on Amazon S3, Amazon Redshift, and Amazon Elasticsearch, with data latency of 60 seconds or higher.
![Page 20: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/20.jpg)
Amazon Kinesis Analytics
![Page 21: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/21.jpg)
Amazon Kinesis Analytics (in preview): Analyze data streams continuously with standard SQL
• Apply SQL on streams: Easily connect to data streams and apply existing SQL skills.
• Build real-time applications: Perform continual processing on streaming big data with sub-second processing latencies.
• Scale elastically: Elastically scales to match data throughput without any operator intervention.
1. Connect to Amazon Kinesis streams and Firehose delivery streams
2. Run standard SQL queries against data streams
3. Amazon Kinesis Analytics can send processed data to analytics tools so you can create alerts and respond in real time
![Page 22: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/22.jpg)
Patricio Rocca | July 2016
Jampp Padawans' journey with Kinesis
![Page 23: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/23.jpg)
About Jampp
We are a tech company that helps companies grow their mobile business by driving engaged users to their apps
Machine learning
Post-install event optimisation
Dynamic Product Ads and Segments
Data Science
Programmatic Buying
We are a team of 70 people, 30% of them in the engineering team.
Located in 6 cities across the US, Latin America, Europe and Africa
![Page 24: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/24.jpg)
About Real-Time Bidding
• Ad Impressions are available through an Exchange
• Demand platforms have to bid in less than 100 ms
• The highest bid wins the impression and shows the ad!
We do this 220,000 times per second
![Page 25: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/25.jpg)
Real-Time Bidding Workflow
[Diagram: Publisher, Exchange, Jampp Bidder, Jampp Machine Learning, and Jampp Engagement Segments Builder; steps labelled Bid, Auction Win, and Impression]
![Page 26: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/26.jpg)
Real-Time Tracking Workflow
[Diagram: In-App Event; components: Jampp client application, tracking platform, Jampp NodeJS tracking platform]
![Page 27: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/27.jpg)
Business Challenges
• Build a retargeting platform that generates groups of users based on their in-app activity and a look-alike machine learning model
• Process and enrich in-app events in less than 5 minutes to target users when they become dormant
• Build a scale-on-demand platform that lets our business grow without pain
• Increase the platform's monitoring, logging, and alerting capabilities
Also…
Non-technical people should be able to query granular data and aggregate it over long periods
![Page 28: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/28.jpg)
Data Scale
• 700M events / 300 GB per day
• 1500% year-over-year growth in in-app events
• Growth peaks are out of the tech team's control, since they depend on the sales team's pacing ;-)
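Those volumes translate directly into a shard count. The sketch below applies the Kinesis Streams per-shard limits (writes up to 1 MB/s and 1,000 records/s, reads up to 2 MB/s per shard shared by all consuming applications) to the daily figures above. It assumes decimal MB/GB, traffic spread evenly over the day, and a hypothetical `peak_factor` for bursts; Jampp's real peak-to-average ratio is not given in the slides.

```python
import math

# Per-shard limits for Kinesis Streams: writes up to 1 MB/s and 1,000 records/s,
# reads up to 2 MB/s (shared by every application reading the shard).
WRITE_BYTES_PER_SHARD = 1_000_000  # assumption: MB taken as 10**6 bytes
WRITE_RECORDS_PER_SHARD = 1_000
READ_BYTES_PER_SHARD = 2_000_000
SECONDS_PER_DAY = 86_400


def shards_needed(events_per_day, bytes_per_day, consumer_apps=1, peak_factor=1.0):
    """Smallest shard count meeting every per-shard limit at peak_factor x the daily average."""
    events_per_sec = events_per_day / SECONDS_PER_DAY * peak_factor
    bytes_per_sec = bytes_per_day / SECONDS_PER_DAY * peak_factor
    by_write_records = math.ceil(events_per_sec / WRITE_RECORDS_PER_SHARD)
    by_write_bytes = math.ceil(bytes_per_sec / WRITE_BYTES_PER_SHARD)
    # each consuming application reads the full stream
    by_read_bytes = math.ceil(bytes_per_sec * consumer_apps / READ_BYTES_PER_SHARD)
    return max(1, by_write_records, by_write_bytes, by_read_bytes)


if __name__ == "__main__":
    # 700M events / 300 GB per day, from the slide (GB read as 10**9 bytes)
    print(shards_needed(700_000_000, 300_000_000_000))                 # 9
    print(shards_needed(700_000_000, 300_000_000_000, peak_factor=3))  # 25
```

At roughly 430 bytes per event (300 GB / 700M events), the 1,000 records/s limit binds long before the 1 MB/s limit, which is one reason the batching advice later in the talk pays off.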
![Page 29: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/29.jpg)
Start your engines!
![Page 30: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/30.jpg)
The Phantom Menace (initial architecture)
![Page 31: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/31.jpg)
Tempted by the Dark Side
• Kafka supports higher throughput and lower latency, and there are tons of successful implementations, but few of them manage a volume similar to Jampp's
• EBS allocation per topic was hard to size correctly, tune, and scale on demand
• Keeping Kafka running required dedicated manpower
• Kafka security configuration was not flexible enough to add secured producers outside of the VPC
Cost Savings: $2,848 vs. $936
![Page 32: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/32.jpg)
A New Hope (final architecture)
![Page 33: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/33.jpg)
Jedi Trial I
• Invested several days picking the partition key to distribute data evenly across shards
• Encoding protocol matters! We performed several benchmarks, and MessagePack offered the best trade-off between compression and serialization speed
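Picking a partition key matters because Kinesis Streams hashes each key with MD5 into a 128-bit hash key and routes the record to the shard whose hash key range contains it. This sketch assumes the digest is read as a big-endian unsigned integer and that the shards split the hash space into equal-width ranges, as in a freshly created stream (splits and merges change the ranges). The key formats are hypothetical, chosen to contrast a low-cardinality key with a high-cardinality one.

```python
import hashlib
from collections import Counter
from typing import Iterable

HASH_KEY_SPACE = 2 ** 128  # Kinesis hash keys range over 0 .. 2**128 - 1


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, read as an unsigned 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard owning the key, assuming equal-width hash key ranges."""
    return hash_key(partition_key) * shard_count // HASH_KEY_SPACE


def shard_load(keys: Iterable[str], shard_count: int) -> Counter:
    """Number of records each shard would receive for the given keys."""
    return Counter(shard_for(key, shard_count) for key in keys)


if __name__ == "__main__":
    # Hypothetical keys: a low-cardinality app id vs. a high-cardinality device id
    by_app = shard_load([f"app-{i % 3}" for i in range(10_000)], 8)
    by_device = shard_load([f"device-{i}" for i in range(10_000)], 8)
    print("app id:", sorted(by_app.values()))
    print("device id:", sorted(by_device.values()))
```

With only three distinct values, the app-id key can never occupy more than three of the eight shards, while the device-id key spreads records almost evenly.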
![Page 34: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/34.jpg)
Jedi Trial II
• Write/read batching to reduce HTTPS protocol overhead and costs
• Exponential backoff + jitter to reduce the impact of in-app event bursts sent by the tracking platforms
• Increased the data retention period from 1 day (default) to 3 days on the raw data streams
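Exponential backoff with jitter spreads retries out so that a burst of throttled producers does not retry in lockstep. Below is a minimal sketch of the "full jitter" variant (sleep a random time between zero and an exponentially growing ceiling). `ThrottledError` is a hypothetical stand-in for the SDK's throughput-exceeded error, and the base, cap, and attempt count are illustrative defaults, not Jampp's settings.

```python
import random
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for the SDK's "throughput exceeded" error."""


def full_jitter_delays(base: float, cap: float, rng: random.Random) -> Iterator[float]:
    """Endless delays, each uniform in [0, min(cap, base * 2**attempt)]."""
    attempt = 0
    while True:
        yield rng.uniform(0, min(cap, base * 2 ** attempt))
        attempt += 1


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 6,
    base: float = 0.05,
    cap: float = 2.0,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on ThrottledError with exponentially growing, jittered sleeps."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delays = full_jitter_delays(base, cap, rng or random.Random())
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts:
                raise  # give up and surface the last throttling error
            sleep(next(delays))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_put() -> str:
        calls["n"] += 1
        if calls["n"] < 4:
            raise ThrottledError("throttled")
        return "ok"

    print(call_with_backoff(flaky_put, rng=random.Random(42)))  # ok
```

Injecting `sleep` and `rng` keeps the retry loop testable; by default it calls `time.sleep` with a fresh random generator.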
![Page 35: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/35.jpg)
Jedi Trial III
• Firehose's real-time data ingestion to S3 and its auto-scaling capabilities flush the data to S3 and the EMR cluster faster than ever, letting our machine learning platform recalculate user retargeting segments more frequently
• Encryption is a key success factor, since we handle sensitive data contained in the in-app events
![Page 36: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/36.jpg)
Jedi Trial IV
• The EMR cluster simplifies our data processing
• Spark ETLs, executed by Airflow, enrich the data, denormalize it, and convert JSON to Parquet
• ML predicts user conversion and segments users based on that prediction. This process is implemented as a Python app that queries event data stored in Parquet files through PrestoDB
![Page 37: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/37.jpg)
Jedi Trial V
• Airpal queries PrestoDB and simplifies data access for non-technical people
• Jupyter notebooks are used as templates to build frequently used queries and automate common analysis tasks
• Spark Streaming for real-time anomaly detection and fraud prevention
• Multiple clusters (according to SLAs)
![Page 38: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/38.jpg)
Jedi Knighting (from Padawan to Jedi Knight)
• Time is money
• Shard read/write limits... test your data volume first!
• Shard-based throughput provisioning lets you scale on demand
• Exponential backoff + jitter
• Batching and compression will save you tons of headaches and money
• Extended data retention pays off
• Kinesis helps make the data pipeline much more reliable
• Kinesis + Lambda + DynamoDB + EMR = <3
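"Batch and compress" can be as simple as packing many small events into one compressed Kinesis record. In this sketch, JSON lines and gzip from the Python standard library stand in for MessagePack (Jampp's chosen encoding) and whatever compression they used; the 512 KB chunk size is an assumption chosen to keep each record comfortably under the 1 MB record limit, and the function names are hypothetical. At a few hundred bytes per event, the 1,000 records/s per-shard limit is reached well before 1 MB/s, so fewer, larger records also mean fewer shards and fewer PUT payload units.

```python
import gzip
import json
from typing import Any, Dict, Iterable, List

# Assumption: keep each chunk's uncompressed size well under the 1 MB Kinesis
# record limit so the gzip output fits too. A single event larger than the
# chunk size still becomes its own (oversized) chunk.
DEFAULT_MAX_RAW_BYTES = 512 * 1024


def pack_events(events: Iterable[Dict[str, Any]],
                max_raw_bytes: int = DEFAULT_MAX_RAW_BYTES) -> List[bytes]:
    """Group events into newline-delimited JSON chunks and gzip each chunk into one record."""
    records: List[bytes] = []
    lines: List[bytes] = []
    size = 0
    for event in events:
        line = json.dumps(event, separators=(",", ":")).encode("utf-8") + b"\n"
        if lines and size + len(line) > max_raw_bytes:
            records.append(gzip.compress(b"".join(lines)))
            lines, size = [], 0
        lines.append(line)
        size += len(line)
    if lines:
        records.append(gzip.compress(b"".join(lines)))
    return records


def unpack_record(record: bytes) -> List[Dict[str, Any]]:
    """Inverse of pack_events for a single record (what a consumer would run)."""
    return [json.loads(line) for line in gzip.decompress(record).splitlines()]


if __name__ == "__main__":
    # Hypothetical in-app events; real ones would carry many more fields
    events = [{"app": "demo", "event": "open", "user": i} for i in range(50_000)]
    records = pack_events(events)
    print(f"{len(events)} events -> {len(records)} records, "
          f"{sum(map(len, records))} bytes after gzip")
```

Consumers must run the matching unpack step, so producers and consumers have to agree on the format. The Kinesis Producer Library offers its own record aggregation if you would rather not maintain a custom one.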
![Page 39: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/39.jpg)
Thanks!
geeks.jampp.com
![Page 40: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/40.jpg)
Please remember to rate this session under My Agenda on awssummit.london
![Page 41: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/41.jpg)
http://blogs.aws.amazon.com/bigdata/
Thank You
![Page 42: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/42.jpg)
Appendix
![Page 43: Getting started with amazon kinesis](https://reader030.fdocuments.us/reader030/viewer/2022033102/586fe2c01a28ab18428b7cd3/html5/thumbnails/43.jpg)
Streaming data scenarios across segments
Data types: IT logs, application logs, social media / clickstreams, sensor or device data, market data

| Segment | 1. Accelerated Ingest-Transform-Load | 2. Continual Metrics Generation | 3. Responsive Data Analysis |
|---|---|---|---|
| Ad/Marketing Tech | Publisher, bidder data aggregation | Advertising metrics like coverage, yield, conversion | Analytics on user engagement with ads, optimized bid/buy engines |
| IoT | Sensor, device telemetry data ingestion | IT operational metrics dashboards | Sensor operational intelligence, alerts, and notifications |
| Gaming | Online customer engagement data aggregation | Consumer engagement metrics for level success, transition rates, CTR | Clickstream analytics, leaderboard generation, player-skill match engines |
| Consumer Engagement | Online customer engagement data aggregation | Consumer engagement metrics like page views, CTR | Clickstream analytics, recommendation engines |