(MBL314) Build World-Class Cloud-Connected Products: Sonos

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Mark Morganstern, Manager, Test Engineering Devon Lazarus, Software Engineering October 2015 MBL314 Building World-class, Cloud-Connected Products How Sonos Leverages Amazon Kinesis

Transcript of (MBL314) Build World-Class Cloud-Connected Products: Sonos

Page 1: (MBL314) Build World-Class Cloud-Connected Products: Sonos

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Mark Morganstern, Manager, Test Engineering

Devon Lazarus, Software Engineering

October 2015

MBL314

Building World-class, Cloud-Connected Products

How Sonos Leverages Amazon Kinesis

Page 2: (MBL314) Build World-Class Cloud-Connected Products: Sonos

What to Expect from the Session

• What is Sonos?

• Sonos Data Pipeline V1

• Sonos Data Pipeline V2

• Transition planning and execution

• Takeaways and future ideas

Page 3: (MBL314) Build World-Class Cloud-Connected Products: Sonos

What is Sonos?

Sonos is the smart speaker system that streams all your favorite music to any room, or every room.

Page 4: (MBL314) Build World-Class Cloud-Connected Products: Sonos

What is Sonos?

Control your music with one simple app, and fill your home with pure, immersive sound.

Page 5: (MBL314) Build World-Class Cloud-Connected Products: Sonos

A plethora of data sources

Page 6: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Firmware device logs

Application telemetry

Music service usage metrics

Cloud applications logs

Performance Indicators

Where does all this wonderful data come from?

Page 7: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Manufacturing tests and yields

Diagnostics

Customer support

Sales and marketing data

Where does all this wonderful data come from?

Page 8: (MBL314) Build World-Class Cloud-Connected Products: Sonos

A note on privacy

We strive to provide the best experience possible for our customers through the analysis of usage data; however, we also respect our customers’ right to privacy.

We only collect usage data from households that opt in to provide it.

Page 9: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V1

Page 10: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V1

Design goals

• Provide visibility into music service usage

• Secure, robust pipeline to minimize data loss

• Downstream processing should not affect data ingestion

Page 11: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V1

Collect Store Process Consume

Data Collector Initial SQS queue SQS queues Visualization

Page 12: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V1 Results

• Music service usage dashboards

Page 13: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V1 Results

• Insight into the health of the music services on Sonos

Page 14: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V1

Challenges:

• Increased visibility of data throughout the company

• New data types required additional development

• Unable to reprocess the data after initial ingestion

• Costs became an obstacle to gathering more data

Page 15: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V2

Page 16: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V2

Design goals

• Move from aggregate reporting to event-based reporting

• Accept any type of data (text, binary, JSON, XML)

• Secure storage of raw data

• Simplify the pipeline and reduce costs

Page 17: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V2

Bottom line:

We needed to be able to handle orders of magnitude more throughput, by the end of 2015, with guaranteed delivery and storage, near-linear scalability, under a sustainable cost model.

Page 18: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V2

Collect Store Process Consume

Data Collector Initial SQS queue SQS queues Visualization

Page 20: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V2

Collect Store Process Consume

Collection service Storage service Processing engines Visualization

Page 21: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V2

Collect Store Process Consume

• Decouple collection from storage and processing

• Optimize for raw throughput and scale

• Amazon Kinesis vs. Kafka

• Amazon Kinesis Producer Library vs. AmazonKinesisAsyncClient

• Netty 4

Page 22: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V2

Amazon Kinesis

• Max 1 MB message size

• Streams/partition keys

• 24-hour retention

• REST API/KPL

• Replication across 3 AZs

• AWS managed service

Kafka

• Configurable message size (default 1 MB)

• Topics/partition keys

• Retention configurable based on storage

• REST/low-level API

• Configurable replication: sync/ACK within AZ, async across regions

• Self-hosted and managed
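Both systems route records by partition key. In Amazon Kinesis, the key is hashed with MD5 into a 128-bit integer, and the record lands on the shard whose hash key range contains that value. The following is a minimal sketch of that mapping, assuming the stream's shards split the hash key space into equal ranges (which need not hold after shards are split or merged). ShardMapper and its method names are hypothetical and are not part of any AWS library.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative only: maps a partition key to one of N equally sized hash key ranges. */
class ShardMapper {
    // Size of the 128-bit hash key space: 2^128.
    private static final BigInteger HASH_SPACE = BigInteger.ONE.shiftLeft(128);

    /** MD5 of the UTF-8 key bytes, read as an unsigned 128-bit integer. */
    static BigInteger hashKey(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1 keeps the value unsigned
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available on this JVM", e);
        }
    }

    /** Index of the range containing the key's hash, assuming shardCount equal ranges. */
    static int shardIndex(String partitionKey, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        BigInteger rangeSize = HASH_SPACE.divide(BigInteger.valueOf(shardCount));
        int index = hashKey(partitionKey).divide(rangeSize).intValue();
        // The last range absorbs the remainder left by integer division.
        return Math.min(index, shardCount - 1);
    }
}
```

A high-cardinality key, such as a per-message GUID, spreads load across shards; a low-cardinality key concentrates it on a few.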

Page 23: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V2

Collect Store Process Consume

Collection service Storage service Processing engines Visualization

Page 24: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V2

Collect Store Process Consume

• Decouple storage from collection and processing

• Increase security of raw data

• Amazon S3 vs. Cassandra, HDFS, etc.

• Amazon Kinesis Client Library (KCL) vs. Amazon Kinesis SDK

Page 25: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V2

Implementing a ‘data lake’

• Disparate operational systems forward data in their own format

• Formats/schemas can change at any time

• Stores any data type in raw format

• Typically very large stores with a schemaless structure

• It is up to the “consumer” to know what they’re looking for

Page 26: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V2

Amazon Kinesis Client Library (KCL)

• Java API

• Lease/shard management

• Payload aggregation

AmazonKinesisAsyncClient (SDK)

• Java API

• Developer’s choice

• Self-implemented

Page 27: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V2

# ReponoRecordProcessor.java

public class ReponoRecordProcessor implements IRecordProcessor {
    ...
    @Override
    public void processRecords(List<Record> records, IRecordProcessorCheckpointer chkptr) {
        ...
        for (Record record : records) {
            bufferRecord(data, record);
        }
        if (buffer.shouldFlush()) {
            emit(chkptr, buffer.getRecords());
        }
    }
    ...
}
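The slide does not show how the buffer decides when to flush. One common approach is to flush on whichever limit is hit first: record count, total bytes, or age of the oldest record. Here is a minimal sketch of that idea; RecordBuffer, its thresholds, and its method names are assumptions for illustration, not the Sonos implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical buffer: collects payloads until a count, size, or age limit is reached. */
class RecordBuffer {
    private final int maxRecords;
    private final long maxBytes;
    private final long maxAgeMillis;
    private final List<byte[]> records = new ArrayList<>();
    private long bufferedBytes = 0;
    private long firstRecordAtMillis = -1;

    RecordBuffer(int maxRecords, long maxBytes, long maxAgeMillis) {
        this.maxRecords = maxRecords;
        this.maxBytes = maxBytes;
        this.maxAgeMillis = maxAgeMillis;
    }

    /** Adds one payload; the caller supplies the clock so the class stays testable. */
    void add(byte[] payload, long nowMillis) {
        if (records.isEmpty()) {
            firstRecordAtMillis = nowMillis; // age is measured from the oldest record
        }
        records.add(payload);
        bufferedBytes += payload.length;
    }

    /** True once any one of the three limits has been reached. */
    boolean shouldFlush(long nowMillis) {
        if (records.isEmpty()) {
            return false;
        }
        return records.size() >= maxRecords
                || bufferedBytes >= maxBytes
                || nowMillis - firstRecordAtMillis >= maxAgeMillis;
    }

    /** Returns the buffered payloads and resets the buffer. */
    List<byte[]> drain() {
        List<byte[]> out = new ArrayList<>(records);
        records.clear();
        bufferedBytes = 0;
        firstRecordAtMillis = -1;
        return out;
    }
}
```

In a record processor, the checkpoint would typically be taken only after the drained batch has been written durably, so that a restart replays records rather than losing them.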

Page 28: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V2

Collect Store Process Consume

Collection service Storage service Processing engines Visualization

Page 29: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V2

Collect Store Process Consume

• Decouple processing from collection and storage

• Allow for flexibility in processing tool chain

• Apache Spark

• Support any ‘consumer’

Page 30: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V2

Collect Store Process Consume

Collection service Storage service Processing engines Visualization

Page 31: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V2

Collect Store Process Consume

• Decouple from collection and processing

• Allow for self-service

Page 32: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V2

Collect Store Process Consume

Collection service Storage service Processing engines Visualization

Page 33: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Sonos Data Pipeline V2

Results:

• Increased traceability and better consistency across pipeline driven by '1 source of truth'

• Self-service pipeline

• Linear scalability backed by EC2, Amazon Kinesis, and Amazon S3

• 20x reduction of costs in the overall pipeline

Page 34: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Transition planning & execution

Page 35: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Transition planning

Collect Store Process Consume

Houston? We have a problem…

Page 36: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Transition planning

Collect Store Process Consume

Page 37: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Transition planning

Enter kinesis-log4j-appender

https://github.com/awslabs/kinesis-log4j-appender

Page 38: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Transition planning

# InitialHandler.java

public class InitialHandler implements ChannelUpstreamHandler {
    private static final Logger LOGGER = Logger.getLogger(InitialHandler.class);
    ...
    @Override
    public void handleUpstream(ChannelHandlerContext ctx, ChannelEvent event) {
        ...
        LOGGER.trace(message);
        ...
    }
    ...
}

# log4j.properties

log4j.logger.com.sonos.InitialHandler=TRACE, KINESIS_FILE, KINESIS
log4j.additivity.com.sonos.InitialHandler=false
log4j.appender.KINESIS.layout=org.apache.log4j.PatternLayout
log4j.appender.KINESIS.layout.ConversionPattern=%d{ISO8601}\t%m
log4j.appender.KINESIS.streamName=sonos-data-pipeline-messages
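The logger configuration attaches two appenders to one logger, so every traced message is fanned out to both a file and the Kinesis stream while the existing collector keeps running. The same fan-out idea can be sketched in plain Java. FanOut is a hypothetical class, and the failure isolation it adds is an assumption of this sketch rather than a statement about how log4j behaves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical fan-out: hands each message to every sink and isolates sink failures. */
class FanOut implements Consumer<String> {
    private final List<Consumer<String>> sinks;

    FanOut(List<Consumer<String>> sinks) {
        this.sinks = new ArrayList<>(sinks); // defensive copy
    }

    @Override
    public void accept(String message) {
        for (Consumer<String> sink : sinks) {
            try {
                sink.accept(message);
            } catch (RuntimeException e) {
                // A failing sink must not stop delivery to the remaining sinks.
                System.err.println("sink failed: " + e.getMessage());
            }
        }
    }
}
```

During a transition, the legacy path and the new stream can both be registered as sinks, which lets the new pipeline be validated against live traffic before the old one is retired.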

Page 39: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Transition planning

Collect Store Process Consume

Page 40: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Transition planning

Collect Store Process Consume

• Decouple collection from storage and processing

• Optimize for raw throughput and scale

• Amazon Kinesis vs. Kafka

• Amazon Kinesis Producer Library vs. AmazonKinesisAsyncClient

• Netty 4

Page 41: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Transition planning

Amazon Kinesis Producer Library

• Java API

• Async/PutRecords by default

• Payload aggregation

• C++ IPC microservice

AmazonKinesisAsyncClient (SDK)

• Java API

• Developer’s choice

• Self-implemented

• Talks to Amazon Kinesis HTTPS API

Page 42: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Transition planning

@ChannelHandler.Sharable
public class DataCollectionHandler extends SimpleChannelInboundHandler<ByteBuf> {
    ...
    @Override
    protected void channelRead0(ChannelHandlerContext ctx, ByteBuf message) throws Exception {
        ...
        ListenableFuture<UserRecordResult> kinesisFuture =
            kinesisProducer.addUserRecord(configuration.KINESIS_STREAM_NAME,
                guid.toString(),
                Unpooled.wrappedBuffer(message.nioBuffer()));
        Futures.addCallback(kinesisFuture, new UserRecordResultFutureCallback(guid, message));
        ...
    }
    ...
}
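The handler never blocks on the send: it attaches a callback to the returned future and moves on. The sketch below shows the same pattern using the JDK's CompletableFuture as a stand-in for Guava's ListenableFuture, so it runs without the Kinesis Producer Library. DeliveryTracker is a hypothetical name, and what the real UserRecordResultFutureCallback does on success or failure is not shown in the slides.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical callback holder: counts delivery outcomes without blocking the caller. */
class DeliveryTracker {
    final AtomicLong delivered = new AtomicLong();
    final AtomicLong failed = new AtomicLong();

    /** Attaches a completion callback to an in-flight send and returns immediately. */
    <T> CompletableFuture<T> track(CompletableFuture<T> send) {
        return send.whenComplete((result, error) -> {
            if (error == null) {
                delivered.incrementAndGet();
            } else {
                failed.incrementAndGet(); // a real handler might retry or log here
            }
        });
    }
}
```

Because the callback runs when the send completes, the I/O thread that accepted the message is free to read the next one straight away.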

Page 43: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Transition planning

Results:

• Increased collection performance by well over 2,700%

• Marked decrease in cost-per-billion events:

• V1: > $5,000

• V2 Transitional: + ~6%

• V2: ~$650
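Taking the slide's figures at face value, the arithmetic can be worked through as a rough check. This sketch assumes the "+ ~6%" is measured relative to the V1 cost, and it treats "> $5,000" as exactly $5,000, so the reduction it computes is a lower bound. CostComparison is a hypothetical name.

```java
/** Worked arithmetic on the cost-per-billion-events figures quoted on the slide. */
class CostComparison {
    static final double V1_COST = 5000.0;           // slide: "> $5,000", used as a lower bound
    static final double V2_COST = 650.0;            // slide: "~$650"
    static final double TRANSITION_OVERHEAD = 0.06; // slide: "+ ~6%", assumed relative to V1

    /** Cost while V1 and V2 run side by side. */
    static double transitionalCost() {
        return V1_COST * (1 + TRANSITION_OVERHEAD);
    }

    /** How many times cheaper V2 is than V1. */
    static double reductionFactor() {
        return V1_COST / V2_COST;
    }

    /** Percentage saved per billion events by moving from V1 to V2. */
    static double savingsPercent() {
        return 100.0 * (1 - V2_COST / V1_COST);
    }
}
```

Under these assumptions the transitional cost is about $5,300 per billion events, and V2 is at least 7.7 times cheaper than V1, a saving of roughly 87%.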

Page 44: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Future directions and takeaways

Page 45: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Future directions

• EOL Sonos Data Pipeline V1

• Amazon Kinesis failure modes

• Data collection: Scala or C++?

• Spark on Amazon EMR

Page 46: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Final takeaways

• Separation of concerns allows each service to specialize in its task, reducing complexity and downtime

• Self-service analytics unlocks the research potential of the whole company

• Amazon Kinesis gives us the streaming data pipeline we’re looking for without operational overhead

Page 47: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Thank you!

Page 48: (MBL314) Build World-Class Cloud-Connected Products: Sonos

Remember to complete your evaluations!