NoSQL Revolution: Under the Covers of Distributed Systems at Scale (SPOT401) | AWS re:Invent 2013
-
Upload
amazon-web-services -
Category
Technology
-
view
1.918 -
download
1
description
Transcript of NoSQL Revolution: Under the Covers of Distributed Systems at Scale (SPOT401) | AWS re:Invent 2013
@ksshams @swami_79
SPOT 401 - Leading the NoSQL Revolution:
under the covers of Distributed Systems @ scale
@ksshams @swami_79
what are we covering?
The evolution of large scale
distributed systems @ Amazon from
the 90’s to today
The lessons we learned and insights you can employ in your own distributed systems
@ksshams @swami_79
let’s start with a story about a little
company called amazon.com
@ksshams @swami_79
once upon a time...
(in 2000)
episode 1
@ksshams @swami_79
a thousand miles
away...
(seattle)
@ksshams @swami_79
amazon.com - a rapidly growing Internet based retail
business relied on relational databases
@ksshams @swami_79
we had 1000s of independent services
@ksshams @swami_79
each service managed its own state in RDBMs
@ksshams @swami_79
RDBMs are actually kind of cool
@ksshams @swami_79
first of all... SQL!!
@ksshams @swami_79
so it is easier to query..
@ksshams @swami_79
easier to learn
@ksshams @swami_79
they are as versatile as a swiss army knife
complex queries
key-value access
transactions analytics
@ksshams @swami_79
RDBMs are *very* similar to
Swiss Army Knives
@ksshams @swami_79
but sometimes.. swiss army knifes..
can be more than what you bargained for
@ksshams @swami_79
partitioning
easy re-
partitioning HARD..
@ksshams @swami_79
so we bought
bigger boxes...
@ksshams @swami_79
Q4 was hard-work at Amazon
benchmark new
hardware
migrate to new
hardware
repartition databases
pray ...
@ksshams @swami_79
RDBMs availability challenges..
@ksshams @swami_79
then.. (in 2005)
episode 2
@ksshams @swami_79
amazon dynamo predecessor to
dynamoDB
specialist tool :
•limited querying capabilities
•simpler consistency
replicated DHT with consistent hashing
optimistic replication
“sloppy quorum”
anti-entropy mechanism
object versioning
@ksshams @swami_79
dynamo had many benefits • higher availability
• we traded it off for eventual consistency
• incremental scalability
• no more repartitioning
• no need to architect apps for peak
• just add boxes
• simpler querying model ==>> predictable performance
@ksshams @swami_79
but dynamo was not perfect...
lacked strong consistency
@ksshams @swami_79
but dynamo was not perfect...
scaling was easier, but...
@ksshams @swami_79
but dynamo was not perfect...
steep learning curve
@ksshams @swami_79
but dynamo was not perfect...
dynamo was a product ... ==>> not a service...
@ksshams @swami_79
then.. (in 2012)
episode 3
@ksshams @swami_79
ADMIN
“Even though we have years of experience with large, complex
NoSQL architectures, we are happy to be finally out of the
business of managing it ourselves.” - Don MacAskill, CEO
• NoSQL database
• fast & predictable performance
• seamless scalability
• easy administration
DynamoDB
@ksshams @swami_79
services, services, services
@ksshams @swami_79
amazon.com’s experience with services
@ksshams @swami_79
how do you create a successful service?
@ksshams @swami_79
with great services, comes great responsibility
@ksshams @swami_79
Architect
Customer
@ksshams @swami_79
DynamoDB Goals and Philosophies
never compromise on durability scale is our
problem easy to use
scale in rps consistent and low
latencies
@ksshams @swami_79
how to build these large scale services?
@ksshams @swami_79
don’t compromise on durability…
@ksshams @swami_79
don’t compromise on… availability
@ksshams @swami_79
plan for success, plan for scalability
@ksshams @swami_79
@ksshams @swami_79
Fault tolerant design is key.. • Everything fails all the time
• Planning for failures is not easy
• How do you ensure your recovery strategies work correctly?
@ksshams @swami_79
Byzantine General Problem
@ksshams @swami_79
A simple 2-way replication system of a traditional database…
Primary Standby
Writes
@ksshams @swami_79 @ksshams @swami_79
P S
S is dead, need to trigger new
replica
P is dead, need to promote myself
P’
@ksshams @swami_79 @ksshams @swami_79
Improved Replication: Quorum
Replica
Replica
Writes Replica
Quorum: Successful write on a majority
Not so easy..
Replica B
Replica C
Writes from
client A Replica A
Replica D
New member in the
group
Should I continue to serve reads?
Should I start a new quorum?
Replica E Replica F
Reads and
Writes from
client B
Classic Split Brain Issue in Replicated systems leading to lost writes!
@ksshams @swami_79
Building correct distributed systems is not straight forward..
• How do you handle replica failures?
• How do you ensure there is not a parallel
quorum?
• How do you handle partial failures of replicas?
• How do you handle concurrent failures?
correctness is hard, but necessary
Formal Methods
Formal Methods
to minimize bugs, we must have a precise description of the design
Formal Methods
code is too detailed
how would you express partial failures or concurrency?
design documents and diagrams are vague & imprecise
Formal Methods
law of large numbers is your friend,
so design for scale
until you hit large numbers
@ksshams @swami_79
TLA+ to the rescue?
@ksshams @swami_79
PlusCal
@ksshams @swami_79
formal methods are necessary
but not sufficient..
@ksshams @swami_79
customer
@ksshams @swami_79
forget to test - no, serious don’t ly
embrace failure and don’t be surprised
simulate failures at unit
test level
fault injection testing
datacenter testing
network brown out testing
scale testing
testing is a lifelong journey
@ksshams @swami_79
testing is necessary but not sufficient..
@ksshams @swami_79
Architect
Customer
@ksshams @swami_79
release cycle
gamma simulate real
world
one box does it work?
phased deployment
treading lightly
monitor does it still
work?
@ksshams @swami_79
Monitor customer behavior
@ksshams @swami_79
measuring customer experience is key
don’t be satisfied by average - look at
99 percentile
@ksshams @swami_79
understand the scaling dimensions
@ksshams @swami_79
understand how your service will be abused
@ksshams @swami_79
let’s see these rules in action through a true story
@ksshams @swami_79
we were building distributed systems all over
amazon.com
@ksshams @swami_79
we needed a uniform and correct way to do
consensus..
@ksshams @swami_79
so we built a paxos lock library service
@ksshams @swami_79
such a service is so much more useful than just leader election.. it became a distributed
state store
@ksshams @swami_79
such a service is so much more useful than just leader election.. or a distributed state
store
wait wait.. you’re telling me if I poll,
I can detect node failure?
@ksshams @swami_79
we acted quickly - and scaled up our entire fleet
with more nodes
doh!!!!
we slowed consensus...
@ksshams @swami_79
understand the scaling dimensions
& scale them independently...
@ksshams @swami_79
State Store
a lock service has 3 components..
@ksshams @swami_79
State Store
they must be scaled independently..
@ksshams @swami_79
State Store
they must be scaled independently..
@ksshams @swami_79
State Store
they must be scaled independently..
Let’s Go Over The demo from this morning
stream ingestion
stream ingestion
stream ingestion
Real-time tweet analytics using DynamoDB
• Stream from Kinesis to DynamoDB
• What data do want in real-time? • (per-second, top words)
• How does DynamoDB help? • Atomic counters (per-word counts in that second)
• Indexed queries (top N word-counts in that second
Time Word Count
2013-10-13T12:00 Earth 9
2013-10-13T12:00 Mars 10
2013-10-13T12:00 Pluto 5
2013-10-13T12:03 Earth 8
Time Count Word
2013-10-13T12:00 5 Pluto
2013-10-13T12:00 9 Earth
2013-10-13T12:00 10 Mars
2013-10-13T12:03 8 Earth
WordCount Table Local Secondary Index
DynamoDB cost: $0.25 / hr
stream ingestion
Aggregate queries using Redshift
• Simple Redshift connector (buffer files, store in s3, call copy command)
• Manifest copy connector • 2 streams
• transaction table for deduplication
• manifest copy
Right tool for right job…
• Canal -> DynamoDB -> Redshift -> Glacier…
You are not done yet..
• Listen to customer feedback
• Iterate..
Example: DynamoDB
• Start with immediate needs of reliable, super scalable, low latency datastore
• Iterate • Developers wanted flexible query: Local Secondary Indexes
• Developers wanted parallel loads: Parallel Scans
• Mobile developers wanted direct access to their datastore: Fine-grained Access Control
• Mobile developers wanted geo-awareness: Geospatial library
• Developers wanted DynamoDB on their laptop: DynamoDB Local
• Developers wanted richer query: Global Secondary Indexes
• We will continue to innovate..
@ksshams @swami_79 @ksshams @swami_79
Sacred Tenets in Distributed Systems
plan for success – plan for scalability
don’t compromise durability for performance
plan for failures - fault -tolerance is key consistent performance
is important release - think of blast radius
insist on correctness
@ksshams @swami_79
understand scaling
dimensions
observe how service is
used
scalability
over features
strive for
correctness
relentlessly test
monitor like a hawk
@ksshams @swami_79
Please give us your feedback on this
presentation
Don’t miss SPOT 201!!!
SPOT 401
@ksshams @swami_79