Data at Scale - Michael Peacock, Cloud Connect 2012
![Page 1: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/1.jpg)
Data at Scale
Data problems and solutions with the connected world
![Page 2: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/2.jpg)
Michael Peacock
Web Systems Developer
Telemetry Team
Smith Electric Vehicles
Lead Developer
Occasional conference speaker
Technical Author
![Page 3: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/3.jpg)
• World’s largest manufacturer of all-electric commercial vehicles
• Founded in 1920
• US facility opened 2009
• US buyout in 2011
![Page 4: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/4.jpg)
Commercial electric vehicles?
![Page 5: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/5.jpg)
![Page 6: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/6.jpg)
![Page 7: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/7.jpg)
Electric Vehicles
• 16,500 – 26,000 lbs gross vehicle weight
• Commercial Electric Delivery Trucks
• 7,121 – 16,663 lbs payload
• 50 – 240km
• Top Speed 80km/h
![Page 8: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/8.jpg)
Electric Vehicles
• New, continually evolving, technology
• Viability evidence required
• Government research
![Page 9: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/9.jpg)
EV Data
• Performance analysis and metrics
• Proving the technology: Government research
• Evaluating driver training conversions
• Diagnostics, Service and Warranty Issues
• Continuous Improvement
![Page 10: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/10.jpg)
![Page 11: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/11.jpg)
![Page 12: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/12.jpg)
Current Status
• ~500 telemetry enabled vehicles
• Telemetry is now fitted as standard in our vehicles
• Our MySQL solution processes:
– 1.5 billion inserts per day
– Constant minimum of 4000 inserts per second
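As a quick sanity check on the two figures above, the daily total can be converted to an average per-second rate. This is plain arithmetic on the slide's numbers; the variable names are ours, not from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

inserts_per_day = 1_500_000_000  # figure quoted on the slide
minimum_per_second = 4_000       # figure quoted on the slide

# Average rate implied by the daily total.
average_per_second = inserts_per_day / SECONDS_PER_DAY
ratio_to_minimum = average_per_second / minimum_per_second

print(f"average rate: {average_per_second:,.0f} inserts per second")
print(f"that is {ratio_to_minimum:.1f}x the stated constant minimum")
```

The average works out to roughly 17,000 inserts per second, so the stated minimum of 4000 is consistent with a load that peaks well above its floor.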
![Page 13: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/13.jpg)
CANBus: 101
![Page 14: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/14.jpg)
CANBus and Telemetry
• Sample the buses: once per second
• Only sample buses with useful performance and diagnostic information on them
![Page 15: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/15.jpg)
![Page 16: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/16.jpg)
Vehicle Data
• Drive train information:
– Motor speed
– Pedal positions
– Temperatures
– Fault Codes
• Battery information:
– Current, Voltage & Power
– Capacity
– Temperatures
![Page 17: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/17.jpg)
Connected World: The Problem
• Connected infrastructure
– EV Charging stations
– Utilities
• Home based telemetry
– Smart Meters
– Smart Homes
![Page 18: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/18.jpg)
Our problem
• Hundreds of connected devices, each with numerous sensors giving us 2,500 pieces of data per second per vehicle
• Broadcast time we can’t plan for
• Vehicles rolling off the production line
• New requirements for more data
![Page 19: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/19.jpg)
How it started
![Page 20: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/20.jpg)
Issue 1: Availability
![Page 21: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/21.jpg)
Issue 2: Capacity
Sometimes data is too much to cope with
www.flickr.com/photos/eveofdiscovery/3149008295
![Page 22: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/22.jpg)
Issue 2: Capacity
![Page 23: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/23.jpg)
Option: Cloud Infrastructure
• Cloud based infrastructure gives:
– More capacity
– More failover
– Higher availability
![Page 24: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/24.jpg)
Cloud Infrastructure: Problem
• Huge volumes of data inserts into a MySQL solution: sub-optimal on virtualised environments
• Existing enterprise hardware investment
• Security and legal issues for us storing the data off-site
![Page 25: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/25.jpg)
Cloud Infrastructure: Enabler
![Page 26: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/26.jpg)
www.flickr.com/photos/gadl/89650415/inphotostream
![Page 27: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/27.jpg)
AMQP
Advanced Message Queuing Protocol
![Page 28: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/28.jpg)
Queuing
• Downtime
• Capacity
• Maintenance Windows
![Page 29: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/29.jpg)
What if...
• Queuing allows us to cope with:
– Downtime of our own systems
– Capacity problems
• Queuing doesn’t allow us to cope with:
– An outage of a queuing infrastructure
![Page 30: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/30.jpg)
Buffer
www.flickr.com/photos/brapps/403257780
![Page 31: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/31.jpg)
Cloud based infrastructure
• Use a Message Queue to ensure data is only processed when you have the resources to process it
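The buffering idea can be sketched with an in-process queue standing in for the AMQP broker. This is only an illustration under that assumption: the `drain` helper, the message layout and the capacity figure are hypothetical, and a real deployment would consume from the broker instead.

```python
import queue


def drain(buffer: queue.Queue, capacity: int) -> list:
    """Take at most `capacity` messages off the queue; the rest simply wait."""
    taken = []
    while len(taken) < capacity:
        try:
            taken.append(buffer.get_nowait())
        except queue.Empty:
            break  # nothing left to process right now
    return taken


# Ten messages arrive while the processor only has room for four.
buffer = queue.Queue()
for seq in range(10):
    buffer.put({"vehicle": "demo-1", "seq": seq})

first_pass = drain(buffer, capacity=4)
backlog = buffer.qsize()
print(f"processed {len(first_pass)}, still queued {backlog}")
```

Nothing is lost when the consumer is slow or offline: unprocessed messages stay in the queue until capacity is available.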
![Page 32: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/32.jpg)
SAN
• Backbone to most cloud-based systems
• Powers our MySQL solution
• Supports:
– Huge volumes of data
– Lots of processing
– Fast connection to your servers
– Backups and snapshots
![Page 33: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/33.jpg)
SAN Tips
• When dealing with data on a huge scale, every aspect of your application and infrastructure needs to be optimised. This includes your SAN, which is commonly overlooked.
• http://www.samlambert.com/2011/07/how-to-push-your-san-with-open-iscsi_13.html
![Page 34: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/34.jpg)
New Architecture
![Page 35: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/35.jpg)
Speed: Stream to Batch
• Streams of continuously flowing data can be difficult to process
• Turn the stream into small, quick batches
• MySQL: LOAD DATA INFILE
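A minimal sketch of the stream-to-batch step, assuming each sample is a `(timestamp, signal name, value)` tuple. The table name, column names and file path are hypothetical; only the `LOAD DATA INFILE` statement itself is MySQL syntax. The code builds the CSV text and the statement but does not talk to a database.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Sample = Tuple[str, str, float]  # hypothetical layout: (timestamp, signal, value)


def batches(stream: Iterable[Sample], size: int) -> Iterator[List[Sample]]:
    """Cut a continuous stream into lists of at most `size` samples."""
    iterator = iter(stream)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def to_csv(batch: List[Sample]) -> str:
    """Serialise one batch as CSV text, ready to be written to a file."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(batch)
    return out.getvalue()


def load_statement(path: str, table: str) -> str:
    """Build the bulk-load statement for one batch file."""
    return (
        f"LOAD DATA INFILE '{path}' INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n' "
        "(recorded_at, signal_name, signal_value)"
    )


stream = [(f"2012-02-14 09:00:0{i}", "motor_speed", 1000.0 + i) for i in range(5)]
chunks = list(batches(stream, size=2))
print(to_csv(chunks[0]))
print(load_statement("/tmp/batch_0001.csv", "samples"))
```

One bulk load per batch file replaces many single-row `INSERT` statements, which is the point of turning the stream into small, quick batches.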
![Page 36: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/36.jpg)
Shard 1: Hardware
• As the amount of data increased, we hit a huge performance problem. This was solved by sharding at a hardware level.
• Each data collection device was given its own database, which could be on any number of separate machines, with a single database acting as a registry
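The registry lookup might look something like the sketch below. The talk describes the registry as a database; here a dictionary stands in for it, and the class names, host names and database naming scheme are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ShardLocation:
    host: str      # machine holding the device's database
    database: str  # per-device database name


class Registry:
    """In-memory stand-in for the single registry database."""

    def __init__(self) -> None:
        self._locations: Dict[str, ShardLocation] = {}

    def register(self, device_id: str, host: str) -> ShardLocation:
        location = ShardLocation(host=host, database=f"device_{device_id}")
        self._locations[device_id] = location
        return location

    def locate(self, device_id: str) -> ShardLocation:
        try:
            return self._locations[device_id]
        except KeyError:
            raise LookupError(f"device {device_id!r} is not registered") from None


registry = Registry()
registry.register("0042", host="db-a.example.internal")
registry.register("0043", host="db-b.example.internal")

where = registry.locate("0043")
print(f"connect to {where.host}, use database {where.database}")
```

Because every read and write goes through `locate`, a device's database can be moved to another machine by updating one registry entry.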
![Page 37: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/37.jpg)
Rationalisation & Extrapolation
• Remember the CANBus
– Always telling us information, which we sample every second?
– Do we always need that?
• Extrapolate and assume
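The slide does not spell out the method, so the following is one possible reading of "extrapolate and assume": store a sample only when the value changes, and assume the last stored value still holds for any second in between. The function names and sample layout are ours.

```python
from bisect import bisect_right
from typing import List, Tuple

Sample = Tuple[int, float]  # (second of the day, value)


def keep_changes(samples: List[Sample]) -> List[Sample]:
    """Drop samples whose value is the same as the previously kept one."""
    kept: List[Sample] = []
    for second, value in samples:
        if not kept or kept[-1][1] != value:
            kept.append((second, value))
    return kept


def value_at(kept: List[Sample], second: int) -> float:
    """Assume the most recent stored value still applies at `second`."""
    index = bisect_right([t for t, _ in kept], second) - 1
    if index < 0:
        raise ValueError("no sample at or before this time")
    return kept[index][1]


raw = [(0, 21.0), (1, 21.0), (2, 21.0), (3, 21.5), (4, 21.5)]
kept = keep_changes(raw)
print(f"stored {len(kept)} of {len(raw)} samples")
print(f"value at second 2: {value_at(kept, 2)}")
```

For slowly changing signals such as temperatures this can cut the stored volume sharply, at the cost of not being able to tell an unchanged reading from a missing one.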
![Page 38: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/38.jpg)
Getting information from data
• Vehicle performance information involves:
– Looking at 20 – 30 data points for each second of a vehicle’s operation in a day
– Analysing the data
– Performing calculations, which vary depending on certain data points
• Getting this data was slow
– How far did Customer A’s fleet travel last week?
![Page 39: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/39.jpg)
Regular processing
• Instead of processing data on demand, process it regularly
• Nightly scheduled task to evaluate performance information
![Page 40: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/40.jpg)
Regular Processing: Problems
You need to pull the data out faster than before!
![Page 41: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/41.jpg)
Shard 2: Tables
• All our data has a timestamp associated with it
• Looking up data for a particular day was slow. Very slow.
• We sharded the data again, this time with a table per week within a vehicle’s specific database
![Page 42: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/42.jpg)
![Page 43: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/43.jpg)
Sharding: Fallbacks and logic
• What about data before you implemented sharding?
• Which table do I need to look at?
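Both questions can be answered by a small routing function. The sketch below assumes weekly tables are named after the ISO year and week, and that anything older than a cut-over date lives in a single legacy table; the date, the table names and the naming scheme are hypothetical.

```python
from datetime import date, timedelta
from typing import List

SHARDING_START = date(2011, 1, 3)  # hypothetical cut-over date
LEGACY_TABLE = "data"              # hypothetical pre-sharding table


def table_for(day: date) -> str:
    """Return the table that holds samples recorded on `day`."""
    if day < SHARDING_START:
        return LEGACY_TABLE  # fallback for data stored before sharding
    iso_year, iso_week, _ = day.isocalendar()
    return f"data_{iso_year}_w{iso_week:02d}"


def tables_for_range(start: date, end: date) -> List[str]:
    """List each table touched by the inclusive range, in date order."""
    tables: List[str] = []
    day = start
    while day <= end:
        name = table_for(day)
        if name not in tables:
            tables.append(name)
        day += timedelta(days=1)
    return tables


print(table_for(date(2012, 2, 14)))
print(tables_for_range(date(2012, 2, 10), date(2012, 2, 14)))
```

Using the ISO year together with the ISO week avoids mislabelling the days around New Year, which can belong to a week of the neighbouring year.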
![Page 44: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/44.jpg)
Aggregation
• With data segregated on a per vehicle and per week basis, lookups were much faster
• Performance calculations could be scheduled nightly, with a single record recorded for each vehicle for each day in a central database
• Allows for easy aggregation:
– How far did my fleet travel last week?
– How much energy did they use last month?
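With one summary row per vehicle per day, fleet questions become a single `SUM`. The sketch below uses an in-memory SQLite database as a stand-in for the central MySQL database; the schema, the customer labels and the sample rows are made up purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_summary ("
    " vehicle_id TEXT, customer TEXT, day TEXT,"
    " distance_km REAL, energy_kwh REAL)"
)

# Illustrative rows only: one record per vehicle per day.
conn.executemany(
    "INSERT INTO daily_summary VALUES (?, ?, ?, ?, ?)",
    [
        ("v1", "A", "2012-02-06", 40.0, 30.0),
        ("v1", "A", "2012-02-07", 55.5, 41.0),
        ("v2", "A", "2012-02-07", 20.0, 15.0),
        ("v3", "B", "2012-02-07", 99.0, 70.0),
    ],
)


def fleet_distance(conn: sqlite3.Connection, customer: str, start: str, end: str) -> float:
    """Total distance for one customer's fleet over an inclusive date range."""
    row = conn.execute(
        "SELECT COALESCE(SUM(distance_km), 0) FROM daily_summary"
        " WHERE customer = ? AND day BETWEEN ? AND ?",
        (customer, start, end),
    ).fetchone()
    return row[0]


total = fleet_distance(conn, "A", "2012-02-06", "2012-02-12")
print(f"Customer A fleet distance: {total} km")
```

The query reads a handful of summary rows rather than every per-second sample, which is why the nightly job makes these reports cheap.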
![Page 45: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/45.jpg)
![Page 46: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/46.jpg)
Backups and Archives
• SAN backups and snapshots
• With date based sharding:
– Dump a table
– Copy it elsewhere
– Drop it / Flush it (if archiving)
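The three steps could be scripted roughly as follows. This sketch only builds the command lines and does not run them; the database, table, directory and remote host names are hypothetical, and the choice of `scp` for the copy step is our assumption.

```python
from typing import List


def archive_commands(database: str, table: str, archive_dir: str,
                     remote: str, drop: bool = False) -> List[List[str]]:
    """Build the dump / copy / drop commands for one weekly table."""
    dump_file = f"{archive_dir}/{database}.{table}.sql"
    commands = [
        # 1. Dump a table
        ["mysqldump", f"--result-file={dump_file}", database, table],
        # 2. Copy it elsewhere
        ["scp", dump_file, remote],
    ]
    if drop:
        # 3. Drop it (only when archiving, not for a plain backup)
        commands.append(["mysql", database, "-e", f"DROP TABLE `{table}`"])
    return commands


for command in archive_commands(
    "vehicle_0042", "data_2011_w52", "/var/archive",
    remote="backup.example.internal:/archives/", drop=True,
):
    print(" ".join(command))
```

Because each week is its own table, archiving old data never needs a slow `DELETE` over a large live table.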
![Page 47: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/47.jpg)
Outsource to the cloud
• Why waste resources doing things that cloud based services do better (where legal, security and privacy reasons allow)?
• Maps
• Email delivery
• Even phone integration
![Page 48: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/48.jpg)
Data Type Optimization
• When prototyping a system and designing a database schema, it’s easy to be sloppy with your data types and fields
• DON’T BE
• Use as little storage space as you can
– Ensure the data type uses as little as you can
– Use only the fields you need
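To see why this matters at this volume, here is a small worked calculation. The byte sizes are MySQL's documented integer storage sizes; the example column (a value that never exceeds 1000) is hypothetical, and the row count is the daily insert figure quoted earlier in the talk.

```python
# MySQL integer types and their storage size in bytes.
UNSIGNED_INT_TYPES = [
    ("TINYINT", 1),
    ("SMALLINT", 2),
    ("MEDIUMINT", 3),
    ("INT", 4),
    ("BIGINT", 8),
]


def smallest_unsigned(max_value: int) -> tuple:
    """Pick the smallest unsigned integer type that can hold `max_value`."""
    for name, size in UNSIGNED_INT_TYPES:
        if max_value < 2 ** (8 * size):
            return name, size
    raise ValueError("value does not fit in any MySQL integer type")


rows_per_day = 1_500_000_000  # daily inserts quoted in the talk

# Hypothetical column that was prototyped as BIGINT but never exceeds 1000.
chosen_name, chosen_size = smallest_unsigned(1000)
saved_bytes_per_day = (8 - chosen_size) * rows_per_day

print(f"use {chosen_name} ({chosen_size} bytes) instead of BIGINT (8 bytes)")
print(f"saves {saved_bytes_per_day / 1e9:.0f} GB per day on this one column")
```

A few bytes per row looks trivial in a prototype, but multiplied by billions of rows it adds up to gigabytes of storage, memory and index space each day.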
![Page 49: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/49.jpg)
Sharding: An excuse
• Sharding was a large project for us, and involved extensive re-architecting of the system.
• We had to make changes to every query we have in our code
• Gave us an excuse to:
– Optimise the queries
– Optimise the indexes
![Page 50: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/50.jpg)
Query Optimization
• Run every query through EXPLAIN EXTENDED
• Check it hits the indexes
• Remove functions like CURDATE from queries, to ensure query cache is hit
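The CURDATE point can be illustrated as follows. MySQL's query cache does not store results for statements that call functions such as `CURDATE()`, so one workaround is to compute the date in the application and send it as a literal. The table and column names here are hypothetical, and the function only builds the SQL text.

```python
from datetime import date

# Not cacheable by the query cache: the statement calls CURDATE().
UNCACHEABLE = "SELECT COUNT(*) FROM samples WHERE recorded_on = CURDATE()"


def todays_count_query(today: date) -> str:
    """Build the same query with the date resolved by the application.

    The literal comes from date.isoformat(), so it only ever contains
    digits and hyphens and is safe to embed directly.
    """
    return f"SELECT COUNT(*) FROM samples WHERE recorded_on = '{today.isoformat()}'"


query = todays_count_query(date(2012, 2, 14))
print(query)
```

Every request made on the same day now sends byte-for-byte identical SQL, which is what the query cache keys on.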
![Page 51: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/51.jpg)
Index Optimization
• Keep it small
• From our legacy days of one database on one server, we had a column that told us which vehicle the data related to
– This was still there...as part of an index...despite the fact the application hadn’t required it for months
![Page 52: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/52.jpg)
Live data: dashboard
![Page 53: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/53.jpg)
Live data: Maps
![Page 54: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/54.jpg)
Live data
• Original database design dictated:
– Each type of data point required a separate query, sub-query or join to obtain
• Collection device and processing service dictated:
– GPS Co-ordinates can be up to 6 separate data points, including: Longitude; Latitude; Altitude; Speed; Number of Satellites used to get location; Direction
![Page 55: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/55.jpg)
Dashboards: Caching
• Don’t query if you don’t have to
• Cache what you can; access direct
• With message queuing it’s possible to route messages to two or more places: one to be processed and another to display the latest information directly
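The two-destination routing can be sketched in-process as below. In AMQP this is typically done with a fanout exchange bound to two queues; here a small class stands in for the broker, and the class name, message fields and cache layout are hypothetical.

```python
from collections import deque
from typing import Deque, Dict


class FanOut:
    """Deliver every message to both a processing queue and a latest-value cache."""

    def __init__(self) -> None:
        self.processing: Deque[dict] = deque()  # consumed by the database loader
        self.latest: Dict[str, dict] = {}       # read directly by the dashboard

    def publish(self, message: dict) -> None:
        self.processing.append(message)
        self.latest[message["vehicle"]] = message  # overwrite the older reading


broker = FanOut()
broker.publish({"vehicle": "v1", "speed_kmh": 32})
broker.publish({"vehicle": "v2", "speed_kmh": 0})
broker.publish({"vehicle": "v1", "speed_kmh": 41})

# The dashboard reads the cache; no database query is involved.
print(broker.latest["v1"])
print(f"{len(broker.processing)} messages waiting to be stored")
```

The dashboard only ever needs the most recent reading per vehicle, so the cache stays small no matter how much history is stored.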
![Page 56: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/56.jpg)
Exporting data: Group
• Where possible group exports and reports together by the same shard/table/index
![Page 57: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/57.jpg)
Code considerations
• Race conditions
• Number of concurrent requests – group them
![Page 58: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/58.jpg)
Application Quality
• When dealing with lots of data, quickly, you need to ensure:
– You process it correctly
– You can act fast if there is a bug
– You can act fast when refactoring
![Page 59: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/59.jpg)
Deployment
• When dealing with a stream of data, rolling out new code can mean pausing the processing work that is done
• Put deployment measures in place to make a deployment switch over instantaneous
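The talk does not say which measures were used. One common technique, shown here purely as an assumption, is to keep each release in its own directory and switch a `current` symlink with an atomic rename, so running workers see either the old code or the new code and never a half-copied mix. The directory names are hypothetical and the sketch assumes a POSIX filesystem.

```python
import os
import tempfile


def activate(release_dir: str, current_link: str) -> None:
    """Point `current_link` at `release_dir` in one atomic step."""
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)             # clear any leftover from a failed deploy
    os.symlink(release_dir, tmp_link)
    os.replace(tmp_link, current_link)  # rename is atomic on POSIX


# Demonstration in a throwaway directory.
root = tempfile.mkdtemp()
old_release = os.path.join(root, "release-1")
new_release = os.path.join(root, "release-2")
os.mkdir(old_release)
os.mkdir(new_release)
current = os.path.join(root, "current")

activate(old_release, current)
activate(new_release, current)
live = os.path.realpath(current)
print(f"live release: {os.path.basename(live)}")
```

Rolling back is the same operation pointed at the previous release directory.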
![Page 60: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/60.jpg)
Technical Tips
• Measure your application’s performance, data throughput and so on
– A data at scale problem itself
• Use as much RAM on your servers as is safe to do so
– We give MySQL 80% of the 100 – 140GB on each DB server
![Page 61: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/61.jpg)
What do we have now?
• Now we have a fast, stable, reliable system
• Pulling in millions of messages from a queue per day
• Decoding those messages into 1.5 billion data points per day
• Inserting 1.5 billion data points into MySQL per day
• Performance data generated, and grant authority reports exported daily
• More sleep on a night than we used to
![Page 62: Data at Scale - Michael Peacock, Cloud Connect 2012](https://reader037.fdocuments.us/reader037/viewer/2022110309/558974b5d8b42aa94a8b4705/html5/thumbnails/62.jpg)
Questions