EQR Reporting: Rails + Amazon EC2
Uploaded by jeperkins4
Platform 3: To Infinity and Beyond
January 2009
Summit XI
Gartner’s Hype Cycle
Overview
• Architecture
• Video
• Reporting
Architecture: What’s that?
• The structures of the system
  – The externally visible parts and the relationships between them
Architecture: Goals
• Performance
  – Every page needs to yield a response within 5 seconds
• Availability/Reliability
  – Always there!
• Scalability
  – Dynamically add RAM/CPU
  – Dynamically add more servers
• Agile/Flexible
  – Can easily be adapted
  – Follow best practices
• Accuracy
  – No response left behind
  – Quality Assurance
Architecture: Performance
• How do we achieve great performance?
  – Using the right software
    • Ruby on Rails
      – Twitter, LinkedIn, Hulu
  – Good application design
    • Reporting has different needs than Authoring/Runtime
  – Testing / Benchmarking / Tuning
    • Rails has lots of good built-in utilities to make these easy
    • We’re writing test code, right?
  – Dedicating time for maintenance / new features
    • As data grows
    • As more complexity is brought into the application environment
    • As we get smarter
Architecture: Performance
• Good Application Design – Separation of Concerns
• Separating databases for Runtime and Reporting is a Good thing!
  – Runtime is OLTP
    • OLTP refers to a class of systems that facilitate and manage transaction-oriented applications, typically for data entry and retrieval transaction processing. It has also been used to refer to processing in which the system responds immediately to user requests. - Wikipedia
  – Reporting is OLAP
    • OLAP is an approach to quickly provide answers to analytical queries that are multi-dimensional in nature. Databases configured for OLAP employ a multidimensional data model, allowing for complex analytical and ad-hoc queries with a rapid execution time. - Wikipedia
  – Analytical processing on Reporting doesn’t impact performance on Runtime (i.e. active surveys in the field) because they are physically different systems.
Architecture: Availability/Reliability
• Co-location
  – Uptime
    • eApps
      – 99.98% over past 1000 days
    • Colo4Dallas
      – Guarantees 100%, reality? 99%+
    • Amazon Web Services
      – 99.95%
• Redundancy
  – Servers have different profiles for different services
    • Databases
    • Web / Application servers
    • Proxy / Load balancing
  – Server profiles are duplicated and online for…
    • Hardware failures
    • Load balancing during peak demand
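To put the uptime figures above in perspective, here is a minimal Ruby sketch that converts an availability percentage into expected downtime per year. The helper names are illustrative, 99% is taken as the floor of the observed Colo4Dallas figure, and the redundancy calculation assumes duplicated servers fail independently, which is a simplification.

```ruby
HOURS_PER_YEAR = 365 * 24 # 8,760 hours, ignoring leap years

# Expected hours of downtime per year for a given availability percentage.
def downtime_hours_per_year(availability_percent)
  (1 - availability_percent / 100.0) * HOURS_PER_YEAR
end

# Availability of n duplicated servers, assuming independent failures
# (an assumption for illustration; real failures are often correlated).
def redundant_availability(availability_percent, servers)
  (1 - (1 - availability_percent / 100.0)**servers) * 100
end

UPTIME = {
  "eApps" => 99.98,
  "Colo4Dallas (observed floor)" => 99.0,
  "Amazon Web Services" => 99.95
}.freeze

UPTIME.each do |host, percent|
  puts format("%-30s %6.2f%%  ~%.1f hours down per year",
              host, percent, downtime_hours_per_year(percent))
end
```

At 99.98% the expected downtime is under two hours a year; at 99% it is closer to 88 hours, which is why duplicated server profiles matter.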
Architecture: Scalability
• Reporting
  – www.eqrtools.com hosted at eApps
    • Runs on a $70/month plan (1.2 GB RAM Virtual Private Server)
    • Pre-packaged with Java, Rails, MySQL, mail server, etc.
    • Can upgrade package in minutes and add servers via web interface
    • Cancel anytime
  – Amazon Web Services
    • S3 = Simple Storage Service
    • EC2 = Elastic Compute Cloud
    • CloudFront = Content Delivery Network
• Authoring/Runtime
  – Hosted at Colo4Dallas
    • n Front End Web/Application servers
    • n Database servers
  – Wowza
    • Streaming Video Service via Amazon EC2
Architecture: Amazon Web Services
• Simple Storage Service (S3)
  – In use at Equation with JTS for 2+ years
  – Expanding use for storing more stuff
    • Images – plain, rollover, etc.
    • Documents – PDF reports
    • Videos
    • EC2 Machine Images
• Elastic Compute Cloud (EC2)
  – Provides ability to add servers (Linux/Windows flavors) for specific services
    • e.g. Wowza Video Streaming
    • Grabs content from S3
    • Can be expanded to other uses – Rails application hosting/database
• CloudFront
  – Provides Content Delivery Network (CDN) to push to edge
    • Content that we move into S3
    • Moves content closer to clients, reducing network latency
Architecture: EC2 Simplified
• Virtual Machines/Servers
  – Scalability in two dimensions
    • Use as many machines as you need
    • Various machine sizes available
• High availability
• High bandwidth
Architecture: EC2 Instance Types
• EC2 supports different instance types
  – Small Instance
    • 1.7 GB memory, 32-bit platform, I/O Performance: Moderate
    • 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)
    • 160 GB instance storage (150 GB plus 10 GB root partition)
    • Price: $0.10 per instance hour
  – Large Instance
    • 7.5 GB memory, 64-bit platform, I/O Performance: High
    • 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
    • 850 GB instance storage (2 x 420 GB plus 10 GB root partition)
    • Price: $0.40 per instance hour
  – Extra Large Instance
    • 15 GB memory, 64-bit platform, I/O Performance: High
    • 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
    • 1,690 GB instance storage (4 x 420 GB plus 10 GB root partition)
    • Price: $0.80 per instance hour
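The hourly prices above translate into monthly figures with simple arithmetic. The following is a rough sketch, assuming a 30-day month and counting instance hours only (storage and bandwidth charges are left out); the constant and method names are made up for illustration.

```ruby
# Hourly prices from the slide, kept in cents to avoid floating-point drift.
INSTANCE_PRICE_CENTS_PER_HOUR = {
  small: 10,
  large: 40,
  extra_large: 80
}.freeze

HOURS_PER_MONTH = 24 * 30 # assumption: a 30-day month, running continuously

# Instance-hour cost in dollars for a number of machines of one type.
def monthly_cost(type, instances: 1, hours: HOURS_PER_MONTH)
  INSTANCE_PRICE_CENTS_PER_HOUR.fetch(type) * instances * hours / 100.0
end

INSTANCE_PRICE_CENTS_PER_HOUR.each_key do |type|
  puts format("%-12s $%.2f per month", type, monthly_cost(type))
end
```

A Small Instance left running all month comes to $72 in instance hours, close to the $70/month eApps plan, while a machine that is only started for peak demand is billed just for the hours it runs.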
CloudFront: Content Delivery Network
• and how it works…
Amazon – CloudFront CDN
Copies of files in the S3 bucket are accessed/cached from edge servers around the world.
Amazon: CloudFront
Architecture: Amazon
• Benefits
  – No upfront investments
    • No contract
    • No hardware to purchase, install/fit, maintain
    • Pay for what we use
  – Offers a variety of uses: content hosting, machine hosting, streaming video
  – Competitors often charge upfront and monthly fees and don’t offer one-stop service
  – We can dynamically add/remove machines as we need them
• Additional applications built on EC2 are also available…
  – Wowza Video Streaming
  – Jungle Disk (backup/recovery)
  – GigaVox Media (Podcast hosting)
  – Morph (Application hosting)
  – RightScale (Application hosting/monitoring)
  – Scalr (Load Balancing/farm)
Architecture: Quality Assurance
• Code Coverage
Architecture: Quality Assurance
• Example – Question controller
Rich Media: Audio, Images and Video
Serving Video is like… TV
• Content
  – (i.e. the ad)
• Delivery
  – (e.g. cable, satellite, rabbit ears)
• Viewer
  – (i.e. the television box)
The Content: Preparation
• There are many source formats for video
  – AVI (early Windows format), QuickTime (.mov), Windows Media, MPEG, Flash
• Files are large and not optimized for web delivery
  – Encoded for other mediums
Content conversion
• The Old Way
  – Sorenson Squeeze
    • A desktop tool where we manually took a file and converted it into multiple Flash files of varying bitrate
  – Uploaded file(s) to third party hosted Flash Video service
• The New Way
  – File uploader
  – ffmpeg (under the covers)
    • An open source utility that has been wrapped with Ruby packages to provide compression in the P3 Application
    • Media is compressed for optimal playback experience
    • Media is still formatted to Flash
      – Most commonly served format on Internet (> 92%)
  – Converted file uploaded to Amazon
    • File resides in S3 folder
    • Streamed via Wowza server hosted on EC2 instance
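As a hedged illustration of the "New Way", this sketch shows how a Ruby wrapper might assemble an ffmpeg command line for Flash video. The method name, file names, default bitrate, sample rate and frame size are assumptions, not the P3 application's actual settings; the flags themselves are standard ffmpeg options. The command is only built and printed here, not executed.

```ruby
require "shellwords"

# Hypothetical helper: returns the argument list for an FLV conversion.
# All default values are illustrative, not the P3 application's settings.
def flv_command(source, target, video_kbps: 400, audio_hz: 22_050, size: "480x360")
  [
    "ffmpeg",
    "-y",                        # overwrite the target if it already exists
    "-i", source,                # input file (AVI, QuickTime, MPEG, ...)
    "-f", "flv",                 # force Flash Video container output
    "-b:v", "#{video_kbps}k",    # target video bitrate
    "-ar", audio_hz.to_s,        # audio sample rate in Hz
    "-s", size,                  # output frame size, WIDTHxHEIGHT
    target
  ]
end

cmd = flv_command("ad spot.mov", "ad_spot.flv")
puts Shellwords.join(cmd) # shell-safe rendering, useful for logging

# To run the conversion for real (requires ffmpeg on the PATH):
#   system(*cmd) or raise "ffmpeg failed"
```

Passing the arguments as an array to `system` avoids shell quoting problems with uploaded file names that contain spaces.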
Video: ffmpeg
• Still a bit of magic involved…
  – Reduce this, increase that…
Video: ffmpeg conversion
• But at least we’ve built tools!
Video: Delivery
• Progressive Download
  – Copy of video is made on your local temp drive and then buffered back through the player as it downloads
    • Lacks IP protection
  – ESPN
  – Video is sent to player over HTTP from file system on host server
  – Some companies will block content
    • by MIME type
    • video over HTTP on port 80 is the easiest way to get past security
• Streaming
  – Video is streamed in real time from streaming video server
    • No local copy made
  – Near instantaneous playback
  – Uses the RTMP protocol
  – Important to size/compress correctly for intended audience
Video: Delivery
• Factors impacting Client reception
  – Other programs running
    • How much available CPU/RAM does the respondent’s web-enabled device have?
  – Bandwidth
    • DSL, cable, dialup?
    • Bandwidth varies during a video session (e.g. a 30 second ad)
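One way to reason about sizing video for the intended audience is to encode a few bitrates and pick the highest one that fits the viewer's connection with some headroom. This is only a sketch: the rendition list, the 80% headroom factor and the method names are assumptions for illustration, not values from the P3 application.

```ruby
# Hypothetical set of encoded video bitrates, in kilobits per second.
RENDITIONS_KBPS = [150, 300, 600, 1200].freeze

# Pick the highest rendition that fits within a fraction of the measured
# bandwidth; fall back to the smallest one on very slow connections.
def pick_rendition(bandwidth_kbps, headroom: 0.8)
  budget = bandwidth_kbps * headroom
  RENDITIONS_KBPS.select { |kbps| kbps <= budget }.max || RENDITIONS_KBPS.min
end

# Approximate file size in megabytes (1 MB = 1,000 kB) for a clip.
def file_size_mb(kbps, seconds)
  kbps * seconds / 8.0 / 1000.0
end

{ "dialup" => 56, "DSL" => 768, "cable" => 5000 }.each do |link, bandwidth|
  kbps = pick_rendition(bandwidth)
  puts format("%-7s %5d kbps link: %4d kbps video, %.2f MB for a 30 second ad",
              link, bandwidth, kbps, file_size_mb(kbps, 30))
end
```

The headroom leaves room for bandwidth to dip during the session; on dialup even the smallest rendition exceeds the link, so playback would have to buffer.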
Video: The Player
• The SWF file
  – Hosted on server, embedded in page
  – Skinnable
    • Remove controls
  – Plays either progressive or streaming
  – JW Player is the most widely used
P3 Reporting
Reporting: Online Analytical Processing (OLAP)
Reporting: The Update Algorithm
• Scheduled Batch
  – Go update all the surveys every x minutes…
    • Open and recently closed
• On Demand
  – Update this survey now
• Real-time
  – Asynchronously grab queued responses from a message queue (MQ) carrying updates from the Runtime
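The three update modes above can be sketched in plain Ruby. This is a simplified illustration, not the P3 implementation: the `Survey` struct, the `ReportingUpdater` class, the 24-hour "recently closed" window and the in-process `Queue` standing in for a real message queue are all assumptions.

```ruby
require "thread"

# Hypothetical survey record: status is :open or :closed.
Survey = Struct.new(:id, :status, :closed_at)

class ReportingUpdater
  RECENTLY_CLOSED_SECONDS = 24 * 60 * 60 # assumption: "recent" means 24 hours

  attr_reader :refreshed

  def initialize(surveys)
    @surveys = surveys
    @queue = Queue.new # in-process stand-in for a real message queue
    @refreshed = []    # ids refreshed so far, in order
  end

  # Scheduled batch: refresh open and recently closed surveys.
  def update_batch(now: Time.now)
    due = @surveys.select do |s|
      s.status == :open ||
        (s.status == :closed && now - s.closed_at <= RECENTLY_CLOSED_SECONDS)
    end
    due.each { |s| update_survey(s.id) }
    due.map(&:id)
  end

  # On demand: refresh one survey now (a real version would rebuild its
  # reporting tables; here we only record the id).
  def update_survey(id)
    @refreshed << id
  end

  # Real-time: the Runtime enqueues a survey id whenever a response arrives.
  def enqueue(survey_id)
    @queue << survey_id
  end

  # Drain the queue and refresh each affected survey once.
  def drain_queue
    ids = []
    ids << @queue.pop(true) until @queue.empty?
    ids.uniq!
    ids.each { |id| update_survey(id) }
    ids
  end
end
```

In a deployment the batch method would be triggered by a scheduler every x minutes and the queue would be drained by a background worker, so none of the three paths blocks the Runtime.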
Reporting: On demand
Reporting: Key features
• View results by Question
• Filtering
  – By status
  – Compound filters based on question/choice sets
• Crosstabs
  – Question v Question crosstabs
  – Filter by status
• Quotas / Segments
  – View current / total counts
• Monitor survey progress
  – Total, Last day, Last hour…
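A question v question crosstab with a status filter boils down to counting responses per pair of answers. The sketch below uses made-up in-memory response data and a hypothetical `crosstab` method; the real reporting database would compute this with queries rather than Ruby hashes.

```ruby
# Made-up sample responses; question ids and answers are illustrative.
RESPONSES = [
  { status: :complete, answers: { "q1" => "Yes", "q2" => "Male" } },
  { status: :complete, answers: { "q1" => "Yes", "q2" => "Female" } },
  { status: :complete, answers: { "q1" => "No",  "q2" => "Male" } },
  { status: :partial,  answers: { "q1" => "Yes", "q2" => "Male" } },
  { status: :complete, answers: { "q1" => "Yes", "q2" => "Male" } }
].freeze

# Count responses for every (row answer, column answer) pair,
# optionally restricted to one response status.
def crosstab(responses, row_question, col_question, status: nil)
  rows = status ? responses.select { |r| r[:status] == status } : responses
  table = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  rows.each do |response|
    row = response[:answers][row_question]
    col = response[:answers][col_question]
    next if row.nil? || col.nil? # skip respondents who missed either question
    table[row][col] += 1
  end
  table
end

crosstab(RESPONSES, "q1", "q2", status: :complete).each do |row, cols|
  puts "#{row}: #{cols.map { |col, count| "#{col}=#{count}" }.join(', ')}"
end
```

Dropping the `status:` argument includes partial responses as well, which is how the same table can back both the filtered and unfiltered views.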
Reporting: What’s left?
• More testing…
• Report generation
  – PDF
  – Other formats
• Email notification
• More slicing/dicing tools
• Migration to Scalr???
• Beta with select clients
• User feedback
  – Incorporate into future releases