Post on 19-Jun-2015
Coursera + AWS CloudSearch
Frank Chen, Software Engineer
About
• Ed-Tech startup providing MOOCs
  o Massive Open Online Courses
• New company -- launched 4/18/12
  o Less than a year old
• 215 free courses from 33 top universities
  o Princeton, Stanford, Penn, Duke, etc.
  o From Cryptography to Modern and Contemporary American Poetry
• 2.5+ million users
  o We reached a million users faster than Facebook and Pinterest
• ~9 million course enrollments
Platform Scale
• Moderate-sized (>10,000 concurrent users)
• 65 concurrent courses running now, each with tens of thousands of enrollments
• >600 "pretty heavy" PHP/Python dynamic pages served per second, sustained
  o Pages might make backend calls to services (e.g. CloudSearch or SES), so we want low latencies
• Various other services (70+ instances on EC2 running at the moment)
• Spiky traffic
  o People procrastinate on deadlines, so traffic is spiky on the weekends
Stack
• PHP / Python / Scala backed by MySQL
• Runs completely on AWS
• Utilizes lots of AWS services
  o EC2 / ELB for servers
  o MySQL RDS for databases
  o S3 for video and static hosting
  o CloudFront for video / asset hosting
  o SES for emails (>1 million emails every day)
  o SQS for long-running tasks (video encoding, gradebook generation, etc.)
  o SNS for notification services
  o Route 53 for DNS
  o CloudSearch for forum search
Why CloudSearch?
• Big issue for us back in March / April. Solution then didn't work
  o MySQL Full Text Search
    § LIKE %x% AS NATURAL LANGUAGE?
    § Really terrible results
    § MyISAM (eww...)
• Requirements:
  o Fast searches (we call backend APIs - don't want to keep the users waiting too long)
  o Good results (need to be relevant - don't waste the students' time)
  o Low/no maintenance (we have enough instances to manage as is)
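One reason substring matching gives poor results can be shown in a few lines. This is only an illustrative sketch: the slides do not say which queries were actually run, SQLite stands in for MySQL, and the `forum_posts` table, the sample posts, and the `like_search` helper are all invented for the example.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table name and posts are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forum_posts (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO forum_posts (body) VALUES (?)",
    [
        ("Stuck on the RSA problem in homework 3",),
        ("Is universal hashing covered this week?",),
        ("Great conversation in the study group",),
        ("When is the final exam?",),
    ],
)


def like_search(connection, term):
    """Substring search with LIKE: no word boundaries and no relevance ranking."""
    pattern = f"%{term}%"  # wildcards inside `term` are not escaped in this sketch
    cursor = connection.execute(
        "SELECT id, body FROM forum_posts WHERE body LIKE ? ORDER BY id",
        (pattern,),
    )
    return cursor.fetchall()


# A student searching for "rsa" also gets "universal" and "conversation",
# because both words happen to contain the letters r-s-a.
for post_id, body in like_search(conn, "rsa"):
    print(post_id, body)
```

Every match is treated as equally good, so the one relevant post is not ranked above the two accidental ones.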
Why CloudSearch?
• Alternatives we looked at:
  o Apache Solr, Sphinx, fiddling with MySQL
• Then CloudSearch was announced...
• Early general adopter - we started using CloudSearch ~10 days after announcement
  o We didn't get any heads-up about CS before the public announcement
  o Wrote the code to use CloudSearch and import our existing forum posts / comments in 2 or 3 days
    § From decision to production!
    § Easy to use and great documentation
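The import step can be sketched roughly as follows. This is not the code used at Coursera: the field names (`course_id`, `author_id`, `body`), the `kind` key, and both helper functions are hypothetical. The batch layout (`type`, `id`, `version`, `lang`, `fields`) follows the JSON document batch format of the original 2011-02-01 CloudSearch API as best recalled here, and should be checked against the documentation for the API version in use, since later versions changed it.

```python
import json
import re


def to_document_id(kind, post_id):
    """Build a document id from lowercase letters, digits and underscores only."""
    raw = f"{kind}_{post_id}".lower()
    return re.sub(r"[^a-z0-9_]", "_", raw)


def build_add_batch(posts, version):
    """Turn forum posts/comments into one JSON batch of "add" operations.

    `posts` is a list of dicts with hypothetical keys:
    kind ("post" or "comment"), id, course_id, author_id, body.
    """
    batch = []
    for post in posts:
        batch.append(
            {
                "type": "add",
                "id": to_document_id(post["kind"], post["id"]),
                "version": version,  # assumed: newer versions replace older ones
                "lang": "en",
                "fields": {
                    "course_id": post["course_id"],
                    "author_id": post["author_id"],
                    "body": post["body"],
                },
            }
        )
    return json.dumps(batch)


example_posts = [
    {"kind": "post", "id": 42, "course_id": "crypto", "author_id": "u1",
     "body": "How does padding work?"},
    {"kind": "Comment", "id": 7, "course_id": "crypto", "author_id": "u2",
     "body": "See lecture 4."},
]
payload = build_add_batch(example_posts, version=1)
print(payload)
```

The resulting string would then be sent to the search domain's document endpoint. That upload call is left out because it depends on the client library and API version.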
CloudSearch Uses
User-facing forum search
CloudSearch Uses
• Analytics
  o Most frequent searches and other statistics about their courses
    § Informing instructors about this so they can clarify information
  o Finding posts across forums
    § Easy for CloudSearch, hard normally because of sharded scatter-gather problems
      • Old way: Querying 600 databases on 4 RDS servers? Not fun
    § Usage analysis
    § Unexpected use: Instructors often want to find all their own posts so they can save / archive common answers
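The scatter-gather problem mentioned above looks roughly like this in miniature. It is a sketch under stated assumptions, not the production code: each shard is an in-memory list standing in for one forum database, and `search_shard`, `scatter_gather`, and the post fields (`author_id`, `created`) are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, author_id):
    """Query one shard (a list of post dicts here; a separate database in reality)."""
    return [post for post in shard if post["author_id"] == author_id]


def scatter_gather(shards, author_id, max_workers=8):
    """Fan the query out to every shard, then merge and re-sort the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = pool.map(lambda shard: search_shard(shard, author_id), shards)
        merged = [post for part in partials for post in part]
    # Each shard only knows its own ordering, so the merge step has to sort again.
    return sorted(merged, key=lambda post: post["created"], reverse=True)


shards = [
    [{"id": 1, "author_id": "prof", "created": 10},
     {"id": 2, "author_id": "student", "created": 11}],
    [{"id": 3, "author_id": "prof", "created": 30}],
    [],  # a forum with no posts yet
]
own_posts = scatter_gather(shards, "prof")
print([post["id"] for post in own_posts])
```

With hundreds of databases, every such query pays for the slowest shard plus the merge. A single search index over all posts replaces the fan-out with one filtered query.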
CloudSearch Scale
• Moderate scale
• ~1.5 million documents indexed
  o All forum posts and comments
• 50,000+ searches a day
  o Spiky! Depends on when homework is due.
Experience
GREAT!
We Want...
• "Did you mean..."
  o Lots of typos from non-native speakers
• Multilingual Tokenization / Search
  o We are starting to run courses in other languages...
• Find Similar Documents
Thank You! Questions?
frank@coursera.org