Scaling up to & over 40M users
Scaling Software, Scaling Data & Scaling People
The Wix Experience
About Wix
Wix in Numbers
• Wix was founded in 2006
• 39M registered users from most countries
• Over 1,000,000 new users every month
• Over 1,000,000 new websites every month
• Over 150 TByte of users' media files
  – More than 1 billion users' media files
  – More than 1.5 TByte of files uploaded daily
• Over 300 servers in 2+1 data centers + Google + Amazon
• Over 100,000,000 API calls a day
Wix Initial Architecture
• Tomcat, Hibernate, custom web framework
  – Built for fast development
  – Did not consider performance, scalability, fast feature rollout or testing
  – It reflected the fact that we didn't really know what our business was
  – We knew that we would need to replace it as we grew
  – However, we failed to understand how difficult that can be!
• Don't worry about 'building it right from the start': you won't
• Build for gradual rewrite as you learn the problems and find the right solutions
[Timeline: 2006 to 2013, Flash and HTML 5. Diagram: Wix (Tomcat), MySQL DB]
Editor & Public Segments
• The Challenge: updates to our server imposed downtime on our customers' websites
  – Any server or database update has the potential of bringing down all Wix sites
  – This is a symptom of a larger issue
• The server served two different concerns
  – Wix users editing websites
  – Viewing Wix sites, the sites created with the Wix editor
• The two concerns require different SLAs
  – Wix sites should never, ever have downtime!
  – Wix sites should work as fast as possible, always!
  – However, an editing system does not require this level of SLA
Editor & Public Segments
• The two concerns evolve independently
  – Releases of editing features should have no impact on the operation of existing Wix sites!
• Our Solution
  – Split the server into two segments: Public and Editor
• The Public segment serves the websites of Wix users
  – Has a mostly read-only usage pattern, only updated when a site is published
  – Simple publishing system
  – Simple and read-only means it is easier to achieve a higher SLA and DRP
  – MySQL used as NoSQL: a single large table with XML text fields
• The Editor segment
  – Exposes the Wix editing APIs, as well as the user account and gallery management APIs
  – Has a different release schedule from the Public segment
[Diagram: Public (Tomcat) with Public DB; Editor (Tomcat) with Editor DB]
MySQL as NoSQL
• MySQL is a damn good NoSQL engine
  – Our public DB was (mainly) one huge table
  – Queries and updates are by primary key
  – Instead of relations, we use text/xml or text/json columns
  – No updates for blobs: the data is immutable
  – No transactions
• Use an indirection table that points into the blob table
  – Insert a new blob value, update the pointer to the new blob, asynchronously delete the old blob
• MySQL auto-generated keys cause problems
  – Locks on key generation
  – Require a single instance to generate keys
• We use GUID keys
  – Can be generated by any client
  – No locks in key value generation
  – Enabler for master-master replication
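The blob indirection and client-side GUID keys described above can be sketched roughly as follows. This is a minimal illustration, not Wix's actual schema: an in-memory SQLite database stands in for MySQL, and the table names, column names and helper functions are all hypothetical.

```python
import sqlite3
import uuid
from typing import Optional

# In-memory SQLite stands in for MySQL; autocommit mode, so no explicit transactions.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE blobs (blob_id TEXT PRIMARY KEY, body TEXT NOT NULL);
    CREATE TABLE site_pointers (site_id TEXT PRIMARY KEY, blob_id TEXT NOT NULL);
""")

def new_guid() -> str:
    # Any client can generate a key locally: no shared sequence, no lock.
    return str(uuid.uuid4())

def current_blob_id(site_id: str) -> Optional[str]:
    row = conn.execute(
        "SELECT blob_id FROM site_pointers WHERE site_id = ?", (site_id,)
    ).fetchone()
    return row[0] if row else None

def publish(site_id: str, body: str) -> Optional[str]:
    """Insert a new immutable blob, repoint the site, return the old blob id."""
    old_blob_id = current_blob_id(site_id)
    blob_id = new_guid()
    conn.execute("INSERT INTO blobs (blob_id, body) VALUES (?, ?)", (blob_id, body))
    # SQLite upsert syntax; MySQL would use REPLACE INTO or ON DUPLICATE KEY UPDATE.
    conn.execute(
        "INSERT OR REPLACE INTO site_pointers (site_id, blob_id) VALUES (?, ?)",
        (site_id, blob_id),
    )
    return old_blob_id

def load(site_id: str) -> Optional[str]:
    # Two primary-key lookups instead of a join.
    blob_id = current_blob_id(site_id)
    if blob_id is None:
        return None
    row = conn.execute("SELECT body FROM blobs WHERE blob_id = ?", (blob_id,)).fetchone()
    return row[0] if row else None

def delete_blob(blob_id: str) -> None:
    # Called later by a background job in a real system; run inline here for brevity.
    conn.execute("DELETE FROM blobs WHERE blob_id = ?", (blob_id,))
```

A caller would publish twice and hand the returned old blob id to whatever cleanup job runs later, for example `old = publish(site, new_xml)` followed eventually by `delete_blob(old)`. Existing blob rows are never updated in place, which is what keeps readers safe without transactions.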
Wix on Managed Hosting
• Co-Location
  – Own and maintain your own hardware
  – Provisioning == buy and deliver your new server
  – Reliable software on reliable hardware
• Managed Hosting
  – Lease both hardware and maintenance
  – Overnight provisioning
  – Reliable software on reliable hardware
• Cloud
  – Instantly lease hardware
  – Instant provisioning, unlimited resources
  – Reliable software on unreliable hardware
Data Centers
• Austin (Managed Hosting)
  – Our first data center
• Chicago (Managed Hosting)
  – Data DRP, then active-active with Austin
• Amsterdam (Managed Hosting)
  – The idea was 3x active
  – However, it failed: it is too complex to have 3 active data centers (3-way replication)
• Amazon, Google (Cloud)
  – 2nd vendor, service disruption DRP
[Timeline labels: Austin, Chicago, Amsterdam, Amazon, Google]
Wix Media Segment
• The Challenge
  – Our static storage reached over 500 GByte of small files
  – The "upload to app server, post-process files, copy to lighttpd server, serve by lighttpd" pattern proved inefficient, slow and error prone
  – Disk IO became slow and inefficient as the number of files increased
  – We needed a solution we could grow with, in terms of
    • HTTP connections
    • number of files
  – We needed control over caching and HTTP headers
• We needed dynamic image manipulation
  – Rebuilding a few million media files is not simple
• Our current architecture
Prospero – Wix Media Storage
[Diagram: a request such as "get 37D815B5.jpg" is served by the CDN; if the file is not in the CDN, it is fetched from Wix media storage (Austin, Chicago, Google Cloud Storage), with a first and a second fallback]
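The fallback idea in the diagram can be sketched as an ordered list of sources that are tried one after another. This is only an illustration: the slide does not say which location is the primary and which are the fallbacks, so the ordering, the `fetch_with_fallback` function and the dictionary-backed stores below are all assumptions.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# A fetcher returns the file bytes, or None when the source does not have the file.
Fetcher = Callable[[str], Optional[bytes]]

def fetch_with_fallback(
    name: str, sources: Sequence[Tuple[str, Fetcher]]
) -> Tuple[str, bytes]:
    """Try each source in order and return (source label, data) from the first hit."""
    for label, fetch in sources:
        try:
            data = fetch(name)
        except OSError:
            data = None  # treat an unreachable source the same as a miss
        if data is not None:
            return label, data
    raise FileNotFoundError(name)

def dict_store(files: Dict[str, bytes]) -> Fetcher:
    # Hypothetical stand-in for a real storage location.
    return files.get

# Assumed ordering: a primary location followed by two fallbacks.
sources = [
    ("primary", dict_store({})),
    ("first fallback", dict_store({"37D815B5.jpg": b"jpeg-bytes"})),
    ("second fallback", dict_store({"37D815B5.jpg": b"jpeg-bytes"})),
]
```

With these stand-in stores, `fetch_with_fallback("37D815B5.jpg", sources)` misses in the primary and is answered by the first fallback. The CDN would sit in front of this chain and only call it on a cache miss.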
CDN
• Use a CDN!
• A CDN acts as a great connection manager
  – We have CDN hit ratios of over 99.9%
• Use the "Cache Killer" pattern
  – http://static.wix.com/client/css/viewer.css?v=327
  – http://static.wix.com/client/1.3.2/css/viewer.css
  – Makes flushing files from the CDN redundant
  – Enabler for longer caching periods
• There are many vendors
  – We started with 1 CDN vendor
  – We are now working with two CDN vendors
  – Different CDN vendors have advantages in different geographies
• Tune HTTP headers per CDN vendor
  – CDN vendors interpret HTTP headers differently
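The two "Cache Killer" URL shapes shown above can be produced by small helpers like the ones below. This is a sketch under assumptions: the function names are hypothetical, and the slides do not say how Wix generates these URLs. The point is that every release yields a new URL, so the CDN never has to be flushed and old URLs can be cached for a long time.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_version_param(url: str, version: int) -> str:
    """Append (or replace) a v=<version> query parameter on a static asset URL."""
    parts = urlsplit(url)
    # Drop any existing v parameter so the new version replaces it.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "v"
    ]
    query.append(("v", str(version)))
    return urlunsplit(parts._replace(query=urlencode(query)))

def with_version_path(base_url: str, version: str, asset_path: str) -> str:
    """Embed the release version as a path segment in front of the asset path."""
    return f"{base_url.rstrip('/')}/{version}/{asset_path.lstrip('/')}"
```

For example, `with_version_param("http://static.wix.com/client/css/viewer.css", 327)` gives the first URL in the slide, and `with_version_path("http://static.wix.com/client", "1.3.2", "css/viewer.css")` gives the second.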
Development Velocity
• The Challenge
  – Our codebase became large and entangled
  – Feature rollout became harder over time, requiring longer and longer manual regression
  – The longer the regression was, the harder it became to make "a good release"
  – Strange full-table-scan queries generated by Hibernate, which we still have no idea what code is responsible for…
• The solution
  – Mid 2010: Wix Framework, modern base libraries
  – Beginning of 2011: CI / CD / TDD techniques + DevOps culture + automated deployment
  – Mid 2011: Scala
  – SOA architecture (not WSDL)
[Timeline labels: Framework, CI / CD / TDD + DevOps, Scala]
People are the key
• Train the people you already have
  – We sent our entire QA department to learn Java
  – Developers learn TDD and CI/CD methodologies
• Hiring the right people is key to success
  – Hire only the best developers (only seniors)
  – Don't count only on the interview; you need to test actual coding
  – Anyone who interviews can drop a candidate
  – Hire people who will challenge you (no "yes men")
  – Get people you can trust with "root" access to production
• Never stop hiring
  – If we find an excellent person, we will create a position for them even if we do not have one open
• Wix is doubling its size every year
  – Yes, we are currently hiring
  – We're considering starting to hire and train junior developers
CI / CD @ Wix – Release Process
• Make an RC
  – Runs build, unit tests, integration tests
• Deploy as GA
  – Using Chef, Noah, Artifactory
  – Runs self-tests
• Monitor
  – Deployment, NewRelic, App-Info, Recent Events
• Rollback
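The release steps above could be wired together roughly as in the sketch below. It is illustrative only: the `release` function and its callback parameters are hypothetical, and the rule that a failed self-test triggers the rollback is an assumption, since the slides list rollback as a step without saying what triggers it.

```python
from typing import Callable, List

def release(
    build_rc: Callable[[], bool],   # build + unit tests + integration tests
    deploy: Callable[[], None],     # deploy the RC as GA
    self_test: Callable[[], bool],  # post-deployment self-tests
    rollback: Callable[[], None],   # restore the previous version
) -> List[str]:
    """Run the release steps in order and return a log of what happened."""
    log: List[str] = []
    if not build_rc():
        log.append("rc failed")
        return log  # nothing was deployed, so there is nothing to roll back
    log.append("rc ok")
    deploy()
    log.append("deployed")
    if self_test():
        log.append("self-test ok")
    else:
        # Assumption: a failed self-test is what triggers the rollback.
        rollback()
        log.append("rolled back")
    return log
```

In practice each callback would wrap a real tool (a CI job, a Chef run, a monitoring check); plain functions keep the sketch self-contained.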