George Palmer
26th May 2007
Overview
• Starting out
• Scaling the database
• Scaling the web server
• User clusters
• Caching
• Elastic architectures
• Links and Questions
How you start out
• Shared hosting
• One web server and DB on the same machine
• Application designed for one machine
• Volume of traffic will depend on the host
[Diagram: Web Server and DB on a single shared-hosting machine]
Two servers
• Possibly still shared hosting
• Web server and DB on different machines
• Minimal changes to code
• Volume of traffic will depend on whether you have made it to dedicated machines
[Diagram: Web Server and DB on separate machines]
Scaling the database (1)
• DB setup more suited to read intensive applications (MySQL replication)
• Should be on dedicated hosts
• Minimal changes to code
[Diagram: Web Server, Master DB and three Slaves]
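To illustrate why replication suits read-intensive applications, here is a minimal sketch of read/write splitting: writes go to the master, reads are spread across the slaves. The class name, host names and the "starts with SELECT" rule are assumptions made for illustration, not something the slides prescribe.

```python
import itertools


class ReplicatedRouter:
    """Route writes to the master and spread reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master when no slaves are configured.
        self._readers = itertools.cycle(slaves or [master])

    def host_for(self, sql):
        # Assumption: only plain SELECT statements count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.master


# Hypothetical host names, mirroring one master and three slaves.
router = ReplicatedRouter("master-db", ["slave-1", "slave-2", "slave-3"])
read_hosts = [router.host_for("SELECT * FROM users") for _ in range(4)]
write_host = router.host_for("UPDATE users SET name = 'Bob' WHERE id = 1")
```

Note that a slave may lag behind the master, so a read issued straight after a write may need to go to the master as well.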
Scaling the database (2)
• DB setup more suited to equal read/write applications (MySQL cluster)
• Should be on dedicated hosts
• Minimal changes to code
[Diagram: Web Server and two Master DBs forming a MySQL Cluster]
Scaling the web server
• A web server consists of “worker threads” that process work as it comes in
[Diagram: Web Server containing four worker threads, backed by the DB Farm]
Load balancing
• The app server depends on your platform:
  – Rails (Mongrel, FastCGI)
  – PHP
  – J2EE
• Some changes to code will be required
[Diagram: Load balancer, three App Servers and the DB Farm]
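As a rough sketch of what the load balancer does, the following round-robin picker hands requests to each app server in turn and skips servers marked as down. Round robin is only one possible policy, and the class and server names are hypothetical.

```python
class RoundRobinBalancer:
    """Hand out app servers in rotation, skipping unhealthy ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once, starting from the rotation pointer.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy app servers")


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
balancer.mark_down("app-2")
picks = [balancer.pick() for _ in range(4)]
```

Because consecutive requests from one user can land on different servers, per-user state such as sessions cannot live in one server's memory, which is one reason some code changes are needed.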
The story so far…
[Diagram: Load balancer, three App Servers, Master DB and three Slaves]
• App servers continue to scale but the database side is somewhat limited…
User Clusters
• For each user registered on the service, add an entry to a master database detailing where their user data is stored:
  – UserID
  – DB cluster
  – Basic authorisation details such as username, password, any NLS settings
User Clusters (2)
[Diagram: App Server, Master DB, User Cluster 1 and User Cluster 2]
• User clusters are themselves one of the two database setups outlined earlier
• The app server asks the master DB: SELECT * FROM users WHERE username='Bob' AND …
• The master DB answers: user_id=91732, db_cluster=2
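The two-step lookup above can be sketched as follows, using in-memory SQLite databases as stand-ins for the master DB and the user clusters. The table layout, the `profiles` table and the `load_profile` helper are assumptions for illustration only.

```python
import sqlite3

# Master database: one row per user, saying which cluster holds their data.
master = sqlite3.connect(":memory:")
master.execute(
    "CREATE TABLE users ("
    "user_id INTEGER PRIMARY KEY, username TEXT UNIQUE, db_cluster INTEGER)"
)
master.execute("INSERT INTO users VALUES (91732, 'Bob', 2)")

# Two user clusters; each stands in for a replicated or clustered setup.
clusters = {1: sqlite3.connect(":memory:"), 2: sqlite3.connect(":memory:")}
for conn in clusters.values():
    conn.execute("CREATE TABLE profiles (user_id INTEGER PRIMARY KEY, bio TEXT)")
clusters[2].execute("INSERT INTO profiles VALUES (91732, 'Hello from cluster 2')")


def load_profile(username):
    # Step 1: ask the master DB where this user's data lives.
    row = master.execute(
        "SELECT user_id, db_cluster FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return None
    user_id, db_cluster = row
    # Step 2: fetch the user's data from the cluster the master pointed to.
    bio = clusters[db_cluster].execute(
        "SELECT bio FROM profiles WHERE user_id = ?", (user_id,)
    ).fetchone()
    return {
        "user_id": user_id,
        "db_cluster": db_cluster,
        "bio": bio[0] if bio else None,
    }


profile = load_profile("Bob")
```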
User Clusters (3)
• ID management becomes an issue
  – Best to use the master DB id as user_id in the user cluster, or UUIDs
  – If you let the cluster allocate IDs, make sure you use an offset and increment (not plain auto_increment)
• Other DBs, such as the session DB, must reference a user by id and DB cluster
• Serious code changes may be required
• You will want the ability to move users between clusters
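A minimal sketch of the offset-and-increment idea: each cluster starts at its own offset and steps by the total number of clusters, so no two clusters ever hand out the same id. The allocator class is hypothetical; in MySQL the equivalent settings are the `auto_increment_offset` and `auto_increment_increment` server variables.

```python
class ClusterIdAllocator:
    """Allocate ids offset, offset + increment, offset + 2 * increment, ..."""

    def __init__(self, offset, increment):
        if not 1 <= offset <= increment:
            raise ValueError("offset must be between 1 and increment")
        self.increment = increment
        self._next = offset

    def allocate(self):
        allocated = self._next
        self._next += self.increment
        return allocated


# Two clusters: the increment equals the cluster count, offsets are 1 and 2.
cluster_1 = ClusterIdAllocator(offset=1, increment=2)
cluster_2 = ClusterIdAllocator(offset=2, increment=2)
ids_1 = [cluster_1.allocate() for _ in range(3)]
ids_2 = [cluster_2.allocate() for _ in range(3)]
```

The increment fixes the maximum number of clusters up front, so it is worth choosing it larger than the number of clusters you run today.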
Architecture so far
• As the number of app servers grows, it’s a good idea to add a database connection manager (e.g. SQLRelay)
• Extract the session, search and translation databases onto their own machines
• Add a background processor for long-running tasks (so they don’t block the app servers)
• Use MySQL Cluster (or equivalent) for any critical database
  – In a replication setup you can make a slave a backup master
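The background-processor bullet can be sketched like this: the request handler only enqueues the job and returns at once, while a separate worker does the slow part. Here a thread and an in-process queue stand in for a separate worker process; the job name, the handler and the "202 Accepted" return value are assumptions for illustration.

```python
import queue
import threading

jobs = queue.Queue()
results = []


def worker():
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut the worker down
            jobs.task_done()
            break
        name, payload = job
        results.append((name, payload.upper()))  # stand-in for slow work
        jobs.task_done()


def handle_request(payload):
    # The app server only enqueues the job, so the request is not blocked.
    jobs.put(("resize_avatar", payload))
    return "202 Accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()
status = handle_request("bob.png")
jobs.put(None)   # ask the worker to stop once the queue is drained
jobs.join()
thread.join()
```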
Non-cached architecture
[Diagram: Load balancer; App Servers 1, 2, … 50; DB Connection Manager; Master DB, Session DB, Search DB and NLS DB; User Cluster 1 and User Cluster 2, each a Master with three Slaves; Static Files; BackgroundRB]
Issues
• Load balancer and database connection manager are single points of failure
  – Easily solved
• 2PC (two-phase commit) is needed for some operations, for example when a user wants to be removed from the search database
  – 2PC is not supported in Rails
• Rails doesn’t support database switching for a given model
  – Can be done explicitly on each request, but it is expensive due to connection establishment overhead
  – Can be worked around with a connection manager, but a proper solution is required (a few gems are starting to emerge for this)
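For readers unfamiliar with 2PC, here is a toy sketch of the protocol: every database first votes in a prepare phase, and the change is committed only if all of them vote yes. The `Participant` class and the database names are hypothetical, and a real implementation also has to handle timeouts and coordinator crashes.

```python
class Participant:
    """A database taking part in the transaction (toy stand-in)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote on whether the change can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit everywhere only if every vote was yes.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


user_db, search_db = Participant("user_cluster"), Participant("search_db")
ok = two_phase_commit([user_db, search_db])

other_db = Participant("user_cluster")
broken_db = Participant("search_db", can_commit=False)
failed = two_phase_commit([other_db, broken_db])
```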
Making the most of your assets
• In a lot of web applications a huge percentage of the hits are read-only. Hence the need for caching:
  – Squid
    • A reverse proxy (or web server accelerator)
  – Memcached
    • Distributed memory caching solution
  – Language-specific caching
    • E.g. Rails fragment caching
Squid
• Lookup of pages is in memory; storage of files is on disk
• Can also act as a load balancer
• Pages can be expired by sending a PURGE request to the proxy
• Can program any load balancer to pick up pages cached by your app servers (if you know the rules under which it operates)
[Diagram: Squid serves pages that are in cache from Storage; requests not in cache go on to App Server 1, App Server 2, …]
Memcached
[Diagram: App Servers and Memcached instances share physical machines; data not in memcached is fetched from the DB Farm]
• Location of data is independent of the physical machine
• A really nice, simple API:
  – SET
  – GET
  – DELETE
• In Rails only a few lines of code will make a model cached
• Also useful for tracking cross-machine information, e.g. dodgy user behaviour
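The SET / GET / DELETE API lends itself to a simple pattern: read through the cache, fall back to the database on a miss, and delete the key on update. The sketch below uses a dictionary-backed stand-in rather than a real memcached client; `FakeMemcached`, the key format and the helper functions are all assumptions.

```python
class FakeMemcached:
    """In-process stand-in exposing only set, get and delete."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


db = {91732: {"username": "Bob"}}  # stand-in for the DB farm
db_reads = 0
cache = FakeMemcached()


def find_user(user_id):
    global db_reads
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:
        # Not in memcached: read from the database and remember the result.
        db_reads += 1
        user = db.get(user_id)
        if user is not None:
            cache.set(key, user)
    return user


def rename_user(user_id, username):
    db[user_id] = {"username": username}
    cache.delete(f"user:{user_id}")  # drop the stale copy


first = find_user(91732)    # miss, goes to the database
second = find_user(91732)   # hit, served from the cache
rename_user(91732, "Robert")
third = find_user(91732)    # miss again after the delete
```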
Cached architecture
• Introduce Squid or nginx
• Introduce memcached
  – Can go on every machine that has spare memory
    • Best suited to application servers, which have high CPU usage but low memory requirements
• Introduce language-specific caching
Cached architecture
[Diagram: the non-cached architecture with caching added. Load balancer; App Servers 1, 2, … 50, each with MC; DB Connection Manager; Master DB, Session DB, Search DB and NLS DB; User Cluster 1 and User Cluster 2, each a Master with three Slaves; BackgroundRB; Storage]
MC = memcached
Cached architecture
• Wikipedia quote a cache hit rate of 78% for Squid and 7% for memcached
  – So only 15% of hits actually get to the DB!
• Performance is a whole new ball game, but we recently gained 15-20% by optimising our Rails configuration
  – But don’t get carried away: at some point the cost of the time you spend exceeds the money saved
• It’s very easy to scale this architecture down to one machine
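The 15% figure is simple subtraction, treating both hit rates as shares of all incoming hits as the slide does. The one-million request volume below is a made-up number, used only to make the effect concrete.

```python
# Hit rates quoted on the slide, as percentages of all incoming hits.
squid_hit_pct = 78
memcached_hit_pct = 7

# Whatever neither cache answers has to be served by the database.
db_pct = 100 - squid_hit_pct - memcached_hit_pct

# Hypothetical traffic volume, purely for illustration.
requests = 1_000_000
db_requests = requests * db_pct // 100
```

With these numbers, 150,000 of the 1,000,000 requests reach the database.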
Elastic architectures
• Based upon Amazon EC2
  – Allows you to create server images and launch instances on demand
  – Very cheap as you only pay for what you use
• Currently no way to mount Amazon S3
  – Strictly speaking there are a few projects ongoing…
• Still in beta
  – We’ve had network performance issues
• An American VC was quoted as saying “Are you using EC2 for scaling? If not, you better have a good reason”
Elastic architectures
[Diagram: inside the EC2 cloud, a Load balancer fronts App Servers 1 to 3, each with MC. Under high load the Monitor uses the App Server Image to produce App Server 4 with MC]
• WeoCeo now offer a similar service
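The Monitor's job can be sketched as a small decision function: add an instance when average load is high, remove one when it is low. The thresholds, limits and function name are assumptions, and the sketch deliberately leaves out the actual calls that would launch or terminate instances.

```python
def desired_instances(current, avg_load, high=0.75, low=0.25,
                      minimum=1, maximum=20):
    """Return how many app server instances should be running.

    avg_load is the average utilisation across instances (0.0 to 1.0).
    All thresholds and limits are illustrative assumptions.
    """
    if avg_load > high:
        return min(current + 1, maximum)  # high load: launch one more
    if avg_load < low:
        return max(current - 1, minimum)  # idle capacity: shut one down
    return current


# Three app servers under high load: the monitor asks for a fourth.
target = desired_instances(current=3, avg_load=0.9)
```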
How far can it go?
• For a truly global application with millions of users, in order of ease:
  – Have a cache on each continent
  – Make user clusters based on user location
    • Distribute the clusters physically around the world
  – Introduce app servers on each continent
  – If you must replicate your site globally, then use transaction replication software, e.g. GoldenGate
Useful Links
• http://www.squid-cache.org/
• http://nginx.net/
• http://www.danga.com/memcached/
• http://sqlrelay.sourceforge.net/
• http://railsexpress.de/blog/
Questions?