Surge 2010 - from disaster to stability - scaling my.opera.com

from disaster to stability
the scaling challenges of my.opera.com

Surge 2010 – Version 3

[Chart: servers and users in thousands (kUsers) per year, 2000 to 2010. Data labels: 1, 10, 50, 257, 205, 430, 887, 1,640, 2,500, 5,500. Slide label: 1999]

[Chart repeated: servers and kUsers per year, 2000 to 2010. Slide label: 2001]

[Chart repeated: servers and kUsers per year, 2000 to 2010. Slide label: 2004]

[Chart repeated: servers and kUsers per year, 2000 to 2010. Slide label: 2007]

[Chart repeated: servers and kUsers per year, 2000 to 2010. Slide label: 2009]

the current beta

the situation: 2007

crashes every day

too many connections!!!

Team?

NFS volume of doom

monitoring

many improvements since then

➔ Efficient filesystem cache

➔ "Dogpile effect" AKA stampeding AKA ...

➔ Persistent db + memcached connections

➔ Soft counters

➔ Profiling, profiling, …
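
One common way to blunt the "dogpile effect" named in the list above is to let a single caller rebuild an expired cache entry while everyone else keeps getting the stale value. What follows is a minimal sketch of that idea in Python, not the my.opera.com code (the site itself was Perl with memcached). The class name `StampedeGuardCache`, its parameters, and the single-process in-memory storage are all assumptions made for illustration.

```python
import threading
import time


class StampedeGuardCache:
    """In-memory cache where only one caller at a time rebuilds an expired key."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._values = {}          # key -> (value, expires_at)
        self._rebuilding = set()   # keys currently being recomputed
        self._lock = threading.Lock()

    def get(self, key, compute):
        now = self.clock()
        with self._lock:
            entry = self._values.get(key)
            if entry is not None and entry[1] > now:
                return entry[0]            # fresh hit
            if entry is not None and key in self._rebuilding:
                return entry[0]            # stale, but another caller is rebuilding
            # Either a cold miss or the first caller to notice expiry.
            # On a cold miss there is nothing stale to serve, so concurrent
            # callers each compute; a fuller implementation might block instead.
            self._rebuilding.add(key)
        try:
            value = compute()              # expensive work happens outside the lock
            with self._lock:
                self._values[key] = (value, self.clock() + self.ttl)
            return value
        finally:
            with self._lock:
                self._rebuilding.discard(key)
```

Serving stale data for a short while is the trade-off here: the backend sees one rebuild per expiry instead of one per waiting request.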

code profiling

[DML] time=1237308152, user=, url=/tinh_yeu_cua_anh_b88/blog/index.dml/tag/...,name=XWA::User, variable=active, type=module, elapsed=0.068473, host=my.opera.com
[DML] time=1237308152, user=, url=/community/,name=XWA::User, variable=, type=module, elapsed=0.015935, host=my.opera.com
[DML] ...

top time-intensive modules

XWA::User::Sidebar          2024.919s (27.2%,  0.28 s/call)
XWA::User                   1778.445s (23.9%,  0.09 s/call)
XWA::User::Journal          1121.224s (15.1%,  0.24 s/call)
XWA::User::Album             321.522s ( 4.3%,  0.17 s/call)
XWA::User::Journal::Search   223.477s ( 3.0%, 20.32 s/call)
XWA::User::Comments          188.011s ( 2.5%,  0.05 s/call)
XWA::Skins                   180.486s ( 2.4%,  0.49 s/call)
XWA::User::JournalArchive    159.525s ( 2.1%,  4.43 s/call)
XWA::User::Posts             146.644s ( 2.0%,  0.45 s/call)
XWA::User::Picture           141.324s ( 1.9%,  0.10 s/call)
XWA::Albums                   93.740s ( 1.3%,  2.04 s/call)
XWA::Journals                 92.390s ( 1.2%,  2.37 s/call)
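
A report like the one above can be derived from the `[DML]` log lines by summing `elapsed` per module `name`. The slides do not show the original tooling, so this is only a minimal Python sketch. It assumes the comma-separated `key=value` layout seen in the log excerpt, and the function names are illustrative. It would also misparse a URL that contains a comma.

```python
from collections import defaultdict


def parse_dml_line(line):
    """Turn '[DML] key=value, key=value, ...' into a dict of strings."""
    if not line.startswith("[DML]"):
        return None
    body = line[len("[DML]"):].strip()
    fields = {}
    for part in body.split(","):
        key, sep, value = part.strip().partition("=")
        if sep:
            fields[key] = value
    return fields


def summarize(lines):
    """Return (name, total_seconds, percent_of_total, seconds_per_call), slowest first."""
    total = defaultdict(float)
    calls = defaultdict(int)
    for line in lines:
        fields = parse_dml_line(line)
        if not fields or "elapsed" not in fields or "name" not in fields:
            continue  # skip non-DML lines and truncated entries
        total[fields["name"]] += float(fields["elapsed"])
        calls[fields["name"]] += 1
    grand_total = sum(total.values()) or 1.0
    rows = [
        (name, secs, 100.0 * secs / grand_total, secs / calls[name])
        for name, secs in total.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Sorting by total time rather than time per call surfaces modules that are cheap individually but called very often.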

many improvements since then

➔ YSlow?

➔ The Expires header is your friend!

➔ Hot MyISAM tables converted to InnoDB

➔ MySQL Master/Master setup

➔ Jet Profiler
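
To make the Expires advice concrete: a far-future `Expires` header, paired with `Cache-Control: max-age`, lets browsers and proxies reuse static files without asking the server again. A minimal Python sketch follows. The one-year default lifetime and the function name are assumptions for illustration, not values from the talk.

```python
from email.utils import formatdate


def static_cache_headers(now_epoch, max_age_seconds=365 * 24 * 3600):
    """Build response headers that let clients cache a static resource."""
    return {
        # HTTP dates are always expressed in GMT.
        "Expires": formatdate(now_epoch + max_age_seconds, usegmt=True),
        "Cache-Control": f"public, max-age={max_age_seconds}",
    }
```

Long lifetimes only work if a changed file gets a new URL, otherwise clients keep the old copy until it expires.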

jet profiler

scalability × 3

1. avatars

Avatars - 2007

75%

/<user-name>/avatar.pl

/<user-name>/avatar.pl?xscale=8192 (!)

Avatars wtf!?

# 2007 version: read the avatar blob from the master database and print it
my $sql = DBConnect('master');
my %user = $sql->get(
    "SELECT a.blob, a.filename
     FROM avatars a, users u
     WHERE u.user=? AND u.id=a.user",
    $user
);
$req->print( $user{'blob'} );

Avatars - reloaded

➔ Export to balanced fs (5 formats)

➔ Zero SQL queries

➔ Storage subsystem

➔ static.myopera.com was born

resources (user uploads, binary blobs, ...)

Pools or single servers

URLs

http://static.myopera.com/pool1/avatars/a4/754/a1b2c3d4e5f6.../<userid>_o.png
http://static.myopera.com/pool1/avatars/a4/754/a1b2c3d4e5f6.../<userid>_t.jpg
http://static.myopera.com/pool1/avatars/a4/754/a1b2c3d4e5f6.../<userid>_m.jpg
http://static.myopera.com/pool1/avatars/a4/754/a1b2c3d4e5f6.../<userid>_l.jpg
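
URLs of this shape can be computed from the user id alone, which is what makes serving avatars with zero SQL queries possible. Below is a minimal Python sketch of that idea. The directory sharding (an MD5 of the user id split into two path segments) is purely an assumption for illustration, since the slides elide the real path scheme. Only the host, the `pool1/avatars` prefix and the `_o`/`_t`/`_m`/`_l` suffixes come from the slide.

```python
import hashlib

STATIC_HOST = "http://static.myopera.com"   # from the slide

# Size suffix -> file extension, as they appear in the slide's URLs.
FORMATS = {"o": "png", "t": "jpg", "m": "jpg", "l": "jpg"}


def avatar_url(user_id, size, pool="pool1"):
    """Build a static avatar URL from the user id alone (no database lookup)."""
    if size not in FORMATS:
        raise ValueError(f"unknown size {size!r}")
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    # Hypothetical sharding: two short directory levels keep any single
    # directory from accumulating too many files.
    shard = f"{digest[:2]}/{digest[2:5]}"
    return f"{STATIC_HOST}/{pool}/avatars/{shard}/{user_id}_{size}.{FORMATS[size]}"
```

Because the function is deterministic, page templates can emit the URL directly and the web servers holding the pool never need to consult the application.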

+ x

➔ Load

➔ Flexibility

➔ Static scales!

➔ HTTP::DAV

➔ Precomp URLs

2. varnish

Varnish

Most popular RSS feeds

My Opera frontpage

Opera Mini approval

Datacenter emergencies

Varnish: Most popular RSS feeds

➔ /desktopteam/blog/

➔ Friends, Groups API

➔ No cookies (remove req.http.cookie)
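
The "no cookies" rule matters because a cached response can only be shared between visitors if the request carries no per-user state. In Varnish this is a VCL rule. The same decision is sketched here in Python for illustration only. `/desktopteam/blog/` comes from the slide, while the second prefix and the function name are hypothetical.

```python
COOKIELESS_PREFIXES = (
    "/desktopteam/blog/",   # from the slide
    "/api/friends/",        # hypothetical path, for illustration only
)


def normalize_for_cache(path, headers):
    """Return a copy of the request headers, dropping Cookie for shared-cache paths."""
    cleaned = dict(headers)
    if path.startswith(COOKIELESS_PREFIXES):
        # Header names are case-insensitive, so match "cookie" in any casing.
        for name in list(cleaned):
            if name.lower() == "cookie":
                del cleaned[name]
    return cleaned
```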

Varnish: My Opera frontpage

➔ Danger, Will Robinson!

➔ Mangle cookies

➔ Accept-Language headers
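
One plausible reading of the Accept-Language bullet: raw header values differ widely between browsers, so using them verbatim in the cache key would fragment the cache. A common remedy is to collapse the header to one of a few supported languages first. A minimal Python sketch follows. The supported-language list and the default are assumptions, and q-values are handled only in the simple `;q=` form.

```python
def pick_language(accept_language, supported=("en", "no", "it"), default="en"):
    """Collapse an Accept-Language header to one supported language code."""
    candidates = []
    for position, item in enumerate(accept_language.split(",")):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        # Compare on the primary subtag only, so "en-GB" counts as "en".
        candidates.append((-quality, position, tag.split("-")[0]))
    # Highest quality first; ties keep the order given by the client.
    for neg_quality, _, primary in sorted(candidates):
        if neg_quality < 0 and primary in supported:
            return primary
    return default
```

With three supported languages the frontpage has at most three cached variants, however many distinct headers clients send.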

Varnish: Opera Mini 5.0 approval

➔ Global coverage

➔ Traffic surge (5x peak, 2x over 24h)

IT NEEDS TO BE OUT TOMORROW!!!

THERE WILL BE A PRESS RELEASE!

Varnish: Opera Mini 5.0 approval

➔ Global coverage

➔ Traffic surge (5x peak, 2x over 24h)

➔ No problems!

Opera Mini “countup” traffic

Submitted to Apple Store: March 23rd

Approved: April 12th

Varnish: Datacenter emergencies

Datacenter emergencies

files.myopera.com

User Files Storage SAN

DC1

Datacenter emergencies

files.myopera.com

User Files Storage SAN

DC1

DC2

LVS + Varnish servers

Varnish: ~1 Gbit/s!

+ x

➔ Load

➔ Flexibility

➔ Instant scaling

➔ Chainsaw!

➔ Purging

3. geodns

geodns

+ x

➔ Prototype 1 week

➔ Geo-scaling

➔ Redundant

➔ Accuracy

➔ No DC feedback

➔ Monitoring
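
At its core, a GeoDNS setup is a lookup from the client's apparent location to the nearest datacenter, with a default when the location is unknown. A minimal Python sketch follows. The country table and the IP addresses (taken from the reserved documentation ranges) are invented for illustration, and `DC1`/`DC2` simply reuse the labels from the datacenter diagram. In line with the "No DC feedback" bullet, nothing here knows whether a datacenter is actually healthy.

```python
DATACENTER_IPS = {
    "DC1": "192.0.2.10",      # documentation-range address, hypothetical
    "DC2": "198.51.100.10",   # documentation-range address, hypothetical
}

# Hypothetical mapping; a real deployment would use a GeoIP database.
COUNTRY_TO_DC = {"NO": "DC1", "IT": "DC1", "US": "DC2", "CA": "DC2"}


def resolve(country_code, default_dc="DC1"):
    """Return the IP a geo-aware DNS server would answer with for this client."""
    dc = COUNTRY_TO_DC.get((country_code or "").upper(), default_dc)
    return DATACENTER_IPS[dc]
```

The "Accuracy" bullet applies to the lookup key itself: the country is inferred from the resolver's address, which may not match where the user really is.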

Next steps

➔ Search (Solr?)

➔ Batch activity feed

➔ Real connection pooling

➔ … and on ...

Remember!

➔ Team spirit is important

➔ Another level of indirection...

➔ Keep it simple

➔ Keep a log

the heroes

http://my.opera.com/devblog/about/
http://my.opera.com/devblog/

any questions?

thanks!

handout download: http://tinyurl.com/surge2010-cosimo