Posted on 07-Jan-2017
Scaling Tokopedia: Past, Present, Future
Once Upon a Time in Jakarta, Jan 2009
1 Product Guy and 1 Half Engineer as co-founders
Had never managed a high-traffic website
Didn’t have a business background AT ALL
Perl as the back end: we built our own Perl framework
Apache mod_perl
Oracle Express Edition
Hmm… looks like we need a better front-end designer
AWStats and a little bit of Google Analytics
[Diagram: Network topology: an Apache server and an Oracle server hosted at CBN. Apps topology: Internet → Apache server (HTTP request/response) → Oracle server (SQL).]
2 co-founders, 1 real engineer
1 customer care
Hooray, WE LAUNCHED!!
IDR 33 Mio of GMV in the first month
WE ARE SLOW!!!
* We didn’t have dedicated storage
* Uploaded pictures were stored on the same machine
* Web pages & static content were served by a single Apache
* We didn’t use a CDN
* We didn’t even know what a CDN was
WHY??
[Diagram: Network topology at CBN: Apache app server, Apache static server, Oracle server. Apps topology: users access web pages through the Apache app server (HTTP request/response), which talks SQL to the Oracle server; users upload pictures to the Apache upload/static server, which also talks SQL to Oracle; pictures, CSS and JS are read from the static server.]
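The slides do not say how requests were split between the two Apache servers, whether by separate hostnames or by path rules. The following is a minimal sketch that assumes path-based rules and expresses the routing decision as a function. The host names and the /img/, /css/, /js/ and /upload prefixes are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical host names; the slides do not name the real servers.
APP_SERVER = "app.example.internal"
STATIC_SERVER = "static.example.internal"

# Assumed path conventions for static content (pictures, CSS, JS).
STATIC_PREFIXES = ("/img/", "/css/", "/js/")
STATIC_SUFFIXES = (".jpg", ".jpeg", ".png", ".gif", ".css", ".js")


def choose_backend(method: str, url: str) -> str:
    """Return the backend that should handle this request."""
    path = urlsplit(url).path.lower()
    # Picture uploads go to the upload/static server.
    if method.upper() == "POST" and path.startswith("/upload"):
        return STATIC_SERVER
    # Reads of pictures, CSS and JS are static content.
    if path.startswith(STATIC_PREFIXES) or path.endswith(STATIC_SUFFIXES):
        return STATIC_SERVER
    # Everything else is a dynamic page rendered by the app server.
    return APP_SERVER
```

With this split, a slow page render on the app server no longer delays the pictures, CSS and JS, and the reverse holds as well.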
We are back in business
BUT WE ARE SLOW AGAIN!!!
* Oracle Express Edition reached its limits
* No partitioning
* No replication
* Poor indexing
* Reads, writes and queries all hit the same master DB
WHY??
SO WE MIGRATED TO PostgreSQL
[Diagram: Network topology at CBN: Apache app server, Apache static server, PostgreSQL master, PostgreSQL slave. Apps topology: Internet → Apache app server (HTTP request/response). SQL INSERT/UPDATE/DELETE go to the PostgreSQL master, SQL queries go to the PostgreSQL slave, and WAL streaming replication keeps the slave in sync.]
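Below is a minimal sketch of the read/write split, assuming the application picks a database node for each statement. Writes (INSERT, UPDATE, DELETE) go to the master and plain queries go to the slave. The node names and the `route` helper are hypothetical. A production router would also have to handle cases this regex ignores, such as `SELECT ... FOR UPDATE`.

```python
import re

# Hypothetical node identifiers; the slides only say "master" and "slave".
MASTER = "postgresql-master"
SLAVE = "postgresql-slave"

# Statements that modify data must go to the master.
WRITE_RE = re.compile(r"^\s*(insert|update|delete)\b", re.IGNORECASE)


def route(sql: str, in_transaction: bool = False) -> str:
    """Pick the database node for one SQL statement."""
    # Inside a transaction, stay on the master so reads see our own writes.
    if in_transaction or WRITE_RE.match(sql):
        return MASTER
    # Plain queries can be served by the streaming-replication slave.
    return SLAVE
```

Keeping statements that run inside a transaction on the master is one way to avoid reading stale data. PostgreSQL's streaming replication is asynchronous by default, so the slave can lag slightly behind the master.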
We did it again!!!!
DAMN SEARCH IS SLOW!!!
* We have a lot of new products every second
* We have to show search results in real time
* But the sort order keeps changing every second
* The load on PostgreSQL is just too much!!!
WHY??
And Many More……..
SEARCH IS EASY !!!!
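Apache Solr is built on Apache Lucene, and Lucene's core data structure is an inverted index: a map from each term to the documents that contain it. The toy version below is pure Python, not Solr's API, and the product titles used with it are made up. It shows why a keyword lookup turns into a set intersection instead of a scan over the products table.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: maps each token to the set of product ids containing it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.titles: dict[int, str] = {}

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Deliberately naive: lowercase and split on whitespace.
        return text.lower().split()

    def add(self, product_id: int, title: str) -> None:
        # The indexing cost is paid once, when the product is added.
        self.titles[product_id] = title
        for token in self.tokenize(title):
            self.postings[token].add(product_id)

    def search(self, query: str) -> list[int]:
        # AND semantics: intersect the posting sets of every query token.
        tokens = self.tokenize(query)
        if not tokens:
            return []
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return sorted(result)
```

For example, index "Red Running Shoes" as id 1, "Blue Running Shirt" as id 2 and "Red Shirt" as id 3. Then `search("red shirt")` returns `[3]`. Real engines layer tokenization, relevance scoring and sorting on top of this structure.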
Come on, man… SLOW AGAIN??
* We were using Apache + mod_perl
* Apache consumed a lot of resources
* Our code had a lot of memory leaks
WHY??
* We found out that NginX is very light and fast
* We use NginX as a load balancer
* We replaced Apache mod_perl with nginx-perl
* We have 1 NginX load balancer in front of several nginx-perl servers
* For the load-balancing method, we mix round robin and clustering
SOLUTION
siege -c100 -t5s -i -b -q 'http://www.tokopedia.com/ebenhaezer'
siege: invalid option -- 'q'
siege: invalid option -- 'q'
** SIEGE 2.72
** Preparing 100 concurrent users for battle.
The server is now under siege...
Lifting the server siege... done.

Transactions:              14788 hits
Availability:             100.00 %
Elapsed time:               4.59 secs
Data transferred:          63.50 MB
Response time:              0.03 secs
Transaction rate:        3221.79 trans/sec
Throughput:                13.83 MB/sec
Concurrency:               87.52
Successful transactions:    7481
Failed transactions:           0
Longest transaction:        0.43
Shortest transaction:       0.00
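The headline numbers in the siege report are simple ratios of the raw totals, so they are easy to sanity-check. The sketch below recomputes them from the figures shown. The `implied_response` line is an extra check that assumes siege's concurrency roughly equals transaction rate × average response time (Little's law). Under that assumption, the real average response time was about 0.027 s before siege rounded it to 0.03.

```python
# Raw totals copied from the siege report.
transactions = 14788   # hits
elapsed_secs = 4.59
data_mb = 63.50
concurrency = 87.52

rate = transactions / elapsed_secs        # transactions per second
throughput = data_mb / elapsed_secs       # MB per second
# Assumption (Little's law): concurrency ~= rate * average response time.
implied_response = concurrency / rate     # seconds

print(f"rate: {rate:.2f} trans/sec")                     # rate: 3221.79 trans/sec
print(f"throughput: {throughput:.2f} MB/sec")            # throughput: 13.83 MB/sec
print(f"implied response: {implied_response:.4f} secs")  # implied response: 0.0272 secs
```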
[Diagram: Apps topology: Internet → NginX load balancer (HTTP request/response) → nginx-perl #1, #2, #3 … #n via proxy_pass. SQL INSERT/UPDATE/DELETE go to the PostgreSQL master and SQL queries go to the PostgreSQL slave (WAL streaming replication). Data is imported into SOLR, and search goes through SOLR queries.]
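NginX implements load balancing with an `upstream` block plus `proxy_pass`. The Python sketch below does not reproduce NginX's code. It only illustrates two ways of choosing a server: plain round robin, and a hash-based "sticky" choice. The slides do not define "clustering", so reading it as key-based stickiness (similar in spirit to NginX's `ip_hash` or `hash` upstream directives) is an assumption. The upstream names are hypothetical.

```python
import zlib
from itertools import cycle

# Hypothetical upstream names, mirroring "nginx-perl #1 ... #n" in the diagram.
UPSTREAMS = ["nginx-perl-1", "nginx-perl-2", "nginx-perl-3"]


class RoundRobin:
    """Hand out upstream servers in a fixed rotating order."""

    def __init__(self, servers: list[str]) -> None:
        if not servers:
            raise ValueError("need at least one upstream server")
        self._ring = cycle(list(servers))

    def next_server(self) -> str:
        return next(self._ring)


def sticky_server(client_key: str, servers: list[str]) -> str:
    """Map the same key (e.g. a client IP) to the same upstream every time.

    zlib.crc32 is used because Python's built-in hash() of a str is
    randomized per process, so it would not be stable across restarts.
    """
    if not servers:
        raise ValueError("need at least one upstream server")
    return servers[zlib.crc32(client_key.encode("utf-8")) % len(servers)]
```

Round robin spreads load evenly when requests cost about the same. The sticky variant trades some evenness for keeping each client on one server, which helps when per-server state such as a local cache matters.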
Now what… Storage??
* Hardware limitations
* We used SATA HDDs, not SSDs
* Disk utilization at 100%
* No backup, no failover
* Capacity is critical
* Users keep uploading pictures
WHY??
We also use a CDN
AFTER ALL, WE ARE STILL SLOW!!!
SOLUTION
[Diagram: Internet → NginX load balancer (HTTP request/response) → nginx-perl #1, #2, #3 … #n via proxy_pass. Back ends: a PostgreSQL master replicating to a PostgreSQL slave, a MongoDB primary replicating to a MongoDB secondary, SOLR, Redis (query & update), and 3rd-party APIs such as logistics providers, banks and payment gateways, reached over the Internet.]
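The diagram shows Redis answering "query & update" requests but does not say what it stores. A common pattern for a cache in this position is cache-aside:

1. Read from the cache.
2. On a miss, fall back to the database.
3. Populate the cache with an expiry.

In the minimal sketch below a plain dict stands in for Redis. With a real client, the read would be a `GET` and the write a `SET` with the `EX` option. The class and the loader function are hypothetical.

```python
import time
from typing import Callable, Optional


class CacheAside:
    """Read from the cache first, fall back to the database, then populate."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]], ttl_secs: float = 60.0) -> None:
        self._load = load_from_db
        self._ttl = ttl_secs
        # A plain dict stands in for Redis: key -> (value, expiry timestamp).
        self._store: dict[str, tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        hit = self._store.get(key)
        if hit is not None and hit[1] > time.monotonic():
            return hit[0]                      # fresh cache hit
        value = self._load(key)                # miss or expired: ask the database
        if value is not None:
            self._store[key] = (value, time.monotonic() + self._ttl)
        return value

    def invalidate(self, key: str) -> None:
        # Call after writing to the database so the next read reloads fresh data.
        self._store.pop(key, None)
```

Deleting the cached entry on write, instead of updating it, means the cache never keeps a value the database has since changed. The TTL limits how stale an entry can get if an invalidation is ever missed.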
We Started to Learn About NginX, NoSQL, In-Memory Storage, GlusterFS Storage, Scaling Out (Not Up) and Many More…
Lessons Learned??
Thanks to Our Awesome Engineers
and many more…
We are back in business
BUT …………..
For the first time in our life we were doomed!!!
* One of our GlusterFS servers broke. Image reads/writes were super slow.
* We were using a version of PostgreSQL that had some indexing bugs.
WHY??
More Awesome Engineers
Mixed with an International Team
Current State
New VP of Engineering
FUTURE
* Mobile-First Company
* Zero Downtime
* Full Move to the Cloud
* Re-architect to SOA
* Open API to the Public
* Deploy New Tech, such as replacing Perl with Go
* Advanced Alerting & Monitoring
* Redundancy and Failover
* Multiple 3rd Parties
* Data Warehouse, such as Cubes, Pentaho, etc.
* Machine Learning, Business Intelligence
* Build things that can be shared with others
* Really pay attention to security
* and many more……
What if the problems come from ISP?
Unsolved Issues
* Users cannot access Tokopedia
* Pictures are not showing
* CSS and JS are not loaded
* Sometimes it just shows a blank page
* Some ISPs do ad injection
* ALL WITHOUT ANY APPARENT REASON
FACTS
WHY?? WE DON’T KNOW
BUT SOMETHING HAPPENS ON THE ISP SIDE
Works well:
* Using the NginX Geo module
* All HTTPS since Q4 2014
* Trying CDN load balancing
Doesn’t work at all:
* Talking to the ISPs
* “Fighting” in idEA
What we’ve done
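The NginX Geo module sets a variable from the client's IP address, using a table of networks plus a default value. The slides do not say what Tokopedia keyed on it. The sketch below assumes it was used to send users from particular networks to a different CDN. The documentation-only address ranges and the CDN names are placeholders. The sketch picks the most specific matching network and falls back to the default for unknown or unparseable addresses.

```python
import ipaddress

# Hypothetical table: documentation-only ranges and made-up CDN names.
GEO_TABLE = {
    ipaddress.ip_network("198.51.100.0/24"): "cdn-b",
    ipaddress.ip_network("198.51.100.128/25"): "cdn-c",
    ipaddress.ip_network("203.0.113.0/24"): "cdn-b",
}
DEFAULT = "cdn-a"


def geo_lookup(client_ip: str) -> str:
    """Return the value of the most specific network containing client_ip."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return DEFAULT                      # unparseable address: use the default
    best = None
    for net in GEO_TABLE:
        # An IPv6 address is never "in" an IPv4 network, so mixed versions are safe.
        if addr in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return GEO_TABLE[best] if best is not None else DEFAULT
```

Doing the lookup at the edge lets the routing for a problematic ISP's users change without touching the application code.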
Don’t think “someone else will join and take care of this” — Mike Krieger of Instagram
Whether you think you can, or you think you can’t, you’re right — Henry Ford
THANK YOU! ANY QUESTIONS?