Running a Megasite on Microsoft Technologies

Running A Megasite On Microsoft Technologies. Casey Jacobs, Director of Engineering, Microsoft.com; Aber Whitcomb, CTO, MySpace.com; Chris St.Amand, Sr. System Engineer, Microsoft.com; Jim Benedetto, VP of Technology, MySpace.com. NGW046

description

MySpace and Microsoft.com are two of the most-visited Web sites on the planet. Come to this session to hear about lessons learned using Microsoft technologies to run Web applications on a massive scale. Representatives from Microsoft.com talk about lessons learned using an all-Microsoft datacenter. Representatives from MySpace talk about the realities of using Microsoft technologies in a scalable, federated environment using SQL Server 2005, .NET 2.0 and IIS 6 on Windows Server 2003 64-bit editions. This session features an open Q&A with a panel of technical managers and engineers from MySpace and Microsoft.com.

Transcript of Running a Megasite on Microsoft Technologies

Page 1: Running a Megasite on Microsoft Technologies

Running A Megasite On Microsoft Technologies

Casey Jacobs, Director of Engineering, Microsoft.com

Aber Whitcomb, CTO, MySpace.com

Chris St.Amand, Sr. System Engineer, Microsoft.com

Jim Benedetto, VP of Technology, MySpace.com

NGW046

Page 2: Running a Megasite on Microsoft Technologies

Agenda

Introduction – Quick Facts

MySpace.com – Growing Up

Upcoming Technology Enablers

Open Panel Discussion

Page 3: Running a Megasite on Microsoft Technologies

Introduction

Page 4: Running a Megasite on Microsoft Technologies

Brief History Of Microsoft.com

Milestones:

Microsoft launches www.microsoft.com: information & support publishing; hosting

Microsoft combines Web platform, ops, and content teams

Standardization effort begins, consolidation of hosted systems

Focus on MSCOM Network Programming and campaign-to-Web integration

Single MSCOM group formed: brand, content, site standards, privacy, brand compliance

Enable an innovative customer experience online & in-product: Product Info, Support, Dev / ITPro Experience, Customer Intelligence, Profile Mgmt & Enterprise Downloads

Traffic:

1995: 30k users / day

2001: 4M UUsers / day

2003: 6.5M UUsers / day

2006: 17.1M UUsers / day

Page 5: Running a Megasite on Microsoft Technologies

Microsoft.com Quick Facts

Infrastructure and Application Footprint:

5 Internet Data Centers & 3 CDN Partnerships

110 Web Sites, 1000s of Apps and 2138 Databases

80+ Gigabit/sec Bandwidth

Solutions at High Scale:

www.Microsoft.com: 13M UUsers/Day & 70M Page Views/Day

10K Req/Sec, 300K Concurrent Connections on 80 Servers

350 Vroots, 190 IIS Web Apps & 12 App Pools

Microsoft Update: 250M UScans/Day, 12K ASP.NET Req/Sec, 1.1M Concurrent

28.2 Billion Downloads for CY 2005

Egress – MS, Akamai & Savvis (30-80+ Gbit/Sec)
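As a rough back-of-the-envelope reading of the www.microsoft.com figures above, the per-server averages can be worked out as follows. This is only a sketch: it assumes the load is spread evenly over the 80 servers and that page views are uniform across the day, neither of which the slide states.

```python
# Figures quoted on the slide for www.microsoft.com.
REQUESTS_PER_SEC = 10_000
CONCURRENT_CONNECTIONS = 300_000
SERVERS = 80
PAGE_VIEWS_PER_DAY = 70_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def per_server(total, servers):
    """Average share of a cluster-wide total carried by one server."""
    return total / servers


# Assumption: perfectly even distribution across servers.
requests_per_server = per_server(REQUESTS_PER_SEC, SERVERS)            # 125.0
connections_per_server = per_server(CONCURRENT_CONNECTIONS, SERVERS)   # 3750.0

# Assumption: traffic is flat over 24 hours (real traffic has peaks).
avg_page_views_per_sec = PAGE_VIEWS_PER_DAY / SECONDS_PER_DAY          # about 810

print(requests_per_server, connections_per_server, round(avg_page_views_per_sec))
```

Peak load will be higher than these averages, so they are a lower bound for sizing rather than a target.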

Page 6: Running a Megasite on Microsoft Technologies

MySpace Company Overview

Launched Sept, 2003

Latest as of February 2006:

64+ MM Registered Users

38 MM UUsers & 2.3M Concurrent

260K New Registered Users/Day

23 Billion Page* Views/Month

Demographics:

50.2% Female / 49.8% Male

Primary Age Demo: 14-34

Site Trends:

260K New Users/Day

430M Total Images

Millions of Songs Streamed/Day

1000s of New MP3s/Day

20 Million Comments Posted

Media Metrix February 2006 Audience Rankings

Site      Internet Rank   Pageviews in '000s
Yahoo     #1              29,508
MySpace   #2              23,566
MSN       #3              14,695
Ebay      #4              9,632
Google    #5              7,329
Hotmail   #6              6,812

Source: comScore Media Metrix, February 2006

Page 7: Running a Megasite on Microsoft Technologies

MySpace.com Quick Facts

Infrastructure and Application Footprint:

3 Internet Data Centers

Server Breakdown:

2682 Web and 650 Database Servers

90 Cache Servers with 16 GB RAM

650 Dart servers

60 DB Servers

150 Media servers

3000 disks in SAN architecture

Egress Management:

17,000 mb/s bandwidth

15,000 mb/s on CDN

Page 8: Running a Megasite on Microsoft Technologies

MySpace.com

Growing up in the Internet World

Page 9: Running a Megasite on Microsoft Technologies

0 users: The beginning

Two tiered architecture: single database, load balanced web servers

Great for rapid development

Less complexity means faster time to market and lower operational costs

Works for small to medium sized websites, not big ones

Page 10: Running a Megasite on Microsoft Technologies

500k Users: A single database is not enough

Max out a single database

Split reads and writes across separate databases

Use transactional replication so multiple databases can service reads
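The read/write split described on this slide can be sketched as a small router in front of the databases. This is an illustration only, not MySpace's code: the server names are hypothetical, and classifying a statement by its first keyword is a deliberate simplification.

```python
import itertools


class ReadWriteRouter:
    """Send writes to one database and spread reads over replicated copies."""

    def __init__(self, write_db, read_dbs):
        self.write_db = write_db
        # Rotate through the read-only copies in round-robin order.
        self._read_dbs = itertools.cycle(read_dbs)

    def route(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        # Simplification: only plain SELECT statements count as reads.
        if verb == "SELECT":
            return next(self._read_dbs)
        return self.write_db


# Hypothetical server names, used purely for illustration.
router = ReadWriteRouter("db-write", ["db-read-1", "db-read-2"])
```

Because replication delivers changes to the read copies after they commit on the write database, a read issued immediately after a write may not see it yet; callers that need read-your-own-writes behaviour would have to route those reads to the write database.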

Page 11: Running a Megasite on Microsoft Technologies

1 Million: Vertical partitioning

Transactional replication doesn't work for all workloads and data types

Use a combination of vertical partitioning and replication
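Vertical partitioning, as used here, means giving each feature its own database. A minimal sketch of the lookup this implies, assuming hypothetical feature and server names that are not taken from the talk:

```python
# Hypothetical feature-to-database assignment; names are illustrative only.
FEATURE_DATABASES = {
    "profiles": "sql-profiles",
    "mail": "sql-mail",
    "blogs": "sql-blogs",
}


def database_for_feature(feature):
    """Return the database that owns all data for the given site feature."""
    try:
        return FEATURE_DATABASES[feature]
    except KeyError:
        raise ValueError(f"no database assigned to feature {feature!r}") from None


print(database_for_feature("mail"))
```

The lookup is trivial, which is the appeal of this stage; the cost is that a single popular feature can still outgrow its one database.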

Page 12: Running a Megasite on Microsoft Technologies

2 Million: SAN

Start to reconsider SCSI arrays for the long-term

SCSI arrays have good performance but reliability issues

SANs provide better performance, uptime, and redundancy

Move to a CLARiiON and enjoy these benefits

Page 13: Running a Megasite on Microsoft Technologies

3 Million: Horizontal partitioning

Vertical partitions see performance problems

Decide we need to re-architect the database

Horizontal partitioning is the answer but is difficult to do while in production

Page 14: Running a Megasite on Microsoft Technologies

Horizontal Partitioning

All features reside on a single database server

Data is partitioned by user ID

Some data cannot be partitioned, especially on a social networking site
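Partitioning by user ID can be sketched as a range lookup: each database owns a contiguous range of user IDs. The range boundaries and database names below are hypothetical, chosen only to make the example concrete.

```python
import bisect

# Hypothetical layout: RANGE_STARTS[i] is the first user ID stored in DATABASES[i].
RANGE_STARTS = [1, 1_000_001, 2_000_001]
DATABASES = ["users-db-01", "users-db-02", "users-db-03"]


def database_for_user(user_id):
    """Find the database whose user-ID range contains user_id."""
    if user_id < RANGE_STARTS[0]:
        raise ValueError(f"invalid user ID: {user_id}")
    # bisect_right returns the insertion point after any equal entry,
    # so subtracting 1 gives the index of the range that starts at or below user_id.
    index = bisect.bisect_right(RANGE_STARTS, user_id) - 1
    return DATABASES[index]


print(database_for_user(1_000_001))  # users-db-02
```

In this sketch the last range is open-ended, so new users land on the last database until another range is appended.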

Page 15: Running a Megasite on Microsoft Technologies

5 Million: Network bottlenecks

Various areas of the network become saturated

Gig uplinks are maxed out:

Switch to Autonomous network and BGP

Get multiple gig links and 10G links

Load balancer is maxed out:

"Must load balance the load balancers"

Use DNS
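One common way to "load balance the load balancers" with DNS is round-robin: the name resolves to several load balancer addresses, and the order of the answer rotates between queries. The slide does not say which DNS technique was used, so the following is only a simulation of that idea, with addresses from the documentation range 192.0.2.0/24.

```python
from collections import deque


class RoundRobinRecord:
    """Simulate a DNS name that rotates its list of addresses on each query."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def resolve(self):
        answer = list(self._addresses)
        # Move the first address to the back so the next query leads with another one.
        self._addresses.rotate(-1)
        return answer


# Hypothetical virtual IPs of three load balancers.
www = RoundRobinRecord(["192.0.2.10", "192.0.2.20", "192.0.2.30"])
for _ in range(3):
    print(www.resolve()[0])
```

Clients typically connect to the first address in the answer, so rotation spreads new connections across the load balancers, although caching by resolvers makes the spread approximate.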

Page 16: Running a Megasite on Microsoft Technologies

7 Million: Site dependencies

Separating features on the front end isolates potential bottlenecks

Using subdomains is the easiest way
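Separating features by subdomain lets each feature be served by its own pool of web servers, so a problem in one feature does not take down the others. A minimal sketch of the host-to-pool mapping, assuming a hypothetical domain and hypothetical pool names:

```python
# Hypothetical pools keyed by subdomain; none of these names come from the talk.
POOLS = {
    "mail": ["mail-web-01", "mail-web-02"],
    "music": ["music-web-01", "music-web-02"],
}
DEFAULT_POOL = ["www-web-01", "www-web-02"]


def pool_for_host(host):
    """Pick the web server pool from the left-most label of the request host."""
    subdomain = host.split(".", 1)[0].lower()
    return POOLS.get(subdomain, DEFAULT_POOL)


print(pool_for_host("mail.example.com"))
```

In practice the same separation is usually achieved by pointing each subdomain's DNS record at a different virtual IP rather than by application code.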

Page 17: Running a Megasite on Microsoft Technologies

10 Million: Scalable storage

Trying to partition storage on the backend is time consuming and inefficient

Maxing out SANs is very costly

We realize scalable storage is key

Page 18: Running a Megasite on Microsoft Technologies

15 Million: DBs versus Caching

Databases still having perf issues:

Databases are expensive

Have a lot of transactional overhead

Caching tier:

High speed cache is perfect for reads

LRU algorithm is self managing

Drastically reduces database load
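The caching tier idea can be sketched as a read-through cache with least-recently-used (LRU) eviction: hits are served from memory, misses fall through to the database, and the coldest entry is dropped when the cache is full. This is a single-process illustration, not the distributed cache described in the talk; the `load` callback stands in for a database query.

```python
from collections import OrderedDict


class LRUCache:
    """Read-through cache that evicts the least recently used entry when full."""

    def __init__(self, capacity, load):
        self.capacity = capacity
        self._load = load            # called on a miss, e.g. a database lookup
        self._items = OrderedDict()  # oldest entry first, newest last
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        self.misses += 1
        value = self._load(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
        return value


# Hypothetical loader standing in for a database read.
cache = LRUCache(capacity=2, load=lambda user_id: f"profile-{user_id}")
cache.get(1)
cache.get(1)
print(cache.misses)  # 1: the second read never reaches the loader
```

"Self managing" here means no explicit expiry bookkeeping is needed: frequently read keys stay resident simply because they keep being touched.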

Page 19: Running a Megasite on Microsoft Technologies

MySpace: Where we are today

Page 20: Running a Megasite on Microsoft Technologies

Upcoming Technology Enablers

What's Next for Microsoft.com and MySpace.com?

Page 21: Running a Megasite on Microsoft Technologies

SQL Server 2005: Product technology enablers

Peer-To-Peer Replication:

System & Data Center Autonomy

Zero "perceived" Application Downtime from Consumers

Eliminates Single Point of Failure for R/W Databases

Mirroring (SP1):

Targeting Replacement of Log Shipping Fail-Over pairs

3 Systems in TAP Program (Technet, Learning & Genuine)

Reduced Failover Downtime: Log Shipping 5-15 min avg, Mirroring < 1 min (planned)

Table Partitioning:

Reduced Storage Costs

Scale Up at Lower Costs

[Diagram: Database Mirroring. Labels: Data Center A, Data Center B, Web Cluster 1, Web Cluster 2, Principal, Mirror, Sync / Async, Transactions]

[Diagram: Peer-To-Peer Replication. Labels: Data Center A, ICPSQL.PHX.GBL, SQL A, SQL B, SQL C, SQL D, NLB VIP 1, NLB VIP 2]
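To put the failover figures above in perspective, the yearly downtime they imply can be estimated with simple arithmetic. The per-failover durations come from the slide; the number of failovers per year is a made-up assumption for illustration only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60

# Per-failover downtime quoted on the slide, in minutes.
LOG_SHIPPING_RANGE = (5, 15)
MIRRORING_UPPER_BOUND = 1  # slide says "< 1 min (planned)"

# Assumption, not from the talk: one failover per month.
FAILOVERS_PER_YEAR = 12


def yearly_downtime(minutes_per_failover, failovers=FAILOVERS_PER_YEAR):
    """Total minutes of downtime per year caused by failovers."""
    return minutes_per_failover * failovers


def availability(downtime_minutes):
    """Fraction of the year the system is up, given its downtime in minutes."""
    return 1 - downtime_minutes / MINUTES_PER_YEAR


log_shipping_best = yearly_downtime(LOG_SHIPPING_RANGE[0])    # 60 minutes
log_shipping_worst = yearly_downtime(LOG_SHIPPING_RANGE[1])   # 180 minutes
mirroring_bound = yearly_downtime(MIRRORING_UPPER_BOUND)      # under 12 minutes

print(log_shipping_best, log_shipping_worst, mirroring_bound)
```

Under that assumption, mirroring cuts failover-related downtime by at least a factor of five compared with the best log shipping case.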

Page 22: Running a Megasite on Microsoft Technologies

MySpace: Scaling SQL Server

V1: Single Instance – < 1 Million Users

Single SQL Server instance supports all users and features

V2: Single Instance Replicating to Read Only Full Copies – < 2 Million Users

Single server handles all write transactions, read transactions spread across multiple transactional replication copies

V3: Vertical Partitioning – < 4 Million Users

Each feature/page of the site on its own SQL Server

Page 23: Running a Megasite on Microsoft Technologies

MySpace: Scaling SQL Server

V4: Horizontal Partitioning – < 8 Million Users

All features/pages brought back to single database schema

Standard schema across all databases

User ranges partitioned across databases

V5: Horizontally Partitioned Core with Replicated Content, Vertically Partitioned Features Databases, "Shared Content" Databases – > 8 Million Users

Primary MySpace schema exists across large farm of servers

Small amounts of content replicated to all horizontally partitioned servers to allow for features spanning all user ranges

V6: Migration to SQL Server 2005 – > 26 Million Users
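The V5 layout combines two ideas: user data lives on exactly one server chosen by user range, while a small amount of shared content is copied to every server so that features spanning all users can be answered locally. A toy in-memory sketch of that combination follows; the class names, server names and fixed range size are all assumptions for illustration.

```python
class Shard:
    """One horizontally partitioned server: its own users plus a shared-content copy."""

    def __init__(self, name):
        self.name = name
        self.users = {}
        self.shared_content = {}


class Farm:
    def __init__(self, shard_names, users_per_shard):
        self.shards = [Shard(name) for name in shard_names]
        self.users_per_shard = users_per_shard

    def shard_for(self, user_id):
        """User ranges are contiguous: IDs 1..N on the first shard, and so on."""
        capacity = len(self.shards) * self.users_per_shard
        if not 1 <= user_id <= capacity:
            raise ValueError(f"user_id {user_id} is outside the provisioned ranges")
        return self.shards[(user_id - 1) // self.users_per_shard]

    def save_user(self, user_id, profile):
        # Partitioned data is written to exactly one shard.
        self.shard_for(user_id).users[user_id] = profile

    def publish_shared(self, key, value):
        # Shared content is replicated to every shard.
        for shard in self.shards:
            shard.shared_content[key] = value


# Hypothetical two-server farm with one million users per server.
farm = Farm(["core-01", "core-02"], users_per_shard=1_000_000)
farm.save_user(1_500_000, {"name": "example"})
farm.publish_shared("featured_band", "hypothetical value")
```

Keeping the replicated portion small matters because every write to it is multiplied by the number of servers in the farm.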

Page 24: Running a Megasite on Microsoft Technologies

SQL Server 2005: 64 bit

Memory pressure under the 4GB 32-bit limit:

Servers loaded with 32 Gigs of RAM

<4 Gig addressable to the memory pools we were stressing

Manifestations:

Connection timeouts

Servers going "dark", requiring restart

Rejected connections

Problem eliminated on 64-bit arch:

Connection/Sort memory pools now able to address all 32 Gigs of RAM
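The 4 GB ceiling follows directly from the width of a 32-bit address. The short calculation below shows where the number comes from and how little of the installed 32 GB it covers; it is plain arithmetic, not a model of SQL Server's memory manager.

```python
GIB = 2 ** 30

ADDRESS_BITS = 32
INSTALLED_GIB = 32  # RAM per server, as stated on the slide

# A 32-bit pointer can name 2**32 distinct byte addresses.
addressable_bytes = 2 ** ADDRESS_BITS
addressable_gib = addressable_bytes / GIB              # 4.0

# Share of installed RAM reachable through a 32-bit address space.
fraction_reachable = addressable_gib / INSTALLED_GIB   # 0.125

print(addressable_gib, fraction_reachable)
```

With 64-bit addresses the addressable space is far larger than 32 GB, which is consistent with the slide's note that the memory pools could then use all of the installed RAM.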

Page 25: Running a Megasite on Microsoft Technologies

Virtualizing Storage

What is it?

Software layer between your disks & hosts

Advantages:

Provisioning is very simple, makes capacity planning more predictable

Much better performance

Can easily add more capacity to a LUN

What do we use?

3PAR

14 week bake off

Page 26: Running a Megasite on Microsoft Technologies

Longhorn And IIS 7.0: Product technology enablers

UNC Content Store:

Simplified Content Mgmt

Reduced Disk Footprint

File Replication (DC to DC):

Latent/Long links improved 80X (10Mbps vs 850Mbps)

Enabler of Geo-Hosting Options

Centralized IIS Configs:

Copy "Host-Host" capability

Eliminate complex scripting of metabase & configs

Dynamic Content Compression:

Further reduced Egress

Improved Web Perf Delivery

[Diagram: UNC Content Store. Labels: Data Center A, Data Center B, Web Cluster, UNC, File Store, Backup, DFS Replication]

[Diagram: Content Replication. Labels: Data Center A, Data Center B, Web Cluster, File Store; High Bandwidth File Replication for Content Sync, Peer-to-Peer, Log Shipping]
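Dynamic content compression reduces egress by compressing generated responses for clients that accept it. IIS does this inside the web server; the sketch below only illustrates the principle with Python's standard `gzip` module. The function name is hypothetical and the `Accept-Encoding` handling is simplified (quality values such as `q=0` are ignored).

```python
import gzip


def compress_response(body, accept_encoding):
    """Gzip the body when the client's Accept-Encoding header lists gzip."""
    encodings = [token.split(";")[0].strip().lower()
                 for token in accept_encoding.split(",")]
    if "gzip" in encodings:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}


# Repetitive markup, typical of generated pages, compresses well.
page = b"<li>item</li>" * 1000
compressed, headers = compress_response(page, "gzip, deflate")
print(len(page), len(compressed), headers)
```

The trade-off is CPU time on the web server in exchange for fewer bytes on the wire.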

Page 27: Running a Megasite on Microsoft Technologies

IIS 7.0: Failed Request Tracing

Page 28: Running a Megasite on Microsoft Technologies

Geo-Targeting Solutions: Demographic management

Objective – Enable Targeted Release of Apps and Content

Avoid demographic support spikes and further align to marketing campaigns

Sensitivity to Time/Frequency of customer online experiences

Improve ability to reach last 30% of client population

[Diagram: Akamai Edgesuite. User groups: US Users (NYC, LA, DC); Taiwanese Users; Polish Users; All other users. Policies: Suppress WGA Release; Release WGA at 8% per day; Release WGA at 2% per day; Release WGA at 5% per day. Other labels: Broadband Users; Narrowband Users; Easy to reach – regulate as needed; Hard to reach – NEVER regulate]

Microsoft Confidential. © 2006 Microsoft Corporation. All rights reserved. This presentation is for internal Microsoft use only.
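A per-region release policy of the kind shown on this slide can be sketched as a percentage rollout: each client is hashed into a stable bucket from 0 to 99, and a region's released share grows by its daily percentage. The pairing of regions to percentages below simply follows the order in which the labels appear on the slide and is an assumption, as are the region codes and function names.

```python
import hashlib

# Assumed pairing of user groups to policies (read off the slide in label order).
# 0 means the release is suppressed for that region.
DAILY_RELEASE_PERCENT = {"US": 0, "TW": 8, "PL": 2}
DEFAULT_PERCENT = 5  # all other users


def bucket(client_id):
    """Map a client to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_released(client_id, region, day):
    """True if the release has reached this client by the given day (day 0 = none)."""
    percent = DAILY_RELEASE_PERCENT.get(region, DEFAULT_PERCENT)
    released_share = min(100, percent * day)
    return bucket(client_id) < released_share


print(is_released("client-123", "TW", 13))  # True: 8% per day covers everyone by day 13
```

Because a client's bucket never changes, a client that has received the release keeps it on later days, and the per-region percentages control how quickly support load ramps up.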

Page 29: Running a Megasite on Microsoft Technologies

Open Panel Discussion

Page 30: Running a Megasite on Microsoft Technologies

© 2006 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.

Page 31: Running a Megasite on Microsoft Technologies