This presentation is available for download from:...
Transcript of This presentation is available for download from:...
Mission-CriticalEnterprise Cloud
Applications
Eugene CiuranaOpen-Source Evangelist
CIME Software Labs
http://ciurana.eu/contact
This presentation is available for download from:http://ciurana.eu/TSSJSE2009
About Eugene...
• 15+ years building mission-critical, high-availability systems
• 13+ years Java work
• Open source evangelist
• Official adoption of open source/Linux at Walmart worldwide
• State of the art tech for main line of business roll-outs
• Largest companies in the world• Retail• Finance• Oil• Background: robotics to on-line retail
This presentation is about...
• How to design, implement, and roll-out cloud/enterprise hybrid applications
• Different cloud architectures
• PaaS• SaaS• Infrastructure as a service (IaaS)
• Technologies: ESB, clouds, mini-clouds, Java, App Engine, Chef!/Puppet, etc.
• How cloud technologies lower costs
• Understanding the advantages of:
• Cloud deployments• Hybrid deployments distributed in the cloud and enterprise
What You’ll Learn
• Identify the applications best suited for cloud deployments
• Identify the advantages of Platform-as-a-Service or Software-as-a-Service resources
• Learn the caveats of cross-platform and cross-language integration
• Learn about high-performance alternatives to XML serialization for data exchange
• Learn how to structure event-based, stateless apps for maximum scalability
What is the Cloud Anyway?
• Ask 10 different people, get 10 different answers
• In general, you may use 4 types of cloud offerings
• Platform as a Service• Software as a Service• Infrastructure as a Service• Pure infrastructure
• Some times you integrate pre-fabricated apps, some times platform, some times both
Cloud Services Features
• Quick deployment of prepackaged components
• Uses commodity, virtualized hardware and network resources
• Amazon Elastic Cloud 2 (EC2) and Simple Storage Service (S3)
• Google App Engine (Python, Java)• Rackspace Cloud Services
• The overall model is “pay as you consume”
• Horizontal scalability is achieved by adding or removing resources as needed
• May host full applications or only services
Cloud Services Features
• They could replace the data centre
• Basic administration moves to the application owner
• It may move away from the IT team - political fallout
• For the bean counters... it’s an operational expense!
• Tax advantages• Turn on or off as needed• In a tight economy, IT infrastructure ends up under the CFO -
give the guy options
• Assuming sensible SLAs, the ROI is better than for co-located or company-owned data centres
• What do scalability and high-availability mean?
• Scalability: the property of a system to handle bigger amounts of work or to be easily expanded in response to increased demand
• Vertical• Horizontal
Dual Core
Single Processor
16 GB RAM
Virtual Node 0
Virtual Node 2
Virtual Node 1
Dual Core
Dual Processor
32 GB RAM
Virtual Node 0
Virtual Node 2
Virtual Node 1
Scalability and High Availability
Scales up
Virtual Node 3 Node Node Node
Load Balancer
Scales
out
Node Node Node
Load Balancer
Node
Scalability and High Availability
• Availability: how the system provides useful resources over a set period of time
• Uptime != availability• Many factors affect it: network, storage, processor, etc.
Availability Downtime90% 36.5 days99% 4 days
99.9% 9 hours99.99% 53 minutes
99.999% 5.3 minutes99.9999% 32 seconds
•Amazon S3 Outage (summer 2008)•Major companies affected (LeapFrog, eBay, Apple, many others)•Amazon’s SLA? “Sorry - it won’t happen again.”
Can you affordthis?
SLAs and Availability
• Service Level Agreements drive the architectural and technological decisions
• Cloud systems’ scalability characteristics are often pitched
• What is the availability?
• What is the impact on business?• How do you perform disaster recovery?
• What service level are the vendors offering?
• The higher the availability (3-nines, 4-nines, 5-nines), the higher the cost
• Co-lo, data centre deployments still make more sense for 4-nines availability and above
Platform and Software as a Service
• Identify the type of implementation after establishing the availability requirements
• From a vendor?• In-house development?
• Software as a Service (SaaS)
• The vendor provides the full application and is responsible for scaling it and meeting SLAs
• Integration with the vendor via web services or EDI• Examples: GSI Commerce, Salesforce.com
• Platform as a Service (PaaS)
• The vendor provides the operating system and/or application stack
• The vendor provides a well-defined API for interacting with the system
• Examples: Amazon Web Services, Google App Engine, Nirvanix
PaaS Systems
• Amazon EC2 and S3
• Virtual images with various Intel-based operating systems, app servers, databases, etc.
• Billed hourly + bandwidth• Java, PHP, Ruby, other technologies supported• “Data centre in the sky”• Lower cost than Rackspace or Nirvanix
• Weaker SLAs?
• Google App Engine
• The apps are written as callbacks from a virtualized service• Datastore ~= database; Memcache ~= caching; most
interaction via HTTP• Billed on usage• “You run on the same infrastructure as Google”• Low cost, still evolving, minimal SLAs
A Hybrid Cloud Architecture
• Many mission-critical systems will live behind the corporate firewall
• The cloud is used for high-load applications and services
• The clouds applications work independently of the data center applications, and vice versa
• Loose coupling over web services• Assume that the data in the cloud is “throwaway” - it’s either
consolidated or sourced in the data centre• The cloud exists mostly for distribution and scalability
A Hybrid Cloud Architecture
services.company.com
Node Node Node Node
Legacy Back-end
Enterprise
Database
Enterprise Service Bus
User
services.company.com
Firewall
Amazon S3http://media.company.com
PNG PNG PNG PNG
PNG PNG PNG PNG
Google App Enginehttp://www.company.com
Node Node Node
Node Node Node
Datastore Datastore
A Hybrid Cloud Architecture
End-User System (Mac, Windows)
Connected
App
Web
Browser
USB
Internet
S3
Content
RepositoryThird-party
Partner Site
www.leapfrog.comconnected
productsLearningPath
Firewall
Mule ESB backboneHTTP, SOAP (CXF), REST, etc. routing, filtering, and dispatching; ActiveMQ JMS broker; dedicated services
Mule ESBConnected products SOAP, REST
web services
Mule ESBDevice log upload, processing,
servlet container
Content
Management
System
REST, JCR
Device
Logs
Crowd SSO
Customer
Data
Game
play
Data
Content
Authoring
User
Credentials
Servlets
App Logic
Which Parts Go To The Cloud?
� � � � � � � � � � � � $ $ � � � � * � % � # � � � � � # & % � � � � � � � $ ! � % � � � � � � � " & � & � � � � � � ' � � % $
� % � # � � %
� � � � � � � � � � � #
� ! ! � � � � % � � �� � # ' � #
� � � � % � �� � # ' � � � $ � � # ( )
� ! ! � � � � % � � �� � # ' � #
� � � � % � �
� � � � � � � � � � � # � � � � � � � � � �
� & � � � � � � � & � � � � � � � & � � � � � � � & � � � � � �
� � � � � � � � � � � # � � � � � � � � � �
� & � � � � � �� � � � � � � � � �
� & � � � � � �� � � � � � � � � �
� � % � � � $ �
� � � � � � � � � � � # � � � & � � ) � � �
� & � � � � � �$ � # ' � � % � � � � � �
� & � � � � � �$ � # ' � � % � � � � � �
�$ � � # �
� � � � � � � � � � � # � � � � � � $ $ � � � � � # � � #
� � % � ' � � � � � % � ' � � �
�$ � � # �
Which Parts Go To The Cloud?
• Depending on the cloud configuration, load balancing may not be necessary
• Amazon provides Elastic Load Balancers and Elastic IP• Google App Engine provides stateless calls only
• The client itself or Memcache keep state instead• Rackspace provides either “elastic” addresses or explicit load
balancers
• Enterprise Service Bus and other integration may exist in both the cloud and the data centre
• Message routing is OK is if latency is below the SLA requirements
• Strategic partner, application, and services integration may occur 100% on the cloud, with data consolidation behind the firewall
Case Study
• Large consumer and services company
• .Net / Windows servers infrastructure
• ASP.Net• SQL Server
• Two-tier architecture
• The system “just grew” and became a business
• Server pages, services, media
• 6-month scalability project began in April
• Implementation complete 25.Sep of the same year
Case Study - Objectives
• Stable architecture defined by July
• Low cost
• Build scalability wherever possible
• Optimal data transfer rate for all properties
• Web systems• Media properties
• Meet business requirements and SLAs
• Hard dates set by the business, no input from the technology team
• Milestones: 1.July, 10.Oct
Case Study - Initial Configuration
• System grew “as needed” - no planning
• Combines end-user and internal applications in the same server
• It only scales vertically -- and not very well
• Applications and database are tightly coupled
• Application servers and services tightly coupled
• Single environment: from development to production without passing Go nor collecting $200
• Disaster recovery?
• 4 hours minimum• From (stale?) backups
rsync
Linux
UDP
Services
Windows
Case Study - Initial Configuration
Zone Server
ClientClientClient
Client
Internet
Applications
.Net
Services
.Net
SQL
Server
UDP
Services
Windows
UDP
Services
Windows
UDP
Services
Windows
Master, IRC
Linux
� � � �� � � � �
Case Study - Phase I Scalability
Case Study - Phase I Scalability
• Introduces a CDN for asset delivery
• Amazon S3 (optional Cloud Front) for asset delivery• Reduces load on company servers and bandwidth costs
• Introduces database replication for production environments
• Establishes a continuous integration environment
• Set tools for hybrid UNIX/Windows environment• UNIX tools take precedence; Cygwin everywhere
• Improved build/release process
Case Study - Phase I Scalability
Co
nti
nu
ou
s I
nte
gra
tio
n S
erv
er
Ve
rsio
n C
on
tro
l S
ys
tem
, A
uto
ma
tic
Bu
ild
s,
Un
it T
es
tsServices
services HTTP other
proxy
.Net Admin App Server
.Net App Server
.Net Server
Services
Load balancer
Databases DatabasesIRC
Production
Services
services HTTP other
proxy
.Net Admin App Server
.Net App Server
.Net Server
Services
Load balancer
DatabasesIRC
Staging
Services
services HTTP other
proxy
.Net Admin Server
.Net App Server
.Net Server
Databases
IRC
QA
Services
services HTTP other
proxy
.Net Admin Server
.Net App Server
.Net Server
Databases
IRC
Development
Developer 0 Developer 1 Developer 2 Developer 3
Deployment
DeploymentDeployment
CDNCDN
CDN
Deployment
CDNCDN
CDN
CDNCDN
CDN
CDN
Case Study - Phase I Scalability
• Fail-over with traditional database replication techniques
Partition 0
Primary Cluster
Node 0 Node 1
DB 0
DB 0b
Partition 1
DB 1
DB 1b
ESB as app services provider
Case Study - Phase II Cloud Deployment
Case Study - Phase II Cloud Deployment
• Web applications move to a uniform technology (.Net)
• The database and stored procedures are normalized and optimized
• Applications use common resources via Mule ESB and services
• No more direct calls from apps to databases• Business logic is implemented as stateless POJOs
• Software stack is “best of breed”
• Windows, other servers driven by business requirements• Open source software wherever possible
Case Study - Phase II Cloud Deployment
• Web and other RPC services must coexist
• Different partners use different protocols• Mule transformers take care of all the interfaces so that
development may scale / continue independently of what happens in other layers
• Bandwidth can be expen$ive!
• Data exchange protocols
• Clients: custom, XML, JSON• Images: HTTP, S3• Cloud-to-HQ: custom, XML/SOAP, protocol buffers• HQ data centre: XML/SOAP, protocol buffers
• Replication strategy: data centre
• The cloud isn’t trustworthy yet
Case Study - Phase II Cloud Deployment
• Deployment involves using an Amazon Machine Image (AMI)
• Use existing ones from Amazon or third parties• Configure your own to meet your software requirements
• AMIs need a post-configuration step in a load-balanced environment
• Unique file processing (/etc/hosts, /etc/hostname, app configuration, etc.)
• Necessary intermediate step between launching the AMI and having a useful machine
• Elastic Load Balancer and Elastic IP limitations
• ELB only works for dynamic IP addresses• ELIPs are limited to 5 per account
Case Study - Phase II Cloud Deployment
legacyservices01
Web services
EvolverSP
Dimenxian
EvolverMP
DimensionMP
legacyservices02
Web services
EvolverSP
Dimenxian
EvolverMP
DimensionMP
balance
services01
Web services
Proxy
Routing
Queuing
services02
Web services
Proxy
Routing
Queuing
master01
IRC
game server list
174.129.238.159
master02
IRC
game server list
services
syncserver01
rsync
repository
balancesyncserver
database
master
database
slave
balance
web01
balance
web02
los01
balance
los02
losgamesvr01
balance
losgamesvr02
ep01
FRONT END
balance
ep02
FRONT END
syncserver02
rsync
repository
174.129.238.110
174.129.238.126
174.129.238.129
174.129.238.145
174.129.238.184
174.129.238.189
replicate
174.129.239.18174.129.239.11
game server
174.129.239.42
game server
174.129.239.45
game server
174.129.239.57
game server
174.129.239.68
balance
World
ep01
SERVICES
balance
ep02
SERVICES
174.129.239.41
downloads
S3174.129.238.188
app-x server
174.129.239.81
QA
DNS
174.129.15.199
legacyservicesdatabase
web
IMUService
IMUService
IMUService
Case Study - Phase II Cloud Deployment
Setup Monthly Bandwidth Total / year
Production $1,300 $5,5580 $2,000 $70,260
Staging $1,300 $5,5580 $ 0 $68,260
QA $ 600 $1,450 $ 0 $18,000
Dev $ 600 $1,450 $ 0 $18,000
Total $174,520
Traditional Hosted Environments
Case Study - Phase II Cloud Deployment
Setup Monthly Bandwidth Total / year
Production $ 0 $4,320 $1,200 $66,240
Staging $ 0 $4,320 $ 0 $51,840
QA $ 0 $1,080 $ 0 $12,960
Dev $ 0 $ 0 $ 0 $ 0
Total $131,040
Amazon EC2 Configuration
$43,000 cheaper
Case Study - Phase III Final Deployment
Case Study - Phase III Final Deployment
services01
Web services
Proxy
Routing
Queuing
master01
IRC
game server list
174.129.238.159
master02
IRC
game server listservices
syncserver01
rsync
repository
balancesyncserver
database
master
database
slave
balance
web01
balance
web02
los01
balance
los02
losgamesvr01
balance
losgamesvr02
ep01
FRONT END
balance
ep02
FRONT END
syncserver02
rsync
repository
174.129.238.110
174.129.238.126
174.129.238.129
174.129.238.145
174.129.238.184
174.129.238.189
replicate
174.129.239.18
game server
174.129.239.42
game server
174.129.239.45
game server
174.129.239.57
game server
174.129.239.68
balance
World
downloads
S3174.129.238.188
app-x server
174.129.239.81
QA
DNS
174.129.15.199
database
web
services01
Web services
Proxy
Routing
Queuing
services01
Web services
Proxy
Routing
Queuing
services01
Web services
Proxy
Routing
Queuing
Thanks for Coming!
Questions?
Wanna know more about real life cloud, scalable systems?
http://ciurana.eu/scalablesystemsSubscribe to the newsletter!
Eugene CiuranaOpen source [email protected]+41 44 586 8462