TEKsystems Educations Services Presentsjtconsult.com/courseware/cloud2/Cloud_Computing_II.pdf ·...

Post on 22-May-2020


TEKsystems Educations Services Presents: Cloud Computing II

• VirtualBox (https://www.virtualbox.org/wiki/Downloads)

64-bit OS

• Classfiles (Large zip containing a VM)

• Unzip to a folder on your computer

Cloud Computing

What You Will Need…

Upon completion of this course, you will be able to:

• Explore advanced infrastructure topics such as:

• elasticity, availability, reliability, and orchestration

• Examine open source and private cloud offerings such as OpenStack

• Utilize distributed processing frameworks such as Apache Hadoop

• Create PaaS-based applications with Openshift and others using Java, Python, and interface-driven approaches

• Explore security, identity access management techniques


Course Objectives

Session 1

• The State of the Cloud: Review, Key Players, Trends

• Private and Open Source Cloud Platforms

Session 2

• Working with Distributed Processing Frameworks


Course Agenda

Session 3

• Cloud Best Practices: Scalability, Availability, Elasticity

• Exploring PaaS with Openshift

Session 4

• PaaS Continued

• Cloud Security Topics: Identity Access Management


Course Agenda


Instructor Introduction

• Name

• What you work on

• Reason for attending


Student Introductions

• Sign-in Sheet

• Training Manual

• Start / Stop Time

• Breaks and Lunch

• Questions and Answers

Facilities and Logistics

Module 1: The State of the Cloud

The State of the Cloud | Implementing IaaS | Distributed Processing | The PaaS Model | Security, Standards, and Governance

• Brief Review

• Cloud Principles

• Key Players, Products, and Services

• Trends


Overview

• Virtualization / Cloud Technologies

• Hypervisors, Computer Ring Security

• Horizontal vs. Vertical Scaling

• Type I, II, Paravirtualization, Full Virtualization, Hardware Acceleration

• Virtual Appliances


Brief Review

• 3 primary modes for server virtualization:

• Full virtualization

• Paravirtualization

• Hardware acceleration


Virtualization Types

Ring 0: Kernel
Ring 1: Device Drivers
Ring 2: Device Drivers
Ring 3: Applications

• Amazon Web Services (AWS)

• Elastic Compute Cloud (EC2)

• Simple Storage Service (S3)

• Elastic Beanstalk

• SimpleDB

• Availability Zones


Brief Review (continued)

• Some believe in the 5-3-2 principle

• It defines (for clouds):

• 5 key characteristics

• 3 delivery methods

• 2 deployment models

• These can be summarized as follows…


Cloud Theory


5-3-2 Principle

• 5 key characteristics:
– On-demand self-service
– Broad Network Access
– Resource Pooling (location transparent)
– Rapid Elasticity
– Metered Service (pay-per-use)
• 3 delivery methods:
– SaaS
– PaaS
– IaaS
• 2 deployment models:
– Public Cloud
– Private Cloud


Public Cloud Key Players (IaaS)

Amazon Web Services

Terremark

HP

IBM SmartCloud


Public Cloud Key Players (PaaS)

Force.com

Wyaworks


Public Cloud Key Players (SaaS)

Intacct

Taleo

Rollbase


Private Cloud Key Players

Eucalyptus Systems

1. Cloud Management Services
– Cloud Federation Services
– One-click deployments, monitoring, alerts, scheduling, logging, auditing tools
2. Emphasis on better availability, stronger SLAs
3. More Open Source Competition
– OpenStack vs. Eucalyptus vs. Commercial Products
4. Fragmentation of PaaS markets
– Vendors providing different types of PaaS offerings
5. Big Data use within cloud services


5 Trends in the Cloud


Open Source Cloud-Related Products

DeltaCloud

• 5-3-2 defines cloud characteristics, service, and deployment models

• Hundreds of companies now provide products and services at all levels of the service model stack

• Open source tools continue to evolve and mature

– Tools are rapidly being adopted by larger companies and incorporated into commercial offerings


Summary

Module 2: Implementing IaaS


• Cloud Platforms, Private Clouds
• Open Source Cloud
• Distributed Processing
• Cloud Implementation Best Practices
• Cloud Orchestration


Overview

• Advantages of implementing private clouds:

– Increased utilization of assets

– Security (trust issues)

– Easier integration (with on-site systems)

– Control of the cloud operating environment

– Less likelihood of vendor lock-in

• Like public clouds, private clouds typically also meter client usage


Private Clouds

• Currently, several products have evolved that are competing for the open source cloud market

• Eucalyptus - Private IaaS cloud, compatible with EC2 & S3

• OpenNebula – a cloud virtualization platform

• CloudStack – Cloud.com's offering incorporating OpenStack technologies (code base)

• OpenStack
– Designed for private cloud and public cloud deployments
– First released: Oct 2010
– Combined effort of Rackspace Cloud and NASA Nebula Cloud
– Provides storage, management tools, and virtualization capabilities


The Open Source Cloud


Eucalyptus Cloud Environment

Eucalyptus offers an open source cloud solution that lets users provision their own resources via a UI against a company's on-premises data center

• Another configuration recognized by the cloud community is a hybrid cloud

• A hybrid cloud combines elements of a private cloud with elements of a public cloud

• One way to accomplish this is via Amazon's VPC (Virtual Private Cloud) service

• VPC use cases:

– Provisioning test environments

– New branch office/units (virtual desktops)


Virtual Private Cloud


Remote Cloud to Enterprise

[Diagram: Enterprise Data Center connected through a Customer Gateway and an internet VPN connection to the VPN Gateway of a VPC Subnet]

AWS VPC provides one cloud and VPN connection.

Within the cloud, define up to 20 subnets, using conventional CIDR notation

Subnets are connected in a star topology with a single virtual router between them
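
The subnet layout described above can be sketched with Python's standard `ipaddress` module. The `10.0.0.0/16` address block, the `/24` subnet size, and the `plan_subnets` helper are illustrative assumptions; the 20-subnet limit is the figure quoted on the slide, not a value checked against current AWS quotas.

```python
import ipaddress
from itertools import islice

MAX_SUBNETS = 20  # limit quoted on the slide (assumption, not a verified quota)

def plan_subnets(vpc_cidr, new_prefix, count):
    """Carve `count` equal-sized subnets out of a VPC address block (hypothetical helper)."""
    if count > MAX_SUBNETS:
        raise ValueError("at most %d subnets are assumed per VPC" % MAX_SUBNETS)
    vpc = ipaddress.ip_network(vpc_cidr)
    # subnets() yields every /new_prefix block inside the VPC range, in order
    return list(islice(vpc.subnets(new_prefix=new_prefix), count))

subnets = plan_subnets("10.0.0.0/16", 24, 3)
for net in subnets:
    print(net, "addresses in block:", net.num_addresses)
```

Each returned value is written in CIDR notation, for example `10.0.1.0/24`, which is the form the VPC subnet definition expects.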

• OpenStack is a collection of tools that deliver a highly scalable cloud operating system

• It is free to use under the Apache 2.0 license


OpenStack Technologies

• Strong support from the business community


OpenStack Commercial Backing

• Requirements:
– Most Linux distros, targeted for Ubuntu
– Hypervisors: Xen, XenServer, Hyper-V, KVM, ESX
• Version Releases
– Austin (v1): Oct 2010
– Grizzly (v7): Apr 2013
– Havana (v8): Oct 2013
– Icehouse (v9): Apr 2014
– Juno (v10): Oct 2014
– Kilo (v11): Apr 2015
– Liberty (v12): Oct 2015


OpenStack Versions/Requirements

• Work from Public Clouds

• Install from a "vanilla" script

• Install from the ground up

• Install from build scripts

Installation Options

http://docs.openstack.org/juno/install-guide/install/apt/content/

http://docwiki.cisco.com/wiki/OpenStack


OpenStack Components


OpenStack Compute Architecture

• nova-api – often referred to as the cloud controller, initiates most orchestration efforts

• nova-scheduler – handles a VM creation request, determining where best to create it

• nova-compute – a worker that creates and destroys VMs

• glance-registry – stores image metadata

• nova-network – sets up networking tasks (e.g., IPs)

• nova-volume – similar to AWS EBS, maintains instance snapshots
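
The scheduling decision described above ("determining where best to create it") can be illustrated with a toy filter-then-rank function. This is only a sketch of the idea, not nova's scheduler code; the host records, field names, and the "most free RAM wins" rule are assumptions made for the example.

```python
def pick_host(hosts, ram_mb, vcpus):
    """Filter out hosts that cannot fit the request, then prefer the one with the most free RAM."""
    candidates = [
        h for h in hosts
        if h["free_ram_mb"] >= ram_mb and h["free_vcpus"] >= vcpus
    ]
    if not candidates:
        return None  # no host can satisfy the request
    return max(candidates, key=lambda h: h["free_ram_mb"])["name"]

# Hypothetical compute hosts
hosts = [
    {"name": "compute1", "free_ram_mb": 2048, "free_vcpus": 2},
    {"name": "compute2", "free_ram_mb": 8192, "free_vcpus": 1},
    {"name": "compute3", "free_ram_mb": 4096, "free_vcpus": 4},
]

print(pick_host(hosts, ram_mb=2048, vcpus=2))
```

Here `compute2` has the most memory but too few free vCPUs, so it is filtered out before ranking.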


Compute Daemons

OpenStack Nodes

• What is Neutron?


OpenStack Neutron

"Network-connectivity-as-a-service"

Set of supported plugins:

• Open vSwitch
• Cisco
• Linux Bridge
• Nicira NVP
• Ryu
• NEC OpenFlow
• Big Switch, Floodlight REST Proxy
• PLUMgrid
• Hyper-V Plugin
• Brocade Plugin
• Midonet Plugin

• OpenStack has two main networking modes:
– Fixed IPs
o IPs are assigned to instances and are fixed until the instance terminates
o Flat & Flat DHCP Mode: a single global network for new instances
o VLAN-DHCP Mode: segmentation mechanism where each tenant (project) can provide its own private network
– Floating IPs
o IP addresses that are dynamically assigned but can be reassigned to another instance at any time
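
The difference between the two modes is that a floating IP outlives the instance it points at. A minimal in-memory sketch of that behavior follows; the `FloatingIpPool` class, the instance names, and the documentation-range addresses are all hypothetical and do not correspond to any OpenStack API.

```python
class FloatingIpPool:
    """Toy model of a floating IP pool: addresses can be moved between instances."""

    def __init__(self, addresses):
        self.free = list(addresses)
        self.assigned = {}  # floating IP -> instance name

    def allocate(self, instance):
        ip = self.free.pop(0)  # take the next unused address
        self.assigned[ip] = instance
        return ip

    def reassign(self, ip, new_instance):
        if ip not in self.assigned:
            raise KeyError("%s is not currently allocated" % ip)
        self.assigned[ip] = new_instance  # the address stays the same, the target changes

pool = FloatingIpPool(["203.0.113.10", "203.0.113.11"])
ip = pool.allocate("web-1")
pool.reassign(ip, "web-2")  # e.g. web-1 failed or is being upgraded
print(ip, "->", pool.assigned[ip])
```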


OpenStack and Networking Modes

Openstack VMs with Neutron

• Neutron provides a bridge adapter (br100 in the image) as a gateway to the VMs running on a particular host

OpenStack with Neutron

[Diagram: Horizon (browser) and the EC2 / OpenStack REST APIs call Nova-API; an asynchronous message queue connects Nova Compute (KVM, Xen, … via libvirt or XEN API) and Nova Network; the Neutron network manager calls the Neutron API (POST, GET, PUT, DELETE); the Neutron plugin (Open vSwitch) and Neutron Agent allocate networking for VM1 and VM2]

• OpenStack Heat is responsible for orchestration within OpenStack

• It implements an orchestration engine to launch multiple composite cloud applications based on templates in the form of text files that can be treated like code.

– A Heat template describes the infrastructure for a cloud application in a text file
o servers, floating IPs, volumes, security groups, users

– When you need to change your infrastructure, simply modify the template and use it to update your existing stack.
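
The "modify the template, then update the stack" workflow amounts to comparing the resources in the old and new templates. The sketch below shows that comparison on plain dictionaries; it is an illustration of the idea only, not Heat's update algorithm, and `plan_update` and the resource names `web`, `data`, and `ip` are hypothetical.

```python
def plan_update(current, desired):
    """Compare two template dicts and report which resources would change."""
    cur, new = current["resources"], desired["resources"]
    return {
        "create": sorted(set(new) - set(cur)),
        "delete": sorted(set(cur) - set(new)),
        "update": sorted(n for n in set(cur) & set(new) if cur[n] != new[n]),
    }

current = {"resources": {
    "web": {"type": "OS::Nova::Server", "properties": {"flavor": "m1.small"}},
    "data": {"type": "OS::Cinder::Volume", "properties": {"size": 10}},
}}
desired = {"resources": {
    "web": {"type": "OS::Nova::Server", "properties": {"flavor": "m1.medium"}},
    "ip": {"type": "OS::Neutron::FloatingIP", "properties": {"floating_network": "public"}},
}}

plan = plan_update(current, desired)
print(plan)
```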


OpenStack Heat

• Heat templates are text files written in YAML to describe a configuration:

heat_template_version: 2013-05-23

description: Simple template to deploy a single compute instance

resources:
  my_instance:
    type: OS::Nova::Server
    properties:
      image: cirros-0.3.3-x86_64
      flavor: m1.small
      key_name: my_key
      networks:
        - network: private-net


OpenStack Heat Templates

• OpenStack Ceilometer is responsible for collecting measurements of the utilization of the resources comprising deployed clouds

– physical and virtual resources

• Ceilometer persists these data points for subsequent retrieval and analysis, and triggers actions when defined criteria are met

– Meters, Samples, Statistics, Pipelines, and Alarms are used to organize Ceilometer functionality
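
How meters, samples, statistics, and alarms relate can be shown with a few lines of plain Python. This is a conceptual sketch, not the Ceilometer API; the sample records, the `statistics_for` and `alarm_state` helpers, and the 75% threshold are assumptions for illustration.

```python
from statistics import mean

# Samples: individual data points recorded for a meter on a resource
samples = [
    {"meter": "cpu_util", "resource": "vm-1", "value": 62.0},
    {"meter": "cpu_util", "resource": "vm-1", "value": 91.0},
    {"meter": "cpu_util", "resource": "vm-1", "value": 87.0},
]

def statistics_for(samples, meter):
    """Statistics: aggregate the samples of one meter."""
    values = [s["value"] for s in samples if s["meter"] == meter]
    return {"count": len(values), "min": min(values),
            "max": max(values), "avg": mean(values)}

def alarm_state(stats, threshold):
    """Alarm: fire when the average crosses a defined threshold."""
    return "alarm" if stats["avg"] > threshold else "ok"

stats = statistics_for(samples, "cpu_util")
print(stats, alarm_state(stats, threshold=75.0))
```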


OpenStack Ceilometer

• OpenStack Swift is the object store for OpenStack

• Stores blobs of data (objects) and makes them available to users

• Swift is a widely-used and popular object storage system provided under the Apache 2 open source license

• Requests are made via HTTP using a RESTful API.

– GET, PUT, POST, DELETE
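
As a rough illustration of the REST style, the sketch below builds (but does not send) requests against an object store using Python's standard library. Swift addresses objects as `<storage URL>/<container>/<object>` and authenticates with an `X-Auth-Token` header; the storage URL, token, container, and object names used here are placeholders, and `swift_request` is a hypothetical helper.

```python
from urllib.parse import quote
from urllib.request import Request

def swift_request(method, storage_url, token, container, obj=None, body=None):
    """Build an HTTP request for a container or object (nothing is sent here)."""
    parts = [storage_url.rstrip("/"), quote(container)]
    if obj is not None:
        parts.append(quote(obj))
    return Request("/".join(parts), data=body, method=method,
                   headers={"X-Auth-Token": token})

STORAGE_URL = "https://swift.example.com/v1/AUTH_demo"  # placeholder endpoint
TOKEN = "TOKEN"                                        # placeholder auth token

upload = swift_request("PUT", STORAGE_URL, TOKEN, "photos", "cat.jpg", b"image bytes")
download = swift_request("GET", STORAGE_URL, TOKEN, "photos", "cat.jpg")
remove = swift_request("DELETE", STORAGE_URL, TOKEN, "photos", "cat.jpg")

for req in (upload, download, remove):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would perform the call against a real endpoint.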


OpenStack Swift

• Cinder is a volume manager

– Volumes are hard drives mounted on machines

– Under the most common scenario, the Cinder volumes provide persistent storage to guest virtual machines

• VMs start and stop frequently

– The data in a volume can persist beyond the lifecycle of a VM


OpenStack Cinder

• Exploring OpenStack

• Refer to exercise 1 in the student exercises


Exercise 1

• OpenStack has a set of command line tools

• nova xxx

• keystone xxx

• neutron xxx

Example commands:


OpenStack Commands

Add a new user:
keystone user-create --name=admin --pass=ADMIN_PASS --email=admin@example.com

List users:
keystone user-list

Delete user:
keystone user-delete

Project:
openstack project create
openstack project delete


OpenStack Admin Console

OpenStack Dashboard project called Horizon


OpenStack Admin Console Cont.

• Projects - organizing of servers and resources

• Admin - Manage Openstack

• Identity - Create and manage groups and users


OpenStack Admin VMs

• Images are software loads -- Operating Systems

• Flavors define types of machines


OpenStack Networks

• The console is an interface to define networks

• Routers

• ip ranges

• OpenStack can be complex to configure.
• Log files are often key to diagnosing a broken installation
• Log files are under: /var/log/<projectname>
– /var/log/nova
– /var/log/glance
– /var/log/cinder
– /var/log/keystone


OpenStack Resources

Best Practices for Cloud Implementation

These are discussed…

• Assume the cloud will fail

• Utilize Elastic IP services

• When a server fails, elastic IPs can instantly remap to a set of servers

• Also useful for application upgrades/updates

• Incorporate multiple availability zones

• Eliminate single points of failure

• Use automated backups for databases

• Take snapshots of application instances


Designing for Failure

(see next slide for more)

• Accomplished at multiple levels: geographic, data center, application, and infrastructure

• Ensuring no single points of failure exist


High Availability

[Diagram: Infrastructure HA. LoadBal1 and LoadBal2 (failover) in front of WebServer1 and WebServer2, backed by Database1 and Database2]

Creating a Load Balancer on AWS allows for defining availability zone and health check values

Create Placement Groups (clusters) on AWS to achieve this effect

• Rule of thumb: build for 30-40% capacity beyond estimated requirements

• Estimate by Peak Bandwidth, Concurrent Users, or Application Sizing

• Peak Bandwidth:

• Acquire estimations from monitoring s/w, monthly traffic logs

• Concurrent Connections

• How much bandwidth does each user consume on average?

• Application Size

• Page sizes and requests per page per sec
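
The concurrent-user estimate and the 30-40% rule of thumb combine into a short calculation. The sketch below assumes 35% headroom and made-up traffic figures (2,000 concurrent users at 50 kbps each); `required_bandwidth_mbps` is a hypothetical helper, not part of any monitoring tool.

```python
def required_bandwidth_mbps(concurrent_users, kbps_per_user, headroom=0.35):
    """Estimate bandwidth to provision: baseline demand plus a safety margin."""
    baseline_mbps = concurrent_users * kbps_per_user / 1000.0
    return baseline_mbps * (1 + headroom)

# Illustrative inputs only: 2000 users * 50 kbps = 100 Mbps baseline
estimate = required_bandwidth_mbps(concurrent_users=2000, kbps_per_user=50)
print("Provision roughly %.0f Mbps" % estimate)
```

The same shape works for application sizing: replace the baseline with page size multiplied by requests per second.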


Capacity Planning

• There are 3 ways to implement elasticity:

• Scaling at fixed intervals

• Event-based scaling

• On-demand scaling
– This requires a system that scales without human intervention

• Automate builds and deployments

– Use services to monitor system metrics

– Incorporate tools such as Chef, Puppet, CFEngine
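
On-demand scaling boils down to a rule that turns a monitored metric into a target instance count. The function below is a minimal sketch of such a rule; the CPU thresholds, the step size of one instance, and the minimum and maximum bounds are all assumptions chosen for the example.

```python
def desired_instances(current, avg_cpu, low=30.0, high=70.0, minimum=1, maximum=10):
    """Return the instance count to aim for, given average CPU utilization (%)."""
    if avg_cpu > high:
        target = current + 1      # scale out under load
    elif avg_cpu < low:
        target = current - 1      # scale in when idle
    else:
        target = current          # within the comfortable band
    return max(minimum, min(maximum, target))

print(desired_instances(current=2, avg_cpu=85.0))
print(desired_instances(current=2, avg_cpu=10.0))
```

A monitoring service would call a rule like this on a schedule and hand the result to a provisioning tool, so no human has to intervene.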


Implementing Elasticity

New Terms:
Spin-up Elasticity – time it takes to spawn a new instance
Spin-down Elasticity – time it takes to shut down instances that are no longer needed
(on EC2 this is 1 minute and 1 hour respectively!)

Cloud Infrastructure Management

• Provisioning: AWS CloudFormation, RightScale, Abiquo, enStratus
• Configuration Management: Chef, Puppet
• Monitoring: AWS CloudWatch, CloudKick, NetIQ, ScienceLogic, Zenoss
• Automation / Orchestration: RightScale, Kaavo, Scalr, Morph

• Many companies have products that automate cloud-based administrative tasks

• Examples:

• Verify proper authentication, provision new instance and storage resources, notify upon completion

• Or, automatically scale resources upon changes in load


Automation and Orchestration

Tools come in flavors:
- Config Mgt (Chef, Puppet, Juju)
- Mgt Console Based (RightScale, Abiquo, enStratus)
- Template-based (RightScale, CloudFormation)

Resource Orchestration
• Describes the coordination of services to allow for business process workflow
o provision/manage resources
o reproduce deployments and test environments


Automation Products

define allocation limits

Other automation tools/vendors: RightScale, AWS CloudFormation, Kaavo, Scalr, enStratus, Tidal Enterprise Orchestrator

• Create loosely coupled components

• Use REST-based services, asynchronous calls

• AWS recommends a "GrepTheWeb" approach


Decouple Components

• Data that changes infrequently should be cached on the edge

• Video, audio, CSS, PDFs, JavaScript files, static HTML

• Use content delivery services to cache and deliver (CDNs)


Static Data Close to the User

• Keep dynamic data as close as possible to your computing instances

• Reduces latencies

• Decreases costs: data in/out is metered by the GB

• In-cloud data transfer is free

• Perform processing within same availability zone

• Move data into the cloud before processing

– Use external services such as import/export services


Dynamic Data Closer to Instances

• Cloud solutions force a change in paradigm

• Updates to servers at 2:00am on Saturday are no longer necessary

• Run servers continuously in parallel

• Shift IP addresses to new instances as needed

• Shift them back afterwards

• Regression/Unit Tests require servers to be provisioned for only a short time

• On-premise servers may sit idle for most of the day


Think Parallel

• Many design rules have already evolved with respect to the cloud including: designing for failure, how to implement elasticity, and placing data close to where it will be used

• OpenStack is a newer contender in the private cloud open source market

• Orchestration and automation tools are appearing quickly to help simplify administrative tasks


Summary

• Exploring OpenStack Swift

• Refer to exercise 2 in the student exercises


Exercise 2

Module 3: Distributed Processing


• Distributed Processing
• Introducing Apache Hadoop
• MapReduce


Overview

• The last few years have seen companies store any and all information passing through their networks

– Shopping sites store much more information than just what users are purchasing

– Search sites store every possible piece of information

• The infrastructure required to store all this used to be cost prohibitive

– Storage costs have dropped

– Cloud providers are plentiful

– Data is a goldmine of value to most companies


The Data Explosion

• In cloud solutions running hundreds of instances, how are large computational tasks accomplished?

– Ex: Google returns search results in less than a second

• Ans: Via a distributed processing system

• A Distributed Processing System often utilizes a distributed file system

– NFS is the most well-known distributed file system


Distributed Processing

• NFS is inadequate to handle massively scalable architectures utilized in grids and clouds

– It is file based, thus limited to the storage capacity of a single machine

• HDFS (Hadoop Distributed File System) is designed to overcome NFS shortcomings, taking advantage of large numbers of nodes

– Breaks files you specify into 64 MB chunks (blocks)

– Distributes the blocks to machines in the cluster

– Replicates the blocks on two other machines along the way
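
The three steps above (split into 64 MB blocks, distribute, replicate to two more machines) can be sketched as follows. The round-robin placement is a deliberate simplification, since real HDFS placement takes rack topology into account; the node names and `place_blocks` helper are hypothetical, and the sketch assumes at least three nodes so that the replicas land on distinct machines.

```python
import math

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the block size quoted above
REPLICATION = 3                # the original copy plus two replicas

def place_blocks(file_size, nodes):
    """Map each block index to the nodes holding a copy (simplified round-robin)."""
    n_blocks = math.ceil(file_size / BLOCK_SIZE)
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(REPLICATION)]
        for block in range(n_blocks)
    }

# A 200 MB file on a four-node cluster needs four blocks (the last one is partial)
layout = place_blocks(200 * 1024 * 1024, ["node1", "node2", "node3", "node4"])
for block, holders in layout.items():
    print("block", block, "->", holders)
```

Because every block lives on three machines, losing any single node leaves at least two copies of each block available.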


Distributed File Systems

• Apache Hadoop (introduced earlier) utilizes a

– Distributed file system (HDFS)

– MapReduce algorithm for reliable parallel processing

– Cassandra – a scalable multi-master database

– Avro – data serialization system

– Pig – a dataflow language running on HDFS

– Hive – a data warehouse supporting SQL syntax which is converted into MapReduce jobs

– HBase – a distributed column (object) style database


The Hadoop "EcoSystem"

• Hadoop can be hosted on the AWS S3 file system

– MapReduce algorithms can be run on EC2 servers

– Data is read from server instances and written back to S3


Hadoop and MapReduce

Clusters of Yahoo Search servers running MapReduce Algorithms

• MapReduce was created by Google to solve issue of searching massive amounts of data

– By 2014 there were over 1 billion web sites online

– Required thousands of servers

– Cost of servers became expensive so cheap servers (x86 architecture machines) were sought

– MapReduce was implemented across these thousands of machines


The Need for MapReduce

• MapReduce originates from functional programming

• map():

• reduce():


MapReduce - Functional Programming

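from functools import reduce  # reduce() lives in functools in Python 3

def doubleIt(val):
    return 2 * val

results = list(map(doubleIt, [1, 2, 3]))  # map() is lazy in Python 3
print(results)

[2, 4, 6]

def sum_reducer(val1, val2):
    return val1 + val2

print(reduce(sum_reducer, results))

12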
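
The same two functions scale up to the word-count pattern used later in this module. The sketch below runs the map, shuffle (group by key), and reduce phases in a single process to show the data flow; it is not Hadoop code, and the input lines and function names are made up for the example.

```python
from collections import defaultdict
from functools import reduce

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: fold the grouped values for one key into a single count."""
    return key, reduce(lambda a, b: a + b, values)

lines = ["the cloud scales", "the cloud fails"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

In a real cluster the map calls run on many machines in parallel and the shuffle moves data across the network, but the contract of each phase is the same.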


HDFS Architecture

NameNode

DataNode

DataNode

Namenode server (master) opens, closes, renames files and directories

Uses a master/slave architecture

Datanodes (slaves) perform read/write of blocks of data to clients

• Hadoop processes tasks using a master/slave architecture


Hadoop: HDFS & MapReduce

Hadoop Commands

• Because HDFS is a different file system than the native OS, it uses a different command set to manage it

– The hadoop script in the bin directory contains the commands

– The syntax is:

bin/hadoop moduleName -cmd args

where moduleName can be either dfs or dfsadmin for HDFS related tasks

Examples:

bin/hadoop dfs -ls /

bin/hadoop dfs -mkdir /task

Other Hadoop Commands

• Inserts myFile into HDFS calling it file2

bin/hadoop dfs -put myFile file2

– Note: if file2 is a directory, it will create file2/myFile

• Display contents of file2

bin/hadoop dfs -cat file2

• Retrieves a file from HDFS putting it in the local FS

bin/hadoop dfs -get file2 localFile2

• Three XML files are commonly used to help configure properties for hadoop deployments:

• core-site.xml – contains info about location of namenode

• mapred-site.xml – job tracker info, list of data nodes, etc.

• hdfs-site.xml – path locations for datanodes where blocks will be stored

Hadoop Config Files

conf/hadoop-env.sh also contains environment specific details that may be edited (such as JAVA_HOME)
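
Each of these files uses the same `<configuration>` / `<property>` layout, so a small generator can illustrate what they contain. The sketch below assumes Hadoop 1.x property names (`fs.default.name`, `mapred.job.tracker`, `dfs.data.dir`); the host names, ports, and directory path are placeholders, and `site_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def site_xml(properties):
    """Render a dict of settings in the <configuration>/<property> layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

# Placeholder values; adjust to the actual namenode host and local paths
core_site = site_xml({"fs.default.name": "hdfs://localhost:9000"})
mapred_site = site_xml({"mapred.job.tracker": "localhost:9001"})
hdfs_site = site_xml({"dfs.data.dir": "/var/hadoop/data"})

print(core_site)
```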

• Datanodes are given the location of the namenode server in their config files (core-site.xml)

– When started, the datanodes contact the namenode, allowing them to be dynamically added to the list for job processing


Adding Nodes to the Cluster

Namenode

Datanode

Datanode

Datanode

• To run the example:

./hadoop jar hadoop-*-examples.jar wordcount [-m <#maps>] [-r <#reducers>] <in-dir> <out-dir>

• Files in the input directory are read and counts of words are written to the output directory

• It is assumed that both inputs and outputs are stored in HDFS

– If your input is not in HDFS, but rather a local file system somewhere, copy the data into HDFS using:

./hadoop dfs -mkdir <hdfs-dir>

./hadoop dfs -copyFromLocal <local-dir> <hdfs-dir>

Running the WordCounter

WordCounter

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in an input line
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts emitted for each word
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        // The values parameter must be an Iterable (not an Iterator),
        // otherwise this method does not override Reducer.reduce()
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wordcount");

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // args[0] is the HDFS input directory, args[1] the output directory
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}

• A number of providers offer cloud solutions ideally suited for private clouds

– Open Source options are ideal choices

• Hadoop is a framework for implementing computational services scaled across many servers

• MapReduce makes scalable operations possible


Summary

• Utilize the virtual CentOS image to configure and run an Apache Hadoop application

– Steps to configure and run:

o Launch guest OS

o Create a Hadoop user, create ssh keys

o Set up ssh (secure shell)

o Obtain hadoop, configure the XML files

o Set Hadoop environment (Java & Hadoop home)

o Run the daemons

o Import file into HDFS

o Run job, view results


Exercise 3 – Apache Hadoop

Module 4: The PaaS Model


• PaaS Subcategories

• Google App Engine

• Other PaaS Providers

• Private PaaS Considerations


Overview

• PaaS: environments that support development and runtime frameworks

– Development tools may be hosted with the PaaS provider
o Ex: Metadata aPaaS, or

– Created using local IDEs and deployed into the PaaS cloud
o Ex: Framework aPaaS

• By 2015, cloud development solutions are growing faster than on-premise development


PaaS Overview

• PaaS can be broken into several subcategories:

– Application platform as a service (aPaaS)
o Instance aPaaS (Azure, Elastic Beanstalk)
o Framework aPaaS (GAE, Heroku, Djangy)
o Metadata aPaaS (Force.com, OrangeScape)

– Software infrastructure as a service (SIaaS)
o Offer partial cloud development environments, not full cloud platforms
o Ex: AWS SimpleDB, AWS SQS, MS SQL Data Services, Akamai, AWS CloudFront, AWS IAM


PaaS Subcategories


Public Cloud Key Players (PaaS)

Force.com

Wyaworks

• GAE is a platform for creating web applications

• No server setup, config, or management

• Python & Java are the primary languages supported

o Python 2.7 and Java 6 runtimes

o Any JVM-based languages can be used (JRuby, Groovy, Scala, etc.)

o SpringMVC, Struts 2, most Python web frameworks (including Django)


Google App Engine

http://cloud.google.com/appengine/


Open Source OpenShift

• OpenShift is a platform for hosting PaaS applications

• OpenShift Origin is an open source implementation of a PaaS product

• Can be installed and run

• http://www.openshift.org/

• Most commonly run within a Docker Container


RedHat OpenShift

• Red Hat OpenShift is a platform for hosting PaaS applications based on OpenShift Origin

• A hosted OpenShift implementation

• Create accounts, upload and install applications

• http://www.openshift.com/

• Many clients provide automated upload and deployments

• Apps deployed with Git
• Up to 3 applications
• No Expiration
• Apps with dependencies must install "cartridges"
• Add-ons to support features


OpenShift hosted Requirements

Pricing: Free

Requirements:

• Apps run within a secure sandbox environment

• Independent of hardware and OS physical locations

• Apps can read but not write to the file system

• Must use provided services for persistence

• Apps may only respond to standard HTTP requests using standard ports

• Code may only respond to web requests or scheduled tasks

• Cannot spawn subprocesses


PaaS Secure Sandbox

JEE (Jboss)

PHP

Python

Ruby

NodeJS

Drupal

WordPress


OpenShift Hosted Languages

• Current PaaS trends are toward development of more open solutions (non-framework specific)

• New PaaS clouds avoiding vendor lock-in

• The Open PaaS market has become competitive

• Red Hat OpenShift

• VMWare Cloud Foundry

• OpenCloud CloudSwing

• DotCloud


Open PaaS

• A private PaaS is one that can be deployed into your own private cloud

• Cloudbees, Cumulogic offer private PaaS clouds

• Private PaaS solutions can bring large companies with hundreds/thousands of developers onto a common platform

• Saves infrastructure costs, dev-time costs


Considering A Private PaaS

• Heroku is a polyglot aPaaS cloud application solution

• It is a multi-tenant hosting environment

• Developers create apps in Java, Clojure, Scala, Python, Ruby, Node.js

• Uses a command-line interface and git (decentralized revision control system) to deploy apps into the cloud


Heroku

• Cloud Foundry is an open source PaaS platform released under Apache License 2.0

• VMWare offers 3 products:

• Cloudfoundry.com – a service providing online PaaS cloud capabilities

• Cloudfoundry.org – a community where you can download the software for your own use

• Micro Cloudfoundry – a stand-alone version to locally develop solutions for deployment later

• Can develop Grails, Ruby, Java, Node.js


VMWare Cloud Foundry

• Another open cloud PaaS platform running on Amazon EC2

• Easy deployment using Python admin and command-line tools

• Wide range of language/DB choices

• Free account sign up


DotCloud

How to distribute applications

• The cloud provides a very flexible runtime for applications

– The application can be moved around and the environment scaled automatically

• Developers will often use local tools for writing and editing code

– Applications can be tested locally (or remotely) but the process has to be simple and repeatable

• Git is a very common tool for distribution of applications

Why Git?

• Git has many advantages over earlier systems such as CVS and Subversion

– More efficient, better workflow, etc.

– distributed nature is implicit

– See the literature for an extensive list of reasons

• Best competitor: Mercurial

– Much less popular than Git

• Many cloud PaaS products are based on Git for version control

Why Git?

• Linus Torvalds used BitKeeper to manage Linux code

• Ran into BitKeeper licensing issue

– Liked functionality

– Looked at CVS as how not to do things

• April 5, 2005 - Linus sends out email showing first version

• June 15, 2005 - Git used for Linux version control

Using Git

• Git is an application that must be installed

• Git needs a repository

– git clone remoteurl

– git init myrepo

• Git stages and then commits your changes (2 steps)

– git add *

– git commit -m "My change message"

• PaaS services have become numerous enough to create numerous subcategories of offerings

• Google App Engine provides numerous APIs for application development

• Suffers from vendor lock-in

• Many new PaaS vendors are providing platforms that support multiple languages and open solutions to avoid vendor lock-in


Summary

• OpenShift & Python

• Now you do it!


Exercise 4

• OpenShift & WebApp Framework

• Now you do it!


Exercise 5

Module 5: Security Issues


• Security Concerns

• Authentication Techniques

• Identity Access Management

• Infrastructure Security

• Tackling Compliance


Overview

• Cloud security is a responsibility of both the cloud provider and the client

– The "cloud stack level" determines each role

• For example, AWS states that for EC2 they are responsible for:

– Physical

– Environmental

– Virtualization


Cloud-based Security Issues

[Diagram: services stack with SaaS on top, then PaaS, then IaaS]

Moving down in the services stack, the client becomes more and more responsible for security!

• The service models have similarities and differences regarding security requirements:

• SaaS – policy controls, user access to application resources

• PaaS – data security, data encryption, data regulatory issues (compliance)

• IaaS – Virtual machine security, physical and environmental controls


Service Models and Security


Cloud Security Alliance

• CSA is an organization made up of many corporate representatives

– Goal is to promote best security practices within the cloud

– Define areas for cloud architecture, governing in the cloud, operating in the cloud


CSA Critical Areas of Focus

• Cloud Computing Architectural Framework
• Governance & Enterprise Risk Management
• Legal & Electronic Discovery
• Compliance & Audit
• Information Lifecycle Management
• Portability & Interoperability
• Traditional Security, Business Continuity & Disaster Recovery
• Data Center Operations
• Incident Response, Notification, and Remediation
• Application Security
• Encryption & Key Management
• Identity & Access Management
• Virtualization

• The Cloud Cube classifies clouds in four dimensions

• Attempts to categorize clouds in order to assure better security standards

The Cloud Cube Model

[Figure: the Cloud Cube, with axes Internal / External, Proprietary / Open, and Perimeterized / De-perimeterized, and labels Data, Software, Users, Mgmt]
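As a rough illustration of the classification idea, a cloud offering can be tagged along each dimension of the cube. This is only a sketch: the class and field names are hypothetical, and the fourth dimension (Insourced / Outsourced) comes from the Jericho Forum's published model rather than from the figure labels above.

```python
from dataclasses import dataclass
from enum import Enum


class Location(Enum):
    INTERNAL = "internal"
    EXTERNAL = "external"


class Ownership(Enum):
    PROPRIETARY = "proprietary"
    OPEN = "open"


class Boundary(Enum):
    PERIMETERIZED = "perimeterized"
    DE_PERIMETERIZED = "de-perimeterized"


class Sourcing(Enum):
    # Assumption: fourth dimension taken from the Jericho Forum model,
    # not from the labels recovered from the slide figure.
    INSOURCED = "insourced"
    OUTSOURCED = "outsourced"


@dataclass(frozen=True)
class CloudFormation:
    """One position in the cube: a value for each of the four dimensions."""
    name: str
    location: Location
    ownership: Ownership
    boundary: Boundary
    sourcing: Sourcing

    def describe(self):
        return "{}: {}, {}, {}, {}".format(
            self.name, self.location.value, self.ownership.value,
            self.boundary.value, self.sourcing.value)


# Hypothetical example: an in-house cloud on an open stack behind the firewall
private_cloud = CloudFormation(
    "in-house OpenStack", Location.INTERNAL, Ownership.OPEN,
    Boundary.PERIMETERIZED, Sourcing.INSOURCED)
summary = private_cloud.describe()
print(summary)
```

Placing each deployment at one position in the cube makes it easier to compare which security controls apply to it.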

• Identity Access Management focuses on how users may access account resources

• Most vendors provide a proprietary interface to achieve this:

• Google - Google Provisioning API for App Engine/Apps

• AWS - uses their IAM service

• Microsoft - uses Windows Identity Foundation (WIF) API with MS Forefront

• OpenStack - uses OpenStack Identity (Keystone)

Identity Access APIs

• Keystone roles:
– Provide user management
– Keep track of what users can do

• Service Catalog
– Provide a catalog of available services and their endpoints

OpenStack Identity (Keystone)

keystone user-list

keystone user-create --name sally --pass sally --email s@...

keystone tenant-create --name AWCTenant

keystone role-create --name standard_user

keystone user-role-add --user sally --role standard_user

keystone service-list
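To make the bookkeeping behind these commands concrete, the following toy model tracks the same kinds of records: users, tenants, roles, role grants, and a service catalog. It is a minimal in-memory sketch, not the Keystone client API. The IdentityStore class, its method names, the email address, and the endpoint URL are all hypothetical.

```python
class IdentityStore:
    """Toy stand-in for an identity service (not the real Keystone API)."""

    def __init__(self):
        self.users = {}        # user name -> attributes
        self.tenants = set()
        self.roles = set()
        self.grants = set()    # (user, role, tenant) triples
        self.catalog = {}      # service name -> endpoint URL

    def user_create(self, name, email):
        self.users[name] = {"email": email}

    def tenant_create(self, name):
        self.tenants.add(name)

    def role_create(self, name):
        self.roles.add(name)

    def user_role_add(self, user, role, tenant):
        # A grant is only valid if all three records already exist
        if (user not in self.users or role not in self.roles
                or tenant not in self.tenants):
            raise KeyError("unknown user, role, or tenant")
        self.grants.add((user, role, tenant))

    def has_role(self, user, role, tenant):
        return (user, role, tenant) in self.grants

    def service_register(self, name, endpoint):
        self.catalog[name] = endpoint


store = IdentityStore()
store.user_create("sally", "sally@example.com")   # placeholder address
store.tenant_create("AWCTenant")
store.role_create("standard_user")
store.user_role_add("sally", "standard_user", "AWCTenant")
store.service_register("identity", "http://controller.example.com:5000/v2.0")

print(store.has_role("sally", "standard_user", "AWCTenant"))
print(sorted(store.catalog))
```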

• AWS provides several ways for clients to manage and limit access to account resources:

• AWS IAM (Identity and Access Management)

• Free service supports assigning individual usernames/passwords, access keys, MFA devices, temporary security credentials

• Provides complete API for programmatic security

• Key management, policy management, group management

• Multi-factor authentication (MFA)

• AWS IAM allows clients to grant credentials to individuals or groups

AWS Account Security
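Policy management in IAM is expressed as JSON policy documents attached to users or groups. The sketch below builds one such document in plain Python. The helper name and the bucket name are hypothetical, and attaching the policy to a user or group is left out.

```python
import json


def read_only_bucket_policy(bucket_name):
    """Build an IAM policy document granting read-only access to one S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    "arn:aws:s3:::{}".format(bucket_name),      # the bucket itself (listing)
                    "arn:aws:s3:::{}/*".format(bucket_name),    # objects in the bucket (reads)
                ],
            }
        ],
    }


policy = read_only_bucket_policy("example-bucket")   # hypothetical bucket name
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Granting the same document to a group rather than to each individual keeps permissions consistent as people join and leave.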

• Google's Provisioning API allows clients to create, update, delete user accounts, create security groups

• Uses REST-based URLs to perform operations

Google Provisioning API

Google Provisioning API

    AppsPropertyService service = new AppsPropertyService("myAppName");
    GenericEntry entry = new GenericEntry();
    entry.addProperty("email", "bob@mydomain.com");
    entry.addProperty("password", "password");
    entry.addProperty("firstName", "Bob");
    entry.addProperty("lastName", "Smith");
    service.insert(
        new URL("https://apps-apis.google.com/a/feeds/user/2.0/" + "mydomain"), entry);

    URL feedUrl = new URL("https://apps-apis.google.com/a/feeds/user/2.0/" + "mydomain");
    List<GenericEntry> allUsers = new ArrayList<GenericEntry>();
    while (feedUrl != null) {
        GenericFeed feed = service.getFeed(feedUrl, GenericFeed.class);
        allUsers.addAll(feed.getEntries());
        feedUrl = (feed.getNextLink() == null) ? null :
            new URL(feed.getNextLink().getHref());
    }

    service.delete(new URL("https://apps-apis.google.com/a/feeds/user/2.0/" + "mydomain" + "/" + "bob@mydomain.com"));

• Many companies will not allow corporate data to be hosted in a public cloud

• Regardless of the vendor's security promises, it is still corporate data, and full control means ownership of the hosting

• As companies become more comfortable with hosting, and as SLAs grow stronger, hosting corporate applications and data in the cloud will be more common

• Standards need to be accepted and corporate policies need to reflect the cloud's distributed nature

Data and the Cloud

• As a step toward company control of distributed data, many companies have begun to enforce point-of-use encryption

• Information is encrypted before it is sent to a network location

• Information is decrypted locally before use

• All information being passed and stored is encrypted

• A cost is paid for all the encryption/decryption cycles

• The enforcement and acceptance of encryption allows a wider use of distributed data

Encrypt and decrypt
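The point-of-use pattern can be sketched in a few lines: data is encrypted locally, only ciphertext reaches the network location, and decryption happens locally again. To stay within the standard library, this sketch uses a one-time pad (XOR with a random key as long as the data) as a stand-in cipher. A real deployment would more likely use a vetted authenticated cipher from a maintained cryptography library. The dictionary standing in for the network location and all names are hypothetical.

```python
import secrets


def xor_bytes(data, key):
    """One-time pad: XOR data with an equally long key.

    The same call encrypts and decrypts. The key must be random,
    as long as the data, and never reused.
    """
    if len(key) != len(data):
        raise ValueError("one-time pad key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


remote_store = {}   # hypothetical stand-in for a network location

document = b"Q3 revenue forecast"
key = secrets.token_bytes(len(document))   # generated and kept locally, never uploaded

# Encrypt before the information is sent to the network location
remote_store["forecast"] = xor_bytes(document, key)

# Decrypt locally before use
recovered = xor_bytes(remote_store["forecast"], key)
print(recovered == document)
```

The extra encrypt and decrypt calls on every access are the cost the slide mentions.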

• Services with large user bases, such as Google, Yahoo!, MSN, MySpace, Facebook, Twitter and others, have all become identity platforms
– Utilizing their login mechanisms removes the need for users to keep registering for new services
– There are two types of authentication: delegated and federated

• Delegated Authentication uses the identity provider's mechanism for authentication
– Ex: Facebook, Twitter
– Attempts to bring SSO to reality
– Twitter stores usernames and passwords on behalf of other sites
  o OAuth

Authentication & SSO

• Federated Authentication
– Users may use any authentication mechanism, as long as it is compatible
– Decentralized
– Allows for any identity provider to supply credentials
– OpenID is the best candidate for this implementation

• Differences between OpenID (Federated) and OAuth (Delegated)

Federated Authentication & OpenID

OpenID                     OAuth
Decentralized              Centralized
Provider may be Unknown    Provider Known
Shares Identity Only       Shares Additional Data Resources

How Does OAuth v2.0 Work?

[Sequence diagram between User, Joe's Hardware Shop, and Facebook; messages in order:]

• wishes to access resources
• requests a temporary token
• redirected to Facebook login (if needed)
• user logs in with Facebook (might be logged in already)
• redirected to Joe's Hardware Shop
• requests an access token
• provided
• finds out all of your secrets
• provided
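The redirect steps in the flow above can be sketched from the point of view of the relying site (Joe's Hardware Shop). This is a minimal sketch of the first half of an OAuth 2.0 authorization code grant, on the assumption that the slide's "temporary token" corresponds roughly to the authorization code. The endpoint URL, client ID, redirect URI, and code value are hypothetical. The final exchange of the code for an access token is a server-to-server POST and is not shown.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values for illustration only
AUTHORIZE_ENDPOINT = "https://idp.example.com/oauth/authorize"
CLIENT_ID = "joes-hardware-shop"
REDIRECT_URI = "https://shop.example.com/callback"


def build_authorization_url(state, scope="profile"):
    """URL the user is redirected to in order to log in at the identity provider."""
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": scope,
        "state": state,
    }
    return AUTHORIZE_ENDPOINT + "?" + urlencode(params)


def extract_code(callback_url, expected_state):
    """Read the authorization code from the redirect back to the shop."""
    query = parse_qs(urlparse(callback_url).query)
    # The state value ties the callback to the request we started
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: possible forged callback")
    return query["code"][0]


state = secrets.token_urlsafe(16)
login_url = build_authorization_url(state)

# Simulated redirect from the identity provider back to the shop
callback = REDIRECT_URI + "?" + urlencode({"code": "abc123", "state": state})
code = extract_code(callback, state)
print(code)
```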

• A number of threats specific to virtualization technologies have been identified:

• Blind spots

• Inter-VM Attacks

• Trust Levels

• Instant-on Gaps

Virtualization Threats

• Blind Spots
– Inability to "see" communications between VMs because the traffic resides within the software layer

• Inter-VM Attacks
– An attacker breaks out of a VM's isolation and attacks the hypervisor ("hyperjacking")
– The hypervisor can then be used to attack other VMs

Virtualization Threats

[Figure: four VMs running on a hypervisor, with a blind spot and an inter-VM attack marked]

• Varying Trust Levels
– Some servers host apps within VMs that contain mission-critical data, while others host non-mission-critical data

• Instant-on Gaps
– Clouds allow for the provisioning / de-provisioning of VMs
– VMs may lie dormant for long periods
– These VMs may become "out-of-date" with respect to security updates

Virtualization Threats
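One way to reason about instant-on gaps is to check how long ago each VM image was last patched before it is brought back online. The sketch below assumes a simple inventory of VM records. The field names, VM names, dates, and the 30-day threshold are hypothetical.

```python
from datetime import date, timedelta


def find_stale_vms(vms, today, max_age_days=30):
    """Return names of VMs whose last security patch is older than the threshold."""
    cutoff = today - timedelta(days=max_age_days)
    return [vm["name"] for vm in vms if vm["last_patched"] < cutoff]


# Hypothetical inventory: one active VM and one that has been dormant for months
inventory = [
    {"name": "web-01", "last_patched": date(2020, 5, 1)},
    {"name": "batch-07", "last_patched": date(2019, 11, 15)},
]

stale = find_stale_vms(inventory, today=date(2020, 5, 22))
print(stale)
```

VMs flagged this way could be patched in an isolated network before being provisioned again.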

• Due to the diversity of the services offered, securing of PaaS & SaaS environments is difficult

• Some companies offer solutions for aggregating SaaS services through a proxy

Securing PaaS and SaaS Solutions

• Compliance comes down to who can view corporate data

• True compliance requires full control of data

• Google has fired employees for viewing Google App Engine client data

• Amazon assures clients that only a few necessary personnel can view the user organization's data

• Compliance standards:
– Statement on Auditing Standards (SAS 70)
– Payment Card Industry Data Security Standard (PCI DSS)
– Health Insurance Portability and Accountability Act (HIPAA)

Tackling Compliance

• The top security concerns include the lack of proper identity management controls

• AWS IAM and other identity management services provide access to automated user provisioning capabilities

• Numerous virtualization threats pose potential problems with the VM / hypervisor model, but VMs can be self-protected

Summary

Course Summary

The State of the Cloud / Implementing IaaS / The PaaS Model / Providing SaaS Solutions / Security, Standards, and Governance

• Key Players (Cloud Providers)
• Cloud Best Practices
• Implementing Elasticity
• Improving Availability / Reliability
• Providing Failover
• Orchestration Techniques
• Automating Scalability
• OpenStack
• PaaS Subcategories
• Using GAE

What Did We Learn?

• Deploying into PaaS Environments

• Open PaaS Providers

• Trends in SaaS

• Securing IaaS, PaaS, SaaS services

• Evolving Authentication Techniques

• Virtualization Security Threats

• Identity Access Mgt

Reference Sources

• Please take the time to fill out an evaluation

• All evaluations are read and considered

Evaluations

Questions

Appendix A: Python Supplemental

Introducing Python

• High-level programming language
• Supports functional programming
• Can be object-oriented
• Automatic memory management
• Dynamic typing
• For this reason, it is often referred to as a scripting language, much like Ruby, JavaScript, Perl, Tcl

Python Versions

Version  Date  Comments
0.9      1991  Pre-1.0 release
1.3      1995
1.5      1999
1.6      2000
2.0      2000  Unicode support, list comprehensions
2.1      2001  New nested function scoping rules, warnings added
2.2      2001  Declare classes as subclasses, super() added, new rules for multiple inheritance
2.3      2003  Set class added, generators added
2.4      2004  Decorators added
2.5      2006  Conditional expressions, try/except/finally combo, with statement added
2.6      2008  print(), more string formatting methods
2.7      2010  Last 2.x release, several 3.0 backported features
3.0      2008  Not 2.x backward compatible, many updates
3.1      2009

Executing a Python Script

• There are 4 ways to execute Python scripts. We'll explore each of these:

• From within the Python shell

• From the OS command-line

• As a shell script file or by double-clicking

• From within an IDE or interactive environment

How Scripts Run Under the Hood

• Python scripts are not compiled to machine code

• Scripts are translated into byte code

• This is so they will execute faster than raw source files

• Byte code files end with .pyc extensions

• The PVM will run .pyc files if they exist; otherwise it will create the byte code and execute it at runtime

[Figure: python translates myscript.py into myscript.pyc, which the PVM runs]
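The translation step can be triggered by hand with the standard py_compile module. This sketch uses Python 3 syntax and writes a throwaway script to a temporary directory. The script contents are hypothetical. Note that Python 3 places the .pyc file in a __pycache__ directory and adds an interpreter tag to its name, whereas Python 2 writes it next to the source file.

```python
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    script = os.path.join(workdir, "myscript.py")   # file name taken from the slide
    with open(script, "w") as handle:
        handle.write("print('hello from myscript')\n")

    # Translate the source into byte code without running it
    pyc_path = py_compile.compile(script, doraise=True)
    pyc_name = os.path.basename(pyc_path)
    pyc_exists = os.path.exists(pyc_path)

print(pyc_name, pyc_exists)
```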

Built-In Types

Value     Description
int       integer
long      long integer
float     floating point
complex   complex numbers
bool      Booleans
str       strings
object    objects
function  functions
list      List sequences (arrays)
tuple     Tuple sequences (fixed arrays)
dict      Dictionaries (hashes)
file      File objects

This is a list of some of the built-in types in Python

Data Structures

• The primary types of Python data structures are:

• Sequences
– Strings
– Lists
– Tuples

• Dictionaries

Lists

• Lists are ordered sequences of objects

• Duplicates are allowed

    someList = list(sequence)

    my_list = []              # Empty list
    my_list = list()          # Empty list
    my_list = [1, 3, 5]
    my_list = [3.3, 'hello', Person()]
    my_list = [3.3, 'hello', Person(), 3.3, Person()]
    my_list = list('hello')

List Manipulation

• Lists can be concatenated

    new_list = my_list + [1, 2, 3]    # Makes a new list

• Lists can be appended or inserted

    my_list.append('new value')       # Added to end of list

    my_list = [1, 2, 3]
    my_list.insert(1, 'hello')        # [1, 'hello', 2, 3]

• Access lists using index notation:

    print my_list[0]                  # 1
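The difference between operations that build a new list and those that change a list in place is easy to check. This short sketch uses Python 3 syntax (print as a function); the values are arbitrary.

```python
my_list = [1, 2, 3]

# Concatenation builds a new list and leaves my_list unchanged
new_list = my_list + [4, 5]

# append and insert modify my_list in place
my_list.append('new value')
my_list.insert(1, 'hello')

first = my_list[0]    # index notation; negative indexes count from the end
last = my_list[-1]

print(new_list)
print(my_list)
print(first, last)
```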

Functions

• Functions are commonly used within Python

• Additional features are introduced in chapter 3

• Functions are defined as follows

    def funcName(arg0, arg1, arg2, ..., argN):
        statements
        return value

• List of parameters must be supplied or () if none

• Return values are optional. A value of 'None' is returned when a return statement is omitted

• Function statements must be indented

Functions

• Functions must be defined before they can be called

    def displayResults(customer, purchase_amount):
        print 'Customer: %s, amount: $%.2f' % (
            customer['surname'], purchase_amount)

    displayResults({'surname' : 'Smith'}, 108.2)

    Customer: Smith, amount: $108.20
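Two details from the function slides can be verified directly: the %.2f format rounds to two decimal places, and a function without a return statement yields None. The sketch uses Python 3 syntax, and the function names are hypothetical variations on the one above.

```python
def format_results(customer, purchase_amount):
    """Return the report line instead of printing it."""
    return 'Customer: %s, amount: $%.2f' % (customer['surname'], purchase_amount)


def log_results(customer, purchase_amount):
    """Print the report line; there is no return statement here."""
    print(format_results(customer, purchase_amount))


line = format_results({'surname': 'Smith'}, 108.2)
result = log_results({'surname': 'Smith'}, 108.2)   # result is None
print(result)
```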

Modules

Modules are namespaces in Python

Physically, each .py file represents a module

Functions, variables, classes declared at the top of a module can be made available to other modules

These attributes can't be used until they are imported
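The two common import forms can be tried with a standard library module. This is a small sketch using the math module as the example; any module works the same way.

```python
import math              # binds the module name; attributes need the math. prefix
from math import sqrt    # binds one attribute directly into this namespace

circle_area = math.pi * 2 ** 2   # attribute reached through the module namespace
root = sqrt(16)                  # imported name used without a prefix

print(math.__name__)   # every module records its own name
print(circle_area, root)
```

Keeping the module prefix avoids name clashes when two modules define an attribute with the same name.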

Object-oriented Python

• Python features many object-oriented capabilities including inheritance, constructors, overriding, encapsulation

    class Person(object):
        def __init__(self, name, age):
            self.name = name
            self.age = age

        def display(self):
            print '%s is %i' % (self.name, self.age)

    p1 = Person("Bob", 37)
    p1.display()
    print type(p1)

Constructors

• __init__() acts as the class constructor

• __del__() acts as a destructor, but destructors aren't commonly used

    class Person:
        # self is not automatically received; in Python, it must be explicitly provided
        def __init__(self, name, age):
            self.name = name
            self.age = age

        def __str__(self):
            # Without self here, a local and then global name will be sought
            return self.name + " " + str(self.age)

    p = Person('Bob', 37)    # self is implicitly passed

Overloading Constructors

• No way to overload constructors in Python

• Can implement type checking if needed

    class Person:
        def __init__(self, name='', age=0):
            self.age = age
            if isinstance(name, str):
                self.name = name
            elif isinstance(name, dict):
                self.name = name['name']
                self.age = name['age']

        def __str__(self):
            return self.name + " " + str(self.age)

    p1 = Person()
    p2 = Person('John')
    p3 = Person('Jim', 33)
    p4 = Person({'name': 'Sally', 'age' : 43})
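An alternative to type checking inside __init__() is a named class method that acts as a second constructor. This is a different technique from the one on the slide, shown here as a sketch. The from_dict name is a hypothetical choice.

```python
class Person:
    def __init__(self, name='', age=0):
        self.name = name
        self.age = age

    @classmethod
    def from_dict(cls, data):
        """Alternative constructor: build a Person from a dictionary."""
        return cls(data['name'], data['age'])

    def __str__(self):
        return self.name + " " + str(self.age)


p3 = Person('Jim', 33)
p4 = Person.from_dict({'name': 'Sally', 'age': 43})
print(p3)
print(p4)
```

Each way of building the object gets its own name, so __init__() stays free of isinstance() branches.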

Inheritance

• The format for inheritance in Python is:

    class Subclass(Superclass):

So, an Employee can inherit from Person as follows:

    class Employee(Person):
        def __init__(self, name, age, salary, dept):
            Person.__init__(self, name, age)
            self.salary = salary
            self.dept = dept

        def __str__(self):
            return (Person.__str__(self) +
                    ' {0} {1}'.format(self.salary, self.dept))

    e1 = Employee('Sally', 43, 75000.00, 'HR')
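In Python 3 the same relationship is usually written with super() instead of naming the parent class explicitly. The following is a sketch of that variant, with a minimal Person class repeated so the example runs on its own.

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __str__(self):
        return self.name + " " + str(self.age)


class Employee(Person):
    def __init__(self, name, age, salary, dept):
        super().__init__(name, age)   # run the Person constructor first
        self.salary = salary
        self.dept = dept

    def __str__(self):
        # Reuse the parent's text, then append the Employee-specific fields
        return '{0} {1} {2}'.format(super().__str__(), self.salary, self.dept)


e1 = Employee('Sally', 43, 75000.00, 'HR')
text = str(e1)
print(text)
```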