Sourced case study

DevOps: A Software Architect’s Perspective Chapter: A Continuous Deployment Pipeline for Enterprise Organizations By Len Bass, Ingo Weber, Liming Zhu with John Painter of Sourced Group Daniel Hand of Daedalian

Description

A case study about a Continuous Deployment Pipeline for the book DevOps: A Software Architect's Perspective

Transcript of Sourced case study



This is a chapter of a book we are releasing one chapter at a time in order to get feedback. Information about this book and pointers to the other chapters can be found at http://ssrg.nicta.com.au/projects/devops_book. We would like feedback about what we got right, what we got wrong, and what we left out.

We have changed the book organization from previously announced. The new table of contents is:

Part I - Background

1. What is DevOps?

2. The cloud as a platform

3. Operations

Part II Deployment Pipeline

4. Overall architecture

5. Build/Test

6. Deployment

Part III Cross Cutting Concerns

7. Monitoring

8. Security and Security Audits

9. ilities

10. Business Considerations

Part IV Case studies

We envision a collection of case studies. One has been released.

11. Case Study: Rafter Inc (business continuity)

This chapter is another case study.

Part V The future of DevOps

14. Operations as a process

15. Implications of industry trends on DevOps


Case Study: A Continuous Delivery Pipeline for Enterprise Organizations

1. Introduction

Sourced Group is an enterprise consulting organization, founded in 2010, that grew from a set of individuals with financial services backgrounds. Currently, Sourced Group’s team comprises engineers who fall into one of two core skill areas: Data Management (databases and data warehousing) or Solutions Architecture and Automation.

This case study introduces the Continuous Delivery Pipeline (CDP) reference architecture developed and refined by Sourced Group while working with leading Australian enterprise organizations within the financial services, media, telecommunications and aviation sectors. We discuss the different facets of deploying a CDP within an enterprise: the organizational context, the CDP itself, and how security is managed.

2. Organizational Context

A key aspect of this chapter is that Sourced Group is a consulting organization. As such, they not only need to deal with the culture and organization of their customers, but they also need to identify and train the personnel who will be responsible for the CDP after the engagement is completed. Figure 1 presents the organizational structure that is typical of a Sourced Group engagement. Notice the “CDP Onboarding Group”. This is the group that will be responsible for the CDP after Sourced Group’s engagement is complete.

Education and knowledge transfer are key to the success of adopting any new technology. To this end, Sourced Group organizes an onboarding team consisting of virtual or seconded members of the CDP group early in an engagement. The charge of the onboarding group is to help DevOps teams develop or migrate their existing applications onto the CDP. In addition to smoothing the transition of the deployment process, the onboarding team provides real-time feedback to the continuous delivery team from the digital portfolio managers and their application teams. Feedback is typically a mixture of platform requirements and consumer feedback.

In the type of enterprises that are Sourced Group’s customers, there are multiple development teams, each with their own projects, skill sets, and uses of the cloud for their applications. This organizational diversity complicates the task of introducing a CDP.


Furthermore, it is common that an enterprise organization successfully conducts one or more small application pilot projects on a cloud platform, with varying degrees of human involvement or automation. The organization then sees operational complexity and cost rise, as cloud-native features such as on-demand infrastructure and auto-scaling are not directly compatible with traditional toolsets and operational approaches. These pilot projects are a good entry point for the introduction of the CDP. See Chapter: Business Considerations for a general discussion of rolling out DevOps practices.

The effort required to design, implement and train an organization around a new CDP depends on a number of factors, including the existence of one or more incumbent platforms; the availability of personnel, skills and prior experience; the degree to which the customer’s environment is regulated or constrained by one or more governing bodies, e.g. the Australian Prudential Regulation Authority (APRA) or the Payment Card Industry (PCI); and the support and commitment of the wider business.

Figure 1: Project Team Structure describes the teams and stakeholders of a typical CDP implementation project. In addition to the onboarding team, observe the role of Security Operations. Since Security Operations (SecOps) already has interactions with the development teams, and since security is such an essential element of any enterprise, they become a natural adjunct to the introduction of a CDP. As a result, SecOps are always a key stakeholder throughout each engagement. As we discussed in Chapter: Security and Security Audits, SecOps have a challenging role to fulfill and, almost without exception, cause new initiatives to move forward more slowly than they otherwise would. The introduction of an enterprise-wide CDP causes SecOps to be concerned about maintaining security.

The use of a common CDP not only maintains the current security mechanisms but in most cases significantly improves on them, in part because the CDP is able to enforce policy and procedure and reduce both the number of privileged users and systems. Although SecOps typically begins by questioning the CDP, with time and experience they tend to become enthusiastic advocates. Partially, this is because their concerns are automatically incorporated into the builds, as we shall see when we discuss CloudFormation merging.


Figure 1: Project Team Structure

Executive sponsorship and cross-portfolio project management are also critical during each engagement. Enterprise-level deployment of any technology needs senior executive sponsorship and an experienced program manager to navigate internal challenges, organize resources, mediate between teams, assist with stakeholder management and generally ensure smooth delivery. Again, we discuss these issues more generally in Chapter: Business Considerations.

It is a common scenario that, at the start of a CDP implementation engagement, each application DevOps team manages the delivery and support of their respective CDP. In order to bring economies of scale, ownership of common tools and technologies is assigned to groups, and each group is responsible for the support of one or more tools enterprise-wide. The typical set of tools used is Splunk, Atlassian Confluence, Sonatype Nexus, Atlassian Bamboo, Atlassian Stash and Atlassian JIRA. The tool teams are populated from members of existing DevOps teams. Ideally, the members have the desire to specialize and go deep on the respective technology, and either the skills, or the desire to develop the skills, necessary to support an enterprise-wide platform.

3. The Continuous Delivery Pipeline

The CDP provides a standardized method for an enterprise to manage the lifecycle of an application. Larger enterprises, particularly those in the financial sector, have very strong risk management frameworks and are generally willing to trade small amounts of agility in exchange for assurance and risk reduction. Standardization of the application lifecycle reduces risk by isolating change into feature branches, testing that change, and providing a moderated path into production.


3.1 CDP Tooling

The CDP requires two basic elements: a source repository and a continuous delivery (CD) tool. A wide range of open source, hosted, and commercial Git source repositories exist in the market, but enterprises generally have regulatory concerns or IP protection requirements that dictate the use of behind-the-firewall solutions such as Atlassian Stash or GitHub Enterprise.

Sourced Group uses Atlassian Bamboo as the CD tool of choice, and it forms the backbone of the CDP. Atlassian Bamboo offers several key features that are important to an enterprise CDP, such as tight integration with ticketing systems and audit trails, but its unique branch management system is critical to the objectives of the CDP. A core tenet of the CDP is standardization of the application lifecycle to provide assurance. That lifecycle is defined via a plan that is then executed against the source control systems. In order to provide assurance, it is essential that the same lifecycle (plan) is run against a feature or testing branch as would be run against the mainline or integration branch. Atlassian Bamboo achieves this by building branch awareness into its plans, allowing you to run copies of the plan against an arbitrary number of source branches. The branches are automatically detected and cleaned, further reducing human effort. Figure 2 shows the use of the virtual plan based on branches.

Figure 2 – Virtual CDP plan for each branch


This is in contrast to many CD systems that manage each branch as a different plan. As demonstrated in Figure 3, this creates multiple points of administration and inevitably leads to drift between the plans and the loss of standardization and assurance.

Figure 3 – Many CD tools need discrete plans for each branch, leading to drift

While not critical to the success of the CDP, it is very common to see a ticket management solution such as Atlassian JIRA. This greatly improves the cross-referencing of code changes against issues or feature requests that have been logged by the business, and can go as far as the automated generation of release notes based on the tickets that were closed in the latest application build.

While the CDP reference architecture can be adapted to multiple public and private cloud environments, for the purposes of this chapter Amazon Web Services (AWS) is the target platform. The remainder of this case study assumes a foundational understanding of a number of AWS services and products. Readers should refer to Chapter: The Cloud as a Platform for general cloud concepts and http://aws.amazon.com/ for AWS-specific information. Atlassian Bamboo does not natively interact with AWS, so the CDP utilizes a commercial plugin, “Tasks for AWS” (https://marketplace.atlassian.com/plugins/net.utoolity.atlassian.bamboo.tasks-for-aws), to interface with AWS.

Figure 4 shows how the tools that are used within the CDP interact. JIRA information is fed both to Atlassian Stash and to Atlassian Bamboo. Stash provides application and configuration source code for Bamboo to build images, which it then deploys.


Figure 4 – The complete CDP toolset

3.3 The application lifecycle

The primary technical outcome of the CDP is the standardization of the application lifecycle. The application lifecycle can be broken down into five main phases, shown in Figure 5.

1. Testing – Perform functional code testing and produce application artifacts. We do not discuss testing in this chapter, but see Chapter: Build and Test for a discussion of testing and testing strategies.

2. Baking – Bootstrap the application artifacts and configuration onto a temporary target operating system, then “bake” the image by taking a snapshot, or, in AWS terms, an Amazon Machine Image (AMI).

3. Deployment – Deploy a new independent “stack” of the application by launching copies of the snapshot (AMI) and supporting infrastructure such as load balancing and DNS.

4. Release – Release the new independent stack by moving the floating DNS entry from the existing stack to the new stack. Prior to release, the new stack is modified to match the capacity and scale of the existing stack to ensure continuation of service when rolling out. This stage optionally includes patching or modification of persistent data.

5. Teardown – Once all traffic has been moved to the new stack, the existing stack is torn down as it is no longer required. It is prudent to perform safety checks prior to teardown to ensure that all traffic has been moved away from the environment and that you are not tearing down the running production stack.
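The five phases above can be sketched as a simple pipeline runner. This is an illustrative sketch only, not Bamboo's or Sourced Group's actual implementation: the names (run_pipeline, PHASES, MANUAL_GATES) are our own, and it models just the ordering and the manual gates at release and teardown.

```python
# Sketch of the five CDP phases as an ordered pipeline with manual gates.
# All names here are illustrative, not part of any real tool's API.

PHASES = ["test", "bake", "deploy", "release", "teardown"]
MANUAL_GATES = {"release", "teardown"}  # these phases wait for a human trigger

def run_pipeline(actions, approve):
    """Run each phase in order; pause at manual gates until `approve` says go.

    `actions` maps phase name -> zero-argument callable returning True on success.
    `approve` is a callable(phase) -> bool, standing in for a human gatekeeper.
    Returns the list of phases that completed.
    """
    completed = []
    for phase in PHASES:
        if phase in MANUAL_GATES and not approve(phase):
            break  # stack stays up; in Bamboo the build would show as in progress
        if not actions[phase]():
            raise RuntimeError(f"phase {phase!r} failed; stack is torn down")
        completed.append(phase)
    return completed
```

A build whose gatekeeper approves release but not teardown stops with both stacks still running, which is exactly the state in which rollback remains possible.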


Figure 5. The steps involved in each phase of the CDP

The standardized application lifecycle is applied to the source control system, with the ability to run the lifecycle against any arbitrary commit on any arbitrary branch. The coupling of a standardized application lifecycle and a source control system provides a high degree of assurance to both the application developers and the business. If the lifecycle successfully runs in a lower order branch, such as a feature branch or a testing branch, then you have some assurance that the change will be successful in a higher order branch such as production.

Application developers branch the source code and apply the standard application lifecycle via the CDP. The CDP will build discrete and independent environments for that feature branch, allowing for individual experimentation without interrupting other feature development. This non-blocking, independent environment approach is one of the foundations of the platform. Large and experimental features can run in parallel with small changes, while still being held to the same standards. Developers are given clear guidelines on what constitutes a complete, releasable feature, and the business gets a real-time view of the progress of a sprint or individual features. The general developer’s role is then simplified to three functions:

· Branch for the feature and develop.

· Commit and validate a functional artifact and environment (preferably with as much automated testing as possible).

· Pull-request the feature branch into the higher order branch.

A lead developer is then responsible for reviewing the build results for that feature branch, visiting the validation environment and accepting it into the sprint branch. Once the sprint is complete, the team lead then submits the entire block of change to the testing team. Smaller teams, or teams with less stringent governance and risk management requirements, may choose to simplify this model by merging feature branches straight into UAT (the staging environment) or, if they are very confident and have exceptional testing, straight into production. The use of “higher order branches” other than production is very specific to each enterprise, but the CDP generally promotes fewer branches over more.

Enterprises generally have large, potentially geographically dispersed teams, with the majority of teams adopting a “sprint” methodology for software development. Due to regulatory and risk requirements, it is usually necessary to pass all changes through a separate testing team prior to release. The CDP caters for that by provisioning discrete, non-blocking environments for each branch, allowing development work to continue while an independent team completes testing. Development work is not blocked for long periods of time awaiting testing and signoff, and while the time between feature submission and release may be longer, the overall velocity of the project is not impacted.


Figure 6 shows the flow of features into production. Independent branches can be deployed independently.

Figure 6: Releasing features independently

3.4 CloudFormation

The CDP depends heavily on Amazon’s CloudFormation, which provides the ability to completely define a virtual environment, its resources and security components using a configuration file written in JSON. Environments can be replicated consistently and benefit from the same level of unit testing commonplace with application code. If a particular resource request fails, possibly due to a misconfiguration, the stack is simply torn down, saving cost for the customer.

Developing and maintaining Amazon CloudFormation templates for a small number of applications works well. However, as the number of applications increases it becomes increasingly inefficient to update each template. AWS frequently releases updates and best practices that users want to take advantage of, but introducing change to the templates can be time consuming and complex, as the different code bases must be managed. Sourced Group addresses this shortcoming by managing a common set of generic application and operationally focused templates that are dynamically fused together at bake or deploy time. This ensures that changes and updates are effectively permeated throughout all environments under the control of the CDP. Another benefit of providing a centrally managed set of templates is that it significantly reduces the time and effort required by teams to add a new application to the CDP. This is particularly important when a team has little or no prior experience deploying to the CDP, as it removes much of the heavy lifting.

Figure 7 shows the CloudFormation merging process. Independent templates are merged in this process to create a single template that is used in the baking process. Some of the individual groups include the perimeter security group (responsible for preventing external access to the organization’s network), the network security group (responsible for security within the network, inside of the perimeter) and the applications group. These templates are merged in a priority fashion so that an applications group cannot override settings specified by one of the operations templates. Operations templates are stored in one Git repository and applications templates are stored in a different Git repository. This provides greater levels of control and separation. Atlassian Bamboo merges the various templates into a single CloudFormation template using a priority ordering that we discuss in Section 4: Security.
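The priority merge described above can be illustrated with a short sketch. Sourced Group's actual merging tool is not public, so treat this as an assumption-laden illustration of the idea only: on any conflict, the higher-priority (operations) template destructively overrides the application's value.

```python
# Minimal sketch of a destructive (priority) merge of CloudFormation-style
# templates represented as nested dicts. Illustrative only.

def merge_templates(low_priority, high_priority):
    """Deep-merge two template dicts; high_priority wins on any conflict."""
    merged = dict(low_priority)
    for key, value in high_priority.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_templates(merged[key], value)
        else:
            merged[key] = value  # destructive: override the application's value
    return merged

# Hypothetical example: the application asks for a single availability zone,
# but the operations template forces multiple AZs for availability.
app = {"Resources": {"Asg": {"Properties":
        {"AvailabilityZones": ["ap-southeast-2a"]}}}}
ops = {"Resources": {"Asg": {"Properties":
        {"AvailabilityZones": ["ap-southeast-2a", "ap-southeast-2b"]}}}}
final = merge_templates(app, ops)
```

Because the merge is biased towards the operations input, the applications group can configure anything operations has not claimed, but can never weaken an enforced standard.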

Figure 7: The CloudFormation Merging Process

3.5 Phases of the standardized application lifecycle

We repeat Figure 5 to provide context for the discussion of the phases of the lifecycle.

3.5.1 Baking


The baking stage creates a self-contained image (AMI) of the application on a disposable server. The server is “baked” into an AWS AMI, which can then be sent to any global region and passed to AWS autoscaling to deploy. As discussed in Chapter: Build and Test, there are varying levels of baking that can be implemented. The CDP creates heavily baked images that are passed to AWS autoscaling for deployment without the involvement of any other processes or systems. The AMI is immutable and directly attached to a commit in the Git repository, leading to a very high degree of visibility and confidence as to the contents of the AMI. Git now forms the single point of truth not just for the source code, but for all known running copies of the application, as each is simply a bake of that source code at a given commit (point in time).

While this leads to organizations managing larger numbers of AMIs, it has a number of benefits, including simplification of the deployment and ongoing operational support of the application, reduced boot times on scale events, reduced failures during the boot process and increased consistency. The image requires no further bootstrapping.

The baking process uses CloudFormation to launch a vanilla or predefined enterprise standard operating environment (SOE) AMI, such as Red Hat Enterprise Linux 7 or Microsoft Windows Server 2012. The instance independently bootstraps the application using the AWS CloudFormation bootstrapping system cfn-init. The installation of packages and files and the management of services are handled via the cfn-init agent. cfn-init is declarative, so it either completes all items successfully or it signals a failure. This provides a high level of assurance that the configuration is correct and the application correctly installed. If this signal is not received after a specified timeout, then the instance is destroyed and the build is failed. If success is signaled, the AMI API is called via the Tasks for AWS plugin to create the new “baked” AMI from the running host, and the host is then destroyed as it is no longer needed.
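A builder instance resource of this kind might look roughly like the fragment below. This is a hedged sketch in the style of the release template in Figure 10, not Sourced Group's actual template: the SoeAmiId parameter, instance type and the httpd package are assumptions made for illustration. The real pattern it shows is standard CloudFormation bootstrapping: cfn-init applies the declarative metadata, cfn-signal reports the result, and the CreationPolicy fails the stack if no success signal arrives within the timeout.

```json
"BuilderInstance" : {
  "Type" : "AWS::EC2::Instance",
  "Metadata" : {
    "AWS::CloudFormation::Init" : {
      "config" : {
        "packages" : { "yum" : { "httpd" : [] } },
        "services" : { "sysvinit" : { "httpd" :
          { "enabled" : "true", "ensureRunning" : "true" } } }
      }
    }
  },
  "Properties" : {
    "ImageId" : { "Ref" : "SoeAmiId" },
    "InstanceType" : "t2.medium",
    "UserData" : { "Fn::Base64" : { "Fn::Join" : [ "", [
      "#!/bin/bash\n",
      "/opt/aws/bin/cfn-init -s ", { "Ref" : "AWS::StackName" },
      " -r BuilderInstance --region ", { "Ref" : "AWS::Region" }, "\n",
      "/opt/aws/bin/cfn-signal -e $? --stack ", { "Ref" : "AWS::StackName" },
      " --resource BuilderInstance --region ", { "Ref" : "AWS::Region" }, "\n"
    ] ] } }
  },
  "CreationPolicy" : {
    "ResourceSignal" : { "Timeout" : "PT15M" }
  }
}
```

Once this resource reports success, the Tasks for AWS plugin can snapshot the instance into the baked AMI and the builder can be discarded.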

Each tier of the application is a separate image. If a business application consists of three tiers, a web tier, an application tier and a database tier, Sourced Group pre-bakes three distinct AMIs, one for each tier.


Figure 8: The Continuous Delivery Pipeline (CDP) Baking Process

Figure 8 illustrates each of the steps in the baking process. These are:

1. Create an instance that will be the base for the baked image. This is the builder instance.

2. Load the necessary software into that instance.

3. Save a copy of the image.

4. Tag the image with appropriate identifying information.

5. Destroy and clean up the builder instance.

3.5.2 Deploy

The deployment phase of the lifecycle has become very simple due to the use of baking in the previous stage. The baked AMI is simply supplied to AWS auto-scaling, which then launches the requested minimum number of instances. There is no node registration or configuration required, as that was done during the bake stage. The scaled instances are provided with an IAM role to allow secure access to other AWS services such as S3. Because the role is part of the independent stack and defined in the application repository, any changes to the role are tracked and audited via the source control and deployment systems. This creates secure IAM credentials without having to pass plain text strings or configuration files to the instance. The IAM profile is also directly associated with that stack, so if a breach were to occur the stack can simply be rolled to refresh the entire IAM environment.

3.5.3 Release

One of the most vital parts of the CDP is the release stage. Unlike the previous stages, the release stage impacts current production systems and as such is always a manually triggered stage. Although the process itself is automated, the CDP will pause at this step and wait for a gatekeeper to manually move to the next step in the pipeline. The use of a manual trigger at this point also allows indefinite time for further testing and validation prior to release.

Figure 9 shows the adjustment of the local DNS to move a set of AMIs from one state to another. Note that the AWS DNS service is called Route 53. At this point in the process we now have two application stacks: the current production stack, which we refer to as the “Green” or active stack, and the newly built stack, the “Red” or inactive stack. During the release stage our intention is to roll the traffic from the current green stack to the red stack.

Figure 9. Application release diagram

Application release is achieved in an automated fashion, using features of Route 53 and CloudFormation. Unlike previous stages, where a new CloudFormation stack was created, during release an existing CloudFormation stack is updated. The Bamboo AWS plugin allows us to achieve this through the “update stack if already exists” feature. With this feature enabled, Bamboo will attempt to create a CloudFormation stack; if a stack is found with an identical name, then Bamboo will update it with the new JSON. In previous stages of the CDP, the build number is used in the CloudFormation stack names to create unique resources per build. In the release case the build number is omitted. This allows an environment to be controlled by a single stack that controls the DNS record, a stack which is always updated and never re-created per build. With the DNS entry controlled via CloudFormation, a release becomes simply a CloudFormation stack update. Figure 10 shows a sample CloudFormation release template.

"Resources" : {
  "Route53DNSRecord" : {
    "Type" : "AWS::Route53::RecordSet",
    "Properties" : {
      "HostedZoneName" : "mydomain.com.",
      "Comment" : "Application DNS Record",
      "Name" : "myapplication.mydomain.com.",
      "Type" : "CNAME",
      "TTL" : "10",
      "ResourceRecords" : [ "myapplication-build-4.mydomain.com" ]
    }
  }
}

Figure 10. Example release CloudFormation template

By its very nature, the release stage is the riskiest stage of the CDP, as it has a direct impact on the production application. We use a number of techniques to reduce this risk, the most valuable being traffic matching, or pre-warming.
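The "update stack if already exists" behaviour can be sketched as follows. This is not the plugin's actual implementation: the client is injected so the logic can be shown without AWS credentials, and StackNotFound stands in for the real client's "stack does not exist" error. All names here are illustrative.

```python
# Sketch of update-or-create behaviour for the release stack. Illustrative only.

class StackNotFound(Exception):
    """Stand-in for the error a CloudFormation client raises for a missing stack."""

def create_or_update_stack(cf, name, template_body):
    """Update `name` in place if it exists (the release case), else create it."""
    try:
        cf.update_stack(StackName=name, TemplateBody=template_body)
        return "updated"
    except StackNotFound:
        cf.create_stack(StackName=name, TemplateBody=template_body)
        return "created"

# A fake client is enough to demonstrate the behaviour without AWS access.
class FakeCloudFormation:
    def __init__(self, existing):
        self.stacks = set(existing)
    def update_stack(self, StackName, TemplateBody):
        if StackName not in self.stacks:
            raise StackNotFound(StackName)
    def create_stack(self, StackName, TemplateBody):
        self.stacks.add(StackName)
```

Because the release stack's name omits the build number, every release hits the "updated" path, so the DNS record is always modified in place rather than re-created.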

Traffic Matching

Once an application stack is confirmed ready for production release, a number of steps still need to be taken to ensure a smooth transition between stacks. When moving traffic from the current production stack to the release candidate stack, we are moving traffic from a busy, warm environment to a cold, unscaled environment. If not managed correctly, this can have a detrimental impact on performance and may result in an application outage. Therefore, it is important that we pre-warm (scale up) the new environment prior to rolling live traffic.

By utilizing AWS APIs we can programmatically check the number of healthy instances currently associated with the production load balancer(s). Obtaining this information allows us to scale the new environment to match the number of instances currently required in production. This is achieved by adjusting the desired instance property on the relevant EC2 Auto Scaling group. It is equally important to ensure the newly added instances move to an “InService” state on the load balancer prior to rolling the traffic. Depending on the EC2 Auto Scaling group settings, the new environment may begin to scale back down due to low traffic, so it is imperative that this task be undertaken immediately prior to rolling the traffic.

If the application being deployed utilizes an instance-based file or memory cache system, it can be advantageous to pre-warm the cache prior to rolling traffic. This will ensure the first users of the application are not hindered by a cache creation process.
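The capacity-matching step can be sketched like this. The method names mirror the boto-style AWS APIs for classic load balancers and Auto Scaling, but the clients are injected so the logic can be shown without AWS access; treat the exact request and response shapes as assumptions of this sketch.

```python
# Sketch of the pre-warming step: match the release candidate's capacity to the
# current production stack before rolling DNS. Illustrative only.

def prewarm(elb, autoscaling, prod_load_balancer, candidate_group):
    """Scale the release candidate to production's healthy instance count."""
    health = elb.describe_instance_health(LoadBalancerName=prod_load_balancer)
    in_service = [i for i in health["InstanceStates"]
                  if i["State"] == "InService"]
    desired = len(in_service)
    # Set the candidate Auto Scaling group's desired capacity to match
    # production; do this immediately before the DNS roll, or low traffic
    # may let the group scale back down again.
    autoscaling.set_desired_capacity(AutoScalingGroupName=candidate_group,
                                     DesiredCapacity=desired)
    return desired
```

After this call the CDP would still wait for the new instances to reach "InService" on the candidate load balancer before moving the DNS record.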

Rollback

Utilizing Route 53 and CloudFormation for releases simplifies the rollback process. After releasing a new stack into production, we keep the previous stack up and running until all post-release testing is complete. During this time, if an issue is detected with the new release, rolling back to the old environment is a simple and automated process. Rollback can be manually initiated via Bamboo; in this case, Bamboo will make another stack update request to CloudFormation to update the DNS record. The stack update will point the DNS record back to the previous build. During this process, it is again essential that traffic/load matching takes place. The previous production stack will likely have scaled down, as it was not actively receiving load. The CDP needs to re-warm the previous production stack and roll traffic back. Essentially, this is just a reversal of the release process.

3.5.4 Teardown

One of the main goals of the CDP is to simplify the creation of new application stacks. This simplicity often results in the rapid consumption of AWS resources, with new stacks now being created at the click of a button. To ensure AWS costs are kept to acceptable levels, it is essential that application stacks be torn down when no longer required. In most enterprises this is achieved through a combination of team responsibility and a set of automatically enforced rules. Development teams become responsible for ensuring their environments are torn down and cleaned up after use.

Teardown is the final stage of the CDP and is again manually triggered. Bamboo provides a visual indication of active environments, as they will still appear to be in progress, having not completed their final stage. Once triggered, the teardown stage is responsible for removing any non-persisted resources. In most cases this is achieved via a delete stack request to CloudFormation. A set of automated tasks can help to ensure unused stacks are removed:

· Shutdown of all non-production environments during non-working hours.

· Compulsory teardown of all non-production stacks over weekend and holiday periods.

To avoid tearing down a stack that is currently in use, the CDP is configured to verify that the stack is not receiving traffic prior to teardown. CloudFormation stack tagging is also essential to this process, to ensure only non-production stacks are affected. If the application stack includes an S3 bucket, the CDP must first remove all objects from the bucket prior to teardown. CloudFormation will only permit the teardown of empty S3 buckets, to ensure that data is not inadvertently destroyed.
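The teardown safety checks can be sketched as a single guard function. The helper callables (request_count, empty_bucket, delete_stack) and the tag names are illustrative stand-ins for the CloudWatch, S3 and CloudFormation calls the real CDP would make.

```python
# Sketch of the teardown safety checks: refuse to delete a production-tagged
# stack or one still receiving traffic, and empty any S3 buckets first,
# since CloudFormation will only delete empty buckets. Illustrative only.

def safe_teardown(stack, request_count, empty_bucket, delete_stack):
    """Tear down `stack` only if it is non-production and no longer serving traffic."""
    if stack["tags"].get("environment") == "production":
        return "refused: production stack"          # tag check guards production
    if request_count(stack["name"]) > 0:
        return "refused: still receiving traffic"   # traffic check guards in-use stacks
    for bucket in stack.get("buckets", []):
        empty_bucket(bucket)  # CloudFormation only permits deleting empty buckets
    delete_stack(stack["name"])
    return "deleted"
```

Running this as a scheduled job over all non-production stacks would implement the out-of-hours and holiday cleanup rules listed above.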

3.6 Managing State

The final component of the CDP is a meta-application storage tier. It provides a highly available, consistent and dependable storage service to store resource state, for example whether an Amazon Elastic Compute Cloud (EC2) instance contains state, as would be the case if a resource is a database server. One particularly important use case of the data store is during the deployment update stage, during which stateless components such as web and application servers should be updated, but not database servers. Sourced Group uses Amazon’s DynamoDB (http://aws.amazon.com/dynamodb/) to manage resource state. It provides a fully managed NoSQL data store that is highly available, distributed and secure, with consistent, low-latency performance.
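The state lookup during a deployment update might be sketched as below. The table layout (keyed by application and tier, with a boolean "stateful" attribute) is purely an assumption for illustration; the chapter does not describe the actual DynamoDB schema.

```python
# Sketch of the meta-application state lookup: before updating a stack, consult
# a DynamoDB-style table to decide which tiers are stateless and safe to replace.
# Schema and attribute names are assumptions made for illustration.

def tiers_to_update(table, app, tiers):
    """Return the tiers of `app` recorded as stateless (safe to replace)."""
    updatable = []
    for tier in tiers:
        item = table.get_item(Key={"app": app, "tier": tier}).get("Item", {})
        if not item.get("stateful", False):
            updatable.append(tier)  # e.g. web and app servers, but not databases
    return updatable

# A dict-backed stand-in for the DynamoDB table is enough to show the idea.
class FakeTable:
    def __init__(self, items):
        self.items = items
    def get_item(self, Key):
        item = self.items.get((Key["app"], Key["tier"]))
        return {"Item": item} if item else {}
```

During an update, the deployment stage would replace only the tiers this function returns, leaving the database tier untouched.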

4. Baking Security into the Foundations of the CD Pipeline

Security is addressed at a foundational level within the CDP, both in terms of the operation of the pipeline itself and in terms of the resources associated with the applications it manages. Chapter: Security and Security Audits discusses security in some detail. In this section our focus is the separation of duties with Amazon CloudFormation, and authentication and authorization using Amazon Identity and Access Management (IAM).


4.1 Implementing Separation of Duties with Amazon CloudFormation As we previously mentioned when discussing CloudFormation, enforcing separation of duties between network transport, network security, operations and applications groups is achieved using a combination of Amazon CloudFormation templates and a destructive (overriding) merging process. This is even more important than providing an efficient baking or deployment process. The network transport group and network security groups are responsible for inter datacenter connectivity and perimeter networking and security respectively. They maintain Amazon CloudFormation templates that define the environment in which the CDP manages resources. For example, the network transport group’s Amazon CloudFormation templates relate to Amazon Virtual Private Clouds (VPCs), peering connections and routing tables. In contrast, the network security groups are concerned with VPC subnets, security groups and Access Control Lists (ACLs). Operations maintain Amazon CloudFormation templates that effectively implement best practices around host and resource logging, effective use of Amazon availability zones, resource tagging, etc. Similarly, each application group develops and maintains an application specific Amazon CloudFormation template. These templates focus on application health checks, auto-scale rules, trigger and thresholds. When deploying a particular application, begin with the application template and overlay the operational template, which effectively masks or overrides any unprivileged configuration settings specified by the applications group. The merge is destructive with a bias towards operations, thereby enforcing best practices and standards. 
For example, if an application owner specifies that all application components should reside in a single availability zone, perhaps to minimize latency, the operational Amazon CloudFormation template will override that configuration and place resources in multiple availability zones to increase system availability. Each Amazon CloudFormation template is maintained in Atlassian Stash, which allows control over who can merge changes into the main production branch and provides detailed logs for auditing purposes.
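The destructive merge described above can be illustrated with a small sketch. This is not Sourced Group's actual implementation; it is a minimal, hypothetical example that assumes templates are represented as nested dictionaries and that any key present in the operations template overrides the application template.

```python
def destructive_merge(application: dict, operations: dict) -> dict:
    """Merge two CloudFormation-style template fragments.

    Keys present in the operations template override the application
    template (a destructive merge with a bias towards operations),
    recursing into nested mappings so unrelated application settings
    survive the overlay.
    """
    merged = dict(application)
    for key, ops_value in operations.items():
        app_value = merged.get(key)
        if isinstance(app_value, dict) and isinstance(ops_value, dict):
            merged[key] = destructive_merge(app_value, ops_value)
        else:
            merged[key] = ops_value  # operations wins outright
    return merged


# The application group asks for a single availability zone ...
app_template = {"AutoScalingGroup": {"AvailabilityZones": ["ap-southeast-2a"],
                                     "MinSize": 2}}
# ... but the operations overlay enforces multi-AZ placement.
ops_template = {"AutoScalingGroup": {"AvailabilityZones": ["ap-southeast-2a",
                                                           "ap-southeast-2b"]}}
result = destructive_merge(app_template, ops_template)
```

After the merge, `result` keeps the application's `MinSize` but carries the multi-AZ placement enforced by operations.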

4.2 Identity and Access Management

There are a number of areas where identity and access management needs to be considered within a CDP. The first is host system administration. When hosts are managed outside of a CDP, it is common and generally necessary for administrators to be able to log in or issue remote commands. For example, if an application server stops providing a service, an administrator may log onto the host, investigate the problem and potentially restart the failed service. When you can deploy an environment efficiently and reliably, logging onto hosts becomes unnecessary and can actually lead to increased resolution times.

Removing administrator access reduces the overall attack surface and removes the need to manage Secure Shell (SSH) public and private keys. Amazon IAM roles, profiles and policies are used extensively to restrict the access and permissions of EC2 instances, users and AWS services such as the Amazon Simple Storage Service (S3). For a detailed explanation of each of these services, consult http://aws.amazon.com/security/, the relevant product documentation and the operational and security checklists located at http://aws.amazon.com/whitepapers/. Where possible, EC2 instances that require privileged access to AWS resources, such as objects located in an S3 bucket or Amazon Simple Queue Service (SQS) queues, have an IAM role associated with them. This allows access permissions to be managed centrally and revoked instantly if required. IAM roles are the preferred means of restricting access to sensitive AWS resources. For applications that do not support IAM roles, IAM credentials (access key and secret key) are embedded into the host and the keys are rotated weekly; this is an automatic process that is monitored and tracked. Furthermore, IAM credentials are rotated upon deployment of each stack. This restricts the impact of a security breach to a single stack and prevents the same credentials from being used to gain access to resources belonging to another application or environment.
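The key-rotation step might look something like the following sketch. The helper name and the client wiring are assumptions, not Sourced Group's actual tooling; the client is expected to expose `list_access_keys`, `create_access_key` and `delete_access_key` in the style of boto3's IAM client. Note that AWS limits an IAM user to two access keys, so this create-before-delete ordering assumes the user currently holds a single active key.

```python
def rotate_access_keys(iam_client, user_name: str) -> str:
    """Rotate embedded IAM credentials for an application user.

    A fresh access key is created first and the old keys are deleted
    afterwards, so the application is never left without a valid
    credential mid-rotation. Returns the new access key ID.
    """
    # Record the key IDs that exist before rotation.
    old = [k["AccessKeyId"]
           for k in iam_client.list_access_keys(
               UserName=user_name)["AccessKeyMetadata"]]
    # Create the replacement key, then retire the old ones.
    new_key = iam_client.create_access_key(UserName=user_name)["AccessKey"]
    for key_id in old:
        iam_client.delete_access_key(UserName=user_name, AccessKeyId=key_id)
    return new_key["AccessKeyId"]
```

In the per-stack variant described above, the same routine would run as part of each stack deployment so that no two stacks ever share credentials.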

5. More Sophisticated Concepts

In this chapter we have sketched the design of Sourced Group's CDP. We now mention several items that are beyond the basic design but still must be considered when installing a CDP.

5.1 Master Pipeline

Construction of a generic pipeline that handles common types of resource configuration provides a basis for development teams utilizing the CDP. In practice, however, a development team will clone the generic pipeline and then modify it for their specific application requirements. This introduces the problem of propagating changes in the generic pipeline to all of its clones. Sourced Group addresses this with a Continuous Delivery Hierarchy: the heavy lifting is done by generic components, so an application requires only a trivial amount of code to get started.
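One way to picture such a hierarchy is with inheritance rather than cloning: a sketch under the assumption that pipeline stages can be modeled as overridable hooks (the class and stage names below are illustrative, not Sourced Group's actual code). Fixes to the generic stages then propagate to every application pipeline automatically.

```python
class GenericPipeline:
    """Generic pipeline doing the heavy lifting for common resources.

    Application teams subclass it and override only the hooks they
    need, instead of cloning the whole pipeline, so improvements to
    the generic stages reach all applications without manual merges.
    """

    def run(self, stack_name):
        # The phase order mirrors the lifecycle used elsewhere in the
        # chapter: test, bake, deploy, release.
        return [self.test(stack_name), self.bake(stack_name),
                self.deploy(stack_name), self.release(stack_name)]

    def test(self, stack):
        return f"lint and unit tests for {stack}"

    def bake(self, stack):
        return f"bake AMI for {stack}"

    def deploy(self, stack):
        return f"deploy stack {stack}"

    def release(self, stack):
        return f"cut over traffic to {stack}"


class ReportingAppPipeline(GenericPipeline):
    """Only the application-specific bits are overridden."""

    def test(self, stack):
        return super().test(stack) + " + report-rendering smoke test"


steps = ReportingAppPipeline().run("reporting-v2")
```

The subclass supplies a trivial amount of code, consistent with the "generic components do the heavy lifting" approach described above.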

5.2 Minimising Drift Between Production and Non-Production Environments

A common issue in the enterprise is long-running environments that continually drift away from production. Sourced Group solves this problem by refreshing all aspects of the non-production environment against production snapshots for each and every change.

Non-production environments utilize nightly snapshots of the persistent data stores in production. Resources are created on demand from these snapshots, which allows development, testing and integration against persistent data stores that are only a few hours behind production.
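Selecting which snapshot to restore from is a small but important detail: a refresh should fail loudly rather than silently restore stale data. The following is a minimal, hypothetical sketch of that selection step (the function name, data shape and freshness window are assumptions for illustration).

```python
from datetime import datetime, timedelta


def latest_usable_snapshot(snapshots, now, max_age_hours=30):
    """Pick the most recent production snapshot for a refresh.

    `snapshots` is a list of (snapshot_id, created_at) pairs. The
    newest snapshot within `max_age_hours` is chosen, so refreshed
    environments stay only a few hours behind production; if nothing
    is fresh enough, the refresh is aborted rather than restoring
    stale data.
    """
    cutoff = now - timedelta(hours=max_age_hours)
    fresh = [(sid, ts) for sid, ts in snapshots if ts >= cutoff]
    if not fresh:
        raise RuntimeError("no sufficiently recent production snapshot")
    # Newest creation timestamp wins.
    return max(fresh, key=lambda pair: pair[1])[0]
```

In the refresh flow described above, this would run once per change, feeding the chosen snapshot ID into the on-demand resource creation.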

5.3 Working Around Provider Limitations

Some common cloud platforms impose hard limits on the number of resources that you may consume. For instance, AWS imposes a hard limit on the number of security groups per VPC, currently 200. This limit impacts the ability to scale the CD techniques to the broader enterprise. In response to this challenge, one client built an "auto-scaled VPC model": VPCs are dynamically created and deregistered in response to the exhaustion or availability of security groups per VPC. The end result is an "auto-scaled" number of application delivery VPCs that scales in and out according to security group requirements (or whatever resource is the lowest common denominator). The VPCs are destroyed for cost minimization when they are no longer required. The high-level process flow is:

1. Receive a stack build command from the user.
2. Calculate how many security groups the stack requires.
3. Check the first application delivery VPC for the availability of the requisite number of security groups.
   a. If it passes, provision the application.
   b. If it fails, check the next VPC (FILO model).
   c. If all current VPCs are exhausted, call the VPC build job to build a new application delivery VPC. Required details are retrieved from the Configuration Management Database (CMDB).
4. Deploy the application stack. Register the stack into the CMDB against the requisite VPC ID.
5. A janitor process continually polls the running VPCs for active applications. If no running applications exist, terminate the VPC container and relevant components.
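The placement logic in steps 1-3 can be sketched as follows. This is an illustrative model, not the client's actual implementation: VPCs are represented as dictionaries of in-use counts, and the VPC build job and CMDB lookup are reduced to a callback.

```python
def place_stack(vpcs, required_sgs, build_vpc, sg_limit=200):
    """Choose an application delivery VPC with enough free security groups.

    `vpcs` is an ordered list of dicts like {"id": ..., "sg_in_use": ...},
    checked in turn; `sg_limit` models the AWS hard limit of 200 security
    groups per VPC. If every VPC is exhausted, `build_vpc` is called to
    provision a fresh application delivery VPC (in practice, details
    would come from the CMDB). Returns the ID of the chosen VPC.
    """
    for vpc in vpcs:
        if vpc["sg_in_use"] + required_sgs <= sg_limit:
            vpc["sg_in_use"] += required_sgs
            return vpc["id"]
    # All current VPCs exhausted: trigger the VPC build job.
    new_vpc = build_vpc()
    new_vpc["sg_in_use"] = required_sgs
    vpcs.append(new_vpc)
    return new_vpc["id"]


# A nearly-full VPC forces a new one to be built for a 5-group stack.
fleet = [{"id": "vpc-1", "sg_in_use": 199}]
first = place_stack(fleet, 5, build_vpc=lambda: {"id": "vpc-2"})
# A later 1-group stack still fits in the original VPC.
second = place_stack(fleet, 1, build_vpc=lambda: {"id": "vpc-3"})
```

The janitor in step 5 would be the inverse: a periodic job that removes entries from `fleet` (and tears down the real VPC) once their application count reaches zero.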

This process depends on the ability to dynamically provision IP address space and on features such as VPC peering. In an interconnected enterprise environment it is essential that backhaul data center connections can be managed exclusively via an API, with solutions such as AWS Direct Connect.

5.4 Vendor Lock-In

Sourced Group's CDP is heavily tied to AWS CloudFormation and therefore brings migration challenges to other platforms, e.g. VMware, OpenStack or Cloud Foundry, due to the lack of standards at that layer between platforms. In general, migrating from one cloud provider to another is complicated not only by inconsistent APIs but also by the use of different techniques to address the same problem. For example, AWS uses security groups whereas VMware uses roles and permissions.

6. Summary

Sourced Group has been assisting enterprises in installing a Continuous Delivery Pipeline for several years. Their pipeline is built around a five-phase view of the lifecycle: test, bake, deploy, release and teardown. Each of these phases relies on a set of tools that automate the process. Security is a major concern within the enterprise, and integrating the security operations group into the adoption process involves first having technical solutions that they recognize will help them, such as CloudFormation templates, and second involving them in the rollout process.