Monday, March 6, 2017

Blue-green deployment on AWS

1 Introduction

The purpose of this document is to present a blue-green upgrade approach for a typical 3-tier application.

1.1 The Challenge

The traditional method of upgrading applications has a few downsides:
  • If the upgrade fails after the smoke test, we must revert the server to the previous version of the application.
  • Customers cannot access the environment during the smoke test.
  • If the upgrade and the testing are complicated, they might take a long time to finish and exceed the SLA with our customers.

1.2 The blue-green upgrades

In order to minimize the deployment risk, time, cost and the stress on our system administrators, we could use the Blue/Green deployment method.

The following diagram shows the basic anatomy of a blue/green deployment architecture. There are two environments: the blue one is currently used in production, and the green one is being upgraded. Our goal is to transition users over to the green one in a gradual, controlled fashion.
Once we bring the green environment up, we can validate the new software (smoke test) before going live. Then, we start shifting traffic away from the blue environment and send it to the green one. We can use weighted DNS resolution to push traffic gradually to the green environment or revert traffic back to the blue environment in case of issues.
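
To make the weighted DNS idea concrete, here is a minimal sketch that builds the change batch for a pair of weighted Route 53 records. The record name, the load balancer host names, and the function name are all hypothetical, and the sketch only builds the payload; it does not call AWS.

```python
def weighted_change_batch(record_name, blue_target, green_target, green_percent, ttl=60):
    """Build a Route 53 ChangeBatch that splits traffic between blue and green.

    All names passed in are placeholders for illustration. Route 53 sends each
    record a share of traffic equal to its weight divided by the sum of weights,
    so weights that add up to 100 can be read as percentages.
    """
    if not 0 <= green_percent <= 100:
        raise ValueError("green_percent must be between 0 and 100")

    def upsert(set_identifier, target, weight):
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": set_identifier,  # distinguishes the weighted records
                "Weight": weight,
                "TTL": ttl,
                "ResourceRecords": [{"Value": target}],
            },
        }

    return {
        "Comment": f"blue/green shift: {green_percent}% to green",
        "Changes": [
            upsert("blue", blue_target, 100 - green_percent),
            upsert("green", green_target, green_percent),
        ],
    }


# Example: send 10% of the traffic to green, keep 90% on blue.
batch = weighted_change_batch(
    "app.example.com.", "blue-lb.example.com", "green-lb.example.com", green_percent=10
)
```

The resulting dictionary has the shape expected by the `ChangeBatch` argument of boto3's `change_resource_record_sets` call. Reverting to blue is the same call with `green_percent=0`.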

2 The Connect Application

2.1 High level view

The following diagram illustrates the different layers of a typical 3-tier application.


There are four layers:
  • Presentation Layer: a standard web browser.
  • Web Layer:
    • A Spring Boot web server that manages the creation of the web interface.
    • ElastiCache, used to cache user session information.
  • API/business Layer:
    • A Spring Boot application server that provides the business logic and workflows.
  • Data-persistence Layer:
    • A NoSQL DB (e.g. MongoDB Atlas) that provides the logic needed to store and query data.

2.2 The physical architecture

For the production implementation, we have to add high availability and support services to the application. The web and app tiers must each have at least one server in each of two availability zones, as shown in the following diagram. The yellow boxes are the servers running the software we need to upgrade regularly.

3 The Blue-green upgrade approach

3.1 Try to decouple the schema changes and the code changes if possible

When we perform the blue/green upgrade, we plan to have both environments connected to the data source (MongoDB) simultaneously. This greatly simplifies the deployment process.

In case we need to account for schema changes, which should be rare when using MongoDB, we need to decouple the schema changes from the code changes.

For example:
  • Database updates are backward compatible
  • Code changes are backward compatible with the old schema.
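
As an illustration of both points, the sketch below assumes a purely hypothetical schema change in which a single `full_name` field is split into `first_name` and `last_name`. The writer keeps populating the old field so the old code can still read new documents, and the reader accepts either document shape.

```python
def to_document(first_name, last_name):
    """Write the new fields while still populating the old one (hypothetical schema)."""
    return {
        "first_name": first_name,
        "last_name": last_name,
        # Kept so that the old (blue) code can still read this document.
        "full_name": f"{first_name} {last_name}".strip(),
    }


def display_name(doc):
    """Read a name from either the old or the new document shape."""
    if "first_name" in doc or "last_name" in doc:
        parts = (doc.get("first_name", ""), doc.get("last_name", ""))
        return " ".join(p for p in parts if p)
    # Old documents only carry the combined field.
    return doc.get("full_name", "")


old_doc = {"full_name": "Ada Lovelace"}      # written before the upgrade
new_doc = to_document("Grace", "Hopper")     # written by the new code
names = [display_name(old_doc), display_name(new_doc)]
```

Once the blue environment is retired and no reader depends on `full_name`, the old field can be dropped in a later release.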

3.2 Plan for exceptions

If, for a given release, we cannot decouple the schema changes from the code changes, we need to fall back to the traditional method and schedule downtime to deploy the update.

We will use the same deployment process, except that we need to block user access during the upgrade.

3.3 The deployment architecture

The following diagram illustrates the infrastructure for the blue/green upgrade of the Connect project. In this setup, we will create scripts on the Rundeck server to orchestrate the upgrade.

3.4 Process flow during upgrade

Detailed steps for blue/green upgrades:
  1. On the initial deployment, we will create two identical environments for the application. We will tag each component to indicate whether it belongs to the blue or the green environment.
  2. One of the two environments will be shut down when it is not needed.
  3. During the deployment:
    1. The deployment team will kick off a Rundeck job to:
      1. Bring up the green environment without routing traffic to it.
      2. Deploy the new app (jar file) to the web and app EC2 instances.
      3. Inform QA to perform the smoke test.
    2. The QA team will perform the smoke test against the internal load balancer endpoint of the green environment.
    3. When the smoke test has completed successfully, the support team will kick off a second Rundeck job to:
      1. Switch the DNS over to point to the new green environment, or use the weighted DNS routing capabilities of Amazon Route 53 to gradually shift a set percentage of the traffic to the green environment.
      2. Swap the tags: the green environment becomes blue, and the blue environment becomes green.
      3. Shut down the green environment (the old blue environment).
    4. If anything goes wrong, we can use a Rundeck job to switch the Route 53 record back to the old environment.
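
The tag swap and shutdown in the second Rundeck job can be modelled in a few lines. This is only a sketch of the bookkeeping, with hypothetical names (`Environment`, `promote`, `stack-a`, `stack-b`); it does not touch AWS or Rundeck.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    stack: str      # fixed identifier of the stack (hypothetical name)
    role: str       # "blue" = serving production, "green" = being upgraded
    running: bool


def promote(live, candidate, smoke_test_passed):
    """Swap the blue/green tags after a successful smoke test.

    Returns the environment that serves production afterwards. If the smoke
    test failed, nothing changes and the current live environment keeps serving.
    """
    if live.role != "blue" or candidate.role != "green":
        raise ValueError("expected live to be blue and candidate to be green")
    if not smoke_test_passed:
        return live
    # The upgraded environment becomes blue; the old one becomes green.
    live.role, candidate.role = "green", "blue"
    # The old environment is shut down until the next upgrade.
    live.running = False
    return candidate


stack_a = Environment("stack-a", role="blue", running=True)
stack_b = Environment("stack-b", role="green", running=True)
serving = promote(stack_a, stack_b, smoke_test_passed=True)
```

After the call, `stack-b` carries the blue tag and serves production, while `stack-a` carries the green tag and is stopped, ready to receive the next release.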

4 Reference

4.1 How fast do we want to transition the traffic?

That depends on a few factors:
  • The Time to Live (TTL) of your DNS records: you want to give the DNS system enough time to propagate the new changes, and also give your users enough time for their cached DNS values to expire. Badly behaved clients may ignore your TTL altogether and continue sending traffic to the old environment for much longer.
  • If you use Auto Scaling, you should give your green environment enough time to scale out to accommodate the increasing traffic.
  • Similarly, if you use Elastic Load Balancing (ELB), and you route a very large volume of traffic to a new load balancer in a short space of time, it will not give the new load balancer enough time to scale out. Pre-warming your load balancer ensures that it is sized to handle the amount of traffic that you are expecting to receive, rather than the amount of traffic it is currently receiving. You can contact AWS using the support options available in your Management Console to request pre-warming.
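
One way to turn these factors into a plan is to hold each traffic step for a multiple of the DNS TTL before increasing the green share again. The sketch below uses assumed values throughout: the step percentages and the multiplier of 2 are illustrative, not recommendations, and should be tuned to how quickly Auto Scaling and the load balancer can absorb the extra traffic.

```python
def ramp_schedule(ttl_seconds, green_percents=(10, 25, 50, 100), ttl_multiplier=2):
    """Return (green_percent, hold_seconds) pairs for a gradual traffic shift.

    Each step is held for ttl_multiplier * ttl_seconds so that cached DNS
    answers have time to expire before the next increase. The default values
    are assumptions for illustration only.
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    if list(green_percents) != sorted(green_percents) or green_percents[-1] != 100:
        raise ValueError("steps must be increasing and end at 100")
    hold = ttl_seconds * ttl_multiplier
    return [(percent, hold) for percent in green_percents]


schedule = ramp_schedule(ttl_seconds=60)
total_seconds = sum(hold for _, hold in schedule)
```

With a 60-second TTL and the default steps, each step is held for 120 seconds and the whole ramp takes 480 seconds.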
