Deploying code to production automatically, often, and quickly is a principle we follow for our Fenergo SaaS CLM Platform. It brings with it many benefits, but also demands that a certain set of standards be followed to ensure it is done successfully.
Continuously Deploying for Continuous Value
The reason we deploy to our production environments often is to release value (via enhancements and new features) to our customers as soon as it becomes ready. The architectural decisions we made around how our platform is implemented facilitate the release of independent, autonomous enhancements to parts of the application instead of a broad, holistic full release. Larger releases that bundle a high volume of updates require longer delivery cycles. Clients get no value from a feature until it is first available, and then actively being used by them, and long cycles leave no opportunity for the iterative improvements that come from making tighter, smaller, incremental changes.
Once we have working software in production and in the hands of our clients, we have the opportunity to monitor, observe, refine and take feedback. That allows us to improve our features and create a more impactful product for our customers. To get to this position we follow a number of principles in order to achieve successful frequent releases to production:
All of our deployments are fully automated, requiring no manual steps to release any feature to production. We use Pulumi for IaC (Infrastructure as Code) and ensure that everything is maintained in source control so that every deployment is auditable and repeatable. Not only does automation speed up deployments, but it also greatly reduces the risk of human error and produces stable, predictable deployments.
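As an illustrative sketch only (the resource names and tags here are hypothetical, not our actual stack), a Pulumi program in TypeScript declares infrastructure in code, so the definition lives in source control alongside everything else:

```typescript
import * as aws from "@pulumi/aws";

// Hypothetical example: an S3 bucket for build artifacts, declared as code
// so the resource definition is versioned, auditable and repeatable.
const artifactBucket = new aws.s3.Bucket("deploy-artifacts", {
    versioning: { enabled: true }, // keep a history of every deployed artifact
    tags: { environment: "production", managedBy: "pulumi" },
});

export const bucketName = artifactBucket.id;
```

Because the desired state is declared rather than scripted, re-running the same program against the same stack is a no-op, which is what makes automated deployments repeatable.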
To support confidence and ensure the correctness of a release, we employ a suite of automated testing as part of the build process. Along with standard unit and integration testing built into every pipeline, we also ensure that API and UI tests run as part of every deployment, so testing is executed at both the code level and the consumer-experience level. These tests are structured to mirror the acceptance criteria we have written to describe the correct functionality of all our features. A release cannot proceed in the build pipeline unless the automated test gate passes, so new and existing functionality is proven to work as expected.
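As a simplified, hypothetical sketch (not our actual pipeline code), the gating rule that a release cannot proceed unless every automated suite passes can be expressed as:

```typescript
// Hypothetical test-gate check: the release proceeds only if every
// suite (unit, integration, API, UI) reports success.
interface SuiteResult {
    suite: string;
    passed: boolean;
}

function canRelease(results: SuiteResult[]): boolean {
    // An empty result set means the tests never ran, so block the release.
    return results.length > 0 && results.every((r) => r.passed);
}

console.log(canRelease([
    { suite: "unit", passed: true },
    { suite: "integration", passed: true },
    { suite: "api", passed: true },
    { suite: "ui", passed: false },
])); // a single failing suite blocks the whole release
```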
When we release our software, it is deployed into a multi-tenant region, so all our clients are working on the exact same codebase, differentiated only by their own unique configuration.
When new features are released, we can control the availability of that feature per tenant, using a configurable feature flag. Every feature is built behind a flag, and this strategy gives us multiple benefits.
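To illustrate (the flag names and tenant IDs here are hypothetical, not our real configuration), a per-tenant flag check can be as simple as a lookup keyed by flag and tenant:

```typescript
// Hypothetical per-tenant feature-flag store: every tenant runs the same
// codebase; only the flag configuration differs between them.
const flagConfig: Record<string, Set<string>> = {
    "new-search-ui": new Set(["tenant-internal-1", "tenant-beta-7"]),
};

function isEnabled(flag: string, tenantId: string): boolean {
    // Unknown flags default to disabled, so unreleased features stay hidden.
    return flagConfig[flag]?.has(tenantId) ?? false;
}
```

Defaulting unknown flags to disabled is the important design choice: code for an unfinished feature can safely ship to production because no tenant can reach it until the flag is explicitly turned on.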
Decoupling deployment and release
As we are following trunk-based development, feature flags allow us to continuously release the main branch of any microservice to production without the fear of releasing a feature that is not ready to be used. This enables engineers to work on short-lived feature branches, which avoids problematic large merges back into main. The idea is that our main branch can ALWAYS be pushed to production because it is kept in a constantly correct state.
Feature flags enable us to roll out features in an incremental fashion. The process we follow when releasing a feature involves deploying the code to production with the flag initially disabled. We then enable the flag for a small number of internal tenants and users and monitor the usage for a period of time. Once we are happy that the feature is operating as expected, we can then enable the flag for a broader range of non-production tenants. In the event of an issue being identified, it is far less impactful to make minor code adjustments or updates at this point, before finally releasing to all production tenants. Between each step in this rollout we observe, monitor and refine, ensuring that the feature is working as expected and has integrated into our infrastructure as designed.
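The rollout ladder described above can be sketched as an ordered progression (the stage names are illustrative, not our actual terminology):

```typescript
// Hypothetical rollout ladder mirroring the stages described above:
// disabled -> internal tenants -> non-production tenants -> all tenants.
const rolloutStages = ["disabled", "internal", "non-production", "all"] as const;
type Stage = typeof rolloutStages[number];

function nextStage(current: Stage): Stage {
    const i = rolloutStages.indexOf(current);
    // The final stage ("all") is terminal; there is nothing further to enable.
    return i < rolloutStages.length - 1 ? rolloutStages[i + 1] : current;
}
```

Making the progression explicit and one-directional is the point: each promotion happens only after an observation period, and a problem found at any stage affects only the tenants enabled so far.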
Monitoring and Observing
During development, testing, incremental roll out and post full release, we are constantly observing and monitoring our features. As we are building and running on AWS, we utilise features such as X-Ray (https://aws.amazon.com/xray) and CloudWatch (https://aws.amazon.com/cloudwatch) to ensure that we have full visibility of how our components are operating within our infrastructure. Telemetry is a first-class citizen of our engineering culture. We monitor across numerous metrics, including latency, throughput, memory, CPU utilisation and error rates. We then integrate these metrics with alerting through PagerDuty to ensure that we have a full picture of how our services are operating and how our customers are consuming them. We can then use this data to decide where to focus our efforts and identify what areas need to be optimised.
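As an illustrative sketch (the thresholds and metric names are hypothetical, not our production values), the kind of threshold rule that feeds an alert looks like:

```typescript
// Hypothetical alerting rule: fire when the error rate or p99 latency
// for a service breaches its configured threshold.
interface MetricSnapshot {
    errorRate: number;    // fraction of failed requests, 0..1
    p99LatencyMs: number; // 99th-percentile request latency
}

function shouldAlert(
    m: MetricSnapshot,
    maxErrorRate = 0.01,
    maxLatencyMs = 500,
): boolean {
    return m.errorRate > maxErrorRate || m.p99LatencyMs > maxLatencyMs;
}
```

In practice a rule like this would be expressed as a CloudWatch alarm routed to PagerDuty rather than application code, but the logic is the same: pick a small set of meaningful thresholds and page only when one is breached.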
Through all these efforts we have achieved a stable, resilient and dependable CI/CD pipeline which pushes value to our clients at the earliest possible time and dovetails seamlessly into our agile design and build processes. We have achieved an enviable release velocity which allows us to push updates to production hundreds of times a month with minimal impact or risk. Compare that to traditional software release cycles that might update core capability only a handful of times per year, and the benefits to our users are plain to see.