Circuit Breaker Pattern for Serverless Applications
Introduction
When you are developing micro-services, you might find that there are some dependencies which could fail or be temporarily unavailable. Examples of these dependencies are calls to other services, data stores, etc. These dependencies might be unavailable for multiple reasons such as maintenance, throttling, etc.
The idea behind the circuit breaker pattern is you wrap the dependency, so if there is a consistent failure, you can temporarily stop trying to call the dependant service and instead define an alternative fallback function.
Normally a circuit breaker can be in three different states:
- Closed: we call the dependency as usual. If there are failures, we track them and we open the circuit if a defined threshold is reached.
- Open: instead of calling the dependency, we call the fallback function. If we don’t define a fallback function, it will fail early, without calling the dependency. This can prevent overloading an unhealthy dependent service or data store. While the circuit breaker is OPEN, if we reach a timeout, we will change the status to HALF.
- Half (open): if the request is successful, the status will change to CLOSED and it will start processing all incoming requests. However, if the requests are still failing, the Circuit Breaker will change to OPEN.
The following diagram illustrates the different statuses and transitions:
Proposed Architecture for Serverless Applications
The Circuit Breaker pattern is a well know pattern and some popular libraries such as Polly https://github.com/App-vNext/Polly implement this pattern. In the implementation offered by Polly, the status of the circuit breaker is persisted in memory. This works well for single instance services, but if we are running multiple instances of the service concurrently, you need to share the status of the circuit breaker, so all the instances are consistently handling it. This applies for AWS lambda functions or ECS if running multiple nodes in parallel. For lambdas, you also have to accommodate for a scenario where serverless functions can get cold and will erase the memory. For that reason it is important to persist the status of the circuit breaker to a durable store.
Regarding the persistence, in order to have lower latency reading/updating the state, we decided to store the status of the circuit breaker in Amazon MemoryDB, which is based on Redis with persistent storage. Because we store a really small object to make the storage as performant as possible.we decided to use Amazon MemoryDB as it offered higher performance than DynamoDB for storing circuit breaker objects.
Based on those principles, the proposed architecture below illustrates where a lambda might have two circuit breakers to manage two separate dependencies and persist the status in Amazon MemoryDB. We can see a lambda which has two dependencies, Amazon ElasticSearch and an external API, so we can define one circuit breaker for each. In addition, we can also define a fallback function for each dependency, the failure handling process could be different depending on the dependency.
.NET 6 implementation
You can find the full source code of this implementation on my GitHub repository: https://github.com/Fenergo/circuitbreaker-dotnet.
First of all, you need to configure the dependency injection
{ services.Configure<ElastiCacheConfig>(configuration.GetSection("CircuitBreaker")); services.AddSingleton<ICircuitBreakerFactory, CircuitBreakerFactory>(); services.AddSingleton<ICircuitBreakerRepository, CircuitBreakerRepositoryElastiCache>(); }
For the persistence, we have a generic repository defined by the interface ICircuitBreakerRepository. Then we could implement other persistence storages if we want to. For this implementation, as it was discussed above, we are using MemoryDB, so the implementation of the repository is provided by the class CircuitBreakerRepositoryElastiCache.
In addition, we are injecting the class CircuitBreakerFactory, so we just need to have this dependency in the classes where we are using circuit breakers.
Then you can use it in your Web APIs, Lambdas, etc. For example in a Web API controller, we can it as follows:
public class MyController : ControllerBase { private readonly ICircuitBreakerFactory _circuitBreakerFactory; public MyController (ICircuitBreakerFactory circuitBreakerFactory) { _circuitBreakerFactory = circuitBreakerFactory; } public async Task<IActionResult> MyMethod([FromBody] RequestModel request) { var circuitBreakerOptions = new CircuitBreakerOptions<ResultModel>("circuit-breaker-id-123", () => asyncFunction(request)); var circuitBreaker = _circuitBreakerFactory.CreateCircuitBreaker<ResultModel>(circuitBreakerOptions); return await circuitBreaker.Fire(); } }
In the model CircuitBreakerOptions, we can configure the following properties:
- CircuitBreakerId (required): id for the circuit breaker. This value has to be unique, but we could also use the same value if for example we want to use the same circuit breaker in two places. (For example, if EventStore is down, it might affect projections and commands).
- Request (required): function to wrap under the circuit breaker.
- FailureThreshold: number of failure executions that should happen until the circuit is OPEN.
- SuccessThreshold: number of successful executions to change the status of the circuit breaker to CLOSED.
- Timeout: number of milliseconds to try a new request since the circuit breaker changed to OPEN.
- Fallback: alternative function to execute if the main function fails or the circuit breaker is OPEN.
On the other hand, the CircuitBreakerModel that we persist is defined by the following properties:
- Id: unique id to identify the circuit breaker
- Status: current status of the circuit breaker (open, closed or half)
- FailureCount: number of consecutive failure requests
- SuccessCount: number of consecutive successful requests
- NextAttempt. datetime when the circuit breaker is closed and we should try next time.
Conclusions
In this article we have seen the importance of managing failures properly in your applications with circuit breakers. In addition, we discussed about the reason why multi-node applications should persist the status of the circuit breakers, so all the instances can work consistently.
Finally, we have seen an implementation for .NET 6, which helps us to manage and persist our circuit breakers. The implementation is flexible and it allows us to define multiple circuit breakers and handle them independently. Furthermore, we can use MemoryDB as persistent storage or implement our own repository.