The History of Our Logic Engine
Introduction
A huge part of implementing any process-oriented platform is designing the conditional logic that gets applied at programmatic junctions. This logic is responsible for directing and controlling the flow of an application, reacting dynamically to user or API interactions. The most effective way to implement such requirements is to employ a "Rules Engine" that can apply that conditionality in a standard, reusable way.
Business rules, as they apply to an evolving regulatory landscape, introduce a lot of diversity; in short, rules change frequently and rapidly. Our goal was to make our implementation flexible and customizable, allowing the end user to set rules without any code modification or deployment, while also ensuring that performance met the SLA commitments we make to clients. The rules engine in our SaaS application is designed to meet numerous functional use cases across the platform, from controlling the visibility of data fields within our UI to triggering conditional steps in our workflow engine. This means it is used frequently and cuts across a lot of other functionality. In designing it, we knew it needed to work well.
Implementation
We define rules as part of an API contract, which describes what gets displayed and how it should behave. Rules can be easily serialized and rendered via the UI query builder component. A single rule (one instance of an operand) is matched to a property on a page (referred to as a fieldName) and contains a specified value and operation (such as equal, defined, in, etc.). Our rules support nesting with AND and OR conditions, which makes them both powerful and flexible. A sample rule is as follows:
{ "name": "High risk company", "description": "...", "logicalOperation": "AND", "operands": [ { "fieldName": "entityType", "operation": "equal", "value": ["Company"] }, { "logicalOperation": "OR", "operands": [ { "fieldName": "amlRisk", "operation": "equal", "value": ["High"] }, { "fieldName": "countryOfIncorporation", "operation": "select_any_in", "value": ["Iraq", "Siria"] } ] } ] }
We mostly work with C# as a development language, and we picked the open-source NRules library as the implementation. The library accepts rule definitions and the data (facts) those definitions should be matched against, and we store the rule definitions in a database. Now we have all the parts of the rules engine – the rules, the data, and the engine itself. It's time to host it!
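Before getting to hosting, here is roughly what driving NRules looks like. This is a minimal, hedged sketch using the library's standard fluent API; the Entity fact type and the hand-written HighRiskCompanyRule class are hypothetical stand-ins, since our production rules are built dynamically from the JSON contract rather than written as classes:

    using NRules;
    using NRules.Fluent;
    using NRules.Fluent.Dsl;

    // A hypothetical fact type the rules are matched against.
    public class Entity
    {
        public string EntityType { get; set; }
        public string AmlRisk { get; set; }
        public bool IsHighRisk { get; set; }
    }

    // A hand-written equivalent of part of the "High risk company" rule above.
    public class HighRiskCompanyRule : Rule
    {
        public override void Define()
        {
            Entity entity = default;

            When()
                .Match<Entity>(() => entity,
                    e => e.EntityType == "Company",
                    e => e.AmlRisk == "High");

            Then()
                .Do(ctx => MarkHighRisk(entity));
        }

        private static void MarkHighRisk(Entity entity) => entity.IsHighRisk = true;
    }

    public static class RulesEngine
    {
        public static void Run(Entity entity)
        {
            // Load rule classes from the assembly and compile them (the expensive step).
            var repository = new RuleRepository();
            repository.Load(x => x.From(typeof(HighRiskCompanyRule).Assembly));
            ISessionFactory factory = repository.Compile();

            // Create a session, insert the facts and fire any matching rules.
            ISession session = factory.CreateSession();
            session.Insert(entity);
            session.Fire();
        }
    }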
Initial architecture
The NRules library is optimized to execute rules extremely quickly using the Rete matching algorithm. This is a pattern matching algorithm for implementing rule-based systems, developed to efficiently apply rules or patterns to many objects, or facts, in a knowledge base. However, it has one significant drawback: rule compilation is quite slow, and the time required to compile the rules grows exponentially with the number of rules in the engine. With this in mind, we decided to host our engine API on AWS Elastic Container Service (ECS), which would load and compile the rules once on startup (or on demand, when a new version of the rules becomes available) and hold them in memory for as long as possible. The ECS service would be exposed via an Elastic Load Balancer, simple as that!
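Conceptually, each ECS task kept a single compiled session factory alive and reused it for every request, along the lines of the sketch below. The reload trigger and types are assumptions rather than our exact code, reusing the Entity and HighRiskCompanyRule stand-ins from the earlier sketch:

    using System;
    using NRules;
    using NRules.Fluent;

    // Caches the compiled rules for the lifetime of the ECS task, because
    // compilation is expensive while rule execution is cheap.
    public sealed class CompiledRulesCache
    {
        private readonly object _lock = new object();
        private volatile ISessionFactory _factory;

        // Rebuilds the session factory, e.g. on startup or when a new
        // version of the rules is published.
        public void Reload()
        {
            var repository = new RuleRepository();
            repository.Load(x => x.From(typeof(HighRiskCompanyRule).Assembly));
            var factory = repository.Compile();   // the slow step we want to do once

            lock (_lock)
            {
                _factory = factory;
            }
        }

        // Evaluating a request only needs a cheap session from the cached factory.
        public void Evaluate(Entity entity)
        {
            var factory = _factory ?? throw new InvalidOperationException("Rules not loaded yet.");
            var session = factory.CreateSession();
            session.Insert(entity);
            session.Fire();
        }
    }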
So how did it perform?
It worked very well until the number of tenants started to grow. The memory consumption of the ECS task increased linearly with the tenant count, and we knew it would eventually hit the hardware limits. Horizontal scaling was not an easy option either, because it would require a complex front-end API service to route each request to the backend instance holding that specific tenant's rules.
Revised architecture and the problems it solved
The Fenergo SaaS platform makes extensive use of AWS serverless technologies, so our natural decision was to employ AWS Lambda functions for the Rules Engine. Lambda scales incredibly well, offering both high performance and low cost. It is, however, stateless, and the AWS runtime recycles the container every few minutes, which makes caching more difficult. We also couldn't simply poll for SQS messages to reload the rules on demand, because with horizontal scaling a message is delivered to only one of the many instances, so the others would never see the refresh. To address both issues, we introduced ElastiCache (Redis) to store the rules, and on top of it kept an in-memory cache with a short TTL to absorb traffic peaks from single tenants.
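A simplified sketch of how such a two-tier cache might look is shown below. The key scheme, TTL value and StackExchange.Redis usage are illustrative assumptions, not our production code:

    using System;
    using System.Collections.Concurrent;
    using System.Threading.Tasks;
    using StackExchange.Redis;

    // Two-tier rule cache: a short-lived in-memory copy inside each Lambda
    // container, backed by ElastiCache (Redis) shared across all containers.
    public sealed class RuleDefinitionCache
    {
        private readonly IDatabase _redis;
        private readonly TimeSpan _memoryTtl = TimeSpan.FromSeconds(30); // short TTL to absorb traffic peaks
        private readonly ConcurrentDictionary<string, (string Json, DateTime CachedAtUtc)> _memory = new();

        public RuleDefinitionCache(IConnectionMultiplexer connection)
        {
            _redis = connection.GetDatabase();
        }

        public async Task<string> GetRulesJsonAsync(string tenantId)
        {
            // 1. Serve from the in-memory copy while it is still fresh.
            if (_memory.TryGetValue(tenantId, out var entry) &&
                DateTime.UtcNow - entry.CachedAtUtc < _memoryTtl)
            {
                return entry.Json;
            }

            // 2. Otherwise fall back to Redis, which holds the latest published rules.
            RedisValue value = await _redis.StringGetAsync($"rules:{tenantId}");
            if (value.IsNullOrEmpty)
                throw new InvalidOperationException($"No rules published for tenant {tenantId}.");

            _memory[tenantId] = (value.ToString(), DateTime.UtcNow);
            return value.ToString();
        }
    }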
To verify the scalability, we performed a number of performance tests for different traffic levels and rule counts using k6 (https://k6.io), and the results were excellent.
Conclusions
Thanks to rich AWS monitoring, we were able to identify bottlenecks and plan resolutions. We have successfully migrated from Fargate to Lambda, leaving scaling to be handled by AWS as a managed service, which meets our performance requirements while greatly simplifying both our operations and our deployments.