
Dropping the cost of DynamoDB

Introduction

DynamoDB is the most popular NoSQL database on AWS. It allows you to easily build highly performant applications while taking care of good practices such as encryption, access control, high availability, and backups.

DynamoDB allows you to configure two types of capacity allocation, provisioned and on-demand, and it defines capacity in terms of read capacity units (RCUs) and write capacity units (WCUs). One RCU is one strongly consistent read, or two eventually consistent reads, for a data block of up to 4 KB. On the other hand, one WCU represents one write request for a data block of up to 1 KB.
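
To make those units more concrete, the following Python sketch estimates the RCUs and WCUs a steady workload would need. The item size and request rates are made-up numbers for illustration, not figures from this post.

import math

def estimate_capacity_units(item_size_kb, reads_per_sec, writes_per_sec,
                            strongly_consistent=True):
    """Rough estimate of the RCUs/WCUs a steady workload needs.

    One RCU covers one strongly consistent read per second (or two
    eventually consistent reads) of up to 4 KB; one WCU covers one
    write per second of up to 1 KB.
    """
    read_blocks = math.ceil(item_size_kb / 4)    # number of 4 KB blocks per read
    write_blocks = math.ceil(item_size_kb / 1)   # number of 1 KB blocks per write
    rcus = reads_per_sec * read_blocks
    if not strongly_consistent:
        rcus = math.ceil(rcus / 2)               # eventually consistent reads cost half
    wcus = writes_per_sec * write_blocks
    return rcus, wcus

# Example: 6 KB items, 100 reads/sec and 20 writes/sec
# -> 100 * ceil(6/4) = 200 RCUs and 20 * 6 = 120 WCUs
print(estimate_capacity_units(6, 100, 20))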

When it comes to pricing, it is important to understand these two modes and pick the one that fits each case best, so we can have tables that auto-scale without over-provisioning.

In this post, we will look at a real case where picking the right mode helped drastically reduce the cost of using DynamoDB.

Provisioned mode

With provisioned capacity, you configure the maximum RCUs and WCUs of the table, and that capacity is charged hourly. In addition, you have to configure RCUs and WCUs for the table's indexes.

When you use provisioned mode, if the application exceeds the defined limits, the exceeding requests will be throttled. From the user's perspective, this can show up as write or read requests being temporarily denied, which is not a good user experience.

Normally the traffic of a table tends to be variable, for example high during working hours and low outside of them. Or we might have spikes of traffic for more demanding processes like a data migration.

When you use provisioned mode and the traffic of the table is variable, you might have to configure a high limit for RCUs and WCUs to avoid rejecting requests. The problem with this approach is that you would be over-provisioning and paying for a high capacity that is not being used. In addition, planning the capacity needed for each table and its indexes can be challenging, because it changes over time.

In order to make provisioned mode less rigid, DynamoDB allows you to configure auto-scaling policies. These policies let you set minimum and maximum capacity values and the target utilization percentage that will trigger the alarm to scale up.
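
As an illustration, a minimal sketch of such a policy using boto3 and Application Auto Scaling might look like the following. The table name, capacity bounds, and 70% target are placeholder values, not our production settings.

import boto3

TABLE = "orders"  # hypothetical table name

autoscaling = boto3.client("application-autoscaling")

# Register the table's read capacity as a scalable target with min/max bounds.
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId=f"table/{TABLE}",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=5,
    MaxCapacity=100,
)

# Scale up or down to keep consumed capacity around 70% of provisioned capacity.
autoscaling.put_scaling_policy(
    PolicyName=f"{TABLE}-read-target-tracking",
    ServiceNamespace="dynamodb",
    ResourceId=f"table/{TABLE}",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)

The same registration and policy would be repeated for write capacity and for each index of the table.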

However, there are two important limitations with this approach:

  1. You still need to have a minimum capacity and pay for it, even if you are not using it.
  2. When DynamoDB scales up, it reacts slowly, so you can still end up with throttling on RCUs and WCUs. In this article we can see good examples of this, where DynamoDB scales up too late and causes throttling.

On-Demand mode

With on-demand capacity, pricing is based on the number of read and write request units the application consumes throughout the month. There are no up-front charges for this mode, and your DynamoDB capacity will auto-scale following these rules.

The most interesting part of on-demand is that a newly created on-demand table can serve up to 4,000 write request units and 12,000 read request units, but you don't have to pay for provisioning that capacity and your workload won't be throttled for using it. For this reason, on-demand is the best mode for unpredictable workloads, and it is what AWS recommends for these scenarios.
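
For comparison with provisioned mode, creating an on-demand table is just a matter of choosing the PAY_PER_REQUEST billing mode and not specifying any throughput. A minimal boto3 sketch, with a hypothetical table and key name, could look like this:

import boto3

dynamodb = boto3.client("dynamodb")

# On-demand table: no RCUs or WCUs to provision, billed per request.
dynamodb.create_table(
    TableName="orders",
    AttributeDefinitions=[{"AttributeName": "orderId", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "orderId", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)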

Taking the best of provisioned and on-demand

On-demand mode in DynamoDB can help reduce the cost of tables, since you only pay for the requests performed instead of over-provisioning fixed RCUs or WCUs. However, when you are using on-demand mode, you cannot set any limits on WCUs or RCUs to prevent unfair usage of the table. This is a risk: a table serving a constant high load might lead to a huge increase in cost.

Being able to establish limits is one of the good features of provisioned mode. For example, imagine you want to limit a table to 100 RCUs and 150 WCUs, but you don't want to pay for that capacity when you are not using it, and you don't want throttling when you go from 0 RCUs to 100 RCUs. In this scenario you can use a hybrid model. At Fenergo we normally work with on-demand mode, but we establish limits for RCUs and WCUs, and if we exceed one of those limits, we change the mode of the table to provisioned so that throttling starts to apply. This restricts the traffic to the affected DynamoDB table and can prevent unfair usage of it.

The proposed architecture is represented by the following diagram:

In the diagram, you can see that each of our tables is configured as on-demand. We define AWS CloudWatch alarms, and when an alarm is triggered, AWS sends an event to the default bus of AWS EventBridge. We can then subscribe to those alarms and react to them accordingly. In order to filter these alerts, we defined a concrete naming convention, so in the EventBridge target we only subscribe to alarms whose names start with dynamodb-threshold-alarm#:

 
dynamodb-threshold-alarm#{TableName}#{AlertName}

In addition, from the alarm name we know the name of the table that triggered the alarm and the alarm type (for example table-rcu, index-wcu, etc.).
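
As a sketch of how this wiring could look with boto3, the snippet below creates an alarm following that naming convention and an EventBridge rule that only matches alarms with the dynamodb-threshold-alarm# prefix. The table name, threshold, and ARNs are placeholders, not our actual configuration.

import json
import boto3

TABLE = "orders"   # placeholder table name
RCU_LIMIT = 100    # placeholder read limit

cloudwatch = boto3.client("cloudwatch")
events = boto3.client("events")

# Alarm on consumed read capacity, named after the convention so the
# EventBridge rule below can pick it up. The threshold is the RCU limit
# multiplied by the period, because the metric is summed per period.
cloudwatch.put_metric_alarm(
    AlarmName=f"dynamodb-threshold-alarm#{TABLE}#table-rcu",
    Namespace="AWS/DynamoDB",
    MetricName="ConsumedReadCapacityUnits",
    Dimensions=[{"Name": "TableName", "Value": TABLE}],
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=RCU_LIMIT * 60,
    ComparisonOperator="GreaterThanThreshold",
)

# Rule on the default bus that only matches alarms following the naming
# convention and entering the ALARM state.
events.put_rule(
    Name="dynamodb-threshold-alarms",
    EventPattern=json.dumps({
        "source": ["aws.cloudwatch"],
        "detail-type": ["CloudWatch Alarm State Change"],
        "detail": {
            "alarmName": [{"prefix": "dynamodb-threshold-alarm#"}],
            "state": {"value": ["ALARM"]},
        },
    }),
)

# The rule's target is the state machine that performs the mode switch
# (ARNs are placeholders).
events.put_targets(
    Rule="dynamodb-threshold-alarms",
    Targets=[{
        "Id": "switch-billing-mode",
        "Arn": "arn:aws:states:eu-west-1:123456789012:stateMachine:SwitchToProvisioned",
        "RoleArn": "arn:aws:iam::123456789012:role/EventBridgeInvokeStepFunctions",
    }],
)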

When we receive one of these alarms, we trigger a step function, which switches the mode of the table from on-demand to provisioned. In order to establish the limits for each table, we store those values in a DynamoDB table.
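
As an illustration of what that lookup could look like, here is a minimal sketch. The configuration table name and attribute names (tableName, rcuLimit, wcuLimit) are hypothetical, not our actual schema.

import boto3

dynamodb = boto3.resource("dynamodb")

# Hypothetical configuration table holding the per-table limits.
limits_table = dynamodb.Table("dynamodb-capacity-limits")

def get_limits(table_name):
    """Fetch the RCU/WCU limits configured for a given table."""
    item = limits_table.get_item(Key={"tableName": table_name}).get("Item")
    # e.g. {"tableName": "orders", "rcuLimit": 100, "wcuLimit": 150}
    return item or {"rcuLimit": 100, "wcuLimit": 150}  # fall back to defaults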

In the step function, we have set up the logic to perform the mode update. When we change from on-demand to provisioned, we have to keep the provisioned mode for 24 hours, since DynamoDB only allows one switch every 24 hours. After that, we can switch back to on-demand. We also need to check whether an update is already in progress for a given table; in that case, we skip the flow.
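
The mode switch itself boils down to a single UpdateTable call. Below is a simplified sketch of what the step function's task could do; the helper name and the in-progress check are illustrative, not our exact implementation.

import boto3

dynamodb = boto3.client("dynamodb")

def switch_to_provisioned(table_name, rcu_limit, wcu_limit):
    """Switch a table from on-demand to provisioned so the configured
    limits start throttling excess traffic.

    DynamoDB only allows one billing-mode switch per table every 24 hours,
    so the state machine waits before switching back to on-demand.
    """
    table = dynamodb.describe_table(TableName=table_name)["Table"]
    # Skip the flow if an update is already in progress for this table.
    if table["TableStatus"] != "ACTIVE":
        return
    dynamodb.update_table(
        TableName=table_name,
        BillingMode="PROVISIONED",
        ProvisionedThroughput={
            "ReadCapacityUnits": rcu_limit,
            "WriteCapacityUnits": wcu_limit,
        },
    )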

Updating from on-demand to provisioned shouldn't happen too often; it's a mechanism to proactively throttle unusually high demand. If a given table switches too frequently, the established limits may be too restrictive and should be reviewed. To maintain visibility, at Fenergo we send the same alerts to our email or to whichever provider we use to manage incidents, such as PagerDuty, so we can make sure that those reviews are happening along with the ongoing enhancements.

Cost reduction results

When we started using DynamoDB, we used provisioned mode by default, since there was a requirement to set usage limits on our DynamoDB tables. We work with a large number of DynamoDB tables, as we have independent tables for each micro-service. Further still, for each tenant the data is stored in a different table for segregation and security purposes. In order to prevent throttling, we had to set high thresholds for RCUs and WCUs, especially for the tables with more traffic; if these thresholds are balanced against expected usage, the switch should rarely be needed. However, managing a large number of clients and their activity on the platform is nothing if not unpredictable. As you can see in the diagram below, as we added new clients and micro-services, the cost of DynamoDB grew more and more between April and May.

In June, we started progressively applying the approach presented in this article to some of the tables. As a result, we started to see the cost decrease.

By July and August, we had already applied the new approach to most of our DynamoDB tables, and the cost went flat in spite of the fact that we continued to add more tables for new clients.

Conclusions

In this article, we have seen the different modes that DynamoDB offers and how they can impact your budget and costs. We also looked at the approach we follow here at Fenergo to reduce the cost of DynamoDB by leveraging the best of the provisioned and on-demand modes.

In the cost analysis, you can see that we are managing the cost of DynamoDB very effectively, which will allow us to keep growing as a platform.

In the future, it would be a great improvement if AWS implemented a built-in feature to establish limits for on-demand mode, so we wouldn't have to switch to provisioned mode. The pay-as-you-go model makes sense, but it would be even better if you could set limits. I have sent this feedback to AWS, so I hope they consider adding something like that.

Finally, this approach may not fit every scenario, so it is important to understand how your data stores are being used and how DynamoDB pricing works, so you can apply the best strategy for your case.


Alberto Corrales is a Senior Technical Architect at Fenergo. Currently working on our SaaS platform, he has been with us for over 4 years.