MemoryDB: a strongly consistent Redis

Somehow I slept on MemoryDB, which launched in 2021, but thanks to Marc Brooker I caught up on Amazon's recent MemoryDB paper.

1. What is it?

MemoryDB fixes a couple of problems with running Redis on the critical path of a production system.

Redis cluster data loss during leader failover. With MemoryDB, you can use an API-compatible Redis as a primary database without building your own (complex, expensive, probably buggy) infrastructure to detect and fix data loss during failover.

Production performance impact and cost implications of backups. MemoryDB offloads snapshotting, so backups don't affect production workload performance. You can also provision data nodes more efficiently with less memory, whereas with Redis you may need to overprovision to avoid swapping during backups.

Bonus: for workloads that don't read most keys, you can run smaller data tiering instances that offload unused keys from memory to SSD.

2. How does it work?

tl;dr: by redirecting the standard Redis replication stream to a hardened (Amazon-internal) transaction log service and making it synchronous. This effectively decouples Redis's famously fast in-memory execution engine from its infamously droppy storage and replaces the latter.

Redis commands and leader elections all go through the transaction log's conditional append API, which affords several nice performance and correctness benefits explained in the paper.
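
To make the conditional append idea concrete, here's a toy in-memory sketch based on my reading of the paper. It is not the real service: the transaction log is Amazon-internal, and every name here (ToyLog, append_if, Node, ConflictError) is made up for illustration. The only idea it tries to capture is that an append must name the entry it expects to follow, so a stale leader's writes are rejected.

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Raised when the caller's view of the log tail is stale."""


@dataclass
class ToyLog:
    """Hypothetical in-memory stand-in for a durable log with conditional append."""
    entries: list = field(default_factory=list)

    @property
    def tail(self) -> int:
        # Position of the last entry; 0 means the log is empty.
        return len(self.entries)

    def append_if(self, expected_tail: int, payload: tuple) -> int:
        # The append succeeds only if the caller names the current tail.
        if expected_tail != self.tail:
            raise ConflictError(f"expected tail {expected_tail}, log is at {self.tail}")
        self.entries.append(payload)
        return self.tail


@dataclass
class Node:
    """Hypothetical Redis-like node that commits through the log."""
    name: str
    log: ToyLog
    known_tail: int = 0

    def catch_up(self) -> None:
        # A real replica would replay entries; here we only track the position.
        self.known_tail = self.log.tail

    def claim_leadership(self) -> None:
        # Leadership is itself a log entry, so only a caught-up node can win it.
        self.known_tail = self.log.append_if(self.known_tail, ("leader", self.name))

    def write(self, key: str, value: str) -> str:
        # Acknowledge the client only after the log has accepted the entry.
        self.known_tail = self.log.append_if(self.known_tail, ("set", key, value))
        return "OK"


if __name__ == "__main__":
    log = ToyLog()
    a, b = Node("a", log), Node("b", log)
    a.claim_leadership()
    a.write("k", "v1")
    b.catch_up()
    b.claim_leadership()  # b takes over after catching up
    try:
        a.write("k", "v2")  # a still thinks it is the leader
    except ConflictError as err:
        print("stale leader rejected:", err)
```

In this toy, the old leader's write never lands because its view of the tail is behind, which is the failover case that stock Redis replication can get wrong.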

The paper is pretty approachable for practitioners. I recommend at least skimming it.

3. What's it gonna cost me?

Performance: it depends

By my reading of the performance validation, for modest workloads, there is no difference compared to stock OSS Redis. When 10-node clusters were pushed, the benchmark showed:

Throughput:
- MemoryDB had 50% higher max read throughput, due to optimized IO thread/TCP connection management
- MemoryDB had 38% lower max write throughput; this is the tradeoff for committing each write

Latency: MemoryDB still gets sub-millisecond latency for read-heavy workloads. It's up to 4ms slower (tail latency) on write workloads, but still single-digit milliseconds or better in all workload distributions.

Pricing: yes, like before plus writes are tolled now

Three dimensions:

Data node instance hours. Pick your region and size memory and replication to taste.
Writes: $0.20/GB (all regions)
Snapshot storage: $0.020-0.024/GB-month (varies by region)

#1 and #3 will be similar to self-managed Redis or ElastiCache, but it's worth noting you don't need to overprovision instance memory for backups and, for workloads that don't read most keys, you have the option to use smaller data tiering instances that offload data from memory to SSD.

#2 is where you pay for consistency.
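
For a rough feel of the write toll, here's a back-of-the-envelope sketch using the $0.20/GB figure above. The function name and the example workload are mine, I'm assuming 1 GB = 10^9 bytes, and I'm not modeling how AWS actually meters written bytes (any rounding or per-request minimums), so treat the result as an order-of-magnitude estimate.

```python
WRITE_PRICE_PER_GB = 0.20  # USD per GB written, from the pricing list above
SECONDS_PER_DAY = 86_400


def monthly_write_cost(writes_per_second: float, avg_write_bytes: float, days: int = 30) -> float:
    """Estimate the monthly write charge in USD for a steady workload.

    Assumes 1 GB = 1e9 bytes and that billed bytes equal payload bytes,
    which may not match how the service actually meters writes.
    """
    gb_written = writes_per_second * avg_write_bytes * SECONDS_PER_DAY * days / 1e9
    return gb_written * WRITE_PRICE_PER_GB


if __name__ == "__main__":
    # Hypothetical workload: 1,000 writes/s at 1,000 bytes each, sustained for 30 days.
    # That is 2,592 GB written, or about $518.40 at $0.20/GB.
    print(f"${monthly_write_cost(1_000, 1_000):.2f}")
```

Instance hours and snapshot storage come on top of this, and a bursty workload will look very different from this steady one.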

Development, Operations: nothing?

For developers who run dependencies locally for development and CI tests (hi), I don't see any reason you couldn't just run OSS Redis and provision MemoryDB in shared cloud environments.
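
One way this could look in application code is a small settings switch: local and CI point at a stock Redis on localhost, everything else points at the MemoryDB endpoint. This is only a sketch. The environment variable names (APP_ENV, MEMORYDB_ENDPOINT, MEMORYDB_PORT) are ones I made up, and I'm assuming TLS is enabled on the MemoryDB cluster and off locally.

```python
import os
from typing import Mapping, Optional


def redis_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Pick connection settings based on a hypothetical APP_ENV variable."""
    env = os.environ if env is None else env
    if env.get("APP_ENV", "local") == "local":
        # Local development and CI: stock OSS Redis, no TLS.
        return {"host": "localhost", "port": 6379, "ssl": False}
    # Shared cloud environments: the MemoryDB cluster endpoint (assumed TLS on).
    return {
        "host": env["MEMORYDB_ENDPOINT"],
        "port": int(env.get("MEMORYDB_PORT", "6379")),
        "ssl": True,
    }


if __name__ == "__main__":
    print(redis_settings({"APP_ENV": "local"}))
```

The returned dict is shaped to be passed as keyword arguments to a Redis client such as redis-py. Whether your client needs to run in cluster mode against MemoryDB but not against your local Redis is worth checking before relying on this.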

4. Conclusion

I can think of real-time use cases that need blazing performance for this. I'm curious what the uptake has been since it launched in 2021.

YMMV. Be sure to analyze write costs and write latency for write-heavy workloads. Twenty cents per GB and a few milliseconds per write almost has to be cheaper than hydrating and reconciling Redis from a separate data store, though.