Winds of Change in Web Data

Published in

ITNEXT

8 min readDec 15, 2021

We shall not cease from exploration
And the end of all our exploring
Will be to arrive where we started
And know the place for the first time

The words of T.S. Eliot are certainly a mantra that the tech industry lives by. Every now and again, new trends sweep through our industry just to evolve into something very similar to what we had before, except now with a renewed sense of possibilities.

Not long ago there wasn’t much thought given to which OSS database you would choose for operational data: if you wanted to have a server for clients to connect to, you would either pick Postgres or MySQL. If you had a simpler application that needed some local structured data you would go with the convenience and low setup costs of SQLite.

But since we shall not cease from exploration, we collectively left the safe harbor of SQL behind in the search of new models and new scale out architecture that would fit the needs of webscale applications. But as the tide ebbs, it also flows. And it is increasingly common to find developers concluding that with recent changes in hardware, business landscape and developers preferences, Postgres is a fine and safe choice in almost any situation.

But what about SQLite? Does an embedded and simple database still merit a place in our journey? In this article I will explore some of the winds of change that pave the way for how to think about databases in 2022 and onwards, and introduce ChiselStore: an embeddable distributed zero-conf SQLite library that is very likely, good enough for you.

The changing landscape

So, what are the shifts that warrant a look back to traditional solutions?

Size

The first big change is just the sheer size of today’s hardware. Modern storage devices are not getting any smaller: with a click of a button, you can, on AWS, spawn a server with 30 TB of attached storage.

When the exploration into distributed databases started, it was mainly driven by the need of host data that wouldn’t fit in a single server. And for sure, such use cases still exist in specialized domains. But having worked closely to Big Data customers for almost a decade, I could count on the fingers of a single hand how many of them really even crossed the 30TB mark. And once you step out of the specialized Big Data world back to web applications, it is likely that all of your data can either fit into a single server or in just a handful, in which case basic sharding techniques can be used.

Distribution still matters, but instead of being an enabler for data volumes, additional nodes are needed to provide fault-tolerance and proximity to users for better latencies.

Performance

Those big storage devices are also not getting any slower: NVMe devices are now becoming ubiquitous, followed by changes in OS interfaces and APIs designed to take advantage of them. In my experience, most people don’t truly realize how fast a modern NVMe device is, and what are the practical consequences of that for databases. For a large portion of developers, when you say “database”, the first mental image that pops up is that of something limited by the speed of your storage. After all, isn’t the job of a database to store data?

Double-digit million operations per second is the new normal in the storage world.

That is, of course, until the combination of your device and software stack can do 10 million requests per second in a single core. You can now start a 64-core box with the click of a button (that has 30TB of attached storage), meaning that as your single core issues 10 million requests to your NVMe device, the other 63 are free to do whatever your heart desires.

The first immediate consequence is that databases are now limited by CPU power. But another interesting consequence is that compute and database nodes are bound to merge: there is no more justification to see compute and storage as separate entities with different access patterns and characteristics when there is nothing special about storage nodes anymore.

As a matter of fact, a basic version of SQLite, without much tuning effort on our part, can do north of 500k read requests/s with meager 16 threads. If the stories of other fellow sailors are to be believed, with some effort those numbers can be much higher, up to 4 million reads in a single server

[OVERALL], ThreadCount, 16
[OVERALL], RunTime(ms), 17656
[OVERALL], Throughput(ops/sec), 566379.70

Writes are slower, and indeed, much slower:

[OVERALL], ThreadCount, 1
[OVERALL], RunTime(ms), 38478
[OVERALL], Throughput(ops/sec), 2598.88

But read-mostly use cases are very common for web applications, and as we’ll see soon, that’s not the end of the story for writes.

The deployment model

The third architectural shift is the spread of Kubernetes, and the mindset change that comes with it. Before Kubernetes, database nodes were permanent: be them metal boxes or virtual machines, once it is up it will stay attached to that node forever. But in the Kubernetes world things are by their nature ephemeral. Databases born before the Kubernetes era employ all sorts of tricks to bind resources whenever possible and mimic their old ways.

But the ones that embrace this new reality behave differently: they are distributed by nature, and should a node die for one reason or another, it is not maintained: it is simply replaced. A snapshot is streamed from an object store, and the delta of the log is replayed. If that sounds crazy to you, remember those servers are likely to have 100Gbps links and storage write speeds of dozens of GB per second.

There are other consequences of adopting designs like that as well: nodes could be run in memory, since they will be destroyed on a failure anyway.

For cases in which data fits in memory (let’s keep in mind that “fits in memory” in 2021 can easily mean half a Terabyte), sqlite is much faster:

[OVERALL], ThreadCount, 1
[OVERALL], RunTime(ms), 2678
[OVERALL], Throughput(ops/sec), 37341.29

Still, memory is much too expensive, and storage is increasingly cheap. But that leaves the door opened for an alternative design: Just don’t fsync: the storage can be used as a way to index large amounts of data, and yet the whole thing seen as ephemeral.

Disabling fsync yields, for our benchmark:

[OVERALL], ThreadCount, 1
[OVERALL], RunTime(ms), 8795
[OVERALL], Throughput(ops/sec), 11370.09

Kubernetes is also just the tip of the iceberg. Still dominant in cloud-oriented scenarios, web applications are usually fully serverless in today’s environments and have their application code running on edge computing locations, through edge functions. Players like Vercel and Netlify captured hearts and minds of web developers with this model, and the embedability of SQLite makes it a strong contender for innovative edge-based data deployments.

Black boxing distributed consensus

A final shift, which is often overlooked, is the commoditization of Raft. A while back, distributed consensus belonged in the realm of the dark arts, where a handful of the initiated could devise clever ways of provide any kind of guarantee in a distributed system. Now, there are Raft libraries for virtually any language.

ChiselStore

Seeing the trends above unfold is what led me to found ChiselStrike. And while the company is itself still in private beta and not a lot is known, we are today Open Sourcing a central component of this story: ChiselStore.

In a nutshell, ChiselStore wraps a sqlite database in a Raft Library. We’re not the first ones to combine those two pieces of technology. Most notably, we are aware of rqlite and dqlite.

rqlite is a great implementation of a distributed database using SQLite +Raft. But it is designed to be used as a standalone database. We believe that there is a lot of power to be unleashed by an embedded version, that can be used in a zero-conf way much like SQLite itself in edge computing scenarios.

dqlite matches ChiselStore conceptually very much, but the C implementation turned us away. Aside from being an embedded distributed database, ChiselStore is written in Rust using the LittleRaft library. Despite LittleRaft being a nascent library, we are very bullish on Rust and know by experience that its touted safety advantages are not a fluke.

It’s true that SQLite itself is also written in C. But at the same time, it is perhaps the most widely deployed database system in existence running on mobile phones, browsers, and various other devices, so at that point we’ll just let that one slip.

Distributing SQLite with Raft and Rust

ChiselStore is a zero-conf database library that you can embed in your application much like you would embed SQLite. With ChiselStore, your application executes SQL statements, such as CREATE TABLE, INSERT or SELECT, using the library API. Internally, ChiselStore uses the Raft consensus protocol to replicate the SQL statements to all nodes in the cluster, which apply the statements to their local SQLite instance.

Raft guarantees that all of the SQLite instances in the cluster have identical contents, which allows the cluster to keep operating even if some of the nodes become unavailable. As a consequence of the usage of Raft, ChiselStore provides strong consistency (linearizability) by default. SQL statements on a cluster of ChiselStore appear to execute as if there is only one copy of the data because SQL statements execute on the Raft cluster leader node.

Relaxed reads

Strong consistency is expensive to achieve. However, often times in web applications it is acceptable for reads to lag relative to the write leader. For those scenarios, ChiselStore provides a relaxed reads consistency mode, allowing clients to perform read operations on the local node. This mode is great for situations where availability is more important and the application has to serve reads even if the node is disconnected from the rest of the distributed database.

As we can see, using relaxed reads the performance for reads is very close to what SQLite provides natively

[OVERALL], ThreadCount, 16
[OVERALL], RunTime(ms), 17856
[OVERALL], Throughput(ops/sec), 560035.8

Limitations

ChiselStore is in its infancy and currently not suitable for production use. In particular, LittleRaft lacks support for Raft snapshots and joint consensus. That is, the replicated log of SQL statements is never truncated, and it is not possible for nodes to join and leave a cluster dynamically.

Those are limitations of the LittleRaft library, that we believe should be lifted soon.

Summary

With recent hardware advances and shifts in industry trends, often times we find ourselves in the same place we started our journey as far as databases go: SQL is just good enough.

While Postgres is a common and safe choice, in some scenarios, like at the edge, SQLite can be a strong contender. The problems of how to have a globally distributed application on top of SQLite are tackled by ChiselStore.

ChiselStore provides an easy way to replicate data close to endpoints in a serverless environment for many use cases. It has comparable read performance to SQLite when relaxed reads are being used.

We are currently working on integrating ChiselStore with our serverless runtime as an optional, zero-config database. If you want to hear more about what we’re doing at ChiselStrike for serverless & edge, please contact us.

ITNEXT

Winds of Change in Web Data

The changing landscape

ChiselStore

Distributing SQLite with Raft and Rust

Relaxed reads

Summary

Sign up to discover human stories that deepen your understanding of the world.

Free

Membership

Published in ITNEXT

Written by Glauber Costa

No responses yet