Key-Value Stores

Key-value stores are specialized NoSQL databases. Essentially, they are dictionaries¹.

Database Properties ¶

ACID is a set of properties of database transactions intended to guarantee validity even in the event of errors, power failures, etc:

Atomicity: Either all statements are applied or none. This is important for SELECTs.
Consistency: A transaction brings the DB from one valid state to another; see also eventual consistency
Isolation: concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially
Durability: once a transaction has been committed, it will remain committed even in the case of a system failure

Please notice that Atomicity, Consistency and Isolation are not relevant in many cases where you use a key-value store as you only have GET / SET for one key.

Database Features ¶

Embedded or client/server: An embedded database is not visible to remote servers. It is embedded in the current process (which can, of course, make it visible).
Type System: Enforcing types takes time and gives guarantees. Some databases have very simple type systems (e.g. SQLite) and some offer more (e.g. MySQL)
Availability / Failover: Systems fail. The machine which contains your database could simply be plugged out. How do you deal with that?
- Backup options: If this is relevant at all depends very much what you use it for. If the key-value store is used as a cache, then backup options are probably not necessary. If it is used to store configuration, then it likely is.
- Replication: Having another machine which replicates the data is the only way to make sure that the system is available, even if the main machine breaks. See Redis Replication.
Scalability: Suppose you want to store more than you can do on the hardware of a single machine. Do you have to buy a better machine (scale vertically) or is it possible to have another (cheap) machine and run the datase on two machines in a distributed way (scale horizontally)? This is done via sharding.
Users: Having a user system and different databases in the same database management system is nice, because it allows centralisation of the service for different teams at the same company. Then somebody takes care of the database being up / being backed up and others just get a user.

Use Cases ¶

Configuration: Suppose you have a product which makes use of micro services. In some cases, those micro services need some alignment in form of configuration.

Caches: Web services can contain all sorts of computations which might need longer than acceptable. Caching / pre-calculating those are a common approach for this problem.

Other Benchmarks ¶

MySQL Performance on Amazon EC2, 2007:

EC2 small: 227 reads/s; 116 read+writes / s
EC2 large: 430 reads/s; 310 read+writes / s
EC2 xlarge: 630 reads/s; 483 read+writes / s

sysbench:

$ mysql
> create database sbtest;
$ sysbench oltp_read_write --db-driver=mysql --mysql-db=dbtest --mysql-user=root --mysql-password=YOUR_PASSWORD prepare
$ sysbench oltp_read_write --db-driver=mysql --mysql-db=dbtest --threads=16 --events=100000 --mysql-user=root --mysql-password=YOUR_PASSWORD run

$ sudo -u postgres psql
postgres=# CREATE DATABASE sbtest;
postgres=# grant all privileges on database sbtest to example_user;
$ sysbench oltp_read_write --db-driver=pgsql --pgsql-db=sbtest --pgsql-user=example_user --pgsql-password=example_password prepare
$ sysbench oltp_read_write --db-driver=pgsql --pgsql-db=sbtest --threads=16 --events=100000 --pgsql-user=example_user --pgsql-password=example_password run

On my Thinkpad T460p I get the following numbers. However, I'm not certain how valuable they are as they fluctuate quite a bit between consecutive runs.

Database	Transactions	Queries	Latency
Database	Transactions	Queries	Min	AVG	Max	95th percentile
MariaDB 15.1	484.81 / s	9701.13 / s	14.25ms	32.61ms	137.37ms	53.85ms
Postgres 10.10	457.44 / s	9644.38 / s	5.44ms	34.90ms	2052.58ms	32.53ms

Features ¶

	SQLite	MariaDB	Postgresql	Redis	Memcached
Server	✗	✔	✔	✔	✔
Users	✗	✔	✔	✗	✗

Serverless Databases ¶

Serverless Database is a fancy name for a hosted database. It is also called DBaaS - database as a service. You don't have to worry about the underlying machine; you pay for the usage. This usually means:

GET
PUT
Total storage in the database

You should not have to worry about:

Backups
Scaling it up

Databases which offer this:

Benchmark ¶

This is not finished. You can find the code on Github.

If you don't make the key column a primary key, MariaDB is quite a bit faster.

Database	Bulk Write	Batched Write	Bulk Read	Read Latency - Percentile
Database	Bulk Write	Batched Write	Bulk Read	25%	50%	95%	99%
dict	2 639 859.62 inserts/s	-	-	0μs	0μs	1μs	2μs
SQLite (in-memory)	259 834.14 inserts/s	27 151.94 inserts/s	69 163.37 selects/s	32μs	32μs	36μs	60μs
SQLite	193 930.10 inserts/s	6181.23 inserts/s	61 012.26 selects/s	34μs	34μs	38μs	62μs
Memcached	131 635.98 inserts/s	123 937.91 inserts/s	57 191.03 selects/s	25μs	26μs	38μs	51μs
Redis	5103.39 inserts/s	40 494.49 inserts/s	TODO selects/s	52μs	54μs	80μs	97μs
MariaDB (MyISAM)	3412.18 inserts/s, 30 266.34^🌞	4 885.09 inserts/s	41 432.73 selects/s	32μs	33μs	36μs	60μs
MariaDB (Aria)	2546.49 inserts/s, 19 708.31^🌞	4309.30 inserts/s	41 305.08 selects/s	33μs	33μs	36μs	57μs
PostgreSQL	1767.12 inserts/s	3529.08 inserts/s	37 732.27 selects/s	32μs	33μs	36μs	59μs
MariaDB (InnoDB)	1152.20 inserts/s ^*, 14 679.97^🌞	643.56 inserts/s	29 405.90 selects/s	33μs	33μs	36μs	53μs

For some DBs, batching improved the bulk insert quite a bit. I guess this is because this way I can avoid swapping. Maybe changing the batch-size from 1000 to higher numbers additionally increases the throughput.

Footnotes ¶

Dictionaries are a fundamental data structure in Python. They are called associative arrays in PHP and hashtable in Java. ↩

Key-Value Stores

Database Properties ¶

Database Features ¶

Use Cases ¶

Other Benchmarks ¶

Features ¶

Serverless Databases ¶

Benchmark ¶

See also ¶

Footnotes ¶

Published

Category

Tags

Contact