ACID is a set of properties of database transactions intended to guarantee validity even in the event of errors, power failures, etc:
- Atomicity: Either all statements are applied or none. This is important for SELECTs.
- Consistency: A transaction brings the DB from one valid state to another; see also eventual consistency
- Isolation: concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially
- Durability: once a transaction has been committed, it will remain committed even in the case of a system failure
Please notice that Atomicity, Consistency and Isolation are not relevant in many cases where you use a key-value store as you only have GET / SET for one key.
- Embedded or client/server: An embedded database is not visible to remote servers. It is embedded in the current process (which can, of course, make it visible).
- Type System: Enforcing types takes time and gives guarantees. Some databases have very simple type systems (e.g. SQLite) and some offer more (e.g. MySQL)
- Availability / Failover: Systems fail. The machine which contains your database could
simply be plugged out. How do you deal with that?
- Backup options: If this is relevant at all depends very much what you use it for. If the key-value store is used as a cache, then backup options are probably not necessary. If it is used to store configuration, then it likely is.
- Replication: Having another machine which replicates the data is the only way to make sure that the system is available, even if the main machine breaks. See Redis Replication.
- Scalability: Suppose you want to store more than you can do on the hardware of a single machine. Do you have to buy a better machine (scale vertically) or is it possible to have another (cheap) machine and run the datase on two machines in a distributed way (scale horizontally)? This is done via sharding.
- Users: Having a user system and different databases in the same database management system is nice, because it allows centralisation of the service for different teams at the same company. Then somebody takes care of the database being up / being backed up and others just get a user.
Configuration: Suppose you have a product which makes use of micro services. In some cases, those micro services need some alignment in form of configuration.
Caches: Web services can contain all sorts of computations which might need longer than acceptable. Caching / pre-calculating those are a common approach for this problem.
- EC2 small: 227 reads/s; 116 read+writes / s
- EC2 large: 430 reads/s; 310 read+writes / s
- EC2 xlarge: 630 reads/s; 483 read+writes / s
$ mysql > create database sbtest; $ sysbench oltp_read_write --db-driver=mysql --mysql-db=dbtest --mysql-user=root --mysql-password=YOUR_PASSWORD prepare $ sysbench oltp_read_write --db-driver=mysql --mysql-db=dbtest --threads=16 --events=100000 --mysql-user=root --mysql-password=YOUR_PASSWORD run $ sudo -u postgres psql postgres=# CREATE DATABASE sbtest; postgres=# grant all privileges on database sbtest to example_user; $ sysbench oltp_read_write --db-driver=pgsql --pgsql-db=sbtest --pgsql-user=example_user --pgsql-password=example_password prepare $ sysbench oltp_read_write --db-driver=pgsql --pgsql-db=sbtest --threads=16 --events=100000 --pgsql-user=example_user --pgsql-password=example_password run
On my Thinkpad T460p I get the following numbers. However, I'm not certain how valuable they are as they fluctuate quite a bit between consecutive runs.
|MariaDB 15.1||484.81 / s||9701.13 / s||14.25ms||32.61ms||137.37ms||53.85ms|
|Postgres 10.10||457.44 / s||9644.38 / s||5.44ms||34.90ms||2052.58ms||32.53ms|
Serverless Database is a fancy name for a hosted database. It is also called DBaaS - database as a service. You don't have to worry about the underlying machine; you pay for the usage. This usually means:
- Total storage in the database
You should not have to worry about:
- Scaling it up
Databases which offer this:
This is not finished. You can find the code on Github.
If you don't make the
key column a primary key, MariaDB is quite a
|Database||Bulk Write||Batched Write||Bulk Read||Read Latency - Percentile|
|dict||2 639 859.62 inserts/s||-||-||0μs||0μs||1μs||2μs|
|SQLite (in-memory)||259 834.14 inserts/s||27 151.94 inserts/s||69 163.37 selects/s||32μs||32μs||36μs||60μs|
|SQLite||193 930.10 inserts/s||6181.23 inserts/s||61 012.26 selects/s||34μs||34μs||38μs||62μs|
|Memcached||131 635.98 inserts/s||123 937.91 inserts/s||57 191.03 selects/s||25μs||26μs||38μs||51μs|
|Redis||5103.39 inserts/s||40 494.49 inserts/s||TODO selects/s||52μs||54μs||80μs||97μs|
|MariaDB (MyISAM)||3412.18 inserts/s,
|4 885.09 inserts/s||41 432.73 selects/s||32μs||33μs||36μs||60μs|
|MariaDB (Aria)||2546.49 inserts/s,
|4309.30 inserts/s||41 305.08 selects/s||33μs||33μs||36μs||57μs|
|PostgreSQL||1767.12 inserts/s||3529.08 inserts/s||37 732.27 selects/s||32μs||33μs||36μs||59μs|
|MariaDB (InnoDB)||1152.20 inserts/s *,
|643.56 inserts/s||29 405.90 selects/s||33μs||33μs||36μs||53μs|
For some DBs, batching improved the bulk insert quite a bit. I guess this is because this way I can avoid swapping. Maybe changing the batch-size from 1000 to higher numbers additionally increases the throughput.
- How many keys are too many in memcached?
- NoSQL Database Types
- Redis Use Cases:
- Todd Hoff: How Twitter Uses Redis To Scale - 105TB RAM, 39MM QPS, 10,000+ Instances, 2014.
- Adam Bloom: Using Redis at Pinterest for Billions of Relationships, 2013.
- Siyuan Fu: Optimizing Memcached Efficiency, 2017
- Andreas Wittig: EC2 Network Performance Cheat Sheet, 2018.
- MariaDB: Choosing the Right Storage Engine
- Jan Bodnar: MySQL storage engines, 2017.
- When to use MyISAM and InnoDB?
- How can I speed up bulk inserts into MySQL with SQLAlchemy?
Dictionaries are a fundamental data structure in Python. They are called associative arrays in PHP and hashtable in Java. ↩