JSON encoding/decoding with Python

JSON is a cornerstone of data exchange on the Internet. REST APIs all around the world use it as their standardized message format. Because its syntax is derived from JavaScript, its adoption got a huge boost right from the start. The fact that the syntax is clear and easy to read also helped.

JSON has libraries in every language I know for serialization and deserialization. In Python, there are actually multiple libraries. In this article, I will compare them for you.

The libraries

CPython itself has a json module. It was originally developed by Bob Ippolito as simplejson and was merged into the standard library in Python 2.6 (source). CPython is licensed under the Python Software Foundation License.
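For readers who have not used the module yet, a round trip with the standard library looks roughly like this. The payload is made up for illustration; only json.dumps and json.loads come from the module itself.

```python
import json

# Hypothetical payload, for illustration only.
payload = {"user": "alice", "tags": ["a", "b"], "score": 1.5, "active": True, "note": None}

# Serialize: Python object -> JSON string.
encoded = json.dumps(payload, sort_keys=True)

# Deserialize: JSON string -> Python object.
decoded = json.loads(encoded)

print(encoded)
print(decoded == payload)
```

Note how True and None become true and null in the string: the JSON text and the Python object are two different things.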

simplejson still exists as its own library and you can install it via pip. It is a pure Python library with an optional C extension. simplejson is licensed under the MIT License and the Academic Free License (AFL).

ujson is a binding to the C library Ultra JSON. Ultra JSON was developed by ESN (an Electronic Arts Inc. studio) and is licensed under the 3-clause BSD License. At the time of writing, Ultra JSON has 3k stars on GitHub, 305 forks, and 50 contributors; the last commit is only 12 days old and the last issue was opened 5 days ago. I’ve heard that it is in “maintenance mode” (source), meaning that there is no new development.

pysimdjson is a binding to the C++ library simdjson. simdjson received funding from Canada. At the time of writing, simdjson has 12.2k stars on GitHub, 611 forks, and 63 contributors; the last commit was 11 hours ago and the last issue was opened 2 hours ago.

python-rapidjson is a binding to the C++ library RapidJSON. RapidJSON was developed by Tencent. RapidJSON has 9.8k stars on GitHub, 2.7k forks, 150 contributors, the last commit was about 2 months ago and the last issue was opened 17 days ago.

orjson is a Python package that relies on Rust to do the heavy lifting.

Maturity and Operational Safety

All mentioned libraries worked for the benchmark examples without issues. Switching the JSON module is not a super big deal, but still, I want to know that the module is supported.

CPython, simplejson, ujson, and orjson consider themselves production-ready.

python-rapidjson marks itself as alpha, but one maintainer says that is a mistake and will be fixed soon (source).

	CPython json	simplejson	ujson	orjson	pysimdjson	python-rapidjson
License	Python Software Foundation License	MIT / Academic Free License (AFL)	BSD License	MIT / Apache	MIT	MIT
Maturity
Version	3.8.6	3.17.2	3.2.0	3.4.0	3.0.0	0.9.1
Development Status		Production/Stable	Production/Stable	Production/Stable	Alpha	Alpha
GH First release	1993-01-10	2006-01-01	2012-06-18	2018-11-23	2019-02-23	2017-03-02
CI-Pipeline	GH, Travis, Azure	GH, Travis, Appveyor	GH, Travis	Azure	GH, Travis	Appveyor
Operational Safety
GH Organization	✓	✓	✓	✗	✗	✓
GH Contributors	1319	30	50	9	7	15
Last release	2020-09-23	2020-07-16	2020-09-08	2020-09-25	2020-08-21	2019-11-13
Last Commit	2020-09-25	2020-07-16	2020-09-19	2020-09-25	2020-08-31	2020-05-08
PyPI Maintainers		3	4	1	1	2
Users
GH Stars	33,700	1310	2966	1348	374	397
GH Forks	16,200	290	306	48	25	31
GH Used By	-	47,164	14,760	613	11	661
StackOverflow Questions		279	6	3	-	319
Benchmarks
GeoJSON Read	48ms	45ms	22ms	19ms	14ms	83ms
GeoJSON Write	291ms	352ms	34ms	15ms	289ms	108ms
Twitter Read	6ms	6ms	6ms	5ms	6ms	9ms
Twitter Write	25ms	33ms	5ms	3ms	24ms	6ms
2MB Float List Read	36ms	37ms	16ms	9ms	7ms	66ms
2MB Float List Write	161ms	186ms	25ms	12ms	164ms	104ms

The Questions

One indicator of how easy it might be to resolve problems is to ask the maintainers a question and see how they respond:

SimpleJSON: I got a response the next day. The response was clear, easy to follow, and friendly. It came from Bob Ippolito, the guy who originally developed the library and who is also mentioned in the Python docs for the json module!
uJSON: I got a clear, friendly, easy-to-follow answer within 30 minutes (@hugovank).
ORJSON: No answer after 8 days.
PySIMDJSON: No answer after 8 days.
Python-RapidJSON: I got a clear, friendly, easy-to-follow answer within 30 minutes. A simple PR wasn’t merged after two days.

One answer I got from all of the projects is that they are essentially not in contact with each other.

The Benchmark

In order to benchmark the different libraries properly, I thought of the following scenarios:

APIs: Web services that exchange information. It might contain Unicode and have a nested structure. A JSON file from a Twitter API sounds good to test this.
API JSON Error: I was curious about how the performance would change if there was an error in the JSON API format. So I removed a brace in the middle.
GeoJSON: I first saw the GeoJSON format with Overpass Turbo, an OpenStreetMap exporter. You get crazy big JSON files that consist mostly of coordinates, but are also pretty nested.
Machine Learning: Just a massive list of floats. Those might be weights of a neural network layer.
JSON Line: Structured logs are heavily used in the industry. If you analyze those logs, you might need to go through gigabytes of data. Each line is a simple dictionary with a timestamp, a message, the logger, the log status, and maybe some more fields.
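To make the scenarios a bit more concrete, here is a minimal sketch of how two of them could be timed. It is not the harness used for the numbers in this article: the helper names (make_float_list, make_json_lines, best_time_ms), the payload sizes, and the log fields are all assumptions, and it only measures the standard library.

```python
import json
import random
import timeit


def make_float_list(n):
    """Build a synthetic 'machine learning' payload: a JSON list of n floats."""
    rng = random.Random(0)
    return json.dumps([rng.random() for _ in range(n)])


def make_json_lines(n):
    """Build synthetic structured logs: one small JSON object per line."""
    record = {"time": "2020-09-25T12:00:00", "logger": "app", "level": "INFO", "message": "ok"}
    return "\n".join(json.dumps(record) for _ in range(n))


def best_time_ms(func, repeat=5):
    """Return the fastest of `repeat` single runs, in milliseconds."""
    return min(timeit.repeat(func, number=1, repeat=repeat)) * 1000


float_doc = make_float_list(10_000)
lines_doc = make_json_lines(1_000)

float_ms = best_time_ms(lambda: json.loads(float_doc))
lines_ms = best_time_ms(lambda: [json.loads(line) for line in lines_doc.splitlines()])

print(f"float list read: {float_ms:.2f} ms")
print(f"JSON lines read: {lines_ms:.2f} ms")
```

Taking the minimum of several runs is a common way to reduce noise from other processes; swapping json for another library that offers loads would let you compare them on the same inputs.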

Deserialization Speed

The speed of my hard drive gives a lower bound for the time it takes to read a file. I’ve included it as a baseline in the following 3 charts.

The conclusion from this:

rapidjson is slow, but for small JSON files like twitter.json, you will not notice a difference. One can see this with the structured logs.
simdjson, orjson, and ujson are all crazy fast.
Reading a JSON file that contains a structural error takes about as long as reading a valid one for most libraries. A notable exception is rapidjson. I guess that it aborts reading the file once it finds the error.
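The error scenario can be reproduced in a few lines. This is only a sketch with the standard library and a made-up document; try_parse is a hypothetical helper, and other libraries raise their own exception types, so check their documentation before catching anything more specific than ValueError.

```python
import json

valid = '{"user": {"id": 1, "name": "alice"}, "text": "hello"}'
# Drop the closing brace of the nested object to simulate the broken file.
broken = valid.replace("}, ", ", ", 1)


def try_parse(text):
    """Return (result, None) on success or (None, error) on a decode error."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, exc


result, error = try_parse(broken)
if error is not None:
    # JSONDecodeError carries the position at which the parser gave up.
    print(f"invalid JSON at line {error.lineno}, column {error.colno}: {error.msg}")
```

In the standard library, JSONDecodeError is a subclass of ValueError, so code that already catches ValueError keeps working.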

Serialization Speed

In this case, I created the JSON string beforehand and measured the time it takes to write it to disk as a baseline.
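The idea behind that baseline can be sketched as follows. The data, the file name, and the timed_ms helper are assumptions made for this example, and the operating system may cache the write, so treat the numbers as rough.

```python
import json
import os
import tempfile
import time

data = [i / 7 for i in range(50_000)]  # synthetic float list


def timed_ms(func):
    """Run func once and return the elapsed wall-clock time in milliseconds."""
    start = time.perf_counter()
    func()
    return (time.perf_counter() - start) * 1000


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.json")

    # Baseline: the string already exists, so only the write is measured.
    prepared = json.dumps(data)

    def write_prepared():
        with open(path, "w", encoding="utf-8") as f:
            f.write(prepared)

    baseline_ms = timed_ms(write_prepared)

    # Candidate: serialization plus the same write.
    def serialize_and_write():
        with open(path, "w", encoding="utf-8") as f:
            f.write(json.dumps(data))

    total_ms = timed_ms(serialize_and_write)

    # Read the file back so the result can be checked after the directory is gone.
    with open(path, encoding="utf-8") as f:
        written = f.read()

print(f"write only: {baseline_ms:.2f} ms, serialize + write: {total_ms:.2f} ms")
```

The gap between the two numbers is roughly the cost of the serializer itself, which is what the comparison between libraries is about.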

What I conclude from this:

orjson is just insanely fast. It is super close to maxing out my hard drive. And ujson is pretty close to that.
rapidjson is pretty quick, but not on the same level as orjson or ujson.
simdjson is slow.

A professional workflow with JSON

As a closing note, I want to point out some issues I see sometimes and have written myself:

Calling variables foo_json: JSON is a string format. If it’s not a string, it’s not JSON. If you deserialized a JSON string with bar = json.loads(foo), then bar is not JSON. You can serialize bar to a JSON string that is equivalent to the JSON string foo, but bar is not JSON. It’s a Python object, very likely a dictionary. You can then call it foo_dict.
Attribute checks all over the place: If you receive a JSON, it’s super easy to convert it to a Python object (e.g. a dict) and use it. This is fine for proof-of-concept code or very small JSON strings. It will bite you in the ass if you don’t convert it to something like a dataclass.

pydantic is a super helpful validation library. You can take the JSON-string, parse it to a Python base representation with dictionaries / lists / strings / numbers / booleans with your favorite JSON library and then parse it again with Pydantic. The advantage you get from this is that you know what you’re dealing with later. No longer just Dict[str, Any] as a type annotation. No longer unhelpful editor autocompletion. No longer checking if attributes exist all over your code.
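To show the idea without extra dependencies, here is a sketch that uses a plain dataclass with hand-written checks instead of pydantic. pydantic automates this kind of validation; the User class, its fields, and from_dict are hypothetical and only serve the example.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class User:
    id: int
    name: str
    tags: List[str]

    @classmethod
    def from_dict(cls, raw: dict) -> "User":
        """Validate the raw dict once, at the boundary of the application."""
        if not isinstance(raw.get("id"), int):
            raise ValueError("id must be an integer")
        if not isinstance(raw.get("name"), str):
            raise ValueError("name must be a string")
        tags = raw.get("tags", [])
        if not isinstance(tags, list) or not all(isinstance(tag, str) for tag in tags):
            raise ValueError("tags must be a list of strings")
        return cls(id=raw["id"], name=raw["name"], tags=list(tags))


user_json = '{"id": 7, "name": "alice", "tags": ["admin"]}'  # a str, so really JSON
user_dict = json.loads(user_json)  # a dict, no longer JSON
user = User.from_dict(user_dict)  # a typed object with known attributes
print(user.name)
```

After this point, the rest of the code can rely on user.id and user.name existing, and the editor knows their types.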

To use a JSON package other than the default json, I recommend the pattern

import ujson as json
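If the faster library is optional in your environment, a variation of this pattern falls back to the standard library. This is a sketch that assumes you only use the common subset (loads, dumps); keyword arguments and exception types differ between the packages.

```python
try:
    import ujson as json  # faster third-party package, if it is installed
except ImportError:
    import json  # standard library fallback

# The rest of the code only refers to the name `json`.
doc = json.dumps({"a": [1, 2, 3]})
print(json.loads(doc))
```

The exact whitespace of the serialized string can differ between the two packages, so compare parsed objects rather than strings in tests.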

For Flask, you can use another encoder/decoder like this:

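from flask import Flask
from simplejson import JSONEncoder, JSONDecoder

app = Flask(__name__)

# Works on Flask versions before 2.3; these attributes were deprecated in 2.2 and later removed.
app.json_encoder = JSONEncoder
app.json_decoder = JSONDecoder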
