asyncio is a library to write concurrent code using the async/await syntax.
Concurrency Basics
Parallel vs Interleaved
Running things concurrently means running them at the same time. There are two ways to run tasks concurrently: in parallel or interleaved.
The following images show the difference for two compute-bound tasks:
Now you might wonder why on earth you would ever run things interleaved, as it takes more time in total, right?
CPU-bound vs I/O-bound
Where does your program spend most of its time? In some cases, it's just computationally heavy. For example, when you train a neural network you spend a lot of time doing matrix multiplications. In other applications, you spend a lot of time waiting for I/O: downloading data from the internet, waiting for a database to return the selected rows, or simply reading files from disk.
Let's take a file explorer application as an example. You open a folder and you want to see thumbnails of the images. They might be high-resolution images.
Printing the names and sizes of the files is fast, but computing the thumbnails takes a lot of time. So you can do the thumbnail calculation in parallel. My laptop has 4 CPU cores, hence it can calculate the thumbnails of 4 images in parallel. The next bottleneck is reading the full-size images into memory for the thumbnail calculation. This takes more time than calculating the thumbnails. Hence the execution time is no longer bound by the speed of the CPU, but by the speed of reading from disk. Here, interleaved execution helps:
- Start reading a file into memory
- While the disk is spinning to the right point, continue computing a thumbnail
This means that if interleaved tasks are to speed up the total running time of the application, they have to compute something while I/O is running.
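To make this concrete, here is a minimal sketch (not from the article; function names are illustrative) that overlaps simulated disk reads using a thread pool — while one thread waits for I/O, the others can make progress:

```python
"""Sketch: overlapping (simulated) disk reads for the thumbnail example."""
import time
from concurrent.futures import ThreadPoolExecutor


def read_image(name: str) -> str:
    time.sleep(0.2)  # simulate waiting for the disk
    return f"data of {name}"


def compute_thumbnail(data: str) -> str:
    return data.upper()  # stand-in for the actual thumbnail computation


def make_thumbnails(names):
    # While one thread sleeps in read_image (I/O), the other threads run.
    with ThreadPoolExecutor(max_workers=4) as pool:
        images = list(pool.map(read_image, names))
    return [compute_thumbnail(d) for d in images]


if __name__ == "__main__":
    start = time.perf_counter()
    thumbs = make_thumbnails(["a.jpg", "b.jpg", "c.jpg", "d.jpg"])
    # The four 0.2 s reads overlap, so this takes roughly 0.2 s, not 0.8 s.
    print(len(thumbs), f"{time.perf_counter() - start:.2f}s")
```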
Comparison
| | Processes | Threads | Coroutines |
|---|---|---|---|
| Speed-up of I/O-bound tasks | ✔ | ✔ | ✔ |
| Speed-up of CPU-bound tasks | ✔ | ✗ | ✗ |
| Use multiple CPU cores | ✔ | ✗ | ✗ |
| Scheduling | preemptive | preemptive | cooperative |
| Scalability | ~number of CPU cores | ~number of CPU cores × threads per core | thousands |
Concurrency in Python
| | Processes | Threads | Coroutines |
|---|---|---|---|
| Packages | `multiprocessing`, `joblib` | `threading` | `asyncio`, `greenlet` |

There is also the `concurrent.futures` package, which provides a common interface for thread and process pools.
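As a minimal sketch of `concurrent.futures`: the same `map()` interface works for both threads and processes, so the two executors are interchangeable here.

```python
"""Sketch: one interface for threads and processes via concurrent.futures."""
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor


def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    # Swap in ProcessPoolExecutor for CPU-bound work; the code stays the same.
    with ThreadPoolExecutor(max_workers=4) as executor:
        print(list(executor.map(square, range(5))))  # [0, 1, 4, 9, 16]
```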
Multiprocessing Example
"""'Hello Word' example for multiprocessing in Python."""
import time
import multiprocessing
import random
from typing import List
def dispatch_jobs(data: List[int], nb_jobs: int):
# Chunk the data
total = len(data)
chunk_size = total // nb_jobs
chunks = split_data(data, chunk_size)
# Create the jobs
jobs = []
for i, chunk in enumerate(chunks):
j = multiprocessing.Process(target=job, args=(i, chunk))
jobs.append(j)
print(f"Created {len(jobs)} jobs.")
# Start execution
for j in jobs:
j.start()
def split_data(data, n):
return [data[i : i + n] for i in range(0, len(data), n)]
def job(job_id: int, data_slice: List[int]):
for item in data_slice:
print(f"job {job_id}: {item}")
time.sleep(random.randint(0, 10) * 0.1)
if __name__ == "__main__":
data = list(range(100))
dispatch_jobs(data, nb_jobs=4)
A more exciting example would be matrix multiplication.
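As a sketch of that idea (names here are illustrative, not from the article), the rows of the result matrix are independent of each other, so they can be computed in separate processes with `multiprocessing.Pool`:

```python
"""Sketch: parallel matrix multiplication with a process pool."""
import multiprocessing
from typing import List

Matrix = List[List[float]]


def multiply_row(args) -> List[float]:
    """Compute one row of the product A @ B."""
    row, b = args
    n_cols = len(b[0])
    return [sum(row[k] * b[k][j] for k in range(len(b))) for j in range(n_cols)]


def parallel_matmul(a: Matrix, b: Matrix, nb_jobs: int = 4) -> Matrix:
    # Each result row depends only on one row of A and all of B,
    # so the rows can be computed independently in separate processes.
    with multiprocessing.Pool(nb_jobs) as pool:
        return pool.map(multiply_row, [(row, b) for row in a])


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print(parallel_matmul(a, b))  # [[19.0, 22.0], [43.0, 50.0]]
```

For real workloads you would use NumPy instead, which parallelizes this far more efficiently; the sketch only shows the chunking pattern.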
Threading Example
"""'Hello Word' example for multithreading in Python."""
import time
import threading
import random
from typing import List
def dispatch_jobs(data: List[int], nb_jobs: int):
# Chunk the data
total = len(data)
chunk_size = total // nb_jobs
chunks = split_data(data, chunk_size)
# Create the jobs
jobs = []
for i, chunk in enumerate(chunks):
j = threading.Thread(target=job, args=(i, chunk))
jobs.append(j)
print(f"Created {len(jobs)} jobs.")
# Start execution
for j in jobs:
j.start()
def split_data(data, n):
return [data[i : i + n] for i in range(0, len(data), n)]
def job(job_id: int, data_slice: List[int]):
for item in data_slice:
print(f"job {job_id}: {item}")
time.sleep(random.randint(0, 10) * 0.1)
if __name__ == "__main__":
data = list(range(100))
dispatch_jobs(data, nb_jobs=4)
A more exciting example would be downloading many files (e.g. ImageNet) or a link checker.
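A sketch of such a downloader, with the network call replaced by a stub (`fetch` is a placeholder, not a real HTTP client):

```python
"""Sketch: downloading many files with a thread pool."""
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(url: str) -> str:
    # Placeholder for a real HTTP request, e.g. urllib.request.urlopen(url)
    time.sleep(0.1)  # simulate network latency
    return f"content of {url}"


def download_all(urls):
    # Threads shine here: while one thread waits on the network,
    # the GIL is released and the other threads can issue their requests.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(fetch, urls))


if __name__ == "__main__":
    urls = [f"https://example.com/file{i}.jpg" for i in range(8)]
    print(len(download_all(urls)))  # 8
```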
Asyncio Coroutines
One style of coroutines in Python makes use of `asyncio`. You need an event loop which executes the functions. The `await` statement suspends the execution until the awaited expression returns a result. This enables other coroutines to execute in the meantime.
The async/await syntax was introduced in Python 3.5 with PEP 492 and looks like this:
```python
import asyncio


async def main():
    print("Hello ...")
    await asyncio.sleep(1)
    print("... World!")


# Python 3.7+
asyncio.run(main())
```
Note that with `await asyncio.sleep(0)` you can let other coroutines run. This might make sense if you have a compute-heavy coroutine.
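A small sketch of that trick (not from the article): a compute-heavy coroutine yields control at `await asyncio.sleep(0)`, so another coroutine gets a turn between iterations instead of waiting until the end.

```python
import asyncio

order = []


async def crunch():
    for i in range(3):
        order.append(f"crunch {i}")  # stand-in for heavy computation
        await asyncio.sleep(0)       # hand control back to the event loop


async def ping():
    for i in range(3):
        order.append(f"ping {i}")
        await asyncio.sleep(0)


async def main():
    await asyncio.gather(crunch(), ping())


asyncio.run(main())
print(order)  # the two coroutines alternate instead of running back-to-back
```

Without the `await asyncio.sleep(0)` in `crunch`, all three "crunch" entries would appear before the first "ping".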
A more exciting example would be downloading many files (e.g. ImageNet) or a link checker.
Greenlet Coroutines
Greenlet provides another style of coroutines. In contrast to asyncio, where you explicitly define functions as asynchronous and use `await` to define where other coroutines may run, greenlets do it implicitly by monkey-patching functions such as `sleep`.
Web Frameworks
A lot of times (1, 2, 3) you might see benchmarks which show the number of requests per second you can handle with Flask / Django and the much higher number of requests per second you can handle with Node/Express.js or another web application framework. I sometimes see mistakes in those benchmarks, like using the development server of Flask, which is not intended for production (I think here). Instead, gunicorn should be used.
Anyway, those benchmarks miss an important point: the web application framework is likely not the bottleneck. The application logic itself, SSL, and the database queries likely dominate the execution time. I don't have those numbers at hand, but Miguel Grinberg makes this point as well. You might get a feeling for it by looking at my article about basic operations.
Instead of this sole focus on efficiency, other factors need to be considered: the stability of the framework, the size of the community, and the number of developers you can find to work on your application.
Flask
Gunicorn has multiple async workers: gevent and eventlet both use greenlets. This way, you can make a Flask app use greenlets by letting gevent / eventlet monkey-patch the standard library.
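For example, assuming the Flask application object is `app` in a module `app.py` (names illustrative), the gevent worker can be selected on the gunicorn command line:

```shell
# Install gunicorn with the gevent async worker
pip install gunicorn gevent

# Serve the Flask app with 4 gevent-based workers
gunicorn --worker-class gevent --workers 4 app:app
```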
As Flask is based on WSGI, it cannot use asyncio. See issue #3339.
Quart
Quart is similar to Flask, but uses the async/await syntax (see migration guide).
Recommended by Miguel Grinberg as an alternative to Flask.
Sanic
Sanic is a Python 3.6+ web server and web framework which allows the usage of the async/await syntax.
Recommended by Miguel Grinberg as an alternative to Flask.
Starlette
Starlette is an ASGI framework for building asyncio services.
It should be used with an ASGI server, such as uvicorn.
Others
There are also aiohttp and FastAPI. I haven't used either of them and I don't know of any big player using them. FastAPI has a couple of nice features:
- Documentation looks good
- Generates Swagger documentation
- Uses pydantic
A downside of FastAPI is the fact that it is only driven by Sebastián Ramírez. The repository is in his personal account and I don't see a project governance document like, e.g., SciPy has. Flask's governance document misses some crucial parts, e.g. who is currently in charge and which organizations decide that.
Being a one-person project means that if that person gets hit by a bus, maintenance might suddenly stop. If I used this for my web services, I might have to start maintaining the framework myself.
There is also Twisted, which has been around since 2002. I haven't used it, I don't know anybody who has used it, and I don't know what it is used for.
See also
- Speed Up Your Python Program With Concurrency
- StackExchange:
  - How many threads can I run concurrently?
  - Cores vs Threads: How many threads should I run on this machine?
  - What is the difference between multiprocessing and subprocess?
  - What does async/await do?
  - What is the difference between concurrent.futures and asyncio.futures?
  - Concurrent.futures vs Multiprocessing in Python 3
- Miguel Grinberg: Asynchronous Web Development with Flask, 2019.
- Miguel Grinberg: Asynchronous Python for the Complete Beginner in PyCon 2017
- Timo Furrer: Awesome asyncio
- FastAPI:
  - Tivadar Danka: You Should Start Using FastAPI Now, 2020-05-30.
Further concepts: