Python is interpreted. Python is a scripting language. I often hear those two statements when people want to argue that Python is slow or that it is not suited for large systems. In this article I want to dispel those myths.
Interpreted Language
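What does it mean to say Python "is" an interpreted language? If you mean that Python is usually interpreted, that statement is correct. If you mean that Python is always interpreted, you are wrong [6]. Being interpreted is a property of an implementation, not of the language: CPython, the reference implementation, compiles source code to bytecode and then interprets that bytecode, PyPy adds a just-in-time (JIT) compiler, and Cython translates (a superset of) Python to C.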
Usually, this comes with the connotation that Python is slow.
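A minimal way to see CPython's compile step is the standard-library `dis` module. This sketch is specific to CPython; the function `add` is only an illustration, and opcode names differ between CPython versions, so the printed instruction list will vary.

```python
import dis


def add(a, b):
    return a + b


# CPython compiled the function body to bytecode when the `def` statement ran;
# the resulting code object holds that bytecode.
code = add.__code__
print(type(code).__name__)  # code

# Each instruction is what the CPython virtual machine interprets at run time.
# Opcode names differ between versions (e.g. BINARY_ADD vs BINARY_OP).
opnames = [instr.opname for instr in dis.get_instructions(add)]
print(opnames)

# Source text can also be compiled explicitly, and executed in a separate step.
module_code = compile("x = 40 + 2", "<example>", "exec")
namespace = {}
exec(module_code, namespace)
print(namespace["x"])  # 42
```

Calling `dis.dis(add)` prints the same instructions in a human-readable table.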
Scripting Language
Next, what does it mean to say Python "is" a scripting language? According to Wikipedia:
A scripting or script language is a programming language for a special run-time environment that automates the execution of tasks
That sounds a lot like "interpreted language" to me. However, later in the article you can read:
with the term "script" often used for small programs (up to a few thousand lines of code)
This leads to the myth that you can't build complex systems with Python.
Myth: Python is Slow
What people usually mean by that statement is that raw execution speed is low.
Speed is not Everything
However, they tend to forget other important properties such as development speed, how easy it is to find developers, and how flexibly the resulting systems can adapt to future changes.
A case in point is this claim about the speed of YouTube developers (Hacker News on ycombinator.com, referencing "Python Interviews" by Mike Driscoll). The fact that Python is well-suited for rapid prototyping is also appreciated at CERN [4], and Quora says the same [5].
Raw Number-Crunching
Python has a lot of awesome libraries. Four of them help you with raw number crunching:
- NumPy and SciPy: Two battle-proven libraries which build on BLAS and LAPACK. So the computationally heavy work is executed in highly optimized libraries written in C, Fortran, or assembly, not in Python.
- TensorFlow / PyTorch: Both libraries can run their kernels on the GPU through CUDA and cuDNN, and fall back to optimized C++ kernels on the CPU. Either way, the number crunching itself is executed in compiled code, not in Python.
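To illustrate the NumPy point, here is a small sketch that multiplies the same two matrices once with a textbook pure-Python triple loop and once with NumPy's `@` operator, which for 2-D float arrays is typically dispatched to the BLAS library NumPy was built against. The matrix size and the helper `matmul_pure_python` are arbitrary choices for illustration; the measured times depend entirely on the machine.

```python
import time

import numpy as np


def matmul_pure_python(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    # Textbook triple loop: every multiply-add runs in the interpreter.
    n, m, p = len(a), len(b), len(b[0])
    result = [[0.0] * p for _ in range(n)]
    for i in range(n):
        row_r = result[i]
        for k in range(m):
            a_ik = a[i][k]
            row_b = b[k]
            for j in range(p):
                row_r[j] += a_ik * row_b[j]
    return result


rng = np.random.default_rng(seed=0)
a = rng.random((120, 120))
b = rng.random((120, 120))

start = time.perf_counter()
slow = matmul_pure_python(a.tolist(), b.tolist())
pure_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = a @ b  # typically handed to a BLAS matrix-multiply routine
numpy_seconds = time.perf_counter() - start

# Summation order differs, so compare with a tolerance.
print(np.allclose(slow, fast))  # True
print(f"pure Python: {pure_seconds:.4f}s, NumPy: {numpy_seconds:.6f}s")
```

Both versions are written in Python; the difference is where the inner loop runs.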
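Myth: Python cannot use multiple cores
... because of the GIL. That is just plain wrong. The global interpreter lock (GIL) only prevents multiple threads of a single CPython process from executing Python bytecode at the same time. Separate processes (for example via the multiprocessing module) each have their own interpreter and GIL, and many C extensions, NumPy and hashlib among them, release the GIL while they do heavy work. Have a look at my asyncio article to get an overview of concurrency in Python.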
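A small standard-library sketch of the GIL point: CPython's `hashlib` documentation states that the GIL is released while hashing data larger than 2047 bytes, so threads in a `ThreadPoolExecutor` can hash different buffers on different cores at the same time. The number and size of the payloads are arbitrary assumptions, and the sketch only checks that the results match a sequential run; it does not measure the speedup. For CPU-bound pure-Python code, `multiprocessing` or `concurrent.futures.ProcessPoolExecutor` is the usual route instead.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def digest(chunk: bytes) -> str:
    # hashlib releases the GIL while hashing buffers larger than 2047 bytes,
    # so several threads can hash on different cores simultaneously.
    return hashlib.sha256(chunk).hexdigest()


# Eight hypothetical 4 MiB payloads of random bytes.
chunks = [os.urandom(4 * 1024 * 1024) for _ in range(8)]

workers = os.cpu_count() or 1
with ThreadPoolExecutor(max_workers=workers) as pool:
    parallel = list(pool.map(digest, chunks))

# The same work, one chunk after another, for comparison.
sequential = [digest(chunk) for chunk in chunks]
print(parallel == sequential)  # True
```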
Myth: Python cannot be used in big systems
Large systems might not need heavy number crunching of the kind BLAS libraries do. Instead, they might have (a) "organizational complexity", meaning a lot of business logic, or (b) a lot of small requests coming into a web service.
There are a lot of pages saying "Python is used at website XY". However, it's pretty hard to tell where and how exactly Python is used there. Python is an awesome language for writing "glue code", meaning code which holds other components together. It's also nice for ad-hoc scripts, so such a mention could mean that it is only used for that. But, and that is the point of this paragraph, you can also build big systems with Python. Here are some examples:
- Instagram uses Django (sources: 2011, 2016, 2017): Instagram ranks 29th among the globally most popular websites (Alexa, April 2020). The fact that Instagram uses Django for its website shows two things: you can build complex systems using Python, and you can build systems that scale.
- Pinterest uses Flask and Django (Quora 2015 by Steve Cohen): Pinterest is also one of the 500 most-visited websites on earth.
- Facebook (2016): Python is used in a lot of places, but none of them seems particularly large. Facebook released the Tornado web server, which they seem to have used for their real-time updates [2].
- Dropbox makes heavy use of Python. Guido van Rossum worked for them for quite a while, and in 2019 they described their large effort to add type checking to their code base: Our journey to type checking 4 million lines of Python
- Netflix also uses Python in many places.
Other big players where I have seen claims, but no reliable source:
- Yahoo Maps, Yahoo Groups [1]
- Google [1][3]: I see this mentioned all the time, and you can find job postings for it, but no details on what Google uses Python for.
- YouTube [1][3]
How to make Python Fast
I will likely write way more about this, but here are some core ideas:
- Analyze: Where do you spend most of your execution time? Measure it with a profiler such as cProfile before optimizing anything. Is it mainly waiting for I/O? Then look at my asyncio article.
- Use libraries: Python has a lot of awesome libraries which are well-maintained. It takes a while to figure out which ones exist, but NumPy, SciPy, Pandas, Dask, TensorFlow, PyTorch, Flask, Django, nltk, scikit-learn, and spaCy are certainly some of them. Learn how to use them correctly, too. I had a 96x speedup just by using NumPy for matrix multiplication, and a 46x speedup by using NumPy with a vectorized solution. In both cases, the code I wrote was still pure Python.
- Interpreters: You don't need to use the standard interpreter (CPython). PyPy might be way faster thanks to its just-in-time (JIT) compiler.
- C-Bindings: If Python is too slow for a specific task, you don't need to abandon Python. Cython, ctypes, cffi, C extensions written against the CPython C API, and pybind11 are some of the options you have.
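For the "Analyze" step, CPython ships the `cProfile` and `pstats` modules. In this minimal sketch, `slow_square_sum` and `workload` are hypothetical stand-ins for real application code; the report is sorted by cumulative time, so the hot spot shows up near the top.

```python
import cProfile
import io
import pstats


def slow_square_sum(n: int) -> int:
    # Hypothetical hot spot: a pure-Python loop.
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload() -> int:
    # Hypothetical application entry point.
    return slow_square_sum(200_000) + sum(range(1000))


profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Collect the five most expensive entries by cumulative time into a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

For a whole script, `python -m cProfile -s cumulative your_script.py` prints a similar report without changing the code.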
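As a minimal illustration of the ctypes option, the sketch below calls `sqrt` from the C math library directly. It assumes a Linux or macOS system where `ctypes.util.find_library("m")` locates libm; on Windows the library name and loading differ. Declaring `argtypes` and `restype` matters: without `restype`, ctypes assumes the C function returns an `int`.

```python
import ctypes
import ctypes.util
import math

# Locate the C math library (e.g. "libm.so.6" on glibc-based Linux).
# If find_library returns None, CDLL(None) on POSIX opens the symbols already
# loaded into the Python process, which normally include libm.
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path)

# Tell ctypes the C signature: double sqrt(double).
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

result = libm.sqrt(2.0)
print(result)  # 1.4142135623730951
print(result == math.sqrt(2.0))  # True
```

A call this small gains nothing from C; the approach pays off when a whole hot loop moves into compiled code.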
How to build big Systems with Python
A lot of the answer is not Python-specific and would require far more space than this small part of the article. Some small hints which help:
- Coding Standards: pre-commit, black, flake8
- Type hints and mypy for type checking
- Testing: pytest
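A minimal sketch of how type hints and tests fit together, built around a hypothetical `LineItem` business object. Running `mypy` on this file would flag mistakes such as passing a `float` where a `Decimal` price is annotated, and running `pytest` on it would collect the `test_*` functions automatically; the `__main__` block lets the file run without either tool.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LineItem:
    # Hypothetical business object; mypy checks these annotations.
    name: str
    unit_price: Decimal
    quantity: int


def order_total(items: list[LineItem]) -> Decimal:
    """Sum unit_price * quantity over all items."""
    return sum((item.unit_price * item.quantity for item in items), Decimal("0"))


# pytest collects functions named test_* and reports failing asserts.
def test_order_total() -> None:
    items = [
        LineItem("book", Decimal("12.50"), 2),
        LineItem("pen", Decimal("1.25"), 4),
    ]
    assert order_total(items) == Decimal("30.00")


def test_empty_order_is_zero() -> None:
    assert order_total([]) == Decimal("0")


if __name__ == "__main__":
    # Also runnable without pytest.
    test_order_total()
    test_empty_order_is_zero()
    print("all tests passed")
```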
Footnotes
- Python.org wiki: Organizations using Python.
- David Recordon: Tornado: Facebook's Real-Time Web Framework for Python, 2018.
- Adam D'Angelo: Why did Quora choose Python for its development?, 2014.
- Ramchandra Apte: Is it feasible to compile Python to machine code?, 2012.
- Anders Hovmöller: Python is slow - it doesn't have to be, 2020.