Every software project which is not absolutely tiny likely has (or had) bugs, but astonishingly I learned very little about bugs during my university studies. So let's summarize a few important things.
What are bugs
I define a bug like this:
A software bug is a fault in software that causes incorrect behaviour.
An example would be this sum.py
executed with Python 3:
print("Please enter two numbers you would like to add:")
n1 = input()
n2 = input()
print("The summ of those two numbers is {}".format(n1 + n2))
If you enter 42
and 1337
it will give 421337
. The reason for this
behavior is that input()
returns a string, not a number. So clearly neither
the developer nor the user want this behaviour. But it is not always that easy.
Sometimes the design of the application might be misleading. Is that a bug as
well? I tend to say "no" if the developer wanted the behaviour to be as it is.
Then it is just bad design.
Types of Bugs
I would like to give you some ideas which types of bugs can happen and - if possible - how to prevent them.
Functional Bugs
Off by one errors are super common: Smaller or smaller equal? Greater or greater equal? Smaller or greater? They are a logic error which can often be covered by calling the function using it by example input and checking the actual output againts known example output. Those off-by-one errors can also easily lead to infinite loops.
Units of measure are a common one as well. Is "time" measured in hours or seconds? Do we use miles or kilometers? Famously, NASA lost a 125,000,000 USD Mars orbiter due to this bug (source). Luckily, this one can be prevented by using unit libraries like Pythons Pint.
Race conditions are one of the ugly bugs that only appear sometimes, even if you call the program with exactly the same input. They might even disappear completely when you try to investigate them. As an example, imagine a bank account. You have one process P-in which receives ingoing transactions and another process P-out that handles outgoing transactions:
P1:
def transfer_money(account, pay_amount): # L0
current_total = account.get_current_total() # L1
updated_total = current_total - pay_amount # L2
account.set_current_total(updated_total) # L3
P2:
def receive_money(account, get_amount): # L4
current_total = account.get_current_total() # L5
updated_total = current_total + get_amount # L6
account.set_current_total(updated_total) # L7
You have 3141 EUR on that account identified by 'FortKnox'. At midnight in the beginning of year 2018, you receive 42 EUR from your awesome company as a gift for the New Year and you have to pay 1337 EUR in rent. Both happen pretty much at the same time. They are not really in parallel, but interleaved:
Step | Time | Code | Relevant Memory | Bank Account |
---|---|---|---|---|
S1 | 2018-01-01 00:00:00.000 | L0 | account = 'Martin', pay_amount = 1337 | 3141 |
S2 | 2018-01-01 00:00:00.022 | L4 | account = 'Martin', get_amount = 42 | 3141 |
S3 | 2018-01-01 00:00:00.030 | L5 | get_amount = 42, current_total = 3141 | 3141 |
S4 | 2018-01-01 00:00:00.051 | L1 | pay_amount = 1337, current_total = 3141 | 3141 |
S5 | 2018-01-01 00:00:00.072 | L2 | updated_total = 1804 | 3141 |
S6 | 2018-01-01 00:00:00.180 | L3 | 1804 | |
S7 | 2018-01-01 00:00:00.193 | L6 | updated_total = 3183 | 1804 |
S8 | 2018-01-01 00:00:00.237 | L7 | 3183 |
Two resources access the same data - the current value of the bank account. This is problematic. You can prevent this by making both of the functions atomic, e.g. by locking or by using atomic database operations. Rust introduced the concept of ownership to solve this (details).
Arithmetic errors are division by zero or taking the logarithm of something non-positive. It can be prevented in some cases by adding a small positive constant, but it depends a lot on the usecase if this is valid.
Uninitialized resources (e.g. Nullpointer Exception or KeyError
in Python)
I think are mostly covered by line coverage in unit tests.
Non-Functional Bugs
One of the most common non-functional bugs is that the code is too slow.
You can get a grasp of this with pytest-benchmark
,
but the problem is that it depends on the hardware and what you run at the same
time on the machine.
Another one I see often with browsers is that it is too memory intensive. I guess that is a problem for smartphones as well.
Speaking of smartphones, too power intensive is certainly another one.
Bug reporting
When reporting bugs, please don't only say "it doesn't work" or "it's broken". That doesn't help. what does help, is answers to the following questions:
- What did you do?
- What did you expect to happen?
- What actually happened?
Ideally, you can figure out a minimal setting how to reproduce it.
If you have a Stacktrace or some error code that is good as well.
Logging
Developers have to think about the potential of bugs before they happen in production. Which information will be useful to see if a bug occurs?
For web services, it is often some kind of user ID. We want to know who experienced the issue.
Triaging Bugs
Not all bugs are worth fixing, as Jeff Atwood nicely formulated it in 2006. According to Eric Sink you should answer four questions if you want to figure out if a bug is worth being fixed:
- Severity: When this bug happens, how bad is the impact?
- Frequency: How often does this bug happen?
- Cost: How much effort would be required to fix this bug?
- Risk: What is the risk of fixing this bug?
While the first two seem very obvious to me and the cost question of course is important as well, the risk is harder to understand. What risk could fixing a bug potentially have?
Regression bugs. By touching any code, you can break something. If you touch something that was super hard to get right or that inherently has risk (e.g. deleting stuff permanently), then any bug related to those has a high risk.