The Word Error Rate (short: WER) is a way to measure performance of an ASR. It compares a reference to an hypothesis and is defined like this:
where
- S is the number of substitutions,
- D is the number of deletions,
- I is the number of insertions and
- N is the number of words in the reference
Examples
REF: What a bright day HYP: What a day
In this case, a deletion happened. "Bright" was deleted by the ASR.
REF: What a day HYP: What a bright day
In this case, an insertion happened. "Bright" was inserted by the ASR.
REF: What a bright day HYP: What a light day
In this case, an substitution happened. "Bright" was substituted by "light" by the ASR.
Range of values
As only addition and division with non-negative numbers happen, WER cannot get negativ. It is 0 exactly when the hypothesis is the same as the reference.
WER can get arbitrary large, because the ASR can insert an arbitrary amount of words.
Calculation
Interestingly, the WER is just the Levenshtein distance for words.
I've understood it after I saw this on the German Wikipedia:
But I have written a piece of pseudocode to make it even easier to code this algorithm:
Python
#!/usr/bin/env python
def wer(r, h):
"""
Calculation of WER with Levenshtein distance.
Works only for iterables up to 254 elements (uint8).
O(nm) time ans space complexity.
Parameters
----------
r : list
h : list
Returns
-------
int
Examples
--------
>>> wer("who is there".split(), "is there".split())
1
>>> wer("who is there".split(), "".split())
3
>>> wer("".split(), "who is there".split())
3
"""
# initialisation
import numpy
d = numpy.zeros((len(r) + 1) * (len(h) + 1), dtype=numpy.uint8)
d = d.reshape((len(r) + 1, len(h) + 1))
for i in range(len(r) + 1):
for j in range(len(h) + 1):
if i == 0:
d[0][j] = j
elif j == 0:
d[i][0] = i
# computation
for i in range(1, len(r) + 1):
for j in range(1, len(h) + 1):
if r[i - 1] == h[j - 1]:
d[i][j] = d[i - 1][j - 1]
else:
substitution = d[i - 1][j - 1] + 1
insertion = d[i][j - 1] + 1
deletion = d[i - 1][j] + 1
d[i][j] = min(substitution, insertion, deletion)
return d[len(r)][len(h)]
if __name__ == "__main__":
import doctest
doctest.testmod()
Explanation
No matter at what stage of the code you are, the following is always true:
- If
r[i]
equalsh[j]
you don't have to change anything. The error will be the same as it was forr[:i-1]
andh[:j-1]
- If its a substitution, you have the same number of errors as you had before when comparing the
r[:i-1]
andh[:j-1]
- If it was an insertion, then the hypothesis will be longer than the reference. So you can delete one from the hypothesis and compare the rest. As this is the other way around for deletion, you don't have to worry when you have to delete something.