Maximum Fixed-Length Contiguous Subarray

I recently taught a course about improving code for performance. An obvious performance improvement is to not execute unnecessary operations. I lacked a good example when I gave the course, but here is one: Find value of the largest contiguous sub-array of fixed length in a huge array.

This is a toy example, of course, but it shows the idea quite well.

The Code is on GitHub.

Small Example ¶

Input:

array: [7, 9, 2, 0, 5, 2, 0, 1, 1, 2]
sub-array length m = 3

Contiguous sub-arrays of the given length:

[7, 9, 2], value = 18
[9, 2, 0], value = 11
[2, 0, 5], value = 7
[0, 5, 2], value = 7
[5, 2, 0], value = 7
[2, 0, 1], value = 3
[0, 1, 1], value = 2
[1, 1, 2], value = 4

Output: 18

Inefficient Solution ¶

The following solution makes use of the slicing notation. It is short, easy to read and mostly pretty efficient:

from typing import List


def find_biggest_subarray_slice(array: List[int], m: int) -> int:
    return max(sum(array[i : i + m]) for i in range(len(array) - m + 1))

Except that it has one flaw: It makes too many additions and accesses list elements way more often than necessary

Efficient Solution ¶

from typing import List


def find_biggest_subarray_iterative(array: List[int], m: int) -> int:
    value = sum(array[0:m])
    largest_sub_array = value
    for remove, add in zip(array, array[m:]):
        value = value - remove + add
        largest_sub_array = max(value, largest_sub_array)
    return largest_sub_array

Comparison ¶

The inefficient solution is in $\mathcal{O}((n - m) \cdot m)$, the efficient one is in $\mathcal{O}(n - m)$. So you will notice the difference clearly when you compare the execution times with big $m$.

The inefficient solution changes its execution time like as shown in the image below for increasing m and contant n = 100,000:

Total execution time of find_biggest_subarray_slice

In contrast, the efficient solution looks like this:

Total execution time of the efficient solution

Two things to notice:

Worst-Case: For the inefficient solution, it is $m = n/2$. For the efficient solution, it is $m = 1$.
Level: The efficient solution is always below 0.03s. The inefficient one is only for the best-case scenario ($n=m$) below that. And even then, the efficient solution is at 0.0008s whereas the inefficient one is at 0.001s.
Speed-ups: If you look at $m = 60,000$, the more efficient solution gives a 1000× speedup!

Maximum Fixed-Length Contiguous Subarray

Small Example ¶

Inefficient Solution ¶

Efficient Solution ¶

Comparison ¶

See also ¶

Published

Category

Tags

Contact