subarray sum equals k


The subarray sum equals k problem asks:

Given an array of integers and an integer k, you need to find the total number of continuous subarrays whose sum equals to k.

In this problem, you are being asked to count the number of subarrays who's sum equals k. The emphasis is on the sum, so we should first focus on computing the sum as we go.

Think about it this way: you are standing in the snow, and you're given a list of 'hops' (in meters) that you need to make: 1, 2, 2, 0, 3, 2, 5. Initially you hop 1 meter, then 2, and 2 again, then 0 meters, then 3 etc.

The sum of these hops is the distance you've traveled at each hop.

You are then asked: how many of these hops travelled 5 meters consecutively? The answer is 6:

1             _
2         ___
3       _____
4     _____     

This is, once again, something easy for us humans to do: we find the two pointer approach fairly intuitive; simply add numbers until they add up to 5, i.e 1,2,2, move the right pointer as long as the sum of the numbers is less than, or equal to 5, or move the left pointer if the sum is greater.

However implementing a two pointer solution can be very hard, and working with the sums, or the distances traveled is actually a lot easier.

Work with accumulation values

So our total distance traveled at each hop is:

1nums:              1,  2,  2,  0,  3,  2,  5
2accumulation:  0,  1,  3,  5,  5,  8,  10, 15

This is something that can often trip up beginners with DP questions: you can often end up working in a totally different solution space. We are no longer concerned with our input values.. only their accumulation values. That can take quite a leap of the imagination (no pun intended).

The first distance is 0, because before you've taken the first step in a journey, you've traveled 0 meters.

Could i hop back 'k'?

This 'total distance travelled' data is invaluable, because we can ask it questions, namely: "if i hop back by k, do i land on a previous hop (myt own footprints in the snow)?"

If we do land on a previous hop, then we have a sub-array who's sum equals k. If you're standing in the snow, hop back k and land on your footprints, that means the sum of your footprints between these two points is equals to k.

Please note, that if you hop twice on the same spot (you hopped 0 meters) that still counts as a hop, hence the repeated hops from and to 5.

acc - k

In other words, the question we are asking is: is there an accumulation value equal to acc - k.

Let's count the 'hits':

10, 1, 3, 5, 5, 8, 10, 15
4      x........x
5         x..x......x
6               x......x

Now it gets a little trickier. When we travel 10 meters, and ask do i land on a hop if i travel 5 meters back? the answer is: actually you could land on two hops. Since there is a 0 in the array, there are two subarrays that add up to the same accumulation value. We need to account for that.

An easy to understand implementation

1from itertools import accumulate
3def subarraySum(nums, k):
4    total = 0
5    accum = [0] + [*accumulate(nums)]
6    for i, acc in enumerate(accum[1:]):
7        total += accum[: i + 1].count(acc - k)
8    return total

We begin by adding out starting position 0 to our accumulation array (when we hop back, it's valid to land on the starting position). We then iterate over our accumulate (barring 0).

We then ask if i hop back k meters, how many valid hops do i find?. This works, but is highly inefficient.

A more efficient implementation

1from collections import Counter
2from itertools import accumulate
4def subarraySum(nums, k):
5    count, total = Counter({0: 1}), 0
6    for acc in accumulate(nums):
7        total += count[acc - k]
8        count[acc] += 1
9    return total

A more efficient implementation is to count the acc (distance travelled) values in a counter, on line 8 as we go along (as we iterate over accumulate). This is purely for the O(1) lookup.

We initialise the counter with {0:1} since if we land on our starting position, we've made a valid hop.

We'll want to perform the lookup (ask the question, on line 7) before adding our accumulation value (distance) to the counter.