# K numbers in array that sum to C

#### Posted on August 12, 2013

##### Last updated on August 13, 2013

I have seen this question and its variants during a couple of interviews, so it is worth exploring.

The simplest form of the question is:

Given an array of integers, determine whether there two distinct elements in it that sum to an integer $$C$$.

The brute-force approach: Look at every distinct pair of elements $$a_i$$, $$a_j$$, and check whether $$a_i + a_j = C$$. There are $$(n-1) + (n-2) \ldots + (1) = \frac{1}{2}n(n-1)$$ distinct tuples, giving us a complexity of $$O(n^2)$$.

The code to solve this could look something like this:

We can try and do better. What if we sorted the list? That sometimes helps, and it’s $$O(n\operatorname{log}(n))$$ complexity. We now have a sorted list and are trying to find two distinct elements $$a_i, a_j$$, such that $$a_i + a_j = C$$. We also know that that, since the array is sorted, $$i > j \leftrightarrow a_i \geq a_j$$. Then, if we have $$a_i + a_j > C, i > j)$$, we know that we must lower one of the terms. Since $$a_j > a_i$$, we can try $$a_{j-1}$$, since $$a_{j_1} \leq a_j$$, and then just maybe $$a_i + a_{j-1} = C$$. Symmetrically, if we have $$a_i + a_j < C, i > j$$, then we can try $$a_{i+1} + a_j$$. This gives us an algorithmic idea. Start with two pointers - one to $$a_0$$ and one to $$a_{n-1}$$, the last element. If the sum of the two elements is $$C$$, stop. Otherwise, if it’s smaller than C, move right the left pointer, and if it’s greater than $$C$$, move the right pointer to the left.

In code, it might look like this:

Since sorting is $$O(n\operatorname{log}(n))$$, and the second pass through the array is linear, the total running time of this algorithm is $$O(n\operatorname{log}(n))$$, which is improvement over the last algorithm.

But maybe we can do even better than that. What do we actually need to determine? We need to determine whether there exist $$i$$, and $$j$$, such that $$a_i + a_j = C$$, or alternatively, $$a_i = C - a_j$$. Hence, if we are scanning the array linearly, and are looking at $$a_j$$, if the array has in it the number $$C - a_j = a_i$$, we have found our solution. This leads to the following idea: Go through the array $$a$$, and add every member to a set, so that we can query array membership in $$O(1)$$ time. Then, iterate through the array again, and for each element $$a_j$$, use the precomputed set to check whether $$C-a_j$$ is in the array. This clearly gives us an $$O(n)$$ algorithm. A straightforward implementation might look like this (Java):

Let’s check that it works with input a = [5,8,29,1,-4,3], C = 25. This should return true as $$29-4 = 25$$.

Seems to be working. Let’s try $$C=10$$. This should return false since no tuple sums to $$10$$ in this example.

The problem is that by putting everything into a set, we have lost the ability to tell how many instances of a number there were in the original array. The code above allows us to use the $$5$$ twice. Similarly, if we gave the code a = [5], C=10, it would still return true. Here is one way to fix it: Create a hashmap that maps a number to the index it appeared at in the original array. Then, scan the array again, and for every $$a_j$$, check if $$C-a_j$$ is in the hashmap. If it is, check if it appears somewhere besides index $$j$$. If it does, we found our tuple. An implementation might look something like this:

Assuming good implementations of hashing functions, this algorithm should be linear, $$O(n)$$. This covers the case for $$k=2$$. I’ll now look at cases where $$k > 2$$.

## 0.1 Case for $$k > 2$$

Let’s look at $$k = 3$$. We now need to find three numbers that sum to $$C$$ : $$a_x + a_y + a_z = C$$. Hence, $$a_x = C - a_y - a_z$$. The obvious brute-force way is to look at every pair of three elements, giving us an $$O(n^3)$$ algorithm. We can use a similar technique as last time to improve this. Store all the array elements in a set, and for every two distinct elements $$a_y, a_z$$, check if $$C - a_y - a_z$$ is in the array. This gives an $$O(n^2)$$ algorithm.

For $$k = 4$$, compute a set $$S$$ of sums of all 2-element subsets of the original array. Then, for every two elements $$a_e, a_f$$ in the original array, check if $$C - a_e - a_f$$ is in $$S$$. If it is, we have found a solution, again in $$O(n^2)$$ time.

In general, when k is even, we have the following algorithm:

1. Compute a set $$S$$ that stores all the sums of all the $$k/2$$-element tuples of the original array
2. Look at every $$k/2$$ tuple of elements in the original array, and compute its sum $$s$$. If $$C - s \in S$$, we found a solution

This is $$O(n^{\frac{k}{2}})$$

When $$k$$ is odd, we have the following algorithm:

1. Compute a set $$S$$ that stores all the sums of all the $$(k-1)/2$$-element tuples of the original array.
2. Look at every $$(k+1)/2$$-element tuple in the original array, and compute its sum $$s$$. If $$C - s$$, we found a solution.

This is $$O(n^{\frac{k+1}{2}})$$

Notice the above algorithms allow the same element to be used more than once. Fixing this is an implementation detail; we have to keep track as to which indices a sum came from, and ensure that there are no duplicates. This will increase the runtime of our algorithm.

Markdown SHA1: a43a49b1ef8ac14022702fee8c3c74aa6abe505d