Problem Statement:
Given an array \(A\) of real numbers and a real number \(t\), determine a non-empty subsequence \([i,j]\) of \(A\), \(0 \leq i \lt j\) such that \(\left|\sum_{m=i}^{j}A[m]-t\right|\) is minimized. It other words, find a contiguous non-empty subsequence such that its sum is as close to \(t\) as possible.
We can start off with the simpler case where \(t=0\). The naive solution is to go through every pair of indices, and compute the sum in between. This is \(O(n^3)\), but can be reduced to \(O(n^2)\) if we accumulate the sum in the innermost loop.
Here is a way to do this in \(O(n\operatorname{log}(n))\):
- Compute a cumulative sum array \(A_{cum}\)
- Sort \(A_{cum}\). The smallest difference will be between two consecutive elements. If two consecutive elements are equal in sort(\(A_{cum}\)), then there exists a subsequence that sums precisely to \(0\). Hence, go through all adjacent pairs, and compute the smallest difference.
Here is the Java implementation which returns the lowest absolute difference between \(0\) and sums of subsequences.
The case for an arbitrary \(t\) is a bit trickier. I tried to extend the above solution to handle this case, but that approach wasn’t fruitful. The idea was to again sort the cumulative array, but this time keep two indices – one starting at the very left and one at the very right, and move them together until a best solution is found. What didn’t work with this approach was the lack of knowing of the ordering of the subsequences, since we lost positional information after the sort. I didn’t come up with an alternative solution on my own, and I credit this Quora post for inspiration/solution.
The solution goes as follows:
Let prefix[i]
be the cumulative sum up to \(i\), like in the example above. Then, we need to compute the best subarray ending at \(i\). This means we need to find an index \(j \lt i\) such that the sum is as close as possible to \(t\). This means that this value needs to be as close as possible to prefix[i] - t
, since prefix[i] - prefix[j]
is the sum of the subsequence between [j,i]
. So, for every prefix, we must find another earlier prefix that’s as close in value to prefix[i] - t
. We can use a Java TreeSet
for this which supports \(O(\operatorname{log}(n))\) insertion, lower bound, and upper bound operations.
The working Java code:
The floor(x)
operation returns the greatest element less than or equal to \(x\), and similarly, ceiling(x)
returns the least element greater than or equal to \(x\) in the set. With that, we first check if the smallest element in the set of prefixes is \(\leq\) to our required sum. If it this, we get that prefix since it might improve leastDiff
. Similarly, if the largest element is \(\geq\) our required sum, we find the least element greater than or equal to the sum we need to balance out, and check whether it helps.
This algorithm is Question 10 Column 8 in Programming Pearls by Jon Bentley. Hope it helps.