Calculator Puzzle Algorithm Optimization
Calculate the largest number using all ten digits and all four operators, each exactly once. Operators are evaluated from left to right.
The following is a valid solution to the problem — a bad solution, but valid. It uses each operator once and each digit once.
10 + 2 → 12
- 45 → -33
/ 3 → -11
* 6789 → -74679
= -74679
A classmate showed me this problem a few years ago. I write code, so I wrote a solver. That program ran for ten hours then crashed.
(I encourage you to play around and try to find an optimal solution yourself.)
Recently the problem came back into my mind, and I figured I could make something better. After plenty of tinkering and optimization, I created a solver which finds the best solution in 3 seconds. This post covers the process of writing the solver, starting in Python and ending in Rust.
Permutation Solver
The simplest solver idea involves representing the sequence of digits and operators as a string, and permuting through them.
The previous example would be written as "10+2-45/3*6789", which could then be permuted to "12+0-45/3*6789".
The largest value is found by permuting through all possible solutions, and evaluating each one.
The heart of the program looks as follows:
from itertools import permutations

def evaluate_str(s):
    # implemented in full in the source file
    ...

def find_max():
    # starting value
    current_str = "0+1-2*3/456789"
    max_value = (0, "")
    # use Python builtin permutations function
    for permutation in permutations(current_str):
        value = evaluate_str("".join(permutation))
        if value > max_value[0]:
            max_value = (value, "".join(permutation))
            print(f"New highest {max_value}")
    print(f"Highest value {max_value}")
After letting the program run for half an hour, the largest solution found was 8437890, from "0+8753*964-2/1".
Benchmarking shows that this program crunches around 160 solutions per second and would finish after 18.2 hours.
Recursive Construction Solver
The largest issue with the permutation solver is that it considers invalid solutions.
The string "0123456789+-*/" is a valid permutation, but not a valid solution.
Incrementally constructing valid solutions would be much more efficient than iterating through permutations.
Before beginning on a new solver, a good understanding of what makes a valid solution is required.
A valid solution must:
1. contain each operator (+, -, *, /) and each digit (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) without duplicates.
2. begin and end with a digit.
3. contain no adjacent operators.
4. not contain a division by zero.
Constraint (1) is the simplest and is already enforced by the permutation solver.
Constraint (2) is required because an operator at the edge of a solution doesn’t make sense.
Constraint (3) is required to ensure invalid situations like "1+*3..." cannot occur.
Constraint (4) is primarily an implementation detail handled by the evaluate function.
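These four constraints can be sketched directly as a validator (a hypothetical helper for illustration; the solvers below never actually need it):

```python
OPERATORS = set("+-*/")

def is_valid(s):
    # (1) each digit and operator appears exactly once
    if sorted(s) != sorted("0123456789+-*/"):
        return False
    # (2) must begin and end with a digit
    if s[0] in OPERATORS or s[-1] in OPERATORS:
        return False
    # (3) no two adjacent operators
    if any(a in OPERATORS and b in OPERATORS for a, b in zip(s, s[1:])):
        return False
    # (4) no division by zero: a lone "0" is the only zero-valued
    # number token, since digits cannot repeat
    i = s.find("/0")
    if i != -1 and (i + 2 == len(s) or s[i + 2] in OPERATORS):
        return False
    return True
```

For example, is_valid("10+2-45/3*6789") holds, while the permutation "0123456789+-*/" fails on (2) and (3).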
These constraints allow validating solutions, but they do not provide an algorithm for constructing valid solutions. Let's create one.
To construct a valid solution, start with a digit as per (2).
1
At this point, there are two valid choices: append a different digit, or append an operator. The actual solver implementation will take both choices simultaneously, but for demonstration, append a different digit.
10
The same two choices remain. This time, add an operator.
10+
At this point, another digit must be appended as per (3). This process is repeated until all digits and operators have been used as per (1).
One fault with this process, as stated, is that it might reach a dead end. For example:
10+23456789
All digits have been used, so no digit can be added, and thus constraints (1) and (2) cannot be satisfied: each remaining operator must be followed by a digit, as per (3). The fix is to keep track of the number of digits and operators remaining, and to ensure the number of remaining digits never drops below the number of remaining operators.
Going back to the faulty example, this constraint would come into effect at the following point:
10+23456
At this point, adding another digit is no longer possible, as it would leave two remaining digits and three remaining operators.
Implementation
The described algorithm is implemented with recursion. Like the first solver, strings are used to store the solutions. In the new solver, however, these strings are built up one digit or operator at a time. Starting from the leftmost character of the string, each valid addition of a digit or operator is considered in another recursive call. The base condition, where recursion ends, is a string containing all digits and operators (14 characters).
def evaluate_str(data):
    # implemented in source
    ...

max_solution = (0.0, "")
OPERATORS = ["+", "-", "*", "/"]
DIGITS = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]

def process_solution(s):
    global max_solution
    val = evaluate_str(s)
    if val > max_solution[0]:
        max_solution = (val, s)
        print(f"Found new best {max_solution}")

def find_all(s="", rem_nums=10, rem_ops=4):
    if len(s) == 14:
        # base condition
        process_solution(s)
    else:
        if rem_nums > rem_ops:
            # choose to add a digit to the end
            for digit in DIGITS:
                if digit not in s:
                    # recurse
                    n_s = s + digit
                    find_all(n_s, rem_nums - 1, rem_ops)
        # choose to add an operator to the end if the last
        # character was not already an operator
        if s and s[-1] not in OPERATORS:
            for op in OPERATORS:
                if op not in s:
                    # recurse
                    n_s = s + op
                    find_all(n_s, rem_nums, rem_ops - 1)

find_all()
The rate of the new solver is 267,000 solutions per second — a 1.67× improvement from the last. It is expected to complete after 10.9 hours runtime.
Words about optimization
Optimizing an algorithm is, at its core, showing that a slow task is equivalent to a quicker one. If two different tasks produce identical results for all relevant inputs, then they can be interchanged freely. Compacting stored data (smaller structures, less indirection) usually increases speed. What follows are small observations leading to less data-intensive tasks.
Observation 1: Integers
Representing solutions as strings, as seen in A, is a holdover from the permutation solver. Instead, representing solutions as arrays of integers and characters, as in B, yields some extra performance.
A = "0+8753*964-2/1"
B = [0, "+", 8753, "*", 964, "-", 2, "/", 1]
Two complications arise from using the representation in B. First, appending digits to a number is more involved.
To append a digit to an integer, multiply the integer by 10, and then add the new digit.
For example, to append 1 to 8753:
8753 × 10 = 87530
87530 + 1 = 87531
The second difficulty in using integers is knowing what digits have already been used. Finding the digits in an integer is fairly easy, but requires a loop and division. A better solution is to use a bitset to record which digits have been used. I will not go into detail about the theory of bitsets, but have linked two sources.
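Both tricks can be sketched in a few lines of Python (helper names here are mine, not the solver's):

```python
def append_digit(n, d):
    # append decimal digit d to n: append_digit(8753, 1) -> 87531
    return n * 10 + d

def mark_used(bits, d):
    # record digit d in the used-digit bitset
    return bits | (1 << d)

def is_used(bits, d):
    # check whether digit d has already been used
    return bits & (1 << d) != 0

bits = 0
for d in (8, 7, 5, 3):          # the digits of 8753
    bits = mark_used(bits, d)
assert not is_used(bits, 1)     # 1 is still free
n = append_digit(8753, 1)       # 87531
bits = mark_used(bits, 1)
```

A single integer of bit flags replaces the `digit not in s` string scans from the previous solver.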
With this observation implemented, the rate is 366,000 solutions per second — a 1.37× improvement from the last. It is expected to complete after 8.0 hours runtime.
Observation 2: Operator Permutations
Note the way the operators are placed. The following two solutions are, aside from operator placement, the same.
120 + 456 - 78 * 9 / 3
120 - 456 / 78 + 9 * 3
Indeed, there are 24 different permutations of operators for the five above numbers, but the five numbers are the same for each.
120 456 78 9 3
In the recursive construction algorithm, each added level of recursion costs time, and the operators alone add four levels. The algorithm doesn't really need to construct the operators in the solution. Instead, it can construct just the list of numbers, and then all 24 different permutations of operators can be applied to the solution at once.
In this way, the computation of operators is pre-baked into the code. The function which evaluates each solution becomes much simpler this way:
def apply_op(a, b, c, d, e, i):
    if i == 0:
        return (((a * b) + c) - d) / e
    elif i == 1:
        return (((a * b) + c) / d) - e
    elif i == 2:
        ...
    elif i == 23:
        return (((a / b) - c) + d) * e
And now the find_all(...) function is concerned just with finding lists of five numbers.
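For illustration, the same effect can be had without hand-writing 24 branches; a sketch (eval_lr and best_for are my own names, assuming strict left-to-right evaluation):

```python
from itertools import permutations
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def eval_lr(nums, ops):
    # evaluate a five-number solution strictly left to right
    acc = nums[0]
    for op, n in zip(ops, nums[1:]):
        acc = OPS[op](acc, n)
    return acc

def best_for(nums):
    # apply all 24 operator permutations to one list of numbers
    # (assumes no zero appears as a divisor in nums[1:])
    return max(eval_lr(nums, ops) for ops in permutations("+-*/"))
```

For example, best_for([0, 8753, 964, 2, 1]) recovers 8437890, the best value the permutation solver had found. The hand-unrolled apply_op trades this generality for speed.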
With this observation implemented, the rate is 4.26 million solutions per second — an 11.6× improvement from the last. It completes after a runtime of 41 minutes.
Observation 3: Equivalent Operations
The previous optimization is built around the incorrect assumption that different permutations of operators produce unique values and thus must all be considered. Sure, if we have a fixed list of numbers a,b,c,d,e, then the following different permutations of operators produce different values (left-to-right operations):
a+b-c*d/e
a-b+c*d/e
Despite these being different expressions, their difference is made redundant by the fact that find_all(...) will permute through each list of numbers.
As an example, let's consider the list of numbers [12, 34, 56, 78, 90].
If we apply both of the above permutations of operators to this list, they clearly produce different results.
12+34-56*78/90 = -8.666...
12-34+56*78/90 = 29.466...
See, different!
The redundancy shows when we consider a different list of numbers, [12, 56, 34, 78, 90] (second and third swapped), and then apply the same permutations of operators to it:
12+56-34*78/90 = 29.466...
12-56+34*78/90 = -8.666...
Since the first list is [a, b, c, d, e] and the second is [a, c, b, d, e], and since addition commutes (a + b - c = a - c + b), applying these lists of numbers with those permutations of operators clearly produces duplicate results.
find_all(...) considers these number-list permutations, so we don't need to evaluate both of those operator permutations in this case.
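Written out with explicit parentheses, the two evaluations become literally the same computation after the first two operators:

```python
# "+-*/" applied to [12, 34, 56, 78, 90]
x = (((12 + 34) - 56) * 78) / 90
# "-+*/" applied to [12, 56, 34, 78, 90]
y = (((12 - 56) + 34) * 78) / 90
# 12 + 34 - 56 and 12 - 56 + 34 both equal -10, so the results match
assert x == y
```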
Further, we can generalize this to any permutation of operators where two operators of the same class (additive: + and -; multiplicative: * and /) are applied sequentially.
The 24 permutations collapse into the below 14 groups of equivalent results:
Group 1:
(((a + b) - c) * d) / e
(((a - b) + c) * d) / e
(((a + b) - c) / d) * e
(((a - b) + c) / d) * e
Group 2:
(((a * b) / c) + d) - e
(((a * b) / c) - d) + e
(((a / b) * c) + d) - e
(((a / b) * c) - d) + e
Group 3:
(((a * b) + c) - d) / e
(((a * b) - c) + d) / e
Group 4:
(((a + b) / c) * d) - e
(((a + b) * c) / d) - e
Group 5:
(((a - b) * c) / d) + e
(((a - b) / c) * d) + e
Group 6:
(((a / b) + c) - d) * e
(((a / b) - c) + d) * e
Group 7:
(((a * b) + c) / d) - e
Group 8:
(((a * b) - c) / d) + e
Group 9:
(((a + b) / c) - d) * e
Group 10:
(((a + b) * c) - d) / e
Group 11:
(((a - b) * c) + d) / e
Group 12:
(((a - b) / c) + d) * e
Group 13:
(((a / b) + c) * d) - e
Group 14:
(((a / b) - c) * d) + e
Only one operator permutation needs to be evaluated for each group. By removing duplicate lists of operators, the rate grows to 7.1 million solutions per second — a 1.7× improvement from the last. It finishes after 20 minutes.
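The count of 14 can be double-checked mechanically: canonicalize each of the 24 operator permutations by sorting adjacent same-class operators (the swap that, as shown above, only reorders numbers find_all already permutes through), then count the distinct results. A sketch, not part of the solver:

```python
from itertools import permutations

CLASS = {"+": "add", "-": "add", "*": "mul", "/": "mul"}

def canonical(ops):
    # sort each adjacent same-class pair; all members of one group
    # collapse to the same canonical tuple
    ops = list(ops)
    for i in range(len(ops) - 1):
        if CLASS[ops[i]] == CLASS[ops[i + 1]] and ops[i] > ops[i + 1]:
            ops[i], ops[i + 1] = ops[i + 1], ops[i]
    return tuple(ops)

groups = {canonical(p) for p in permutations("+-*/")}
assert len(groups) == 14
```

One pass suffices because each class contains only two operators, so no run of equal-class operators is longer than two.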
Rust Implementation
The last few optimizations are more hardware-focused, and therefore Rust, a compiled language, is a better fit than Python. I won't be getting into assembly or SIMD, rather just basic hardware-minded tweaks. The initial Rust port looks similar to the last Python version, only more verbose, so I'll just jump to the stats.
The Rust port crunches 255.5 million solutions per second — a 35.0× improvement from the last. It finishes after 41 seconds.
Observation 1: Stack Allocation
The current Rust implementation uses a Vec<u32> to represent the numbers in a solution. Vectors (Vec) don't directly store data; instead, they store a reference to data somewhere on the heap. Vectors are handy because heap allocation lets them be resized, but they can be slow because accessing data requires following a reference. The allocations for a vector's data can be slow, too.
In most programs, Vecs are suitably quick; here, however, constantly constructing, destroying, and appending to them exposes their drawbacks. Arrays are a better fit. Rust arrays are fixed-size and quick to access because they live on the stack.
All the needed features from Vec can be imitated with a wrapper struct.
struct StackVec {
    store: [u32; 5],
    len: u8,
}
Each of the five numbers in a solution is kept in store. The count of currently stored numbers is kept in len, allowing push and pop operations.
impl StackVec {
    fn push(&mut self, val: u32) {
        self.store[self.len as usize] = val;
        self.len += 1;
    }

    fn pop(&mut self) -> u32 {
        self.len -= 1;
        self.store[self.len as usize]
    }
}
By simply replacing Vec with this StackVec, the rate climbs to 648.0 million solutions per second, a 2.5× improvement from the last. It finishes after 16.4 seconds.
Observation 2: Multithreading
My computer has six cores on which programs can run. Given that the previous implementation runs on a single core, it is reasonable to assume that a multithreaded version would be 6× faster.
The strategy for multithreading is simple. Each thread works independently with a different starting digit (0 through 9), finds the largest solution with that starting digit, and then reports back. The largest value across all the threads is selected as the optimal solution.
The threading code itself is fairly simple:
// create threads for starting digits 0 through 9
let mut best_solution = BestSolution::default();
let handles = (0..=9)
    .map(|i| {
        thread::spawn(move || {
            let mut best_solution = BestSolution::default();
            let mut sv = StackVec::default();
            sv.push(i);
            let du = set_bit(0, i as u16);
            find_all(sv, du, 9, 4, &mut best_solution);
            best_solution
        })
    })
    .collect::<Vec<_>>();

// wait for threads to finish and find the highest value
for handle in handles {
    let val = handle.join().unwrap();
    if val > best_solution {
        best_solution = val;
    }
}
println!("Optimal solution found: {best_solution:?}");
As expected, the program is a lot quicker with a rate of 3.8 billion solutions per second — a 5.9× improvement from the last and a 23750× improvement from the first. It finishes in 2.8 seconds.
So, what’s the optimal solution? 14,481,480. It’s a lot larger than I had expected. There are a bunch of ways to make it, but they all follow the form of
(((8 - 0) / 1) + 7) * 965432
At this point, it's clear the optimal solution will contain -0 and /1.
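The arithmetic checks out when evaluated left to right with explicit parentheses:

```python
# left to right: 8 - 0 = 8, / 1 = 8, + 7 = 15, * 965432 = 14481480
value = (((8 - 0) / 1) + 7) * 965432
assert value == 14481480
```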
It would have been a reasonable optimization to exclude those from computation before, but it felt like cheating.
Future Work
I’d like to approach this problem from the perspective of optimization algorithms at some point. Brute force is nice because I can sleep tight knowing I have the best solution, but if speed is what counts, I suspect a heuristic search would get there quicker. My main inspiration for this is AlphaPhoenix’s video on gerrymandering.
Additionally, I’d like to do some combinatorial analysis on how many solutions the problem has. Using the solver to count them all, I know that 10,485,780,480 valid nonunique solutions exist, and 487,710,720 otherwise valid nonunique solutions exist which contain division by zero. How would I use combinatorics to count these with pen and paper? I don’t know, but I think that would be an interesting article.
End of the line
You know, it’s always a bit frightening publishing an article about computer optimization, because there’s always someone who knows something that I don’t. I’m almost certain there’s a trick to solve this problem in less than a second using AWK or something, and I would very much enjoy seeing it.
Feedback is welcome. Contact me.