The quest of a developer

We programmers are blessed with the search for better algorithms. But how do we know that a given solution is the best one? Are there solutions that take less memory, less CPU time, or less code?

My question is: can’t we just automate this search for the better algorithm and spend more time eating pie? Or, put more bluntly: can we replace the programmer with a program?

The quest backpack

A program is nothing more than a bunch of instructions flung at an interpreter/compiler. Sounds easy, right? To make the quest a little more specific, let’s focus on a common problem. That way we learn more about the nature of the beast.

Behold the burden of many high school children: calculating the solution to a quadratic equation:

x = (-b +/- sqrt(b^2 - 4ac)) / (2a)

For the equation

ax^2 + bx + c = 0

No need to worry I won’t go into the mathematical detail.
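For reference, here is the formula as an ordinary function; the generated programs will later be tested against exactly this kind of ground truth:

```python
import math

def quadratic_roots(a, b, c):
    """Return both real roots of a*x^2 + b*x + c = 0
    (assumes a != 0 and a non-negative discriminant)."""
    d = math.sqrt(b * b - 4 * a * c)
    return (-b + d) / (2 * a), (-b - d) / (2 * a)

# x^2 - 3x + 2 = 0 factors as (x - 1)(x - 2)
print(quadratic_roots(1, -3, 2))  # → (2.0, 1.0)
```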

But how would I generate a program that does this for me? I want to restrict the set of instructions available to the search. To do this I’ll use a language that looks a little like assembly, with instructions like this:

MOVE R1, R2
MULTIPLY R1, R2 
DIV R1, R2
SQRT R1
INVERT R1
...

I added SQRT so I could easily cover the square root this problem needs. The registers themselves are floating-point registers.
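For illustration, the whole single-instruction space can be enumerated in memory. Here is a sketch in Python; note that the opcodes ADD and SUB are my assumptions (the list above ends in "..."), chosen so that the total matches the 88 possibilities mentioned below:

```python
from itertools import product

# A hypothetical in-memory encoding: each instruction is a tuple of
# (opcode, destination register index, source register index).
REGISTERS = [0, 1, 2, 3]                       # R1..R4 as indices
DUAL_OPS  = ["MOVE", "ADD", "SUB", "MULTIPLY", "DIV"]  # ADD/SUB assumed
UNARY_OPS = ["SQRT", "INVERT"]

def all_instructions():
    """Enumerate every possible single instruction."""
    for op in DUAL_OPS:
        for dst, src in product(REGISTERS, REGISTERS):
            yield (op, dst, src)
    for op in UNARY_OPS:
        for dst in REGISTERS:
            yield (op, dst, None)

# 5 dual ops * 16 register pairs + 2 unary ops * 4 registers = 88
print(len(list(all_instructions())))  # → 88
```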

Now imagine we want to find a program in this programming language, how many options do we have?

Actually that’s quite simple:

#dualRegisterInstructions x all dual register variations (4x4 = 16)
                               +
#unaryRegisterInstructions x all registers (4)

In my case this ends up at 88 possibilities for a single instruction. So the number of candidate programs scales as:

Roughly: 100^#instructions

I had the joy of writing the equation in this language in a mere 12 instructions. Roughly speaking, that means we’d have to iterate over a whole 100^12 programs. This is huge.

If we needed only 1 ms to evaluate one program, we’d still need about 3.17E+13 years. That’s quite a long time, given that the universe is only about 1.38E+10 years old.
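The back-of-the-envelope arithmetic checks out:

```python
# 100 candidate instructions per slot, 12-instruction programs,
# 1 ms per evaluation.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

programs = 100 ** 12                # 1e24 candidate programs
seconds = programs * 1e-3           # 1 ms per program
years = seconds / SECONDS_PER_YEAR
print(f"{years:.2e}")               # → 3.17e+13
```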

So maybe it’s a bit inefficient. Surely we can do better right?

Performance improvements

First of all, some instructions are plainly useless, like MOVE R1, R1, which moves a register onto itself. I know, pointless. But how much do we gain by dropping them?

4 instructions (one identity MOVE per register). Yay. The search still needs multiple big bangs of time.

But hang on: plenty of instructions don’t contribute to the end result at all. How about we only check the programs in which every instruction influences the outcome?

I found a way to build programs like that by keeping a set of live registers while generating the program back to front.

The last instruction of the program must write to the result register. For example MULTIPLY R1, R4, with R4 being the result register. The instruction before it may now only write to R1 or R4, because those are the only registers still read later on. This complicates the generator slightly but has real impact on the numbers.
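A minimal sketch of this backwards construction, under my own assumptions about operand semantics (which register each opcode reads and writes, plus the assumed ADD/SUB opcodes), so the counts it produces won’t necessarily match the table that follows:

```python
from itertools import product

REGISTERS = [0, 1, 2, 3]
DUAL_OPS = ["MOVE", "ADD", "SUB", "MULTIPLY", "DIV"]  # ADD/SUB assumed
UNARY_OPS = ["SQRT", "INVERT"]
RESULT_REG = 3  # R4 holds the final answer

def build_backwards(length, live=None, suffix=()):
    """Generate programs back to front, only emitting instructions whose
    destination register is live, i.e. still read later in the program.
    The live set starts as just the result register."""
    if live is None:
        live = {RESULT_REG}
    if length == 0:
        yield suffix
        return
    for op in DUAL_OPS:
        for dst, src in product(REGISTERS, REGISTERS):
            if dst not in live:
                continue                       # result would never be read
            if op == "MOVE":
                if dst == src:
                    continue                   # MOVE Rn, Rn is a no-op
                new_live = (live - {dst}) | {src}  # MOVE overwrites dst
            else:
                new_live = live | {src}        # op reads both dst and src
            yield from build_backwards(length - 1, new_live,
                                       ((op, dst, src),) + suffix)
    for op in UNARY_OPS:
        for dst in REGISTERS:
            if dst not in live:
                continue
            yield from build_backwards(length - 1, live,
                                       ((op, dst, None),) + suffix)

print(len(list(build_backwards(1))))  # → 21
```

With these semantics the single-instruction count happens to match the table’s 21; the longer counts depend on details of the original generator that I am only guessing at.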

Here are the numbers of programs it checked:

#instructions    #programs         growth rate
1                21                -
2                777               37
3                38073             49
4                2267769           59.56
5                153217785         67.56
6                ~11271120311      73.56
7                ~874213552448     77.56

The numbers for 6 and 7 instructions are extrapolated from the growth rate. So even with my neat trick the result isn’t reasonable to wait for.

If you are thinking A*, don’t be mistaken: there is no reason to think that efficient solutions lie close to other promising candidates, so there is no obvious heuristic to guide the search. I’m still looking for a better way of cutting off useless parts of the search space. Maybe crafting the code generator itself is a trade I’ll need to learn.

How the code generator works

The generator creates all kinds of instruction sequences. But rather than explicitly writing them out to files and compiling/interpreting those, I skip that phase entirely: I keep the instructions in memory and evaluate them there.

I also need to test the programs. To do this I supply input rows with initial register values and the expected value in the result register. If the result register contains the correct value for every row, we have a candidate solution.

This works remarkably quickly; most programs evaluate in less than a millisecond.
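A sketch of how such an in-memory evaluator and test harness might look. The register semantics are my assumptions, including INVERT meaning negation (it could just as well mean reciprocal in the original):

```python
def run(program, registers):
    """Execute a program (a sequence of instruction tuples) on a copy of
    the given register values and return the final register state."""
    regs = list(registers)
    for op, dst, src in program:
        if op == "MOVE":
            regs[dst] = regs[src]
        elif op == "ADD":
            regs[dst] += regs[src]
        elif op == "SUB":
            regs[dst] -= regs[src]
        elif op == "MULTIPLY":
            regs[dst] *= regs[src]
        elif op == "DIV":
            regs[dst] /= regs[src]
        elif op == "SQRT":
            regs[dst] = regs[dst] ** 0.5
        elif op == "INVERT":
            regs[dst] = -regs[dst]  # assumption: negation, not 1/x
    return regs

def is_candidate(program, rows, result_reg=3, tol=1e-9):
    """A program is a candidate solution if it reproduces the expected
    value in the result register for every test row."""
    return all(abs(run(program, inputs)[result_reg] - expected) <= tol
               for inputs, expected in rows)

# Example: does MULTIPLY R1, R1 followed by MOVE R4, R1 compute x^2?
rows = [([3.0, 0, 0, 0], 9.0), ([5.0, 0, 0, 0], 25.0)]
prog = (("MULTIPLY", 0, 0), ("MOVE", 3, 0))
print(is_candidate(prog, rows))  # → True
```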

To limit the number of possible register values, we don’t allow loading explicit constants into registers; allowing that would make the number of possible values extremely high. Small numbers are easy to reproduce with a few tricks anyway: dividing a (non-zero) register by itself yields 1, for example.

Conclusion

Well, people who write algorithms won’t be replaced by automatic code-generation tools any time soon, at least not for big programs. I must admit, though, that there is still room for improvement in this simple tool.

So can we replace the programmer with a program? Well, not really. There is, however, a new way of programming in which you specify the outcome and guide the search algorithm toward efficient programs. Ironically, this would again need a programmer.

The technical implementation is here:

https://github.com/denshade/CodeGeneration

Further work

  • Right now we’re doing code golf: we search for the smallest program. We could also search for the fastest program, or the one using the least memory.
  • We currently don’t do any control logic whatsoever. I know this is a huge limitation.
  • Since everyone is doing it, we are obliged to try out deep learning.
  • Is there a way to be fairly certain about the best program rather than checking them all?
  • No really, is there a more efficient way of calculating the quadratic solution?