Have you heard of the Ackermann function?
This function is simple, but if you try to calculate it naively, soon the computational cost increases and it will never ends.
I've written the function naively, in x86 assembly language to demonstrate some optimization techniques. I could calculate up to ack(3, 11) and the stack overflowed beyond that point.
Optimizations
- Use JMP instead of CALL for tail recursions
- Use TEST + JE for macro fusion
- Use ADD / SUB instead of INC / DEC to eliminate false dependency in flag register
- Use MOV EDX, 1 instead of MOV EDX, 1 to reduce code size
.code ack PROC public ; RCX = m, RDX = n TEST RCX, RCX JE m_zero TEST RDX, RDX JE n_zero PUSH RCX SUB RDX, 1 CALL ack POP RCX SUB RCX, 1 MOV RDX, RAX JMP ack n_zero: MOV EDX, 1 SUB RCX, 1 JMP ack m_zero: MOV RAX, RDX ADD RAX, 1 RET ack ENDP factorial PROC public MOV EAX, 1 CMP RCX, 1 JLE f_ret f_loop: IMUL RAX, RCX SUB RCX, 1 CMP RCX, 1 JNLE f_loop f_ret: RET factorial ENDP END
This program was faster than the C version below by 30%(C version was compiled with Visual C++ 2010 with /O2 option), on a Core 2 Duo processor. The factorial function was almost as fast as the C version.
int64_t factorial_c(int64_t n) { int64_t r = 1; while(n > 1) r *= n--; return r; } int64_t ack_c(int64_t m, int64_t n) { return m == 0 ? n + 1 : n == 0 ? ack_c(m - 1, 1) : ack_c(m - 1, ack_c(m, n - 1)); }