Have you heard of the Ackermann function?
This function is simple, but if you try to calculate it naively, soon the computational cost increases and it will never ends.
I've written the function naively, in x86 assembly language to demonstrate some optimization techniques. I could calculate up to ack(3, 11) and the stack overflowed beyond that point.
Optimizations
- Use JMP instead of CALL for tail recursions
- Use TEST + JE for macro fusion
- Use ADD / SUB instead of INC / DEC to eliminate false dependency in flag register
- Use MOV EDX, 1 instead of MOV EDX, 1 to reduce code size
.code
ack PROC public
; RCX = m, RDX = n
TEST RCX, RCX
JE m_zero
TEST RDX, RDX
JE n_zero
PUSH RCX
SUB RDX, 1
CALL ack
POP RCX
SUB RCX, 1
MOV RDX, RAX
JMP ack
n_zero:
MOV EDX, 1
SUB RCX, 1
JMP ack
m_zero:
MOV RAX, RDX
ADD RAX, 1
RET
ack ENDP
factorial PROC public
MOV EAX, 1
CMP RCX, 1
JLE f_ret
f_loop:
IMUL RAX, RCX
SUB RCX, 1
CMP RCX, 1
JNLE f_loop
f_ret:
RET
factorial ENDP
END
This program was faster than the C version below by 30%(C version was compiled with Visual C++ 2010 with /O2 option), on a Core 2 Duo processor. The factorial function was almost as fast as the C version.
int64_t factorial_c(int64_t n)
{
int64_t r = 1;
while(n > 1) r *= n--;
return r;
}
int64_t ack_c(int64_t m, int64_t n)
{
return m == 0 ? n + 1 : n == 0 ? ack_c(m - 1, 1) : ack_c(m - 1, ack_c(m, n - 1));
}