## 2012年3月27日火曜日

### Ackermann function and optimization

Have you heard of the Ackermann function?

This function is simple, but if you try to calculate it naively, soon the computational cost increases and it will never ends.

I've written the function naively, in x86 assembly language to demonstrate some optimization techniques. I could calculate up to ack(3, 11) and the stack overflowed beyond that point.

### Optimizations

• Use JMP instead of CALL for tail recursions
• Use TEST + JE for macro fusion
• Use ADD / SUB instead of INC / DEC to eliminate false dependency in flag register
• Use MOV EDX, 1 instead of MOV EDX, 1 to reduce code size
```    .code

ack PROC public

; RCX = m, RDX = n
TEST    RCX, RCX
JE      m_zero

TEST    RDX, RDX
JE      n_zero

PUSH    RCX
SUB     RDX, 1
CALL    ack
POP     RCX
SUB     RCX, 1
MOV     RDX, RAX
JMP     ack

n_zero:
MOV     EDX, 1
SUB     RCX, 1
JMP     ack

m_zero:
MOV     RAX, RDX
RET

ack ENDP

factorial PROC public

MOV     EAX, 1
CMP     RCX, 1
JLE     f_ret

f_loop:
IMUL    RAX, RCX
SUB     RCX, 1
CMP     RCX, 1
JNLE    f_loop

f_ret:

RET

factorial ENDP

END```

This program was faster than the C version below by 30%(C version was compiled with Visual C++ 2010 with /O2 option), on a Core 2 Duo processor. The factorial function was almost as fast as the C version.

```int64_t factorial_c(int64_t n)
{
int64_t r = 1;
while(n > 1) r *= n--;
return r;
}

int64_t ack_c(int64_t m, int64_t n)
{
return m == 0 ? n + 1 : n == 0 ? ack_c(m - 1, 1) : ack_c(m - 1, ack_c(m, n - 1));
}
```