2012年3月27日火曜日

Ackermann function and optimization

Have you heard of the Ackermann function?

This function is simple, but if you try to calculate it naively, soon the computational cost increases and it will never ends.

I've written the function naively, in x86 assembly language to demonstrate some optimization techniques. I could calculate up to ack(3, 11) and the stack overflowed beyond that point.

Optimizations

  • Use JMP instead of CALL for tail recursions
  • Use TEST + JE for macro fusion
  • Use ADD / SUB instead of INC / DEC to eliminate false dependency in flag register
  • Use MOV EDX, 1 instead of MOV EDX, 1 to reduce code size
    .code

ack PROC public

    ; RCX = m, RDX = n
    TEST    RCX, RCX
    JE      m_zero

    TEST    RDX, RDX
    JE      n_zero

    PUSH    RCX
    SUB     RDX, 1
    CALL    ack
    POP     RCX
    SUB     RCX, 1
    MOV     RDX, RAX
    JMP     ack

n_zero:
    MOV     EDX, 1
    SUB     RCX, 1
    JMP     ack

m_zero:
    MOV     RAX, RDX
    ADD     RAX, 1
    RET

    ack ENDP

factorial PROC public
    
    MOV     EAX, 1
    CMP     RCX, 1
    JLE     f_ret

f_loop:
    IMUL    RAX, RCX
    SUB     RCX, 1
    CMP     RCX, 1
    JNLE    f_loop

f_ret:

    RET

    factorial ENDP

    END

This program was faster than the C version below by 30%(C version was compiled with Visual C++ 2010 with /O2 option), on a Core 2 Duo processor. The factorial function was almost as fast as the C version.

int64_t factorial_c(int64_t n)
{
    int64_t r = 1;
    while(n > 1) r *= n--;
    return r;
}

int64_t ack_c(int64_t m, int64_t n)
{
    return m == 0 ? n + 1 : n == 0 ? ack_c(m - 1, 1) : ack_c(m - 1, ack_c(m, n - 1));
}

0 件のコメント:

コメントを投稿