Tuesday, March 9, 2010

#gcc: speed-tests, inline assembly & volatile declarations

here are some quick tests with gcc (3.x / 4.x) and axonlib.
note: "axAbs" (alt: fabs) and "axExp" (alt: expf) are part of "axMath.h"

[*] gcc: speed comparison (win32)

gcc-3.4.2 is the official mingw build, while gcc-4.4.1 is an unofficial build.

test::

n of iterations:
10e+6

calculation:
expf(axAbs(-3.141592))

flag: -O2
gcc-3.4.2: 143ms
gcc-4.4.1: 6ms

flag: -O3
gcc-3.4.2: 130ms
gcc-4.4.1: 5ms

conclusion:
gcc-4.4.1 generates much faster code for this test: roughly 24x faster at -O2 and 26x faster at -O3.
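
the harness behind these numbers is not shown in the post, so here is a minimal sketch of how such a timing loop could look. everything in it is an assumption: "time_calc_ms", "calc_fn" and "abs_plain" are hypothetical names, and "abs_plain" only stands in for the real workload so the sketch links without libm. to repeat the test above you would pass a wrapper around expf(axAbs(x)) instead.

```c
#include <assert.h>
#include <time.h>

/* workload signature: one float in, one float out.
   for the test above this would wrap expf(axAbs(x)). */
typedef float (*calc_fn)(float);

/* hypothetical stand-in workload, not part of axonlib */
static float abs_plain(float x)
{
  return x < 0.0f ? -x : x;
}

/* calls fn(x) n times and returns the elapsed cpu time in milliseconds.
   x is read through a volatile so a constant argument is not folded away
   at compile time, and the results are summed into *sum_out so they are used. */
static double time_calc_ms(calc_fn fn, float x, long n, float *sum_out)
{
  volatile float input = x;
  float sum = 0.0f;
  long i;
  clock_t start, stop;

  assert(fn != 0 && n > 0 && sum_out != 0);

  start = clock();
  for (i = 0; i < n; i++)
    sum += fn(input);
  stop = clock();

  *sum_out = sum;
  return 1000.0 * (double)(stop - start) / (double)CLOCKS_PER_SEC;
}
```

usage would be something like "ms = time_calc_ms(abs_plain, -3.141592f, 10000000L, &sum);". clock() measures cpu time with a fairly coarse resolution, so very short runs can report 0 ms.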

[*] gcc: macros vs inline functions.

an article with some more in-depth information.

test::

n of iterations:
10e+6

calculation:
axAbs(x) as a macro vs as an inline function, with expf(x) added to give the loop some weight.


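// macro version: a gcc statement expression that evaluates to expf(|x|)
#define t0(x) \
({ \
  register float _x; \
  /* clear the sign bit of the float (x86 only) */ \
  __asm__ ( "andl $0x7fffffff, %0;" : "=r" (_x) : "0" ((float)(x)) ); \
  expf(_x); \
})

// -- vs --

// inline version: same asm, but x is a typed function argument
inline float t1(float x)
{
  __asm__ ( "andl $0x7fffffff, %0;" : "=r" (x) : "0" (x) );
  return expf(x);
}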


flag:
-O2

gcc-3.4.2:
macro: 143ms
inline: 145ms

gcc-4.4.1:
macro: 5ms
inline: 6ms

conclusion:
inline functions are safer (typed arguments, no textual expansion surprises), and the difference in performance is negligible.
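
to show what "safer" can mean in practice, here is a small sketch of the classic macro pitfall: an argument with a side effect gets evaluated more than once. this is a generic illustration and not the axonlib code. "ABS_MACRO", "abs_inline", "next_sample" and "extra_macro_evaluations" are made-up names.

```c
#include <assert.h>

/* counts how often the argument expression is evaluated */
static int calls = 0;

/* hypothetical input source with a visible side effect */
static float next_sample(void)
{
  calls++;
  return -1.5f;
}

/* classic abs macro: the argument text is pasted in more than once */
#define ABS_MACRO(x) ((x) < 0 ? -(x) : (x))

/* inline version: the argument is evaluated once, then passed by value */
static inline float abs_inline(float x)
{
  return x < 0 ? -x : x;
}

/* runs both forms and returns how many extra evaluations the macro caused */
static int extra_macro_evaluations(void)
{
  int macro_calls, inline_calls;
  float a, b;

  calls = 0;
  a = ABS_MACRO(next_sample());   /* next_sample() runs twice */
  macro_calls = calls;

  calls = 0;
  b = abs_inline(next_sample());  /* next_sample() runs once */
  inline_calls = calls;

  assert(a == 1.5f && b == 1.5f); /* same value either way */
  return macro_calls - inline_calls;
}
```

a statement-expression macro that copies its argument into a local first avoids the double evaluation, but that is a gcc extension and still gives no argument type checking.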

[*] gcc: to "volatile" or not and its impact on performance.

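a variable or asm statement declared "volatile" is exempt from certain optimizations: every access to a volatile variable must actually be performed, and gcc will not delete an "asm volatile" statement or move it out of a loop.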

quote:
(on writing code for hardware control) Any self-respecting optimizing compiler would notice that the loop tests the same memory address over and over again. It would almost certainly arrange to reference memory once only, and copy the value into a hardware register, thus speeding up the loop. This is, of course, exactly what we don't want; this is one of the few places where we must look at the place where the pointer points, every time around the loop.
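
the situation in the quote can be sketched like this. it is a minimal example under assumptions: "poll_until_ready" is a hypothetical helper, and the status register is just a pointer passed in by the caller rather than a real hardware address.

```c
#include <assert.h>

/* polls *status until any bit in ready_mask is set, or until max_polls
   reads have been made. returns the number of reads performed, or -1 on
   timeout. because status points to a volatile object, the compiler has
   to re-read the memory location on every pass instead of caching the
   value in a register. */
static int poll_until_ready(volatile unsigned int *status,
                            unsigned int ready_mask,
                            int max_polls)
{
  int polls;

  assert(status != 0 && max_polls > 0);

  for (polls = 1; polls <= max_polls; polls++) {
    if (*status & ready_mask)
      return polls;
  }
  return -1;
}
```

without the volatile qualifier the optimizer would be allowed to read *status once before the loop, which is exactly the behaviour the quote warns about.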

test::
asm volatile (..) vs asm (..)

calculation:
axExp(x)

n of iterations:
10e+7

flag:
-O2

gcc-4.4.1:
asm volatile: 298ms
asm: 2ms

gcc-3.4.2:
asm volatile: 291ms
asm: 13ms

flag:
-O3

gcc-4.4.1:
asm volatile: 297ms
asm: 2ms

gcc-3.4.2:
asm volatile: 290ms
asm: 12ms

conclusion:
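asm statements declared "volatile" run much slower with both compilers (at -O2 roughly 150x with gcc-4.4.1 and 22x with gcc-3.4.2), most likely because a plain asm with unchanged inputs can be moved out of the loop or dropped, while a volatile one has to run on every iteration.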
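
the axExp code used in this test is not shown, so the two variants are sketched here with the sign-bit trick from the previous section instead. this is an illustration under assumptions: "abs_asm", "abs_asm_volatile" and "sum_abs" are hypothetical names, the float bits are moved through a uint32_t rather than handed to the "r" constraint directly, and non-x86 targets fall back to plain C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define HAVE_X86_ASM 1
#else
#define HAVE_X86_ASM 0
#endif

/* plain asm: the only effect is the output operand, so the optimizer may
   drop the statement or hoist it out of a loop when the input is unchanged */
static inline float abs_asm(float x)
{
  uint32_t bits;
  memcpy(&bits, &x, sizeof bits);
#if HAVE_X86_ASM
  __asm__ ("andl $0x7fffffff, %0" : "=r" (bits) : "0" (bits) : "cc");
#else
  bits &= 0x7fffffffu;
#endif
  memcpy(&x, &bits, sizeof bits);
  return x;
}

/* volatile asm: same instruction, but it may not be deleted or hoisted */
static inline float abs_asm_volatile(float x)
{
  uint32_t bits;
  memcpy(&bits, &x, sizeof bits);
#if HAVE_X86_ASM
  __asm__ __volatile__ ("andl $0x7fffffff, %0" : "=r" (bits) : "0" (bits) : "cc");
#else
  bits &= 0x7fffffffu;
#endif
  memcpy(&x, &bits, sizeof bits);
  return x;
}

/* the kind of loop being timed: x never changes inside it, so only the
   volatile variant is forced to execute the asm on every iteration */
static float sum_abs(float x, long n, int use_volatile)
{
  float sum = 0.0f;
  long i;

  assert(n > 0);
  for (i = 0; i < n; i++)
    sum += use_volatile ? abs_asm_volatile(x) : abs_asm(x);
  return sum;
}
```

both variants return the same values. only the freedom given to the optimizer differs, which would explain why the gap shows up in the timings and not in the results.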

----------
tested with athlon xp 1800+

lubomir
