Automatic CPU Detection
Submitted by |
Many math libraries provide several versions of the same function, each
optimized for a specific processor. For example, a square root approximation
using SSE takes only a few clock cycles, compared to dozens or even hundreds
of clock cycles for a version that does not use SSE. But the benefit of using
SSE is lost if we have to check for processor support on every call.
Here I present a method that not only eliminates this inefficiency, but also
does so automatically:
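#include "CPUID.hpp"
#include <math.h>

float fsqrtInit(float x);                 // Forward declaration
static float (*fsqrt)(float) = fsqrtInit; // Function pointer, starts at the initializer

float sqrtSSE(float x) // Fast SSE implementation (approximate, MSVC inline assembly)
{
    __asm
    {
        movss   xmm0, x
        rsqrtss xmm0, xmm0 // xmm0 ~ 1 / sqrt(x)
        rcpss   xmm0, xmm0 // xmm0 ~ 1 / (1 / sqrt(x)) = sqrt(x)
        movss   x, xmm0
    }
    return x;
}

float fsqrtInit(float x) // Initialization function, runs on the first call only
{
    if(CPUID::supportsSSE())
    {
        fsqrt = sqrtSSE; // SSE available: use the fast approximation
    }
    else
    {
        fsqrt = sqrtf;   // No SSE: fall back to the standard library
    }
    return fsqrt(x);     // Forward this first call to the chosen function
}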
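The inline assembly in the listing is specific to 32-bit Microsoft compilers. As a rough sketch of the same approximation for other compilers, the version below uses the SSE intrinsics from `<xmmintrin.h>`. The function name `sqrtApproxSSE` and the preprocessor guard are my own assumptions and not part of the original code. On targets without SSE it simply falls back to `std::sqrt`.

```cpp
#include <cmath>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>

// Hypothetical intrinsics counterpart of sqrtSSE: approximates sqrt(x)
// as the reciprocal of the reciprocal square root.
static float sqrtApproxSSE(float x)
{
    __m128 v = _mm_set_ss(x);  // load x into the lowest lane
    v = _mm_rsqrt_ss(v);       // v ~ 1 / sqrt(x)
    v = _mm_rcp_ss(v);         // v ~ sqrt(x)
    return _mm_cvtss_f32(v);   // extract the lowest lane
}
#else
// Fallback for targets without SSE: exact, but not the fast path.
static float sqrtApproxSSE(float x)
{
    return std::sqrt(x);
}
#endif
```

Both instructions are low-precision approximations, so the result is only accurate to roughly three decimal digits. That is fine for many graphics tasks but not a drop-in replacement for `sqrtf` everywhere.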
The trick is to use a function pointer, here fsqrt ("fast sqrt"). When SSE
support is detected, we make it point to our SSE implementation; otherwise we
point it to the regular sqrtf function. We could do this in an
initialization function that is called at the beginning of our application,
but this is cumbersome and easy to forget. My solution is to initialize the
fsqrt function pointer with the initialization function itself!
So when fsqrt is called the first time, it actually calls the initialization
function and then the correct square root function. The second time it is
called, it does not check for SSE support any more, but immediately calls
the best square root function. So the initialization is done exactly once
and you never have to worry about it.
The obvious drawback is that a call through a function pointer cannot be
inlined. This isn't too bad, because a function call is very well optimized
on modern processors and much faster than a mispredicted jump. Furthermore,
if you need top performance, you shouldn't use this trick for such small
functions, but optimize the whole algorithm. As with all optimizations, only
apply it if it matters and when it matters.
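One possible reading of "optimize the whole algorithm" is to apply the same self-initializing pointer to a routine that processes a whole array, so the indirect call is paid once per batch instead of once per value. The sketch below assumes this interpretation. All names in it (`detectFastBatchPath`, `sqrtArray` and its variants) are hypothetical, and the "fast" variant is only a placeholder for a real SSE loop.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for CPUID::supportsSSE().
static bool detectFastBatchPath() { return false; }

// Portable batch routine: one call handles n values.
static void sqrtArrayPortable(const float* in, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = std::sqrt(in[i]);
}

// Placeholder for an SSE batch routine; a real one would process
// several floats per instruction inside this loop.
static void sqrtArrayFast(const float* in, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = std::sqrt(in[i]);
}

static void sqrtArrayInit(const float* in, float* out, std::size_t n);
static void (*sqrtArray)(const float*, float*, std::size_t) = sqrtArrayInit;

static void sqrtArrayInit(const float* in, float* out, std::size_t n)
{
    // Select the batch routine once, then forward the first call to it.
    sqrtArray = detectFastBatchPath() ? sqrtArrayFast : sqrtArrayPortable;
    sqrtArray(in, out, n);
}
```

Inside each batch routine the per-element work is ordinary code the compiler is free to inline and unroll, which the per-value pointer cannot offer.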
Regards,
Nicolas "Nick" Capens