"2) most of them are designed to work for the full range of arguments. Often you know that you only need to work on a limited range of one parameter, and you can find much simpler versions for limited ranges. "
June 13, 2009 at 11:13 PM
.. can be turned into multiplies and shifts as I'm sure you know. Often on x86 this is done most efficiently
through use of the "mul high" capability (the fact that you get 64 bit multiply output for free and can just
grab the top dword).
Sadly, C doesn't give you a way to do mul high, AND stupidly MS/Intel don't seen to provide any intrinsic to
do it in a clean way. After much fiddling I've found that this usually works on MSVC :
uint32 Mul32High(uint32 a,uint32 b)
{
return (uint32)( ((uint64) a * b) >>32);
}
but make sure to check your assembly. This should assemble to just a "mul" and "ret edx".
Now, for the actual divider lookup, there are lots of references out there on the net, but most of them
are not terribly useful because 1) lots of them assume expensive multiplies, and 2) most of them are designed
to work for the full range of arguments. Often you know that you only need to work on a limited range of
one parameter, and you can find much simpler versions for limited ranges.
One excellent page is :
Jones on Reciprocal Multiplication
(he actually talks about the limited range simplifications, unlike the canonical references).
The best reference I've found is the AMD Optimization Guide. They have a big section about the theory,
and also two programs "sdiv.exe" and "udiv.exe" that spit out code for you! Unfortunately it's really
fucking hard to find on their web site. sdiv and udiv were shipped on AMD developer CD's and appear to
have disappeared from the web. I've found one
FTP Archive here . You can find the AMD PDF's on their site, as these links may break :
PDF 1 ,
PDF 2 .
And actually this little CodeProject
FindMulShift is not bad; it just does a brute force search for simple mul shift approximations for
limited ranges of numerators.
But it's written in a not-useful way. You should just find the shift that gives you maximum range. So I
did that, it took two seconds, and here's the result for you :
__forceinline uint32 IntegerDivideConstant(uint32 x,uint32 divisor)
{
ASSERT( divisor > 0 && divisor <= 16 );
if ( divisor == 1 )
{
return x;
}
else if ( divisor == 2 )
{
return x >> 1;
}
else if ( divisor == 3 )
{
ASSERT( x < 98304 );
return ( x * 0x0000AAAB ) >> 17;
}
else if ( divisor == 4 )
{
return x >> 2;
}
else if ( divisor == 5 )
{
ASSERT( x < 81920 );
return ( x * 0x0000CCCD ) >> 18;
}
else if ( divisor == 6 )
{
ASSERT( x < 98304 );
return ( x * 0x0000AAAB ) >> 18;
}
else if ( divisor == 7 )
{
ASSERT( x < 57344 );
return ( x * 0x00012493 ) >> 19;
}
else if ( divisor == 8 )
{
return x >> 3;
}
else if ( divisor == 9 )
{
ASSERT( x < 73728 );
return ( x * 0x0000E38F ) >> 19;
}
else if ( divisor == 10 )
{
ASSERT( x < 81920 );
return ( x * 0x0000CCCD ) >> 19;
}
else if ( divisor == 12 )
{
ASSERT( x < 90112 );
return ( x * 0x0000BA2F ) >> 19;
}
else if ( divisor == 13 )
{
ASSERT( x < 98304 );
return ( x * 0x0000AAAB ) >> 19;
}
else if ( divisor == 13 )
{
ASSERT( x < 212992 );
return ( x * 0x00004EC5 ) >> 18;
}
else if ( divisor == 14 )
{
ASSERT( x < 57344 );
return ( x * 0x00012493 ) >> 20;
}
else if ( divisor == 15 )
{
ASSERT( x < 74909 );
return ( x * 0x00008889 ) >> 19;
}
else if ( divisor == 16 )
{
return x >> 4;
}
else
{
CANT_GET_HERE();
return x / divisor;
}
}
Note : an if/else tree is better than a switch() because we're branching on constants. This should all get
removed by the compiler. Some compilers get confused by large switches, even on constants, and fail to remove
them.
7 seems to be the worst number for all of these methods. It only works here up to 57344 (not quite 16 bits).
"06-13-09 - Integer division by constants"
4 Comments -
Hi,
Actually compiler already does this for you... ;)
For example this is what compiler generate for:
return a/15;
MSVC:
mov eax, -2004318071
mul DWORD PTR _a$[esp-4]
shr edx, 3
mov eax, edx
GCC:
9 0003 BA898888 movl $-2004318071, %edx
9 88
10 0008 8B4508 movl 8(%ebp), %eax
11 000b F7E2 mull %edx
12 000d C1EA03 shrl $3, %edx
13 0010 89D0 movl %edx, %eax
Anyway, check this out:
http://hackersdelight.org/magic.htm
BKLA
June 13, 2009 at 3:26 PM
Sigh.
June 13, 2009 at 4:08 PM
The canonical reference for this stuff is Granlund and Montgomery's "Division by invariant integers using multiplication" from 1994Ö
http://portal.acm.org/citation.cfm?id=178249
Peter Montgomery did the math, Torbjörn did the implementation in gcc.
June 13, 2009 at 10:25 PM
"2) most of them are designed to work for the full range of arguments. Often you know that you only need to work on a limited range of one parameter, and you can find much simpler versions for limited ranges. "
June 13, 2009 at 11:13 PM