Post a Comment On: cbloom rants

"05-26-09 - Storing Floating Points"

7 Comments -

Blogger won3d said...

The observation that floats form a piecewise linear approximation of lg is a neat one, and is how Blinn and the old school guys used to do wacky approximations of various transcendentals. Always a crowd pleaser.
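A minimal sketch of that observation, assuming IEEE 754 binary32 and positive normal inputs. The helper names `float_bits` and `approx_log2` are made up for illustration. Reading the bit pattern as an integer, dividing by 2^23 and subtracting the exponent bias of 127 gives a piecewise-linear estimate of log2 that is exact at powers of two and a little low in between:

```python
import math
import struct

def float_bits(x):
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def approx_log2(x):
    """Piecewise-linear log2 estimate read straight off the bit pattern.

    Only meaningful for positive, normal float32 values.
    """
    return float_bits(x) / (1 << 23) - 127.0

# Compare the estimate with the real log2 for a few sample values.
for x in (0.1, 1.0, 1.5, 3.0, 1000.0):
    print(x, approx_log2(x), math.log2(x))
```

The estimate interpolates linearly between consecutive powers of two, which is why the error vanishes at 1, 2, 4, ... and peaks somewhere in the middle of each octave.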

But regarding negative exponents: exponents are stored in IEEE 754 as biased numbers, not two's complement. So isn't your "gap at 0" a question of degree? That is, correctable via a power-of-2 multiplicative bias (aka scalb). Then wouldn't your "denorms" just be actual denorms?

May 26, 2009 at 4:10 PM

Blogger cbloom said...

Yeah if I understand you correctly I believe you are right.

Basically, with a normal IEEE 32-bit float you have 127*2^23 values between 0 and 1.0. You can reduce that gap by multiplying your number by a power of two with a large negative exponent.

So if you want almost no gap, you multiply your number by 2^-127 first; then you can just grab the E & M straight out of the floating point number, and the hardware will do the denorm for you.
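A rough sketch of the scaling idea, assuming IEEE 754 binary32 with denormals enabled (no flush-to-zero). The names `float_bits`, `codes_below_one` and `split_fields` are hypothetical. Python floats are doubles, so the scaled value is rounded to binary32 by `struct.pack` before the fields are extracted:

```python
import struct

def float_bits(x):
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def codes_below_one(shift):
    """Count the float32 bit patterns in [0, 1.0) after scaling by 2**-shift."""
    return float_bits(2.0 ** -shift)

def split_fields(x, shift):
    """Scale x by 2**-shift and return (exponent_field, mantissa_field)."""
    bits = float_bits(x * 2.0 ** -shift)
    return (bits >> 23) & 0xFF, bits & 0x7FFFFF

# Unscaled: every exponent field from 0 to 126 lies below 1.0.
print(codes_below_one(0), 127 * 2**23)

# Scaled by 2**-127: everything below 1.0 lands in the denormal range,
# so the exponent field is 0 and the mantissa is a plain linear quantization.
for x in (0.25, 0.5, 1.0, 2.0, 3.0):
    print(x, split_fields(x, 127))
```

With the 2^-127 scale, values below 1.0 share a single linear step, and the usual one-octave-per-exponent layout only starts at 2.0.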

May 26, 2009 at 4:42 PM

Blogger ryg said...

That should work, but it limits your code to platforms and environments that are really IEEE compliant; e.g. SPUs don't have denormals IIRC, and on PCs you can switch SSE computations to "flush to zero" mode, where denormals are automatically truncated to zero instead of going through a painfully slow microcoded path. Other CPUs have a similar mode. You'd have to be careful to make sure that you're in IEEE compliant mode before using real denormals, and even so, brace yourself for a serious performance impact: denormals are very rare in normal computations, so the hardware implementations are seriously slow, as in "microcoded with no concurrent completion of other FP ops" slow. If there's a significant percentage of values in that range (and after all that's the whole point), you're most likely better off implementing this manually. That's something I'd check, though; at least a few years ago, the penalty for having lots of denormals in your calculation was severe. I've seen slowdowns by a factor of 200 and more. (DSP apps like IIR filters and comb filters are notorious for this: if your source signal goes to zero and stays there for a few seconds, the time spent in these filters will suddenly explode as the state variables all go denormal around the same time, unless you manually clamp them to zero.)
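The manual clamp mentioned for IIR filters can be sketched roughly as follows. This is only an illustration: Python floats are binary64, so the denormal range here is the double one, the threshold `TINY` and the names `one_pole` and `is_denormal` are made up, and no timing claim is implied.

```python
import sys

# Hypothetical threshold: far above the denormal range, far below any
# signal level that matters. Pick it to suit the application.
TINY = 1e-30

def is_denormal(v):
    """True if v is a nonzero double smaller than the smallest normal."""
    return v != 0.0 and abs(v) < sys.float_info.min

def one_pole(samples, a=0.5, flush=True):
    """One-pole IIR filter y[n] = a*y[n-1] + x[n], with optional state clamp."""
    y = 0.0
    out = []
    for x in samples:
        y = a * y + x
        if flush and abs(y) < TINY:
            y = 0.0  # clamp the state before it can decay into denormals
        out.append(y)
    return out

# An impulse followed by a long run of silence.
signal = [1.0] + [0.0] * 1100
print(sum(is_denormal(v) for v in one_pole(signal, flush=False)))
print(sum(is_denormal(v) for v in one_pole(signal, flush=True)))
```

Without the clamp, the state keeps halving through the whole denormal range before it finally reaches zero; with it, the state snaps to zero long before that.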

May 26, 2009 at 4:50 PM

Blogger won3d said...

Yeah, it is probably no good to depend on IEEE 754 compliance or the lack thereof. Then again, I think something like "flush to zero" is pretty common on the platforms you likely care about. Maybe you can just use that as part of the quantizer.

May 26, 2009 at 4:59 PM

Blogger cbloom said...

addendum :

I just discovered this is basically the Lindstrom "fast efficient floating point" method.

May 27, 2009 at 5:58 PM

Blogger cbloom said...

see:

http://www.cc.gatech.edu/~lindstro/

http://www.cs.unc.edu/~isenburg/

in particular :

http://www.cs.unc.edu/~isenburg/lcpfpv/

May 27, 2009 at 6:03 PM

Blogger Tom Forsyth said...

The graphics formats like float16, float11/10, and the 360-specific one that is 7e3 all use tricks like this to focus precision where it's needed - though there's some argument as to whether 7e3 did a good job or if 6e4 would have been a better choice.

As well as changing bit sizes, many of these have biases on the exponents to focus the precision away from 0-1, and all require full-speed support of denormals.
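As a sketch of how such small formats decode, here is a generic unsigned decoder parameterized by exponent width, mantissa width and bias, with denormals handled explicitly. The function name and parameterization are assumptions for illustration, infinities and NaNs are ignored, and the exact layouts of float11/10 or 7e3 are not reproduced here. It is checked below against float16 (5 exponent bits, 10 mantissa bits, bias 15), which Python's `struct` module can decode natively:

```python
import struct

def decode_small_float(bits, exp_bits, man_bits, bias):
    """Decode an unsigned small float with denormals (no inf/NaN handling)."""
    man = bits & ((1 << man_bits) - 1)
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    if exp == 0:
        # Denormal: no implicit leading 1, exponent pinned at 1 - bias,
        # so the codes are evenly spaced all the way down to zero.
        return man * 2.0 ** (1 - bias - man_bits)
    # Normal: implicit leading 1 plus the fractional mantissa.
    return (1.0 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

def half_from_bits(bits):
    """Reference float16 decode using struct's native 'e' format."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

# Smallest denormal, largest denormal, smallest normal, and 1.0 in float16.
for code in (0x0001, 0x03FF, 0x0400, 0x3C00):
    print(hex(code), decode_small_float(code, 5, 10, 15), half_from_bits(code))
```

Changing `bias` moves the whole set of representable values up or down by a power of two, which is the lever for deciding how much precision is spent between 0 and 1.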

May 28, 2009 at 8:51 AM
