E.g., when converting floating-point samples to 16-bit integers, there is no universal agreement on the scale. Some (e.g., JACK) think that 1.0 should convert to 32767 and -1.0 to -32767. Within that model, there are two opinions on how to get -32768 (the most negative integer representable in 16 bits): "-1.00003" or "you can't". The other viewpoint, implemented, e.g., in ALSA-lib, is that -1.0 should map to -32768, 0.99997 should give 32767, and 1.0 should clip. There is also a third possibility, namely to use different scale factors for negative and positive sample values, so that 0.0 maps to 0, 1.0 maps to 32767, and -1.0 maps to -32768.
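To make the three conventions concrete, here is a minimal sketch of each in C. The function names and the `clampf` helper are made up for illustration, the results are truncated toward zero rather than rounded, and none of this is taken from the JACK or ALSA-lib sources.

```c
#include <stdint.h>

/* Hypothetical helper: limit x to the range [lo, hi]. */
static float clampf(float x, float lo, float hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Symmetric scale: 1.0 -> 32767, -1.0 -> -32767, -32768 is never produced. */
int16_t conv_symmetric(float f)
{
    return (int16_t)(clampf(f, -1.0f, 1.0f) * 32767.0f);
}

/* Scale of 32768: -1.0 -> -32768, and 1.0 would give 32768, so it clips. */
int16_t conv_full_scale(float f)
{
    return (int16_t)clampf(f * 32768.0f, -32768.0f, 32767.0f);
}

/* Different scale factors for negative and positive samples. */
int16_t conv_asymmetric(float f)
{
    f = clampf(f, -1.0f, 1.0f);
    return (int16_t)(f < 0.0f ? f * 32768.0f : f * 32767.0f);
}
```

All three map 0.0 to 0. They differ only in what happens near the ends of the range.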

And now some code. If one adopts the 32767.0 scale factor (as in JACK) and declares the -32768 sample value as being unreachable, one can use the following C function for conversion:

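    #include <stdint.h>

    /* Convert a float sample to 16 bits with a scale factor of 32767. */
    int16_t to_int16(float f)
    {
        /* Clip to [-1.0, 1.0] so that the product fits into int16_t. */
        if (f > 1.0) f = 1.0;
        if (f < -1.0) f = -1.0;
        return (int16_t)(f * 0x7fff);
    }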

As expected, to_int16(1.0) == 32767 and to_int16(-1.0) == -32767. However, the following quite similar function that is supposed to convert floating-point samples to 32-bit signed ones doesn't actually work properly:

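    #include <stdint.h>

    /* Broken on purpose: the same approach with a 32-bit scale factor. */
    int32_t to_int32(float f)
    {
        if (f > 1.0) f = 1.0;
        if (f < -1.0) f = -1.0;
        /* 0x7fffffff is converted to float here, and that is the problem. */
        return (int32_t)(f * 0x7fffffff);
    }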

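Indeed, to_int32(1.0) returns the minimum, not the maximum possible 32-bit integer (at least on x86). That happens because, unlike 0x7fff, 0x7fffffff cannot be represented exactly as a single-precision floating-point number, which has only a 24-bit significand. So the nearest representable value is substituted, and that's 0x80000000, i.e. 2^31. The product is then out of range for int32_t. Converting an out-of-range floating-point value to an integer is undefined behavior in C, and in practice the function returns a wrong result.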
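The rounding is easy to check directly. The helper below (a hypothetical name, and it assumes that float is IEEE 754 single precision) pushes an integer through a float and reports what the float actually holds.

```c
#include <stdint.h>

/* Store n in a float, then widen to double so the stored value can be inspected. */
double through_float(int32_t n)
{
    volatile float f = (float)n;  /* volatile forces a real single-precision store */
    return (double)f;
}
```

Under that assumption, through_float(0x7fff) gives 32767.0, while through_float(0x7fffffff) gives 2147483648.0, one more than the largest int32_t value.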

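One fix would be to cast 0x7fffffff to a double-precision floating-point number, so that the multiplication is done in double precision, where the constant is exact. Another (but not 100% equivalent) option is to change the scale factor so that it is close enough, but exactly representable in single precision and therefore doesn't cause this overflow, i.e. to 0x7fffff00.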
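Both fixes could look roughly like this. The function names are invented for this sketch, and, as with the functions above, results are truncated toward zero.

```c
#include <stdint.h>

/* Fix 1: multiply in double precision, where 0x7fffffff is exact. */
int32_t to_int32_double(float f)
{
    if (f > 1.0f) f = 1.0f;
    if (f < -1.0f) f = -1.0f;
    return (int32_t)((double)f * 0x7fffffff);
}

/* Fix 2: use a slightly smaller scale factor that a float holds exactly. */
int32_t to_int32_reduced(float f)
{
    if (f > 1.0f) f = 1.0f;
    if (f < -1.0f) f = -1.0f;
    return (int32_t)(f * (float)0x7fffff00);
}
```

The first variant reaches 0x7fffffff at 1.0, while the second tops out at 0x7fffff00. That difference is why the two options are not 100% equivalent.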

## 2 comments:

There's also a fourth possibility that can losslessly represent all 16 bit values without requiring 2 different scale factors:

When converting int16 samples to float, you first add 0.5 and then divide by 32767.5. When converting float to int16, you multiply by 32767.5, then subtract 0.5 and at the end round fairly to integer (or skip the subtraction and floor the value instead).

That means 32767 converts to 1.0 (and vice versa) and -32768 converts to -1.0. The only downside is that neutral amplitude is not exactly representable this way on the integer side (lying exactly in between -1 and 0 in that case), but it's still damn close and should not pose a problem in practice.
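A minimal sketch of this scheme in C, using the "skip the subtraction and floor" variant. The function names are invented here, and the floor is done by hand so that no math library is needed.

```c
#include <stdint.h>

/* int16 -> float: add 0.5, then divide by 32767.5. */
float int16_to_float_half(int16_t s)
{
    return ((float)s + 0.5f) / 32767.5f;
}

/* float -> int16: multiply by 32767.5, then round down (floor). */
int16_t float_to_int16_half(float f)
{
    if (f > 1.0f) f = 1.0f;
    if (f < -1.0f) f = -1.0f;
    float scaled = f * 32767.5f;
    int32_t i = (int32_t)scaled;   /* the cast truncates toward zero */
    if ((float)i > scaled) i--;    /* step down for negative non-integers */
    return (int16_t)i;
}
```

With these, 32767 maps to 1.0 and -32768 maps to -1.0, and 0.0 on the float side lands on integer 0, because floor(0.0) is 0.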

Niels, it would be nice if you also mentioned any software that implements this (entirely valid) possibility.
