Fixed-point math in C#, part 1

Introduction

Recently I've been working on a side project in Unity that required using fixed-point math instead of floating point. With that in mind, I thought I'd cover the creation of a fixed-point math lib from scratch in C# so others can see how it's done and why.

What is fixed-point math?

You've no doubt heard of floating point math. It's the standard in computing for doing math with decimal numbers. Well, fixed-point math is what they did in games back when floats didn't exist or were too slow. The idea behind fixed point is that you use whole integers for all numbers, but you pretend that a decimal exists at a certain digit. To explain why this works, I'll use base 10 as an example.

Let's add the numbers 2.53 and 3, using fixed point math. The idea is this: we'll add two whole numbers instead of two decimals, so instead of adding 2.53 and 3 we're actually going to add 253 and 300 together. This gives us 553. Now just put the decimal point back where it was - the answer is 5.53. Subtracting works the same way. Of course, none of that was probably new to you. But, what about multiplying and dividing?

Here's where we introduce the Qn notation. Qn will be used to denote how many decimal places there are, where n is number of decimal places. For instance, 2.53 is a Q2 number. So, let's multiply 2.53 (a Q2 number) with 3 (a Q0 number). Again, we'll use whole numbers:

253 * 3 = 759

But how do we put the decimal place back? Well, if the lefthand side was a Qx number and the righthand side was a Qy number, the answer will be a Q(x+y) number. Since 3 was a Q0 number and 2.53 was a Q2 number, the answer is a Q2 number and is therefore 7.59, but let's use a decimal for both this time. So we'll use 3.5 instead, a Q1 number:

253 * 35 = 8855

The answer is a Q3 number this time, because it's Q(2+1). Therefore, the answer is 8.855.

This, in a nutshell, is precisely how fixed point math works in a computer (except it's in base 2 instead of base 10). The idea is you take a whole integer, and then decide how many bits to allocate to the whole and fractional portions of that number.

Why on earth do I need this?

Like I said, I recently started working on a project which actually did require fixed point. The project is actually an RTS prototype with support for multiplayer. The thing about an RTS is they often have at least many hundreds of units, if not thousands, at any given time. That's WAY too much to directly serialize over the network - nobody on the planet has internet that good. So instead pretty much every RTS game settles for something called lockstep simulation. The idea is to not serialize game state whatsoever, and instead only serialize player actions. Every client then takes those actions and simulates the game, resulting in the same game state for everyone. This works as long as the game simulation is 100% deterministic - given the same input, it will have the exact same output no matter when it's simulated or what machine it's simulating on.

Therein lies the problem with floats. They are not deterministic. Due to various rules and FPU implementation quirks, floats can have ever-so-slightly different values not just from one machine to the next, but even subsequent runs on the very same machine. The problem with ever-so-slightly different is that it compounds and over time becomes significantly different. This results in diverging simulations, completely destroying any hope of lockstep simulation.

Fixed-point, on the other hand, uses pure integers. Integers are 100% deterministic. 2+2 is always 4, no matter what computer does that calculation. Therefore, fixed point math solves the determinism problem and allows us to do lockstep simulation.

On to the implementation details!

Alright. The first thing is we're going to declare a struct type, 'Fixed'. This struct holds our fixed point value. We're going to be defining a 32-bit Q16 fixed point number (16 bits allocated to the decimal portion).

public struct Fixed
{
    public Int64 rawValue; // the raw Q16 value
}

Now, even though I decided to use a 32-bit 16.16 fixed point number, I'm using a 64-bit value to hold it. The reason for this is because I wanted to avoid integer overflow (which, in early testing, cropped up in a few not-so-unusual cases that didn't even involve large numbers). Another option would have been to leave the internal number as a regular Int32 but cast to Int64 when doing arithmetic, but if you were to know me in real life you'd know that I'm lazy as hell and it's easier just to leave it as an Int64 and avoid the casting.

Next, some helper functions. We're going to define functions which let us extract the whole integer portion, the decimal portion, the decimal portion as a float, and the entire number as a float (remember, this is meant to hook up with Unity, so it's convenient if we can convert an object's fixed-point position into floating point as a transform's position, as an example).

First, the integer portion. This is easy - we just shift away the decimal portion. Old games did this with fixed point object positions to easily calculate the whole integer pixel position of an object's sprite. Remember that we have a Q16 number, so we just shift right by 16 bits. Well, almost. Actually, negative numbers are a little wonky - for example, -10.5 will have an integer portion of -11 and a decimal portion of 0.5 (-11 + 0.5 = 10.5). So we actually need to account for this.

public Int16 IntValue
{
    get
    {
        return (Int16)( this.rawValue >> 16 ) + ( this.rawValue < 0 ? 1 : 0 );
    }
}

Next, the decimal portion. First we're just going to get the raw integer value of the decimal portion, which can be done with bit masking:

public Int16 RawDecimalValue
{
    get
    {
        return (Int16)( this.rawValue & 0xffff );
    }
}

Second, we can use this value to calculate a floating-point version with some simple division, which will produce a number in the range of 0.0 to 1.0:

public float DecimalValue
{
    get
    {
        return this.RawDecimalValue / (float)( 1 << 16 );
    }
}

And finally, we can use these functions to produce a full floating-point value by combining the whole and decimal portions (note in this case we don't use IntValue directly - remember, in the case noted previously adding 0.5 to -11 produced -10.5 which was the actual number represented, which is the behavior we want here):

public float FloatValue
{
    get
    {
        return (int)(this.rawValue >> DECIMAL_SHIFT) + ( this.DecimalValue );
    }
}

Now, we also need some functions to be able to initialize Fixed numbers too. We will expose some static functions which do this for us. First, a function to convert a whole integer to fixed point:

public static Fixed FromWholeInt( int value )
{
    Fixed fixed = new Fixed();
    fixed.rawValue = ( value << 16 );
    return fixed;
}

Note that this function does the exact opposite as converting from fixed to whole integer - instead of shifting away the decimal portion, it shifts in the opposite direction to add a decimal portion.

We can also use a function to initialize a fixed point number from a float (should be used with caution, since we don't want to introduce floating point inconsistency, but is still a useful function to have):

public static Fixed FromFloat( float value )
{
    int wholePortion = (int)System.Math.Truncate( value );
    int decimalPortion = (int)( System.Math.Abs(value - wholePortion) * (1 << 16) );

    if (wholePortion < 0)
        wholePortion--; // If we subtracted 0.5 from -10 or added 0.5 to -11 we'd get integer portion of -11 and decimal portion of 0.5, so we ensure the same behavior exists in a float conversion.

    Fixed fixed = new Fixed();
    fixed.rawValue = (wholePortion << 16) | (decimalPortion & 0xffff);
    return fixed;
}

Next, a function to initialize from a fixed-point integer value (this is helpful for if we need to serialize and deserialize a fixed point value). We'll have the constructor do this one:

public Fixed( int rawValue )
{
    this.rawValue = rawValue;
}

And finally a few explicit casts, just to help make later code more concise (I'm choosing explicit casts instead of implicit casts in order to reduce the chance of accidental casts that weren't strictly intended):

public static explicit operator Fixed( int wholeInteger )
{
    return Fixed.FromWholeInt( wholeInteger );
}

public static explicit operator int( Fixed fixedInt )
{
    return fixedInt.IntValue;
}

public static explicit operator Fixed( float floatVal )
{
    return Fixed.FromFloat( floatVal );
}

public static explicit operator float( Fixed fixedInt )
{
    return fixedInt.FloatValue;
}

Now, we can convert a number of values to and from our custom Q16 fixed point format easily. However, we currently cannot do any arithmetic on these values.

In the next post, I'll cover how to use operator overloading in order to allow us to add, subtract, multiply, and divide fixed-point values just as if they were a built-in value type.

EDIT (1/24/2017) / (1/25/2017)

I made a slight error in the float conversion which resulted in incorrect values when converting negative numbers to float. Additionally, the first attempt at a fix was also incorrect, as it had not been tested with whole number values of 0. It has been fully fixed and tested, and I apologize for any inconvenience or confusion.

EDIT (1/25/2017)

Thanks to CrappyCodingGuy for pointing out the problem with my decimal masking. You want to mask with 0xffff, not 0xff00, since we have a Q16 fixed point number but masking with 0xff00 masks out everything but one byte. That, too, has been fixed, and I apologize for any confusion. The lesson here is don't do bitmath at 2:00AM, kids (which is when I had originally written the code this tut series is based on).

Additionally, the code for initializing from float numbers wasn't fully tested with negative numbers. This, too, has been resolved with a simple Abs call.

EDIT #2 (1/25/2017)

OK seriously, how many times can I make errors here. This is starting to get embarrasing. Fixed (hah) issues with conversion again. Turns out, after testing, a number like -10.5 is actually represented with an integer portion of -11 and a decimal portion of 0.5 (-11 + 0.5 = -10.5). So I fixed some of the conversion functions to replicate this behavior properly.

Posted in Coding & Dev on Jan 15, 2017


blog comments powered by Disqus
Subscribe for updates
RSS
Categories