Forward Plus Rendering

A proof of concept designed to illustrate the basics of Tiled Forward+ rendering in Unity.

The project's scope was limited to demonstrating the basics of the Tiled Forward+ rendering technique, taking advantage of DX11-class hardware features.

The idea behind Tiled Forward+ rendering is to divide the screen into a grid and create a linked list of lights for each grid cell. This linked list is then used by otherwise traditional forward shaders to perform lighting.
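To make the grid lookup concrete, here is a minimal CPU-side sketch of how a forward shader might map a fragment's screen position to a grid cell index before fetching that cell's light list. The function and constant names are illustrative assumptions, not taken from the project; the 64x64 grid size matches the figures given later in this document.

```cpp
#include <cstdint>

// Hypothetical sketch: map a fragment's screen position to a cell index in a
// 64x64 screen-space grid. Names here are illustrative, not the project's.
constexpr uint32_t TILE_COUNT_X = 64;
constexpr uint32_t TILE_COUNT_Y = 64;

uint32_t TileIndex(float screenX, float screenY, float screenW, float screenH)
{
    uint32_t tx = static_cast<uint32_t>(screenX / screenW * TILE_COUNT_X);
    uint32_t ty = static_cast<uint32_t>(screenY / screenH * TILE_COUNT_Y);
    // Clamp so fragments on the far screen edge stay inside the grid.
    if (tx >= TILE_COUNT_X) tx = TILE_COUNT_X - 1;
    if (ty >= TILE_COUNT_Y) ty = TILE_COUNT_Y - 1;
    return ty * TILE_COUNT_X + tx;
}
```

In an actual fragment shader this would use the pixel position from `SV_Position` and the screen dimensions from a shader constant, but the arithmetic is the same.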

Tiled Forward+ addresses many of the problems Deferred Rendering has traditionally been used to solve, but with several added bonuses. Objects can take advantage of a large number of lights while still using arbitrary fragment shaders and custom lighting models (for example, clear coat, anisotropic hair, or subsurface scattering, all of which were previously difficult in a deferred context). Transparent objects can also take advantage of Forward+.
Additionally, because Forward+ uses mostly standard forward rendering to produce the final image, it can also take advantage of hardware MSAA, which is impractical in deferred rendering.

My tiled forward+ solution is as follows:

  1. First, in the depth pre-pass phase, render a dual-channel floating point buffer containing linearized depth. R contains opaque depth, G contains opaque + transparent depth (explained in next step).
  2. Pass this texture to a compute shader, along with the frustum corners. Additionally, bind a buffer containing all scene lights, plus two buffers that together will form the linked list output of the compute shader.
  3. The compute shader runs on a 64x64 grid, in 32x32x1 thread groups. For each grid cell, the compute shader finds the minimum and maximum depth within that cell and forms a new frustum constrained to the cell, with the near and far planes capped to those depths. The R channel is used to find the max depth, and the G channel is used to find the min depth; this was done so that the frustum's near plane would expand to include transparent objects.
  4. The compute shader then iterates over each light, checking it against the generated frustum. Any light intersecting the frustum is added to the linked list for that cell.
  5. Additionally, at this point, each light in the scene renders its shadowmap into a global shadow atlas texture. Spotlights, similarly, blit their cookie texture into a light cookie atlas.
  6. The linked list and scene light buffers are also bound as global shader buffers. Once the compute shader has run, the scene is rendered as normal. Each object uses a shader that selects a cell based on screen position, retrieves the start of that cell's linked light list, and iterates over each light in the cell (calculating attenuation, sampling the shadowmap from the atlas, and multiplying in light cookies).
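The per-cell culling in step 4 can be sketched on the CPU as a sphere-vs-frustum test over all scene lights. This is a hedged approximation of the approach described above: the struct layouts and names are assumptions for illustration (point lights modeled as spheres, frustum planes with inward-facing normals), not the project's actual data.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch of step 4: test each point light (as a sphere) against
// a cell frustum's six planes, collecting the indices of lights that pass.
struct Plane { float nx, ny, nz, d; };  // plane normal points into the frustum
struct Light { float x, y, z, radius; };

bool SphereIntersectsFrustum(const Light& l, const Plane planes[6])
{
    for (int i = 0; i < 6; ++i) {
        float dist = planes[i].nx * l.x + planes[i].ny * l.y
                   + planes[i].nz * l.z + planes[i].d;
        if (dist < -l.radius)   // sphere fully behind one plane: cull it
            return false;
    }
    return true;                // conservative test: may over-include lights
}

std::vector<uint32_t> CullLightsForTile(const std::vector<Light>& lights,
                                        const Plane planes[6])
{
    std::vector<uint32_t> visible;
    for (uint32_t i = 0; i < lights.size(); ++i)
        if (SphereIntersectsFrustum(lights[i], planes))
            visible.push_back(i);
    return visible;
}
```

In the real compute shader, each passing light would be appended to the cell's linked list via an atomic counter rather than collected into a vector, but the intersection logic is the same.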
The actual buffers in question are _SceneLightBuffer, _LinkedLightBuffer, and _LinkedLightOffsetBuffer. _SceneLightBuffer is simply an array of light structs, paired with a _LightCount parameter.
_LinkedLightBuffer is a large buffer which contains the linked lists for every cell. Each entry is two uints: one is an index into the _SceneLightBuffer, and the other is a pointer to the next entry in the _LinkedLightBuffer (or a special value interpreted as 'none'). It is sized at 64x64x128 entries, 128 being a somewhat arbitrary "average lights per cell" value. At an element stride of 8 bytes, this puts _LinkedLightBuffer at about the same memory footprint as an uncompressed 1024x1024 RGBA32 texture (4 MB).
_LinkedLightOffsetBuffer is a 64x64 buffer of uints, one per cell. Each uint stores an index into the _LinkedLightBuffer, allowing a fragment shader to easily look up the starting element of the linked list for any grid cell.
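The entry layout and traversal described above can be sketched as follows. This is a minimal CPU-side approximation: the struct name, the `END_OF_LIST` sentinel value, and the per-light intensity array are illustrative assumptions standing in for the project's actual shader code.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sentinel for the 'none' value that terminates a cell's list.
constexpr uint32_t END_OF_LIST = 0xFFFFFFFFu;

// One entry of _LinkedLightBuffer: two uints, as described above.
struct LinkedLightEntry {
    uint32_t lightIndex;  // index into _SceneLightBuffer
    uint32_t next;        // next entry in _LinkedLightBuffer, or END_OF_LIST
};

// Walk one cell's linked list, mirroring the fragment shader's loop. Here
// each light just contributes a scalar intensity instead of a full BRDF.
float ShadeCell(uint32_t cellIndex,
                const std::vector<uint32_t>& linkedLightOffsetBuffer,
                const std::vector<LinkedLightEntry>& linkedLightBuffer,
                const std::vector<float>& sceneLightIntensity)
{
    float total = 0.0f;
    for (uint32_t entry = linkedLightOffsetBuffer[cellIndex];
         entry != END_OF_LIST;
         entry = linkedLightBuffer[entry].next)
    {
        total += sceneLightIntensity[linkedLightBuffer[entry].lightIndex];
    }
    return total;
}
```

The same loop shape works in HLSL against `StructuredBuffer` bindings; only the attenuation, shadow atlas sampling, and cookie lookups from step 6 would replace the scalar accumulation.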

In the following demo video, the classic Sponza scene is lit with a directional light with 4 shadow cascades and 3x3 tap PCF filtering, a shadow-casting spot light with simpler 5-tap PCF filtering, and 64 point lights, all at 1080p with 4x MSAA. Additionally, Unity's internal lightmap is sampled for indirect lighting (SH light probes are also supported, but not shown here).
One potential extra step I've experimented with (but which is not shown here) is to use MRTs during the forward rendering pass to also output normal and specular G-buffers, to be used for various other effects like screen-space raytracing (similar to DOOM). A downside to this approach is that MSAA cannot be used, so a post-process AA technique such as temporal AA would have to be used instead.