Shadow Map Silhouette Revectorization (SMSR)

Shadow Map Silhouette Revectorization (SMSR) is a filtering technique that re-approximates the shadow silhouette, based on the GPU implementation of MLAA. SMSR consists of two main passes: the first pass searches for discontinuity information; the second pass determines the discontinuity length and the oriented, normalized discontinuity space, fills the new edge area, and finally merges the lighting or image buffer with the new edge information.


  • SMSR v1.10. In the second pass, the world-space to light-view-space transformation is now calculated only once per fragment. SMSR filtering time on a GTX 580 at 1920×1200 drops from 1.50 ms to 1.25 ms (1.21 ms in a non-presentation version).




@inproceedings{Bondarev2014,
  author    = {Bondarev, Vladimir},
  title     = {Shadow Map Silhouette Revectorization},
  booktitle = {Proceedings of the 18th Meeting of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games},
  series    = {I3D '14},
  year      = {2014},
  isbn      = {978-1-4503-2717-6},
  location  = {San Francisco, California},
  pages     = {162--162},
  numpages  = {1},
  url       = {},
  doi       = {10.1145/2556700.2566651},
  acmid     = {2566651},
  publisher = {ACM},
  address   = {New York, NY, USA},
  keywords  = {projection aliasing, real-time, shadow mapping},
}

Shadow Mapping


Figure 1 – A scene with severely undersampled shadows (64×64 shadow map).

The standard shadow mapping algorithm consists of two main steps:

  1. Generating the shadow map

    Figure 2 – Shadow map 64×64 samples, upscaled 4x. Uniform sample distribution.

    A shadow map is generated by rendering the “shadow casting” geometry from the light-view perspective and saving the distance to the nearest surface (from the light view) into the depth buffer. The “shadow map” is thus essentially a texture containing rays: ray position and direction are given by the pixel coordinate multiplied by the light-view matrix, and ray length is the stored depth.

  2. Light Visibility Test
    The light-visibility test is performed by comparing the distance between the camera-view sample and the light position against the projected shadow map sample. If the distance between the camera-view sample and the light position is greater than the ray length (provided by the corresponding shadow map sample, projected onto the camera-view surface), the light source is not visible.
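
    The visibility test above can be sketched in a few lines of Python. This is a minimal illustration, not the shader code: the names (`light_visible`, `light_view_proj`, `shadow_map`) are hypothetical, the shadow map is a plain 2D depth array, and the `bias` parameter anticipates the depth-offset fix discussed under projection aliasing.

    ```python
    import numpy as np

    def light_visible(world_pos, light_view_proj, shadow_map, bias=0.005):
        """Return True if the light reaches world_pos (simplified sketch)."""
        # Project the camera-view sample into light clip space.
        p = light_view_proj @ np.append(world_pos, 1.0)
        p = p[:3] / p[3]                  # perspective divide -> light NDC
        uv = p[:2] * 0.5 + 0.5            # NDC [-1, 1] -> texture [0, 1]
        h, w = shadow_map.shape
        x = min(int(uv[0] * w), w - 1)
        y = min(int(uv[1] * h), h - 1)
        ray_length = shadow_map[y, x]     # stored distance to nearest occluder
        sample_depth = p[2] * 0.5 + 0.5   # this sample's depth in light space
        # Farther than the stored ray means an occluder is in between.
        return sample_depth <= ray_length + bias
    ```

    With an identity light matrix and an empty (all-ones) depth buffer every sample is lit; filling the buffer with a nearer occluder depth shadows it.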


The broad adoption of the shadow mapping algorithm is owed to its excellent compatibility with modern hardware rasterization (GPUs). Modern GPUs can easily render the shadow map and execute the light-visibility test in real time. However, rasterization is also the main cause of shadow aliasing.


Figure 3 – Uniform sample distribution from light-view-space (directional light). Red lines indicate camera view angle.

Perspective aliasing – From the light-view space, a standard shadow mapping algorithm has a uniform sampling distribution (see figures 2 and 3). For the visibility test, the shadow map is projected into the camera-view space. The transformation from the light space to the camera-view space results in a non-uniform sample distribution: high sample density far from the camera and low density near the camera (in figure 3, the near-camera face receives only 2 samples while the far-camera face receives 14). When a shadow map sample becomes larger than a single screen-space pixel, perspective aliasing starts to appear.


Figure 4 – Shadow map sample-grid in screen-space.

In figure 4, a single grid cell represents one shadow map sample (in screen space). All shadow map samples are clearly larger than a single screen-space pixel, demonstrating perspective aliasing.


Figure 5 – Left: no bias offset; right: with bias offset. A single column represents a shadow map sample. Only the center of the sample area represents the correct depth.

Projection aliasing – A single shadow map sample can only describe a 2D plane (surface area) that is aligned parallel to the light. Usually, however, the surface is not a flat 2D plane and thus contains more depth information than a single shadow map sample can describe. Because of this limitation, some of the surface area projected into camera space from light space is described incorrectly, resulting in incorrect visibility tests (see figure 1, bottom-right surface of the yellow cube). A common “solution” to this problem is to apply a small depth offset (bias), which reduces the visible aliasing (see figure 5, right).

MLAA – Cg Implementation

This is a Cg implementation of Jorge Jimenez’s HLSL MLAA with some minor adjustments and changes. The implementation consists of three main steps: a discontinuity pass, a blend weight pass and finally a blend pass. The core difference between Jorge Jimenez’s MLAA and the original MLAA by Alexander Reshetov is the use of an area-texture (explained in the second step) which determines the edge blend weighting; the area-texture is what allows MLAA to run efficiently on a GPU. He also uses bilinear sampling to reduce the number of required texture fetches: a bilinear sample combines the 4 nearest pixel samples in a single fetch, which is used for edge-type classification. After an edge is classified, the corresponding blend weight is looked up in the area-texture. Blend weights determine the blend intensity with the nearest edge pixels, resulting in edge re-vectorization.
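The bilinear-fetch trick can be illustrated numerically: a single filtered fetch taken exactly halfway between two edge-mask texels returns their average, so one fetch distinguishes three cases (no edge, one edge, both edges) that would otherwise require two nearest fetches. A minimal sketch, not the shader code itself:

```python
def bilinear_pair(a, b):
    """One bilinear fetch sampled exactly between two texel values
    returns their average (illustrative sketch of the fetch-reduction trick)."""
    return 0.5 * (a + b)

# Edge masks store 0.0 or 1.0 per texel; a single filtered fetch
# classifies the pair: 0.0 = no edge, 0.5 = one edge, 1.0 = both.
```

This is also why the search loop later tests the fetched value against 0.9 instead of exact equality with 1.0.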

Test Scene


Test scene, zoom 1x, MLAA disabled.


Test scene, zoom 4x, MLAA disabled.

MLAA Steps

The following three post-processing steps are required to perform MLAA:

  1. Discontinuity Pass – Edge detection.
  2. Blend Weight Pass – Edge-pattern detection and blend weight assignment.
  3. Blend Pass – Blending nearest pixels according to the sub-pixel edge weight.

Important Notes:

  • The shader code uses “tex2D” and “tex2Dnearest”: “tex2D” for linear sampling and “tex2Dnearest” for nearest sampling. It is very important that nearest and linear filtering are performed correctly, otherwise the blend weighting will be incorrect!
  • The “tex2Dnearest” implementation is extremely inefficient (due to “tex2Dsize”) and is only meant for ease of implementation.
  • My image orientation space is somewhat different from the one used by Jorge Jimenez’s original MLAA implementation: my image zero coordinate begins at the bottom-left corner, instead of the traditional top-left corner.
  • The code is somewhat refactored, which makes it easier for me to read; it may not be as fast as the original implementation.
  • All three fragment shaders use the same vertex shader provided below.

Vertex Shader

struct FP_FORWARD_IN
{
	float4 pos		: POSITION;
	float2 uv		: TEXCOORD0;
};

FP_FORWARD_IN main(	float4 inPos	: POSITION,
					float2 inUV		: TEXCOORD0)
{
	FP_FORWARD_IN outFP;

	outFP.pos		= inPos;
	outFP.uv		= inUV;

	return outFP;
}

Step 1 – Discontinuity Pass


Result of discontinuity pass. Zoom 4x

The discontinuity pass detects geometry “edges”. If the grayscale delta relative to the left neighboring pixel exceeds a certain threshold (in this case 0.1), the red channel of the originating pixel is set to 1.0. The same rule applies to the upper neighboring pixel, in which case the green channel is set to 1.0.

Fragment Shader

struct FBO_FORWARD_IN
{
	float4 color0		: COLOR0;
};

float Grayscale(float3 inRGB)
{
	return dot(inRGB, float3(0.2126f, 0.7152f, 0.0722f));
}

float4 tex2Dnearest(sampler2D inSampler, float2 inUV)
{
	return tex2Dfetch(inSampler, int4(inUV * tex2Dsize(inSampler, 0).xy, 0, 0));
}

#define THRESHHOLD	0.1f

FBO_FORWARD_IN main(float4				inPos			: POSITION,
					float2				inUV			: TEXCOORD0,
					uniform sampler2D	inSampler0		: TEXUNIT0,		// Aliased image
					uniform float2		inVP)							// Render target width and height in pixels
{
	/*
		My image space is represented from the bottom-left corner, instead of the
		traditional top-left corner, resulting in inverted Y offset coordinates
		(negative becomes positive and positive becomes negative on the Y axis).
	*/
	FBO_FORWARD_IN outFBO;

	float2 pixelSize	= 1.0f / inVP;

	float center		= Grayscale(tex2Dnearest(inSampler0, inUV).rgb);
	float left			= Grayscale(tex2Dnearest(inSampler0, inUV + float2(-1,0)*pixelSize).rgb);
	float up			= Grayscale(tex2Dnearest(inSampler0, inUV + float2(0,1)*pixelSize).rgb);
	float treshhold		= THRESHHOLD;

	float2 delta		= abs(center.xx - float2(left, up));
	float2 edges		= step(treshhold.xx, delta);

	outFBO.color0		= float4(edges.rg, 0, 1);

	return outFBO;
}

Step 2 – Blend Weight Pass


The area texture contains 4×4 = 16 edge patterns. Each pattern is represented by a 33×33 sub-texture.

This section of the algorithm is dedicated to finding edge patterns and matching the corresponding blend weights. Blend weights are precalculated and stored in an area-texture (see image on the right). The red and green channels represent sub-pixel edge positions which determine the blend weighting of the neighboring pixels.

An edge pattern (matching sub-texture) is identified by performing a cross-edge search in both directions of the current axis. For example, on a horizontal discontinuity a search is performed in the left and right directions (on the edge texture, see the Discontinuity Pass section) until either a crossing edge is found or the maximum search distance is reached. The vertical case is similar: an up and down search must be performed.

The absolute values of both search distances (left and right, or up and down) are used to construct the sub-texture coordinate (in this case the sub-texture coordinate ranges from 0.0 to 0.2), which is used to fetch the matching blend weight.
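The coordinate construction can be sketched in Python. The constants mirror the area-texture layout documented in the shader comments (full texture 1.0 UV / 165 px, sub-texture offset 0.8 UV / 132 px, sub-texture size 0.2 UV / 33 px); the function name is mine, mirroring the shader's Area() lookup:

```python
AREA_SUB_TEXTURE_OFFSET = 0.8   # UV stride between sub-texture origins
AREA_TEXTURE_SIZE = 1.0         # full area texture in UV units

def area_tc(sample_edge_pos, edge1, edge2):
    """UV coordinate into the area texture (sketch of the shader's Area())."""
    # edge1/edge2 in {0.0, 0.25, 0.5, 0.75, 1.0} select the 33x33 sub-texture.
    sub_tc = (edge1 * AREA_SUB_TEXTURE_OFFSET, edge2 * AREA_SUB_TEXTURE_OFFSET)
    # sample_edge_pos (each component in [0.0, 0.2]) offsets inside it.
    inter = (sample_edge_pos[0] / AREA_TEXTURE_SIZE,
             sample_edge_pos[1] / AREA_TEXTURE_SIZE)
    return (sub_tc[0] + inter[0], sub_tc[1] + inter[1])
```

For instance, the pattern code pair [0.75, 1.0] lands at the origin of sub-texture p11 at UV (0.6, 0.8), matching the example coordinates listed in the shader comment.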


Result of blend-weight pass. Zoom 4x

Fragment Shader

struct FBO_FORWARD_IN
{
	float4 color0	: COLOR0;
};

float4 tex2Dnearest(sampler2D inSampler, float2 inUV)
{
	return tex2Dfetch(inSampler, int4(inUV * tex2Dsize(inSampler, 0).xy, 0, 0));
}

#define MAX_SEARCH	8

// inDir can only be axis-aligned: [1,0], [-1,0], [0,1], [0,-1]
float Search(sampler2D inEdgeTex, float2 inPixelSize, float2 inUV, float2 inDir)
{
	float2 stepSize		= inDir * inPixelSize;
	float2 tc			= inUV + stepSize * 1.5f;
	stepSize			= stepSize * 2.0f;

	float edge			= 0.0f;
	float incrementDir	= dot(inDir, 1.0f);
	float2 axis			= abs(inDir);
	int i				= 0;

	for(; i<MAX_SEARCH; i++)
	{
		// Bilinear fetch averages two edge texels in a single sample.
		edge = dot( tex2D(inEdgeTex, tc).yx * axis, 1.0f );
		if(edge < 0.9f)
			break;
		tc += stepSize;
	}

	return min(2.0 * i + 2.0 * edge, 2.0 * MAX_SEARCH) * incrementDir;
}

#define AREA_TEXTURE_SIZE			1.0f	// 165 pixels
#define AREA_SUB_TEXTURE_OFFSET		0.8f	// 132 pixels
#define AREA_SUB_TEXTURE_SIZE		0.2f	// 33 pixels

float2 Area(sampler2D inTable, float2 inSampleEdgePosition, float inEdge1, float inEdge2)
{
	/*
				Area Texture

			0.0	0.2	0.4	0.6	0.8	1.0

	0.0		p1--p4------p1--p5---
			|   |   |///|   |   |
	0.2		p8--p12--///p9--p13-|
			|   |   |///|   |   |
	0.4		|-------c///--------|
	0.6		p2--p6---///p3--p7--|
			|   |   |///|   |   |
	0.8		p10-p14--///p11-p15-|
			|   |   |///|   |   |
	1.0		---------------------

											UVs  | Pixels
		AREA_TEXTURE_SIZE			=		1.0  | 165
		AREA_SUB_TEXTURE_OFFSET		=		0.8  | 132
		AREA_SUB_TEXTURE_SIZE		=		0.2  | 33

	*	inEdge1 and inEdge2 will contain one of the following scalars: 0.0, 0.25, 0.5, 0.75, 1.0.

		inEdge1 and inEdge2 represent a pattern code which can be represented by a UV texture
		coordinate on the area-texture.

		inEdge1 and inEdge2 determine which sub-texture will be used.
		Example coordinates:
			a. [0.0, 0.0]	= float2(0.0, 0.0) * 0.8	= float2(0.0, 0.0)		= p1
			b. [0.75, 1.0]	= float2(0.75, 1.0) * 0.8	= float2(0.6, 0.8)		= p11
			c. [1.0, 1.0]	= float2(1.0, 1.0) * 0.8	= float2(0.8, 0.8)		= p15
			d. [0.5, 0.5]	= float2(0.5, 0.5) * 0.8	= float2(0.4, 0.4)		= c (unused area)

	*	inSampleEdgePosition determines the UV texture coordinates within the sub-texture.
		The inSampleEdgePosition (.x and .y) range is between 0.0 and 0.2.
	*/
	float2 subTextureTC		= float2(inEdge1, inEdge2) * AREA_SUB_TEXTURE_OFFSET;
	float2 interSubTC		= inSampleEdgePosition / AREA_TEXTURE_SIZE;
	float2 tc				= subTextureTC + interSubTC;
	return tex2Dnearest(inTable, tc).rg;
}

float4 BlendWeight(sampler2D inEdges, sampler2D inTable, float2 inUV, float2 inPixelSize)
{
	/*
		My image space is represented from the bottom-left corner, instead of the
		traditional top-left corner. Thus, compared to the original code (by Jorge
		Jimenez), the Y offset coordinates are inverted (negative becomes positive
		and positive becomes negative on the Y axis).

		The area texture is also horizontally flipped, since all my texture assets are
		horizontally pre-flipped to match mesh UVs. Double check whether this is also
		required in your implementation.
	*/
	float2 edge		= tex2Dnearest(inEdges, inUV).rg;
	float4 weights	= float4(0,0,0,0);

	if(edge.g > 0.0f)	// horizontal discontinuity: search left and right
	{
		float edgeLeft	= Search(inEdges, inPixelSize, inUV, float2(-1,0));
		float edgeRight	= Search(inEdges, inPixelSize, inUV, float2(1,0));

		float edge1		= tex2D(inEdges, inUV.xy + float2(edgeLeft, 0.25f) * inPixelSize).r;
		float edge2		= tex2D(inEdges, inUV.xy + float2(edgeRight+1.0f, 0.25f) * inPixelSize).r;

		weights.rg		= Area(inTable, abs(float2(edgeLeft, edgeRight)), edge1, edge2);
	}

	if(edge.r > 0.0f)	// vertical discontinuity: search up and down
	{
		float edgeUp	= Search(inEdges, inPixelSize, inUV, float2(0,1));
		float edgeDown	= Search(inEdges, inPixelSize, inUV, float2(0,-1));

		float edge1		= tex2D(inEdges, inUV.xy + float2(-0.25f, edgeUp) * inPixelSize).g;
		float edge2		= tex2D(inEdges, inUV.xy + float2(-0.25f, edgeDown-1.0f) * inPixelSize).g;

		weights.ba		= Area(inTable, abs(float2(edgeUp, edgeDown)), edge1, edge2);
	}

	return saturate(weights);
}

FBO_FORWARD_IN main(float4				inPos			: POSITION,
					float2				inUV			: TEXCOORD0,
					uniform sampler2D	inSampler0		: TEXUNIT0,	// Edge texture
					uniform sampler2D	inSampler1		: TEXUNIT1, // Area texture / look-up texture
					uniform float2		inVP)						// Render target width and height in pixels
{
	FBO_FORWARD_IN outFBO;

	outFBO.color0 = BlendWeight(inSampler0, inSampler1, inUV, 1.0f / inVP);

	return outFBO;
}

Step 3 – Blend Pass

Neighboring pixels are bilinearly re-interpolated based on the blend information provided by the blend weight pass.


Result of blend pass, final image result. Zoom 4x.

Fragment Shader

struct FBO_FORWARD_IN
{
	float4 color0		: COLOR0;
};

FBO_FORWARD_IN main(float4				inPos			: POSITION,
					float2				inUV			: TEXCOORD0,
					uniform sampler2D	inSampler0		: TEXUNIT0,	// Aliased image
					uniform sampler2D	inSampler1		: TEXUNIT1, // Blend weights
					uniform float2		inVP)						// Render target width and height in pixels
{
	/*
		My image space is represented from the bottom-left corner, instead of the
		traditional top-left corner, resulting in inverted Y offset coordinates
		(negative becomes positive and positive becomes negative on the Y axis).
	*/
	FBO_FORWARD_IN outFBO;

	float2 ps		= 1.0f / inVP;

	float4 topLeft	= tex2D(inSampler1, inUV);
	float bottom	= tex2D(inSampler1, inUV + float2(0,-1)*ps).g;
	float right		= tex2D(inSampler1, inUV + float2(1,0)*ps).a;
	float4 a		= float4(topLeft.r, bottom, topLeft.b, right);
	float4 w		= abs(a * a * a);

	float sum		= dot(w, 1.0f);

	/*
		The original code uses discard so that only pixels applicable for
		re-vectorization are overwritten. Please see the original code.
	*/
	if(sum > 1e-5)
	{
		float4 color	= float4(0,0,0,0);
		float4 tc		= float4(0.0f, a.r, 0.0f, -a.g) * ps.xyxy + inUV.xyxy;
		color			+= tex2D(inSampler0, tc.xy) * w.r;
		color			+= tex2D(inSampler0, * w.g;

		tc				= float4(-a.b, 0.0f, a.a, 0.0f) * ps.xyxy + inUV.xyxy;
		color			+= tex2D(inSampler0, tc.xy) * w.b;
		color			+= tex2D(inSampler0, * w.a;

		outFBO.color0	= saturate(color / sum);
	}
	else
	{
		outFBO.color0	= tex2D(inSampler0, inUV);
	}

	return outFBO;
}



Left MLAA disabled, Right MLAA enabled


Left MLAA disabled, Right MLAA enabled, zoom 4x.

Download Win32 MLAA Example - ATI GPUs may experience incompatibility.

FXAA – Fast Approximate Anti-Aliasing

FXAA takes a different approach from MLAA’s edge re-vectorization ideology. Instead of shape patterns, horizontal and vertical pixel (luminance) discontinuities are detected, which together represent an edge angle (the length being the discontinuity magnitude). From the local pixel position, an end-of-edge search is performed in both the negative and positive directions (the edge end is determined either by the maximum search distance or by a threshold exceeding the average discontinuity magnitude). Based on a vector perpendicular to the vector from the local pixel position to the end-of-search position, a blending/resampling operation is performed with the corresponding neighboring pixel (the pixel the perpendicular vector points at). The amount of blending depends on the luminance magnitude.


Channels: Red = edge, Green = horizontal discontinuity, Blue = vertical discontinuity.


Pros:

  • Pixel and sub-pixel antialiasing (an edge can be represented by a single pixel).
  • Can be integrated into a single fragment shader.
  • Low integration complexity factor.
  • Deferred rendering pipeline friendly.


Cons:

  • Has a tendency to produce a less sharp image than MLAA.
  • Requires hardware anisotropic sampling support (should not be a problem on modern hardware).
  • Designed to work on a final low-dynamic-range sRGB image; HDR input will generate artifacts (solution: clamp channel values between 0 and 1).

SMAA – Enhanced Subpixel Morphological Antialiasing

SMAA is based on Jorge Jimenez’s Practical MLAA (an improved version of MLAA, designed to run on a GPU). SMAA extends Practical MLAA by:

  • An increased number and variety of edge patterns, which helps to preserve sharp geometric features and also to process diagonal geometry edges.
  • Improved pattern handling with a fast and accurate distance search for more reliable edge detection.
  • Added multi/supersampling and temporal reprojection.
  • Each feature can be enabled and disabled to match the desired result.

Edge re-vectorization. MLAA (left) and SMAA (right) with diagonal edge re-vectorization.


Pros:

  • Designed to run on a GPU: SMAA 4x executes in less than 3 ms on a GeForce GTX 470.
  • Sharper re-vectorization than original MLAA.
  • Improved pattern search.
  • Integrated multisampling and temporal antialiasing (cures ghosting and preserves sub-pixel detail), which can be scaled up to preserve stability and increase sub-pixel quality for more complex scenery.
  • Final image quality comparable to SSAA 16x (supersampling) with a much lower execution time and memory consumption.
  • Deferred rendering pipeline friendly.


Cons:

  • Storing the velocity module (for temporal antialiasing) in the alpha channel of the color buffer may conflict with a deferred rendering pipeline.


Note: The precompiled demo seems to display a sharper and more stable result with the MSAA 4x setting than with the SMAA 4x setting.

MLAA – Morphological Antialiasing

MLAA is a universal post-processing technique (independent of the rendering pipeline), originally developed by Intel for image-based aliasing removal (in ray tracing).

The algorithm consists of three major steps:

  1. Based on the image, recognizing image discontinuities (by comparing neighboring horizontal and vertical pixels).
  2. Based on the discontinuities, detecting L-, U- and Z-shaped patterns (the patterns are defined by the lines in between pixels).
  3. Based on the shape patterns, blending the corresponding pixels with the correct (anti-aliased line) weighting.



The amount of blending each pixel receives depends on the surface area split off by the anti-aliased line (created by the pattern shape).
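As a small worked example of that weighting: for a revectorized line crossing a unit pixel, entering the left edge at height h0 and exiting the right edge at height h1, the covered fraction is simply the trapezoid area. This is an illustrative sketch of how such coverage weights are derived, not Intel's implementation:

```python
def coverage(h0, h1):
    """Fraction of a unit pixel lying below a line that enters the left edge
    at height h0 and exits the right edge at height h1 (both in [0, 1]).
    Trapezoid area rule; illustrative sketch of MLAA-style blend weights."""
    return 0.5 * (h0 + h1)
```

A line rising from 0.0 to 0.5 across the pixel cuts off a triangle covering a quarter of its area, so the pixel would receive a 25% blend toward its neighbor.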


Pros:

  • The algorithm is universal and only requires pixel data.
  • Parallelisation friendly.
  • Quality comparable to 4x supersampling.


Cons:

  • The algorithm is designed to run on a CPU, which is less attractive for real-time applications.
  • Susceptible to temporal artifacts.
  • Unable to correctly anti-alias shapes smaller than a single pixel.

Graduation Project: Anti-Aliasing Filter for Regular Shadow Maps

During my specialization at NHTV it came to my attention that filtering techniques (such as Variance Shadow Maps and Percentage Closer Filtering) produce a relatively good visual result, with a low implementation complexity and GPU load in contrast to sample redistribution techniques.

Near the end of my specialization, an idea was forming of attempting to use already existing anti-aliasing filtering techniques (such as MLAA, SMAA, FXAA, etc.) to improve the visual quality of the regular shadow mapping technique (Lance Williams, 1978). However, the implementation complexity will be relatively high and the GPU load will be higher than with regular filtering techniques.

When a regular screen-space AA filter is applied, the shadow map quality shows minor improvement, but it is insufficient to remove perspective aliasing. Later on I found an interesting document on shadow maps in combination with FXAA. The shadow blockiness (shadow silhouette perspective aliasing) is significantly reduced, but the perspective aliasing is still very visible.

A regular anti-aliasing filter is based on the flat screen-space projection, which provides insufficient parameterization to reduce shadow map perspective aliasing. By transforming the 2D screen plane back into the light-space projection (the space where the original shadow map is rasterized) and taking the original shadow map parameterization into account, the AA filter can become a more specialized technique for improving the visual quality of a regular shadow map.
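The proposed space transformation can be sketched as: reconstruct each screen sample's world position from its depth, then reproject it with the light's matrices. All names here are hypothetical (`inv_view_proj` is the inverse of the camera view-projection matrix, `light_view_proj` the light's view-projection), a sketch of the idea rather than the eventual implementation:

```python
import numpy as np

def screen_to_light_uv(ndc_xy, depth_ndc, inv_view_proj, light_view_proj):
    """Map a camera screen sample (NDC xy + NDC depth) to shadow map UVs."""
    # Unproject the screen sample back to world space.
    clip = np.array([ndc_xy[0], ndc_xy[1], depth_ndc, 1.0])
    world = inv_view_proj @ clip
    world = world / world[3]
    # Reproject with the light's view-projection matrix.
    lp = light_view_proj @ world
    lp = lp / lp[3]
    return lp[:2] * 0.5 + 0.5   # light NDC [-1, 1] -> shadow map UV [0, 1]
```

An AA filter evaluated in this light-space parameterization can then measure silhouette discontinuities in shadow map texels rather than in screen pixels.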

In the following weeks I am going to:

  1. Research existing 2D post-processing anti-aliasing techniques.
  2. Determine the most suitable candidate technique for the implementation.
  3. Develop an AA filter for shadow maps, based on the implemented AA filter, with space transformation and shadow map parameterization.

If the technique is successful, further research can be done in certain areas:

  1. Combination with sample redistribution techniques (Cascaded Shadow Maps, Light Space Perspective Shadow Maps).
  2. Research exotic AA-Filtering techniques such as NSAA and TXAA.