MLAA – CG Implementation

This is a CG implementation of Jorge Jimenez’s HLSL MLAA with some minor adjustments and changes. The implementation consists of three main steps: a discontinuity pass, a blend weight pass, and finally a blend pass. The core difference between Jorge Jimenez’s MLAA and the original MLAA by Alexander Reshetov is the use of an area texture (explained in the second step), which determines the edge blend weighting; the area texture is what allows MLAA to run efficiently on a GPU. Jimenez also uses bilinear sampling to reduce the number of required texture fetches: a single bilinear fetch returns a weighted average of the 4 nearest pixel samples, which is used for edge-type classification. Once an edge is classified, the corresponding blend weight is looked up in the area texture. Blend weights determine the blend intensity with the nearest edge pixels, resulting in edge re-vectorization.
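The bilinear-fetch trick above can be sketched in plain Python (a hedged illustration, not the shader code itself; `bilinear_1d` and the 1D layout are my own simplification): one linear sample taken exactly midway between two texels returns their average, so a single fetch exposes the state of a pair of edge flags.

```python
import math

def bilinear_1d(texels, x):
    """Sample a 1D texel row with linear filtering at continuous position x
    (texel centers sit at integer coordinates)."""
    i0 = int(math.floor(x))
    i1 = min(i0 + 1, len(texels) - 1)
    t = x - i0
    return texels[i0] * (1.0 - t) + texels[i1] * t

# Edge flags of two neighboring pixels: the left one has an edge, the right does not.
edges = [1.0, 0.0]

# One fetch halfway between them recovers both states:
# 0.0 = neither flag set, 0.5 = exactly one, 1.0 = both.
mid = bilinear_1d(edges, 0.5)
print(mid)  # 0.5
```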


Test Scene

mlaa_vb_original

Test scene, zoom 1x, MLAA disabled.

mlaa_vb_original_4x

Test scene, zoom 4x, MLAA disabled.


MLAA Steps

The following three post-processing steps are required to perform MLAA:

  1. Discontinuity Pass – Edge detection.
  2. Blend Weight Pass – Edge-pattern detection and blend weight assignment.
  3. Blend Pass – Blending nearest pixels according to the sub-pixel edge weight.

Important Notes:

  • The shader code uses “tex2D” and “tex2Dnearest”: “tex2D” for linear sampling and “tex2Dnearest” for nearest sampling. It is very important that nearest and linear filtering are performed correctly, otherwise blend weighting will be incorrect!
  • The “tex2Dnearest” implementation is extremely inefficient (due to “tex2Dsize”) and is only meant for ease of implementation.
  • My image orientation space is somewhat different from that used by Jorge Jimenez’s original MLAA implementation. My image zero coordinate begins at the bottom-left corner, instead of the traditional top-left corner.
  • The code is somewhat refactored, which makes it easier for me to read; it may not be as fast as the original implementation.
  • All three fragment shaders use the same vertex shader provided below.
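To make the filtering note concrete, here is a hedged Python sketch of the two sampling modes (my own helper names; the shader’s `tex2Dnearest` does the equivalent of the nearest lookup via `tex2Dfetch`): at the same UV, nearest and linear sampling can return different values, which is why mixing them up corrupts the blend weighting.

```python
import math

def tex2d_nearest(tex, u, v):
    # Snap the UV to the containing texel, like the shader's tex2Dnearest
    # (tex2Dfetch at uv * textureSize).
    h, w = len(tex), len(tex[0])
    return tex[min(int(v * h), h - 1)][min(int(u * w), w - 1)]

def tex2d_linear(tex, u, v):
    # Bilinear filtering: weight the four surrounding texel centers,
    # clamping at the borders.
    h, w = len(tex), len(tex[0])
    x, y = u * w - 0.5, v * h - 0.5
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    fx, fy = x - x0, y - y0
    def at(yy, xx):
        return tex[max(0, min(yy, h - 1))][max(0, min(xx, w - 1))]
    top = at(y0, x0) * (1 - fx) + at(y0, x0 + 1) * fx
    bot = at(y0 + 1, x0) * (1 - fx) + at(y0 + 1, x0 + 1) * fx
    return top * (1 - fy) + bot * fy

tex = [[0.0, 1.0],
       [0.0, 1.0]]
# At u = 0.5 the nearest sample lands inside the right texel (1.0), while
# the linear sample sits exactly between both columns (0.5).
print(tex2d_nearest(tex, 0.5, 0.25), tex2d_linear(tex, 0.5, 0.25))  # 1.0 0.5
```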

Vertex Shader

struct FP_FORWARD_IN
{
	float4 pos		: POSITION;
	float2 uv		: TEXCOORD0;
};

FP_FORWARD_IN main(	float4 inPos	: POSITION,
					float2 inUV		: TEXCOORD0)
{
	FP_FORWARD_IN outFP;

	outFP.pos				= inPos;
	outFP.uv				= inUV;

	return outFP;
}

Step 1 – Discontinuity Pass

mlaa_vb_step1_4x

Result of discontinuity pass. Zoom 4x

The discontinuity pass detects geometry “edges”. If the grayscale delta between a pixel and its left neighbor exceeds a certain threshold (in this case 0.1), then the red channel of the originating pixel is set to 1.0. The same rule applies to the upper neighbor, in which case the green channel is set to 1.0.
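The rule above can be sketched in Python (a hedged simulation of the fragment shader below, assuming a bottom-left origin as described in the notes, so row 0 is the bottom and “up” means y + 1; the border handling stands in for texture clamping):

```python
THRESHOLD = 0.1

def discontinuity_pass(lum):
    """Return per-pixel (red, green) edge flags: red marks a luminance jump
    against the left neighbor, green against the upper neighbor."""
    h, w = len(lum), len(lum[0])
    edges = [[(0.0, 0.0)] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            left = lum[y][x - 1] if x > 0 else lum[y][x]          # clamp at border
            up   = lum[y + 1][x] if y + 1 < h else lum[y][x]      # bottom-left origin
            r = 1.0 if abs(lum[y][x] - left) >= THRESHOLD else 0.0
            g = 1.0 if abs(lum[y][x] - up)   >= THRESHOLD else 0.0
            edges[y][x] = (r, g)
    return edges

# A vertical black/white boundary produces red (left-neighbor) edges only.
lum = [[0.0, 0.0, 1.0, 1.0]] * 3
print(discontinuity_pass(lum)[1])  # [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0), (0.0, 0.0)]
```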

Fragment Shader

struct FBO_FORWARD_IN
{
	float4 color0		: 	COLOR0;
};

float Grayscale(float3 inRGB)
{
	// Rec. 709 luma weights
	return dot(inRGB, float3(0.2126f, 0.7152f, 0.0722f));
}

float4 tex2Dnearest(sampler2D inSampler, float2 inUV)
{
	return tex2Dfetch(inSampler, int4(inUV * tex2Dsize(inSampler, 0).xy, 0, 0));
}

#define THRESHOLD	0.1f
FBO_FORWARD_IN main(float4				inPos			: POSITION,
					float2				inUV			: TEXCOORD0,
					uniform sampler2D	inSampler0		: TEXUNIT0,		// Aliased image
					uniform float2		inVP							// Render Target width and height in pixels
					)
{
	/*
		My image space is represented from bottom-left corner, instead of traditional
		top-left corner. Resulting in inverted Y offset coordinates (negative becomes
		positive and positive becomes negative on Y axis).
	*/

	FBO_FORWARD_IN	outFBO;

	float2 pixelSize	= 1.0f / inVP;

	float center		= Grayscale(tex2Dnearest(inSampler0, inUV).rgb);
	float left			= Grayscale(tex2Dnearest(inSampler0, inUV + float2(-1,0)*pixelSize).rgb);
	float up			= Grayscale(tex2Dnearest(inSampler0, inUV + float2(0,1)*pixelSize).rgb);
	float threshold		= THRESHOLD;

	float2 delta		= abs(center.xx - float2(left, up));
	float2 edges		= step(threshold.xx, delta);

	outFBO.color0		= float4(edges.rg, 0, 1);

	return outFBO;
}

Step 2 – Blend Weight Pass

BlendWeights33x33_original

The area texture contains 4×4 = 16 edge patterns. Each pattern is represented by a 33×33 sub-texture.

This part of the algorithm is dedicated to finding edge patterns and matching them to the corresponding blend weights. Blend weights are precalculated and stored in an area texture (see image on the right). The red and green channels represent sub-pixel edge positions, which determine the blend weighting of neighboring pixels.

An edge pattern (a matching sub-texture) is identified by performing a cross-edge search in both directions along the current axis. For example, on a horizontal discontinuity a search is performed in the left and right directions (over the edge texture, see the Discontinuity Pass section) until either a crossing edge is found or the maximum search distance is reached. The vertical case is analogous: an up and a down search must be performed.
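The cross-edge search can be sketched like this (a hedged simplification using whole-pixel steps over a 1D edge mask; the shader’s `Search` below additionally halves the fetch count with bilinear sampling, so the details differ):

```python
MAX_SEARCH = 8

def search(edge_row, x, direction):
    """Distance (in pixels) from x to the end of the edge run in the given
    direction (+1 = right, -1 = left), capped at MAX_SEARCH."""
    dist = 0
    pos = x + direction
    while dist < MAX_SEARCH and 0 <= pos < len(edge_row) and edge_row[pos]:
        dist += 1
        pos += direction
    return dist

row = [0, 1, 1, 1, 1, 0, 0, 0]   # a 4-pixel horizontal edge run
# From x = 2 the run extends 1 pixel to the left and 2 to the right.
print(search(row, 2, -1), search(row, 2, +1))  # 1 2
```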

The absolute values of both search distances (left and right, or up and down) are used to construct the sub-texture coordinate (in this case the sub-texture coordinate ranges from 0.0 to 0.2), which is used to fetch the matching blend weight.
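The coordinate arithmetic, using the constants from the shader below (5×5 grid of 33×33 sub-textures, 165×165 texture, 0.8 sub-texture offset), can be sketched as follows; `area_tc` is my own helper name, and the UV it returns corresponds to `tc` in the shader’s `Area` function:

```python
AREA_SUB_TEXTURE_COUNT  = 5
AREA_SUB_TEXTURE_SIZE   = 33
AREA_TEXTURE_SIZE       = AREA_SUB_TEXTURE_SIZE * AREA_SUB_TEXTURE_COUNT           # 165
AREA_SUB_TEXTURE_OFFSET = (AREA_SUB_TEXTURE_COUNT - 1.0) / AREA_SUB_TEXTURE_COUNT  # 0.8

def area_tc(dist1, dist2, edge1, edge2):
    """UV into the area texture: (edge1, edge2) select the 33x33 sub-texture,
    the absolute search distances select the texel inside it (0.0..0.2)."""
    u = edge1 * AREA_SUB_TEXTURE_OFFSET + dist1 / AREA_TEXTURE_SIZE
    v = edge2 * AREA_SUB_TEXTURE_OFFSET + dist2 / AREA_TEXTURE_SIZE
    return (u, v)

# Crossing-edge codes (0.75, 1.0) select sub-texture p11; the search
# distances then offset the coordinate inside that sub-texture.
print(area_tc(4.0, 2.0, 0.75, 1.0))
```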

mlaa_vb_step2_4x

Result of blend-weight pass. Zoom 4x

Fragment Shader

struct FBO_FORWARD_IN
{
	float4 color0 :	COLOR0;
};

float4 tex2Dnearest(sampler2D inSampler, float2 inUV)
{
	return tex2Dfetch(inSampler, int4(inUV * tex2Dsize(inSampler, 0).xy, 0, 0));
}

#define MAX_SEARCH	8
// inDir can only be axis-aligned: [1,0], [-1,0], [0,1], [0,-1]
float Search(sampler2D inEdgeTex, float2 inPixelSize, float2 inUV, float2 inDir)
{
	float2 stepSize		= inDir * inPixelSize;
	float2 tc			= inUV + stepSize * 1.5f;
	stepSize			= stepSize * 2.0f;

	float edge			= 0.0f;
	float incrementDir	= dot(inDir, 1.0f);
	float2 axis			= abs(inDir);
	int i				= 0;

	for(; i<MAX_SEARCH; i++)
	{
		edge = dot( tex2D(inEdgeTex, tc).yx * axis, 1.0f );
		if(edge < 0.9f)
			break;
		tc += stepSize;
	}

	return min(2.0 * i + 2.0 * edge, 2.0 * MAX_SEARCH) * incrementDir;
}

#define AREA_SUB_TEXTURE_COUNT	5
#define AREA_SUB_TEXTURE_SIZE	33
#define AREA_SUB_TEXTURE_OFFSET (1.0f / AREA_SUB_TEXTURE_COUNT * (AREA_SUB_TEXTURE_COUNT-1.0f))
#define AREA_TEXTURE_SIZE		(AREA_SUB_TEXTURE_SIZE * AREA_SUB_TEXTURE_COUNT)

float2 Area(sampler2D inTable, float2 inSampleEdgePosition, float inEdge1, float inEdge2)
{
	/*

				Area Texture

			0.0	0.2	0.4	0.6	0.8	1.0

	0.0		p1--p4------p1--p5---
			|   |   |///|   |   |
	0.2		p8--p12--///p9--p13-|
			|   |   |///|   |   |
	0.4		|-------c///--------|
			|///////////////////|
	0.6		p2--p6---///p3--p7--|
			|   |   |///|   |   |
	0.8		p10-p14--///p11-p15-|
			|   |   |///|   |   |
	1.0		---------------------
	                        | UVs | Pixels
	------------------------+-----+-------
	AREA_TEXTURE_SIZE       | 1.0 | 165
	AREA_SUB_TEXTURE_OFFSET | 0.8 | 132
	AREA_SUB_TEXTURE_SIZE   | 0.2 | 33

	*	inEdge1 and inEdge2 will contain one of the following scalars: 0.0, 0.25, 0.5, 0.75, 1.0.

		inEdge1 and inEdge2 represent a pattern code which can be represented by UV texture
		coordinate on the area-texture.

		inEdge1 and inEdge2 determine which sub-texture will be used.
		Example coordinates:
			a. [0.0, 0.0]	= float2(0.0, 0.0) * 0.8	= float2(0.0, 0.0)		= p1
			b. [0.75, 1.0]	= float2(0.75, 1.0) * 0.8	= float2(0.6, 0.8)		= p11
			c. [1.0, 1.0]	= float2(1.0, 1.0) * 0.8	= float2(0.8, 0.8)		= p15
			d. [0.5, 0.5]	= float2(0.5, 0.5) * 0.8	= float2(0.4, 0.4)		= c (unused area)

	*	inSampleEdgePosition determines the UV texture coordinates of the sub-texture.
		The inSampleEdgePosition (.x and .y) range is between 0.0 and 0.2.

	*/

	float2 subTextureTC		= float2(inEdge1, inEdge2) * AREA_SUB_TEXTURE_OFFSET;
	float2 interSubTC		= inSampleEdgePosition / AREA_TEXTURE_SIZE;
	float2 tc				= subTextureTC + interSubTC;
	return tex2Dnearest(inTable, tc).rg;
}

float4 BlendWeight(sampler2D inEdges, sampler2D inTable, float2 inUV, float2 inPixelSize)
{
	/*
		My image space is represented from bottom-left corner, instead of traditional
		top-left corner. Thus compared to original code (by Jorge Jimenez) resulting
		in inverted Y offset coordinates (negative becomes positive and positive
		becomes negative on Y axis).

		Area texture is also horizontally flipped, since all my texture assets are horizontally
		pre-flipped to match mesh UVs. Double check that in your implementation if this is also required.
	*/

	float2 edge		= tex2Dnearest(inEdges, inUV).rg;
	float4 weights	= float4(0,0,0,0);

	if(edge.y)
	{
		float edgeLeft	= Search(inEdges, inPixelSize, inUV, float2(-1,0));
		float edgeRight	= Search(inEdges, inPixelSize, inUV, float2(1,0));

		float edge1		= tex2D(inEdges, inUV.xy + float2(edgeLeft, 0.25f) * inPixelSize).r;
		float edge2		= tex2D(inEdges, inUV.xy + float2(edgeRight+1.0f, 0.25f) * inPixelSize).r;

		weights.rg		= Area(inTable, abs(float2(edgeLeft, edgeRight)), edge1, edge2);
	}

	if(edge.x)
	{
		float edgeUp	= Search(inEdges, inPixelSize, inUV, float2(0,1));
		float edgeDown	= Search(inEdges, inPixelSize, inUV, float2(0,-1));

		float edge1		= tex2D(inEdges, inUV.xy + float2(-0.25f, edgeUp) * inPixelSize).g;
		float edge2		= tex2D(inEdges, inUV.xy + float2(-0.25f, edgeDown-1.0f) * inPixelSize).g;

		weights.ba		= Area(inTable, abs(float2(edgeUp, edgeDown)), edge1, edge2);
	}

	return saturate(weights);
}

FBO_FORWARD_IN main(float4				inPos			: POSITION,
					float2				inUV			: TEXCOORD0,
					uniform sampler2D	inSampler0		: TEXUNIT0,	// Edge texture
					uniform sampler2D	inSampler1		: TEXUNIT1, // Area texture / look up texture
					uniform float2		inVP						// Render Target width and height pixels
					)
{
	FBO_FORWARD_IN	outFBO;

	outFBO.color0 = BlendWeight(inSampler0, inSampler1, inUV, 1.0f / inVP);

	return outFBO;
}

Step 3 – Blend Pass

Connecting pixels are bilinearly re-interpolated based on the blend information produced by the blend weight pass.
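A hedged 1D sketch of this blending (my own simplification: the shader samples at fractional offsets so the hardware bilinear filter does the mixing, and derives the weights as w = |a³| from the blend-weight texture; here the offsets are whole pixels and the weights are supplied directly):

```python
def blend(colors, x, offsets_weights):
    """Blend colors[x] with its neighbors given (offset, weight) pairs;
    falls back to the original color when all weights vanish."""
    total = sum(w for _, w in offsets_weights)
    if total <= 1e-5:
        return colors[x]
    acc = 0.0
    for off, w in offsets_weights:
        i = max(0, min(x + off, len(colors) - 1))  # clamp at the border
        acc += colors[i] * w
    return acc / total

row = [0.0, 0.0, 1.0, 1.0]
# Pull the boundary pixel halfway toward its left neighbor.
print(blend(row, 2, [(0, 0.5), (-1, 0.5)]))  # 0.5
```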

mlaa_vb_step3_4x

Result of blend pass, final image result. Zoom 4x.

Fragment Shader

struct FBO_FORWARD_IN
{
	float4 color0		: 	COLOR0;
};

FBO_FORWARD_IN main(float4				inPos			: POSITION,
					float2				inUV			: TEXCOORD0,
					uniform sampler2D	inSampler0		: TEXUNIT0,	// Aliased image
					uniform sampler2D	inSampler1		: TEXUNIT1, // Blend weights
					uniform float2		inVP						// Render Target width and height pixels
					)
{
	/*
		My image space is represented from bottom-left corner, instead of traditional
		top-left corner. Resulting in inverted Y offset coordinates (negative becomes
		positive and positive becomes negative on Y axis).
	*/

	FBO_FORWARD_IN	outFBO;

	float2 ps		= 1.0f / inVP;

	float4 topLeft	= tex2D(inSampler1, inUV);
	float bottom	= tex2D(inSampler1, inUV + float2(0,-1)*ps).g;
	float right		= tex2D(inSampler1, inUV + float2(1,0)*ps).a;
	float4 a		= float4(topLeft.r, bottom, topLeft.b, right);
	float4 w		= abs(a * a * a);

	float sum		= dot(w, 1.0f);

	/*
		The original code uses discard so that only the pixels applicable
		for re-vectorization are overwritten.
		Please see the original code: http://www.iryoku.com/mlaa/
	*/

	if(sum > 1e-5)
	{
		float4 color	= float4(0.0f);
		float4 tc		= float4(0.0f, a.r, 0.0f, -a.g) * ps.xyxy + inUV.xyxy;
		color			+=tex2D(inSampler0, tc.xy) * w.r;
		color			+=tex2D(inSampler0, tc.zw) * w.g;

		tc				= float4(-a.b, 0.0f, a.a, 0.0f) * ps.xyxy + inUV.xyxy;
		color			+=tex2D(inSampler0, tc.xy) * w.b;
		color			+=tex2D(inSampler0, tc.zw) * w.a;

		outFBO.color0	= saturate(color / sum);
	}
	else
	{
		outFBO.color0	= tex2D(inSampler0, inUV);
	}

	return outFBO;
}

Results

mlaa_vb_compare

Left MLAA disabled, Right MLAA enabled

mlaa_vb_compare_4x

Left MLAA disabled, Right MLAA enabled, zoom 4x.

Download Win32 MLAA Example - ATI GPUs may experience incompatibility.
