[Work Notes] Precision handling in UE4 TAA reprojection

Reposted article, originally published 2017-07-21 23:11. Tags: Work Notes / TAA

First, a link to the UE4 TAA slides:
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-59732822.pdf

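There are a lot of details in there. For now I am only writing down the issues I have thought of or run into so far, as a reminder to myself; I will add more when I have time.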

The velocity and jitter used by TAA need high precision, and errors tend to grow in large scenes, so UE4 takes a series of measures to preserve precision.

Precision of the projection matrix

    static const FMatrix InvertProjectionMatrix( const FMatrix& M )
    {
        if( M.M[1][0] == 0.0f &&
            M.M[3][0] == 0.0f &&
            M.M[0][1] == 0.0f &&
            M.M[3][1] == 0.0f &&
            M.M[0][2] == 0.0f &&
            M.M[1][2] == 0.0f &&
            M.M[0][3] == 0.0f &&
            M.M[1][3] == 0.0f &&
            M.M[2][3] == 1.0f &&
            M.M[3][3] == 0.0f )
        {
            // Solve the common case directly with very high precision.
            /*
            M = 
            | a | 0 | 0 | 0 |
            | 0 | b | 0 | 0 |
            | s | t | c | 1 |
            | 0 | 0 | d | 0 |
            */

            double a = M.M[0][0];
            double b = M.M[1][1];
            double c = M.M[2][2];
            double d = M.M[3][2];
            double s = M.M[2][0];
            double t = M.M[2][1];

            return FMatrix(
                FPlane( 1.0 / a, 0.0f, 0.0f, 0.0f ),
                FPlane( 0.0f, 1.0 / b, 0.0f, 0.0f ),
                FPlane( 0.0f, 0.0f, 0.0f, 1.0 / d ),
                FPlane( -s/a, -t/b, 1.0f, -c/d )
            );
        }
        else
        {
            return M.Inverse();
        }
    }

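As the code shows, inverting the projection matrix is special-cased to improve floating-point precision: for the common perspective layout the inverse is written out in closed form from values promoted to double, instead of going through the general M.Inverse().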
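To check that the closed form really is the inverse of that matrix layout, here is a minimal standalone sketch. It does not use UE4 types: Mat4, InvertPerspective, Multiply and MaxErrorFromIdentity are hypothetical names introduced only for this illustration, and everything is kept in double.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for FMatrix: a row-major 4x4 matrix of doubles.
struct Mat4 { double m[4][4]; };

// Closed-form inverse of the layout
//   | a 0 0 0 |
//   | 0 b 0 0 |
//   | s t c 1 |
//   | 0 0 d 0 |
// Each output entry needs at most one division, so no error accumulates
// through a general cofactor or elimination based inverse.
Mat4 InvertPerspective(const Mat4& p)
{
    const double a = p.m[0][0], b = p.m[1][1], c = p.m[2][2];
    const double d = p.m[3][2], s = p.m[2][0], t = p.m[2][1];
    assert(a != 0.0 && b != 0.0 && d != 0.0);  // otherwise the matrix is singular
    return Mat4{{
        {1.0 / a, 0.0, 0.0, 0.0},
        {0.0, 1.0 / b, 0.0, 0.0},
        {0.0, 0.0, 0.0, 1.0 / d},
        {-s / a, -t / b, 1.0, -c / d},
    }};
}

// Plain 4x4 matrix product x * y.
Mat4 Multiply(const Mat4& x, const Mat4& y)
{
    Mat4 r{};  // zero-initialized accumulator
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += x.m[i][k] * y.m[k][j];
    return r;
}

// Largest absolute deviation of x from the identity matrix.
double MaxErrorFromIdentity(const Mat4& x)
{
    double worst = 0.0;
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            worst = std::fmax(worst, std::fabs(x.m[i][j] - (i == j ? 1.0 : 0.0)));
    return worst;
}
```

Multiplying a matrix of this shape by InvertPerspective of itself, in either order, should give the identity up to double rounding.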

Precision of the view matrix

    FVector DeltaTranslation = InPrevViewMatrices.GetPreViewTranslation() - InViewMatrices.GetPreViewTranslation();
    FMatrix InvViewProj = InViewMatrices.ComputeInvProjectionNoAAMatrix() * InViewMatrices.GetTranslatedViewMatrix().GetTransposed();
    FMatrix PrevViewProj = FTranslationMatrix(DeltaTranslation) * InPrevViewMatrices.GetTranslatedViewMatrix() * InPrevViewMatrices.ComputeProjectionNoAAMatrix();

    ViewUniformShaderParameters.ClipToPrevClip = InvViewProj * PrevViewProj;

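The view matrix is split into its translation and rotation parts (T*R) to avoid the error caused by very large camera positions on big maps.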

Reprojection takes a point in the current frame's clip space or NDC space and projects it to its position in the previous frame:

Reprojection = (V*P)^-1 * (PrevV*PrevP)

= P^-1 * V^-1 * PrevV * PrevP

where V is the view matrix and P is the projection matrix.

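UE4's view matrix is actually built as T*R. This is much like the TRS versus RTS ordering of a transform matrix, except that a view matrix has no scale; see https://docs.microsoft.com/en-us/dotnet/framework/winforms/advanced/why-transformation-order-is-significant

Note that for the same T and R, T*R and R*T give different results. The T and R used here are two matrices decomposed so that their product is still the same ViewMatrix, which can also be seen from a geometric argument.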

Reprojection = P^-1 * (T*R)^-1 * (PrevT*PrevR) * PrevP

= P^-1 * R^-1 * T^-1 * PrevT * PrevR * PrevP

= P^-1 * R^-1 * (T^-1 * PrevT) * PrevR * PrevP

where:
Reprojection is ViewUniformShaderParameters.ClipToPrevClip in the UE4 code,
R is the rotation part of the ViewMatrix (GetTranslatedViewMatrix in the UE4 code),
T is the translation part of the ViewMatrix,

and T^-1 * PrevT is FTranslationMatrix(DeltaTranslation) in the UE4 code.

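The benefit is that two absolute positions are turned into a single relative offset, which avoids the precision problems of inverting a matrix that contains a very large absolute position.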
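The effect can be illustrated on the translation row alone. The sketch below is not UE4 code: Vec3f, Rotate, CompareTranslation and the yaw-only previous rotation are assumptions made for the demonstration. It compares, in single precision, the translation row obtained by multiplying two view matrices that each contain a large absolute position against the one obtained from the position difference, using a double-precision result as reference.

```cpp
#include <cassert>
#include <cmath>

struct Vec3f { float x, y, z; };

// Row vector times 3x3 rotation (v * R), evaluated in single precision.
Vec3f Rotate(const Vec3f& v, const float r[3][3])
{
    return {
        v.x * r[0][0] + v.y * r[1][0] + v.z * r[2][0],
        v.x * r[0][1] + v.y * r[1][1] + v.z * r[2][1],
        v.x * r[0][2] + v.y * r[1][2] + v.z * r[2][2],
    };
}

struct TranslationError { double naive; double viaDelta; };

double MaxAbsDiff(const Vec3f& v, double rx, double ry, double rz)
{
    return std::fmax(std::fabs(v.x - rx),
                     std::fmax(std::fabs(v.y - ry), std::fabs(v.z - rz)));
}

// Error of the translation row of T^-1 * PrevT * PrevR, computed two ways.
// Assumption for this sketch: the previous view rotation is a pure yaw.
TranslationError CompareTranslation(const Vec3f& pos, const Vec3f& prevPos, float prevYaw)
{
    assert(std::isfinite(prevYaw));
    const float c = std::cos(prevYaw), s = std::sin(prevYaw);
    const float prevR[3][3] = {{c, s, 0.0f}, {-s, c, 0.0f}, {0.0f, 0.0f, 1.0f}};

    // The pre-view translation is the negated camera position.
    const Vec3f pvt = {-pos.x, -pos.y, -pos.z};
    const Vec3f prevPvt = {-prevPos.x, -prevPos.y, -prevPos.z};

    // Naive: PrevV = PrevT * PrevR is built first, so its translation row is
    // large and already rounded; multiplying by V^-1 then cancels large numbers.
    const Vec3f prevViewRow = Rotate(prevPvt, prevR);
    const Vec3f rotatedPos = Rotate(pos, prevR);
    const Vec3f naive = {rotatedPos.x + prevViewRow.x,
                         rotatedPos.y + prevViewRow.y,
                         rotatedPos.z + prevViewRow.z};

    // Delta: subtract the two positions first, then rotate the small offset.
    const Vec3f delta = {prevPvt.x - pvt.x, prevPvt.y - pvt.y, prevPvt.z - pvt.z};
    const Vec3f viaDelta = Rotate(delta, prevR);

    // Double-precision reference for the same quantity.
    const double dx = double(pos.x) - prevPos.x;
    const double dy = double(pos.y) - prevPos.y;
    const double dz = double(pos.z) - prevPos.z;
    const double rx = dx * c - dy * s, ry = dx * s + dy * c, rz = dz;

    return {MaxAbsDiff(naive, rx, ry, rz), MaxAbsDiff(viaDelta, rx, ry, rz)};
}
```

With a camera a million or so units from the origin that moves by a fraction of a unit between frames, the naive path is limited by the float spacing at that magnitude, while the delta path keeps the error near float precision of the small offset itself.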

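In addition, the rotation part R of the view matrix is orthogonal, so its transpose equals its inverse. This is not only faster; more importantly, it avoids the floating-point error that a general Inverse would introduce.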
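A small sketch of the transpose-as-inverse property, again with hypothetical helper names (Mat3, RotationZ, RotationX, Multiply, Transpose, MaxErrorFromIdentity) rather than UE4's own types. The transpose only rearranges entries, so it introduces no rounding of its own.

```cpp
#include <cassert>
#include <cmath>

struct Mat3 { float m[3][3]; };

// Rotation about Z, row-vector convention (v * R).
Mat3 RotationZ(float radians)
{
    assert(std::isfinite(radians));
    const float c = std::cos(radians), s = std::sin(radians);
    return Mat3{{ {c, s, 0.0f}, {-s, c, 0.0f}, {0.0f, 0.0f, 1.0f} }};
}

// Rotation about X, row-vector convention (v * R).
Mat3 RotationX(float radians)
{
    assert(std::isfinite(radians));
    const float c = std::cos(radians), s = std::sin(radians);
    return Mat3{{ {1.0f, 0.0f, 0.0f}, {0.0f, c, s}, {0.0f, -s, c} }};
}

Mat3 Multiply(const Mat3& x, const Mat3& y)
{
    Mat3 r{};  // zero-initialized accumulator
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                r.m[i][j] += x.m[i][k] * y.m[k][j];
    return r;
}

// Pure data movement: no arithmetic, hence no rounding error.
Mat3 Transpose(const Mat3& x)
{
    Mat3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            r.m[i][j] = x.m[j][i];
    return r;
}

// Largest absolute deviation of x from the identity matrix.
float MaxErrorFromIdentity(const Mat3& x)
{
    float worst = 0.0f;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            worst = std::fmax(worst, std::fabs(x.m[i][j] - (i == j ? 1.0f : 0.0f)));
    return worst;
}
```

For any rotation built this way, the product of the matrix with its transpose should be the identity up to float rounding.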

Precision of the velocity buffer

    // for velocity rendering, motionblur and temporal AA
    // velocity needs to support -2..2 screen space range for x and y
    // texture is 16bit 0..1 range per channel
    float2 EncodeVelocityToTexture(float2 In)
    {
        // 0.499f is a value smaller than 0.5f to avoid using the full range to use the clear color (0,0) as special value
        // 0.5f to allow for a range of -2..2 instead of -1..1 for really fast motions for temporal AA
        return In * (0.499f * 0.5f) + 32767.0f / 65535.0f;
    }
    // see EncodeVelocityToTexture()
    float2 DecodeVelocityFromTexture(float2 In)
    {
        const float InvDiv = 1.0f / (0.499f * 0.5f);
        // reference
    //    return (In - 32767.0f / 65535.0f ) / (0.499f * 0.5f);
        // MAD layout to help compiler
        return In * InvDiv - 32767.0f / 65535.0f * InvDiv;
    }
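To see what the encoding does to a value, here is a scalar C++ port of the two helpers with a simulated 16-bit store in between. The names EncodeVelocity, DecodeVelocity, StoreUnorm16, LoadUnorm16 and RoundTrip are made up for this sketch, and modelling the texture write as round(x * 65535) is an assumption about how a normalized 16-bit channel behaves, not something taken from the UE4 source.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Scalar port of EncodeVelocityToTexture (one channel).
float EncodeVelocity(float v)
{
    return v * (0.499f * 0.5f) + 32767.0f / 65535.0f;
}

// Scalar port of DecodeVelocityFromTexture (one channel).
float DecodeVelocity(float e)
{
    const float invDiv = 1.0f / (0.499f * 0.5f);
    return e * invDiv - 32767.0f / 65535.0f * invDiv;
}

// Assumed model of writing a 0..1 value to a 16-bit normalized channel.
std::uint16_t StoreUnorm16(float x)
{
    assert(x >= 0.0f && x <= 1.0f);
    return static_cast<std::uint16_t>(std::lround(x * 65535.0f));
}

// Assumed model of reading the channel back as a 0..1 float.
float LoadUnorm16(std::uint16_t q)
{
    return static_cast<float>(q) / 65535.0f;
}

// Encode, quantize to 16 bits, read back, decode.
float RoundTrip(float v)
{
    return DecodeVelocity(LoadUnorm16(StoreUnorm16(EncodeVelocity(v))));
}
```

Under this model zero velocity lands on code 32767, the extremes -2 and 2 stay strictly inside the 0..65535 range so the clear value 0 remains free as a special marker, and the round-trip error stays within about half a quantization step, roughly 3e-5 in velocity units.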

In TAA the velocity buffer is used to reproject moving objects, including skinned animation and translation/rotation animation.

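To get more precision, the velocity buffer could use Float32. Unity uses Float16x2 (R16G16F), whereas UE4 uses INT16x2 (R16G16_INT); as the code comment above notes, each channel holds a 16-bit value in the 0..1 range.
Float16 has a 10-bit mantissa, so its precision is 2^-10 relative to the magnitude of the value. Mapping to Int16 and setting aside the equivalent of the sign bit gives a uniform step of 2^-15, so precision improves for the same amount of video memory.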
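The comparison can be made concrete with a small sketch. HalfSpacing and Fixed16Step are hypothetical helpers written for this note; HalfSpacing assumes IEEE 754 half precision (10 explicit mantissa bits, smallest normal exponent -14) and ignores overflow above the half-float maximum.

```cpp
#include <cassert>
#include <cmath>

// Spacing between adjacent half-precision values around |v|.
double HalfSpacing(double v)
{
    assert(std::isfinite(v));
    int exponent = -14;  // subnormals share the spacing of the smallest normal binade
    if (std::fabs(v) >= std::ldexp(1.0, -14))
        exponent = std::ilogb(v);  // floor(log2(|v|))
    return std::ldexp(1.0, exponent - 10);
}

// Uniform step when [-range, +range] is spread over all 2^16 integer codes.
double Fixed16Step(double range)
{
    return 2.0 * range / 65536.0;
}
```

For values in [1, 2) the half-float spacing is 2^-10, while 16-bit fixed point over -1..1 has a step of 2^-15 everywhere; with the -2..2 range used by the encoder above the step becomes 2^-14. One caveat worth keeping in mind: the float spacing shrinks with the value, so for very small velocities (below about 2^-5 on a -1..1 range) Float16 is actually the finer of the two, and the integer format wins for the larger values.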
