Engine Design Log (9.14.2f) Recent Updates: OpenGL ES & Tools

Reposted article; originally published 2015-01-11 00:05.

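I've set aside the IK work for skeletal animation for now and have recently been working on the GLES implementation. Apart from the missing GLES implementation, the Android code port was already complete: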

[Original] Cross-Platform Programming Notes (3): Porting from Windows to Android

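Overall, that port required few changes: mostly adjusting and adapting between DLLs and .so files, plus some compile errors related to the C++ standard. Loading of data packages, initialization, configuration files and plugin loading were tested and working, but since GLES was not implemented, the port could only do a dry run (no rendering) on a real device.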

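Lately I've been using my spare time to fill in the GLES gap. The interface adjustments are mostly done, and I'm now filling in the GLES runtime implementation.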

1. A brief look at how tile-based rendering (TBR) GPUs work, and what to watch out for

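TBR divides screen space into a number of tiles, each much smaller than the screen, e.g. 32x32 pixels.
TBR bins the geometry into the tiles in screen space and then renders tile by tile. Since a primitive can span many tiles, the binned geometry has to be kept until every tile has been rendered, and the more geometry the draw calls submit, the more memory this consumes.
In a TBR architecture, the GPU has fast on-chip memory for the tile being processed (let's call it the tile cache for now), which is very fast to access. However, "video memory" is usually not dedicated on-card VRAM but shared system memory, so transfers between video memory and the tile cache are relatively slow.
Because of the tile cache, reading and writing depth and color is fast, unlike on modern PC GPUs. Blending, depth write/test and multisampling are all relatively cheaper on TBR: even when pixels at different depths are shaded repeatedly, the work stays in the tile cache and the result is written to video memory only once (in practice, though, I found alpha blending to still be a relatively slow operation).
Because transfers between the tile cache and video memory are slow, GLES 3.0 provides glInvalidateFramebuffer as a hint; on this architecture it lets the driver skip unnecessary transfers between the tile cache and memory, for example writing back a depth buffer that will not be read again.
If the GPU supports hidden surface removal (HSR, as on PowerVR GPUs), it resolves the visibility of the binned geometry and shades only the visible parts in the tile cache, so the pixel workload is much lower. The application therefore does not need to sort solid (opaque) objects by distance, but discard/texkill disables this feature. When it is disabled, or on GPUs without HSR, early-z can still be used via a traditional depth pre-pass (pre-z pass) that writes depth first.
Tile-based GPUs handle far less geometry (triangle count) than modern PC GPUs. Millions of triangles are nothing for a modern PC GPU, but a tile-based GPU has to store all that geometry to render each tile, which costs considerable memory and runtime.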
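To make the binning step above concrete, here is a minimal sketch (an illustration, not how any particular GPU or driver bins geometry): it computes the conservative range of tiles touched by a screen-space triangle from its bounding box, using the 32x32 tile size from the example. The names Vec2, TileRange and binTriangle are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// Inclusive range of tile coordinates touched by a primitive.
struct TileRange {
    int x0, y0, x1, y1;
    int count() const { return (x1 < x0 || y1 < y0) ? 0 : (x1 - x0 + 1) * (y1 - y0 + 1); }
};

// Conservative binning: every tile overlapped by the triangle's screen-space bounding box,
// clamped to the screen. The triangle would be appended to the list of each of these tiles.
TileRange binTriangle(Vec2 a, Vec2 b, Vec2 c, int screenW, int screenH, int tileSize = 32) {
    assert(tileSize > 0 && screenW > 0 && screenH > 0);
    const int tilesX = (screenW + tileSize - 1) / tileSize;
    const int tilesY = (screenH + tileSize - 1) / tileSize;
    const float minX = std::min({a.x, b.x, c.x}), maxX = std::max({a.x, b.x, c.x});
    const float minY = std::min({a.y, b.y, c.y}), maxY = std::max({a.y, b.y, c.y});
    TileRange r;
    r.x0 = std::max(0, static_cast<int>(std::floor(minX / tileSize)));
    r.y0 = std::max(0, static_cast<int>(std::floor(minY / tileSize)));
    r.x1 = std::min(tilesX - 1, static_cast<int>(std::floor(maxX / tileSize)));
    r.y1 = std::min(tilesY - 1, static_cast<int>(std::floor(maxY / tileSize)));
    return r;  // an off-screen triangle yields an empty range (count() == 0)
}
```

Under this scheme a triangle spanning (10,10) to (100,50) on a 1280x720 target is referenced by 8 tile lists, which illustrates why large amounts of geometry cost memory on a tiler.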
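For the glInvalidateFramebuffer hint, a small sketch of building the attachment list passed to void glInvalidateFramebuffer(GLenum target, GLsizei numAttachments, const GLenum *attachments) (GLES 3.0). So that it builds without GL headers, the enum values are copied from GLES3/gl3.h into a local namespace; the helper and flag names are hypothetical. Note that the default framebuffer takes GL_COLOR/GL_DEPTH/GL_STENCIL, whereas framebuffer objects take the *_ATTACHMENT enums.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Enum values as defined in GLES3/gl3.h (copied so this sketch builds standalone).
namespace gl {
constexpr uint32_t COLOR = 0x1800, DEPTH = 0x1801, STENCIL = 0x1802;  // default framebuffer
constexpr uint32_t COLOR_ATTACHMENT0 = 0x8CE0, DEPTH_ATTACHMENT = 0x8D00,
                   STENCIL_ATTACHMENT = 0x8D20;                       // framebuffer objects
}

// Which contents must survive the pass; by default keep color, drop depth and stencil.
struct StoreFlags { bool color = true, depth = false, stencil = false; };

// Everything not kept can be invalidated, so a tiler need not write it back to memory.
std::vector<uint32_t> attachmentsToInvalidate(bool isDefaultFramebuffer, StoreFlags keep) {
    std::vector<uint32_t> out;
    if (!keep.color)   out.push_back(isDefaultFramebuffer ? gl::COLOR : gl::COLOR_ATTACHMENT0);
    if (!keep.depth)   out.push_back(isDefaultFramebuffer ? gl::DEPTH : gl::DEPTH_ATTACHMENT);
    if (!keep.stencil) out.push_back(isDefaultFramebuffer ? gl::STENCIL : gl::STENCIL_ATTACHMENT);
    return out;
}

// With a GL context, after the last draw of the pass:
//   auto list = attachmentsToInvalidate(true, StoreFlags{});
//   if (!list.empty())
//       glInvalidateFramebuffer(GL_FRAMEBUFFER, (GLsizei)list.size(), list.data());
```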
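The HSR and early-z point can be summarized as a pass-planning decision. This is a hypothetical sketch rather than Blade code: hasHSR would come from a capability flag (for example from the render configuration file), and the depth pre-pass uses the common setup in which the first pass writes depth only and the shading pass tests with EQUAL and leaves depth writes off.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class DepthFunc { Less, Equal };

struct PassState {
    std::string name;
    bool colorWrite;
    bool depthWrite;
    DepthFunc depthFunc;
};

// Hypothetical helper: choose the passes for opaque geometry.
// With HSR and no discard, a single pass suffices and opaque draws need no depth sorting.
// Otherwise, lay down depth first so the shading pass only runs fragment shaders on
// visible pixels (early-z).
std::vector<PassState> planOpaquePasses(bool hasHSR, bool usesDiscard) {
    if (hasHSR && !usesDiscard)
        return {{"opaque", true, true, DepthFunc::Less}};
    return {
        {"depth_prepass", false, true, DepthFunc::Less},
        {"opaque", true, false, DepthFunc::Equal},
    };
}
```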

2. Unifying the GLES and D3D interfaces

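The rendering interfaces are largely similar and have equivalent implementations; the main difference lies in the shader interface: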

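GL/GLES links programs at runtime, and its shader objects are only intermediate objects. D3D shaders are usually compiled offline and loaded directly at runtime.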

GLES 3 has glGetProgramBinary and glProgramBinary, which can save and load compiled (linked) programs. However, compiling and saving still have to be done on the target device.

Blade's previous interface was:

IShader => D3D9VertexShader : has an IDirect3DVertexShader9
        => D3D9FragementShader : has an IDirect3DPixelShader9

The new interface merges the shader types: there are no longer separate shader objects per stage; instead, one shader contains the VS, FS and other stage objects:

IShader => D3D9Shader : has an (IDirect3DVertexShader9 & IDirect3DPixelShader9)
        => GLESShader : has a GL program

Accordingly, IRenderDevice::setShader(EShaderType, HSHADER&) was changed to setShader(HSHADER&).
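A minimal sketch of the merged interface, with hypothetical code modeled on the names in the text (IShader, GLESShader, IRenderDevice::setShader): one shader object owns all stages, so the device binds it with a single call instead of one call per stage. The sketch assumes HSHADER is a shared_ptr and takes it by const reference; the GL program handle is a plain integer so it compiles without GL.

```cpp
#include <cassert>
#include <memory>

// Hypothetical sketch of the merged shader interface (not Blade's actual classes).
class IShader {
public:
    virtual ~IShader() = default;
    virtual bool hasVertexStage() const = 0;
    virtual bool hasFragmentStage() const = 0;
};

// GLES: a single linked program object already holds all stages.
class GLESShader : public IShader {
public:
    explicit GLESShader(unsigned program) : program_(program) {}
    bool hasVertexStage() const override { return program_ != 0; }
    bool hasFragmentStage() const override { return program_ != 0; }
    unsigned program() const { return program_; }
private:
    unsigned program_;  // stands in for the GLuint returned by glCreateProgram
};

using HSHADER = std::shared_ptr<IShader>;  // assumption: a ref-counted handle

class IRenderDevice {
public:
    virtual ~IRenderDevice() = default;
    // Old: setShader(EShaderType, HSHADER&), called once per stage.
    // New: one call binds every stage of the shader together.
    virtual bool setShader(const HSHADER& shader) = 0;
};

// Mock device that only records the bound shader.
class MockDevice : public IRenderDevice {
public:
    bool setShader(const HSHADER& shader) override {
        if (!shader || !shader->hasVertexStage() || !shader->hasFragmentStage())
            return false;
        current_ = shader;  // a GLES device would call glUseProgram(program) here
        return true;
    }
    const HSHADER& current() const { return current_; }
private:
    HSHADER current_;
};
```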

In GLSL/GLSL ES shaders, neither uniforms nor vertex input streams (vertex attributes) have semantics; the application has to bind and set them by name.

For uniforms this is not a problem, because Blade's shader resource additionally stores a semantic map that is used to update the engine's built-in variables, such as WORLD_MATRIX, EYE_POS, etc.

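For vertex attributes, the current approach is to rename these variables to fixed names. For example, the input with the HLSL semantic POSITION0 is named blade_position0 in the corresponding GLSL.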

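This way, glBindAttribLocation can be called at runtime to bind these attributes to the vertex streams (VBOs); the bindings take effect when the program is next linked with glLinkProgram.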
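A sketch of the renaming rule and the binding step. The lowercase "blade_" + semantic rule follows the POSITION0 -> blade_position0 example above; the fixed location table is a hypothetical assumption, not Blade's actual layout. In real code each name would be passed to glBindAttribLocation(program, location, name) before glLinkProgram.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// HLSL semantic (e.g. "POSITION0") -> fixed GLSL variable name ("blade_position0").
std::string glslAttributeName(const std::string& semantic) {
    std::string name = "blade_";
    for (char c : semantic)
        name += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return name;
}

// Hypothetical fixed attribute slots; vertex declarations would use the same indices
// when calling glVertexAttribPointer for each VBO stream.
struct AttribSlot { const char* semantic; unsigned location; };
const std::vector<AttribSlot> kAttribSlots = {
    {"POSITION0", 0}, {"POSITION1", 1}, {"NORMAL0", 2}, {"TEXCOORD0", 3},
};

// Location to pass to glBindAttribLocation, or -1 if the semantic has no fixed slot.
int attributeLocation(const std::string& semantic) {
    for (const AttribSlot& s : kAttribSlots)
        if (semantic == s.semantic) return static_cast<int>(s.location);
    return -1;
}

// With a GL context (before linking):
//   for (const AttribSlot& s : kAttribSlots)
//       glBindAttribLocation(program, s.location, glslAttributeName(s.semantic).c_str());
//   glLinkProgram(program);
```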

3. Tools

The BPK packaging tool already exists, and its runtime has been tested and works on Android. The tools still needed are a shader compiler and a texture compressor.

The shader compiler uses HLSL2GLSL:

First, the existing shader compiler on Windows:

offline:

HLSL ==(TexShaderSerializer::load)==> D3DSoftwareShader : compiled binary ==(BinarySerializer::save)==> binary shader : with semantic map

runtime:

binary shader ==(BinarySerializer::load)==> D3DShader

The shader compiler already implemented for GLES:

offline:

HLSL ==(TexShaderSerializer::load)==> D3DSoftwareShader : compiled binary with HLSL text ==(replace with GLSL)==> binary with GLSL text ==(HybridShaderSerializer::save)==> hybrid shader : text with binary semantic map

runtime:

hybrid shader ==(HybridShaderSerializer::load)==> GLESShader
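The hybrid format is described only as "text with a binary semantic map". One possible layout, purely as a hypothetical illustration (not Blade's actual file format): a count of semantic entries, each stored as two length-prefixed strings, followed by the GLSL source as the remaining bytes. All names below are invented for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Bytes = std::vector<uint8_t>;

static void putU32(Bytes& b, uint32_t v) {  // little-endian
    for (int i = 0; i < 4; ++i) b.push_back(static_cast<uint8_t>(v >> (8 * i)));
}
static void putStr(Bytes& b, const std::string& s) {
    putU32(b, static_cast<uint32_t>(s.size()));
    b.insert(b.end(), s.begin(), s.end());
}
static bool getU32(const Bytes& b, std::size_t& pos, uint32_t& v) {
    if (pos + 4 > b.size()) return false;
    v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<uint32_t>(b[pos + i]) << (8 * i);
    pos += 4;
    return true;
}
static bool getStr(const Bytes& b, std::size_t& pos, std::string& s) {
    uint32_t n = 0;
    if (!getU32(b, pos, n) || pos + n > b.size()) return false;
    s.assign(b.begin() + pos, b.begin() + pos + n);
    pos += n;
    return true;
}

struct HybridShader {
    std::map<std::string, std::string> semantics;  // uniform name -> engine semantic
    std::string glslSource;                        // compiled on the device at load time
};

Bytes saveHybrid(const HybridShader& s) {
    Bytes b;
    putU32(b, static_cast<uint32_t>(s.semantics.size()));
    for (const auto& kv : s.semantics) { putStr(b, kv.first); putStr(b, kv.second); }
    b.insert(b.end(), s.glslSource.begin(), s.glslSource.end());  // text goes last
    return b;
}

bool loadHybrid(const Bytes& b, HybridShader& out) {
    std::size_t pos = 0;
    uint32_t count = 0;
    if (!getU32(b, pos, count)) return false;
    out.semantics.clear();
    for (uint32_t i = 0; i < count; ++i) {
        std::string name, semantic;
        if (!getStr(b, pos, name) || !getStr(b, pos, semantic)) return false;
        out.semantics[name] = semantic;
    }
    out.glslSource.assign(b.begin() + pos, b.end());
    return true;
}
```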

For GLES 3.0, the shaders (programs) can be saved as binaries at startup (only once), so they no longer need to be compiled afterwards and load much faster. I'll do this later as well.

(https://software.intel.com/en-us/articles/opengl-es-30-precompiled-shaders)

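GLES 2 has glShaderBinary (its binary formats come from vendor extensions), but it loads individual shaders before linking, not linked programs. On GLES 2, linked program binaries are only available through the GL_OES_get_program_binary extension.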

Precompile at startup (done only once):

hybrid shader ==(HybridShaderSerializer::load)==> GLESShader ==(BinaryShaderSerializer::save) ==> binary shader

runtime:

binary shader ==(BinaryShaderSerializer::load)==> GLESShader
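Program binaries are only valid for the GPU and driver that produced them (hence compiling and saving on the target device), and glProgramBinary can fail after a driver update, in which case the program must be rebuilt from source. A minimal sketch of a cache record, assuming the driver fingerprint is built from the GL_VENDOR, GL_RENDERER and GL_VERSION strings; the struct and function names are hypothetical, and the real GL calls are shown as comments.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical cache record for one linked program.
struct ProgramCacheRecord {
    std::string driverFingerprint;  // identifies the GPU/driver that produced the binary
    uint32_t binaryFormat = 0;      // GLenum reported by glGetProgramBinary
    std::vector<uint8_t> binary;    // opaque, driver-specific blob
};

// Assumption: the strings come from glGetString(GL_VENDOR / GL_RENDERER / GL_VERSION).
std::string makeFingerprint(const std::string& vendor, const std::string& renderer,
                            const std::string& version) {
    return vendor + "|" + renderer + "|" + version;
}

// A cached binary is only worth trying on the same driver that produced it.
bool canUseCachedBinary(const ProgramCacheRecord& rec, const std::string& currentFingerprint) {
    return !rec.binary.empty() && rec.driverFingerprint == currentFingerprint;
}

// Saving, once at startup, with a current GL context:
//   GLint len = 0;
//   glGetProgramiv(prog, GL_PROGRAM_BINARY_LENGTH, &len);
//   rec.binary.resize(len);
//   GLenum fmt = 0;
//   glGetProgramBinary(prog, len, nullptr, &fmt, rec.binary.data());
//   rec.binaryFormat = fmt;
//
// Loading:
//   if (canUseCachedBinary(rec, fingerprint)) {
//       glProgramBinary(prog, rec.binaryFormat, rec.binary.data(), (GLsizei)rec.binary.size());
//       GLint ok = GL_FALSE;
//       glGetProgramiv(prog, GL_LINK_STATUS, &ok);
//       if (ok) { /* fast path: no compile or link needed */ }
//   }
//   // otherwise compile and link the hybrid shader's GLSL source, then save a new record
```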

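Note that IShader is a render-device/API-specific interface: its abstraction lives in the foundation library and its implementation in a separate DLL/.so. ShaderResource and all the ShaderSerializers, on the other hand, are reusable and platform-independent. The whole graphics subsystem is platform-independent; platform-specific optimizations (e.g. for tile-based GPUs) are done through the render configuration file (I posted a sample .xml of it earlier) and through targeted handling inside the Blade::IRenderDevice implementation.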

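Because the shader compiler is built on a third-party library, it is already finished and can convert to GLSL ES 3.0. Once the runtime is filled in and a compressed texture format is available, it can be tested.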

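The texture compressor compresses textures into the format used by the target platform; for this, Blade plans to use ETC2/EAC. Previously, Blade compressed textures at runtime on Windows, because I had seen some overseas engines do that. The main advantage is that textures are stored on disk as PNG, which saves disk space, since PNG achieves a higher compression ratio than S3TC. In practice, however, loading large textures turned out to be somewhat slow, and on mobile, compressing at runtime is not a good approach either, as mentioned before. From now on, textures will be compressed offline, and all platforms will use this precompressed approach.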
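A practical benefit of precompressing is that texture sizes become fixed and predictable. Every ETC2/EAC format encodes 4x4 pixel blocks: 8 bytes per block for ETC2 RGB8, ETC2 RGB8 with punch-through alpha and EAC R11, and 16 bytes per block for ETC2 RGBA8 (ETC2 color plus EAC alpha) and EAC RG11. The helper below (hypothetical names) computes the size of a full mip chain; it is only a sizing aid, not a compressor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

enum class EtcFormat { ETC2_RGB8, ETC2_RGB8_A1, ETC2_RGBA8_EAC, EAC_R11, EAC_RG11 };

// Every format in the family encodes 4x4 pixel blocks; only the block size differs.
uint32_t blockBytes(EtcFormat f) {
    switch (f) {
    case EtcFormat::ETC2_RGBA8_EAC:  // ETC2 color block + EAC alpha block
    case EtcFormat::EAC_RG11:        // two EAC blocks
        return 16;
    default:                         // ETC2_RGB8, ETC2_RGB8_A1, EAC_R11
        return 8;
    }
}

// One mip level: partial blocks at the right/bottom edge still take a whole block.
uint64_t levelSize(uint32_t w, uint32_t h, EtcFormat f) {
    const uint64_t blocksX = (w + 3) / 4, blocksY = (h + 3) / 4;
    return blocksX * blocksY * blockBytes(f);
}

// Whole mip chain down to 1x1.
uint64_t mipChainSize(uint32_t w, uint32_t h, EtcFormat f) {
    assert(w > 0 && h > 0);
    uint64_t total = 0;
    for (;;) {
        total += levelSize(w, h, f);
        if (w == 1 && h == 1) break;
        w = std::max<uint32_t>(1, w / 2);
        h = std::max<uint32_t>(1, h / 2);
    }
    return total;
}
```

For example, a 256x256 ETC2 RGB8 base level is 32768 bytes, one eighth of the 262144 bytes the same level takes as uncompressed RGBA8, and the full mip chain is 43704 bytes.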

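As for the texture compressor: work has been very busy lately and I don't have much spare time, so I probably won't write one by hand and will use a third-party library for the compression. It isn't done yet but will be later. Another to-do item is to streamline the data generation/packaging pipeline for target platforms, i.e. a build/project script that combines the shader compiler, texture compressor and BPK packager to produce the final data in one go.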

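The other game data has been designed to be cross-platform, so in theory it should work everywhere without any extra processing. Blade's existing x86 and x64 builds already use the same data or BPK packages. Android may still need some debugging, though.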

Finally: the uniform semantics of HLSL shaders used to be declared in comments inside the file:

//!BladeShaderHeader
//![VertexShader]
//!Entry=TerrainVSMain
//!Profile=vs_3_0
//![FragmentShader]
//!Entry=TerrainPSMain
//!Profile=ps_3_0

#include "inc/light.hlsl"
#include "inc/common.hlsl"
#include "inc/terrain_common.hlsl"

//![Semantics]
//!wvp_matrix = WORLD_VIEWPROJ_MATRIX
//!world_translate = WORLD_POSITION


void TerrainVSMain(
    float2 hpos        : POSITION0,
    float2 vpos        : POSITION1,
    float4 normal      : NORMAL0,        //ubyte4-n normal

    uniform float4x4 wvp_matrix,
    uniform float4 world_translate,
    uniform float4 scaleFactor,          //scale
    uniform float4 UVInfo,               //uv information

    out float4 outPos : POSITION,
    out float4 outUV  : TEXCOORD0,
    out float4 outBlendUV : TEXCOORD1,
    out float3 outWorldPos : TEXCOORD2,
    out float3 outWorldNormal : TEXCOORD3
    )
{
    float4 pos = float4(hpos.x, getMorphHeight(vpos, hpos+world_translate.xz, eye_position.xz), hpos.y, 1);
    pos = pos*scaleFactor;

    float blendOffset = UVInfo[0];
    float tileSize = UVInfo[1];
    float blockSize = UVInfo[2];
    float blockUVMultiple = UVInfo[3];

    //normalUV
    outUV.xy = pos.xz*(tileSize-1)/(tileSize*tileSize) + 0.5/tileSize;
    //block repeat UV
    outUV.zw = pos.xz*blockUVMultiple/blockSize;
    //blendUV
    outBlendUV.xy = pos.xz*(tileSize-1)/(tileSize*tileSize) + blendOffset/tileSize;
    outBlendUV.zw = pos.xz/tileSize;

    //use local normal as world normal, because our terrain has no scale/rotations
    outWorldNormal = expand_vector(normal).xyz;    //ubyte4 normal ranges [0,1], needs converting to [-1,1]

    //don't use full transform because our terrain has no scale/rotation
    outWorldPos = pos.xyz+world_translate.xyz;

    outPos = mul(pos, wvp_matrix);
}

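I have now removed the declarations from the comments and switched to native HLSL syntax. I used to think that, since only the D3D Effect framework parses uniform semantics, this syntax was only allowed in .fx files and that compiling it directly with D3DCompile would fail.

But when I tried it a few days ago, D3DCompile did not report an error for uniform semantics; it simply ignored them. So I converted everything to this format.

A little extra code is needed to parse the semantics manually, but a tokenizer is enough for that. The converted shader now looks like this: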

//!BladeShaderHeader
//![Shader]
//!VSEntry=TerrainVSMain
//!VSProfile=vs_3_0
//!FSEntry=TerrainPSMain
//!FSProfile=ps_3_0

#include "inc/light.hlsl"
#include "inc/common.hlsl"
#include "inc/terrain_common.hlsl"


void TerrainVSMain(
    float2 hpos        : POSITION0,
    float2 vpos        : POSITION1,
    float4 normal      : NORMAL0,        //ubyte4-n normal

    uniform float4x4 wvp_matrix : WORLD_VIEWPROJ_MATRIX,
    uniform float4 world_translate : WORLD_POSITION,
    uniform float4 scaleFactor : _SHADER_,        //per shader custom variable: scale
    uniform float4 UVInfo : _SHADER_,             //per shader custom variable: uv information

    out float4 outPos : POSITION,
    out float4 outUV  : TEXCOORD0,
    out float4 outBlendUV : TEXCOORD1,
    out float3 outWorldPos : TEXCOORD2,
    out float3 outWorldNormal : TEXCOORD3
    )
{
    float4 pos = float4(hpos.x, getMorphHeight(vpos, hpos+world_translate.xz, eye_position.xz), hpos.y, 1);
    pos = pos*scaleFactor;

    float blendOffset = UVInfo[0];
    float tileSize = UVInfo[1];
    float blockSize = UVInfo[2];
    float blockUVMultiple = UVInfo[3];

    //normalUV
    outUV.xy = pos.xz*(tileSize-1)/(tileSize*tileSize) + 0.5/tileSize;
    //block repeat UV
    outUV.zw = pos.xz*blockUVMultiple/blockSize;
    //blendUV
    outBlendUV.xy = pos.xz*(tileSize-1)/(tileSize*tileSize) + blendOffset/tileSize;
    outBlendUV.zw = pos.xz/tileSize;

    //use local normal as world normal, because our terrain has no rotations
    outWorldNormal = expand_vector(normal).xyz;    //ubyte4 normal ranges [0,1], needs converting to [-1,1]

    //don't use full transform because our terrain has no scale/rotation
    outWorldPos = pos.xyz+world_translate.xyz;

    outPos = mul(pos, wvp_matrix);
}
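The text only says a tokenizer is enough to pick up these semantics. A minimal sketch of that idea (hypothetical function names, not Blade's actual tokenizer): split the source into identifier and punctuation tokens, skip // comments, and record name -> semantic whenever the pattern "uniform type name : SEMANTIC" appears. It does not handle /* */ comments, the preprocessor, or ": register(...)" bindings, which a real parser would need to deal with.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Split HLSL text into identifier tokens and single-character punctuation tokens,
// dropping whitespace and // line comments.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (src.compare(i, 2, "//") == 0) {  // skip to end of line
            while (i < src.size() && src[i] != '\n') ++i;
            continue;
        }
        if (std::isalnum(c) || c == '_') {
            const std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            tokens.push_back(src.substr(start, i - start));
            continue;
        }
        tokens.push_back(std::string(1, src[i]));  // punctuation: , : ; ( ) etc.
        ++i;
    }
    return tokens;
}

// Collect "uniform <type> <name> : <SEMANTIC>" declarations as name -> semantic.
std::map<std::string, std::string> parseUniformSemantics(const std::string& src) {
    std::map<std::string, std::string> result;
    const std::vector<std::string> t = tokenize(src);
    for (std::size_t i = 0; i + 4 < t.size(); ++i)
        if (t[i] == "uniform" && t[i + 3] == ":")
            result[t[i + 2]] = t[i + 4];
    return result;
}
```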

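Regarding the shader variables: WORLD_VIEWPROJ_MATRIX is a built-in variable of Blade's FX framework, while the "_SHADER_" semantic only marks a variable as a custom shader variable defined by the module, which the framework does not provide. The user module (the terrain module in this example) must set/update such a variable directly by name. It must be set at least once; if its value does not change, it does not need to be updated again. The CPU-side storage for these variables is allocated automatically by the material/FX framework according to the variable type, and is kept in the shader/instance/global shader constant tables.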
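A minimal sketch of how the two kinds of semantics could be handled, assuming a hypothetical constant table (the names are invented, not Blade's API): setCustom refuses variables whose semantic is a framework built-in, "_SHADER_" variables are set by the owning module by name, and the CPU copy is only marked for upload when its value actually changes.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <string>

using Float4 = std::array<float, 4>;

struct ShaderVariable {
    std::string semantic;  // e.g. "WORLD_VIEWPROJ_MATRIX" (built-in) or "_SHADER_" (custom)
    Float4 value{};        // CPU-side copy owned by the framework
    bool everSet = false;
    bool dirty = false;    // needs to be uploaded to the GPU
};

// Hypothetical constant table; built-in semantics are filled by the framework elsewhere.
class ShaderConstantTable {
public:
    void declare(const std::string& name, const std::string& semantic) {
        vars_[name].semantic = semantic;
    }

    // Called by the owning module (e.g. terrain) for "_SHADER_" variables only.
    bool setCustom(const std::string& name, const Float4& v) {
        auto it = vars_.find(name);
        if (it == vars_.end() || it->second.semantic != "_SHADER_")
            return false;  // unknown, or owned by the framework
        ShaderVariable& var = it->second;
        if (!var.everSet || var.value != v) {  // unchanged values are not re-uploaded
            var.value = v;
            var.everSet = true;
            var.dirty = true;
        }
        return true;
    }

    // Upload step: counts the variables that would be sent, then clears their dirty flags.
    int flush() {
        int uploaded = 0;
        for (auto& kv : vars_)
            if (kv.second.dirty) { kv.second.dirty = false; ++uploaded; }
        return uploaded;
    }

private:
    std::map<std::string, ShaderVariable> vars_;
};
```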

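I'll do the ETC2/EAC texture compression later when I have time. The porting itself is relatively little work so far, though adaptation and optimization may take time. The main problem is that the platform-independent core features are still incomplete; I will focus on those next, since otherwise porting doesn't mean much. Once the core features and the game code are in place, even a new platform should be quick to adapt to. Of course, a game is on a completely different scale of work from the engine; I hope I'll have the chance to collaborate with others in the future.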

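GLES 3.0 adds uniform buffer objects (UBOs), which are another optimization opportunity. However, I think it's better not to expose a UBO interface and instead keep it inside the IRenderDevice implementation, so that APIs without constant buffers don't have to deal with it.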

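Of course, one could also abstract an interface and emulate it on APIs that lack support (e.g. Direct3D 9). I mentioned Ogre's approach before: buffer the constants in an array and submit them all at once at the end.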
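For reference, that array-buffer emulation can be sketched as a CPU-side shadow array of float4 registers that tracks the dirty register range and submits it in one call. This is a hypothetical sketch, not Ogre's or Blade's code; on Direct3D 9 the submit callback would wrap IDirect3DDevice9::SetVertexShaderConstantF(StartRegister, pConstantData, Vector4fCount).

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical emulated constant buffer: shadow copy of float4 registers plus a dirty range.
class EmulatedConstantBuffer {
public:
    using SubmitFn = std::function<void(unsigned startRegister, const float* data, unsigned count)>;

    explicit EmulatedConstantBuffer(unsigned registerCount) : data_(registerCount * 4, 0.0f) {}

    void setRegister(unsigned reg, const float v[4]) {
        assert(reg * 4 + 3 < data_.size());
        std::copy(v, v + 4, data_.begin() + reg * 4);
        dirtyBegin_ = std::min(dirtyBegin_, reg);
        dirtyEnd_ = std::max(dirtyEnd_, reg + 1);
    }

    // One submission covering the whole dirty range, then reset the range.
    void flush(const SubmitFn& submit) {
        if (dirtyBegin_ >= dirtyEnd_) return;  // nothing changed
        submit(dirtyBegin_, data_.data() + dirtyBegin_ * 4, dirtyEnd_ - dirtyBegin_);
        dirtyBegin_ = kNone;
        dirtyEnd_ = 0;
    }

private:
    static constexpr unsigned kNone = ~0u;
    std::vector<float> data_;
    unsigned dirtyBegin_ = kNone;
    unsigned dirtyEnd_ = 0;
};
```

The trade-off is that unchanged registers inside the dirty range are re-sent, in exchange for a single API call per draw.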

I'll put this feature aside for now; when I implement DX11/DX12 later, I can compare the APIs and see how best to abstract the interface.
