From 8fc89b2f851ecbfd35ef6eabd546f6a7e4413483 Mon Sep 17 00:00:00 2001
From: wycwang
Date: Thu, 23 Feb 2023 15:43:07 +0800
Subject: [PATCH 1/4] add GLSL_cluster_culling_shader

---
 .../GLSL_HUAWEI_cluster_culling_shader.txt    | 276 ++++++++++++++++++
 1 file changed, 276 insertions(+)
 create mode 100644 extensions/huawei/GLSL_HUAWEI_cluster_culling_shader.txt

diff --git a/extensions/huawei/GLSL_HUAWEI_cluster_culling_shader.txt b/extensions/huawei/GLSL_HUAWEI_cluster_culling_shader.txt
new file mode 100644
index 0000000..a0335eb
--- /dev/null
+++ b/extensions/huawei/GLSL_HUAWEI_cluster_culling_shader.txt
@@ -0,0 +1,276 @@
+Name
+    HUAWEI_cluster_culling_shader
+
+Name Strings
+
+    GL_HUAWEI_cluster_culling_shader
+
+Contact
+
+    YuChang Wang, HUAWEI
+
+
+Contributors
+
+    YuChang Wang, HUAWEI
+
+Status
+
+    Draft
+
+
+Version
+
+    Last Modified Date: 2022-11-14
+    Revision: 1
+
+Dependencies
+
+    This extension can be applied to OpenGL GLSL versions 4.60.7
+    (#version 460) and higher.
+
+    This extension can be applied to OpenGL ES ESSL versions 3.20
+    (#version 320) and higher.
+
+    This extension is written against revision 7 of the OpenGL Shading Language version 4.60,
+    dated July 10, 2019, and can be applied to OpenGL ES ESSL version 3.20, dated July 10, 2019.
+
+    This extension interacts with revision 43 of the GL_KHR_vulkan_glsl extension, dated October 25, 2017.
+
+
+Overview
+
+    This extension allows applications to use a new programmable shader type -- Cluster Culling Shader --
+    to execute geometry culling on the GPU. This mechanism does not require a pipeline barrier between the compute shader
+    and the rest of the rendering pipeline.
+
+    This new shader type has an execution environment similar to that of compute shaders, where a collection of
+    shader invocations form a workgroup and cooperate to perform coarse-level culling and emit one or more
+    drawing commands to the subsequent rendering pipeline to draw visible clusters.
+
+
+Modifications to the OpenGL Shading Language Specification, Version 4.60.7
+
+    Including the following line in a shader can be used to control the language features described in this extension:
+        #extension GL_HUAWEI_cluster_culling_shader : <behavior>
+    where <behavior> is as specified in section 3.3.
+    A new preprocessor #define is added to the OpenGL Shading Language:
+        #define GL_HUAWEI_cluster_culling_shader
+
+
+    Modify the introduction to Chapter 2, Overview of OpenGL Shading (p.6)
+
+        (modify first paragraph) ... Currently, these processors are the vertex,
+        tessellation control, tessellation evaluation, geometry, fragment,
+        compute, and cluster culling processors.
+
+        (modify second paragraph) ... The specific languages will be referred to
+        by the name of the processor they target: vertex, tessellation control,
+        tessellation evaluation, geometry, fragment, compute, or cluster culling.
+
+    Insert new sections at the end of Chapter 2 (p.8)
+
+    Section 2.7, Cluster Culling Processor
+
+        The Cluster Culling Shader (CCS) is similar to the existing compute shader; its main purpose is to provide an
+        execution environment in order to perform coarse-level geometry culling and level-of-detail selection more
+        efficiently on the GPU.
+
+        The traditional 2-pass GPU culling solution using a compute shader needs a pipeline barrier between the compute
+        pipeline and the graphics pipeline. Sometimes, in order to optimize performance, an additional compaction
+        process may also be required. This extension addresses these shortcomings by allowing the
+        shader to emit visible clusters directly to the following graphics pipeline.
+
+        A set of new built-in output variables is used to express a visible cluster. In addition, a new built-in
+        function is used to emit these variables from the CCS to the subsequent rendering pipeline; the IA can then use these
+        variables to fetch the vertices of the visible cluster and drive the vertex shader to shade these vertices.
+        As stated above, both the IA and the vertex shader are preserved; the vertex shader is still used for vertex position
+        shading, instead of directly outputting a set of transformed vertices from the compute shader. This makes CCS
+        more suitable for mobile GPUs.
+
+
+    Modify Section 4.3.4, Input Variables (p. 50)
+    (add the sentence below after the last paragraph, p. 53)
+
+        All built-in input variables of cluster culling shaders are the same as those of compute shaders; no new ones are added.
+
+
+    Modify Section 4.3.6, Output Variables (p.54)
+    (modify last paragraph to add cluster culling shaders, p.54)
+
+        It is a compile-time error to declare a vertex, tessellation evaluation,
+        tessellation control, geometry or cluster culling shader output that contains any of the following: ...
+
+
+    Modify Section 4.3.8, Shared Variables (p.57)
+    (modify first paragraph of the section, p. 57)
+        The shared qualifier is used to declare variables that have storage shared between all work items in a compute,
+        cluster culling shader local work group. Variables declared as shared may only be used in compute, cluster culling
+        shaders. ...
+
+
+    Modify Section 4.4, Layout Qualifiers, p. 62
+    (modify the layout qualifier table, pp. 63-66)
+
+    Layout Qualifier   | Qualifier | Individual | Block | Block  | Allowed interfaces
+                       |   only    |  variable  |       | Member |
+    -------------------+-----------+------------+-------+--------+--------------------
+    local_size_x =     |           |            |       |        | compute in
+    local_size_y =     |     X     |            |       |        | cluster culling in
+    local_size_z =     |           |            |       |        |
+    -------------------+-----------+------------+-------+--------+--------------------
+
+
+
+    Modify Section in 4.4.1, Cluster Culling Shader Inputs, p.66
+    (add below sentence after the last paragraph, p76)
+    (note: the content of this section is nearly identical to the content of section 4.4.1, Compute Shader Inputs)
+        There are no layout location qualifiers for cluster culling shader inputs. Layout qualifier identifiers for cluster
+        culling shader inputs are the work group size qualifiers:
+
+        layout-qualifier-id :
+            local_size_x = integer-constant-expression
+            local_size_y = integer-constant-expression
+            local_size_z = integer-constant-expression
+
+        These cluster culling shader input layout qualifiers behave identically to the
+        equivalent compute shader qualifiers and specify a fixed local group size
+        used for each cluster culling shader work group. If no size is specified in any of
+        the three dimensions, a default size of one will be used.
+
+    Modify Section 7.1, Built-In Language Variables (p.138)
+    (add 7.1.7 Cluster Culling Shader Special Variable, p.148)
+    (modify 7.1.7. Compatibility Profile Built-In Language Variables to 7.1.8. Compatibility Profile Built-In Language Variables, p.148)
+
+        In the cluster culling language, built-in variables are intrinsically declared as:
+
+        const uvec3 gl_WorkGroupSize;
+        in uvec3 gl_WorkGroupID;
+        in uvec3 gl_LocalInvocationID;
+        in uvec3 gl_GlobalInvocationID;
+        in uint gl_LocalInvocationIndex;
+
+        // type 1 (non-indexed mode)
+        out gl_PerClusterHUAWEI
+        {
+            uint gl_VertexCountHUAWEI;
+            uint gl_InstanceCountHUAWEI;
+            uint gl_FirstVertexHUAWEI;
+            uint gl_FirstInstanceHUAWEI;
+            uint gl_ClusterIDHUAWEI;
+        }
+        // type 2 (indexed mode)
+        out gl_PerClusterHUAWEI
+        {
+            uint gl_IndexCountHUAWEI;
+            uint gl_InstanceCountHUAWEI;
+            uint gl_FirstIndexHUAWEI;
+            int gl_VertexOffsetHUAWEI;
+            uint gl_FirstInstanceHUAWEI;
+            uint gl_ClusterIDHUAWEI;
+        }
+
+
+        Cluster culling shader input variables
+        gl_WorkGroupSize, gl_WorkGroupID, gl_LocalInvocationID, gl_GlobalInvocationID, gl_LocalInvocationIndex are used
+        in the same fashion as the corresponding input variables in the compute shader.
+
+
+        Cluster culling shader output variables
+        Cluster culling shaders have the following built-in output variables.
+
+        gl_IndexCountHUAWEI is the number of vertices to draw in indexed mode.
+        gl_VertexCountHUAWEI is the number of vertices to draw.
+        gl_InstanceCountHUAWEI is the number of instances to draw.
+        gl_FirstIndexHUAWEI is the base index within the index buffer.
+        gl_FirstVertexHUAWEI is the index of the first vertex to draw.
+        gl_VertexOffsetHUAWEI is the value added to the vertex index before indexing into the vertex buffer.
+        gl_FirstInstanceHUAWEI is the instance ID of the first instance to draw.
+        gl_ClusterIDHUAWEI is the index of the cluster being rendered by this drawing command.
+
+
+    (modify the discussion of the built-in variables shared with compute shaders, which starts on p. 147)
+        The built-in constant gl_WorkGroupSize is a compute, clust culling shader
+        constant containing the local work-group size of the shader. The size ...
+
+        The built-in variable gl_WorkGroupID is a compute, cluster culling shader
+        input variable containing the three-dimensional index of the global work
+        group that the current invocation is executing in. ...
+
+        The built-in variable gl_LocalInvocationID is a compute, cluster culling
+        shader input variable containing the three-dimensional index of the local
+        work group within the global work group that the current invocation is
+        executing in. ...
+
+        The built-in variable gl_GlobalInvocationID is a compute, cluster culling
+        shader input variable containing the global index of the current work
+        item. This value uniquely identifies this invocation from all other
+        invocations across all local and global work groups initiated by the
+        current DispatchCompute or DispatchMeshTasksNV call or by a previously
+        executed task shader. ...
+
+        The built-in variable gl_LocalInvocationIndex is a compute, cluster culling
+        shader input variable that contains the one-dimensional representation of
+        the gl_LocalInvocationID.
+
+
+    Modify Section 8.16, Shader Invocation Control Functions, p. 201
+    (modify first paragraph of the section, p. 201)
+        The shader invocation control function is available only in tessellation
+        control, compute, and cluster culling shaders. It is used
+        to control the relative execution order of multiple shader invocations
+        used to process a patch (in the case of tessellation control shaders) or a
+        local work group (in the case of compute, cluster culling shaders), which
+        are otherwise executed with an undefined relative order.
+
+    (modify the last paragraph, p. 201)
+        For compute, cluster culling shaders, the barrier() function may be placed
+        within flow control, but that flow control must be uniform flow control.
+
+    Modify Section 8.17, Shader Memory Control Functions, p. 201
+
+    (modify table of functions, p. 202)
+
+        void memoryBarrierShared()
+
+            Control the ordering of memory transactions to shared variables issued
+            within a single shader invocation.
+
+            Only available in compute, cluster culling shaders.
+
+
+        void groupMemoryBarrier()
+
+            Control the ordering of all memory transactions issued within a single
+            shader invocation, as viewed by other invocations in the same work
+            group.
+
+            Only available in compute, cluster culling shaders.
+
+
+    (modify last paragraph, p. 202)
+
+        ... all of the above variable types. The functions memoryBarrierShared()
+        and groupMemoryBarrier() are available only in compute, cluster culling
+        shaders; the other functions are available in all shader types.
+
+
+    (modify last paragraph, p. 203)
+
+        ... When using the function groupMemoryBarrier(), this ordering guarantee
+        applies only to other shader invocations in the same compute, cluster culling shader work group; all other memory barrier
+        functions provide the guarantee to all other shader invocations. ...
+
+
+
+
+
+Issues
+
+    TBD
+
+Revision History
+
+    Rev.    Date               Changes
+    ------  -----------------  ----------------------------------------
+    1       2022-11-14         Initial draft
\ No newline at end of file

From 67aa888ea051900df241c5ca17e80696ef4dc6f3 Mon Sep 17 00:00:00 2001
From: wycwang
Date: Sat, 4 Mar 2023 13:05:45 +0800
Subject: [PATCH 2/4] Update README.md

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index a4c3c6d..e2ecfed 100644
--- a/README.md
+++ b/README.md
@@ -80,3 +80,5 @@ which normatively accepts SPIR-V but does not normatively consume a high-level s
 - [GL_EXT_mesh_shader](https://github.com/KhronosGroup/GLSL/blob/master/extensions/ext/GLSL_EXT_mesh_shader.txt)
 - [GL_EXT_opacity_micromap](https://github.com/KhronosGroup/GLSL/blob/master/extensions/ext/GLSL_EXT_opacity_micromap.txt)
 - [GL_NV_shader_invocation_reorder](https://github.com/KhronosGroup/GLSL/blob/master/extensions/nv/GLSL_NV_shader_invocation_reorder.txt)
+- [GL_HUAWEI_cluster_culling_shader](https://github.com/KhronosGroup/GLSL/blob/master/extensions/huawei/GLSL_HUAWEI_cluster_culling_shader.txt)
+

From 78787dc192b4976f7eef812666f8d90131e14fd2 Mon Sep 17 00:00:00 2001
From: wycwang
Date: Thu, 9 Mar 2023 12:01:06 +0800
Subject: [PATCH 3/4] Update GLSL_HUAWEI_cluster_culling_shader.txt

---
 extensions/huawei/GLSL_HUAWEI_cluster_culling_shader.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/extensions/huawei/GLSL_HUAWEI_cluster_culling_shader.txt b/extensions/huawei/GLSL_HUAWEI_cluster_culling_shader.txt
index a0335eb..c6894f6 100644
--- a/extensions/huawei/GLSL_HUAWEI_cluster_culling_shader.txt
+++ b/extensions/huawei/GLSL_HUAWEI_cluster_culling_shader.txt
@@ -16,7 +16,7 @@ Contributors
 
 Status
 
-    Draft
+    Complete
 
 
 Version
@@ -267,10 +267,10 @@ Modifications to the OpenGL Shading Language Specification, Version 4.60.7
 
 Issues
 
-    TBD
+    None.
 
 Revision History
 
     Rev.    Date               Changes
     ------  -----------------  ----------------------------------------
-    1       2022-11-14         Initial draft
\ No newline at end of file
+    1       2022-11-14         Initial draft

From 6939802579d5bdd3629edac2c394c0cccf6529b7 Mon Sep 17 00:00:00 2001
From: wycwang
Date: Mon, 4 Dec 2023 15:47:27 +0800
Subject: [PATCH 4/4] add cluster shading rate

---
 .../GLSL_HUAWEI_cluster_culling_shader.txt    | 20 +++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/extensions/huawei/GLSL_HUAWEI_cluster_culling_shader.txt b/extensions/huawei/GLSL_HUAWEI_cluster_culling_shader.txt
index c6894f6..a612b3f 100644
--- a/extensions/huawei/GLSL_HUAWEI_cluster_culling_shader.txt
+++ b/extensions/huawei/GLSL_HUAWEI_cluster_culling_shader.txt
@@ -21,8 +21,8 @@ Status
 
 Version
 
-    Last Modified Date: 2022-11-14
-    Revision: 1
+    Last Modified Date: 2023-11-28
+    Revision: 2
 
 Dependencies
 
@@ -86,7 +86,9 @@ Modifications to the OpenGL Shading Language Specification, Version 4.60.7
         variables to fetch the vertices of the visible cluster and drive the vertex shader to shade these vertices.
         As stated above, both the IA and the vertex shader are preserved; the vertex shader is still used for vertex position
         shading, instead of directly outputting a set of transformed vertices from the compute shader. This makes CCS
-        more suitable for mobile GPUs.
+        more suitable for mobile GPUs. Furthermore, CCS can also determine what shading rate a cluster uses for
+        rendering, e.g. by using the distance between the cluster and the view point to determine the shading rate of
+        the cluster. This capability enables the cluster culling shader to reduce the rendering load more effectively.
 
 
     Modify Section 4.3.4, Input Variables (p. 50)
@@ -158,6 +160,7 @@ Modifications to the OpenGL Shading Language Specification, Version 4.60.7
             uint gl_FirstVertexHUAWEI;
             uint gl_FirstInstanceHUAWEI;
             uint gl_ClusterIDHUAWEI;
+            uint gl_ClusterShadingRateHUAWEI;
         }
         // type 2 (indexed mode)
         out gl_PerClusterHUAWEI
@@ -167,7 +170,8 @@ Modifications to the OpenGL Shading Language Specification, Version 4.60.7
             uint gl_FirstIndexHUAWEI;
             int gl_VertexOffsetHUAWEI;
             uint gl_FirstInstanceHUAWEI;
-            uint gl_ClusterIDHUAWEI;
+            uint gl_ClusterIDHUAWEI;
+            uint gl_ClusterShadingRateHUAWEI;
         }
 
 
@@ -187,10 +191,11 @@ Modifications to the OpenGL Shading Language Specification, Version 4.60.7
         gl_VertexOffsetHUAWEI is the value added to the vertex index before indexing into the vertex buffer.
         gl_FirstInstanceHUAWEI is the instance ID of the first instance to draw.
         gl_ClusterIDHUAWEI is the index of the cluster being rendered by this drawing command.
+        gl_ClusterShadingRateHUAWEI is the shading rate of the cluster being rendered by this drawing command.
 
 
     (modify the discussion of the built-in variables shared with compute shaders, which starts on p. 147)
-        The built-in constant gl_WorkGroupSize is a compute, clust culling shader
+        The built-in constant gl_WorkGroupSize is a compute, cluster culling shader
         constant containing the local work-group size of the shader. The size ...
 
         The built-in variable gl_WorkGroupID is a compute, cluster culling shader
@@ -263,8 +268,6 @@ Modifications to the OpenGL Shading Language Specification, Version 4.60.7
 
 
 
-
-
 Issues
 
     None.
@@ -273,4 +276,5 @@ Revision History
 
     Rev.    Date               Changes
     ------  -----------------  ----------------------------------------
-    1       2022-11-14         Initial draft
+    1       2022-11-14         Initial draft
+    2       2023-11-28         Add per-cluster shading rate
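
For orientation only, the indexed-mode outputs and the per-cluster shading rate described in this series might be written from a cluster culling shader roughly as in the sketch below. It is not part of the patches. The `ClusterInfo` record, the `Clusters` and `Params` blocks, their bindings, the distance thresholds, and the 0/1 shading-rate encoding are all hypothetical; the text above does not name the built-in emit function, so that call is left as a comment.

```glsl
#version 460
#extension GL_HUAWEI_cluster_culling_shader : enable

// One invocation per cluster; the work group size is an arbitrary choice.
layout(local_size_x = 32) in;

// Hypothetical per-cluster record supplied by the application.
struct ClusterInfo {
    vec4 boundingSphere;  // xyz = center in view space, w = radius
    uint indexCount;      // number of indices in this cluster
    uint firstIndex;      // base index within the index buffer
    int  vertexOffset;    // value added to each vertex index
};

layout(std430, binding = 0) readonly buffer Clusters {
    ClusterInfo clusters[];
};

// Hypothetical culling parameters.
layout(binding = 1) uniform Params {
    uint  clusterCount;
    float farDistance;     // clusters farther than this are culled
    float coarseDistance;  // clusters farther than this use a coarser rate
};

void main()
{
    uint id = gl_GlobalInvocationID.x;
    if (id >= clusterCount) {
        return;
    }

    ClusterInfo c = clusters[id];

    // Coarse distance test against the nearest point of the bounding sphere.
    float dist = length(c.boundingSphere.xyz) - c.boundingSphere.w;
    if (dist > farDistance) {
        return;  // culled: nothing is emitted for this cluster
    }

    // Indexed-mode members of gl_PerClusterHUAWEI.
    gl_IndexCountHUAWEI    = c.indexCount;
    gl_InstanceCountHUAWEI = 1u;
    gl_FirstIndexHUAWEI    = c.firstIndex;
    gl_VertexOffsetHUAWEI  = c.vertexOffset;
    gl_FirstInstanceHUAWEI = 0u;
    gl_ClusterIDHUAWEI     = id;

    // Assumed encoding: 0 = full rate, 1 = coarser rate. The real encoding
    // is not specified in the text above.
    gl_ClusterShadingRateHUAWEI = (dist > coarseDistance) ? 1u : 0u;

    // Emit the cluster here using the extension's built-in emit function,
    // which the text above mentions but does not name.
}
```

The sketch selects the shading rate from the cluster's distance to the view point, which is the use case the Section 2.7 text gives as an example; a real shader would typically also apply frustum or occlusion tests before emitting.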