Shader compiler on Alderlake GT1: SIMD32 shader inefficient
When I compile and link my GLSL shader on an Alderlake GT1 integrated GPU, I get the warning:
SIMD32 shader inefficient
This warning is reported via the glDebugMessageCallbackARB mechanism.
I would like to investigate if I can avoid this inefficiency, but I am not sure how to get more information on this warning.
The full output from the driver, for this shader:
```
WRN [Shader Compiler][Other]{Notification}: VS SIMD8 shader: 11 inst, 0 loops, 40 cycles, 0:0 spills:fills, 1 sends, scheduled with mode top-down, Promoted 0 constants, compacted 176 to 112 bytes.
WRN [API][Performance]{Notification}: SIMD32 shader inefficient
WRN [Shader Compiler][Other]{Notification}: FS SIMD8 shader: 5 inst, 0 loops, 20 cycles, 0:0 spills:fills, 1 sends, scheduled with mode top-down, Promoted 0 constants, compacted 80 to 48 bytes.
WRN [Shader Compiler][Other]{Notification}: FS SIMD16 shader: 5 inst, 0 loops, 28 cycles, 0:0 spills:fills, 1 sends, scheduled with mode top-down, Promoted 0 constants, compacted 80 to 48 bytes.
```
By the way, these messages are emitted while the fragment shader is being compiled.
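To get more out of these messages, you can parse the per-variant statistics lines and compare them. The following is a minimal Python sketch. It assumes the messages reach your callback as plain strings in exactly the format shown above. The regular expression and the names `ShaderStats` and `parse_driver_log` are hypothetical helpers chosen for illustration; they are not part of Mesa or any GL API.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Matches e.g. "FS SIMD16 shader: 5 inst, 0 loops, 28 cycles, ..."
# (format taken from the log above; other Mesa versions may word it differently)
STATS_RE = re.compile(
    r"(?P<stage>[A-Z]+) SIMD(?P<width>\d+) shader: "
    r"(?P<inst>\d+) inst, (?P<loops>\d+) loops, (?P<cycles>\d+) cycles"
)


@dataclass
class ShaderStats:
    stage: str   # shader stage as printed, e.g. "VS" or "FS"
    width: int   # SIMD width: 8, 16 or 32
    inst: int    # instruction count
    loops: int   # loop count
    cycles: int  # compiler's estimated cycle count


def parse_driver_log(lines: List[str]) -> Tuple[List[ShaderStats], bool]:
    """Parse per-variant statistics and report whether SIMD32 was rejected."""
    stats: List[ShaderStats] = []
    simd32_rejected = False
    for line in lines:
        if "SIMD32 shader inefficient" in line:
            simd32_rejected = True
            continue
        m = STATS_RE.search(line)
        if m:
            stats.append(ShaderStats(m["stage"], int(m["width"]), int(m["inst"]),
                                     int(m["loops"]), int(m["cycles"])))
    return stats, simd32_rejected


if __name__ == "__main__":
    log = [
        "WRN [Shader Compiler][Other]{Notification}: FS SIMD8 shader: 5 inst, 0 loops, 20 cycles, 0:0 spills:fills, 1 sends",
        "WRN [API][Performance]{Notification}: SIMD32 shader inefficient",
        "WRN [Shader Compiler][Other]{Notification}: FS SIMD16 shader: 5 inst, 0 loops, 28 cycles, 0:0 spills:fills, 1 sends",
    ]
    stats, rejected = parse_driver_log(log)
    for s in stats:
        print(f"{s.stage} SIMD{s.width}: {s.inst} inst, {s.cycles} cycles")
    print("SIMD32 rejected:", rejected)
```

The sketch parses only the leading fields. Spills/fills and sends could be added to the same pattern if you need them.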
My vertex shader:
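```glsl
#version 150
// Precision qualifiers (mediump, lowp) are accepted in desktop GLSL only for
// OpenGL ES portability; they have no effect on precision here.
in mediump vec2 position;
out lowp vec4 clr;
uniform mediump vec2 rotx;
uniform mediump vec2 roty;
uniform mediump vec2 translation;
uniform lowp vec4 colour;
void main()
{
    // 2D transform: multiply by the matrix with rows rotx/roty, then translate
    gl_Position.x = dot( position, rotx ) + translation.x;
    gl_Position.y = dot( position, roty ) + translation.y;
    gl_Position.z = 1.0;
    gl_Position.w = 1.0;
    clr = colour;  // per-draw colour from the uniform, passed to the fragment shader
}
```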
My fragment shader:
```glsl
#version 150
in lowp vec4 clr;
out lowp vec4 fragColor;
void main()
{
    // Write the incoming colour unchanged
    fragColor = clr;
}
```
That said, I doubt it is shader-specific, because the driver seems to report this for every shader I use on this platform.
GL RENDERER: Mesa Intel(R) Graphics (ADL-S GT1)
OS: Ubuntu 22.04
GPU: AlderLake-S GT1
API: OpenGL 3.2 Core Profile
GLSL Version: 150
Answer:
This seems to come from Intel's fragment shader compiler, which is part of Mesa (brw_fs.cpp).
Looking at this code, the compiler has three options: SIMD8, SIMD16, or SIMD32. The number refers to the SIMD width (how many channels, i.e. fragments, one hardware thread processes at once), not to a bit count. So SIMD32 means 32-wide SIMD.
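The width is the number of fragments one thread handles, so the raw cycle counts in the question's log are not directly comparable across widths. The following is a rough back-of-the-envelope comparison using the question's numbers. It assumes, as a deliberate simplification rather than Mesa's actual cost model, that a variant's estimated cycles are shared evenly among its channels:

$$
\frac{20\ \text{cycles}}{8\ \text{fragments}} = 2.5\ \tfrac{\text{cycles}}{\text{fragment}}\ (\text{SIMD8}),
\qquad
\frac{28\ \text{cycles}}{16\ \text{fragments}} = 1.75\ \tfrac{\text{cycles}}{\text{fragment}}\ (\text{SIMD16})
$$

Under this naive model, a SIMD32 variant would only pay off if its estimate stayed below $32 \times 1.75 = 56$ cycles.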
The compiler uses a heuristic to see if the SIMD32 version will be efficient, and if not, it skips that option.
Of course, this heuristic can get it wrong, so there is an option to force the BRW compiler to try SIMD32 regardless.
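The selection step can be modelled roughly as follows. As far as I can tell from brw_fs.cpp, the compiler compares its throughput estimate for the SIMD32 variant against the best of the narrower variants, and it keeps SIMD32 only if SIMD32 wins or the override is set. This is a simplified Python sketch of that idea, not Mesa's code. `Variant`, its `throughput` field and `choose_fs_variants` are hypothetical names, and `force_simd32` stands in for the override option.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Variant:
    width: int          # SIMD width: 8, 16 or 32
    throughput: float   # hypothetical estimate: higher is better


def choose_fs_variants(variants: Dict[int, Variant],
                       force_simd32: bool = False,
                       log: Optional[List[str]] = None) -> List[int]:
    """Return the SIMD widths that would be kept (simplified model)."""
    kept = [w for w in (8, 16) if w in variants]
    simd32 = variants.get(32)
    if simd32 is None:
        return kept
    best_narrow = max((variants[w].throughput for w in kept), default=0.0)
    if not force_simd32 and best_narrow >= simd32.throughput:
        # SIMD32 does not beat the narrower variants, so it is dropped
        if log is not None:
            log.append("SIMD32 shader inefficient")
    else:
        kept.append(32)
    return kept


if __name__ == "__main__":
    messages: List[str] = []
    variants = {
        8: Variant(8, 8 / 20),       # fragments per cycle, naive model
        16: Variant(16, 16 / 28),
        32: Variant(32, 32 / 100),   # made-up estimate for illustration
    }
    print(choose_fs_variants(variants, log=messages))
    print(messages)
    print(choose_fs_variants(variants, force_simd32=True))
```

With these made-up numbers, SIMD32 is rejected and the warning is logged. With the override it is kept anyway.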
Setting the environment variable

INTEL_DEBUG=do32

tells the compiler to try SIMD32 as well. When I tested this on my system, the driver indeed reported three different results, one each for SIMD8, SIMD16, and SIMD32.
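The variable must be in the environment of the GL application when it starts, for example `INTEL_DEBUG=do32 ./myapp` in a shell. From a test script you can do the same by passing a modified environment to the child process. The following is a minimal sketch. `run_with_intel_debug` is a hypothetical helper, and the probe command only stands in for your real program.

```python
import os
import subprocess
import sys
from typing import List


def run_with_intel_debug(cmd: List[str], flags: str = "do32") -> subprocess.CompletedProcess:
    """Run cmd with INTEL_DEBUG set in its environment, capturing its output."""
    env = dict(os.environ)
    env["INTEL_DEBUG"] = flags
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # Placeholder: replace with the real GL program, e.g. ["./myapp"].
    # This child process just prints the variable to show that it arrived.
    probe = [sys.executable, "-c", "import os; print(os.environ.get('INTEL_DEBUG'))"]
    result = run_with_intel_debug(probe)
    print(result.stdout.strip())
```

Copying `os.environ` keeps the rest of the environment (DISPLAY and so on) intact for the child process.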
Note that in this case the heuristic definitely got it right: the SIMD32 variant needed almost 50 times as many cycles as SIMD8.
Fun fact: BRW stands for Broadwater, Intel's Gen4 graphics, yet Gen12 Intel GPUs still use this compiler.