Shader compiler on Alderlake GT1: SIMD32 shader inefficient

Posted on 2025-02-06 14:54:43

When I compile and link my GLSL shader on an Alderlake GT1 integrated GPU, I get the warning:

SIMD32 shader inefficient

This warning is reported via the glDebugMessageCallbackARB mechanism.

I would like to investigate if I can avoid this inefficiency, but I am not sure how to get more information on this warning.

The full output from the driver, for this shader:

WRN [Shader Compiler][Other]{Notification}: VS SIMD8 shader: 11 inst, 0 loops, 40 cycles, 0:0 spills:fills, 1 sends, scheduled with mode top-down, Promoted 0 constants, compacted 176 to 112 bytes.

WRN [API][Performance]{Notification}: SIMD32 shader inefficient

WRN [Shader Compiler][Other]{Notification}: FS SIMD8 shader: 5 inst, 0 loops, 20 cycles, 0:0 spills:fills, 1 sends, scheduled with mode top-down, Promoted 0 constants, compacted 80 to 48 bytes.

WRN [Shader Compiler][Other]{Notification}: FS SIMD16 shader: 5 inst, 0 loops, 28 cycles, 0:0 spills:fills, 1 sends, scheduled with mode top-down, Promoted 0 constants, compacted 80 to 48 bytes.

The messages are emitted during fragment shader compilation, by the way.

My vertex shader:

#version 150
in mediump vec2 position;
out lowp vec4 clr;
uniform mediump vec2 rotx;
uniform mediump vec2 roty;
uniform mediump vec2 translation;
uniform lowp vec4 colour;
void main()
{
    gl_Position.x = dot( position, rotx ) + translation.x;
    gl_Position.y = dot( position, roty ) + translation.y;
    gl_Position.z = 1.0;
    gl_Position.w = 1.0;
    clr = colour;
}

My fragment shader:

#version 150
in  lowp vec4 clr;
out lowp vec4 fragColor;
void main()
{
    fragColor = clr;
}

That said, I doubt it is shader-specific, because the driver seems to report this for every shader I use on this platform.

GL RENDERER: Mesa Intel(R) Graphics (ADL-S GT1)

OS: Ubuntu 22.04

GPU: AlderLake-S GT1

API: OpenGL 3.2 Core Profile

GLSL Version: 150

Comments (1)

云朵有点甜 2025-02-13 14:54:43

This seems to come from the Intel fragment shader compiler, which is part of Mesa.

brw_fs.cpp

Looking at this code, it seems that the compiler has three options: to use SIMD8, SIMD16 or SIMD32. This refers to widths, not to bits. So SIMD32 is 32-wide SIMD.
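
To make the "width" point concrete, here is a minimal sketch of how many shader threads would be needed to cover a given number of fragments at each dispatch width. It assumes a simplified model in which every thread is fully occupied; the function name threads_needed and the fragment count are purely illustrative and do not come from Mesa.

```python
import math


def threads_needed(fragment_count, simd_width):
    """Threads required if each thread shades `simd_width` fragments.

    Simplified model (an assumption): ignores partially filled threads
    caused by primitive edges and any hardware dispatch constraints.
    """
    return math.ceil(fragment_count / simd_width)


if __name__ == "__main__":
    fragments = 64 * 64  # hypothetical 64x64 pixel quad
    for width in (8, 16, 32):
        print(f"SIMD{width}: {threads_needed(fragments, width)} threads")
```

A wider dispatch needs fewer threads for the same work, which is why the compiler considers SIMD32 at all; whether it is actually faster depends on how expensive each wide thread turns out to be.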

The compiler uses a heuristic to see if the SIMD32 version will be efficient, and if not, it skips that option.

Of course, this heuristic can get it wrong, so there is an option to force the BRW compiler to try SIMD32 regardless.

Setting the environment variable INTEL_DEBUG=do32 will tell the compiler to try SIMD32 as well.
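
If the application is started from a script, the flag can be added to the child environment without clobbering other INTEL_DEBUG flags that are already set. This is only a sketch: it assumes INTEL_DEBUG is a comma-separated list of flags, and the helper name with_intel_debug_flag and the program path ./my_gl_app are placeholders of mine.

```python
import os


def with_intel_debug_flag(env, flag="do32"):
    """Return a copy of `env` with `flag` present in INTEL_DEBUG.

    Assumption: INTEL_DEBUG holds a comma-separated list of flags.
    """
    new_env = dict(env)
    flags = [f for f in new_env.get("INTEL_DEBUG", "").split(",") if f]
    if flag not in flags:
        flags.append(flag)
    new_env["INTEL_DEBUG"] = ",".join(flags)
    return new_env


if __name__ == "__main__":
    env = with_intel_debug_flag(os.environ)
    print("INTEL_DEBUG=" + env["INTEL_DEBUG"])
    # To launch a program with this environment (placeholder path):
    # import subprocess
    # subprocess.run(["./my_gl_app"], env=env, check=True)
```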

When I tested this on my system, I indeed observed that the driver now reports three different results:

WRN [Shader Compiler][Other]{Notification}: FS SIMD8 shader: 5 inst, 0 loops, 20 cycles, 0:0 spills:fills, 1 sends, scheduled with mode top-down, Promoted 0 constants, compacted 80 to 48 bytes.

WRN [Shader Compiler][Other]{Notification}: FS SIMD16 shader: 5 inst, 0 loops, 28 cycles, 0:0 spills:fills, 1 sends, scheduled with mode top-down, Promoted 0 constants, compacted 80 to 48 bytes.

WRN [Shader Compiler][Other]{Notification}: FS SIMD32 shader: 10 inst, 0 loops, 928 cycles, 0:0 spills:fills, 2 sends, scheduled with mode top-down, Promoted 0 constants, compacted 160 to 96 bytes.
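
For comparing variants across many shaders, the statistics in these messages can be pulled out with a regular expression. The sketch below assumes the message format stays exactly as in the lines quoted above; the pattern and the name parse_shader_stats are my own and not part of Mesa.

```python
import re

# Matches e.g. "FS SIMD8 shader: 5 inst, 0 loops, 20 cycles,
# 0:0 spills:fills, 1 sends, ..." as quoted in the driver output.
STATS_RE = re.compile(
    r"(?P<stage>[A-Z]+) SIMD(?P<width>\d+) shader: "
    r"(?P<inst>\d+) inst, (?P<loops>\d+) loops, (?P<cycles>\d+) cycles, "
    r"(?P<spills>\d+):(?P<fills>\d+) spills:fills, (?P<sends>\d+) sends"
)


def parse_shader_stats(message):
    """Return a dict of statistics, or None if the line has no stats."""
    match = STATS_RE.search(message)
    if match is None:
        return None
    stats = {k: int(v) for k, v in match.groupdict().items() if k != "stage"}
    stats["stage"] = match.group("stage")
    return stats


if __name__ == "__main__":
    line = (
        "WRN [Shader Compiler][Other]{Notification}: FS SIMD32 shader: "
        "10 inst, 0 loops, 928 cycles, 0:0 spills:fills, 2 sends, "
        "scheduled with mode top-down, Promoted 0 constants, "
        "compacted 160 to 96 bytes."
    )
    print(parse_shader_stats(line))
```

Lines without statistics, such as the "SIMD32 shader inefficient" notification itself, simply yield None.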

Observe that in this case the heuristic definitely got it right: the SIMD32 version takes almost 50 times more cycles than SIMD8 (928 versus 20).
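
The comparison can be made a little fairer by dividing each cycle count by the dispatch width, since a wider thread covers more fragments. This is a rough sketch that assumes the reported cycle count is an estimate for one thread covering that many channels, which is my reading of the output and not something the driver states.

```python
# Cycle counts copied from the three FS messages above.
REPORTED_CYCLES = {8: 20, 16: 28, 32: 928}


def cycles_per_channel(cycles_by_width):
    """Estimated cycles divided by the number of channels per thread."""
    return {width: cycles / width for width, cycles in cycles_by_width.items()}


per_channel = cycles_per_channel(REPORTED_CYCLES)
ratio_32_vs_8 = REPORTED_CYCLES[32] / REPORTED_CYCLES[8]
cheapest_width = min(per_channel, key=per_channel.get)

if __name__ == "__main__":
    for width, cost in per_channel.items():
        print(f"SIMD{width}: {cost:.2f} cycles per channel")
    print(f"SIMD32 / SIMD8 raw cycle ratio: {ratio_32_vs_8:.1f}")
    print(f"Cheapest per channel: SIMD{cheapest_width}")
```

By this measure SIMD16 comes out cheapest per channel (1.75 versus 2.5 for SIMD8), while SIMD32 costs 29 cycles per channel, so it still loses clearly even after accounting for its width.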

Fun fact: BRW stands for Broadwater, gen4 graphics. But gen12 Intel GPUs still use this compiler.
