AMD 64-bit dual-core optimization

Posted on 2024-07-04 17:20:20

We have a graphics-intensive application that seems to be experiencing problems on AMD 64-bit dual-core platforms that are not apparent on Intel platforms.

Running the application drives CPU usage to 100%, in particular when running the shadow and lighting code (OpenGL).

Does anyone know of specific issues with AMD processors that could cause this, where to start tracking down the problem, and/or ways to optimize the code base to avoid these issues?

Note: the application generally works well on mid-range hardware, and my dev machine has an NVIDIA GTX 260 card in it, so lack of power should not be an issue.

Comments (6)

々眼睛长脚气 2024-07-11 17:20:20

I would invest in profiling software to track down the actual cause of the problem.

On Linux, Valgrind (which includes Cachegrind and Callgrind) plus KCachegrind makes it easy to work out where all the heavy function calls are happening.

Also, compile with full debug symbols and it can even show the assembly code at the slow function calls.

If you're using an Intel-specific compiler, this may be part of your problem (not definitely, though); try the GCC family instead.

Also, you may want to dive into OpenMP and threads if you haven't already.

长梦不多时 2024-07-11 17:20:20

Hm - if you use shadows, the GPU should be under load, so it's unlikely that the GPU renders the frames faster than the CPU sends graphics data. In this case 100% load is OK and even expected.

It could simply be a borked OpenGL driver that burns CPU cycles in a spinlock somewhere. To find out exactly what's going on, I suggest you run a profiling tool such as CodeAnalyst from AMD (free the last time I used it).

Profile your program for a couple of minutes and take a look at where the time is spent. If you see a big peak in the OpenGL driver and not in your application, get a new driver. Otherwise you at least get an idea of what's going on.

Btw - let me guess, you're using an ATI card, right? I don't want to offend any ATI fans out there, but their OpenGL drivers are not exactly stellar. If you're unlucky, you may even be using a feature that the card does not support or that is disabled due to a silicon bug. In that case the driver will fall back to software rasterization. This will slow things down a lot and give you 100% CPU load even if your program is single-threaded.

酒几许 2024-07-11 17:20:20

Depending on how you've done your shadows and other graphics code, it's possible that you've "fallen off the fast path" and the graphics driver has started doing software emulation. This can happen if you have complicated pipelines, or are using too many conditionals (or just too many instructions) in shader code.

I would make sure that this particular graphics card supports all the features you are using.
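
One cheap first check is the renderer string the context reports. The sketch below is only an illustration: the helper name `looks_like_software_renderer` and the marker list are my own assumptions, and the string is assumed to come from `glGetString(GL_RENDERER)` after a context has been made current. It only catches the case where the whole context runs on a software implementation; a hardware driver that silently falls back for a single feature usually keeps reporting the hardware name, so a profiler is still needed for that case.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Returns true if the renderer string names a known software OpenGL
// implementation. The marker list is illustrative, not exhaustive.
// Assumption: `renderer` was obtained from glGetString(GL_RENDERER).
bool looks_like_software_renderer(std::string renderer) {
    // Lower-case the input so the comparison is case-insensitive.
    std::transform(renderer.begin(), renderer.end(), renderer.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    static const char* const markers[] = {
        "gdi generic",          // Microsoft's built-in software OpenGL 1.1
        "llvmpipe",             // Mesa software rasterizer
        "softpipe",             // Mesa reference software rasterizer
        "software rasterizer",  // older Mesa builds
    };
    for (const char* marker : markers) {
        if (renderer.find(marker) != std::string::npos) return true;
    }
    return false;
}
```

In the application this would be called once at startup, right after context creation, and the result logged next to the `GL_VENDOR` and `GL_VERSION` strings.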

农村范ル 2024-07-11 17:20:20

Late answer here.

Dunno if this is related, but in some win32 OpenGL drivers, SwapBuffers() will not yield the CPU while waiting for vsync, making it very easy to get 100% CPU utilisation.

The solution I use for this is to measure the time since the last SwapBuffers() completed, which tells me how far away the next vsync is. So before calling SwapBuffers(), I take short Sleep()s until I detect that vsync is imminent. This way SwapBuffers() doesn't have to wait long for vsync, and so doesn't hog the CPU too badly.

Note that you may have to use timeBeginPeriod() to get sufficient Sleep() precision for this to work reliably.
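
A rough sketch of that idea, written with portable `std::chrono` so it can be tried anywhere. This is not the exact code I use: the names `sleep_budget` and `wait_until_vsync_is_near`, the fixed refresh period, and the safety margin are all assumptions for illustration, and `std::this_thread::sleep_for` stands in for the Win32 `Sleep()` call.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;
using Micros = std::chrono::microseconds;

// How long we can still sleep before the swap call should be issued.
// since_last_swap: time elapsed since the previous swap returned.
// refresh_period:  assumed display refresh interval (about 16667 us at 60 Hz).
// safety_margin:   time deliberately left over so the swap is not late.
Micros sleep_budget(Micros since_last_swap, Micros refresh_period, Micros safety_margin) {
    assert(refresh_period.count() > 0);
    Micros remaining = refresh_period - since_last_swap - safety_margin;
    return remaining.count() > 0 ? remaining : Micros{0};
}

// Sleeps in short slices until the next vsync is close, then returns so the
// caller can invoke SwapBuffers() (or the platform equivalent).
void wait_until_vsync_is_near(Clock::time_point last_swap_done,
                              Micros refresh_period,
                              Micros safety_margin,
                              Micros slice = Micros{1000}) {
    for (;;) {
        Micros elapsed = std::chrono::duration_cast<Micros>(Clock::now() - last_swap_done);
        Micros budget = sleep_budget(elapsed, refresh_period, safety_margin);
        if (budget.count() == 0) break;  // vsync is imminent
        std::this_thread::sleep_for(budget < slice ? budget : slice);
    }
}
```

The render loop would record `Clock::now()` right after each swap returns and pass it in before the next one. The margin has to be tuned by hand: too small and a late wake-up misses the vsync, too large and the swap call goes back to spinning.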

百合的盛世恋 2024-07-11 17:20:20

Note that AMD64 is a NUMA architecture - if you are using a multi-processor box, you may be running lots of memory accesses across the HyperTransport bus, which will be slower than local memory and may explain the behaviour.

This will not be the case between cores on a single socket, so feel free to ignore this if you are not using a multiple-socket machine.

Linux is NUMA-aware (i.e. it has system services to allocate memory by local bank and bind processes to specific CPUs). I believe that Windows Server 2003, Server 2008 and Vista are NUMA-aware but XP is not. Most of the proprietary Unix variants such as Solaris have NUMA support as well.
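
For the "bind processes to specific CPUs" part on Linux, a minimal sketch using `sched_getaffinity` and `sched_setaffinity` from glibc. The helper name `pin_to_first_allowed_cpu` is made up for illustration, and pinning alone does not control where memory is placed; it only keeps the thread from migrating between CPUs (and therefore between nodes).

```cpp
#include <cassert>
#include <sched.h>  // Linux-specific: sched_getaffinity, sched_setaffinity

// Pins the calling thread to the first CPU it is currently allowed to run on.
// Returns that CPU's index, or -1 on failure.
int pin_to_first_allowed_cpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    // pid 0 means "the calling thread".
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t target;
            CPU_ZERO(&target);
            CPU_SET(cpu, &target);
            if (sched_setaffinity(0, sizeof(target), &target) != 0) return -1;
            return cpu;
        }
    }
    return -1;
}
```

Picking the first allowed CPU is only a placeholder policy; on a real multi-socket box you would choose the CPU on the same node as the memory the thread works on.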

貪欢 2024-07-11 17:20:20

Also, the cache is not shared between the cores, which can hurt performance when sharing data among multiple threads.
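
One common way to limit that cost is to keep data written by different threads on separate cache lines. The sketch below assumes a 64-byte line, which is typical for x86-64 but not guaranteed; `PaddedCounter` and `count_in_parallel` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>

// Assumed cache-line size; 64 bytes is typical for x86-64 but not guaranteed.
constexpr std::size_t kCacheLine = 64;

// Each counter occupies its own cache line, so two threads updating
// different counters do not keep invalidating each other's line.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t value = 0;
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");

// Runs two threads, each incrementing only its own counter, and returns the sum.
std::uint64_t count_in_parallel(std::uint64_t iterations_per_thread) {
    PaddedCounter counters[2];
    auto work = [&counters, iterations_per_thread](int idx) {
        for (std::uint64_t i = 0; i < iterations_per_thread; ++i) {
            counters[idx].value += 1;
        }
    };
    std::thread a(work, 0);
    std::thread b(work, 1);
    a.join();
    b.join();
    // Reading after join() is safe: join synchronizes with thread completion.
    return counters[0].value + counters[1].value;
}
```

Without the `alignas`, both counters would likely sit in the same line and every write on one core would force the other core to reload it. Whether that matters for the application in question is something only a profiler can tell.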
