Should I rewrite my DSP routines in C/C++, or am I good with C# unsafe pointers?

Posted 2024-07-08 04:24:54

I'm currently writing a C# application that does a lot of digital signal processing, which involves many small, fine-tuned memory transfer operations. I wrote these routines using unsafe pointers, and they seem to perform much better than I first thought. However, I want the app to be as fast as possible.

Would I get any performance benefit from rewriting these routines in C or C++, or should I stick to unsafe pointers? I'd like to know what unsafe pointers bring to the table in terms of performance, compared to C/C++.

EDIT: I'm not doing anything special inside these routines, just the normal DSP stuff: cache-friendly data transfers from one array to another, with a lot of multiplications, additions, bit shifts, etc. along the way. I'd expect the C/C++ routines to look pretty much the same as (if not identical to) their C# counterparts.

EDIT: Thanks a lot to everyone for all the clever answers. What I've learned is that I won't get any significant boost in performance just by doing a direct port, unless some sort of SSE optimization takes place. Assuming that all modern C/C++ compilers can take advantage of it, I'm looking forward to giving it a try. If anyone is interested in the results, just let me know and I'll post them somewhere. (It may take a while, though.)

Comments (13)

坐在坟头思考人生 2024-07-15 04:24:54

I've actually done pretty much exactly what you're asking, only in the image processing area. I started off with C# unsafe pointers, then moved to C++/CLI, and now I code everything in C++. From there I changed from pointers in C++ to SSE processor instructions, so I've gone all the way. I haven't reached assembler yet, and I don't know if I need to: I saw an article on CodeProject that showed SSE can be as fast as inline assembler. I can find it if you want me to.

As I went along, my algorithm went from around 1.5-2 frames per second in C# with unsafe pointers to 40 frames per second now. C# and C++/CLI were definitely slower than C++, even with pointers; I wasn't able to get above 10 frames per second with those languages. As soon as I switched to C++, I got something like 15-20 frames per second instantly. A few more clever changes and SSE got me up to 40 frames per second. So yes, in my experience it is worth going lower-level if you want speed. There is a clear performance gain.

空袭的梦i 2024-07-15 04:24:54

Another way to optimize DSP code is to make it cache friendly. If you have a lot of filters to apply to your signal you should apply all the filters to each point, i.e. your innermost loop should be over the filters and not over data, e.g.:

for each n do t´[n] = h(g(f(t[n])))

This way you will thrash the cache a lot less and will most likely gain a good speed increase.
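As a rough illustration of the loop ordering described above, here is a minimal C sketch. The three stages `stage_f`, `stage_g` and `stage_h` are hypothetical stand-ins for f, g and h, and the sketch assumes each filter is memoryless (its output depends only on the current sample); filters with history, such as FIR filters, would need their state carried along as well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-sample stages standing in for f, g and h. */
static inline float stage_f(float x) { return x * 0.5f; }               /* gain      */
static inline float stage_g(float x) { return x + 1.0f; }               /* DC offset */
static inline float stage_h(float x) { return x > 1.25f ? 1.25f : x; }  /* clip      */

/* Three separate passes: the whole buffer travels through the cache
   once per stage. */
void filter_separate(const float *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) out[i] = stage_f(in[i]);
    for (size_t i = 0; i < n; ++i) out[i] = stage_g(out[i]);
    for (size_t i = 0; i < n; ++i) out[i] = stage_h(out[i]);
}

/* One fused pass: each sample is loaded once, run through all stages
   while it sits in a register, and stored once. */
void filter_fused(const float *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i)
        out[i] = stage_h(stage_g(stage_f(in[i])));
}
```

Both functions produce the same output; the difference only becomes visible once the buffer is larger than the cache, so it is worth timing both on realistic buffer sizes.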

野鹿林 2024-07-15 04:24:54

I think you should write your DSP routines either in C++ (managed or unmanaged) or in C#, using a solid design but without trying to optimize everything from the start, and then you should profile your code and find the bottlenecks and try to optimize those away.

Trying to produce "optimal" code from the start is going to distract you from writing working code in the first place. Remember that 80% of your optimization is only going to affect 20% of your code as in a lot of cases only 10% of your code is responsible for 90% of your CPU time. (YMMV, as it depends on the type of application)

When I was trying to optimize our use of alpha blending in our graphics toolkit, I first tried to use SIMD the "bare metal" way: inline assembler. Soon I found out that it's better to use SIMD intrinsics than pure assembly, since the compiler can further optimize readable C++ with intrinsics by rearranging the individual opcodes and maximizing the use of the different processing units in the CPU.

Don't underestimate the power of your compiler!
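To give an idea of what intrinsics look like compared to inline assembler, here is a small C sketch of a float blend. It is not the toolkit code mentioned above; the function name `blend_f32` and the `HAVE_SSE` macro are made up for this example. It assumes an x86 target with SSE and falls back to plain scalar code elsewhere.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>   /* SSE intrinsics */
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* dst[i] = alpha * a[i] + (1 - alpha) * b[i], with alpha in [0, 1]. */
void blend_f32(const float *a, const float *b, float *dst, size_t n, float alpha)
{
    assert(alpha >= 0.0f && alpha <= 1.0f);
    size_t i = 0;
#if HAVE_SSE
    const __m128 va = _mm_set1_ps(alpha);          /* alpha in all 4 lanes     */
    const __m128 vb = _mm_set1_ps(1.0f - alpha);   /* 1 - alpha in all 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 x = _mm_mul_ps(_mm_loadu_ps(a + i), va);
        __m128 y = _mm_mul_ps(_mm_loadu_ps(b + i), vb);
        _mm_storeu_ps(dst + i, _mm_add_ps(x, y));  /* unaligned store */
    }
#endif
    /* Scalar tail (or the whole loop when SSE is unavailable). */
    for (; i < n; ++i)
        dst[i] = alpha * a[i] + (1.0f - alpha) * b[i];
}
```

The unaligned load/store variants are used so the sketch works on any buffer; the compiler remains free to schedule and unroll the loop as it sees fit.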

り繁华旳梦境 2024-07-15 04:24:54

Would I get any performance benefit
from rewriting these routines in C/C++
or should I stick to unsafe pointers?

In theory it wouldn't matter - a perfect compiler will optimize the code, whether C or C++, into the best possible assembler.

In practice, however, C is almost always faster, especially for pointer-type algorithms: it's as close as you can get to machine code without coding in assembly.

C++ doesn't bring anything to the table in terms of performance - it is built as an object oriented version of C, with a lot more capability and ease of use for the programmer. While for some things it will perform better because a given application will benefit from an object oriented point of view, it wasn't meant to perform better - it was meant to provide another level of abstraction so that programming complex applications was easier.

So, no, you will likely not see a performance increase by switching to C++.

However, it is likely more important for you to find out, than it is to avoid spending time on it - I think it would be a worthwhile activity to port it over and analyze it. It is quite possible that if your processor has certain instructions for C++ or Java usage, and the compiler knows about them, it may be able to take advantage of features unavailable in C. Unlikely, but possible.

However, DSP processors are notoriously complex beasts, and the closer you get to assembly, the better performance you can get (i.e., the more hand-tuned your code is). C is much closer to assembly than C++.

-Adam

梦魇绽荼蘼 2024-07-15 04:24:54

First, let me answer the question about "safe" vs "unsafe": you said in your post "I want the app to be as fast as possible", and that means you don't want to mess with "safe" or "managed" pointers (don't even mention garbage collection).

Regarding your choice of languages:
C/C++ lets you work with the underlying data much more easily, without any of the overhead associated with the fancy containers that everyone is using these days. Yes, it is nice to be cuddled by containers that prevent you from seg-faulting... but the higher level of abstraction associated with containers RUINS your performance.

At my job our code has to run fast. An example is our polyphase-resamplers at work that play with pointers and masking operations and fixed point DSP filtering ... none of these clever tricks are really possible without low level control of the memory and bit manipulations ==> so I say stick with C/C++.
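As a small illustration of the kind of pointer, masking and fixed-point tricks meant here (not the polyphase resampler itself), the following C sketch implements a Q15 FIR filter over a power-of-two ring buffer. The names `ring_t`, `RING_SIZE` and `fir_q15` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u                  /* must be a power of two */
#define RING_MASK (RING_SIZE - 1u)    /* index wrap via AND instead of modulo */

typedef struct {
    int16_t  history[RING_SIZE];      /* most recent input samples */
    unsigned head;                    /* index of the newest sample */
} ring_t;

/* Clamp a wide accumulator to the int16_t range. */
static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Push one sample and return the FIR output. Coefficients are Q15
   (32768 represents 1.0). */
int16_t fir_q15(ring_t *r, const int16_t *coef, size_t taps, int16_t x)
{
    assert(r != NULL && coef != NULL && taps <= RING_SIZE);
    r->head = (r->head + 1u) & RING_MASK;
    r->history[r->head] = x;

    int64_t acc = 0;                  /* wide accumulator, cannot overflow here */
    for (size_t k = 0; k < taps; ++k)
        acc += (int32_t)coef[k] * r->history[(r->head - k) & RING_MASK];

    /* Round and drop the 15 fractional bits. Right-shifting a negative
       value is implementation-defined in C; mainstream compilers use an
       arithmetic shift. */
    return sat16((acc + (1 << 14)) >> 15);
}
```

With two taps of 16384 (0.5 each) this behaves as a two-sample moving average.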

If you really want to be smart, write all your DSP code in low-level C, and then intermingle it with the safer containers/managed pointers. When it comes to speed you need to take off the training wheels... they slow you down too much.

(FYI, regarding taking the training wheels off: you need to test your C DSP code extra carefully offline to make sure its pointer usage is good... otherwise it will seg fault.)

EDIT: p.s. "seg faulting" is a LUXURY for all you PC/x86 developers. When you are writing embedded code... a seg fault just means your processor will go into the weeds and can only be recovered by power cycling ;).

何必那么矫情 2024-07-15 04:24:54

In order to know how you would get a performance gain, it's good to know the portions of code that could cause bottlenecks.

Since you're talking about small memory transfers, I assume all data will fit in the CPU's cache. In that case, the only gain you can achieve would be by knowing how to work the CPU's intrinsics. Typically, the compiler most familiar with the CPU's intrinsics is a C compiler. So here, I think you may improve performance by porting.

Another bottleneck will be on the path between CPU and memory: cache misses due to the large number of memory transfers in your application. The biggest gain will then lie in minimizing cache misses, which depend on the platform you use and on the layout of your data (is it local or spread out through memory?).

But since you're already using unsafe pointers, you have that bit under your own control, so my guess is: on that aspect, you won't benefit much from a port to C (or C++).

Concluding: you may want to port small portions of your application into C.

动次打次papapa 2024-07-15 04:24:54

Seeing that you're writing unsafe code already, I presume it would be relatively easy to convert it to a C DLL and call it from within C#. Do this after you have identified the slowest parts of your program, and then replace those parts with C.
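A minimal sketch of what the C side of such a DLL could look like is shown below. The export name `dsp_apply_gain_q15` and the `DSP_API` macro are hypothetical; on the C# side the function would typically be declared with a `[DllImport]` attribute and called with a pinned `short` buffer. Processing a whole block per call keeps the number of managed-to-native transitions low.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mark the function as exported from the shared library. */
#if defined(_WIN32)
#define DSP_API __declspec(dllexport)
#else
#define DSP_API __attribute__((visibility("default")))
#endif

/* Hypothetical export: scale a block of 16-bit samples in place by a
   Q15 gain in [0, 32767] (32768 would be 1.0). Returns the number of
   samples processed, or -1 on invalid arguments. */
DSP_API int32_t dsp_apply_gain_q15(int16_t *samples, int32_t count, int16_t gain_q15)
{
    if (samples == NULL || count < 0)
        return -1;                        /* report errors by return code */
    assert(gain_q15 >= 0);                /* this sketch only attenuates  */

    for (int32_t i = 0; i < count; ++i) {
        int32_t v = (int32_t)samples[i] * gain_q15;
        samples[i] = (int16_t)(v / 32768);   /* truncates toward zero */
    }
    return count;
}
```

Plain C types with fixed widths (`int16_t`, `int32_t`) are used so the signature maps directly onto C# `short` and `int`.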

°如果伤别离去 2024-07-15 04:24:54

Your question is largely philosophical. The answer is this: don't optimize until you profile.

You ask whether you'll gain improvement. Okay, you will gain improvement by N percent. If that's enough (like you need code that executes 200 times in 20 milliseconds on some embedded system) you're fine. But what if it's not enough?

You have to measure first and then find out whether some parts of the code could be rewritten in the same language but faster. Maybe you can redesign data structures to avoid unnecessary computations. Maybe you can skip some memory reallocation. Maybe something is done with quadratic complexity when it could be done with linear complexity. You won't see it until you've measured it. This is usually much less of a waste of time than just rewriting everything in another language.
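A real profiler is the better tool, but as a starting point, a crude timing harness along the following lines can already show where the time goes. This is only a sketch using the standard `clock()` function; `time_kernel`, `kernel_fn` and `scale_half` are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Signature of a kernel that processes a buffer in place. */
typedef void (*kernel_fn)(float *data, size_t n);

/* Return the average CPU time in seconds for one call of `fn`,
   measured over `reps` repetitions after one warm-up call. */
double time_kernel(kernel_fn fn, float *data, size_t n, int reps)
{
    assert(fn != NULL && reps > 0);
    fn(data, n);                          /* warm-up: fill caches */
    clock_t start = clock();
    for (int r = 0; r < reps; ++r)
        fn(data, n);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC / reps;
}

/* Hypothetical kernel under test: halve every sample. */
void scale_half(float *data, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        data[i] *= 0.5f;
}
```

Because `clock()` has coarse resolution on many platforms, use enough repetitions and a realistic buffer size for the numbers to mean anything.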

空城缀染半城烟沙 2024-07-15 04:24:54

C# has no support for SSE (yet; there is a Mono project for SSE operations). Therefore C/C++ with SSE would definitely be faster.

You must, however, be careful with managed-to-native and native-to-managed transitions, as they are quite expensive. Stay as long in either world as possible.

歌入人心 2024-07-15 04:24:54

Do you really want the app to be as fast as possible or simply fast enough? That tells you what you should do next.

还在原地等你 2024-07-15 04:24:54

If you insist on sticking with your hand-rolled code, without hand-optimising in assembler or similar, C# should be fine. Unfortunately, this is the kind of question that can only really be answered experimentally. You're already in unmanaged pointer space, so my gut feeling is that a direct port to C++ would not see a significant difference in speed.

I should say, though, that I had a similar issue recently, and we ended up throwing away the hand-rolled code after trying the Intel Integrated Performance Primitives library. The performance improvements we saw there were very impressive.

嘦怹 2024-07-15 04:24:54

Mono 2.2 now has SIMD support. With this you can have the best of both worlds: managed code and raw speed.

You might also want to have a look at Using SSE in c# is it possible?

青柠芒果 2024-07-15 04:24:54

I would suggest that if you have any algorithms in your DSP code that need to be optimised then you should really be writing them in assembly, not C or C++.

In general, with modern processors and hardware, there aren't that many scenarios that require or warrant the effort involved in optimisation. Have you actually identified any performance issues? If not then it's probably best to stick with what you have. Unsafe C# is unlikely to be significantly slower than C/C++ in most cases of simple arithmetic.

Have you considered C++/CLI? You could have the best of both worlds then. It would even allow you to use inline assembler if required.
