Visual Studio compiler flag /arch and performance
I just noticed that our project has the "Enable Enhanced Instruction Set" flag left unset, probably just an oversight.
Before enabling the flag, I would like to ask whether anyone has seen any real-world performance improvement from enabling it.
I guess we will see some improvement, since our application constantly does floating-point calculations, but they are not a major part of it.
Comments (2)
So in a nutshell: this setting only enables certain intrinsic functions that map directly onto SSE instructions. In normal C++ programs you don't use these intrinsic functions, so this setting won't improve performance.
If you need more performance, you could try to find a compiler that rewrites your code to use SSE instructions (Intel claims its compiler can), but it's probably smarter to go for multicore (with OpenMP or .NET 4.0) or to use the GPU, which is faster and more flexible than SSE.
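To make "intrinsic functions that map directly onto SSE instructions" concrete, here is a minimal sketch, assuming an x86 target where <xmmintrin.h> is available. The function names add_scalar and add_sse are hypothetical and only serve the illustration. What the scalar version compiles to depends on the compiler and its settings, so measuring is the only way to know whether a hand-written version pays off.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps

// Plain C++ version (hypothetical name): the compiler decides which
// instructions to emit for the additions.
void add_scalar(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Hand-written SSE version (hypothetical name): four floats per iteration.
void add_sse(const float* a, const float* b, float* out, std::size_t n) {
    assert(a != nullptr && b != nullptr && out != nullptr);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);             // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));  // packed add, then store
    }
    for (; i < n; ++i) {                             // scalar tail for leftovers
        out[i] = a[i] + b[i];
    }
}
```

Both functions produce the same results for element-wise addition; the second one simply spells out the vector instructions by hand instead of leaving the choice to the compiler.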
The performance benefit will depend on whether your project uses intensive mathematical computations. For many tasks (networking, text processing, data management) this simply isn't the case, as no (or almost no) floating-point operations are used there. Hence, there will be no performance boost at all.
Using SSE/SSE2 instructions generated by the compiler will not give top performance. First, you won't have any control over the actual code generation. There are scenarios where you need to use legacy (x87) code on an old system and SSE/SSE2-enabled code on a new system. You might also want to take advantage of SSE3 on the newest systems. For that purpose, I'd recommend checking the processor type using the cpuid instruction and then switching to an implementation that takes the most advantage of the processor's capabilities. You can then use compiler intrinsics in the implementations targeting SSE/SSE2. To target SSE3, you'll need a dedicated library, which I'm trying to locate on the internet.
I believe there must be libraries that analyze processor capabilities and allow for optimal code switching. I just need some time to look on the net as well.
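The cpuid-then-switch idea could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses the __cpuid intrinsic from <intrin.h> on MSVC and __get_cpuid from <cpuid.h> on GCC/Clang, and it reads the SSE, SSE2 and SSE3 bits from CPUID leaf 1. The names CpuFeatures, detect_cpu_features, sum_generic, sum_sse, select_sum and sum are hypothetical, and sum_sse is a placeholder that forwards to the generic code where a real intrinsic-based implementation would go.

```cpp
#include <cassert>
#include <cstddef>
#if defined(_MSC_VER)
#include <intrin.h>   // __cpuid (MSVC)
#elif defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>    // __get_cpuid (GCC/Clang)
#endif

// Hypothetical feature record; all flags stay false on non-x86 targets.
struct CpuFeatures {
    bool sse = false;
    bool sse2 = false;
    bool sse3 = false;
};

CpuFeatures detect_cpu_features() {
    CpuFeatures f;
    unsigned int ecx = 0, edx = 0;
    bool ok = false;
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 1);                      // leaf 1: feature bits
    ecx = static_cast<unsigned int>(regs[2]);
    edx = static_cast<unsigned int>(regs[3]);
    ok = true;
#elif defined(__i386__) || defined(__x86_64__)
    unsigned int eax = 0, ebx = 0;
    ok = __get_cpuid(1, &eax, &ebx, &ecx, &edx) != 0;
#endif
    if (ok) {
        f.sse  = ((edx >> 25) & 1u) != 0;  // EDX bit 25: SSE
        f.sse2 = ((edx >> 26) & 1u) != 0;  // EDX bit 26: SSE2
        f.sse3 = (ecx & 1u) != 0;          // ECX bit 0: SSE3
    }
    return f;
}

using SumFn = float (*)(const float*, std::size_t);

// Portable fallback implementation.
float sum_generic(const float* v, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += v[i];
    return total;
}

// Placeholder for an SSE-specific implementation (assumption: a real one
// would use intrinsics here); it forwards to the generic code for now.
float sum_sse(const float* v, std::size_t n) {
    return sum_generic(v, n);
}

// Pick the best available implementation for the detected features.
SumFn select_sum(const CpuFeatures& f) {
    return f.sse ? sum_sse : sum_generic;
}

// Detect once, then reuse the chosen implementation on every call.
float sum(const float* v, std::size_t n) {
    static const SumFn impl = select_sum(detect_cpu_features());
    assert(impl != nullptr);
    return impl(v, n);
}
```

Callers only use sum(); the detection runs on the first call and the selected function pointer is reused afterwards, so the per-call cost of the switch is one indirect call.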