Alignment and performance
Do the routines strcmp (for comparing char * strings) and memcmp (for everything else) run faster on x86_64 when the memory blocks are aligned in some way, and if so, how should they be aligned? Does libc use SSE for these routines?
Comments (2)
It depends. On architectures where alignment matters or where SIMD instructions are available, the routines typically handle any leading unaligned bytes first, then do as many wide aligned operations as the data allows, then handle the trailing bytes.
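To make the head / wide / tail structure concrete, here is a minimal sketch in C. It is not how glibc or any particular libc implements memcmp (real implementations are usually hand-tuned and may use SIMD). The name sketch_memcmp and the choice of 8-byte words are assumptions made only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration of the head / wide / tail pattern.
 * Not the actual libc implementation. */
int sketch_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a;
    const unsigned char *q = b;

    /* Head: compare byte by byte until p sits on an 8-byte boundary. */
    while (n > 0 && ((uintptr_t)p % sizeof(uint64_t)) != 0) {
        if (*p != *q)
            return (int)*p - (int)*q;
        p++; q++; n--;
    }

    /* Body: compare 8 bytes at a time. memcpy keeps the loads
     * well-defined even when q is not aligned the same way as p. */
    while (n >= sizeof(uint64_t)) {
        uint64_t x, y;
        memcpy(&x, p, sizeof x);
        memcpy(&y, q, sizeof y);
        if (x != y)
            break; /* the mismatch is inside this word; the tail loop finds it */
        p += sizeof x; q += sizeof y; n -= sizeof x;
    }

    /* Tail: remaining bytes, or the word that contained a mismatch. */
    while (n > 0) {
        if (*p != *q)
            return (int)*p - (int)*q;
        p++; q++; n--;
    }
    return 0;
}
```

For short inputs the head and tail loops dominate, which is why alignment tends to matter more for small blocks than for large ones.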
Whether the leading and trailing bytes contribute significantly to the processing time for your data can be determined by experiment.
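One way to run such an experiment is sketched below, assuming a hosted C environment. The helper time_memcmp is a hypothetical name. It times the library memcmp on two equal blocks that start a chosen number of bytes past a malloc'd address, so runs with offset 0 and offset 1 can be compared. The standard clock() timer is coarse, so a large repetition count is needed for meaningful numbers.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: time `reps` calls of memcmp on two equal blocks of
 * `len` bytes starting `offset` bytes past a malloc'd (suitably aligned)
 * address. Returns elapsed CPU seconds, or -1.0 if allocation fails. */
double time_memcmp(size_t offset, size_t len, int reps)
{
    /* One spare byte so the write below stays in bounds even if len is 0. */
    unsigned char *a = malloc(offset + len + 1);
    unsigned char *b = malloc(offset + len + 1);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1.0;
    }
    memset(a, 0x5a, offset + len + 1);
    memset(b, 0x5a, offset + len + 1);

    volatile int sink = 0; /* keeps the results observable */
    clock_t start = clock();
    for (int i = 0; i < reps; i++) {
        /* Change the data each round so the call cannot be hoisted out. */
        a[offset] = b[offset] = (unsigned char)i;
        sink += memcmp(a + offset, b + offset, len);
    }
    clock_t end = clock();

    free(a);
    free(b);
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_memcmp(0, len, reps) and time_memcmp(1, len, reps) for the block sizes you actually use, then comparing the two results, shows whether misalignment is measurable for your workload.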
If you are worried about comparison performance, you should take a look at the well-known Boyer-Moore algorithm and this post from GNU Grep author Mike Haertel.
He explains how to be really fast when searching for something in a block of data.
His summary is quite clear about what to do.
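Independent of that summary, the skipping idea behind Boyer-Moore can be sketched in C. The code below is the simplified Horspool variant (bad-character shifts only), not full Boyer-Moore and not Haertel's grep code. The function name horspool_search is hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Boyer-Moore-Horspool sketch: returns a pointer to the first occurrence
 * of needle in hay, or NULL if there is none. */
const unsigned char *horspool_search(const unsigned char *hay, size_t hlen,
                                     const unsigned char *needle, size_t nlen)
{
    size_t shift[UCHAR_MAX + 1];

    if (nlen == 0)
        return hay;
    if (nlen > hlen)
        return NULL;

    /* A byte that does not occur in the needle allows a full-length skip. */
    for (size_t i = 0; i <= UCHAR_MAX; i++)
        shift[i] = nlen;
    /* Otherwise skip so the byte's last occurrence (excluding the final
     * needle position) lines up with the window's last byte. */
    for (size_t i = 0; i + 1 < nlen; i++)
        shift[needle[i]] = nlen - 1 - i;

    size_t pos = 0;
    while (pos <= hlen - nlen) {
        /* Compare the window against the needle from right to left. */
        size_t j = nlen;
        while (j > 0 && hay[pos + j - 1] == needle[j - 1])
            j--;
        if (j == 0)
            return hay + pos;
        /* Shift by the table entry for the window's last byte. */
        pos += shift[hay[pos + nlen - 1]];
    }
    return NULL;
}
```

The speed comes from not examining every input byte: when the last byte of the window does not occur in the needle at all, the search jumps ahead by the full needle length.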