“Dynamic storage” vs. memcpy
I'm working with a library that uses memcpy to simulate a dynamic storage data structure with direct access. It's important to note that I'm working on numerical operations that produce small data sets. How can I determine whether a linked list would be more appropriate than memcpy in terms of efficiency?
From what I've found in the literature and online, benchmarks are considered quite evil.
I'm dealing with around 30 elements (from experience), each of small size: 3-component vectors, i.e. points in space.
What would you use in this case:
1) memcpy + direct access
2) linked list + linear search time
Thanks!
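For reference, option 1 might look like the following minimal sketch: the points live in one contiguous buffer that is grown by allocating a larger block and copying the old contents over with `memcpy`, and any element is reachable by index in constant time. The names `point3`, `point_array`, `point_array_push`, `point_array_get` and the doubling growth policy are hypothetical illustrations, not the library's actual API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 3-component point: three floats, typically 12 bytes. */
typedef struct {
    float x, y, z;
} point3;

/* Hypothetical contiguous "dynamic storage" with direct access by index. */
typedef struct {
    point3 *data;
    size_t len;
    size_t cap;
} point_array;

/* Append p, doubling the capacity when full; the old contents are moved
   into the new block with memcpy. Returns 0 on success, -1 if allocation
   fails (the array is then left unchanged). */
int point_array_push(point_array *a, point3 p)
{
    if (a->len == a->cap) {
        size_t new_cap = a->cap ? a->cap * 2 : 8;
        point3 *bigger = malloc(new_cap * sizeof *bigger);
        if (bigger == NULL)
            return -1;
        if (a->len > 0) /* memcpy must not be given a NULL source */
            memcpy(bigger, a->data, a->len * sizeof *bigger);
        free(a->data);
        a->data = bigger;
        a->cap = new_cap;
    }
    a->data[a->len++] = p;
    return 0;
}

/* Direct access: O(1), no traversal needed. */
point3 point_array_get(const point_array *a, size_t i)
{
    assert(i < a->len);
    return a->data[i];
}

void point_array_free(point_array *a)
{
    free(a->data);
    a->data = NULL;
    a->len = a->cap = 0;
}
```

With this growth policy, storing 30 points grows the buffer only twice (8 to 16, then 16 to 32 elements), so the total memcpy traffic is a few hundred bytes.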
Comments (4)
If you really care that much about performance, you should measure it, i.e. benchmark your code (this is not evil, it is common practice; what is evil is premature optimization).
But be aware that, at least with recent GCC (e.g. GCC 4.6) on GNU/Linux and when optimizing with at least -O2, memcpy and memset are semi-magically transformed (through __builtin_memcpy or similar tricks) into quite efficient code. And for a large set of small data elements, I would guess that cache considerations dominate performance.
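As an illustration of the kind of call this refers to, here is a minimal sketch that copies one point with `memcpy` using a size known at compile time. With optimization enabled, GCC commonly expands such fixed-size copies into a few inline loads and stores instead of calling the library routine; whether it does so for your target is something to check in the generated assembly (e.g. with `gcc -O2 -S`). The `point3` type and the `copy_point`/`copy_points` functions are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 3-component point. */
typedef struct {
    float x, y, z;
} point3;

/* The size argument is a compile-time constant (sizeof *dst), which is what
   allows the compiler to replace the call with inline loads and stores. */
void copy_point(point3 *dst, const point3 *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* Copy n points between non-overlapping buffers. Here the size is only
   known at run time, so a call to the library memcpy is more likely. */
void copy_points(point3 *dst, const point3 *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    if (n > 0)
        memcpy(dst, src, n * sizeof *dst);
}
```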
Profiling and benchmarking are not evil. They are the best way to figure out which of several options is more efficient. Given how "smart" optimizers are nowadays, the counter-intuitive option might actually prove to be the most efficient. I suggest you run a benchmark and choose based on that. The only way you can go wrong is by not providing representative input that covers most cases.
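A minimal benchmarking sketch along these lines: run the operation under test many times on representative input, time the whole loop with the standard C `clock()` function (which measures processor time), and store each result in a `volatile` sink so the optimizer cannot discard the work. The workload here, summing the coordinates of points stored contiguously, is a hypothetical stand-in for the real code; `sum_points`, `bench_sink` and `bench_sum_points` are illustrative names.

```c
#include <time.h>

/* Hypothetical 3-component point. */
typedef struct {
    float x, y, z;
} point3;

/* Hypothetical workload: sum all coordinates of n contiguous points. */
double sum_points(const point3 *pts, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += pts[i].x + pts[i].y + pts[i].z;
    return s;
}

/* Each result is stored here; writes to a volatile object cannot be removed. */
volatile double bench_sink;

/* Call sum_points `iters` times and return the average processor time per
   call in nanoseconds, as measured by clock(). */
double bench_sum_points(const point3 *pts, int n, long iters)
{
    /* Reading the input pointer through a volatile variable on every
       iteration stops the compiler from computing the sum once and
       hoisting it out of the loop. */
    const point3 *volatile input = pts;
    clock_t start = clock();
    for (long k = 0; k < iters; k++)
        bench_sink = sum_points(input, n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / (double)iters;
}
```

Since `clock()` often has coarse resolution, `iters` should be large enough for the loop to run for a noticeable fraction of a second; the same harness can then be pointed at the linked-list variant with identical input.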
As you are dealing with such a small amount of data, why are you worrying?
Benchmarking only really works with lots of computations, which limits the influence of other effects from the OS.
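One common way to act on this: make each timed run long (many repetitions of the small operation), then repeat the run several times and keep the fastest. Interference from the OS, such as interrupts or other processes, can only make a run slower, so the minimum is usually the least disturbed figure. This is a minimal sketch using the C11 `timespec_get` function; the `work_fn` callback type, `min_ns_per_call`, and the example `sum_work` workload are hypothetical.

```c
#include <time.h>

/* Hypothetical callback type: performs one unit of the work being measured. */
typedef void (*work_fn)(void *ctx);

/* Nanoseconds between two timestamps taken with timespec_get (C11). */
static double elapsed_ns(const struct timespec *t0, const struct timespec *t1)
{
    return (double)(t1->tv_sec - t0->tv_sec) * 1e9
         + (double)(t1->tv_nsec - t0->tv_nsec);
}

/* Time `runs` separate runs of `iters` calls each and return the fastest
   run, in nanoseconds per call. OS interference only adds time, so the
   minimum is the least disturbed measurement. */
double min_ns_per_call(work_fn fn, void *ctx, long iters, int runs)
{
    double best = -1.0;
    for (int r = 0; r < runs; r++) {
        struct timespec t0, t1;
        timespec_get(&t0, TIME_UTC);
        for (long k = 0; k < iters; k++)
            fn(ctx);
        timespec_get(&t1, TIME_UTC);
        double per_call = elapsed_ns(&t0, &t1) / (double)iters;
        if (best < 0.0 || per_call < best)
            best = per_call;
    }
    return best;
}

/* Example workload (hypothetical): sum n floats and store the result in a
   volatile field so the compiler cannot skip the work. */
struct sum_ctx {
    const float *values;
    int n;
    volatile double result;
};

void sum_work(void *ctx)
{
    struct sum_ctx *c = ctx;
    double s = 0.0;
    for (int i = 0; i < c->n; i++)
        s += c->values[i];
    c->result = s;
}
```

`TIME_UTC` is a wall clock, so a system clock adjustment during a run would distort that run; on POSIX systems `clock_gettime` with `CLOCK_MONOTONIC` avoids this.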
With such a small dataset (30 * 12 = 360 bytes), all your data fits in a handful of cache lines (about six 64-byte lines) and comfortably in the L1 cache. So I'm sure it will be quicker than a list. If you use a list, you still need to allocate a piece of memory, which on most operating systems takes more time than copying such a small piece of memory.
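For comparison, a minimal sketch of option 2: each point lives in its own heap-allocated node, so every insertion pays for a `malloc` call (the allocation cost mentioned above), the nodes may end up scattered in memory, and reaching the i-th element means walking the list. The `point3`, `point_node`, `list_push_front`, `list_nth` and `list_free` names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical 3-component point. */
typedef struct {
    float x, y, z;
} point3;

/* Hypothetical singly linked list node holding one point. */
typedef struct point_node {
    point3 p;
    struct point_node *next;
} point_node;

/* Prepend p with one malloc per element. Returns the new head, or NULL if
   the allocation failed (the caller still owns the old head in that case). */
point_node *list_push_front(point_node *head, point3 p)
{
    point_node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->p = p;
    n->next = head;
    return n;
}

/* Linear search: follow i links to reach the i-th node (NULL if out of range). */
const point_node *list_nth(const point_node *head, size_t i)
{
    while (head != NULL && i > 0) {
        head = head->next;
        i--;
    }
    return head;
}

/* Release every node. */
void list_free(point_node *head)
{
    while (head != NULL) {
        point_node *next = head->next;
        free(head);
        head = next;
    }
}
```

On a typical 64-bit platform each node takes 24 bytes (12 for the point, 4 of padding, 8 for the pointer) before allocator overhead, versus 12 bytes per element in a contiguous array.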