Array of pointers to structs, or just an array of structs?
I'm working on an FFT algorithm in C for a microcontroller, and I'm having trouble deciding whether to store the real and imaginary parts of the input data in a plain array of structs or in an array of pointers to structs. I'm facing the conflicting requirements that the code has to run in a tiny amount of memory and yet also be as fast as possible. I believe the array of pointers to structs will have a somewhat larger memory overhead, but there's a line in my code basically like the following:
/* Note: with a uint8_t index this loop only terminates if RECORD_SIZE <= 255. */
for (uint8_t i = 0; i < RECORD_SIZE; i++)
{
    uint8_t decimateValue = fft_decimate(i);
    fftData[i]->realPart = fftTempData[decimateValue]->realPart;
    fftData[i]->imPart = fftTempData[decimateValue]->imPart;
}
I'm thinking that if I use an array of pointers to structs, as in the above example, the compiled code will be faster, since it is just reshuffling the pointers rather than actually copying all the data between the two data structures as an array-of-structs implementation would. I'm willing to sacrifice some extra memory if the above section of code runs as fast as possible. Thanks for any advice.
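For comparison, the array-of-structs version of the same loop can be sketched as follows. This is a minimal sketch under assumptions not stated in the question: a hypothetical `complex_t` type with `float` fields, `RECORD_SIZE` of 8, and `fft_decimate()` assumed to be the usual radix-2 bit-reversal permutation. A single struct assignment copies both parts in one statement.

```c
#include <stdint.h>

#define RECORD_SIZE 8   /* hypothetical; a power of two no larger than 256 */
#define LOG2_SIZE   3   /* log2(RECORD_SIZE) */

_Static_assert(RECORD_SIZE == (1 << LOG2_SIZE), "RECORD_SIZE must be 2^LOG2_SIZE");

/* Hypothetical sample type: one complex value. */
typedef struct {
    float realPart;
    float imPart;
} complex_t;

/* Assumed behaviour of fft_decimate(): reverse the low LOG2_SIZE bits of i. */
static uint8_t fft_decimate(uint8_t i)
{
    uint8_t r = 0;
    for (uint8_t b = 0; b < LOG2_SIZE; b++) {
        r = (uint8_t)((r << 1) | ((i >> b) & 1u));
    }
    return r;
}

/* Array-of-structs reorder: one struct assignment copies both fields.
   The uint16_t index keeps the loop finite even if RECORD_SIZE is 256. */
static void reorder_aos(complex_t fftData[RECORD_SIZE],
                        const complex_t fftTempData[RECORD_SIZE])
{
    for (uint16_t i = 0; i < RECORD_SIZE; i++) {
        fftData[i] = fftTempData[fft_decimate((uint8_t)i)];
    }
}
```

Whether this beats a pointer shuffle depends on `sizeof(complex_t)` versus the pointer size on the target, and on how often the data is read afterwards.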
Answers (4)
Answer 1:
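Every time you access data through an array of pointers, you have two memory accesses: one to load the pointer and one to load the data it points to. Because the second load depends on the result of the first, this often causes a pipeline stall, even on microcontrollers (unless it's a really small microcontroller with no pipeline).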
Then you have to consider the size of the data. How big is a pointer? 2 bytes? 4 bytes? How big are the structs? 4 bytes? 8 bytes?
If the struct is twice as big as a pointer, shuffling pointers will cost about half as much as shuffling the data. However, reading or modifying the data in any other way will be more expensive. So it depends on what your program does: if you spend a lot of time reading the data and only a little time shuffling it, optimize for reading the data. Other answers have it right: profile. Make sure to profile on your microcontroller, not on your workstation.
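To make the trade-off concrete, here is a minimal sketch of the same read-only pass written both ways, plus the RAM each layout needs. The names `sample_t`, `sum_real_aos`, `sum_real_ptrs`, `bytes_aos` and `bytes_ptrs` are hypothetical. The pointer version has to fetch each pointer before it can fetch the sample; the footprint helpers only apply `sizeof`, so the actual numbers depend on the target.

```c
#include <stddef.h>

/* Hypothetical complex sample. */
typedef struct {
    float realPart;
    float imPart;
} sample_t;

/* Sum of real parts, reading each struct directly from the array. */
static float sum_real_aos(const sample_t *data, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += data[i].realPart;
    }
    return sum;
}

/* Same pass through an array of pointers: load the pointer, then the field. */
static float sum_real_ptrs(const sample_t *const *ptrs, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += ptrs[i]->realPart;
    }
    return sum;
}

/* RAM for n samples; the pointer layout still needs the structs themselves. */
static size_t bytes_aos(size_t n)  { return n * sizeof(sample_t); }
static size_t bytes_ptrs(size_t n) { return n * (sizeof(sample_t) + sizeof(sample_t *)); }
```

On a target with 4-byte pointers and 8-byte samples, for example, the pointer layout needs 12 bytes per sample instead of 8.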
Answer 2:
If your structs are very small, it will actually be faster to have an array of structs and shuffle them around. If your structs are large, this specific action will be faster if you are only shuffling around pointers.
Wait a minute... on second glance, your code does not appear to shuffle pointers at all; it accesses fields of the structs those pointers reference. In effect you are still shuffling the structs themselves, not the pointers. This will be slower than moving pointers, and also slower than just moving structs, since it has to dereference the pointers and then still copy the struct data anyway.
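A minimal sketch of that distinction, using a hypothetical `cplx` type and hypothetical helper names. The first function does what the loop in the question does: it copies field values through the pointers, so the pointer arrays never change. The second actually shuffles pointers, so no sample data is copied. The `perm` array stands in for `fft_decimate()`.

```c
#include <stddef.h>

/* Hypothetical complex sample. */
typedef struct {
    float realPart;
    float imPart;
} cplx;

/* What the question's loop does: the pointers stay as they were,
   but the values are copied from one set of structs into the other. */
static void copy_fields_through_pointers(cplx *dst[], cplx *const src[],
                                         const size_t perm[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i]->realPart = src[perm[i]]->realPart;
        dst[i]->imPart   = src[perm[i]]->imPart;
    }
}

/* Actually shuffling pointers: only addresses are copied, the structs stay put. */
static void shuffle_pointers(cplx *dst[], cplx *const src[],
                             const size_t perm[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[perm[i]];
    }
}
```

After `shuffle_pointers`, `dst` aliases the source structs, so the source buffer must stay alive and unmodified for as long as `dst` is used.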
Answer 3:
You're right. The array of pointers will be faster, but there will be an overhead in memory usage. If you have the memory to spare for the pointers, use them.
Answer 4:
First: It depends. Profile.
Cache locality is going to dominate here. I expect the structs to be very small (representing complex numbers?). In an FFT I'd expect a lot more gain from storing the real and imaginary parts in separate arrays.
You could then split the load between CPU cores.
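A minimal sketch of the separate-arrays (structure-of-arrays) layout, with a hypothetical size `N` and hypothetical names. The reorder here is an in-place bit-reversal swap, a swapped-in technique that assumes `fft_decimate()` computes a bit-reversal permutation; it moves the real and imaginary arrays in lockstep without any temporary buffer, which may matter when RAM is tight.

```c
#include <stddef.h>

#define N     8   /* hypothetical transform size (power of two) */
#define LOG2N 3   /* log2(N) */

/* Structure of arrays: all real parts together, all imaginary parts together. */
typedef struct {
    float re[N];
    float im[N];
} fft_buffer;

/* Reverse the low LOG2N bits of i (assumed decimation order). */
static size_t bit_reverse(size_t i)
{
    size_t r = 0;
    for (int b = 0; b < LOG2N; b++) {
        r = (r << 1) | ((i >> b) & 1u);
    }
    return r;
}

/* In-place reorder: bit reversal is its own inverse, so swapping
   each pair once (only when j > i) yields buf[i] = old[bit_reverse(i)]. */
static void reorder_in_place(fft_buffer *buf)
{
    for (size_t i = 0; i < N; i++) {
        size_t j = bit_reverse(i);
        if (j > i) {
            float t;
            t = buf->re[i]; buf->re[i] = buf->re[j]; buf->re[j] = t;
            t = buf->im[i]; buf->im[i] = buf->im[j]; buf->im[j] = t;
        }
    }
}
```

Because each array is contiguous, a loop that touches only the real parts streams through consecutive memory.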
If it is about larger chunks (say, blocks of 1024 samples), I strongly suspect that shuffling pointers is far more efficient. It also makes it much easier to work on the same (read-only) data from several threads. Moving memory around is a sure way to invalidate a lot of iterators, and usually you want tasks (i.e. threads) to work on a subrange of your data: all they have is an iterator subrange.
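A minimal sketch of the subrange idea in C, with hypothetical names (`range_t`, `split_ranges`, `range_sum`): each task gets a pointer and a length into one shared, read-only buffer, the C analogue of an iterator subrange. Those views stay valid only as long as the samples are not moved, which is why reordering by pointers rather than by copying the data fits this pattern. Threading itself is left out; the tasks would simply be called in sequence here.

```c
#include <stddef.h>

/* Hypothetical read-only view of part of a sample buffer. */
typedef struct {
    const float *begin;
    size_t len;
} range_t;

/* Split n samples into `parts` contiguous, non-overlapping ranges.
   Requires 1 <= parts <= n; the last range absorbs any remainder. */
static void split_ranges(const float *data, size_t n, range_t out[], size_t parts)
{
    size_t chunk = n / parts;
    for (size_t p = 0; p < parts; p++) {
        out[p].begin = data + p * chunk;
        out[p].len = (p == parts - 1) ? n - p * chunk : chunk;
    }
}

/* Work one task would do on its range; it only reads through the view. */
static float range_sum(range_t r)
{
    float s = 0.0f;
    for (size_t i = 0; i < r.len; i++) {
        s += r.begin[i];
    }
    return s;
}
```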