Dynamic allocation of objects with aligned members - possible solutions?
I'm considering using SSE to speed up some code in my project. This usually requires the data I'm working on to be 16-byte aligned. For static allocation I suppose __declspec(align(16)) solves the problem, but my question is: what's the best way to guarantee alignment for dynamic allocations? This matters especially when the allocated object does not itself require alignment but has members that do (which makes it much easier to forget to align it properly). I came up with the following solutions:
1. Always assume that any data that might not be statically allocated is unaligned, and use unaligned load instructions. From what I've read this is slow, and it might not be worth bothering with SSE at all in that case. I could implement it and measure how it performs, but I'd rather ask about better solutions before putting in that much work only to find out that it isn't worth it or that there is another way.
2. Be very careful and use only _aligned_malloc/_aligned_free to allocate any object that requires alignment, and any object that has such objects as members. This is probably very easy to forget and thus error-prone.
3. Overload new/delete globally and/or create custom malloc/free functions that align memory, then use those for everything. However, it's probably not the best idea to align literally everything that is dynamically allocated.
4. Create a base class with overloaded new/delete operators, then make sure that any class that requires alignment, and any class that has such classes as members, inherits from it. Then just use new/delete for most or all dynamic allocations. Probably less error-prone than 2.
5. Some other way I didn't think of or am not aware of?
Options 1-3 are probably not the best ideas. What about 4? Am I wrong about anything I mentioned? Suggestions, opinions, or useful links on this topic?
Thank you in advance :)
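
For illustration, option 4 could look roughly like the sketch below. This is not code from the thread: the names Aligned16, Vec4, Particle, aligned_alloc16, aligned_free16 and is_aligned16 are all hypothetical, standard alignas(16) stands in for __declspec(align(16)), and posix_memalign is assumed as the non-Windows counterpart of _aligned_malloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#if defined(_WIN32)
#include <malloc.h>  // _aligned_malloc / _aligned_free
#endif

// Hypothetical wrappers: 16-byte aligned allocation on Windows and POSIX.
inline void* aligned_alloc16(std::size_t size) {
#if defined(_WIN32)
    return _aligned_malloc(size, 16);
#else
    void* p = nullptr;
    return posix_memalign(&p, 16, size) == 0 ? p : nullptr;
#endif
}

inline void aligned_free16(void* p) {
#if defined(_WIN32)
    _aligned_free(p);
#else
    std::free(p);
#endif
}

// Hypothetical helper: true if p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Base class from option 4: every class deriving from it is allocated
// through the aligned wrappers whenever plain new/delete is used.
struct Aligned16 {
    static void* operator new(std::size_t size) {
        if (void* p = aligned_alloc16(size)) return p;
        throw std::bad_alloc();
    }
    static void operator delete(void* p) noexcept { aligned_free16(p); }
};

// A type with a 16-byte alignment requirement (stand-in for SSE data).
struct alignas(16) Vec4 {
    float v[4];
};

// Does not use SSE directly, but has aligned members, so it inherits
// the aligned operators.
struct Particle : Aligned16 {
    Vec4 position;
    Vec4 velocity;
};
```

With this in place, `Particle* p = new Particle(); ... delete p;` goes through the aligned wrappers. The sketch only covers single-object new/delete: array forms would need operator new[]/delete[] as well, and standard containers allocate through their allocator rather than through class-specific operator new, so they would need separate handling.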
Comments (2)
On 64-bit Windows, malloc returns 16-byte-aligned memory (MSDN); 32-bit builds only guarantee 8 bytes. If your platform's malloc guarantees less than 16 bytes, you need to use an aligned version of malloc for objects used with SSE.
EDIT: If you have a specific class of objects that needs SSE support, you can redefine new/delete for that class only.
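
A quick way to see what a given platform's malloc actually returns is an empirical probe like the sketch below. The helper names is_aligned and malloc_looks_16_byte_aligned are made up for this example, and a positive result only describes the blocks that were sampled; it is not a guarantee from the allocator, so the documentation remains the authority.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: true if p is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Allocates a few blocks of different sizes and reports whether every
// one of them came back on a 16-byte boundary.
inline bool malloc_looks_16_byte_aligned() {
    const std::size_t sizes[] = {1, 8, 24, 100, 4096};
    bool all_aligned = true;
    for (std::size_t size : sizes) {
        void* p = std::malloc(size);
        if (p == nullptr) return false;  // allocation failed, no verdict
        all_aligned = all_aligned && is_aligned(p, 16);
        std::free(p);
    }
    return all_aligned;
}
```

If the probe returns false, or the platform documentation promises less than 16 bytes, plain malloc is not enough for aligned SSE loads and an aligned allocation function is needed instead.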
Not sure if this is practical for your purposes, but you could use Doug Lea's allocator and define the MALLOC_ALIGNMENT macro to suit your needs (up to 128 bytes).
You don't even need to replace the default allocator: you should be able to use the Doug Lea-specific dlmalloc and dlfree for your SSE needs only, and continue to use the default allocator for everything else.