The following question is related; however, its answers are old, and a comment from user Marc Glisse suggests that since C++17 there are new approaches to this problem that might not be adequately discussed there.
I'm trying to get aligned memory working properly for SIMD, while still having access to all of the data.
On Intel, if I create a float vector of type __m256, and reduce my size by a factor of 8, it gives me aligned memory.
E.g. std::vector<__m256> mvec_a((N*M)/8);
In a slightly hacky way, I can cast pointers to vector elements to float, which allows me to access individual float values.
Instead, I would prefer to have a std::vector<float> which is correctly aligned, and thus can be loaded into __m256 and other SIMD types without segfaulting.
I've been looking into aligned_alloc.
This can give me a C-style array that is correctly aligned:
auto align_sz = static_cast<std::size_t> (32);
float* marr_a = (float*)aligned_alloc(align_sz, N*M*sizeof(float));
However, I'm unsure how to do this for std::vector<float>. Giving the std::vector<float> ownership of marr_a doesn't seem to be possible.
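For reference, the aligned_alloc approach above can be sketched a bit more defensively. This is only a minimal sketch, assuming a C++17 standard library that provides std::aligned_alloc (MSVC's does not); the names FreeDeleter, AlignedFloats, make_aligned_floats, is_aligned and sum_demo are hypothetical and not part of any library. It rounds the byte count up to a multiple of the alignment, which std::aligned_alloc expects, and lets a std::unique_ptr release the buffer with std::free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>

// Hypothetical deleter: memory from std::aligned_alloc is released with std::free.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

using AlignedFloats = std::unique_ptr<float[], FreeDeleter>;

// Hypothetical helper: allocate `count` floats on a 32-byte boundary.
inline AlignedFloats make_aligned_floats(std::size_t count) {
    constexpr std::size_t align = 32;  // size of one __m256 in bytes
    // Round the byte count up to a multiple of the alignment (at least one block).
    std::size_t bytes = (count * sizeof(float) + align - 1) / align * align;
    if (bytes == 0) bytes = align;
    void* p = std::aligned_alloc(align, bytes);
    if (p == nullptr) throw std::bad_alloc();
    return AlignedFloats(static_cast<float*>(p));
}

// True if the address `p` is a multiple of `align`.
inline bool is_aligned(const void* p, std::size_t align) {
    return reinterpret_cast<std::uintptr_t>(p) % align == 0;
}

// Small usage example: fill 16 floats with 1.0f and add them up.
inline float sum_demo() {
    AlignedFloats buf = make_aligned_floats(16);
    assert(is_aligned(buf.get(), 32));
    float sum = 0.0f;
    for (std::size_t i = 0; i < 16; ++i) {
        buf[i] = 1.0f;
        sum += buf[i];
    }
    return sum;
}
```

This still yields a raw buffer rather than a std::vector<float>, so it does not solve the ownership problem described above; it only makes the C-style variant safer to use.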
I've seen some suggestions that I should write a custom allocator, but this seems like a lot of work, and perhaps with modern C++ there is a better way?
Comments (2)
STL containers take an allocator template argument which can be used to align their internal buffers. The specified allocator type has to implement at least allocate, deallocate, and value_type.
In contrast to these answers, this implementation of such an allocator avoids platform-dependent aligned malloc calls. Instead, it uses the C++17 aligned new operator.
Here is the full example on godbolt.
This allocator can then be used like this:
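The following is a minimal sketch of an allocator along the lines described above, together with its use. It is not the answer's original code. The names AlignedAllocator, AlignedVector and aligned_vector_demo are hypothetical, and the default alignment of 32 bytes is an assumption chosen to match __m256. It relies only on the C++17 aligned forms of operator new and operator delete taking std::align_val_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <new>
#include <vector>

// Hypothetical allocator that obtains storage through the C++17 aligned operator new.
template <typename T, std::size_t Alignment = 32>
class AlignedAllocator {
    static_assert((Alignment & (Alignment - 1)) == 0, "Alignment must be a power of two");

public:
    using value_type = T;

    // An explicit rebind is required because std::allocator_traits cannot
    // rebind a template that has a non-type parameter on its own.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    AlignedAllocator() noexcept = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > std::numeric_limits<std::size_t>::max() / sizeof(T))
            throw std::bad_array_new_length();
        return static_cast<T*>(
            ::operator new(n * sizeof(T), static_cast<std::align_val_t>(Alignment)));
    }

    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, static_cast<std::align_val_t>(Alignment));
    }
};

// The allocator is stateless, so any two instances compare equal.
template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return false; }

// Hypothetical alias: a vector whose buffer starts on a 32-byte boundary.
template <typename T>
using AlignedVector = std::vector<T, AlignedAllocator<T, 32>>;

// Usage: the buffer stays 32-byte aligned even after a reallocation.
inline bool aligned_vector_demo() {
    AlignedVector<float> v(100, 1.0f);
    v.push_back(2.0f);  // likely triggers a reallocation through allocate()
    assert(v.size() == 101);
    return reinterpret_cast<std::uintptr_t>(v.data()) % 32 == 0;
}
```

With such a vector, v.data() could be passed to aligned SIMD loads, provided the element index is a multiple of 8 floats. Note that AlignedVector<float> and std::vector<float> are different types, so they cannot be swapped or assigned to each other directly.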
All containers in the standard C++ library, including vectors, have an optional template parameter that specifies the container's allocator, and it is not really a lot of work to implement your own:
You will have to write a little bit of code that implements your allocator, but it wouldn't be much more code than you have already written. If you don't need pre-C++17 support, you only need to implement the allocate() and deallocate() methods, and that's it.
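To give a sense of how little code is involved, here is a minimal sketch built on the std::aligned_alloc call from the question. It is not this answer's original code. The name MinimalAlignedAllocator, the helper minimal_allocator_demo and the fixed 32-byte alignment are assumptions, and std::aligned_alloc is assumed to be available (it is not on MSVC). Besides allocate() and deallocate(), the sketch also supplies value_type, a converting constructor and the comparison operators, which containers expect in practice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <new>
#include <vector>

// Hypothetical minimal allocator with a fixed 32-byte alignment.
template <typename T>
struct MinimalAlignedAllocator {
    using value_type = T;
    static constexpr std::size_t alignment = 32;  // assumption: matches __m256

    MinimalAlignedAllocator() noexcept = default;
    template <typename U>
    MinimalAlignedAllocator(const MinimalAlignedAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > (std::numeric_limits<std::size_t>::max() - alignment) / sizeof(T))
            throw std::bad_array_new_length();
        // std::aligned_alloc expects a size that is a multiple of the alignment.
        std::size_t bytes = (n * sizeof(T) + alignment - 1) / alignment * alignment;
        if (bytes == 0) bytes = alignment;
        void* p = std::aligned_alloc(alignment, bytes);
        if (p == nullptr) throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// Stateless allocator: all instances are interchangeable.
template <typename T, typename U>
bool operator==(const MinimalAlignedAllocator<T>&, const MinimalAlignedAllocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const MinimalAlignedAllocator<T>&, const MinimalAlignedAllocator<U>&) noexcept { return false; }

// Usage: a std::vector<float> whose buffer is 32-byte aligned.
inline bool minimal_allocator_demo() {
    std::vector<float, MinimalAlignedAllocator<float>> v(24, 3.0f);
    assert(v.size() == 24);
    return reinterpret_cast<std::uintptr_t>(v.data()) % 32 == 0;
}
```

Because the template has a single type parameter, std::allocator_traits can rebind it without any extra member, which keeps the class this short.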