How is a vector's data aligned?

Posted 2024-12-20 10:23:17

If I want to process data in a std::vector with SSE, I need 16 byte alignment. How can I achieve that? Do I need to write my own allocator? Or does the default allocator already align to 16 byte boundaries?

Comments (8)

孤君无依 2024-12-27 10:23:17

The C++ standard requires allocation functions (malloc() and operator new()) to allocate memory suitably aligned for any standard type. As the classic forms of these functions don't receive the alignment requirement as an argument, in practice the alignment of all allocations is the same: that of the standard type with the largest alignment requirement, which is often long double and/or long long (see the Boost max_align union).

Vector instructions, such as SSE and AVX, have stronger alignment requirements (16-byte aligned for 128-bit access and 32-byte aligned for 256-bit access) than that provided by the standard C++ allocation functions. posix_memalign() or memalign() can be used to satisfy such allocations with stronger alignment requirements.
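
As a minimal sketch of the posix_memalign() route, assuming a POSIX system: the helper names alloc_aligned_floats and is_aligned are made up for this illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>   // posix_memalign and free on POSIX systems

// Hypothetical helper: returns `count` floats aligned to `alignment` bytes,
// or nullptr on failure. The caller releases the block with free().
float* alloc_aligned_floats(std::size_t count, std::size_t alignment) {
    // posix_memalign wants a power of two that is a multiple of sizeof(void*).
    assert(alignment % sizeof(void*) == 0 && (alignment & (alignment - 1)) == 0);
    void* p = nullptr;
    if (posix_memalign(&p, alignment, count * sizeof(float)) != 0) {
        return nullptr;
    }
    return static_cast<float*>(p);
}

// Hypothetical helper: true if `p` sits on an `alignment`-byte boundary.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

A buffer obtained this way is a raw block, not a std::vector, so it has to be released with free() and sized by hand.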


In C++17 the allocation functions accept an additional argument of type std::align_val_t.

You can use it like this:

#include <immintrin.h>
#include <memory>
#include <new>

int main() {
    std::unique_ptr<__m256i[]> arr{new(std::align_val_t{alignof(__m256i)}) __m256i[32]};
}

Moreover, in C++17 the standard allocators have been updated to respect the type's alignment, so you can simply do:

#include <immintrin.h>
#include <vector>

int main() {
    std::vector<__m256i> arr2(32);
}

Or, with no heap allocation involved (supported since C++11):

#include <immintrin.h>
#include <array>

int main() {
    std::array<__m256i, 32> arr3;
}
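
To check the C++17 behaviour without depending on x86 intrinsics, here is a rough sketch. It assumes the code is compiled as C++17 or later; Lane is a made-up stand-in for a 256-bit SIMD type such as __m256i, and lanes_are_aligned is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 256-bit SIMD type such as __m256i.
struct alignas(32) Lane {
    float v[8];
};

// Returns true if the vector's buffer honours alignof(Lane).
// Assumes C++17 or later, where std::allocator uses aligned operator new
// for over-aligned types. `count` must be greater than zero.
bool lanes_are_aligned(std::size_t count) {
    std::vector<Lane> lanes(count);
    assert(!lanes.empty());
    auto addr = reinterpret_cast<std::uintptr_t>(lanes.data());
    return addr % alignof(Lane) == 0;
}
```

Before C++17 the same vector may compile but is not guaranteed to receive 32-byte-aligned storage.
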
原来分手还会想你 2024-12-27 10:23:17

You should use a custom allocator with std:: containers, such as vector. Can't remember who wrote the following one, but I used it for some time and it seems to work (you might have to change _aligned_malloc to _mm_malloc, depending on compiler/platform):

#ifndef ALIGNMENT_ALLOCATOR_H
#define ALIGNMENT_ALLOCATOR_H

#include <cstddef>   // std::size_t, std::ptrdiff_t
#include <stdlib.h>
#include <malloc.h>  // _aligned_malloc / _aligned_free on MSVC

template <typename T, std::size_t N = 16>
class AlignmentAllocator {
public:
  typedef T value_type;
  typedef std::size_t size_type;
  typedef std::ptrdiff_t difference_type;

  typedef T * pointer;
  typedef const T * const_pointer;

  typedef T & reference;
  typedef const T & const_reference;

  public:
  inline AlignmentAllocator () throw () { }

  template <typename T2>
  inline AlignmentAllocator (const AlignmentAllocator<T2, N> &) throw () { }

  inline ~AlignmentAllocator () throw () { }

  inline pointer address (reference r) {
    return &r;
  }

  inline const_pointer address (const_reference r) const {
    return &r;
  }

  inline pointer allocate (size_type n) {
     return (pointer)_aligned_malloc(n*sizeof(value_type), N);
  }

  inline void deallocate (pointer p, size_type) {
    _aligned_free (p);
  }

  inline void construct (pointer p, const value_type & wert) {
     new (p) value_type (wert);
  }

  inline void destroy (pointer p) {
    p->~value_type ();
  }

  inline size_type max_size () const throw () {
    return size_type (-1) / sizeof (value_type);
  }

  template <typename T2>
  struct rebind {
    typedef AlignmentAllocator<T2, N> other;
  };

  bool operator!=(const AlignmentAllocator<T,N>& other) const  {
    return !(*this == other);
  }

  // Returns true if and only if storage allocated from *this
  // can be deallocated from other, and vice versa.
  // Always returns true for stateless allocators.
  bool operator==(const AlignmentAllocator<T,N>& other) const {
    return true;
  }
};

#endif

Use it like this (change the 16 to another alignment, if needed):

std::vector<T, AlignmentAllocator<T, 16> > bla;

This, however, only makes sure the memory block std::vector uses is 16-byte aligned. If sizeof(T) is not a multiple of 16, some of your elements will not be aligned. Depending on your data type, this might be a non-issue. If T is int (4 bytes), only do aligned loads from elements whose index is a multiple of 4. If it's double (8 bytes), only from multiples of 2, etc.
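
The index rule can be sketched as follows. The names sse_stride and element_is_aligned are hypothetical, and the sketch assumes sizeof(T) divides 16 evenly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of elements of T per 16-byte SSE register. Aligned loads may only
// start at indices that are multiples of this stride.
template <typename T>
constexpr std::size_t sse_stride() {
    static_assert(16 % sizeof(T) == 0, "sizeof(T) must divide 16");
    return 16 / sizeof(T);
}

// True if element `index` of the block starting at `base` is 16-byte aligned.
template <typename T>
bool element_is_aligned(const T* base, std::size_t index) {
    // Precondition from the text: the block itself starts on a 16-byte boundary.
    assert(reinterpret_cast<std::uintptr_t>(base) % 16 == 0);
    return index % sse_stride<T>() == 0;
}
```

For a 4-byte float the stride is 4 and for an 8-byte double it is 2, matching the multiples given above.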

The real issue is if you use classes as T, in which case you will have to specify your alignment requirements in the class itself (again, depending on compiler, this might be different; the example is for GCC):

class __attribute__ ((aligned (16))) Foo {
    __attribute__ ((aligned (16))) double u[2];
};
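
For compilers with C++11 support, the same requirement can be written with the standard alignas specifier instead of a compiler-specific attribute. This is only a sketch: it redeclares Foo on its own, and all_elements_aligned is a hypothetical helper added for checking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable C++11 spelling of the GCC attribute shown above.
struct alignas(16) Foo {
    double u[2];
};

static_assert(alignof(Foo) == 16, "Foo must be 16-byte aligned");
static_assert(sizeof(Foo) % 16 == 0, "consecutive array elements stay aligned");

// True if every element of a Foo array sits on a 16-byte boundary.
bool all_elements_aligned(const Foo* first, std::size_t count) {
    assert(first != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        if (reinterpret_cast<std::uintptr_t>(first + i) % alignof(Foo) != 0) {
            return false;
        }
    }
    return true;
}
```

Because sizeof(Foo) is padded to a multiple of its alignment, every element of an array (or of an aligned vector buffer) starts on a 16-byte boundary.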

We're almost done! If you use Visual C++ (at least, version 2010), you won't be able to use a std::vector with classes whose alignment you specified, because std::vector::resize takes its argument by value.

When compiling, if you get the following error:

C:\Program Files\Microsoft Visual Studio 10.0\VC\include\vector(870):
error C2719: '_Val': formal parameter with __declspec(align('16')) won't be aligned

You will have to hack your std::vector header file:

  1. Locate the vector header file [C:\Program Files\Microsoft Visual Studio 10.0\VC\include\vector]
  2. Locate the void resize( _Ty _Val ) method [line 870 on VC2010]
  3. Change it to void resize( const _Ty& _Val ).
傲世九天 2024-12-27 10:23:17

Instead of writing your own allocator, as suggested before, you can use boost::alignment::aligned_allocator for std::vector like this:

#include <vector>
#include <boost/align/aligned_allocator.hpp>

template <typename T>
using aligned_vector = std::vector<T, boost::alignment::aligned_allocator<T, 16>>;
一梦等七年七年为一梦 2024-12-27 10:23:17

Write your own allocator. allocate and deallocate are the important ones. Here is one example:

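// Needs <stdlib.h> (malloc), <stdint.h> (uintptr_t, SIZE_MAX) and <new> (std::bad_alloc).
pointer allocate( size_type size, const void * pBuff = 0 )
{
    char * p;

    size_t difference;

    // size counts elements of T, so check the byte count for overflow.
    if( size > ( SIZE_MAX - 16 ) / sizeof(T) )
        throw std::bad_alloc();

    // Over-allocate by 16 bytes so that an aligned address always fits.
    p = (char*)malloc( size * sizeof(T) + 16 );

    if( !p )
        throw std::bad_alloc();

    // Distance (1 to 16) to the next 16-byte boundary. Go through uintptr_t:
    // casting the pointer to int truncates it on 64-bit platforms.
    difference = 16 - ( (uintptr_t)p & 15 );

    p += difference;
    // Store the offset in the byte before the block so deallocate() can undo it.
    p[ -1 ] = (char)difference;

    return (T*)p;
}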

void deallocate( pointer p, size_type num )
{
    char * pBuffer = (char*)p;

    free( (void*)(((char*)p) - pBuffer[ -1 ] ) );
}
潜移默化 2024-12-27 10:23:17

Short Answer:

If sizeof(T)*vector.size() > 16 then Yes.
Assuming your vector uses the normal allocators.

Caveat: this holds only as long as alignof(std::max_align_t) >= 16, since that is the strictest alignment the default allocation functions have to provide.

Long Answer:

Updated 25/Aug/2017 new standard n4659

If it is aligned for anything that is greater than 16 it is also aligned correctly for 16.

6.11 Alignment (Paragraph 4/5)

Alignments are represented as values of the type std::size_t. Valid alignments include only those values returned by an alignof expression for the fundamental types plus an additional implementation-defined set of values, which may be empty. Every alignment value shall be a non-negative integral power of two.

Alignments have an order from weaker to stronger or stricter alignments. Stricter alignments have larger alignment values. An address that satisfies an alignment requirement also satisfies any weaker valid alignment requirement.

new and new[] return values that are aligned so that objects are correctly aligned for their size:

8.3.4 New (paragraph 17)

[ Note: when the allocation function returns a value other than null, it must be a pointer to a block of storage in which space for the object has been reserved. The block of storage is assumed to be appropriately aligned and of the requested size. The address of the created object will not necessarily be the same as that of the block if the object is an array. — end note ]

Note most systems have a maximum alignment. Dynamically allocated memory does not need to be aligned to a value greater than this.

6.11 Alignment (paragraph 2)

A fundamental alignment is represented by an alignment less than or equal to the greatest alignment supported by the implementation in all contexts, which is equal to alignof(std::max_align_t) (21.2). The alignment required for a type might be different when it is used as the type of a complete object and when it is used as the type of a subobject.

Thus as long as your vector memory allocated is greater than 16 bytes it will be correctly aligned on 16 byte boundaries.
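
Rather than relying on the rule, the claim can be checked on a given platform. This is a rough sketch with hypothetical helper names. It assumes the default operator new aligns to at least alignof(std::max_align_t), which holds on common implementations but is formally governed by __STDCPP_DEFAULT_NEW_ALIGNMENT__.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True when the platform's fundamental alignment already covers SSE (16 bytes).
constexpr bool fundamental_alignment_covers_sse() {
    return alignof(std::max_align_t) >= 16;
}

// Hypothetical helper: inspects one vector's buffer instead of trusting the rule.
bool buffer_is_16_byte_aligned(const std::vector<float>& v) {
    return reinterpret_cast<std::uintptr_t>(v.data()) % 16 == 0;
}
```

If fundamental_alignment_covers_sse() is false on the target, the default allocator gives no 16-byte guarantee and one of the aligned allocators from the other answers is needed.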

琴流音 2024-12-27 10:23:17

Contemporary answer to a dated (but important) question.

Writing your own Allocator class [template] immediately comes to mind, as said by others. Since C++11 and until C++17, an implementation would be mostly limited (by standard) to using alignas and placement new. C++17 lifts C11's aligned_alloc which is convenient. Furthermore, C++17's std::pmr namespace (header <memory_resource>) introduces the polymorphic_allocator class template and the memory_resource abstract interface for polymorphic allocations, heavily inspired by Boost. Aside from allowing for truly generic, dynamic code, these have been shown to offer speed improvements in some cases; in which case, your SIMD code will perform even better.
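
As an illustration of how small such an allocator can be in C++17, here is a minimal sketch built on the aligned overloads of operator new and operator delete. AlignedAllocator and storage_is_aligned are hypothetical names for this example, not a standard or Boost API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical minimal allocator: the C++17 aligned operator new/delete do the work.
template <typename T, std::size_t Alignment = 16>
struct AlignedAllocator {
    static_assert((Alignment & (Alignment - 1)) == 0, "Alignment must be a power of two");

    using value_type = T;

    AlignedAllocator() noexcept = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    // Spelled out because Alignment is a non-type template parameter.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    T* allocate(std::size_t n) {
        // Guard against overflow of the byte count.
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        return static_cast<T*>(::operator new(n * sizeof(T), std::align_val_t{Alignment}));
    }

    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t{Alignment});
    }
};

// Stateless allocator: all instances are interchangeable.
template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return false; }

// Hypothetical check: true if a non-empty vector's buffer is aligned to A bytes.
template <typename T, std::size_t A>
bool storage_is_aligned(const std::vector<T, AlignedAllocator<T, A>>& v) {
    assert(!v.empty());
    return reinterpret_cast<std::uintptr_t>(v.data()) % A == 0;
}
```

It would be used as std::vector<float, AlignedAllocator<float, 32>>. std::allocator_traits supplies the members (construct, destroy, max_size and so on) that older allocators had to write by hand.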

单身狗的梦 2024-12-27 10:23:17

The Standard mandates that new and new[] return data aligned for any data type, which should include SSE. Whether or not MSVC actually follows that rule is another question.
