Modern approach to making std::vector allocate aligned memory

Posted 2025-02-12 14:52:01

The following question is related; however, its answers are old, and a comment from user Marc Glisse suggests that since C++17 there are new approaches to this problem that might not be adequately discussed there.

I'm trying to get aligned memory working properly for SIMD, while still having access to all of the data.

On Intel, if I create a vector whose element type is __m256 (each element holds eight floats) and reduce the element count by a factor of 8, I get aligned memory.

E.g. std::vector<__m256> mvec_a((N*M)/8);

In a slightly hacky way, I can cast a pointer to the vector's elements to float*, which lets me access the individual float values.

Instead, I would prefer a std::vector<float> that is correctly aligned, so that its data can be loaded into __m256 and other SIMD types with aligned loads without segfaulting.

I've been looking into aligned_alloc.

This can give me a C-style array that is correctly aligned:

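constexpr std::size_t align_sz = 32;
// std::aligned_alloc (C++17, <cstdlib>): the size should be a multiple of align_sz,
// and the buffer must later be released with std::free(marr_a).
float* marr_a = static_cast<float*>(std::aligned_alloc(align_sz, N*M*sizeof(float)));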

However, I'm unsure how to do this for std::vector<float>. Giving a std::vector<float> ownership of marr_a doesn't seem to be possible.

I've seen some suggestions that I should write a custom allocator, but this seems like a lot of work, and perhaps with modern C++ there is a better way?

Answers (2)

云巢 2025-02-19 14:52:01

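STL containers take an allocator template argument, which can be used to align their internal buffers. The specified allocator type has to implement at least value_type, allocate, and deallocate, plus the equality operators == and !=; std::allocator_traits supplies defaults for most of the rest.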

In contrast to the older answers, this implementation of such an allocator avoids platform-dependent aligned malloc calls. Instead, it uses the C++17 aligned operator new (the overloads taking std::align_val_t).

Here is the full example on godbolt.

#include <cstddef>
#include <limits>
#include <new>

/**
 * Returns aligned pointers when allocations are requested. Default alignment
 * is 64B = 512b, sufficient for AVX-512 and most cache line sizes.
 *
 * @tparam ALIGNMENT_IN_BYTES Must be a positive power of 2.
 */
template<typename    ElementType,
         std::size_t ALIGNMENT_IN_BYTES = 64>
class AlignedAllocator
{
private:
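    static_assert(
        ALIGNMENT_IN_BYTES >= alignof( ElementType ),
        "Beware that types like int have minimum alignment requirements "
        "or access will result in crashes."
    );
    static_assert(
        ( ALIGNMENT_IN_BYTES & ( ALIGNMENT_IN_BYTES - 1 ) ) == 0,
        "ALIGNMENT_IN_BYTES must be a power of 2."
    );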

public:
    using value_type = ElementType;
    static std::align_val_t constexpr ALIGNMENT{ ALIGNMENT_IN_BYTES };

    /**
     * This is only necessary because AlignedAllocator has a second template
     * argument for the alignment that will make the default
     * std::allocator_traits implementation fail during compilation.
     * @see https://stackoverflow.com/a/48062758/2191065
     */
    template<class OtherElementType>
    struct rebind
    {
        using other = AlignedAllocator<OtherElementType, ALIGNMENT_IN_BYTES>;
    };

public:
    constexpr AlignedAllocator() noexcept = default;

    constexpr AlignedAllocator( const AlignedAllocator& ) noexcept = default;

    template<typename U>
    constexpr AlignedAllocator( AlignedAllocator<U, ALIGNMENT_IN_BYTES> const& ) noexcept
    {}

    [[nodiscard]] ElementType*
    allocate( std::size_t nElementsToAllocate )
    {
        if ( nElementsToAllocate
             > std::numeric_limits<std::size_t>::max() / sizeof( ElementType ) ) {
            throw std::bad_array_new_length();
        }

        auto const nBytesToAllocate = nElementsToAllocate * sizeof( ElementType );
        return static_cast<ElementType*>(
            ::operator new[]( nBytesToAllocate, ALIGNMENT ) );
    }

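    void
    deallocate(                  ElementType* allocatedPointer,
                [[maybe_unused]] std::size_t  nElementsAllocated )
    {
        /* According to the C++20 draft n4868 § 17.6.3.3, the delete operator
         * must be called with the same alignment argument as the new expression.
         * The size argument can be omitted but if present must also be equal to
         * the one used in new. */
        ::operator delete[]( allocatedPointer, ALIGNMENT );
    }

    /* The Allocator requirements also call for == and !=. This allocator is
     * stateless, so any two instances can free each other's memory. */
    friend constexpr bool
    operator==( AlignedAllocator const&, AlignedAllocator const& ) noexcept
    {
        return true;
    }

    friend constexpr bool
    operator!=( AlignedAllocator const&, AlignedAllocator const& ) noexcept
    {
        return false;
    }
};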

This allocator can then be used like this:

#include <cstdint>
#include <iostream>
#include <stdexcept>
#include <vector>

template<typename T, std::size_t ALIGNMENT_IN_BYTES = 64>
using AlignedVector = std::vector<T, AlignedAllocator<T, ALIGNMENT_IN_BYTES> >;

int
main()
{
    AlignedVector<int, 1024> buffer( 3333 );
    if ( reinterpret_cast<std::uintptr_t>( buffer.data() ) % 1024 != 0 ) {
        std::cerr << "Vector buffer is not aligned!\n";
        throw std::logic_error( "Faulty implementation!" );
    }

    std::cout << "Successfully allocated an aligned std::vector.\n";
    return 0;
}

情定在深秋 2025-02-19 14:52:01

All allocator-aware containers in the standard C++ library, including std::vector, have an optional template parameter that specifies the container's allocator, and implementing your own allocator is not really a lot of work:

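template<typename T>
struct my_awesome_allocator {
    using value_type = T;                    // must match the vector's element type
    T* allocate( std::size_t n );            // your allocation strategy goes here
    void deallocate( T* p, std::size_t n );  // and the matching release here
};

std::vector<float, my_awesome_allocator<float>> awesomely_allocated_vector;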

You will have to write a little bit of code that implements your allocator, but it wouldn't be much more code than you have already written. If you don't need pre-C++17 support, you only need to implement value_type and the allocate() and deallocate() methods (plus == and !=, which the Allocator requirements also ask for), and that's it.
