STL向量具有未初始化的存储？

发布于 2024-07-06 07:47:20 字数 661 浏览 5 评论 0原文

我正在编写一个内部循环，需要将 struct 放置在连续存储中。我不知道提前会有多少个struct。我的问题是STL的vector将其值初始化为0，所以无论我做什么，我都会产生初始化的成本加上设置struct成员的成本他们的价值观。

有什么方法可以阻止初始化，或者是否有一个类似 STL 的容器，具有可调整大小的连续存储和未初始化的元素？

（我确信这部分代码需要优化，并且我确信初始化是一个巨大的成本。）

此外，请参阅下面我的评论以了解有关初始化何时发生的说明。

一些代码：

void GetsCalledALot(int* data1, int* data2, int count) {
    int mvSize = memberVector.size()
    memberVector.resize(mvSize + count); // causes 0-initialization

    for (int i = 0; i < count; ++i) {
        memberVector[mvSize + i].d1 = data1[i];
        memberVector[mvSize + i].d2 = data2[i];
    }
}

原文

I'm writing an inner loop that needs to place structs in contiguous storage. I don't know how many of these structs there will be ahead of time. My problem is that STL's vector initializes its values to 0, so no matter what I do, I incur the cost of the initialization plus the cost of setting the struct's members to their values.

Is there any way to prevent the initialization, or is there an STL-like container out there with resizeable contiguous storage and uninitialized elements?

(I'm certain that this part of the code needs to be optimized, and I'm certain that the initialization is a significant cost.)

Also, see my comments below for a clarification about when the initialization occurs.

SOME CODE:

void GetsCalledALot(int* data1, int* data2, int count) {
    int mvSize = memberVector.size()
    memberVector.resize(mvSize + count); // causes 0-initialization

    for (int i = 0; i < count; ++i) {
        memberVector[mvSize + i].d1 = data1[i];
        memberVector[mvSize + i].d2 = data2[i];
    }
}

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

伪心 2024-07-13 07:47:20

std::vector 必须以某种方式初始化数组中的值，这意味着必须调用某些构造函数（或复制构造函数）。如果您要访问数组的未初始化部分，就像它已初始化一样，则 vector （或任何容器类）的行为是未定义的。

最好的方法是使用 reserve() 和 push_back()，以便使用复制构造函数，避免默认构造。

使用您的示例代码：

struct YourData {
    int d1;
    int d2;
    YourData(int v1, int v2) : d1(v1), d2(v2) {}
};

std::vector<YourData> memberVector;

void GetsCalledALot(int* data1, int* data2, int count) {
    int mvSize = memberVector.size();

    // Does not initialize the extra elements
    memberVector.reserve(mvSize + count);

    // Note: consider using std::generate_n or std::copy instead of this loop.
    for (int i = 0; i < count; ++i) {
        // Copy construct using a temporary.
        memberVector.push_back(YourData(data1[i], data2[i]));
    }
}

像这样调用 reserve() （或 resize()）的唯一问题是您最终可能会比您更频繁地调用复制构造函数需要。如果您可以对数组的最终大小做出良好的预测，最好在开始时 reserve() 一次空间。如果您不知道最终大小，至少平均副本数量会最少。

在当前版本的 C++ 中，内部循环效率有点低，因为临时值在堆栈上构造，然后复制构造到向量内存，最后临时值被销毁。然而，C++ 的下一个版本有一个称为 R 值引用 (T&&) 的功能，这将有所帮助。

std::vector 提供的接口不允许使用其他选项，即使用一些类似工厂的类来构造默认值以外的值。下面是这个模式在 C++ 中实现的粗略示例：

template <typename T>
class my_vector_replacement {

    // ...

    template <typename F>
    my_vector::push_back_using_factory(F factory) {
        // ... check size of array, and resize if needed.

        // Copy construct using placement new,
        new(arrayData+end) T(factory())
        end += sizeof(T);
    }

    char* arrayData;
    size_t end; // Of initialized data in arrayData
};

// One of many possible implementations
struct MyFactory {
    MyFactory(int* p1, int* p2) : d1(p1), d2(p2) {}
    YourData operator()() const {
        return YourData(*d1,*d2);
    }
    int* d1;
    int* d2;
};

void GetsCalledALot(int* data1, int* data2, int count) {
    // ... Still will need the same call to a reserve() type function.

    // Note: consider using std::generate_n or std::copy instead of this loop.
    for (int i = 0; i < count; ++i) {
        // Copy construct using a factory
        memberVector.push_back_using_factory(MyFactory(data1+i, data2+i));
    }
}

这样做确实意味着您必须创建自己的向量类。在这种情况下，它也使本应简单的示例变得复杂。但有时使用像这样的工厂函数可能会更好，例如，如果插入以其他值为条件，那么您就必须无条件地构造一些昂贵的临时函数，即使实际上并不需要它。

std::vector must initialize the values in the array somehow, which means some constructor (or copy-constructor) must be called. The behavior of vector (or any container class) is undefined if you were to access the uninitialized section of the array as if it were initialized.

The best way is to use reserve() and push_back(), so that the copy-constructor is used, avoiding default-construction.

Using your example code:

struct YourData {
    int d1;
    int d2;
    YourData(int v1, int v2) : d1(v1), d2(v2) {}
};

std::vector<YourData> memberVector;

void GetsCalledALot(int* data1, int* data2, int count) {
    int mvSize = memberVector.size();

    // Does not initialize the extra elements
    memberVector.reserve(mvSize + count);

    // Note: consider using std::generate_n or std::copy instead of this loop.
    for (int i = 0; i < count; ++i) {
        // Copy construct using a temporary.
        memberVector.push_back(YourData(data1[i], data2[i]));
    }
}

The only problem with calling reserve() (or resize()) like this is that you may end up invoking the copy-constructor more often than you need to. If you can make a good prediction as to the final size of the array, it's better to reserve() the space once at the beginning. If you don't know the final size though, at least the number of copies will be minimal on average.

In the current version of C++, the inner loop is a bit inefficient as a temporary value is constructed on the stack, copy-constructed to the vectors memory, and finally the temporary is destroyed. However the next version of C++ has a feature called R-Value references (T&&) which will help.

The interface supplied by std::vector does not allow for another option, which is to use some factory-like class to construct values other than the default. Here is a rough example of what this pattern would look like implemented in C++:

template <typename T>
class my_vector_replacement {

    // ...

    template <typename F>
    my_vector::push_back_using_factory(F factory) {
        // ... check size of array, and resize if needed.

        // Copy construct using placement new,
        new(arrayData+end) T(factory())
        end += sizeof(T);
    }

    char* arrayData;
    size_t end; // Of initialized data in arrayData
};

// One of many possible implementations
struct MyFactory {
    MyFactory(int* p1, int* p2) : d1(p1), d2(p2) {}
    YourData operator()() const {
        return YourData(*d1,*d2);
    }
    int* d1;
    int* d2;
};

void GetsCalledALot(int* data1, int* data2, int count) {
    // ... Still will need the same call to a reserve() type function.

    // Note: consider using std::generate_n or std::copy instead of this loop.
    for (int i = 0; i < count; ++i) {
        // Copy construct using a factory
        memberVector.push_back_using_factory(MyFactory(data1+i, data2+i));
    }
}

Doing this does mean you have to create your own vector class. In this case it also complicates what should have been a simple example. But there may be times where using a factory function like this is better, for instance if the insert is conditional on some other value, and you would have to otherwise unconditionally construct some expensive temporary even if it wasn't actually needed.

STL向量具有未初始化的存储？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（17）

关于作者

相关话题

热门标签

推荐作者

花开雨落又逢春i

全部不再

戏蝶舞

qq_YlUtDo

旧人

qq_vTdajf

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。