STL vectors with uninitialized storage?

Posted on 2024-07-06 07:47:20

I'm writing an inner loop that needs to place structs in contiguous storage. I don't know how many of these structs there will be ahead of time. My problem is that STL's vector initializes its values to 0, so no matter what I do, I incur the cost of the initialization plus the cost of setting the struct's members to their values.

Is there any way to prevent the initialization, or is there an STL-like container out there with resizeable contiguous storage and uninitialized elements?

(I'm certain that this part of the code needs to be optimized, and I'm certain that the initialization is a significant cost.)

Also, see my comments below for a clarification about when the initialization occurs.

Some code:

void GetsCalledALot(int* data1, int* data2, int count) {
    int mvSize = memberVector.size();
    memberVector.resize(mvSize + count); // causes 0-initialization

    for (int i = 0; i < count; ++i) {
        memberVector[mvSize + i].d1 = data1[i];
        memberVector[mvSize + i].d2 = data2[i];
    }
}


Comments (17)

伪心 2024-07-13 07:47:20

std::vector must initialize the values in the array somehow, which means some constructor (or copy-constructor) must be called. The behavior of vector (or any container class) is undefined if you were to access the uninitialized section of the array as if it were initialized.

The best way is to use reserve() and push_back(), so that the copy-constructor is used, avoiding default-construction.

Using your example code:

struct YourData {
    int d1;
    int d2;
    YourData(int v1, int v2) : d1(v1), d2(v2) {}
};

std::vector<YourData> memberVector;

void GetsCalledALot(int* data1, int* data2, int count) {
    int mvSize = memberVector.size();

    // Does not initialize the extra elements
    memberVector.reserve(mvSize + count);

    // Note: consider using std::generate_n or std::copy instead of this loop.
    for (int i = 0; i < count; ++i) {
        // Copy construct using a temporary.
        memberVector.push_back(YourData(data1[i], data2[i]));
    }
}

The only problem with calling reserve() (or resize()) like this is that you may end up invoking the copy-constructor more often than you need to. If you can make a good prediction as to the final size of the array, it's better to reserve() the space once at the beginning. If you don't know the final size though, at least the number of copies will be minimal on average.
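One way to limit those extra copies when the function is called repeatedly is to grow the capacity geometrically instead of reserving the exact size each time, since an implementation may allocate exactly what reserve() asks for. The following is a minimal sketch under that assumption; the helper names reserve_for_append and append_pairs and the doubling factor are illustrative choices, not part of the original answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct YourData {
    int d1;
    int d2;
    YourData(int v1, int v2) : d1(v1), d2(v2) {}
};

// Hypothetical helper: make room for `extra` more elements, but never grow
// the capacity by less than a factor of two, so that repeated small appends
// do not trigger a reallocation on every call.
template <typename T>
void reserve_for_append(std::vector<T>& v, std::size_t extra) {
    const std::size_t needed = v.size() + extra;
    if (needed > v.capacity()) {
        v.reserve(std::max(needed, 2 * v.capacity()));
    }
}

// Appends `count` elements built from the two input arrays.
void append_pairs(std::vector<YourData>& out, const int* data1,
                  const int* data2, int count) {
    assert(count >= 0);
    reserve_for_append(out, static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        out.push_back(YourData(data1[i], data2[i]));
    }
}
```

Whether this matters in practice depends on how the library implements reserve(), so it is worth measuring before adopting it.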

In the current version of C++ (C++03), the inner loop is a bit inefficient, as a temporary value is constructed on the stack, copy-constructed into the vector's memory, and finally the temporary is destroyed. However, the next version of C++ (C++11) has a feature called rvalue references (T&&), which will help.

The interface supplied by std::vector does not allow for another option, which is to use some factory-like class to construct values other than the default. Here is a rough example of what this pattern would look like implemented in C++:

template <typename T>
class my_vector_replacement {

    // ...

public:
    template <typename F>
    void push_back_using_factory(F factory) {
        // ... check size of array, and resize if needed.

        // Construct the new element in place using placement new.
        new (arrayData + end) T(factory());
        end += sizeof(T);
    }

private:
    char* arrayData;
    size_t end; // Byte offset of the end of initialized data in arrayData
};

// One of many possible implementations
struct MyFactory {
    MyFactory(int* p1, int* p2) : d1(p1), d2(p2) {}
    YourData operator()() const {
        return YourData(*d1,*d2);
    }
    int* d1;
    int* d2;
};

void GetsCalledALot(int* data1, int* data2, int count) {
    // ... Still will need the same call to a reserve() type function.

    // Note: consider using std::generate_n or std::copy instead of this loop.
    for (int i = 0; i < count; ++i) {
        // Copy construct using a factory
        memberVector.push_back_using_factory(MyFactory(data1+i, data2+i));
    }
}

Doing this does mean you have to create your own vector class. In this case it also complicates what should have been a simple example. But there may be times where using a factory function like this is better, for instance if the insert is conditional on some other value, and you would have to otherwise unconditionally construct some expensive temporary even if it wasn't actually needed.

场罚期间 2024-07-13 07:47:20

In C++11 (and boost) you can use the array version of unique_ptr to allocate an uninitialized array. This isn't quite an stl container, but is still memory managed and C++-ish which will be good enough for many applications.

auto my_uninit_array = std::unique_ptr<mystruct[]>(new mystruct[count]);
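
Whether the array really stays uninitialized depends on how it is created. As a small sketch of the distinction, assuming a trivially constructible element type (the names Sample, make_uninitialized, make_zeroed and fill_buffer are made up for illustration): new T[n] default-initializes the elements, whereas std::make_unique<T[]>(n) (C++14) value-initializes them, which zeroes a struct like this one.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

struct Sample {
    int d1;
    int d2;
};

// new Sample[n] default-initializes: for this trivial struct the members
// are left indeterminate, so no initialization stores are required.
std::unique_ptr<Sample[]> make_uninitialized(std::size_t n) {
    return std::unique_ptr<Sample[]>(new Sample[n]);
}

// std::make_unique<Sample[]>(n) value-initializes: every member is zeroed.
std::unique_ptr<Sample[]> make_zeroed(std::size_t n) {
    return std::make_unique<Sample[]>(n);
}

// Writes n elements; the destination must be assigned before it is read.
void fill_buffer(Sample* out, const int* data1, const int* data2,
                 std::size_t n) {
    assert(out != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        out[i].d1 = data1[i];
        out[i].d2 = data2[i];
    }
}
```

In C++20, std::make_unique_for_overwrite<T[]>(n) expresses the default-initializing variant without a naked new.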
爱的十字路口 2024-07-13 07:47:20

C++0x (since standardized as C++11) adds a new member function template emplace_back to vector (which relies on variadic templates and perfect forwarding) that gets rid of any temporaries entirely:

memberVector.emplace_back(data1[i], data2[i]);
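
Combined with reserve(), the function from the question might be sketched as follows. This assumes YourData has a two-argument constructor, as in the earlier answer; before C++20, emplace_back cannot initialize a plain aggregate from its member values, so the constructor is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct YourData {
    int d1;
    int d2;
    YourData(int v1, int v2) : d1(v1), d2(v2) {}
};

std::vector<YourData> memberVector;

void GetsCalledALot(const int* data1, const int* data2, int count) {
    assert(count >= 0);
    // Reserve up front so the loop itself never reallocates.
    memberVector.reserve(memberVector.size() + static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        // Constructs the element directly in the vector's storage.
        memberVector.emplace_back(data1[i], data2[i]);
    }
}
```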
嗫嚅 2024-07-13 07:47:20

To clarify the reserve() responses: you need to use reserve() in conjunction with push_back(). This way, the copy constructor is called for each element rather than the default constructor. You still incur the penalty of setting up your struct on the stack and then copying it to the vector. On the other hand, it's possible that if you use

vect.push_back(MyStruct(fieldValue1, fieldValue2))

the compiler will construct the new instance directly in the memory that belongs to the vector. It depends on how smart the optimizer is. You need to check the generated code to find out.

再见回来 2024-07-13 07:47:20

You can use boost::noinit_adaptor to default initialize new elements (which is no initialization for built-in types):

std::vector<T, boost::noinit_adaptor<std::allocator<T>>> memberVector;

As long as you don't pass an initializer into resize, it default initializes the new elements.
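
For readers without Boost, the same idea can be sketched with a small allocator adaptor. This illustrates the technique and is not Boost's implementation; the names default_init_allocator, Pair, PairVector and append_pairs are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Allocator adaptor that turns value-initialization into
// default-initialization when construct() is called without arguments.
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
    using traits = std::allocator_traits<A>;

public:
    template <typename U>
    struct rebind {
        using other =
            default_init_allocator<U, typename traits::template rebind_alloc<U>>;
    };

    using A::A;

    // No-argument construct: `new U` instead of `new U()`, so built-in
    // members are left uninitialized.
    template <typename U>
    void construct(U* ptr) noexcept(
        std::is_nothrow_default_constructible<U>::value) {
        ::new (static_cast<void*>(ptr)) U;
    }

    // Every other form of construction is forwarded unchanged.
    template <typename U, typename... Args>
    void construct(U* ptr, Args&&... args) {
        traits::construct(static_cast<A&>(*this), ptr,
                          std::forward<Args>(args)...);
    }
};

struct Pair {
    int d1;
    int d2;
};

using PairVector = std::vector<Pair, default_init_allocator<Pair>>;

void append_pairs(PairVector& out, const int* data1, const int* data2,
                  std::size_t count) {
    const std::size_t old_size = out.size();
    out.resize(old_size + count);  // new elements are not zeroed
    for (std::size_t i = 0; i < count; ++i) {
        out[old_size + i].d1 = data1[i];
        out[old_size + i].d2 = data2[i];
    }
}
```

The new elements must be written before they are read, because reading an indeterminate int is undefined behavior. Note also that PairVector is a different type from std::vector<Pair>, so it cannot be passed to code expecting the latter.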

2024-07-13 07:47:20

So here's the problem: resize calls insert, which copy-constructs each newly added element from a default-constructed element. To get this to zero cost, you need to write your own default constructor AND your own copy constructor as empty functions. Doing this to your copy constructor is a very bad idea because it will break std::vector's internal reallocation algorithms.

Summary: You're not going to be able to do this with std::vector.

回心转意 2024-07-13 07:47:20

You can use a wrapper type around your element type, with a default constructor that does nothing. E.g.:

#include <type_traits>
#include <utility>

template <typename T>
struct no_init
{
    T value;

    // User-provided empty constructor: value is deliberately left uninitialized.
    no_init() { static_assert(std::is_standard_layout<no_init<T>>::value && sizeof(T) == sizeof(no_init<T>), "T does not have standard layout"); }

    no_init(const T& v) { value = v; }
    T& operator=(const T& v) { value = v; return value; }

    no_init(const no_init<T>& n) { value = n.value; }
    no_init(no_init<T>&& n) { value = std::move(n.value); }
    no_init& operator=(const no_init<T>& n) { value = n.value; return *this; }
    no_init& operator=(no_init<T>&& n) { value = std::move(n.value); return *this; }

    T* operator&() { return &value; } // So you can use &(vec[0]) etc.
};

To use:

std::vector<no_init<char>> vec;
vec.resize(2ul * 1024ul * 1024ul * 1024ul);
芯好空 2024-07-13 07:47:20

I tested a few of the approaches suggested here.
I allocated a huge set of data (200GB) in one container/pointer:

Compiler/OS:

g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

Settings: (c++-17, -O3 optimizations)

g++ --std=c++17 -O3

I timed the total program runtime with the Linux time command.

1.) std::vector:

#include <vector>

int main(){
  constexpr size_t size = 1024lu*1024lu*1024lu*25lu;//25B elements = 200GB
  std::vector<size_t> vec(size);
}
real    0m36.246s
user    0m4.549s
sys     0m31.604s

That is 36 seconds.

2.) std::vector with boost::noinit_adaptor

#include <vector>
#include <boost/core/noinit_adaptor.hpp>

int main(){
  constexpr size_t size = 1024lu*1024lu*1024lu*25lu;//25B elements = 200GB
  std::vector<size_t,boost::noinit_adaptor<std::allocator<size_t>>> vec(size);
}

real    0m0.002s
user    0m0.001s
sys     0m0.000s

So this solves the problem. Just allocating without initializing costs basically nothing (at least for large arrays).

3.) std::unique_ptr<T[]>:

#include <memory>

int main(){
  constexpr size_t size = 1024lu*1024lu*1024lu*25lu;//25B elements = 200GB
  auto data = std::unique_ptr<size_t[]>(new size_t[size]);
}

real    0m0.002s
user    0m0.002s
sys     0m0.000s

So basically the same performance as 2.), but does not require boost.
I also tested simple new/delete and malloc/free with the same performance as 2.) and 3.).

So the default-construction can have a huge performance penalty if you deal with large data sets.
In practice you want to actually initialize the allocated data afterwards.
However, some of the performance penalty still remains, especially if the later initialization is performed in parallel.
E.g., I initialize a huge vector with a set of (pseudo)random numbers:

(now I use -fopenmp for parallelization on a 24-core AMD Threadripper 3960X)

g++ --std=c++17 -fopenmp -O3

1.) std::vector:

#include <vector>
#include <random>

int main(){
  constexpr size_t size = 1024lu*1024lu*1024lu*25lu;//25B elements = 200GB
  std::vector<size_t> vec(size);
  #pragma omp parallel
  {
    std::minstd_rand0 gen(42);
    #pragma omp for schedule(static)
    for (size_t i = 0; i < size; ++i) vec[i] = gen();
  }
}
real    0m41.958s
user    4m37.495s
sys     0m31.348s

That is 42 s, only 6 s more than the default initialization alone.
The problem is that the initialization of std::vector is sequential.

2.) std::vector with boost::noinit_adaptor:

#include <vector>
#include <random>
#include <boost/core/noinit_adaptor.hpp>

int main(){
  constexpr size_t size = 1024lu*1024lu*1024lu*25lu;//25B elements = 200GB
  std::vector<size_t,boost::noinit_adaptor<std::allocator<size_t>>> vec(size);
  #pragma omp parallel
  {
    std::minstd_rand0 gen(42);
    #pragma omp for schedule(static)
    for (size_t i = 0; i < size; ++i) vec[i] = gen();
  }
}
real    0m10.508s
user    1m37.665s
sys     3m14.951s

So even with the random-initialization, the code is 4 times faster because we can skip the sequential initialization of std::vector.

So if you deal with huge data sets and plan to initialize them afterwards in parallel, you should avoid using the default std::vector.

白色秋天 2024-07-13 07:47:20

Err...

try the method:

std::vector<T>::reserve(x)

It will enable you to reserve enough memory for x items without initializing any (your vector is still empty). Thus, there won't be a reallocation until you go over x.

The second point is that vector won't initialize the values to zero. Are you testing your code in debug?

After verification on g++, the following code:

#include <iostream>
#include <vector>

struct MyStruct
{
   int m_iValue00 ;
   int m_iValue01 ;
} ;

int main()
{
   MyStruct aaa, bbb, ccc ;

   std::vector<MyStruct> aMyStruct ;

   aMyStruct.push_back(aaa) ;
   aMyStruct.push_back(bbb) ;
   aMyStruct.push_back(ccc) ;

   aMyStruct.resize(6) ; // [EDIT] double the size

   for(std::vector<MyStruct>::size_type i = 0, iMax = aMyStruct.size(); i < iMax; ++i)
   {
      // Note: the second column reads aMyStruct[0].m_iValue01 for every row.
      std::cout << "[" << i << "] : " << aMyStruct[i].m_iValue00 << ", " << aMyStruct[0].m_iValue01 << "\n" ;
   }

   return 0 ;
}

gives the following results:

[0] : 134515780, -16121856
[1] : 134554052, -16121856
[2] : 134544501, -16121856
[3] : 0, -16121856
[4] : 0, -16121856
[5] : 0, -16121856

The initialization you saw was probably an artifact.

[EDIT] After the comment on resize, I modified the code to add the resize line. The resize effectively calls the default constructor of the object inside the vector, but if the default constructor does nothing, then nothing is initialized... I still believe it was an artifact (the first time, I managed to have the whole vector zeroed with the following code):

aMyStruct.push_back(MyStruct()) ;
aMyStruct.push_back(MyStruct()) ;
aMyStruct.push_back(MyStruct()) ;

So...
:-/
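
One possible explanation for the zeros, offered as a sketch rather than a diagnosis of the run above: both MyStruct() and resize() value-initialize, and for a struct without a user-provided constructor that means zero-initialization. Only a user-provided constructor that leaves the members alone lets resize() skip the stores. The type and function names below are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// No user-provided constructor: value-initialization zeroes the members.
struct Plain {
    int a;
    int b;
};

// User-provided empty constructor: value-initialization just calls it,
// so the members are left indeterminate.
struct Untouched {
    int a;
    int b;
    Untouched() {}
};

// Returns true if every element added by resize() is zero.
bool resize_zeroes_plain(std::size_t n) {
    std::vector<Plain> v;
    v.resize(n);  // value-initializes each new element
    for (const Plain& p : v) {
        if (p.a != 0 || p.b != 0) return false;
    }
    return true;
}

// Elements must be assigned before they are read; reading them first
// would be undefined behavior.
std::vector<Untouched> make_untouched(std::size_t n) {
    std::vector<Untouched> v;
    v.resize(n);  // calls the empty constructor n times
    return v;
}
```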

[EDIT 2] As Arkadiy already suggested, the solution is to use an inline constructor taking the desired parameters. Something like

struct MyStruct
{
   MyStruct(int p_d1, int p_d2) : d1(p_d1), d2(p_d2) {}
   int d1, d2 ;
} ;

This will probably get inlined in your code.

In any case, you should study your code with a profiler to be sure this piece of code is the bottleneck of your application.

落花浅忆 2024-07-13 07:47:20

From your comments to other posters, it looks like you're left with malloc() and friends. Vector won't let you have unconstructed elements.
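
A minimal sketch of what such a malloc()-based container could look like, assuming the element type is trivially copyable and trivially destructible so that realloc() may move it safely. The class name pod_buffer, its member names, and the helper append_pairs are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>

template <typename T>
class pod_buffer {
    static_assert(std::is_trivially_copyable<T>::value &&
                      std::is_trivially_destructible<T>::value,
                  "realloc() may only move trivially copyable types");
    static_assert(alignof(T) <= alignof(std::max_align_t),
                  "malloc() does not guarantee over-aligned storage");

public:
    pod_buffer() = default;
    pod_buffer(const pod_buffer&) = delete;
    pod_buffer& operator=(const pod_buffer&) = delete;
    ~pod_buffer() { std::free(data_); }

    // Grows the buffer by `count` elements and returns a pointer to the
    // first new slot. The new slots are NOT initialized.
    T* grow_uninitialized(std::size_t count) {
        const std::size_t needed = size_ + count;
        if (needed > capacity_) {
            std::size_t new_cap = capacity_ ? capacity_ * 2 : 16;
            if (new_cap < needed) new_cap = needed;
            void* p = std::realloc(data_, new_cap * sizeof(T));
            if (p == nullptr) throw std::bad_alloc();
            data_ = static_cast<T*>(p);
            capacity_ = new_cap;
        }
        T* first_new = data_ + size_;
        size_ = needed;
        return first_new;
    }

    std::size_t size() const { return size_; }

    T& operator[](std::size_t i) {
        assert(i < size_);
        return data_[i];
    }

private:
    T* data_ = nullptr;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
};

struct Pair {
    int d1;
    int d2;
};

void append_pairs(pod_buffer<Pair>& out, const int* data1, const int* data2,
                  std::size_t count) {
    Pair* dst = out.grow_uninitialized(count);
    for (std::size_t i = 0; i < count; ++i) {
        dst[i].d1 = data1[i];
        dst[i].d2 = data2[i];
    }
}
```

The trade-off is that none of the std::vector interface or algorithms support comes for free, and any pointer returned earlier is invalidated by the next growth.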

为人所爱 2024-07-13 07:47:20

From your code, it looks like you have a vector of structs each of which comprises 2 ints. Could you instead use 2 vectors of ints? Then

copy(data1, data1 + count, back_inserter(v1));
copy(data2, data2 + count, back_inserter(v2));

Now you don't pay for copying a struct each time.
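
A sketch of this two-vector layout, assuming the names v1, v2 and a hypothetical append_columns function. It uses the range overload of insert() instead of back_inserter, which lets each vector grow at most once per call and, for int, typically lets the library copy the whole range in bulk.

```cpp
#include <cassert>
#include <vector>

std::vector<int> v1;
std::vector<int> v2;

// Appends `count` values to each column; no per-element struct is built.
void append_columns(const int* data1, const int* data2, int count) {
    assert(count >= 0);
    v1.insert(v1.end(), data1, data1 + count);
    v2.insert(v2.end(), data2, data2 + count);
}
```

The cost of this design is that the two fields of one logical element no longer sit next to each other in memory, which may or may not suit the code that reads them.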

烏雲後面有陽光 2024-07-13 07:47:20

If you really insist on having the elements uninitialized, and are willing to sacrifice some methods like front(), back() and push_back(), use the Boost vector from numeric. It even allows you not to preserve existing elements when calling resize()...

三生一梦 2024-07-13 07:47:20

I'm not sure about all those answers that say it is impossible or warn about undefined behavior.

Sometimes you need to use a std::vector, but you already know its final size, and you also know that your elements will be constructed later.
Example: when you serialize the vector's contents into a binary file and read them back later.
Unreal Engine has its TArray::SetNumUninitialized, so why not std::vector?

To answer the initial question
"Is there any way to prevent the initialization, or is there an STL-like container out there with resizeable contiguous storage and uninitialized elements?"

Yes and no.
No, because the STL doesn't expose a way to do so.

Yes, because we're coding in C++, and C++ allows you to do a lot of things, if you're ready to be a bad guy (and if you really know what you are doing). You can hijack the vector.

Here is sample code that works only with the Windows STL implementation; for another platform, look at how std::vector is implemented and use its internal members:

// This macro is to be defined before including VectorHijacker.h. Then you will be able to reuse the VectorHijacker.h with different objects.
#define HIJACKED_TYPE SomeStruct

// VectorHijacker.h
#ifndef VECTOR_HIJACKER_STRUCT
#define VECTOR_HIJACKER_STRUCT

struct VectorHijacker
{
    std::size_t _newSize;
};

#endif


template<>
template<>
inline decltype(auto) std::vector<HIJACKED_TYPE, std::allocator<HIJACKED_TYPE>>::emplace_back<const VectorHijacker &>(const VectorHijacker &hijacker)
{
    // We're modifying directly the size of the vector without passing by the extra initialization. This is the part that relies on how the STL was implemented.
    _Mypair._Myval2._Mylast = _Mypair._Myval2._Myfirst + hijacker._newSize;
}

inline void setNumUninitialized_hijack(std::vector<HIJACKED_TYPE> &hijackedVector, const VectorHijacker &hijacker)
{
    hijackedVector.reserve(hijacker._newSize);
    hijackedVector.emplace_back<const VectorHijacker &>(hijacker);
}

But beware, this is hijacking we're talking about. This is really dirty code, to be used only if you really know what you are doing. Besides, it is not portable and relies heavily on how the STL implementation was done.

I won't advise you to use it, because everyone here (me included) is a good person. But I wanted to let you know that it is possible, contrary to all the previous answers that stated it wasn't.

雨后彩虹 2024-07-13 07:47:20

Use the std::vector::reserve() method. It won't resize the vector, but it will allocate the space.

帥小哥 2024-07-13 07:47:20

Do the structs themselves need to be in contiguous memory, or can you get away with having a vector of struct*?

Vectors make a copy of whatever you add to them, so using vectors of pointers rather than objects is one way to improve performance.

倒数 2024-07-13 07:47:20

I don't think STL is your answer. You're going to need to roll your own sort of solution using realloc(). You'll have to store a pointer and either the size, or number of elements, and use that to find where to start adding elements after a realloc().

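#include <cstdlib>

// Element type assumed from the question: two int members.
struct MyStruct { int d1; int d2; };

MyStruct* memberArray = nullptr;
int arrayCount = 0;

void GetsCalledALot(int* data1, int* data2, int count) {
    // Error handling for a failed realloc() is omitted for brevity.
    memberArray = static_cast<MyStruct*>(
        std::realloc(memberArray, sizeof(MyStruct) * (arrayCount + count)));
    for (int i = 0; i < count; ++i) {
        memberArray[arrayCount + i].d1 = data1[i];
        memberArray[arrayCount + i].d2 = data2[i];
    }
    arrayCount += count;
}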
姜生凉生 2024-07-13 07:47:20

I would do something like:

void GetsCalledALot(int* data1, int* data2, int count)
{
  const size_t mvSize = memberVector.size();
  memberVector.reserve(mvSize + count);

  for (int i = 0; i < count; ++i) {
    memberVector.push_back(MyType(data1[i], data2[i]));
  }
}

You need to define a ctor for the type that is stored in the memberVector, but that's a small cost as it will give you the best of both worlds; no unnecessary initialization is done and no reallocation will occur during the loop.
