Fastest way to convert a std::vector to another std::vector
What is the fastest way (if there is any other) to convert a std::vector from one datatype to another (with the idea to save space)? For example:
std::vector<unsigned short> ----> std::vector<bool>
We obviously assume that the first vector only contains 0s and 1s. Copying element by element is highly inefficient for a really large vector.
Conditional question: If you think there is no way to do it faster, is there a complex datatype which actually allows fast conversion from one datatype to another?
Stop.
A std::vector<bool> is... not. std::vector has a specialization for the type bool, which causes certain changes in the vector. Namely, it stops acting like a std::vector.
There are certain things that the standard guarantees you can do with a std::vector, and vector<bool> violates those guarantees. So you should be very careful about using them.
Anyway, I'm going to pretend you said vector<int> instead of vector<bool>, as the latter really complicates things.
As for the claim that copying element by element is highly inefficient: only if you do it wrong.
Vector casting of the type you want needs to be done carefully to be efficient.
If the source T type is convertible to the destination T, then constructing the new vector from the old vector's begin and end iterators works just fine. Decent implementations should recognize when they've been given random-access iterators and optimize the memory allocation and loop appropriately.
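A minimal sketch of that iterator-range construction, assuming unsigned short as the source type and int as the destination. The helper name convert_by_range is hypothetical and is used only for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: builds a vector<int> from a vector<unsigned short>
// with the iterator-range constructor. Each element is converted implicitly.
std::vector<int> convert_by_range(const std::vector<unsigned short>& old_vec) {
    std::vector<int> new_vec(old_vec.begin(), old_vec.end());
    assert(new_vec.size() == old_vec.size());  // one output element per input
    return new_vec;
}
```

Because vector iterators are random-access, the constructor can compute the distance up front and allocate once, which is the optimization the answer expects from a decent implementation.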
For non-convertible simple types, the biggest problem is avoiding this mistake: constructing the new vector with the old vector's size as its element count. That's bad. That will allocate a buffer of the proper size, but it will also fill it with data. Namely, default-constructed ints (int()).
Instead, you should default-construct the new vector and then call reserve with the old vector's size. This reserves capacity equal to the original vector, but it also ensures that no default construction takes place. You can now push_back to your heart's content, knowing that you will never cause reallocation in your new vector.
From there, you can just loop over each entry in the old vector, doing the conversion as needed.
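The reserve-then-push_back pattern might look like the sketch below. The Sample type and the function name convert_with_reserve are assumptions made up for this example; they stand in for any source type that has no implicit conversion to the destination.

```cpp
#include <cassert>
#include <vector>

// Hypothetical source type with no implicit conversion to int.
struct Sample {
    unsigned short raw;
};

std::vector<int> convert_with_reserve(const std::vector<Sample>& old_vec) {
    std::vector<int> new_vec;
    new_vec.reserve(old_vec.size());  // allocates capacity, constructs no elements
    for (const Sample& s : old_vec) {
        // Explicit per-element conversion; capacity is sufficient,
        // so push_back never triggers a reallocation here.
        new_vec.push_back(static_cast<int>(s.raw));
    }
    assert(new_vec.size() == old_vec.size());
    return new_vec;
}
```

Writing std::vector<int> new_vec(old_vec.size()) instead would first fill the buffer with zeros and then overwrite every one of them.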
There's no way to avoid the copy, since a std::vector<T> is a distinct type from std::vector<U>, and there's no way for them to share the memory. Other than that, it depends on how the data is mapped. If the mapping corresponds to an implicit conversion (e.g. unsigned short to bool), then simply creating a new vector using the begin and end iterators from the old will do the trick.
If the mapping isn't just an implicit conversion (and this includes cases where you want to verify things; e.g. that the unsigned short does contain only 0 or 1), then it gets more complicated. The obvious solution would be to use std::transform to fill the new vector, passing it a TranformationObject: a function object which does the transformation, e.g. one that verifies each value before converting it.
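A sketch of the std::transform approach with a validating function object. The names ToBool and to_bool_vector, and the choice to throw std::invalid_argument on a bad value, are assumptions for illustration; the answer only says that the object performs the transformation.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <stdexcept>
#include <vector>

// Hypothetical function object: checks that the value is 0 or 1, then converts.
struct ToBool {
    bool operator()(unsigned short original) const {
        if (original > 1) {
            throw std::invalid_argument("value is neither 0 nor 1");
        }
        return original != 0;
    }
};

std::vector<bool> to_bool_vector(const std::vector<unsigned short>& old_v) {
    std::vector<bool> new_v;
    new_v.reserve(old_v.size());  // avoid reallocations while appending
    std::transform(old_v.begin(), old_v.end(),
                   std::back_inserter(new_v), ToBool());
    assert(new_v.size() == old_v.size());
    return new_v;
}
```

If a value other than 0 or 1 is found, the exception propagates out of std::transform and the partially filled vector is discarded.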
(Note that I'm just using this transformation function as an example. If the only thing which distinguishes the transformation function from an implicit conversion is the verification, it might be faster to verify all of the values in oldV first, using std::for_each, and then use the two-iterator constructor above.)
Depending on the cost of default constructing the target type, it may be faster to create the new vector with the correct size, then overwrite it.
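The size-then-overwrite variant could be sketched as follows, assuming unsigned char as a cheap-to-construct target type. The names to_byte and convert_by_overwrite are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical conversion used only for illustration: nonzero becomes 1.
inline unsigned char to_byte(unsigned short value) {
    return static_cast<unsigned char>(value != 0);
}

std::vector<unsigned char> convert_by_overwrite(
        const std::vector<unsigned short>& old_v) {
    // Creates old_v.size() elements, each value-initialized to 0.
    std::vector<unsigned char> new_v(old_v.size());
    // Writes through new_v.begin(), overwriting the existing elements in place.
    std::transform(old_v.begin(), old_v.end(), new_v.begin(), to_byte);
    assert(new_v.size() == old_v.size());
    return new_v;
}
```

The trade-off is one extra pass that zero-fills the buffer in exchange for plain stores instead of push_back calls; which one wins depends on the element type and the implementation, so it is worth measuring.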
Finally, another possibility would be to use a boost::transform_iterator, wrapping the old vector's begin and end iterators and passing the wrapped pair to the two-iterator constructor. In many ways, this is the solution I prefer; depending on how boost::transform_iterator has been implemented, it could also be the fastest.
You should be able to use assign, passing it the begin and end iterators of the source vector.
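A minimal sketch of the assign call for the conversion from the question, unsigned short to bool. The wrapper name assign_as_bool is hypothetical.

```cpp
#include <cassert>
#include <vector>

std::vector<bool> assign_as_bool(const std::vector<unsigned short>& v) {
    std::vector<bool> u;
    // Replaces the contents of u with the range; each element is
    // converted implicitly (nonzero becomes true, zero becomes false).
    u.assign(v.begin(), v.end());
    assert(u.size() == v.size());
    return u;
}
```

Unlike the range constructor, assign also works on a destination vector that already exists and may let it reuse its current allocation.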
The fastest way to do it is to not do it. For example, if you know in advance that your items only need a byte for storage, use a byte-size vector to begin with. You'll find it difficult to find a faster way than that :-)
If that's not possible, then just absorb the cost of the conversion. Even if it's a little slow (and that's by no means certain; see Nicol's excellent answer for details), it's still necessary. If it weren't, you would just leave it in the larger-type vector.
First, a warning: Don't do what I'm about to suggest. It's dangerous and must never be done. That said, if you just have to squeeze out a tiny bit more performance No Matter What...
First, there are some caveats. If you don't meet these, you can't do this:
The vector must contain plain-old-data. If your type has pointers, or uses a destructor, or needs an operator= to copy correctly... do not do this.
The sizeof() of both vectors' contained types must be the same. That is, vector<A> can copy from vector<B> only if sizeof(A) == sizeof(B).
A fairly stable method is a raw block copy (memcpy-style) of the memory contained in vector b over the storage of vector a. It is very fast and directly smashes whatever data you have in vector a. It doesn't call constructors, it doesn't do any safety checking, and it's much faster than any of the other methods given here. An optimizing compiler should be able to match the speed of this in theory, but unless you're using an unusually good one, it won't (I checked with Visual C++ a few years ago, and it wasn't even close).
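The block copy described above might be sketched like this, with the two caveats turned into compile-time checks. The name block_copy is hypothetical and this is not the answer's original code. Note that resize value-initializes any new elements of a before they are overwritten, and that the approach does not apply to std::vector<bool>, which has no data() member.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper: overwrites a with the raw bytes of b.
template <typename A, typename B>
void block_copy(std::vector<A>& a, const std::vector<B>& b) {
    static_assert(std::is_trivially_copyable<A>::value &&
                  std::is_trivially_copyable<B>::value,
                  "both element types must be plain old data");
    static_assert(sizeof(A) == sizeof(B), "element sizes must match");
    a.resize(b.size());  // value-initializes new elements before the copy
    assert(a.size() == b.size());
    if (!b.empty()) {
        // Raw byte copy: no per-element conversion, no validity checks.
        std::memcpy(a.data(), b.data(), b.size() * sizeof(B));
    }
}
```

Since the element sizes must match, this reinterprets the bits rather than converting values, and it cannot shrink the storage the way the question hopes to.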
Also, given these constraints, you could forcibly (via void *) cast one vector type to the other and swap them -- I had a code sample for that, but it started oozing ectoplasm on my screen, so I deleted it.
Copying element by element is not highly inefficient. std::vector provides constant access time to any of its elements, hence the operation will be O(n) overall. You will not notice it.