我正在尝试为动态大小的数组创建一个数据结构。可以有多种选择,最简单的一种是 std::vector>
。然而它通常效率不高,我们希望将所有内部向量的数据压缩成一个大的向量,并有一个偏移向量来告诉每个元素从哪里开始。
示例:
// encoding of : | 4.,5.,1. | 7.,8.,9.,2 |
std::vector<double> v = {4.,5.,1., 7.,8.,9.,2};
std::vector<int> offsets = {0 , 3 , 7};
让我们封装它!考虑以下数据结构:(
注意:代码既不完整、通用也不精确,此时这只是为了说明正在发生的事情):
class vblock_vector {
private:
std::vector<double> v;
std::vector<int> offsets;
public:
using iterator = vblock_iterator;
auto begin() -> iterator {
return {v.data(),offsets.data()};
}
auto end() -> iterator {
return {v.data(),offsets.data()+offsets.size()};
}
};
迭代器类型的基本实现如下:
struct vblock_iterator {
private:
double* ptr;
int* offsets_ptr;
public:
using reference = span_ref<double>; // see notes (0) and (1)
// using value_type = ???; // See below
auto operator++() {
++offsets_ptr;
return *this;
}
auto operator*() const {
return span_ref<double,int>(ptr+offsets_ptr[0],ptr+offsets_ptr[1]);
}
auto operator<=>(const vblock_iterator&) const = default;
// ... other iterator interface stuff that is trivial
};
此迭代器可与例如 std::copy
配合使用。 (4)
现在假设我想用 std::ranges::copy
替换旧的 std::copy
调用。为此,vblock_iterator 需要满足 std::input_iterator 概念。为了做到这一点,vblock_iterator
需要有一个关联的value_type
(中间std::indirectly_read
概念)。
一个明显的选择是使用 value_type = std::vector(2),但我肯定不想给出 std::ranges: :copy 在其实现中自行决定使用此类型的自由:这将是低效的。
我的问题如下:为什么 std::input_iterator
需要 In
具有 value_type
?至少对于复制来说不需要它(我可以使用 std::copy 并且它做正确的事情证明了这一点)。当然,有人可以说:“将 value_type
定义为任何内容,无论如何它都不会被 std::range::copy
实现使用”,但是为什么需要它?
我目前的印象是 value_type
对于例如 std::swappable
是强制性的,但对于 std::input_iterator
不是强制性的(甚至也不是 >std::random_access_iterator
我敢说)。但标准委员会却另有决定:这一选择背后的原因是什么? (3)
注释:
(0) span_ref
就像一个 std::span 具有引用语义(其 operator=
是“通过分配”而不是“重新绑定到新数组”)。
(1) 实际上,引用类型需要更复杂一点才能考虑偏移量,但这不是这里的主题。可以说,这个结构可以有一个有效的引用类型。
(2) 我认为这是唯一合理的选择。至少需要一个容器(vector
、deque
...)。例如 std::span
就不行,因为如果我们费心去保存迭代器指向的值,那是因为我们会修改原始内存,而 std::span< /code> 不会帮助我们解决这个问题。
(3) 在演示文稿中关于 std::indirectly_read
概念(后来称为 Readable
),Eric Niebler 详细介绍了为什么我们需要value_type
以某种形式与 reference
相关,以便与代理引用很好地配合使用,但我仍然不明白为什么我们甚至需要 value_type
> 对于不需要交换元素(或将它们存储在某处)的算法。是的,在数学上,vblock_iterator
存在一个 value_type
,但如果不打算使用它,为什么需要它呢? (类似地,也有数学operator+=
用于向前范围:但由于它效率低下,所以根本不需要)。
(4) 以及其他算法:std::move
、std::find
、std::find_if
、 std::any_of
、std::partition_point
、std::lower_bound
、std::unique
...所以我认为有一些东西更根本的事情比:“我们只是幸运地拥有 std::copy
”。
I am trying to create a data structure for arrays of dynamic-sized arrays. Multiple choices are possible, the simplest one being std::vector<std::vector<T>>
. However it is often not efficient and we would like to compress the data of all the inner vectors into one big vector
, and have a vector
of offsets to tell where each element begins.
Example:
// encoding of : | 4.,5.,1. | 7.,8.,9.,2 |
std::vector<double> v = {4.,5.,1., 7.,8.,9.,2};
std::vector<int> offsets = {0 , 3 , 7};
Let's encapsulate it ! Consider the following data structure:
(note: the code is neither complete, general or precise, at this point this is just to give an idea of what is going on):
class vblock_vector {
private:
std::vector<double> v;
std::vector<int> offsets;
public:
using iterator = vblock_iterator;
auto begin() -> iterator {
return {v.data(),offsets.data()};
}
auto end() -> iterator {
return {v.data(),offsets.data()+offsets.size()};
}
};
An basic implementation of the iterator type is the following:
struct vblock_iterator {
private:
double* ptr;
int* offsets_ptr;
public:
using reference = span_ref<double>; // see notes (0) and (1)
// using value_type = ???; // See below
auto operator++() {
++offsets_ptr;
return *this;
}
auto operator*() const {
return span_ref<double,int>(ptr+offsets_ptr[0],ptr+offsets_ptr[1]);
}
auto operator<=>(const vblock_iterator&) const = default;
// ... other iterator interface stuff that is trivial
};
This iterator works with e.g. std::copy
. (4)
Now let's say that I want to replace my old std::copy
calls with std::ranges::copy
. For that, vblock_iterator
needs to satisfy the std::input_iterator
concept. In order to do that, vblock_iterator
needs to have an associated value_type
(required by the intermediate std::indirectly_readable
concept).
An obvious choice would be using value_type = std::vector<double>
(2), but I surely don't want to give std::ranges::copy
the freedom to use this type at its discretion in its implementation: it would be inefficient.
My question is the following : why does std::input_iterator<In>
requires In
to have a value_type
? At least for copying it is not needed (the fact that I can use std::copy
and that it does the right thing proves it). Of course, one can say : "define value_type
to be anything, it won't be used by std::range::copy
implementations anyway", but then why require it?
I am currently under the impression that value_type
is mandatory for e.g. std::swappable
, but not for std::input_iterator
(nor even std::random_access_iterator
dare I say). But the standard committee decided otherwise: what is the reason behind this choice? (3)
Notes:
(0) span_ref
is just like a std::span
with reference semantics (its operator=
is "assign-through" and not "rebind to new array").
(1) In reality, the reference type needs to be a tad more complex to account for offsets, but it is not the subject here. Suffice to say, that it is possible to have an efficient reference type for this structure.
(2) And I think this is the only reasonable choice. At least a container is needed (vector
, deque
...). E.g. a std::span
won't do because if we bother to save the value pointed to by the iterator, it is because we will modify the original memory, and std::span
won't help us with that.
(3) In the presentation of the std::indirectly_readable
concept (then called Readable
), Eric Niebler goes into some detail of why we need value_type
to be related in some form to reference
to work well with proxy references, but I still don't see why we would would even need value_type
for algorithms that don't need to swap elements (or store them somewhere). Yes, there is mathematically a value_type
for vblock_iterator
, but why require it if it is not meant to be used? (similarly, there is also mathematical operator+=
for forward ranges : but since it is inefficient, it is simply not required).
(4) And other algorithms: std::move
, std::find
, std::find_if
, std::any_of
, std::partition_point
, std::lower_bound
, std::unique
... So I think that there is something more fundamental going on than: "we are just lucky with std::copy
".
发布评论
评论(1)
std::copy
需要LegacyInputIterator
其迭代器类型。 它不会检查此要求。如果您未能提供LegacyInputIterator
,则您的程序格式错误,无需进行诊断。LegacyInputIterator
要求std::iterator_traits::value_type
存在,因为它包含LegacyIterator
。因此,一旦您将程序传递给
std::copy
,您的程序就格式不正确。格式错误的程序的行为无论如何都不是由 C++ 标准决定的;编译器可以合法地为您提供一个程序,将您的浏览器历史记录通过电子邮件发送给您的尤斯蒂斯阿姨,并且符合标准。或者它可以做一些恰好与您认为程序“应该”做的事情一致的事情。否则可能无法编译。std::ranges
算法的要求略有不同。与旧式算法相比,这些要求更有可能通过概念进行检查,并告诉用户编译时错误。你就遇到了这样的情况。
更清楚地说,您不能依赖
std
代码的实现来强制执行该标准。需要这些类型的部分原因是为了更容易地讨论相关类型以及对它们的操作在语义上的含义。
除了诸如
std::iterator_traits::value_type
之类的简单要求之外,还存在关于*it
的作用、x = *it++< 的语义要求/code> 确实如此,等等。大多数这些要求无法被编译器检查(由于莱斯定理,理论上它们无法检查);但是
std
命名空间中的算法依赖对于传入的任何迭代器来说这些语义都是正确的。因为编译器可以假设语义是正确的,所以算法可以更清晰,比检查更简单、更快。这意味着多个不同的编译器供应商可以编写不同的 std 算法实现,相互改进算法,并且有一个客观的标准可以争论。
对于
LegacyInputIterator
以及来自std::iterator_traits
的value_type
和reference
类型,我们必须:一个有效的表达式,
*it
必须返回reference
,并且必须返回可转换为
value_type
的类型。并非每个算法都需要使用它要求迭代器具有的每个迭代器的每个属性。这里的目标是拥有语义上有意义的类别,并且不需要太多的开销。
要求东西上的迭代器实际上有一个它是迭代器的类型并不是一个很大的开销。这使得谈论迭代器变得非常容易。
您可以重构它并删除该概念,或者将概念分割成更小的部分,以便仅在需要时才需要
value_type
,但这会使概念更难编写并且更难理解。std::copy
requires aLegacyInputIterator
for its iterator types. It does not check this requirement. If you fail to provide aLegacyInputIterator
, your program is ill-formed, no diagnostic required.A
LegacyInputIterator
requires thatstd::iterator_traits<X>::value_type
exists because it subsumesLegacyIterator
.So your program was ill-formed once you passed it to
std::copy
. The behavior of your ill-formed program is not determined by the C++ standard in any way; the compiler can legally provide you a program that emails your browser history to your great aunt Eustice and be standard compliant. Or it could do something that happens to align with what you think the program "should" do. Or it could fail to compile.The
std::ranges
algorithms have slightly different requirements. These requirements are far more likely to be checked by concepts than the old style algorithms are, telling the user with a compile time error.You are running into such a case.
To be even more clear, you cannot rely on the implementation of
std
code to enforce the standard.These types are required partly to make it easier to talk about the types in question and what operations on them mean, semantically.
Beyond the simple requirements like
std::iterator_traits<X>::value_type
exist, there are semantic requirements on what*it
does, whatx = *it++
does, etc. Most of those requirements cannot be checked by the compiler (due to Rice's theorem, they cannot be checked in theory); but the algorithms in thestd
namespace rely on those semantic meanings being correct for any iterator passed in.Because the compiler can assume the semantic meanings are correct, the algorithms can be cleaner, simpler and faster than if they had to check them. And it means that multiple different compiler vendors can write different
std
algorithm implementations, improving the algorithm over each other, and there is an objective standard to argue against.For a
LegacyInputIterator
and typesvalue_type
andreference
fromstd::iterator_traits<X>
, we must have:is a valid expression,
*it
must returnreference
, andmust return a type convertible to
value_type
.Not every algorithm need use every property of every iterator it requires that iterator to have. The goal here is to have semantically meaningful categories that do not demand too much in the way of overhead.
Requiring that an iterator over stuff actually have a type it is an iterator over is not a large overhead. And it makes talking about that the iterator is insanely easier.
You could refactor it and remove that concept, or cut the concept up into smaller pieces so that the
value_type
is only required in the narrow cases where it is required, but that would make the concepts harder to write about and harder to understand.