当前位置：文江博客话题详情

为什么 `std::input_iterator` 需要 `value_type`？

发布于 2025-01-14 12:50:06 字数 3738 浏览 4 评论 0 原文

我正在尝试为动态大小的数组创建一个数据结构。可以有多种选择，最简单的一种是 std::vector>。然而它通常效率不高，我们希望将所有内部向量的数据压缩成一个大的向量，并有一个偏移向量来告诉每个元素从哪里开始。

示例：

  // encoding of : | 4.,5.,1. | 7.,8.,9.,2 |
  std::vector<double> v    = {4.,5.,1., 7.,8.,9.,2};
  std::vector<int> offsets = {0       , 3         , 7};

让我们封装它！考虑以下数据结构：（

注意：代码既不完整、通用也不精确，此时这只是为了说明正在发生的事情）：

  class vblock_vector {
    private:
      std::vector<double> v;
      std::vector<int> offsets;

    public:
      using iterator = vblock_iterator;
      auto begin() -> iterator {
        return {v.data(),offsets.data()};
      }
      auto end() -> iterator {
        return {v.data(),offsets.data()+offsets.size()};
      }
  };

迭代器类型的基本实现如下：

  struct vblock_iterator {
    private:
      double* ptr;
      int* offsets_ptr;

    public:
      using reference = span_ref<double>; // see notes (0) and (1)
      // using value_type = ???; // See below

      auto operator++() {
        ++offsets_ptr;
        return *this;
      }
      auto operator*() const {
        return span_ref<double,int>(ptr+offsets_ptr[0],ptr+offsets_ptr[1]);
      }
      auto operator<=>(const vblock_iterator&) const = default;

      // ... other iterator interface stuff that is trivial
  };

此迭代器可与例如 std::copy 配合使用。 (4)

现在假设我想用 std::ranges::copy 替换旧的 std::copy 调用。为此，vblock_iterator 需要满足 std::input_iterator 概念。为了做到这一点，vblock_iterator需要有一个关联的value_type（中间std::indirectly_read 概念）。

一个明显的选择是使用 value_type = std::vector(2)，但我肯定不想给出 std::ranges: :copy 在其实现中自行决定使用此类型的自由：这将是低效的。

我的问题如下：为什么 std::input_iterator 需要 In 具有 value_type？至少对于复制来说不需要它（我可以使用 std::copy 并且它做正确的事情证明了这一点）。当然，有人可以说：“将 value_type 定义为任何内容，无论如何它都不会被 std::range::copy 实现使用”，但是为什么需要它？

我目前的印象是 value_type 对于例如 std::swappable 是强制性的，但对于 std::input_iterator 不是强制性的（甚至也不是 >std::random_access_iterator 我敢说）。但标准委员会却另有决定：这一选择背后的原因是什么？ (3)

注释：

(0) span_ref 就像一个 std::span 具有引用语义（其 operator= 是“通过分配”而不是“重新绑定到新数组”）。

(1) 实际上，引用类型需要更复杂一点才能考虑偏移量，但这不是这里的主题。可以说，这个结构可以有一个有效的引用类型。

(2) 我认为这是唯一合理的选择。至少需要一个容器（vector、deque...）。例如 std::span 就不行，因为如果我们费心去保存迭代器指向的值，那是因为我们会修改原始内存，而 std::span< /code> 不会帮助我们解决这个问题。

(3) 在演示文稿中关于 std::indirectly_read 概念（后来称为 Readable），Eric Niebler 详细介绍了为什么我们需要value_type 以某种形式与 reference 相关，以便与代理引用很好地配合使用，但我仍然不明白为什么我们甚至需要 value_type > 对于不需要交换元素（或将它们存储在某处）的算法。是的，在数学上，vblock_iterator 存在一个 value_type，但如果不打算使用它，为什么需要它呢？（类似地，也有数学operator+=用于向前范围：但由于它效率低下，所以根本不需要）。

(4) 以及其他算法：std::move、std::find、std::find_if、 std::any_of、std::partition_point、std::lower_bound、std::unique...所以我认为有一些东西更根本的事情比：“我们只是幸运地拥有 std::copy”。

原文

I am trying to create a data structure for arrays of dynamic-sized arrays. Multiple choices are possible, the simplest one being std::vector<std::vector<T>>. However it is often not efficient and we would like to compress the data of all the inner vectors into one big vector, and have a vector of offsets to tell where each element begins.

Example:

  // encoding of : | 4.,5.,1. | 7.,8.,9.,2 |
  std::vector<double> v    = {4.,5.,1., 7.,8.,9.,2};
  std::vector<int> offsets = {0       , 3         , 7};

Let's encapsulate it ! Consider the following data structure:

(note: the code is neither complete, general or precise, at this point this is just to give an idea of what is going on):

  class vblock_vector {
    private:
      std::vector<double> v;
      std::vector<int> offsets;

    public:
      using iterator = vblock_iterator;
      auto begin() -> iterator {
        return {v.data(),offsets.data()};
      }
      auto end() -> iterator {
        return {v.data(),offsets.data()+offsets.size()};
      }
  };

An basic implementation of the iterator type is the following:

  struct vblock_iterator {
    private:
      double* ptr;
      int* offsets_ptr;

    public:
      using reference = span_ref<double>; // see notes (0) and (1)
      // using value_type = ???; // See below

      auto operator++() {
        ++offsets_ptr;
        return *this;
      }
      auto operator*() const {
        return span_ref<double,int>(ptr+offsets_ptr[0],ptr+offsets_ptr[1]);
      }
      auto operator<=>(const vblock_iterator&) const = default;

      // ... other iterator interface stuff that is trivial
  };

This iterator works with e.g. std::copy. (4)

Now let's say that I want to replace my old std::copy calls with std::ranges::copy. For that, vblock_iterator needs to satisfy the std::input_iterator concept. In order to do that, vblock_iterator needs to have an associated value_type (required by the intermediate std::indirectly_readable concept).

An obvious choice would be using value_type = std::vector<double>(2), but I surely don't want to give std::ranges::copy the freedom to use this type at its discretion in its implementation: it would be inefficient.

My question is the following : why does std::input_iterator<In> requires In to have a value_type? At least for copying it is not needed (the fact that I can use std::copy and that it does the right thing proves it). Of course, one can say : "define value_type to be anything, it won't be used by std::range::copy implementations anyway", but then why require it?

I am currently under the impression that value_type is mandatory for e.g. std::swappable, but not for std::input_iterator (nor even std::random_access_iterator dare I say). But the standard committee decided otherwise: what is the reason behind this choice? (3)

Notes:

(0) span_ref is just like a std::span with reference semantics (its operator= is "assign-through" and not "rebind to new array").

(1) In reality, the reference type needs to be a tad more complex to account for offsets, but it is not the subject here. Suffice to say, that it is possible to have an efficient reference type for this structure.

(2) And I think this is the only reasonable choice. At least a container is needed (vector, deque...). E.g. a std::span won't do because if we bother to save the value pointed to by the iterator, it is because we will modify the original memory, and std::span won't help us with that.

(3) In the presentation of the std::indirectly_readable concept (then called Readable), Eric Niebler goes into some detail of why we need value_type to be related in some form to reference to work well with proxy references, but I still don't see why we would would even need value_type for algorithms that don't need to swap elements (or store them somewhere). Yes, there is mathematically a value_type for vblock_iterator, but why require it if it is not meant to be used? (similarly, there is also mathematical operator+= for forward ranges : but since it is inefficient, it is simply not required).

(4) And other algorithms: std::move, std::find, std::find_if, std::any_of, std::partition_point, std::lower_bound, std::unique... So I think that there is something more fundamental going on than: "we are just lucky with std::copy".

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

素食主义者 2025-01-21 12:50:06

std::copy 需要 LegacyInputIterator 其迭代器类型。 它不会检查此要求。如果您未能提供 LegacyInputIterator，则您的程序格式错误，无需进行诊断。

LegacyInputIterator 要求 std::iterator_traits::value_type 存在，因为它包含 LegacyIterator。

因此，一旦您将程序传递给 std::copy，您的程序就格式不正确。格式错误的程序的行为无论如何都不是由 C++ 标准决定的；编译器可以合法地为您提供一个程序，将您的浏览器历史记录通过电子邮件发送给您的尤斯蒂斯阿姨，并且符合标准。或者它可以做一些恰好与您认为程序“应该”做的事情一致的事情。否则可能无法编译。

std::ranges 算法的要求略有不同。与旧式算法相比，这些要求更有可能通过概念进行检查，并告诉用户编译时错误。

你就遇到了这样的情况。

更清楚地说，您不能依赖 std 代码的实现来强制执行该标准。

需要这些类型的部分原因是为了更容易地讨论相关类型以及对它们的操作在语义上的含义。

除了诸如 std::iterator_traits::value_type 之类的简单要求之外，还存在关于 *it 的作用、x = *it++< 的语义要求/code> 确实如此，等等。大多数这些要求无法被编译器检查（由于莱斯定理，理论上它们无法检查）；但是 std 命名空间中的算法依赖对于传入的任何迭代器来说这些语义都是正确的。

因为编译器可以假设语义是正确的，所以算法可以更清晰，比检查更简单、更快。这意味着多个不同的编译器供应商可以编写不同的 std 算法实现，相互改进算法，并且有一个客观的标准可以争论。

对于 LegacyInputIterator 以及来自 std::iterator_traits 的 value_type 和 reference 类型，我们必须

value_type v = *it;

：一个有效的表达式，*it 必须返回 reference，并且

*it++

必须返回可转换为 value_type 的类型。

并非每个算法都需要使用它要求迭代器具有的每个迭代器的每个属性。这里的目标是拥有语义上有意义的类别，并且不需要太多的开销。

要求东西上的迭代器实际上有一个它是迭代器的类型并不是一个很大的开销。这使得谈论迭代器变得非常容易。

您可以重构它并删除该概念，或者将概念分割成更小的部分，以便仅在需要时才需要 value_type ，但这会使概念更难编写并且更难理解。

std::copy requires a LegacyInputIterator for its iterator types. It does not check this requirement. If you fail to provide a LegacyInputIterator, your program is ill-formed, no diagnostic required.

A LegacyInputIterator requires that std::iterator_traits<X>::value_type exists because it subsumes LegacyIterator.

So your program was ill-formed once you passed it to std::copy. The behavior of your ill-formed program is not determined by the C++ standard in any way; the compiler can legally provide you a program that emails your browser history to your great aunt Eustice and be standard compliant. Or it could do something that happens to align with what you think the program "should" do. Or it could fail to compile.

The std::ranges algorithms have slightly different requirements. These requirements are far more likely to be checked by concepts than the old style algorithms are, telling the user with a compile time error.

You are running into such a case.

To be even more clear, you cannot rely on the implementation of std code to enforce the standard.

These types are required partly to make it easier to talk about the types in question and what operations on them mean, semantically.

Beyond the simple requirements like std::iterator_traits<X>::value_type exist, there are semantic requirements on what *it does, what x = *it++ does, etc. Most of those requirements cannot be checked by the compiler (due to Rice's theorem, they cannot be checked in theory); but the algorithms in the std namespace rely on those semantic meanings being correct for any iterator passed in.

Because the compiler can assume the semantic meanings are correct, the algorithms can be cleaner, simpler and faster than if they had to check them. And it means that multiple different compiler vendors can write different std algorithm implementations, improving the algorithm over each other, and there is an objective standard to argue against.

For a LegacyInputIterator and types value_type and reference from std::iterator_traits<X>, we must have:

value_type v = *it;

is a valid expression, *it must return reference, and

*it++

must return a type convertible to value_type.

Not every algorithm need use every property of every iterator it requires that iterator to have. The goal here is to have semantically meaningful categories that do not demand too much in the way of overhead.

Requiring that an iterator over stuff actually have a type it is an iterator over is not a large overhead. And it makes talking about that the iterator is insanely easier.

You could refactor it and remove that concept, or cut the concept up into smaller pieces so that the value_type is only required in the narrow cases where it is required, but that would make the concepts harder to write about and harder to understand.

回复收藏 0 原文

~没有更多了~

关于作者

自此以后，行同陌路

暂无简介

文章

776 人气

关注发私信

友情链接

文江博客

为什么 `std::input_iterator` 需要 `value_type`？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

关于作者

相关话题

热门标签

推荐作者

十二

飞烟轻若梦

OPleyuhuo

wxb0109

旧城空念

-小熊_

友情链接

为什么 `std::input_iterator` 需要 `value_type`？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

关于作者

相关话题

热门标签

推荐作者

十二

飞烟轻若梦

OPleyuhuo

wxb0109

旧城空念

-小熊_

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。