启用 STL 迭代器调试到底有什么作用？

发布于 2024-08-30 19:46:43 字数 166 浏览 5 评论 0原文

我通过定义在应用程序中启用了迭代器调试，

_HAS_ITERATOR_DEBUGGING = 1

我希望这实际上只是检查向量边界，但我有一种感觉它所做的远不止于此。实际上正在执行哪些检查等？

顺便说一句，Dinkumware STL。

原文

I've enabled iterator debugging in an application by defining

_HAS_ITERATOR_DEBUGGING = 1

I was expecting this to really just check vector bounds, but I have a feeling it's doing a lot more than that. What checks, etc are actually being performed?

Dinkumware STL, by the way.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

深爱不及久伴 2024-09-06 19:46:43

有许多迭代器操作会导致未定义的行为，此触发器的目标是激活运行时检查以防止其发生（使用断言）。

问题

显而易见的操作是使用无效的迭代器，但这种无效性可能由多种原因引起：

未初始化的迭代
器迭代到已被擦除的元素
迭代到物理位置已更改的元素（重新分配a vector)
Iterator Outside of [begin, end)

该标准在每个容器的令人难以忍受的细节中指定了哪个操作使哪个迭代器无效。

还有一个人们往往会忘记的不太明显的原因：将迭代器混合到不同的容器：

std::vector<Animal> cats, dogs;

for_each(cats.begin(), dogs.end(), /**/); // obvious bug

这涉及一个更普遍的问题：传递给算法的范围的有效性。

[cats.begin(), dogs.end()) 无效（除非其中一个是另一个的别名）
[cats.end(), cats.begin()) code> 无效（除非 cats 为空？？）

解决方案

解决方案包括向迭代器添加信息，以便它们的有效性以及它们定义的范围的有效性可以在执行期间被断言，从而防止发生未定义的行为。

_HAS_ITERATOR_DEBUGGING 符号充当此功能的触发器，因为不幸的是它会减慢程序速度。理论上非常简单：每个迭代器都是发出它的容器的观察者，因此会收到修改通知。

在 Dinkumware 中，这是通过两个附加功能来实现的：

每个迭代器都携带一个指向其相关容器的指针
每个容器都保存它创建的迭代器的链表

这巧妙地解决了我们的问题：

未初始化的迭代器没有父容器，大多数操作（除了分配和销毁）将触发断言
已删除或移动元素的迭代器已被通知（感谢列表）并知道其无效性
在递增和递减迭代器时，它可以检查它是否保持在边界内
检查 2 个迭代器所属到同一个容器就像比较它们的父指针一样简单
检查范围的有效性就像在到达容器末尾之前检查我们是否到达范围的末尾一样简单（对于那些不可随机访问的容器的线性操作），因此大多数）

成本

成本很重，但是正确性有代价吗？我们可以分解成本：

额外的内存分配（维护的额外迭代器列表）：O(NbIterators)
变异操作的通知过程：O(NbIterators)（请注意push_back 或 insert 不一定会使所有迭代器无效，但 erase 会）
范围有效性检查：O( min(last-first,当然

，大多数库算法都是为了最大效率而实现的，通常在算法开始时一次性完成检查，然后运行未经检查的版本。然而速度可能会严重减慢，尤其是手写循环：

for (iterator_t it = vec.begin();
     it != vec.end();              // Oops
     ++it)
// body

我们知道 Oops 行的品味很差，但这里更糟糕：每次运行循环时，我们都会创建一个新的迭代器销毁它，这意味着为 vec 的迭代器列表分配和取消分配节点...我是否必须强调在紧密循环中分配/取消分配内存的成本？

当然，for_each 不会遇到这样的问题，这是使用 STL 算法而不是手工编码版本的另一个令人信服的案例。

There is a number of operations with iterators which lead to undefined behavior, the goal of this trigger is to activate runtime checks to prevent it from occurring (using asserts).

The issue

The obvious operation is to use an invalid iterator, but this invalidity may arise from various reasons:

Uninitialized iterator
Iterator to an element that has been erased
Iterator to an element which physical location has changed (reallocation for a vector)
Iterator outside of [begin, end)

The standard specifies in excruciating details for each container which operation invalidates which iterator.

There is also a somehow less obvious reason that people tend to forget: mixing iterators to different containers:

std::vector<Animal> cats, dogs;

for_each(cats.begin(), dogs.end(), /**/); // obvious bug

This pertain to a more general issue: the validity of ranges passed to the algorithms.

[cats.begin(), dogs.end()) is invalid (unless one is an alias for the other)
[cats.end(), cats.begin()) is invalid (unless cats is empty ??)

The solution

The solution consists in adding information to the iterators so that their validity and the validity of the ranges they defined can be asserted during execution thus preventing undefined behavior to occur.

The _HAS_ITERATOR_DEBUGGING symbol serves as a trigger to this capability, because it unfortunately slows down the program. It's quite simple in theory: each iterator is made an Observer of the container it's issued from and is thus notified of the modification.

In Dinkumware this is achieved by two additions:

Each iterator carries a pointer to its related container
Each container holds a linked list of the iterators it created

And this neatly solves our problems:

An uninitialized iterator does not have a parent container, most operations (apart from assignment and destruction) will trigger an assertion
An iterator to an erased or moved element has been notified (thanks to the list) and know of its invalidity
On incrementing and decrementing an iterator it can checks it stays within the bounds
Checking that 2 iterators belong to the same container is as simple as comparing their parent pointers
Checking the validity of a range is as simple as checking that we reach the end of the range before we reach the end of the container (linear operation for those containers which are not randomly accessible, thus most of them)

The cost

The cost is heavy, but does correctness have a price? We can break down the cost:

extra memory allocation (the extra list of iterators maintained): O(NbIterators)
notification process on mutating operations: O(NbIterators) (Note that push_back or insert do not necessarily invalidate all iterators, but erase does)
range validity check: O( min(last-first, container.end()-first) )

Most of the library algorithms have of course been implemented for maximum efficiency, typically the check is done once and for all at the beginning of the algorithm, then an unchecked version is run. Yet the speed might severely slow down, especially with hand-written loops:

for (iterator_t it = vec.begin();
     it != vec.end();              // Oops
     ++it)
// body

We know the Oops line is bad taste, but here it's even worse: at each run of the loop, we create a new iterator then destroy it which means allocating and deallocating a node for vec's list of iterators... Do I have to underline the cost of allocating/deallocating memory in a tight loop ?

Of course, a for_each would not encounter such an issue, which is yet another compelling case toward the use of STL algorithms instead of hand-coded versions.

回复收藏 0 原文