Is it bad practice to alter dynamic arrays that have references to them?

Posted on 2024-09-13 01:26:48

I looked a bit at dynamic arrays in D2, and I found them very difficult to understand. It also seems I'm misinterpreting the spec.
Working on a reference to or a slice of a dynamic array seems very error prone when the arrays are changed. Or am I just not understanding the fundamentals?

Referring to the same array only shares the actual items:

auto a = [1];
auto b = a;
assert(&a != &b); // different instance; Doesn't share length
assert(a.ptr == b.ptr); // same items
assert(a == [1]);
assert(a == b);

As they reference the same array, changing one changes the other:

auto a = [1,2];
auto b = a;
a[1] = 20;
assert(a == [1,20]);
assert(a == b);

From the spec on arrays:

"To maximize efficiency, the runtime always tries to resize the array in place to avoid extra copying. It will always do a copy if the new size is larger and the array was not allocated via the new operator or a previous resize operation."

So changing the length doesn't necessarily break the reference:

auto a = [1];
auto b = a;
b.length = 2;
assert(b == [1,0]);
assert(a == [1]); // a unchanged even if it refers to the same instance
assert(a.ptr == b.ptr);  // but still the same instance

// So updates to one work on the other
a[0]  = 10;
assert(a == [10]);
assert(b == [10,0]);

From the spec on arrays:

"Concatenation always creates a copy of its operands, even if one of the operands is a 0 length array."

auto a = [1];
auto b = a;
b ~= 2; // Should make a copy, right..?
assert(a == [1]);
assert(b == [1,2]);
assert(a != b);
assert(a.ptr == b.ptr); // But it's still the same instance
a[0] = 10;
assert(b == [10,2]); // So changes to a change b

But when the arrays would step on each other, the values are copied to a new location and the reference broken:

auto a = [1];
auto b = a;
b ~= 2;
assert(a == [1]);
assert(b == [1,2]);

a.length = 2; // Copies values to new memory location to not overwrite b's changes
assert(a.ptr != b.ptr);

Changing the length of both arrays before modifying an element gives the same result as above (I would expect this given the above):

auto a = [1];
auto b = a;
a.length = 2;
b.length = 2;
a[1] = 2;
assert(a == [1,2]);
assert(b == [1,0]);
assert(a.ptr != b.ptr);

And the same happens when changing the length of one and concatenating to the other (I would expect this given the above):

auto a = [1];
auto b = a;
b.length = 2;
a ~= 2;
assert(a == [1,2]);
assert(b == [1,0]);
assert(a.ptr != b.ptr);

But then slices also come into the picture, and suddenly it's even more complicated! The slices might be orphaned...

auto a = [1,2,3];
auto b = a;
auto slice = a[1..$]; // [2,3];
slice[0] = 20;
assert(a == [1,20,3]);
assert(a == b);

a.length = 4;
assert(a == [1,20,3,0]);
slice[0] = 200;
assert(b == [1,200,3]); // the reference to b is still valid.
assert(a == [1, 20, 3, 0]); // but the reference to a is now invalid..

b ~= 4;
// Now both references are invalid and the slice is orphaned...
// What does the slice modify?
assert(a.ptr != b.ptr);
slice[0] = 2000;
assert(slice == [2000,3]);
assert(a == [1,20,3,0]); 
assert(b == [1,200,3,4]);

So... Is it bad practice to have multiple references to the same dynamic array? And passing slices around etc.? Or am I just way out here, missing the entire point of dynamic arrays in D?


再可℃爱ぅ一点好了 2024-09-20 01:26:48

On the whole, you seem to understand things fairly well, but you appear to be misunderstanding the purpose of the ptr property. It does not indicate whether two arrays refer to the same instance. It gives you the pointer to what is effectively the underlying C array. An array in D carries its length as part of it, so it's more like a struct with a length and a pointer to a C array than it is like a C array. ptr allows you to get at the C array and pass it to C or C++ code. You probably shouldn't be using it for anything in pure D code. If you want to test whether two array variables refer to the same instance, then you use the is operator (or !is to check that they're different instances):

assert(a is b);   //checks that they're the same instance
assert(a !is b);  //checks that they're *not* the same instance

All that ptr being equal for two arrays would indicate is that their first element is in the same place in memory. In particular, their lengths could differ. However, it does mean that any overlapping elements will get altered in both arrays if you alter them in one of them.
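As a minimal sketch of that distinction (the helper name sameStart is made up for illustration and is not part of any library), the following compares is, ptr and == on aliases, a shorter slice and a dup of the same array:

```d
// Hypothetical helper: true when two slices begin at the same element,
// regardless of how long each slice is.
bool sameStart(const(int)[] x, const(int)[] y)
{
    return x.ptr == y.ptr;
}

void main()
{
    int[] a = [1, 2, 3];
    int[] b = a;          // copies the (pointer, length) pair
    int[] c = a[0 .. 2];  // same first element, shorter length
    int[] d = a.dup;      // freshly allocated copy of the elements

    assert(a is b);           // same pointer and same length
    assert(sameStart(a, c));  // same first element...
    assert(a !is c);          // ...but the lengths differ
    assert(a == d);           // equal contents
    assert(!sameStart(a, d)); // stored in different memory

    c[0] = 10;                // overlapping element is visible through a
    assert(a[0] == 10);
    assert(d[0] == 1);        // the dup is unaffected
}
```

Note that a is b compares both the pointer and the length, which is why c, despite sharing a.ptr, is not identical to a.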

When changing the length of an array, D tries to avoid reallocating, but it could decide to reallocate, so you can't necessarily rely on whether it would reallocate or not. For instance, it's going to reallocate if not doing so will stomp on another array's memory (including those that have the same value for ptr). It could also reallocate if there isn't enough memory to resize itself in place. Basically, it will reallocate if not doing so will stomp on another array's memory, and it may or may not reallocate otherwise. So, it's generally not a good idea to rely on whether an array will reallocate or not when you set its length.
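A small sketch of the "no stomping" rule, assuming a reasonably recent compiler and runtime (the function name stillSharedAfterGrowth is invented for this example). It deliberately asserts only what holds whether or not the first resize happens in place:

```d
// Hypothetical demo: grow two aliases of one array and report whether
// they still start at the same memory afterwards.
bool stillSharedAfterGrowth()
{
    int[] a = [1];
    int[] b = a;

    b.length = 2;  // may or may not be done in place
    b[1] = 2;

    a.length = 2;  // must not overwrite b[1]; new element is int.init

    assert(b == [1, 2]);
    assert(a == [1, 0]);
    return a.ptr == b.ptr;
}

void main()
{
    // Either b moved on its resize, or a had to move to avoid b's data,
    // so the two can no longer share their first element.
    assert(!stillSharedAfterGrowth());
}
```

Which of the two arrays ends up being moved is an implementation detail, so the sketch does not check that.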

I would have expected appending to always copy per the docs, but per your tests, it does appear to act just like length does (I don't know whether that means that the docs need to be updated or whether it's a bug - my guess would be that the docs need to be updated). In either case, you certainly can't rely on other references to that array to still refer to the same array after appending.

As for slices, they work just as expected and are heavily used in D, especially in the standard library, Phobos. A slice is a range over an array, and ranges are a core concept in Phobos. However, just like many other ranges, altering the container that the range/slice refers to could invalidate that range/slice. That's why, when you're using functions which could resize containers in Phobos, you need to use the functions prefixed with stable (e.g. stableRemove() or stableInsert()) if you don't want to risk invalidating the ranges that you have over that container.

Also, a slice is an array just like the array that it points to. So, naturally, altering its length or appending to it is going to follow all of the same rules as those for altering the length of or appending to any other array, and it could therefore be reallocated and no longer be a slice into another array.

Pretty much, you just need to be aware that altering the length of an array in any way could result in a reallocation, so you need to avoid doing that if you want references to continue to refer to the same array instance. And if you absolutely need to make sure that they do not point to the same reference, then you need to use dup to get a new copy of the array. If you don't mess with the length of an array at all, then array references (be they slices or references to the whole array) will continue to happily refer to the same array.
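To illustrate the advice about dup, here is a short sketch (independentTail is a hypothetical helper, and the behaviour of the last append assumes a current runtime that refuses to overwrite live elements):

```d
// Hypothetical helper: returns a copy of everything after the first
// element, sharing no memory with the source. Expects a non-empty array.
int[] independentTail(int[] source)
{
    return source[1 .. $].dup;
}

void main()
{
    int[] a = [1, 2, 3];
    int[] view = a[1 .. $];          // shares elements with a
    int[] copy = independentTail(a); // does not

    a[1] = 20;
    assert(view == [20, 3]);         // the slice sees the change
    assert(copy == [2, 3]);          // the copy does not

    copy ~= 4;                       // resizing the copy cannot affect a
    copy[0] = 99;
    assert(a == [1, 20, 3]);

    int[] head = a[0 .. 2];
    head ~= 7;                       // a[2] sits right after head, so the
                                     // append goes to new memory
    assert(a == [1, 20, 3]);
    assert(head == [1, 20, 7]);
}
```

After the append, head is an ordinary array of its own and no longer a slice into a.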

EDIT: It turns out that the docs need to be updated. Anything that could resize the array will try to do it in place if it can (so it might not reallocate) but will reallocate if it has to in order to avoid stomping on the memory of another array or if it doesn't have enough space to reallocate in place. So, there shouldn't be any distinction between resizing the array by setting its length property and resizing it by appending to it.

ADDENDUM: Anyone using D really should read this article on arrays and slices. It explains them quite well, and should give you a much better idea of how arrays work in D.


亣腦蒛氧 2024-09-20 01:26:48

I didn't really want to make this into a full-blown answer, but I can't yet comment on the previous answer.

I think that concatenation and appending are two slightly different operations. If you use ~ with an array and an element, it's appending; with two arrays, it's concatenation.

You could try this instead:

a = a ~ 2;

And see if you get the same results.
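A rough sketch of the difference, with the caveat that withExtra is a name invented here: the binary ~ operator builds a new array, while ~= is allowed to extend the existing memory when that is safe.

```d
// Hypothetical helper: binary ~ always produces a freshly allocated array.
int[] withExtra(int[] source, int value)
{
    return source ~ value;
}

void main()
{
    int[] a = [1];
    int[] b = a;

    int[] c = withExtra(a, 2);
    assert(c == [1, 2]);
    assert(c.ptr != a.ptr);  // concatenation copied its operands
    c[0] = 10;
    assert(a == [1]);        // so writes to c never reach a

    b ~= 2;                  // appending may or may not reallocate
    assert(b == [1, 2]);
    assert(a == [1]);        // a's length is untouched either way
}
```

Whether b still shares its first element with a after the append is left unchecked on purpose, since that depends on the runtime.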

Also, if you want to have defined behaviour, just use the .dup (or .idup for immutables) properties. This is also very useful if you have an array of references; you can modify the main array and .dup slices to work on without worrying about race conditions.

EDIT: ok, I got it a bit wrong, but there it is anyway. Concatenation != appending.

//Max
