List.AddRange 实现不理想
对我的 C# 应用程序进行分析表明,List
花费了大量时间。使用 Reflector 查看此方法中的代码表明它调用了 List
,其实现如下:
public void InsertRange(int index, IEnumerable<T> collection)
{
if (collection == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
}
if (index > this._size)
{
ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.index, ExceptionResource.ArgumentOutOfRange_Index);
}
ICollection<T> is2 = collection as ICollection<T>;
if (is2 != null)
{
int count = is2.Count;
if (count > 0)
{
this.EnsureCapacity(this._size + count);
if (index < this._size)
{
Array.Copy(this._items, index, this._items, index + count, this._size - index);
}
if (this == is2)
{
Array.Copy(this._items, 0, this._items, index, index);
Array.Copy(this._items, (int) (index + count), this._items, (int) (index * 2), (int) (this._size - index));
}
else
{
T[] array = new T[count]; // (*)
is2.CopyTo(array, 0); // (*)
array.CopyTo(this._items, index); // (*)
}
this._size += count;
}
}
else
{
using (IEnumerator<T> enumerator = collection.GetEnumerator())
{
while (enumerator.MoveNext())
{
this.Insert(index++, enumerator.Current);
}
}
}
this._version++;
}
private T[] _items;
可以说接口的简单性(只有一个 InsertRange 重载)证明运行时类型检查和转换的性能开销是合理的。 但是我用 (*)
指示的 3 行背后的原因可能是什么? 我认为它可以重写为更快的替代方案:
is2.CopyTo(this._items, index);
您认为有什么理由不使用这种更简单且明显更快的替代方案吗?
编辑:
感谢您的回答。因此,共识认为这是针对以有缺陷/恶意方式实现 CopyTo 的输入集合的一种保护措施。对我来说,不断付出 1) 运行时类型检查 2) 临时数组的动态分配 3) 双倍复制操作的代价似乎有点过分了,而所有这些都可以通过定义 2 个或多个 InsertRange 重载来保存,一个像现在一样获取 IEnumerable
,第二个获取 List
,第三个获取 T[]
。后两者的运行速度可以是当前情况的两倍。
编辑 2:
我确实实现了一个类 FastList,与 List 相同,但它还提供了接受 T[] 参数的 AddRange 的重载。此重载不需要动态类型验证和元素的双重复制。我确实通过将 4 字节数组添加到最初为空的列表 1000 次来根据 List.AddRange 来分析此 FastList.AddRange。我的实现比标准 List.AddRange 的速度快了 9 倍(九!)。在我们应用程序的重要使用场景之一中,List.AddRange 占用大约 5% 的运行时间,用提供更快 AddRange 的类替换 List 可以将应用程序运行时间提高 4%。
Profiling my C# application indicated that significant time is spent in List<T>.AddRange
. Using Reflector to look at the code in this method indicated that it calls List<T>.InsertRange
which is implemented as such:
public void InsertRange(int index, IEnumerable<T> collection)
{
if (collection == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
}
if (index > this._size)
{
ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.index, ExceptionResource.ArgumentOutOfRange_Index);
}
ICollection<T> is2 = collection as ICollection<T>;
if (is2 != null)
{
int count = is2.Count;
if (count > 0)
{
this.EnsureCapacity(this._size + count);
if (index < this._size)
{
Array.Copy(this._items, index, this._items, index + count, this._size - index);
}
if (this == is2)
{
Array.Copy(this._items, 0, this._items, index, index);
Array.Copy(this._items, (int) (index + count), this._items, (int) (index * 2), (int) (this._size - index));
}
else
{
T[] array = new T[count]; // (*)
is2.CopyTo(array, 0); // (*)
array.CopyTo(this._items, index); // (*)
}
this._size += count;
}
}
else
{
using (IEnumerator<T> enumerator = collection.GetEnumerator())
{
while (enumerator.MoveNext())
{
this.Insert(index++, enumerator.Current);
}
}
}
this._version++;
}
private T[] _items;
One can argue that the simplicity of the interface (only having one overload of InsertRange) justifies the performance overhead of runtime type cheching and casting.
But what could be the reason behind the 3 lines I have indicated with (*)
?
I think it could be rewritten to the faster alternative:
is2.CopyTo(this._items, index);
Do you see any reason for not using this simpler and apparently faster alternative?
Edit:
Thanks for the answers. So consensus opinion is that this is a protective measure against the input collection implementing the CopyTo in a defective/malicious manner. To me it seems like a overkill to constantly pay the price of 1) runtime type checking 2) dynamic allocation of the temporary array 3) double the copy operation, when all this could have been saved by defining 2 or a few more overloads of InsertRange, one getting IEnumerable
as now, the second getting a List<T>
, third getting T[]
. The later two could have been implemented to run around twice as fast as in the current case.
Edit 2:
I did implement a class FastList, identical to List, except that it also provides an overload of AddRange which takes a T[] argument. This overload does not need the dynamic type verification, and double-copying of elements. I did profile this FastList.AddRange against List.AddRange by adding 4-byte arrays 1000 times to a list which was initially emtpy. My implementation beats the speed of standard List.AddRange with a factor of 9 (nine!). List.AddRange takes about 5% of runtime in one of the important usage scenarios of our application, replacing List with a class providing a faster AddRange could improve application runtime by 4%.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
它们阻止
ICollection
的实现访问插入边界之外的目标列表的索引。如果调用CopyTo
的错误(或“操纵”)实现,则上述实现会导致IndexOutOfBoundsException
。请记住,
T[].CopyTo
实际上在内部实现为memcpy
,因此添加该行的性能开销很小。当您以如此低的成本为大量呼叫添加安全性时,您不妨这样做。编辑:我觉得奇怪的部分是,对
ICollection.CopyTo
的调用(复制到临时数组)不会在调用<之后立即发生。代码>确保容量。如果它被移动到该位置,则遵循任何同步异常 列表将保持不变。按原样,仅当插入发生在列表末尾时,该条件才成立。这里的推理是:ICollection.CopyTo
的外部调用以及调整列表大小和分配临时数组所需的分配。如果所有这三个发生在移动元素进行插入之前,则更改列表的事务不能引发同步异常。编辑2(对OP编辑的回应):你对此进行过分析吗?您大胆声称 Microsoft 应该选择更复杂的 API,因此您应该确保当前方法速度慢的断言是正确的。我从来没有遇到过
InsertRange
的性能问题,而且我非常确定,与重新实现动态列表相比,重新设计算法可以更好地解决人们遇到的任何性能问题。为了避免您认为我以消极的方式严厉,请记住以下几点:不想无法忍受我的开发团队中喜欢重新发明方轮。They are preventing the implementation of
ICollection<T>
from accessing indices of the destination list outside the bounds of insertion. The implementation above results in anIndexOutOfBoundsException
if a faulty (or "manipulative") implementation ofCopyTo
is called.Keep in mind that
T[].CopyTo
is quite literally internally implemented asmemcpy
, so the performance overhead of adding that line is minute. When you have such a low cost of adding safety to a tremendous number of calls, you might as well do so.Edit: The part I find strange is the fact that the call to
ICollection<T>.CopyTo
(copying to the temporary array) does not occur immediately following the call toEnsureCapacity
. If it were moved to that location, then following any synchronous exception the list would remain unchanged. As-is, that condition only holds if the insertion happens at the end of the list. The reasoning here is:Array.Copy
cannot fail becauseICollection.CopyTo
and the allocations required for resizing the list and allocating the temporary array. If all three of these occur before moving elements for the insertion, the transaction to change the list cannot throw a synchronous exception.Edit 2 (response to the OP's edit): Have you profiled this? You are making some bold claims that Microsoft should have chosen a more complicated API, so you should make sure you're correct in the assertions that the current method is slow. I've never had a problem with the performance of
InsertRange
, and I'm quite sure that any performance problems someone does face with it will be better resolved with an algorithm redesign than by reimplementing the dynamic list. Just so you don't take me as being harsh in a negative way, keep the following in mind:don't wantcan't stand people on my dev team that like to reinvent the square wheel.这是个好问题,我正在努力找出理由。参考来源中没有任何提示。一种可能性是,当实现 ICollection<>.CopyTo() 方法对象的类反对复制到非 0 的起始索引时,他们会尝试避免出现问题。或者作为一种安全措施,防止集合弄乱数组它不应该访问的元素。
另一个原因是,这是以线程不安全方式使用集合时的一种对策。如果某个项目被另一个线程添加到集合中,则失败的是集合类的 CopyTo() 方法,而不是 Microsoft 代码。合适的人会接到服务电话。
这些都不是很好的解释。
It's a good question, I'm struggling to come up with a reason. There's no hint in the Reference Source. One possibility is that they try to avoid a problem when the class that implements the ICollection<>.CopyTo() method objects against copying to a start index other than 0. Or as a security measure, preventing the collection from messing with the array elements it should not have access to.
Another one is that this is a counter-measure when the collection is used in thread-unsafe manner. If an item got added to the collection by another thread it will be the collection class' CopyTo() method that fails, not the Microsoft code. The right person will get the service call.
These are not great explanations.
如果您想一想,您的解决方案就会出现问题,如果您以这种方式更改代码,那么您实际上是在向应该添加对内部数据结构的访问权限的集合授予权限。
这不是一个好主意,例如,如果 List 数据结构的作者找到了比数组更好的底层结构来存储数据,则无法更改 List 的实现,因为所有集合都期望将数组放入 CopyTo 函数中。
本质上,您将巩固 List 类的实现,尽管面向对象编程告诉我们数据结构的内部实现应该是可以在不破坏其他代码的情况下进行更改的东西。
There is a problem with your solution if you think about it for a minute, if you change the code in that way you are essentially giving the collection that should be added access to an internal datastructure.
This is not a good idea, for example if the author of the List datastructure figures out a better underlying structure to store the data than an array there is no way to change the implementation of List since all collection are expecting an array into the CopyTo function.
In essence you would be cementing the implementation of the List class, even though object oriented programming tells us that the internal implementation of a datastructure should be something that can be changed without breaking other code.