Best list capacity for a known distribution
Is there a best algorithm for defining the capacity of a C# list in the constructor, if the general distribution of eventual sizes is known?
As a concrete example, if the number of values to be placed in each list has a mean of 500 and a standard deviation of 50, with approximately a normal distribution, what is the best initial capacity for the list in terms of memory consumption?
5 Answers
Leave the list to decide. I wouldn't bother setting it (just use an empty constructor) unless you experience concrete performance problems, at which point there are probably other things you can fix first.
Premature optimisation is the root of all evil.
This is personal opinion, rather than research-based, but remember that a List itself only holds the reference to each object, so it's probably better to err a little on the side of allocating space for a few too many references, rather than accidentally doubling the number of references that you need. With that in mind, a full two or even three standard deviations extra (600 or 650) is probably not out of line. But, again, that's my opinion rather than a researched result.
If you go with the three sigma rule, http://en.wikipedia.org/wiki/68-95-99.7_rule states that if you account for 3 standard deviations, a single sample will fall within that range 99.7% of the time.
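As a quick sanity check on those coverage figures, here is a small Python sketch (Python used only for illustration; `within_sigma` is a hypothetical helper based on the identity that the normal CDF coverage within k standard deviations is erf(k/√2)) applied to the question's numbers (mean 500, σ 50):

```python
from math import erf, sqrt

def within_sigma(k):
    """Fraction of a normal distribution lying within k standard
    deviations of the mean (two-sided)."""
    return erf(k / sqrt(2))

mean, sd = 500, 50
for k in (1, 2, 3):
    print(f"capacity {mean + k * sd}: covers {within_sigma(k):.1%} of samples")
# capacity 550: covers 68.3% of samples
# capacity 600: covers 95.4% of samples
# capacity 650: covers 99.7% of samples
```

Note that for a capacity the one-sided probability is what actually matters (a list that ends up *smaller* than the capacity is fine), so 650 covers even more than 99.7% of lists.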
I've done a little research and it seems that there is a "right" answer to this question.
First of all I agree that this can be premature optimisation, so profiling before deciding to switch is essential.
The graph above was generated in Excel, using a normal distribution, and testing the space overused by various initial list capacities, using 10,000 samples and a mean of 10,000. As you can see, it has several interesting features.
Caveat: YMMV with other distributions, means etc.
There's no right answer. It's going to be a tradeoff between memory usage and CPU. The larger you initialize the list, the more memory you're probably wasting, but you're saving CPU since the list doesn't have to be resized again later.
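To see the CPU side of that tradeoff concretely, this Python sketch (again assuming doubling growth like `List<T>`; `reallocations` is a hypothetical helper) counts how many times the backing array must be reallocated and copied to reach the final size:

```python
def reallocations(n, initial):
    """Number of backing-array reallocations needed to hold n items,
    assuming doubling growth (minimum capacity 4) like .NET's List<T>."""
    cap, count = initial, 0
    while cap < n:
        cap = max(cap * 2, 4)
        count += 1
    return count

print(reallocations(500, 0))    # empty constructor: 8 reallocations (4 -> 512)
print(reallocations(500, 650))  # pre-sized at 650: 0 reallocations
```

Each reallocation also copies every element added so far, so pre-sizing trades a bounded amount of potentially wasted memory for the elimination of those copies.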