Optimal list capacity for a known distribution

Published 2024-12-10 09:32:46


Is there a best algorithm for defining the capacity of a C# list in the constructor, if the general distribution of eventual sizes is known?

As a concrete example, if the number of values to be placed in each list has a mean of 500 and a standard deviation of 50, with an approximately normal distribution, what is the best initial capacity for the list in terms of memory consumption?


Comments (5)

荭秂 2024-12-17 09:32:46


Leave the list to decide. I wouldn't bother setting it (just use an empty constructor) unless you experience concrete performance problems, at which point there are probably other things you can fix first.

Premature optimisation is the root of all evil.

星軌x 2024-12-17 09:32:46


This is personal opinion, rather than research-based, but remember that a List itself only holds the reference to each object, so it's probably better to err a little on the side of allocating space for a few too many references, rather than accidentally doubling the number of references that you need. With that in mind, a full two or even three standard deviations extra (600 or 650) is probably not out of line. But, again, that's my opinion rather than a researched result.

三生一梦 2024-12-17 09:32:46


If you go with the three-sigma rule, http://en.wikipedia.org/wiki/68-95-99.7_rule states that if you account for 3 standard deviations, a single sample will be within that range 99.7% of the time.
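As a sketch of that arithmetic for the example in the question (mean 500, standard deviation 50): a capacity only needs to cover the upper tail, so a one-sided bound at mean + 3σ actually covers slightly more than 99.7% of samples. A minimal check using Python's `statistics` module (the variable names here are illustrative, not from the original answer):

```python
from statistics import NormalDist

mean, sigma = 500, 50
capacity = mean + 3 * sigma          # one-sided 3-sigma bound
coverage = NormalDist(mean, sigma).cdf(capacity)

print(capacity)               # 650
print(round(coverage, 4))     # ~0.9987: a list sized to 650 rarely has to grow
```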

遥远的绿洲 2024-12-17 09:32:46


I've done a little research and it seems that there is a "right" answer to this question.

First of all I agree that this can be premature optimisation, so profiling before deciding to switch is essential.

[Graph: memory wasted by initial capacity, for various standard deviations.]

The graph above was generated in Excel, using a normal distribution, and testing the space overused by various initial list capacities, using 10,000 samples and a mean of 10,000. As you can see, it has several interesting features.

  1. For low standard deviations, picking a bad initial capacity can waste up to eight times the space of the best choice.
  2. For high standard deviations relative to the mean, less savings are possible.
  3. Troughs, corresponding to the lowest memory wastage, occur at points dependent on the standard deviation.
  4. It is better to choose a value from the right half of the graph to avoid list reallocations.
  5. I couldn't find an exact formula for the minimum wastage, but mean + 1.75 x standard deviation seems to be the best choice based on this analysis.

Caveat: YMMV with other distributions, means, etc.
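The analysis above can be reproduced with a short Monte Carlo sketch. This is not the answerer's original Excel workbook; it is a reconstruction in Python that assumes `List<T>`-style growth (capacity doubles when exhausted, starting from 4 for an empty list):

```python
import random

def final_capacity(initial, n):
    """Capacity after appending n items to a list that starts at
    `initial` and doubles whenever it fills (List<T>-style growth)."""
    cap = initial
    while cap < n:
        cap = cap * 2 if cap else 4
    return cap

def mean_waste(initial, mean, sigma, samples=10_000, seed=1):
    """Average unused slots (capacity - count) over normally
    distributed final sizes."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        n = max(0, round(rng.gauss(mean, sigma)))
        total += final_capacity(initial, n) - n
    return total / samples

# Presizing past most of the upper tail wastes far less than
# presizing at the mean, where roughly half the samples trigger a doubling.
print(mean_waste(500, 500, 50))   # about half the lists double to 1000
print(mean_waste(600, 500, 50))   # mean + 2 sigma: doublings are rare
```

Sweeping `initial` over a range and plotting `mean_waste` should reproduce the troughs described above; the mean + 1.75 × standard deviation figure is the original answerer's empirical result, not something this sketch derives analytically.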

波浪屿的海角声 2024-12-17 09:32:46


There's no right answer. It's going to be a tradeoff between memory usage and CPU. The larger you initialize the list, the more memory you're probably wasting, but you're saving CPU since the list doesn't have to be resized again later.
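To make the CPU side of that tradeoff concrete, here is a small sketch (in Python, modelling `List<T>`'s doubling growth, which starts at a capacity of 4 for a default-constructed list) that counts reallocation-and-copy events:

```python
def growth_events(initial, n):
    """Count capacity doublings (each one is a reallocation plus an
    array copy) while appending n items."""
    cap, events = initial, 0
    while cap < n:
        cap = cap * 2 if cap else 4   # 0 -> 4 -> 8 -> 16 -> ...
        events += 1
    return events

print(growth_events(0, 500))    # 8 reallocations with the default constructor
print(growth_events(650, 500))  # 0 when presized past the expected tail
```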
