Calculating the frequency distribution of a collection with .Net/C#
Is there a fast/simple way to calculate the frequency distribution of a .Net collection using Linq or otherwise?
For example: An arbitrarily long List contains many repetitions. What's a clever way of walking the list and counting/tracking repetitions?
Comments (3)
The simplest way to find duplicate items in a list is to group it by value and keep only the groups that contain more than one item.
(Writing `Skip(1).Any()` should be faster than `Count() > 1` because it won't have to traverse more than two items from each group. However, the difference is probably negligible unless `list`'s enumerator is slow.)
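The grouping approach described above can be sketched as follows. This is a minimal illustration rather than the answer's original snippet: the class name `FrequencyExamples` and the method names `Duplicates` and `Frequencies` are made up for the example, and only standard LINQ operators (`GroupBy`, `Where`, `Select`, `ToDictionary`) are used.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; the names are illustrative only.
public static class FrequencyExamples
{
    // Values that occur more than once. Skip(1).Any() only needs to look
    // at the first two elements of each group to answer the question.
    public static List<T> Duplicates<T>(IEnumerable<T> list) =>
        list.GroupBy(item => item)
            .Where(group => group.Skip(1).Any())
            .Select(group => group.Key)
            .ToList();

    // Full frequency distribution: each distinct value mapped to its count.
    public static Dictionary<T, int> Frequencies<T>(IEnumerable<T> list) =>
        list.GroupBy(item => item)
            .ToDictionary(group => group.Key, group => group.Count());
}
```

For a list such as `a, b, a, c, b, a`, `Duplicates` would yield `a` and `b`, while `Frequencies` would map `a` to 3, `b` to 2 and `c` to 1. Keys rely on the default equality comparer, so custom types need sensible `Equals`/`GetHashCode` implementations.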
The easiest way is to use a hashmap. Either use each value as the key and increment its count every time the value occurs, or pick a bucket size (bucket 1 = 1 - 10, bucket 2 = 11 - 20, etc.) and increment the count of the bucket that each value falls into.
Then you can go through the map and determine the frequencies.
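Both variants of the hashmap idea might look roughly like this in C#, using `Dictionary<TKey, TValue>` as the hashmap. The class and method names (`Histogram`, `CountByValue`, `CountByBucket`) are invented for the sketch, and the bucket arithmetic assumes integer values of 1 or greater, matching the 1 - 10, 11 - 20 ranges in the answer.

```csharp
using System.Collections.Generic;

// Hypothetical helper class; the names are illustrative only.
public static class Histogram
{
    // One counter per distinct value.
    public static Dictionary<T, int> CountByValue<T>(IEnumerable<T> items)
    {
        var counts = new Dictionary<T, int>();
        foreach (var item in items)
        {
            counts.TryGetValue(item, out var seen); // seen is 0 for a new key
            counts[item] = seen + 1;
        }
        return counts;
    }

    // One counter per bucket. With bucketSize = 10, bucket 1 covers 1..10,
    // bucket 2 covers 11..20, and so on. Assumes every value is >= 1.
    public static Dictionary<int, int> CountByBucket(IEnumerable<int> values, int bucketSize)
    {
        var counts = new Dictionary<int, int>();
        foreach (var value in values)
        {
            int bucket = (value - 1) / bucketSize + 1;
            counts.TryGetValue(bucket, out var seen);
            counts[bucket] = seen + 1;
        }
        return counts;
    }
}
```

Dividing a count by the total number of items turns it into a relative frequency, if that is what the distribution should report.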
The C5 generic collections library has a `HashBag` implementation that accepts duplicates by counting. Adding the items of your list to a `HashBag<K>` (where `K` is the type of the items in your list) and reading back the multiplicities gets you what you're looking for: the result, `mults`, will then contain an `IDictionary<K,int>` where the list item is the key and the multiplicity is the value.
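A rough sketch of the bag approach is shown below. It is not the answer's original pseudo-code, and it assumes the C5 package is referenced and that `HashBag<T>` exposes `AddAll` and `ItemMultiplicities()` as I recall from the C5 documentation. In the versions I am aware of, `ItemMultiplicities()` returns a collection of key/value pairs rather than a dictionary as such, so the sketch copies the pairs into a `Dictionary<K,int>`. Check the exact signatures against the C5 version you use. The names `BagExample` and `Multiplicities` are made up for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; requires a reference to the C5 library.
public static class BagExample
{
    public static IDictionary<K, int> Multiplicities<K>(IEnumerable<K> list)
    {
        // C5 is fully qualified to avoid name clashes with
        // System.Collections.Generic.
        var bag = new C5.HashBag<K>();
        bag.AddAll(list); // the bag stores each distinct item with a count

        // Assumed API: each pair holds the item (Key) and its count (Value).
        return bag.ItemMultiplicities()
                  .ToDictionary(pair => pair.Key, pair => pair.Value);
    }
}
```

If only a single count is needed, the bag's `ContainsCount(item)` method should return the multiplicity of one item without building the dictionary.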