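How do I count the occurrences of unique values in a Dictionary?
I have a Dictionary with doubles as values and strings as keys.
I want to count the occurrences of each value in this Dictionary, and I also want to know the value itself (for instance, which value is repeated).
For example:
key1, 2
key2, 2
key3, 3
key4, 2
key5, 5
key6, 5
I want to get a list:
2 - 3 (times)
3 - 1 (once)
5 - 2 (twice)
How can I do it?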
The first thing to note is that you don't actually care about the keys of the dictionary. Step one, therefore, is to ignore them as irrelevant to the task in hand. We're going to work with the Values property of the dictionary, and the work is much the same as for any other collection of integers (or indeed any other enumerable of any other type we can compare for equality).
There are two common approaches to this problem, both of which are well worth knowing.
The first uses another dictionary to hold the count of values:
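A minimal sketch of this first approach, assuming int values as in the sample data (the class name ValueCounter and the method name CountValues are illustrative and do not come from the original post):

```csharp
using System.Collections.Generic;

public static class ValueCounter
{
    // Counts how often each value occurs, in a single pass over dict.Values.
    public static Dictionary<int, int> CountValues(Dictionary<string, int> dict)
    {
        var valCount = new Dictionary<int, int>();
        foreach (int value in dict.Values)
        {
            // TryGetValue leaves 'current' at 0 when the value has not been seen yet.
            valCount.TryGetValue(value, out int current);
            valCount[value] = current + 1;
        }
        return valCount;
    }
}
```

For the dictionary in the question this yields 2 -> 3, 3 -> 1 and 5 -> 2.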
Hopefully this is pretty straightforward. Another approach is more complicated but has some pluses:
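A sketch of what this second approach might look like, again assuming int values; the variable name grp and the explicit IEnumerable<IGrouping<int, int>> type follow the surrounding text, while the class and method names are hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupedCounter
{
    public static Dictionary<int, int> CountValues(Dictionary<string, int> dict)
    {
        // Describes the grouping of equal values; GroupBy is deferred,
        // so the work happens once the loop below starts enumerating.
        IEnumerable<IGrouping<int, int>> grp = dict.Values.GroupBy(x => x);

        var valCount = new Dictionary<int, int>();
        foreach (IGrouping<int, int> g in grp)
        {
            // Each group exposes the shared value as Key; Count() walks its members.
            valCount.Add(g.Key, g.Count());
        }
        return valCount;
    }
}
```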
(We'd probably use var rather than the verbose IEnumerable<IGrouping<int, int>>, but it's worth being precise when explaining code.)
In a straight comparison, this version is inferior: both more complicated to understand and less efficient. However, learning this approach allows for some concise and efficient variants of the same technique, so it's worth examining.
GroupBy() takes an enumeration and creates another enumeration that contains key-value pairs where the value is an enumeration too. The lambda x => x means that what it is grouped by is itself, but we have the flexibility for different grouping rules than that. The contents of grp look a bit like a sequence of groups, each with a Key and the collection of values that share it; for the question's sample data that is key 2 with {2, 2, 2}, key 3 with {3}, and key 5 with {5, 5}.
So, if we loop through this and for each group we pull out the Key and call Count() on the group, we get the results we want.
Now, in the first case we built up our count in a single O(n) pass, while here we build up the groups in an O(n) pass and then obtain the counts in a second O(n) pass, making it less efficient: still O(n) overall, but with extra work and memory for the intermediate groups. It's also a bit harder to understand, so why bother mentioning it?
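To make the shape of the grouping concrete, here is a small sketch that renders each group as text. GroupInspector and Describe are hypothetical names introduced only for this illustration:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupInspector
{
    // Renders each group as "key: [members]" so the structure GroupBy produces is visible.
    public static List<string> Describe(IEnumerable<int> values)
    {
        var lines = new List<string>();
        foreach (IGrouping<int, int> g in values.GroupBy(x => x))
        {
            // A group is itself an enumeration of the values that matched its Key.
            lines.Add(g.Key + ": [" + string.Join(", ", g) + "]");
        }
        return lines;
    }
}
```

Calling Describe(new[] { 2, 2, 3, 2, 5, 5 }) gives "2: [2, 2, 2]", "3: [3]" and "5: [5, 5]", with groups in order of first appearance.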
Well, the first reason is that once we do understand it we can fold the separate grouping and looping lines into a single chained expression, which is quite concise, and becomes idiomatic. It's especially nice if we want to then go on and do something more complicated with the value-count pairs, as we can chain this into another operation.
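One way the chained form might look, as a sketch rather than the answer's original listing. ChainedCounter, CountValues and Repeated are assumed names, and the "repeated values" follow-up is just one example of chaining a further operation:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ChainedCounter
{
    // Lazily yields (value, count) pairs; nothing runs until the result is enumerated.
    public static IEnumerable<KeyValuePair<int, int>> CountValues(IEnumerable<int> values)
    {
        return values
            .GroupBy(x => x)
            .Select(g => new KeyValuePair<int, int>(g.Key, g.Count()));
    }

    // Chains a further operation: keep only the values that occur more than once.
    public static List<int> Repeated(IEnumerable<int> values)
    {
        return CountValues(values)
            .Where(pair => pair.Value > 1)
            .Select(pair => pair.Key)
            .ToList();
    }
}
```

With the question's values, Repeated returns 2 and 5.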
The version that puts the results into a dictionary can be even more concise still:
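The one-liner quoted later in this answer, wrapped in a method so it can be run as is. The wrapper names OneLineCounter and CountValues are illustrative; the expression itself is the one the answer cites, with ToDictionary spelled correctly:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class OneLineCounter
{
    public static Dictionary<int, int> CountValues(Dictionary<string, int> dict)
    {
        // Group equal values, then map each group's Key to the number of members.
        return dict.Values.GroupBy(x => x).ToDictionary(g => g.Key, g => g.Count());
    }
}
```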
There, your whole question answered in one short line, rather than the 6 (cutting out comments) for the first version.
(Some might prefer to replace dict.Values.GroupBy(x => x) with dict.GroupBy(x => x.Value), which will have exactly the same results once we run Count() on it. If you aren't immediately sure why, try to work it out.)
The other advantage is that we have more flexibility with GroupBy in other cases. For these reasons, people who are used to using GroupBy are quite likely to start off with the one-line concision of dict.Values.GroupBy(x => x).ToDictionary(g => g.Key, g => g.Count()); and then change to the more verbose but more efficient form of the first version (where we increment running totals in the new dictionary) if it proved a performance hotspot.
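For readers who want to check the parenthetical claim about dict.GroupBy(x => x.Value), a short sketch follows. GroupByPairs and CountByPairValue are made-up names used only here:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupByPairs
{
    public static Dictionary<int, int> CountByPairValue(Dictionary<string, int> dict)
    {
        // Each group now holds KeyValuePair<string, int> items instead of bare ints,
        // but Count() only counts members, so the totals are unchanged.
        return dict.GroupBy(x => x.Value).ToDictionary(g => g.Key, g => g.Count());
    }
}
```

The difference only matters if the group members themselves are needed later, since this form keeps the original keys available in each group.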
Even simpler would be:
(Yes, it's in VB.NET, but you shouldn't have much trouble converting it to C# :-) )