Lambda expression to find difference

Posted 2024-09-07 14:25:31

With the following data

string[] data = { "a", "a", "b" };

I'd very much like to find the duplicates and get this result:

a

I tried the following code:

var a = data.Distinct().ToList();
var b = a.Except(a).ToList();

Obviously this didn't work. I can see what is happening above, but I'm not sure how to fix it.

忱杏 2024-09-14 14:25:31

When runtime is not a concern, you could use

var duplicates = data.Where(s => data.Count(t => t == s) > 1).Distinct().ToList();

Good old O(n^2) =)

Edit: Now for a better solution. =)
If you define a new extension method like

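using System.Collections.Generic;

static class Extensions
{
    // Yields every item that was already seen earlier in the sequence,
    // so an item that occurs k times is yielded k - 1 times.
    public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> input)
    {
        HashSet<T> seen = new HashSet<T>();
        foreach (T item in input)
        {
            // HashSet<T>.Add returns false when the item is already present.
            if (!seen.Add(item))
            {
                yield return item;
            }
        }
    }
}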

you can use

var duplicates = data.Duplicates().Distinct().ToArray();
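
As a small illustration of why the call above ends with Distinct(): the extension method yields an item every time it reappears, so a value that occurs three times comes out twice. The sketch below is meant to be compiled on its own and repeats the extension method for that reason; the DuplicateExtensions and DuplicatesDemo classes and the Run method are hypothetical names used only for this example.

```csharp
using System.Collections.Generic;
using System.Linq;

static class DuplicateExtensions
{
    // Same idea as the Duplicates method above, repeated so this sketch is self-contained.
    public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> input)
    {
        var seen = new HashSet<T>();
        foreach (T item in input)
        {
            if (!seen.Add(item)) yield return item;
        }
    }
}

static class DuplicatesDemo
{
    // Hypothetical helper: returns the raw and the de-duplicated result for a sample array.
    public static (string[] Raw, string[] Unique) Run()
    {
        string[] data = { "a", "a", "a", "b" };
        string[] raw = data.Duplicates().ToArray();               // "a", "a"
        string[] unique = data.Duplicates().Distinct().ToArray(); // "a"
        return (raw, unique);
    }
}
```

Without Distinct(), the result grows with the number of repetitions, which is usually not what you want when listing duplicate values.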

林空鹿饮溪 2024-09-14 14:25:31

Use group by; the performance of these methods is reasonably good. The only concern is the memory overhead if you are working with large data sets.

var duplicates = from g in (from x in data group x by x)
                 where g.Count() > 1
                 select g.Key;

Or, if you prefer extension methods:

var duplicates = data.GroupBy(x => x)
                     .Where(g => g.Count() > 1)
                     .Select(g => g.Key);

Groups where Count() == 1 are the items that occur exactly once, and groups where Count() > 1 are the items that have one or more duplicates.

Since LINQ queries are evaluated lazily (deferred execution), you can materialize the grouping once with ToList() if you don't want it recomputed for every query:

var g = (from x in data group x by x).ToList(); // grouping result, computed once
// duplicates
var duplicates = from x in g
                 where x.Count() > 1
                 select x.Key;
// items that occur exactly once
var unique = from x in g
             where x.Count() == 1
             select x.Key;
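
To illustrate the deferred-execution point, here is a minimal sketch (the DeferredDemo class and Run method are hypothetical names). A query without ToList() is re-run each time it is enumerated, so it sees later changes to the source, while a ToList() snapshot does not.

```csharp
using System.Collections.Generic;
using System.Linq;

static class DeferredDemo
{
    public static (int Lazy, int Snapshot) Run()
    {
        var data = new List<string> { "a", "a", "b" };

        // Deferred: nothing is grouped until the query is enumerated.
        IEnumerable<string> lazy = data.GroupBy(x => x)
                                       .Where(g => g.Count() > 1)
                                       .Select(g => g.Key);

        // Materialized: grouping and filtering run once, right here.
        List<string> snapshot = lazy.ToList();

        data.Add("b"); // "b" is now a duplicate as well

        // lazy.Count() re-runs the whole query against the modified list.
        return (lazy.Count(), snapshot.Count); // (2, 1)
    }
}
```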

When creating the grouping, a set of sets is created. Assuming a hash-based set with O(1) insertion, the running time of the group-by approach is O(n). The constant cost of each operation is somewhat high, but overall it should come close to linear performance.
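
The near-linear claim can be made concrete with a hand-rolled equivalent. This is only a sketch of the same idea, not how GroupBy is actually implemented; CountingDemo and FindDuplicates are hypothetical names, and the input is assumed to contain no null elements. It makes one pass to count occurrences in a hash-based dictionary, then one pass over the counts.

```csharp
using System.Collections.Generic;

static class CountingDemo
{
    // Assumes input contains no null elements (Dictionary keys cannot be null).
    public static List<T> FindDuplicates<T>(IEnumerable<T> input)
    {
        var counts = new Dictionary<T, int>();

        // Pass 1: count occurrences; each lookup and insert is O(1) on average.
        foreach (T item in input)
        {
            counts.TryGetValue(item, out int n); // n is 0 when the item is not present yet
            counts[item] = n + 1;
        }

        // Pass 2: keep the keys that occurred more than once.
        var duplicates = new List<T>();
        foreach (KeyValuePair<T, int> pair in counts)
        {
            if (pair.Value > 1) duplicates.Add(pair.Key);
        }
        return duplicates;
    }
}
```

Calling CountingDemo.FindDuplicates(data) on the array from the question returns a list containing only "a".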
