Lambda expression to find the difference

Posted on 2024-09-07 14:25:31

With the following data:

string[] data = { "a", "a", "b" };

I'd very much like to find duplicates and get this result:

a

I tried the following code:

var a = data.Distinct().ToList();
var b = a.Except(a).ToList();

Obviously this doesn't work. I can see what is happening above, but I'm not sure how to fix it.

Comments (3)

忱杏 2024-09-14 14:25:31

When running time is not a concern, you could use:

var duplicates = data.Where(s => data.Count(t => t == s) > 1).Distinct().ToList();

Good old O(n^2) =)

Edit: Now for a better solution. =)
If you define a new extension method like

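static class Extensions
{
    // Yields every item that was already seen earlier in the sequence.
    // An item occurring k times is yielded k - 1 times, hence the Distinct() at the call site.
    public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> input)
    {
        HashSet<T> hash = new HashSet<T>();
        foreach (T item in input)
        {
            // HashSet<T>.Add returns false when the item is already in the set.
            if (!hash.Add(item))
            {
                yield return item;
            }
        }
    }
}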

you can use

var duplicates = data.Duplicates().Distinct().ToArray();
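
If the trailing Distinct() is unwanted, a variation on the extension method above could remember which duplicates it has already reported. This is only a sketch: the name DistinctDuplicates and the class DuplicateExtensions are made up for illustration and do not come from the answer.

```csharp
using System;
using System.Collections.Generic;

static class DuplicateExtensions
{
    // Hypothetical helper: yields each duplicated value exactly once,
    // at the point where its second occurrence is seen.
    public static IEnumerable<T> DistinctDuplicates<T>(this IEnumerable<T> input)
    {
        var seen = new HashSet<T>();      // every value encountered so far
        var reported = new HashSet<T>();  // values already yielded as duplicates
        foreach (T item in input)
        {
            // Add returns false when the value is already present.
            if (!seen.Add(item) && reported.Add(item))
            {
                yield return item;
            }
        }
    }
}

static class Program
{
    static void Main()
    {
        string[] data = { "a", "a", "b" };
        foreach (string s in data.DistinctDuplicates())
        {
            Console.WriteLine(s); // prints "a"
        }
    }
}
```

This stays a single pass over the input, at the cost of a second hash set that holds only the duplicated values.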

林空鹿饮溪 2024-09-14 14:25:31

Use grouping; these methods perform reasonably well. The only concern is the memory overhead if you are working with large data sets.

var duplicates =
    from g in (from x in data group x by x)
    where g.Count() > 1
    select g.Key;

Or, if you prefer extension methods:

var duplicates = data.GroupBy(x => x)
    .Where(x => x.Count() > 1)
    .Select(x => x.Key);

Groups where Count() == 1 are the items that occur exactly once, and groups where Count() > 1 are the items that have one or more duplicates.

Since LINQ queries are lazy (execution is deferred until you enumerate them), you can materialize the grouping once if you don't want to reevaluate the computation:

var g = (from x in data group x by x).ToList(); // grouping result, computed once
// items that have duplicates
var duplicates =
    from x in g
    where x.Count() > 1
    select x.Key;
// items that occur exactly once
var unique =
    from x in g
    where x.Count() == 1
    select x.Key;

Creating the grouping builds a set of sets. Assuming it's a hash-based set with O(1) insertion, the running time of the group-by approach is O(n). The constant cost of each operation is somewhat high, but overall it should come out near linear.
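
To make the linear-time argument concrete, here is a minimal sketch that counts occurrences with a hash table in one pass. CountOccurrences is a hypothetical helper, not part of LINQ, and this is not how GroupBy is implemented internally; it only illustrates the same hashing idea. One difference to be aware of: Dictionary rejects null keys, so this sketch assumes the input contains none.

```csharp
using System;
using System.Collections.Generic;

static class CountingDemo
{
    // Hypothetical helper: one pass over the input, expected O(1) work per item.
    // Assumes no null items, because Dictionary does not accept null keys.
    public static Dictionary<T, int> CountOccurrences<T>(IEnumerable<T> input)
    {
        var counts = new Dictionary<T, int>();
        foreach (T item in input)
        {
            counts.TryGetValue(item, out int n); // n is 0 when the key is absent
            counts[item] = n + 1;
        }
        return counts;
    }

    static void Main()
    {
        string[] data = { "a", "a", "b" };
        foreach (KeyValuePair<string, int> pair in CountOccurrences(data))
        {
            if (pair.Value > 1)
            {
                Console.WriteLine(pair.Key); // prints "a"
            }
        }
    }
}
```

Storing only a count per key, rather than a whole group of items, also reduces the memory overhead mentioned above.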

胡大本事 2024-09-14 14:25:31

Sort the data, iterate through it, and remember the last item. When the current item is the same as the last one, it's a duplicate. This can easily be implemented either iteratively or with a lambda expression in O(n*log(n)) time.
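
The sort-and-scan idea could look roughly like the iterative sketch below. The class and method names are invented for illustration, and the sketch assumes string data compared ordinally, as in the question.

```csharp
using System;
using System.Collections.Generic;

static class SortedScan
{
    // Hypothetical helper: sorts a copy of the input, then compares each
    // item with the one before it. Each duplicated value is reported once.
    public static List<string> Duplicates(IEnumerable<string> input)
    {
        var sorted = new List<string>(input);
        sorted.Sort(StringComparer.Ordinal);   // O(n log n)

        var result = new List<string>();
        for (int i = 1; i < sorted.Count; i++)
        {
            bool sameAsPrevious = sorted[i] == sorted[i - 1];
            bool alreadyReported = result.Count > 0
                && result[result.Count - 1] == sorted[i];
            if (sameAsPrevious && !alreadyReported)
            {
                result.Add(sorted[i]);
            }
        }
        return result;
    }

    static void Main()
    {
        string[] data = { "a", "a", "b" };
        foreach (string s in Duplicates(data))
        {
            Console.WriteLine(s); // prints "a"
        }
    }
}
```

Sorting a copy leaves the caller's array untouched, and the scan itself needs no hash set, only the extra O(n) copy.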
