How do I find the subset of items that partially differ between two data sets?

Posted 2024-07-14 14:42:55

I am trying to get the subset of items in dataA that also appear in dataB but have a different value of property c. The properties a and b can be used as an index, so I first filter down to the pairs that match on a and b, then check whether each pair has a different c value.

This is the LINQ expression I came up with. It does work, but it seems like there has to be a better/faster way of finding this subset.

var itemsInBoth = from item in dataA
                  from item2 in dataB
                  where item.a == item2.a && item.b == item2.b
                  select new
                  {
                      first = item,
                      second = item2
                  };
var haveDifferentC = from item in itemsInBoth
                     where item.first.c != item.second.c
                     select item.first;

忘东忘西忘不掉你 2024-07-21 14:42:55

Based on the answer provided by David B, I eventually settled on a slightly modified version of his method. Although the differences are minor, I thought I would share it, primarily to show a version for those who (like me) prefer the query expression syntax.

Also, instead of grouping, I decided to use an anonymous key/value pair to simplify the structure.

// Index each list by a key built from (a, b).
// CreateIndexValue is a helper that is not shown in this answer.
var dictA = (from item in dataA
             select new
             {
                 key = CreateIndexValue(item.a, item.b),
                 value = item
             }).ToDictionary(kv => kv.key, kv => kv.value);
var dictB = (from item in dataB
             select new
             {
                 key = CreateIndexValue(item.a, item.b),
                 value = item
             }).ToDictionary(kv => kv.key, kv => kv.value);
// Pair up the items whose key occurs in both dictionaries.
var filesInBoth = from item in dictA
                  where dictB.ContainsKey(item.Key)
                  select new
                  {
                      itemA = item.Value,
                      itemB = dictB[item.Key]
                  };
// Keep only the dataA items whose c differs from the matching dataB item.
var differentSize = from item in filesInBoth
                    where item.itemA.c != item.itemB.c
                    select item.itemA;
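
`CreateIndexValue` is not defined in the answer above. The following is a minimal sketch, assuming it simply combines `a` and `b` into a value tuple, which compares by value and so works as a dictionary key. The `Item` class, its property types, the `IndexDemo` class, and the sample data are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type; the original post does not show the real one.
public class Item
{
    public int a { get; set; }
    public int b { get; set; }
    public long c { get; set; }
}

public static class IndexDemo
{
    // Assumed implementation: a value tuple has structural equality and
    // hashing, so two items with equal (a, b) produce equal keys.
    public static (int, int) CreateIndexValue(int a, int b) => (a, b);

    // Returns the items of dataA whose (a, b) key also occurs in dataB
    // with a different c. Assumes keys are unique within dataB;
    // ToDictionary throws ArgumentException on a duplicate key.
    public static List<Item> DifferentC(IEnumerable<Item> dataA, IEnumerable<Item> dataB)
    {
        var dictB = dataB.ToDictionary(item => CreateIndexValue(item.a, item.b));

        var query = from itemA in dataA
                    let key = CreateIndexValue(itemA.a, itemA.b)
                    where dictB.ContainsKey(key) && dictB[key].c != itemA.c
                    select itemA;
        return query.ToList();
    }

    public static void Main()
    {
        var dataA = new List<Item>
        {
            new Item { a = 1, b = 1, c = 10 },
            new Item { a = 1, b = 2, c = 20 },
            new Item { a = 2, b = 1, c = 30 },
        };
        var dataB = new List<Item>
        {
            new Item { a = 1, b = 1, c = 10 },  // same key, same c
            new Item { a = 1, b = 2, c = 99 },  // same key, different c
            new Item { a = 3, b = 3, c = 30 },  // key not present in dataA
        };

        // Prints "1,2,20": the only dataA item matched in dataB with a different c.
        foreach (var item in DifferentC(dataA, dataB))
            Console.WriteLine($"{item.a},{item.b},{item.c}");
    }
}
```

In this sketch only dataB is indexed, because dataA is simply enumerated once; building a second dictionary for dataA is not required to get the same result.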

韵柒 2024-07-21 14:42:55

Faster? What you have there is O(n^2): for each item in the first list, the query iterates the entire second list. You need to remove the redundant iteration in that join. One way to do that is to use another structure that provides O(1) lookups for matches.

Here's some untested (unchecked) code:

var dictionaryA = dataA
  .GroupBy(item => new {a = item.a, b = item.b})
  .ToDictionary(g => g.Key, g => g.ToList());

var dictionaryB = dataB
  .GroupBy(item => new {a = item.a, b = item.b})
  .ToDictionary(g => g.Key, g => g.ToList());

var results = dictionaryA
  .Where(g1 => dictionaryB.ContainsKey(g1.Key))
  // g1 is a KeyValuePair here, so take its Value (the list of items)
  .Select(g1 => new {g1 = g1.Value, g2 = dictionaryB[g1.Key]})
  .SelectMany(pair =>
    pair.g1.SelectMany(item1 =>
      pair.g2
      .Where(item2 => item2.c != item1.c)
      .Select(item2 => new {item1, item2})
    )
  );
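
As an alternative sketch of the same grouping idea, `Enumerable.ToLookup` builds the grouped hash structure in one call, and indexing a lookup with a missing key yields an empty sequence, so no `ContainsKey` check is needed. The `Item` type, the `LookupDemo` class, and the sample data are hypothetical, and a value tuple stands in for the anonymous key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type; the original post does not show the real one.
public class Item
{
    public int a;
    public int b;
    public int c;
}

public static class LookupDemo
{
    // Returns every (item1, item2) pair that shares an (a, b) key
    // but differs in c. Duplicate keys in either list are allowed.
    public static List<(Item item1, Item item2)> DifferentPairs(
        IEnumerable<Item> dataA, IEnumerable<Item> dataB)
    {
        // One pass over dataB; each key maps to all items sharing it.
        ILookup<(int, int), Item> lookupB = dataB.ToLookup(item => (item.a, item.b));

        return dataA
            .SelectMany(item1 => lookupB[(item1.a, item1.b)]  // empty if key is absent
                .Where(item2 => item2.c != item1.c)
                .Select(item2 => (item1, item2)))
            .ToList();
    }

    public static void Main()
    {
        var dataA = new List<Item>
        {
            new Item { a = 1, b = 1, c = 10 },
            new Item { a = 2, b = 2, c = 20 },
        };
        var dataB = new List<Item>
        {
            new Item { a = 1, b = 1, c = 10 },  // same key, same c
            new Item { a = 1, b = 1, c = 11 },  // same key, different c
            new Item { a = 3, b = 3, c = 5 },   // key not present in dataA
        };

        foreach (var pair in DifferentPairs(dataA, dataB))
            Console.WriteLine($"{pair.item1.c} vs {pair.item2.c}");
    }
}
```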

Here's a simplified version if the (a, b) pairs are unique in each list (ToDictionary throws on a duplicate key).

var dictionaryA = dataA
  .ToDictionary(item => new {a = item.a, b = item.b}, item => item);

var dictionaryB = dataB
  .ToDictionary(item => new {a = item.a, b = item.b}, item => item);

var results = dictionaryA
  .Where(e1 => dictionaryB.ContainsKey(e1.Key))
  .Select(e1 => new {i1 = e1.Value, i2 = dictionaryB[e1.Key]})
  .Where(pair => pair.i1.c != pair.i2.c);
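
The same result can also be sketched in query syntax with a `join` clause on a composite anonymous key. In LINQ to Objects, `join` builds a hash-based lookup over the second sequence, so this avoids the nested iteration as well. This is not the code from the answer above; the `Item` type, the `JoinDemo` class, and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type; the original post does not show the real one.
public class Item
{
    public int a;
    public int b;
    public int c;
}

public static class JoinDemo
{
    // Returns the items of dataA that have a match in dataB on (a, b)
    // with a different c. If dataB contains several matches for one key,
    // the same dataA item is returned once per differing match.
    public static List<Item> DifferentC(IEnumerable<Item> dataA, IEnumerable<Item> dataB)
    {
        var query = from itemA in dataA
                    join itemB in dataB
                        // Both sides must produce the same anonymous type:
                        // same member names, types, and order.
                        on new { itemA.a, itemA.b } equals new { itemB.a, itemB.b }
                    where itemA.c != itemB.c
                    select itemA;
        return query.ToList();
    }

    public static void Main()
    {
        var dataA = new List<Item>
        {
            new Item { a = 1, b = 1, c = 10 },
            new Item { a = 1, b = 2, c = 20 },
        };
        var dataB = new List<Item>
        {
            new Item { a = 1, b = 1, c = 10 },  // same key, same c
            new Item { a = 1, b = 2, c = 99 },  // same key, different c
        };

        // Prints "1,2,20".
        foreach (var item in DifferentC(dataA, dataB))
            Console.WriteLine($"{item.a},{item.b},{item.c}");
    }
}
```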
