Comparing two lists that contain a large number of objects

Posted 2024-11-07 07:47:26

I need to compare two lists, each containing about 60,000 objects. What would be the most efficient way of doing this? I want to select all the items in the source list that do not exist in the destination list.

I am creating a sync application where C# scans a directory and places the attributes of each file in a list, so there is one list for the source directory and another for the destination directory. Instead of copying all the files, I compare the lists and see which files differ. If both lists contain the same file, I do not copy it. Here is the LINQ query I use. It works when I scan a small folder but not when I scan a large one.

// s.lstFiles is the list of the source files
// d.lstFiles is the list of the files contained in the destination folder
  var q = from a in s.lstFiles
        from b in d.lstFiles
        where
        a.compareName == b.compareName &&
        a.size == b.size &&
        a.dateCreated == b.dateCreated
        select a;

// create a list to hold the items that are the same; later, select the outer join
List<Classes.MyPathInfo.MyFile> tempList = new List<Classes.MyPathInfo.MyFile>();

foreach (Classes.MyPathInfo.MyFile file in q)
{
    tempList.Add(file);
}

I don't know why this query takes forever. There are also other things I could take advantage of. For example, I know that if a source file matches a destination file, there cannot be another duplicate of that file, because two files cannot have the same name and the same path.

峩卟喜欢 2024-11-14 07:47:26

Create an equality comparer for the type, then you can use that to efficiently compare the sets:

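public class MyFileComparer : IEqualityComparer<MyFile> {

  // Two files are equal when name, size and creation date all match.
  public bool Equals(MyFile a, MyFile b) {
    return
      a.compareName == b.compareName &&
      a.size == b.size &&
      a.dateCreated == b.dateCreated;
  }

  // Combine the hash codes of the same fields that Equals compares.
  public int GetHashCode(MyFile a) {
    // unchecked: let the arithmetic wrap instead of throwing in a checked build
    unchecked {
      return
       (a.compareName.GetHashCode() * 251 + a.size.GetHashCode()) * 251 +
        a.dateCreated.GetHashCode();
    }
  }

}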

Now you can use this with methods like Intersect to get all items that exist in both lists, or Except to get all items that exist in one list but not the other:

List<MyFile> tempList =
  s.lstFiles.Intersect(d.lstFiles, new MyFileComparer()).ToList();
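
Since the question asks for the source items that are missing from the destination, Except is probably the closer fit. The following is a minimal sketch, not the asker's actual code: the `MyFile` class here is a hypothetical stand-in (the field types `string`, `long` and `DateTime` are assumptions), and `SyncDemo.FilesToCopy` is a name made up for illustration. Note that Except has set semantics, so duplicate source entries would be collapsed; that should not matter here because the question states duplicates cannot occur.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's MyFile class; field types are assumed.
public class MyFile
{
    public string compareName;
    public long size;
    public DateTime dateCreated;
}

public class MyFileComparer : IEqualityComparer<MyFile>
{
    public bool Equals(MyFile a, MyFile b)
    {
        return a.compareName == b.compareName
            && a.size == b.size
            && a.dateCreated == b.dateCreated;
    }

    public int GetHashCode(MyFile f)
    {
        // Same combination as in the answer; unchecked so overflow wraps.
        unchecked
        {
            return (f.compareName.GetHashCode() * 251 + f.size.GetHashCode()) * 251
                + f.dateCreated.GetHashCode();
        }
    }
}

public static class SyncDemo
{
    // Returns the source files that have no equal entry in the destination list.
    public static List<MyFile> FilesToCopy(List<MyFile> source, List<MyFile> destination)
    {
        return source.Except(destination, new MyFileComparer()).ToList();
    }
}
```

With the question's objects, the call would look like `SyncDemo.FilesToCopy(s.lstFiles, d.lstFiles)`.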

Because these methods can use the hash code to divide the items into buckets, far fewer comparisons are needed than with a join, which has to compare every item in one list to every item in the other. With 60,000 items in each list, the join performs up to 60,000 × 60,000 = 3.6 billion comparisons.
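
To make the bucket idea concrete, here is a rough sketch of the same lookup done by hand with a HashSet. It is an illustration of the principle, not how Intersect or Except are actually implemented. It assumes C# 7 value tuples are available, uses a hypothetical `MyFile` stand-in with assumed field types, and `HashLookupDemo.MissingFromDestination` is an invented name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's MyFile class; field types are assumed.
public class MyFile
{
    public string compareName;
    public long size;
    public DateTime dateCreated;
}

public static class HashLookupDemo
{
    public static List<MyFile> MissingFromDestination(
        IEnumerable<MyFile> source, IEnumerable<MyFile> destination)
    {
        // One pass over the destination: store a key per file.
        // Value tuples provide field-wise Equals and GetHashCode.
        var seen = new HashSet<(string, long, DateTime)>();
        foreach (var f in destination)
        {
            seen.Add((f.compareName, f.size, f.dateCreated));
        }

        // One pass over the source: each Contains is a hash lookup,
        // not a scan of the whole destination list.
        var missing = new List<MyFile>();
        foreach (var f in source)
        {
            if (!seen.Contains((f.compareName, f.size, f.dateCreated)))
            {
                missing.Add(f);
            }
        }
        return missing;
    }
}
```

Each list is walked once, so the work grows roughly with the combined size of the two lists rather than with their product.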
