Comparing two lists that contain a large number of objects
I need to compare two lists, each containing about 60,000 objects. What would be the most efficient way of doing this? I want to select all the items in the source list that do not exist in the destination list.
I am creating a sync application in which a C# program scans a directory and places the attributes of each file in a list. Therefore there is one list for the source directory and another for the destination directory. Then, instead of copying all the files, I will just compare the lists and see which files are different. If both lists contain the same file, I will not copy that file. Here is the LINQ query I use; it works when I scan a small folder, but not when I scan a large one.
// s.lstFiles is the list of the source files
// d.lstFiles is the list of the files contained in the destination folder
// The two from clauses pair every source file with every destination file (a cross join)
var q = from a in s.lstFiles
        from b in d.lstFiles
        where
            a.compareName == b.compareName &&
            a.size == b.size &&
            a.dateCreated == b.dateCreated
        select a;

// create a list to hold the items that are the same; later select the outer join
List<Classes.MyPathInfo.MyFile> tempList = new List<Classes.MyPathInfo.MyFile>();
foreach (Classes.MyPathInfo.MyFile file in q)
{
    tempList.Add(file);
}
I don't know why this query takes forever. There are also other properties I can take advantage of. For example, I know that if a source file matches a destination file, there cannot be another duplicate of that file, because two files cannot have the same name and the same path.
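The query is slow because the two from clauses form a cross join: with about 60,000 items in each list, the where clause is evaluated 60,000 × 60,000 = 3.6 × 10^9 times. The uniqueness property mentioned above allows a cheaper approach: index the destination list by name in a Dictionary, so each source file needs only one hash lookup. This is a minimal sketch, assuming a hypothetical MyFile stand-in for Classes.MyPathInfo.MyFile with a string compareName, a long size and a DateTime dateCreated (the real types may differ), and assuming compareName is unique within each list, for example a path relative to the scanned root.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Classes.MyPathInfo.MyFile; the field types are assumptions.
public class MyFile
{
    public string compareName;   // assumed unique within one scan, e.g. a relative path
    public long size;
    public DateTime dateCreated;
}

public static class SyncDiff
{
    // Returns the source files that have no identical file (same name, size and
    // creation date) in the destination, i.e. the files that need copying.
    public static List<MyFile> FilesToCopy(List<MyFile> source, List<MyFile> destination)
    {
        // Index the destination once; each lookup is then an average O(1) hash probe.
        // Ordinal matches the case-sensitive == used in the original query.
        var byName = new Dictionary<string, MyFile>(destination.Count, StringComparer.Ordinal);
        foreach (var f in destination)
        {
            byName[f.compareName] = f; // names are assumed unique, so nothing is overwritten
        }

        var result = new List<MyFile>();
        foreach (var s in source)
        {
            // Copy when the name is absent from the destination or size/date differ.
            if (!byName.TryGetValue(s.compareName, out var match) ||
                match.size != s.size ||
                match.dateCreated != s.dateCreated)
            {
                result.Add(s);
            }
        }
        return result;
    }
}
```

On Windows, where file names are case-insensitive, StringComparer.OrdinalIgnoreCase may be a better fit than StringComparer.Ordinal.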
Answers (3)
Create an equality comparer for the type; you can then use it to compare the sets efficiently.
Now you can use this with methods like Intersect, to get all items that exist in both lists, or Except, to get all items that exist in one list but not the other. Because these methods can use the hash code to divide the items into buckets, far fewer comparisons are needed than with a join, which has to compare every item in one list with every item in the other.
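A minimal sketch of such a comparer, assuming it should match on the same three fields as the where clause in the question (compareName, size, dateCreated). MyFile, MyFileComparer and FileSets are hypothetical names and the field types are assumptions. HashCode.Combine requires .NET Core 2.1 or later; on the .NET Framework the hash would have to be combined by hand.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Classes.MyPathInfo.MyFile; the field types are assumptions.
public class MyFile
{
    public string compareName;
    public long size;
    public DateTime dateCreated;
}

// Treats two MyFile objects as equal when name, size and creation date all match.
public class MyFileComparer : IEqualityComparer<MyFile>
{
    public bool Equals(MyFile x, MyFile y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.compareName == y.compareName
            && x.size == y.size
            && x.dateCreated == y.dateCreated;
    }

    // Must agree with Equals: equal objects have to produce equal hash codes.
    public int GetHashCode(MyFile obj) =>
        HashCode.Combine(obj.compareName, obj.size, obj.dateCreated);
}

public static class FileSets
{
    // Source files with no equal file in the destination (set difference).
    public static List<MyFile> FilesToCopy(IEnumerable<MyFile> source, IEnumerable<MyFile> destination) =>
        source.Except(destination, new MyFileComparer()).ToList();

    // Files that are present, unchanged, in both folders (set intersection).
    public static List<MyFile> UnchangedFiles(IEnumerable<MyFile> source, IEnumerable<MyFile> destination) =>
        source.Intersect(destination, new MyFileComparer()).ToList();
}
```

In the .NET implementation, Except and Intersect first build a hash set from the second sequence, so the work grows roughly with the sum of the list sizes rather than their product. Both methods return distinct elements only, which is harmless here because file names are unique.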
LINQ has an Except() method for this purpose. You can just use a.Except(b);
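One caveat: without a comparer argument, Except uses EqualityComparer<T>.Default. For an ordinary class that does not override Equals and GetHashCode, that is reference equality, so objects created by two separate directory scans never compare equal and a.Except(b) would return every item of a. One way around this, sketched here under the assumption of C# 9 or later, is to project each file into a record, which compares by value. FileKey and ExceptDemo are hypothetical names, and the long/DateTime property types are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A positional record gets compiler-generated Equals/GetHashCode that compare
// every property, so two FileKey instances with the same values are equal.
public record FileKey(string CompareName, long Size, DateTime DateCreated);

public static class ExceptDemo
{
    // Except with no comparer uses EqualityComparer<FileKey>.Default,
    // which for a record means value equality.
    public static List<FileKey> Missing(IEnumerable<FileKey> source, IEnumerable<FileKey> destination) =>
        source.Except(destination).ToList();
}
```

The keys could be produced with a projection such as s.lstFiles.Select(f => new FileKey(f.compareName, f.size, f.dateCreated)), provided the field types match.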
Use Except() and read more about set operations with LINQ and set operations with HashSet.
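As an illustration of the HashSet route, HashSet<T>.ExceptWith removes, in place, every element that also occurs in another collection. This sketch assumes a file's identity can be captured as a value tuple of name, size and creation date; value tuples compare element by element, so no custom comparer is needed. HashSetDemo and the tuple element names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetDemo
{
    // Returns the keys of source files that do not occur in the destination.
    // Value tuples have structural equality, so the default comparer suffices.
    public static HashSet<(string Name, long Size, DateTime Created)> Missing(
        IEnumerable<(string Name, long Size, DateTime Created)> source,
        IEnumerable<(string Name, long Size, DateTime Created)> destination)
    {
        var result = new HashSet<(string Name, long Size, DateTime Created)>(source);
        result.ExceptWith(destination); // mutates result: drops every key also in destination
        return result;
    }
}
```

Unlike Enumerable.Except, which is lazy and yields a new sequence, ExceptWith modifies the set it is called on.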