Comparing 2 CSV files in C#: any suggestions?

Posted 2024-09-11 16:15:11

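I need to develop an application that compares two CSV files. The first file contains a list of email addresses. The second file also contains email addresses, but includes name and address information as well. The first list holds the email addresses that need to be removed from the second list. I have the Fast CSV Reader from the CodeProject site, and it works well. The application will not have access to a database server. A new file will be generated containing the data that is considered validated, meaning it will not contain any of the information from the first file.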

柳絮泡泡 2024-09-18 16:15:11

If you read both lists into collections, you can use LINQ to determine the subset of addresses to keep.

Here is a quick example class I whipped up for you.

using System;
using System.Linq;
using System.Collections.Generic;

public class RemoveExample
{
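    // Returns the items from sourceList whose Email is not in emailAddressesToRemove.
    // Note: List<string>.Contains is a linear scan and compares case-sensitively.
    public List<Item> RemoveAddresses(List<Item> sourceList, List<string> emailAddressesToRemove)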
    {
        List<Item> newList = (from s in sourceList
                              where !emailAddressesToRemove.Contains(s.Email)
                              select s).ToList();
        return newList;
    }

    public class Item
    {
        public string Email { get; set; }
        public string Name { get; set; }
        public string Address { get; set; }
    }
}

To use it, read your CSV into a List<Item>, then pass it to the method together with the addresses to remove as a List<string>.
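As a rough illustration of that usage, here is a minimal sketch that parses both files into lists and applies the same filter. It is not the Fast CSV Reader API. It makes several assumptions that are not in the question: each row of the second file is laid out as email,name,address with no header row and no quoted commas, the first file has one email address per line, and the class name CsvRemoveSketch, its helper methods, and the file path parameters are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvRemoveSketch
{
    // Same shape as the Item class in the answer above.
    public class Item
    {
        public string Email { get; set; }
        public string Name { get; set; }
        public string Address { get; set; }
    }

    // Assumption: each line is "email,name,address", no header, no quoted commas.
    // Splitting into at most 3 parts keeps any extra commas inside the address.
    public static List<Item> ParseItems(IEnumerable<string> lines)
    {
        return lines
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => line.Split(new[] { ',' }, 3))
            .Select(p => new Item
            {
                Email = p[0].Trim(),
                Name = p.Length > 1 ? p[1].Trim() : "",
                Address = p.Length > 2 ? p[2].Trim() : ""
            })
            .ToList();
    }

    // Assumption: the removal file has one email address per line.
    public static List<string> ParseEmails(IEnumerable<string> lines)
    {
        return lines.Select(l => l.Trim()).Where(l => l.Length > 0).ToList();
    }

    // Same filter as RemoveAddresses in the answer above.
    public static List<Item> RemoveAddresses(List<Item> sourceList, List<string> emailAddressesToRemove)
    {
        return sourceList.Where(s => !emailAddressesToRemove.Contains(s.Email)).ToList();
    }

    // Hypothetical end-to-end helper: reads both files and writes the kept rows.
    public static void Run(string removePath, string sourcePath, string outputPath)
    {
        var toRemove = ParseEmails(File.ReadLines(removePath));
        var source = ParseItems(File.ReadLines(sourcePath));
        var kept = RemoveAddresses(source, toRemove);
        File.WriteAllLines(outputPath,
            kept.Select(i => string.Join(",", i.Email, i.Name, i.Address)));
    }
}
```

For real data with quoted fields or embedded commas in the name column, a proper CSV parser such as the one mentioned in the question is likely the safer choice for the parsing step.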

把梦留给海 2024-09-18 16:15:11

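Not sure what kind of advice you need; it sounds straightforward.

Here's a quick algorithm sketch:

Loop through the emails in the first CSV
- Put each email into a HashSet<>
Run the removal
Put each output email into the same HashSet<>
- If an email is already in the set (HashSet<>.Add returns false in that case; it does not throw a DuplicateKeyException), you missed one during the removal
- If emailList2.Count - emailList1.Count != outputList.Count, you removed too many (this assumes every address in the first list appears exactly once in the second)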
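The verification steps above could be sketched roughly as follows. This is only an illustration: the class name RemovalCheck and its method names are hypothetical, and the comparison is case-sensitive because the default string comparer is used. Note that, as in the sketch, an address that appears twice in the output is also reported.

```csharp
using System;
using System.Collections.Generic;

public static class RemovalCheck
{
    // Returns every output email that was already in the set, i.e. an address
    // that should have been removed (or that is repeated in the output).
    public static List<string> FindMissedRemovals(
        IEnumerable<string> emailsToRemove,
        IEnumerable<string> outputEmails)
    {
        var seen = new HashSet<string>(emailsToRemove);
        var missed = new List<string>();
        foreach (var email in outputEmails)
        {
            // HashSet<T>.Add returns false when the value is already present.
            if (!seen.Add(email))
            {
                missed.Add(email);
            }
        }
        return missed;
    }

    // Count check from the sketch. Assumption: every address in list 1
    // appears exactly once in list 2.
    public static bool CountsMatch(int list2Count, int list1Count, int outputCount)
    {
        return list2Count - list1Count == outputCount;
    }
}
```

An empty result from FindMissedRemovals together with a true result from CountsMatch suggests the removal did what was intended, under the stated assumption.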

等你爱我 2024-09-18 16:15:11

This is relatively simple, assuming the lists aren't terribly large or memory usage isn't an overly large concern: read both sets of email addresses into two separate HashSet<string> instances. Then you can use HashSet<T>.ExceptWith to find the difference between the two sets. For instance:

HashSet<string> setA = ...;
HashSet<string> setB = ...;

setA.ExceptWith(setB); // Remove all strings in setB from setA

// Print all strings that were in setA, but not setB
foreach(var s in setA)
   System.Console.WriteLine(s);

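BTW, the above should be roughly O(n) on average, because HashSet<T> lookups and removals are O(1) on average, whereas the LINQ answer is O(n*m) (effectively O(n^2) for lists of similar size), because List<T>.Contains scans the whole removal list for every item.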
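For a concrete version of the snippet above, here is a small sketch that builds both sets from in-memory sequences and returns the difference. The class name SetDifferenceSketch and the method EmailsToKeep are hypothetical, and the case-insensitive comparer is an assumption about how email addresses should be matched, not something stated in the question.

```csharp
using System;
using System.Collections.Generic;

public static class SetDifferenceSketch
{
    // Returns the addresses from allEmails that are not in emailsToRemove.
    public static HashSet<string> EmailsToKeep(
        IEnumerable<string> allEmails,
        IEnumerable<string> emailsToRemove)
    {
        // Assumption: addresses are matched without regard to case.
        var setA = new HashSet<string>(allEmails, StringComparer.OrdinalIgnoreCase);
        var setB = new HashSet<string>(emailsToRemove, StringComparer.OrdinalIgnoreCase);

        // Remove all strings in setB from setA, in place.
        setA.ExceptWith(setB);
        return setA;
    }
}
```

This yields only the email strings. To keep the name and address columns from the second file, one option is to keep setB and filter the full rows with setB.Contains(row.Email) instead, which has the same average cost per lookup.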
