Advice on comparing 2 CSV files in C#?

Posted on 2024-09-11 16:15:11

I need to develop an application that compares two CSV files. The first file has a list of email addresses. The second file also has email addresses, but includes name and address info. The first list contains the email addresses that need to be removed from the second list. I have the Fast CSV reader from the CodeProject site, which works pretty well. The application will not have access to a database server. A new file will be generated with the data that is considered verified, meaning it will not contain any of the email addresses from the first file.

Comments (3)

柳絮泡泡 2024-09-18 16:15:11

If you read both lists into collections, you can use LINQ to select the subset of addresses to keep.

Here is a quick example class I whipped up for you.

using System;
using System.Linq;
using System.Collections.Generic;

public class RemoveExample
{
    public List<Item> RemoveAddresses(List<Item> sourceList, List<string> emailAddressesToRemove)
    {
        List<Item> newList = (from s in sourceList
                              where !emailAddressesToRemove.Contains(s.Email)
                              select s).ToList();
        return newList;
    }

    public class Item
    {
        public string Email { get; set; }
        public string Name { get; set; }
        public string Address { get; set; }
    }
}

To use it, read your CSV into a List<Item>, then pass it to the method together with the addresses to remove as a List<string>.
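As a rough sketch of the calling side, assuming each line of the second file looks like "email,name,address" with no quoted commas and no header row (a real CSV reader, such as the one mentioned in the question, would replace the Split call). The names CsvFilterSketch, ParseItems and Filter are hypothetical, and matching emails case-insensitively is an assumption, not something the question specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvFilterSketch
{
    // Hypothetical record type mirroring the Item class above.
    public class Item
    {
        public string Email { get; set; }
        public string Name { get; set; }
        public string Address { get; set; }
    }

    // Assumption: each line is "email,name,address" with no quoted commas.
    public static List<Item> ParseItems(IEnumerable<string> lines)
    {
        return lines
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => line.Split(new[] { ',' }, 3))
            .Select(parts => new Item
            {
                Email = parts[0].Trim(),
                Name = parts.Length > 1 ? parts[1].Trim() : "",
                Address = parts.Length > 2 ? parts[2].Trim() : ""
            })
            .ToList();
    }

    // Keeps only the items whose email is not in the removal list.
    public static List<Item> Filter(IEnumerable<Item> source, IEnumerable<string> emailsToRemove)
    {
        // Assumption: emails are compared ignoring case and surrounding whitespace.
        var remove = new HashSet<string>(
            emailsToRemove.Select(e => e.Trim()),
            StringComparer.OrdinalIgnoreCase);
        return source.Where(item => !remove.Contains(item.Email)).ToList();
    }
}
```

With real files, the inputs could come from File.ReadLines(path) and the kept rows could be written back out with File.WriteAllLines.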

把梦留给海 2024-09-18 16:15:11

Not sure what kind of advice you need; it sounds straightforward.

Here's a quick algorithm sketch:

  • loop through the emails from the first csv
    • put each email in a HashSet<>
  • run your delete
  • try to add each output email to the same HashSet<>
    • if Add returns false (HashSet<>.Add does not throw on duplicates), you missed one in the delete
    • if emailList2.Count - emailList1.Count != outputList.Count, you deleted too many (this assumes every email in the first list appears exactly once in the second)
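
The duplicate check in the steps above might look something like the following sketch. The names RemovalCheckSketch and FindMissedRemovals are hypothetical, and the case-insensitive comparer is an assumption about how emails should be matched.

```csharp
using System;
using System.Collections.Generic;

public static class RemovalCheckSketch
{
    // Returns the output emails that collide with something already seen.
    public static List<string> FindMissedRemovals(
        IEnumerable<string> emailsToRemove, IEnumerable<string> outputEmails)
    {
        // Seed the set with the emails from the first csv.
        var seen = new HashSet<string>(emailsToRemove, StringComparer.OrdinalIgnoreCase);
        var missed = new List<string>();
        foreach (var email in outputEmails)
        {
            // Add returns false when the value is already in the set, which here
            // means it was in the removal list or appears twice in the output.
            if (!seen.Add(email))
                missed.Add(email);
        }
        return missed;
    }
}
```

An empty result means no output email matched the removal list and the output itself contains no repeated email.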

等你爱我 2024-09-18 16:15:11

This is relatively simple, assuming the lists aren't terribly large or memory usage isn't an overly large concern: read both sets of email addresses into two separate HashSet<string> instances. Then you can use HashSet<T>.ExceptWith to find the difference between the two sets. For instance:

HashSet<string> setA = ...;
HashSet<string> setB = ...;

setA.ExceptWith(setB); // Remove all strings in setB from setA

// Print all strings that were in setA, but not setB
foreach(var s in setA)
   System.Console.WriteLine(s);

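BTW, the above should be roughly O(n) on average, since hash set lookups take constant time on average, versus the LINQ answer, which would be O(n*m) (effectively O(n^2)) on non-indexed data, because List<T>.Contains scans the whole removal list for every row.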
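
For a concrete version of the snippet above, here is a small sketch with the sets built from in-memory sequences. The names ExceptWithSketch and Remaining are hypothetical, and the case-insensitive comparer is an assumption. Note that it yields only the remaining email addresses; the name and address columns would still have to be looked up from the second file.

```csharp
using System;
using System.Collections.Generic;

public static class ExceptWithSketch
{
    // Returns the emails from allEmails that are not in emailsToRemove.
    public static HashSet<string> Remaining(
        IEnumerable<string> allEmails, IEnumerable<string> emailsToRemove)
    {
        // Assumption: emails are compared ignoring case.
        var setA = new HashSet<string>(allEmails, StringComparer.OrdinalIgnoreCase);

        // ExceptWith accepts any IEnumerable<string> and uses setA's comparer.
        setA.ExceptWith(emailsToRemove);
        return setA;
    }
}
```

For example, calling Remaining with the second file's emails and the first file's emails returns the addresses that belong in the new file.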
