Advice on comparing 2 CSV files in C#?
I need to develop an application that compares two CSV files. The first file has a list of email addresses. The second file also has email addresses, but includes name and address info. The first list contains the email addresses that need to be removed from the second list. I have the Fast CSV reader from the CodeProject site, which works pretty well. The application will not have access to a database server. A new file will be generated with the data that is considered verified, meaning it will not contain any of the addresses from the first file.
Comments (3)
If you read both lists into collections, you can use LINQ to determine the subset of addresses to keep.
Here is a quick example class I whipped up for you.
To use it, read your CSV into a List<T>, then pass it, together with the list of addresses to remove (also a List<T>), into the method.
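A class along those lines might look like the sketch below. It is an illustration, not the answerer's own code: the `Contact` type, the `ContactFilter.RemoveAddresses` method, and the case-insensitive comparison are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for one row of the second file.
public class Contact
{
    public string Email { get; set; } = "";
    public string Name { get; set; } = "";
    public string Address { get; set; } = "";
}

public static class ContactFilter
{
    // Returns the contacts whose email does not appear in addressesToRemove.
    // Matching ignores case; that is an assumption about the data.
    public static List<Contact> RemoveAddresses(
        List<Contact> contacts, List<string> addressesToRemove)
    {
        return contacts
            .Where(c => !addressesToRemove.Contains(
                c.Email, StringComparer.OrdinalIgnoreCase))
            .ToList();
    }
}
```

Because `Contains` scans the removal list once per contact, this is fine for small files but slows down as both lists grow.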
Not sure what kind of advice you need; it sounds straightforward.
Here's a quick algorithm sketch:
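One way to sketch it, as an illustration rather than the answerer's own code: load the removal list into a set, then stream the second file and copy only the rows whose email is not in the set. The `CsvFilter.WriteVerified` name is hypothetical, and the code assumes one address per line in the first file, the email in the first column of the second file, and no quoted fields containing commas. A real implementation would parse rows with the Fast CSV reader instead of `Split`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CsvFilter
{
    // Copies rows from 'records' to 'output', skipping any row whose email
    // (assumed: first comma-separated field) is listed in 'removals'
    // (assumed: one address per line). Returns the number of rows written.
    public static int WriteVerified(
        TextReader removals, TextReader records, TextWriter output)
    {
        var blocked = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        string line;
        while ((line = removals.ReadLine()) != null)
        {
            string address = line.Trim();
            if (address.Length > 0) blocked.Add(address);
        }

        int written = 0;
        while ((line = records.ReadLine()) != null)
        {
            // Naive split: does not handle quoted fields containing commas.
            string email = line.Split(',')[0].Trim();
            if (!blocked.Contains(email))
            {
                output.WriteLine(line);
                written++;
            }
        }
        return written;
    }
}
```

With files on disk, the readers could come from `File.OpenText` and the writer from `new StreamWriter(path)`. Only the removal list is held in memory, so the second file can be large.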
This is relatively simple, assuming the lists aren't terribly large or memory usage isn't an overly large concern: read both sets of email addresses into two separate HashSet<string> instances. Then you can use HashSet<T>.ExceptWith to find the difference between the two sets.
BTW, the above should be roughly O(n) on average, since hash lookups take expected constant time, versus a LINQ query that scans a non-indexed list for every row, which would be O(n^2).
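A minimal sketch of the set-difference approach, assuming the addresses have already been extracted from both files. The `EmailSets.Difference` helper is a hypothetical name, and the case-insensitive comparer is an assumption about how addresses should be matched.

```csharp
using System;
using System.Collections.Generic;

public static class EmailSets
{
    // Returns the addresses in 'all' that are not in 'toRemove'.
    public static HashSet<string> Difference(
        IEnumerable<string> all, IEnumerable<string> toRemove)
    {
        var remaining = new HashSet<string>(all, StringComparer.OrdinalIgnoreCase);
        var removals = new HashSet<string>(toRemove, StringComparer.OrdinalIgnoreCase);

        // ExceptWith modifies 'remaining' in place, dropping every
        // element that is also present in 'removals'.
        remaining.ExceptWith(removals);
        return remaining;
    }
}
```

This yields only the surviving addresses. To write out the full rows with name and address, one option is to keep the rows in a `Dictionary<string, ...>` keyed by email and look up each surviving address.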