Best way to compare two large files on multiple columns

Posted on 2025-01-26 11:27:30

I am working on a feature that lets users upload two CSV files, write rules to compare their rows, and output the result to a file.

Both files can have any number of columns, and the column names are not fixed either.

Currently, I read the files into two separate arrays and compare the rows based on the condition given in the rule.

This works for small files, but for large ones the comparison takes a lot of time and memory.

Is there a better way, for example using a database to store and query this schema-less data?
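
To make the database idea concrete, here is a minimal sketch of one possible direction, assuming SQLite through Python's built-in `sqlite3` module (my choice for illustration, not a requirement). The helper names `quote_ident` and `load_csv` and the `batch_size` parameter are hypothetical. The table's columns are created from whatever header row the CSV has, and rows are streamed in batches so the whole file never sits in memory at once.

```python
import csv
import io
import sqlite3


def quote_ident(name):
    # Quote a table or column name so arbitrary header text is safe in SQL.
    return '"' + name.replace('"', '""') + '"'


def load_csv(conn, table, fileobj, batch_size=10_000):
    """Stream a CSV into a new table whose columns come from the header row."""
    reader = csv.reader(fileobj)
    header = next(reader)
    cols = ", ".join(quote_ident(c) for c in header)
    # SQLite accepts untyped columns, so no schema has to be known up front.
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({cols})")
    placeholders = ", ".join("?" for _ in header)
    insert = f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})"
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) >= batch_size:
            conn.executemany(insert, batch)
            batch.clear()
    if batch:
        conn.executemany(insert, batch)
    conn.commit()
    return header


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # a file path would keep the data on disk
    sample = io.StringIO(
        "type,id,date,amount\nA,1,12/10/2005,500\nB,2,12/10/2005,500\n"
    )
    print(load_csv(conn, "file1", sample))
    print(conn.execute('SELECT COUNT(*) FROM "file1"').fetchone()[0])
```

In this sketch every value is stored as text, and a row whose field count differs from the header would raise an error, so numeric comparisons and malformed rows would need extra handling.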

Example Data:

File1
type id  date       amount
A    1   12/10/2005 500
B    2   12/10/2005 500

File2
type id  date       amount
A    1   12/10/2005 500
B    2   12/10/2005 500
A    1   12/10/2005 500

Rule1  File1.type == File2.type && File1.amount == File2.amount

Rule2  File1.id == GroupBy(File2.id) && File1.amount == File2.TotalAmount

A row counts as a match if it satisfies Rule1 or Rule2.
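
For illustration, this is roughly how the two rules might look as a single SQL query, again assuming SQLite via Python's `sqlite3`. It rests on my reading of Rule2: `GroupBy(File2.id)` groups File2 by `id` and `TotalAmount` is the sum of `amount` within each group. The table names `file1` and `file2`, the `totals` subquery, and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file1 (type TEXT, id INTEGER, date TEXT, amount INTEGER);
    CREATE TABLE file2 (type TEXT, id INTEGER, date TEXT, amount INTEGER);
    INSERT INTO file1 VALUES ('A', 1, '12/10/2005', 500), ('B', 2, '12/10/2005', 500);
    INSERT INTO file2 VALUES ('A', 1, '12/10/2005', 500), ('B', 2, '12/10/2005', 500),
                             ('A', 1, '12/10/2005', 500);
    -- An index on the compared columns lets SQLite avoid scanning file2 per row.
    CREATE INDEX file2_type_amount ON file2 (type, amount);
""")

MATCH_SQL = """
WITH totals AS (  -- assumed meaning of GroupBy(File2.id) and TotalAmount
    SELECT id, SUM(amount) AS total_amount FROM file2 GROUP BY id
)
SELECT f1.type, f1.id, f1.amount,
       EXISTS (SELECT 1 FROM file2 f2
               WHERE f2.type = f1.type AND f2.amount = f1.amount) AS rule1,
       EXISTS (SELECT 1 FROM totals t
               WHERE t.id = f1.id AND t.total_amount = f1.amount) AS rule2
FROM file1 f1
ORDER BY f1.id
"""

# A File1 row is reported as matched when either rule holds.
for type_, id_, amount, rule1, rule2 in conn.execute(MATCH_SQL):
    print(type_, id_, amount, "matched" if rule1 or rule2 else "unmatched")
```

With the example data, row A satisfies only Rule1 (the total for `id` 1 is 1000, not 500), while row B satisfies both rules, so both rows are reported as matched. User-written rules would have to be translated into such a query, which this sketch does not attempt.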
