Appending identical CSVs together while removing the headers

Posted on 2024-11-19 15:10:04

I want to append 6 CSVs that have identical layouts and headers.

I've been able to accomplish this by loading each of the 6 CSVs into its own separate DataTable and removing the first row of each DataTable. Finally, I appended them together using the ImportRow method.

DataTable table1 = csvToDataTable(@"C:\Program Files\Normalization\Scan1.csv");
DataTable table2 = csvToDataTable(@"C:\Program Files\Normalization\Scan2.csv");
DataTable table3 = csvToDataTable(@"C:\Program Files\Normalization\Scan3.csv");
DataTable table4 = csvToDataTable(@"C:\Program Files\Normalization\Scan4.csv");
DataTable table5 = csvToDataTable(@"C:\Program Files\Normalization\Scan5.csv");
DataTable table6 = csvToDataTable(@"C:\Program Files\Normalization\Scan6.csv");

foreach (DataRow dr in table2.Rows)
{
    table1.ImportRow(dr);
}
foreach (DataRow dr in table3.Rows)
{
    table1.ImportRow(dr);
}
foreach (DataRow dr in table4.Rows)
{
    table1.ImportRow(dr);
}
foreach (DataRow dr in table5.Rows)
{
    table1.ImportRow(dr);
}
foreach (DataRow dr in table6.Rows)
{
    table1.ImportRow(dr);
}

CreateCSVFile(table1, @"C:\Program Files\Normalization\RackMap.csv");

I feel this is clunky and not very scalable, but I had trouble dealing with the headers when I tried to append at the CSV level. Any suggestions?

TIA

Comments (3)

虫児飞 2024-11-26 15:10:05

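If you don't want to repeat identical rows, you can keep a set of the rows already seen and check it inside the loop. DataRow.GetHashCode() is reference-based and does not reflect the row's values, so the rows are keyed on their joined field values instead:

    // Key each row on its field values; '\u001F' is a separator unlikely to occur in the data.
    HashSet<string> seenRows = new HashSet<string>();
    foreach (DataRow dr in table1.Rows)
    {
        seenRows.Add(string.Join("\u001F", dr.ItemArray));
    }
    foreach (DataRow dr in table2.Rows)
    {
        // Add returns false when the key is already present
        if (seenRows.Add(string.Join("\u001F", dr.ItemArray)))
        {
            table1.ImportRow(dr);
        }
    }

This may not be ideal from a performance point of view, but I hope it solves your problem.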

坐在坟头思考人生 2024-11-26 15:10:04

Use a DirectoryInfo to get all files matching the mask *.csv.

Create a loop to iterate over the results.

Drop the first row when importing each file.
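
A minimal sketch of those three steps, kept at the DataTable level. The class and method names (CsvTableMerger, MergeFolder) are hypothetical, and the loader parameter stands in for the question's csvToDataTable, whose implementation is not shown. The sketch assumes the loader already takes care of the header line (for example by turning it into column names) and that every file yields the same columns.

```csharp
using System;
using System.Data;
using System.IO;

static class CsvTableMerger
{
    // Hypothetical helper: loads every *.csv in a folder with the supplied
    // loader and appends the rows of each later table to the first one.
    public static DataTable MergeFolder(string folder, Func<string, DataTable> loadCsv)
    {
        DataTable merged = null;
        string[] files = Directory.GetFiles(folder, "*.csv");
        Array.Sort(files, StringComparer.OrdinalIgnoreCase); // deterministic order

        foreach (string file in files)
        {
            DataTable table = loadCsv(file);
            if (merged == null)
            {
                merged = table; // the first file supplies the schema
                continue;
            }
            foreach (DataRow row in table.Rows)
            {
                merged.ImportRow(row);
            }
        }
        // No matching files: return an empty table rather than null.
        return merged ?? new DataTable();
    }
}
```

With the question's helpers this would be called roughly as `CreateCSVFile(CsvTableMerger.MergeFolder(folder, csvToDataTable), outputPath)`. If the output file is written into the same folder, it will be picked up as an input on the next run, so it is safer to write it somewhere else or filter it out.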

EDIT:

If you just want to combine the files, rather than import into a data table, you could treat them as text files. Concatenate them, dropping the header line each time. Here is an example:

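using System;
using System.IO;
using System.Text;

string myPath = @"K:\csv";
string outputPath = Path.Combine(myPath, "output.csv");

DirectoryInfo csvDirectory = new DirectoryInfo(myPath);
FileInfo[] csvFiles = csvDirectory.GetFiles("*.csv");
StringBuilder sb = new StringBuilder();
foreach (FileInfo csvFile in csvFiles)
{
    // Skip the output file itself if it is left over from an earlier run
    if (string.Equals(csvFile.FullName, outputPath, StringComparison.OrdinalIgnoreCase))
        continue;
    using (StreamReader sr = new StreamReader(csvFile.OpenRead()))
    {
        sr.ReadLine(); // Discard header line
        while (!sr.EndOfStream)
            sb.AppendLine(sr.ReadLine());
    }
}
// Overwrite rather than append, so re-running does not duplicate rows
File.WriteAllText(outputPath, sb.ToString());

梦里梦着梦中梦 2024-11-26 15:10:04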

As JYelton suggested, you'll definitely want to dynamically find all the *.csv files in your folder, and iterate over them (rather than hardcoding 6 filenames). From that point you might consider an approach like this:

  1. Create a writable filestream for your "destination" file.
  2. For each .CSV file, open a readable filestream on it.
  3. Discard each file's header row by reading up to and including the first CRLF, and throwing that data away.
  4. Read all the remaining data into your writable stream.
  5. Repeat #2-4 for each CSV file.
  6. Close your writable stream to save the completed file.

This approach will accommodate an arbitrary number of CSV files, and is probably more performance-efficient than working with DataTables.

Note: for the sake of brevity and clarity, I've left out some edge-case handling you'll need to do, such as how to handle an empty CSV file, one that contains a header row and nothing else, or one that does not have a trailing CRLF after its final row. Aren't implementation details and edge-case handling fun? ;)
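
The steps above, including the three edge cases mentioned in the note, could be sketched roughly as follows. This is only one possible reading: the names CsvConcatenator and Concatenate are hypothetical, the sketch works line by line rather than on raw bytes, and it assumes each header occupies exactly one physical line and that the default UTF-8 handling of StreamReader and StreamWriter suits the files.

```csharp
using System;
using System.IO;

static class CsvConcatenator
{
    // Hypothetical helper: copies the data rows of every *.csv in sourceFolder
    // to outputPath, skipping each file's first (header) line.
    // Returns the number of rows written.
    public static int Concatenate(string sourceFolder, string outputPath)
    {
        string fullOutput = Path.GetFullPath(outputPath);
        // List the inputs before the output file is created or truncated.
        string[] files = Directory.GetFiles(sourceFolder, "*.csv");
        Array.Sort(files, StringComparer.OrdinalIgnoreCase);

        int rows = 0;
        using (StreamWriter writer = new StreamWriter(fullOutput, false))
        {
            foreach (string file in files)
            {
                // Never read the file that is being written.
                if (string.Equals(Path.GetFullPath(file), fullOutput,
                                  StringComparison.OrdinalIgnoreCase))
                    continue;

                using (StreamReader reader = new StreamReader(file))
                {
                    // Empty file: there is no header and nothing to copy.
                    if (reader.ReadLine() == null)
                        continue;

                    // A header-only file simply falls through this loop.
                    string line;
                    while ((line = reader.ReadLine()) != null)
                    {
                        // WriteLine always terminates the row, so a missing
                        // trailing newline in the source cannot glue two rows together.
                        writer.WriteLine(line);
                        rows++;
                    }
                }
            }
        }
        return rows;
    }
}
```

Only one line is held in memory at a time, so this should scale to large files. The trade-off of the line-based approach is that line endings are normalized to Environment.NewLine and blank lines are copied as-is.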
