How do I read a large (1 GB) txt file in .NET?



I have a 1 GB text file which I need to read line by line. What is the best and fastest way to do this?

private void ReadTxtFile()
{
    string filePath = openFileDialog1.FileName;
    if (!string.IsNullOrEmpty(filePath))
    {
        using (StreamReader sr = new StreamReader(filePath))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                FormatData(line);
            }
        }
    }
}

In FormatData() I check whether the line starts with a particular word and, if it does, increment an integer variable.

void FormatData(string line)
{
    if (line.StartsWith(word))
    {
        globalIntVariable++;
    }
}


9 Answers

默嘫て 2024-10-11 11:52:51


If you are using .NET 4.0, try MemoryMappedFile, which is a class designed for this scenario.

Otherwise, you can use StreamReader.ReadLine.
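A minimal sketch of the MemoryMappedFile approach, assuming you still want line-by-line processing; wrapping the view stream in a StreamReader and the file path are my illustrative assumptions, not part of the original answer:

using System.IO;
using System.IO.MemoryMappedFiles;

string path = @"C:\data\big.txt"; // hypothetical path; substitute the file chosen in the dialog

using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(path))
using (MemoryMappedViewStream stream = mmf.CreateViewStream())
using (StreamReader reader = new StreamReader(stream))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        FormatData(line); // same per-line processing as in the question
    }
}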

窝囊感情。 2024-10-11 11:52:51


Using StreamReader is probably the way to go, since you don't want the whole file in memory at once. MemoryMappedFile is more suited to random access than sequential reading (a sequential stream read is roughly ten times as fast for sequential access, while memory mapping is roughly ten times as fast for random access).

You might also try creating your StreamReader from a FileStream with FileOptions set to SequentialScan (see the FileOptions enumeration), but I doubt it will make much of a difference.
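A sketch of that suggestion; the buffer size and path are illustrative assumptions:

using System.IO;

string path = @"C:\data\big.txt"; // hypothetical path

// FileOptions.SequentialScan hints to the OS that the file
// will be read from start to finish.
using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                               FileShare.Read, 4096, FileOptions.SequentialScan))
using (var reader = new StreamReader(fs))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        FormatData(line);
    }
}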

There are, however, ways to make your example more efficient, since you do your formatting in the same loop as the reading. You're wasting clock cycles, so if you want even more performance, a multithreaded asynchronous solution would be better, where one thread reads data and another formats it as it becomes available. Check out BlockingCollection, which might fit your needs:

Blocking Collection and the Producer-Consumer Problem
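A minimal producer-consumer sketch along those lines, assuming the filePath variable and FormatData method from the question; the bounded capacity of 1000 is an arbitrary choice:

using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

var lines = new BlockingCollection<string>(boundedCapacity: 1000);

// Producer: reads lines and hands them off.
Task producer = Task.Run(() =>
{
    foreach (string line in File.ReadLines(filePath))
    {
        lines.Add(line);
    }
    lines.CompleteAdding(); // signal the consumer that no more items are coming
});

// Consumer: formats lines as they become available.
Task consumer = Task.Run(() =>
{
    foreach (string line in lines.GetConsumingEnumerable())
    {
        FormatData(line);
    }
});

Task.WaitAll(producer, consumer);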

If you want the fastest possible performance, in my experience the only way is to sequentially read in as large a chunk of binary data as possible and deserialize it into text in parallel, but the code starts to get complicated at that point.

护你周全 2024-10-11 11:52:51


You can use LINQ:

int result = File.ReadLines(filePath).Count(line => line.StartsWith(word));

File.ReadLines returns an IEnumerable<String> that lazily reads each line from the file without loading the whole file into memory.

Enumerable.Count counts the lines that start with the word.

If you are calling this from a UI thread, use a BackgroundWorker.
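A sketch of the BackgroundWorker approach; the resultLabel control is a hypothetical name for illustration:

using System.ComponentModel;
using System.IO;
using System.Linq;

var worker = new BackgroundWorker();

worker.DoWork += (sender, e) =>
{
    // Runs on a thread-pool thread, so the UI stays responsive.
    e.Result = File.ReadLines(filePath).Count(line => line.StartsWith(word));
};

worker.RunWorkerCompleted += (sender, e) =>
{
    // Back on the UI thread; safe to touch controls here.
    resultLabel.Text = e.Result.ToString(); // hypothetical label control
};

worker.RunWorkerAsync();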

玩套路吗 2024-10-11 11:52:51


Probably the best approach is to read it line by line.

You should not try to force the whole file into memory by reading to the end and then processing it.

西瑶 2024-10-11 11:52:51


StreamReader.ReadLine should work fine. Let the framework choose the buffering, unless you know from profiling that you can do better.
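If profiling does suggest a larger buffer helps, a sketch of overriding the default follows; the 1 MB figure is an arbitrary assumption:

using System.IO;
using System.Text;

// StreamReader(path, encoding, detectEncodingFromByteOrderMarks, bufferSize)
using (var reader = new StreamReader(filePath, Encoding.UTF8,
                                     detectEncodingFromByteOrderMarks: true,
                                     bufferSize: 1 << 20)) // 1 MB buffer
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        FormatData(line);
    }
}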

红衣飘飘貌似仙 2024-10-11 11:52:51


I was facing the same problem on our production server at Agenty, where we see large files (sometimes 10-25 GB tab-delimited (\t) txt files). After a lot of testing and research, I found that the best way is to read large files in small chunks, using a for/foreach loop and offset/limit logic with File.ReadLines().

int TotalRows = File.ReadLines(Path).Count(); // Count the rows in the file (lazy enumeration, nothing held in memory)
int Limit = 100000; // 100,000 rows per batch
for (int Offset = 0; Offset < TotalRows; Offset += Limit)
{
    var table = Path.FileToTable(heading: true, delimiter: '\t', offset: Offset, limit: Limit);

    // Do all your processing here with limit and offset, then save to disk in append mode.
    // Append mode writes the output of each processed batch to the same file.
    table.TableToFile(@"C:\output.txt");
}

See the complete code in my GitHub library: https://github.com/Agenty/FileReader/

Full disclosure - I work for Agenty, the company that owns this library and website.

〃温暖了心ぐ 2024-10-11 11:52:51


My file is over 13 GB.

You can use the method below:

using System.IO.MemoryMappedFiles;
using System.Text;

public static void Read(int length)
{
    StringBuilder resultAsString = new StringBuilder();

    using (MemoryMappedFile memoryMappedFile = MemoryMappedFile.CreateFromFile(@"D:\_Profession\Projects\Parto\HotelDataManagement\_Document\Expedia_Rapid.jsonl\Expedia_Rapi.json"))
    using (MemoryMappedViewStream memoryMappedViewStream = memoryMappedFile.CreateViewStream(0, length))
    {
        for (int i = 0; i < length; i++)
        {
            // Reads a byte and advances the stream position by one; returns -1 at the end of the stream.
            int result = memoryMappedViewStream.ReadByte();

            if (result == -1)
            {
                break;
            }

            // Note: casting a single byte to char only round-trips for single-byte encodings (e.g. ASCII).
            char letter = (char)result;

            resultAsString.Append(letter);
        }
    }
}

This code reads the file's text from the beginning up to the length you pass to the method Read(int length) and fills the resultAsString variable.

猫性小仙女 2024-10-11 11:52:51


I'd read the file 10,000 bytes at a time. Then I'd analyse those 10,000 bytes, chop them into lines, and feed them to the FormatData function.

Bonus points for splitting the reading and the line analysis across multiple threads.

I'd definitely use a StringBuilder to collect all the strings, and might build a string buffer to keep about 100 strings in memory at all times.
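A single-threaded sketch of that chunking idea; the 10,000-byte buffer matches the suggestion, while UTF-8 and the carry-over handling for lines that straddle chunk boundaries are my assumptions:

using System.IO;
using System.Text;

var buffer = new byte[10000];
var carry = new StringBuilder(); // holds a partial line from the previous chunk
Decoder decoder = Encoding.UTF8.GetDecoder(); // handles multi-byte chars split across chunks

using (var fs = File.OpenRead(filePath))
{
    int bytesRead;
    while ((bytesRead = fs.Read(buffer, 0, buffer.Length)) > 0)
    {
        char[] chars = new char[decoder.GetCharCount(buffer, 0, bytesRead)];
        decoder.GetChars(buffer, 0, bytesRead, chars, 0);

        foreach (char c in chars)
        {
            if (c == '\n')
            {
                FormatData(carry.ToString().TrimEnd('\r'));
                carry.Clear();
            }
            else
            {
                carry.Append(c);
            }
        }
    }

    if (carry.Length > 0)
    {
        FormatData(carry.ToString()); // last line without a trailing newline
    }
}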
