How to open a large text file in C#

Posted 2024-08-30 01:44:22 · 606 words · 3 views · 0 comments

I have a text file that contains about 100,000 articles.
The structure of the file is:

.Document ID 42944-YEAR:5
.Date  03\08\11
.Cat  political
Article Content 1

.Document ID 42945-YEAR:5
.Date  03\08\11
.Cat  political
Article Content 2

I want to open this file in C# and process it line by line.
I tried this code:

String[] FileLines = File.ReadAllText(
                  TB_SourceFile.Text).Split(Environment.NewLine.ToCharArray()); 

But it fails with:

Exception of type 'System.OutOfMemoryException' was thrown.

How can I open this file and read it line by line?

  • File Size: 564 MB (591,886,626 bytes)
  • File Encoding: UTF-8
  • File contains Unicode characters.

Comments (4)

懵少女 2024-09-06 01:44:22

You can open the file and read it as a stream rather than loading everything into memory all at once.

From MSDN:

using System;
using System.IO;

class Test 
{
    public static void Main() 
    {
        try 
        {
            // Create an instance of StreamReader to read from a file.
            // The using statement also closes the StreamReader.
            using (StreamReader sr = new StreamReader("TestFile.txt")) 
            {
                String line;
                // Read and display lines from the file until the end of 
                // the file is reached.
                while ((line = sr.ReadLine()) != null) 
                {
                    Console.WriteLine(line);
                }
            }
        }
        catch (Exception e) 
        {
            // Let the user know what went wrong.
            Console.WriteLine("The file could not be read:");
            Console.WriteLine(e.Message);
        }
    }
}

时常饿 2024-09-06 01:44:22

Your file is too large to be read into memory in one go, as File.ReadAllText is trying to do. You should instead read the file line by line.

Adapted from MSDN:

string line;
// Read the file and display it line by line.
using (StreamReader file = new StreamReader(@"c:\yourfile.txt"))
{
    while ((line = file.ReadLine()) != null)
    {    
        Console.WriteLine(line);
        // do your processing on each line here
    }
}

This way, only the current line (plus the reader's small internal buffer) needs to be held in memory at any one time.
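Since the file in the question is a sequence of records rather than independent lines, the same loop can group lines into articles as it goes. Here is a minimal sketch, assuming every article starts with a `.Document ID` line and that body text never begins with `.Date` or `.Cat`. The `Article` and `ArticleReader` names are hypothetical and not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical record type; field names are illustrative only.
public sealed class Article
{
    public string DocumentId = "";
    public string Date = "";
    public string Category = "";
    public StringBuilder Content = new StringBuilder();
}

public static class ArticleReader
{
    // Lazily yields one Article at a time, so only the article currently
    // being assembled is held in memory, not the whole file.
    public static IEnumerable<Article> Read(TextReader reader)
    {
        Article current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.StartsWith(".Document ID", StringComparison.Ordinal))
            {
                // A new header closes the previous article, if any.
                if (current != null) yield return current;
                current = new Article
                {
                    DocumentId = line.Substring(".Document ID".Length).Trim()
                };
            }
            else if (current == null)
            {
                continue; // ignore anything before the first header
            }
            else if (line.StartsWith(".Date", StringComparison.Ordinal))
            {
                current.Date = line.Substring(".Date".Length).Trim();
            }
            else if (line.StartsWith(".Cat", StringComparison.Ordinal))
            {
                current.Category = line.Substring(".Cat".Length).Trim();
            }
            else if (line.Length > 0)
            {
                current.Content.AppendLine(line);
            }
        }
        // Emit the last article once the end of the file is reached.
        if (current != null) yield return current;
    }
}
```

To use it, wrap a `new StreamReader(path, Encoding.UTF8)` in a `using` block and iterate over `ArticleReader.Read(reader)` with `foreach`. Taking a `TextReader` rather than a path keeps the parser easy to try out against a `StringReader`.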

山川志 2024-09-06 01:44:22

If you are using .NET Framework 4 or later, System.IO.File has a static method called ReadLines that returns an IEnumerable<string>. I believe it was added to the framework for exactly this scenario; however, I have yet to use it myself.

MSDN Documentation - File.ReadLines Method (String)

Related Stack Overflow Question - Bug in the File.ReadLines(..) method of the .net framework 4.0
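For illustration, a short sketch of how `File.ReadLines` might be used on a file like the one in the question. It enumerates lines lazily, unlike `File.ReadAllLines`, which builds the whole `string[]` up front. The `ReadLinesSketch` class and `CountArticles` method are hypothetical names, and the `.Document ID` marker is taken from the sample in the question.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class ReadLinesSketch
{
    // Counts the articles by counting header lines. File.ReadLines reads
    // the file lazily, so lines are pulled from disk as Count consumes them.
    public static int CountArticles(string path)
    {
        return File.ReadLines(path, Encoding.UTF8)
                   .Count(line => line.StartsWith(".Document ID", StringComparison.Ordinal));
    }
}
```

Because the result is an `IEnumerable<string>`, it also works directly in a `foreach` loop or with other LINQ operators such as `Where` and `Select`.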

披肩女神 2024-09-06 01:44:22

Something like this:

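using (var reader = File.OpenText(@"path to file"))
{
    string fileLine;
    // ReadLine returns null at the end of the file, so checking its result
    // also handles an empty file (a do/while on EndOfStream would process
    // one null line in that case).
    while ((fileLine = reader.ReadLine()) != null)
    {
        // process fileLine here
    }
}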
