How to open a large text file in C#
I have a text file that contains about 100000 articles.
The structure of the file is:
.Document ID 42944-YEAR:5
.Date 03\08\11
.Cat political
Article Content 1

.Document ID 42945-YEAR:5
.Date 03\08\11
.Cat political
Article Content 2
I want to open this file in C# and process it line by line.
I tried this code:
String[] FileLines = File.ReadAllText(
TB_SourceFile.Text).Split(Environment.NewLine.ToCharArray());
But it says:
Exception of type 'System.OutOfMemoryException' was thrown.
The question is: how can I open this file and read it line by line?
- File Size: 564 MB (591,886,626 bytes)
- File Encoding: UTF-8
- File contains Unicode characters.
You can open the file and read it as a stream rather than loading everything into memory all at once.
From MSDN:
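A minimal sketch of reading a file as a stream with `StreamReader`. This is not the original MSDN sample; the class name `StreamedReader` and the method `ProcessLines` are placeholders introduced here for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamedReader
{
    // Hypothetical helper: calls onLine once per line and returns the line count.
    public static long ProcessLines(string path, Action<string> onLine)
    {
        long count = 0;
        // StreamReader buffers a small chunk of the file, not the whole file.
        using (var reader = new StreamReader(path, Encoding.UTF8))
        {
            string line;
            // ReadLine returns null once the end of the stream is reached.
            while ((line = reader.ReadLine()) != null)
            {
                onLine(line);
                count++;
            }
        }
        return count;
    }
}
```

It could be called as `StreamedReader.ProcessLines(TB_SourceFile.Text, line => { /* handle one line */ });`, so only the current line is handed to your code at any point.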
Your file is too large to be read into memory in one go, as File.ReadAllText is trying to do. You should instead read the file line by line. Adapted from MSDN:
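The snippet below is a sketch in that spirit rather than the MSDN sample itself. It assumes each `.Cat` field starts its own line, as in the question's file layout, and the class name `CategoryTally` is made up for this example. It counts articles per category while holding only the current line plus a small dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CategoryTally
{
    // Hypothetical helper: counts articles per ".Cat" value, one line at a time.
    public static Dictionary<string, int> Count(string path)
    {
        var counts = new Dictionary<string, int>();
        using (StreamReader reader = File.OpenText(path)) // OpenText decodes as UTF-8
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Assumption: category lines look like ".Cat political".
                if (!line.StartsWith(".Cat ", StringComparison.Ordinal))
                    continue;
                string category = line.Substring(5).Trim();
                int current;
                counts.TryGetValue(category, out current); // current stays 0 if absent
                counts[category] = current + 1;
            }
        }
        return counts;
    }
}
```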
In this way, no more than a single line of the file is in memory at any one time.
If you are using .NET Framework 4, there is a new static method on System.IO.File called ReadLines that returns an IEnumerable of string. I believe it was added to the framework for this exact scenario; however, I have yet to use it myself.
MSDN Documentation - File.ReadLines Method (String)
Related Stack Overflow Question - Bug in the File.ReadLines(..) method of the .net framework 4.0
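A short sketch of how `File.ReadLines` might be used here. Unlike `File.ReadAllLines`, it enumerates lazily, so lines are read as the query advances. The class `ReadLinesExample` and the idea of counting `.Document ID` lines are assumptions for illustration.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class ReadLinesExample
{
    // Hypothetical helper: counts lines that begin a new document.
    public static int CountDocuments(string path)
    {
        // File.ReadLines returns IEnumerable<string>; nothing is read until enumeration starts.
        return File.ReadLines(path, Encoding.UTF8)
                   .Count(line => line.StartsWith(".Document ID", StringComparison.Ordinal));
    }
}
```

A plain `foreach (string line in File.ReadLines(path))` loop works the same way if you would rather not use LINQ.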
Something like this:
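One possible shape, offered as a sketch and not as the code originally posted with this answer: an iterator that yields one article at a time. It assumes every article begins with a `.Document ID` line; `ArticleReader` and `ReadArticles` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ArticleReader
{
    // Hypothetical helper: yields each article (header lines plus body) as one string.
    public static IEnumerable<string> ReadArticles(string path)
    {
        var current = new StringBuilder();
        using (var reader = new StreamReader(path, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Assumption: a new ".Document ID" line closes the previous article.
                if (line.StartsWith(".Document ID", StringComparison.Ordinal) && current.Length > 0)
                {
                    yield return current.ToString();
                    current.Clear();
                }
                current.AppendLine(line);
            }
        }
        // Emit the last article, which has no following header to close it.
        if (current.Length > 0)
            yield return current.ToString();
    }
}
```

With `foreach (string article in ArticleReader.ReadArticles(path))`, memory use is bounded by the size of the largest single article instead of the whole 564 MB file.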