.NET C# - Random access in text files - no easy way?
I've got a text file that contains several 'records' inside of it. Each record contains a name and a collection of numbers as data.
I'm trying to build a class that will read through the file, present only the names of all the records, and then allow the user to select which record data he/she wants.
The first time I go through the file, I only read header names, but I can keep track of the 'position' in the file where the header is. I need random access to the text file to seek to the beginning of each record after a user asks for it.
I have to do it this way because the file is too large to be read in completely in memory (1GB+) with the other memory demands of the application.
I've tried using the .NET StreamReader class to accomplish this. It provides the very easy to use 'ReadLine' functionality, but there is no way to capture the true position in the file: the Position of the BaseStream property is skewed by the buffer the class uses.
Is there no easy way to do this in .NET?
Comments (10)
There are some good answers provided, but I couldn't find any source code that would work in my very simplistic case. Here it is, with the hope that it'll save someone else the hour that I spent searching around.
The "very simplistic case" that I refer to is: the text encoding is fixed-width, and the line ending characters are the same throughout the file. This code works well in my case, where I'm parsing a log file and sometimes have to seek ahead in the file and then come back. I implemented just enough to do what I needed to do (e.g. only one constructor, and only ReadLine() is overridden), so most likely you'll need to add code, but I think it's a reasonable starting point.
Here's an example of how to use the PositionableStreamReader:
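The code itself is not reproduced here, so what follows is only a minimal sketch of what a class like this could look like, not the original poster's implementation. It assumes a fixed-width encoding, no byte order mark at the start of the file, and one consistent line ending that is passed in by the caller. The `Position` property and the `newLine` constructor parameter are illustrative names.

```csharp
using System;
using System.IO;
using System.Text;

// Sketch: a StreamReader that tracks the byte offset of the next line itself,
// instead of relying on BaseStream.Position (which runs ahead due to buffering).
// Assumptions: fixed-width encoding, no BOM, identical line endings throughout.
public class PositionableStreamReader : StreamReader
{
    private readonly Encoding _encoding;
    private readonly int _newLineBytes;
    private long _position;

    public PositionableStreamReader(string path, Encoding encoding, string newLine = "\n")
        : base(path, encoding, detectEncodingFromByteOrderMarks: false)
    {
        _encoding = encoding;
        _newLineBytes = encoding.GetByteCount(newLine);
    }

    // Byte offset of the next line that ReadLine() will return.
    public long Position
    {
        get => _position;
        set
        {
            BaseStream.Seek(value, SeekOrigin.Begin);
            DiscardBufferedData(); // drop stale buffered bytes after seeking
            _position = value;
        }
    }

    public override string? ReadLine()
    {
        string? line = base.ReadLine();
        if (line != null)
        {
            // Bytes of the text plus the line terminator that ReadLine() stripped.
            // A final line without a terminator is over-counted by _newLineBytes.
            _position += _encoding.GetByteCount(line) + _newLineBytes;
        }
        return line;
    }
}
```

Typical use would be to store `reader.Position` before reading each header line, then later assign that stored value back to `reader.Position` and call `ReadLine()` to return to the record.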
FileStream has the Seek() method.
You can use a System.IO.FileStream instead of a StreamReader. If you know exactly what the file contains (the encoding, for example), you can do all the same operations as with a StreamReader.
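As a rough sketch of that idea: seek the FileStream to a byte offset you recorded earlier, then put a fresh StreamReader on top of it to decode one line. The helper name `ReadLineAt` is made up for illustration, and the offset is assumed to point at the first byte of a line.

```csharp
using System;
using System.IO;
using System.Text;

public static class FileStreamSeekExample
{
    // Reads the single line that starts at the given byte offset.
    // Assumes 'offset' is the first byte of a line, recorded during an earlier pass.
    public static string? ReadLineAt(string path, long offset, Encoding encoding)
    {
        using var fs = new FileStream(path, FileMode.Open, FileAccess.Read);
        fs.Seek(offset, SeekOrigin.Begin);

        // A new reader starts with an empty buffer, so it decodes from 'offset'.
        using var reader = new StreamReader(fs, encoding, detectEncodingFromByteOrderMarks: false);
        return reader.ReadLine();
    }
}
```

Creating a reader per lookup keeps the sketch simple. For many lookups you would probably keep one stream open and call DiscardBufferedData() on a shared reader after each Seek instead.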
If you're flexible with how the data file is written and don't mind it being a little less text editor-friendly, you could write your records with a BinaryWriter:
Then, initially reading each record is simple because you can use the BinaryReader's ReadString method:
The BinaryReader isn't buffered so you get the proper position to store and use later. The only hassle is parsing the name out of the line, which you may have to do with a StreamReader anyway.
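The snippets for this answer are not shown here, so this is only a sketch of the approach under some assumptions: each record is a single string of the hypothetical form "name,1,2,3", written with BinaryWriter.Write(string), which stores a length prefix in front of the text. The class and method names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class BinaryRecordFile
{
    // Writes each record as a length-prefixed string.
    public static void Write(string path, IEnumerable<string> records)
    {
        using var writer = new BinaryWriter(File.Create(path), Encoding.UTF8);
        foreach (string record in records)
            writer.Write(record);
    }

    // First pass: remember each record's name and the offset where it starts.
    public static List<(string Name, long Offset)> BuildIndex(string path)
    {
        var index = new List<(string Name, long Offset)>();
        using var reader = new BinaryReader(File.OpenRead(path), Encoding.UTF8);
        Stream stream = reader.BaseStream;
        while (stream.Position < stream.Length)
        {
            long offset = stream.Position;      // exact start of this record
            string record = reader.ReadString();
            string name = record.Split(',')[0]; // assumed "name,numbers..." layout
            index.Add((name, offset));
        }
        return index;
    }

    // Later: jump straight to one record.
    public static string ReadAt(string path, long offset)
    {
        using var reader = new BinaryReader(File.OpenRead(path), Encoding.UTF8);
        reader.BaseStream.Seek(offset, SeekOrigin.Begin);
        return reader.ReadString();
    }
}
```

Note that BuildIndex still decodes every record in full on the first pass. For a 1GB+ file you may prefer to write the name and the numbers as separate fields so the numbers can be skipped.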
Is the encoding a fixed-size one (e.g. ASCII or UCS-2)? If so, you could keep track of the character index (based on the number of characters you've seen) and find the binary index based on that.
Otherwise, no - you'd basically need to write your own StreamReader implementation which lets you peek at the binary index. It's a shame that StreamReader doesn't implement this, I agree.
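For the fixed-size case, the conversion from character index to byte index is just a multiplication plus the size of any byte order mark. A small sketch, where the helper name and the `fileHasBom` flag are assumptions for illustration:

```csharp
using System;
using System.Text;

public static class FixedWidthOffsets
{
    // Converts a count of characters already read (including line terminators)
    // into a byte offset. Only valid for fixed-width encodings such as ASCII
    // or UTF-16, where every .NET char occupies the same number of bytes.
    public static long ByteOffset(long charIndex, Encoding encoding, bool fileHasBom)
    {
        int bytesPerChar = encoding.GetByteCount("A");
        long bomLength = fileHasBom ? encoding.GetPreamble().Length : 0;
        return bomLength + charIndex * bytesPerChar;
    }
}
```

For example, 10 characters into a UTF-16 file with a BOM gives 2 + 10 * 2 = 22, while 10 characters into an ASCII file gives 10.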
Starting with .NET 6, the methods in the System.IO.RandomAccess class are the official and supported way to randomly read and write to a file. These APIs work with Microsoft.Win32.SafeHandles.SafeFileHandle instances, which can be obtained with the new System.IO.File.OpenHandle function, also introduced in .NET 6.
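A minimal sketch of how that might look for this question, assuming you already know the byte offset and byte length of the record you want and that the file is UTF-8. The helper name `ReadTextAt` is illustrative.

```csharp
using System;
using System.IO;
using System.Text;
using Microsoft.Win32.SafeHandles;

public static class RandomAccessExample
{
    // Reads up to 'byteCount' bytes starting at 'offset' and decodes them as UTF-8.
    // There is no stream position involved: every read names its own offset.
    public static string ReadTextAt(string path, long offset, int byteCount)
    {
        using SafeFileHandle handle = File.OpenHandle(path, FileMode.Open, FileAccess.Read);
        var buffer = new byte[byteCount];
        int total = 0;
        while (total < byteCount)
        {
            // Read may return fewer bytes than requested, so loop until done or EOF.
            int n = RandomAccess.Read(handle, buffer.AsSpan(total), offset + total);
            if (n == 0) break;
            total += n;
        }
        return Encoding.UTF8.GetString(buffer, 0, total);
    }
}
```

These APIs work on raw bytes only, so finding line boundaries and decoding text is still up to you.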
I think that the FileHelpers library's runtime records feature might help you. http://filehelpers.sourceforge.net/runtime_classes.html
A couple of items that may be of interest.
1) If the lines are a fixed number of characters long, that is not necessarily useful information if the character set has variable-width characters (like UTF-8). So check your character set.
2) BaseStream.Position on a StreamReader tells you how far the reader has buffered ahead, not where the last character you consumed ended. StreamReader has no Flush() method; DiscardBufferedData() throws the buffer away but does not move Position back, so the value is only exact right after you Seek the base stream and discard the buffer yourself.
3) If you know in advance that every record is exactly the same number of characters long, and the character set uses fixed-width characters (so each line is the same number of bytes long), then you can use a FileStream with a fixed buffer size that matches the size of a line. The position of the cursor at the end of each read will then, perforce, be the beginning of the next line.
4) Is there any particular reason why, if the lines are the same length (assuming in bytes here) that you don't simply use line numbers and calculate the byte-offset in the file based on line size x line number?
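Points 3 and 4 can be sketched roughly as follows, assuming every record (including its line terminator) occupies exactly the same number of bytes. The helper name and parameters are illustrative.

```csharp
using System;
using System.IO;
using System.Text;

public static class FixedLengthRecords
{
    // Reads record number 'recordIndex' (zero-based) from a file in which every
    // record, line terminator included, is exactly 'recordLengthBytes' long.
    public static string ReadRecord(string path, long recordIndex, int recordLengthBytes, Encoding encoding)
    {
        using var fs = new FileStream(path, FileMode.Open, FileAccess.Read);

        // Byte offset = line size x line number.
        fs.Seek(recordIndex * recordLengthBytes, SeekOrigin.Begin);

        var buffer = new byte[recordLengthBytes];
        int total = 0;
        while (total < buffer.Length)
        {
            int n = fs.Read(buffer, total, buffer.Length - total);
            if (n == 0) break; // end of file
            total += n;
        }
        return encoding.GetString(buffer, 0, total).TrimEnd('\r', '\n');
    }
}
```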
Are you sure that the file is "too large"? Have you tried it that way and has it caused a problem?
If you allocate a large amount of memory, and you aren't using it right now, Windows will just swap it out to disk. Hence, by accessing it from "memory", you will have accomplished what you want -- random access to the file on disk.
This exact question was asked in 2006 here: http://www.devnewsgroups.net/group/microsoft.public.dotnet.framework/topic40275.aspx
Summary:
"The problem is that the StreamReader buffers data, so the value returned in BaseStream.Position property is always ahead of the actual processed line."
However, "if the file is encoded in a text encoding which is fixed-width, you could keep track of how much text has been read and multiply that by the width".
If not, you can use the FileStream directly and read one byte at a time; the FileStream's Position property should then be correct.
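A sketch of the byte-at-a-time approach, building a list of the offsets where each line starts. It assumes an encoding in which the byte 0x0A only ever means a line feed (ASCII or UTF-8, for instance); the method name is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineIndexer
{
    // Scans the file once and returns the byte offset at which each line begins.
    // Assumes ASCII or UTF-8, where the byte '\n' never occurs inside another character.
    public static List<long> LineStartOffsets(string path)
    {
        var offsets = new List<long>();
        using var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, 1 << 16);
        if (fs.Length > 0)
            offsets.Add(0);

        int b;
        while ((b = fs.ReadByte()) != -1)
        {
            // FileStream.Position is the logical position, so it is exact here
            // even though the stream buffers reads internally.
            if (b == '\n' && fs.Position < fs.Length)
                offsets.Add(fs.Position);
        }
        return offsets;
    }
}
```

The index holds one long per line, so it stays small even for a 1GB+ file, and any stored offset can later be passed to Seek to jump to that record.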