Reading parts of large files from a drive
I'm working with large files in C# (up to 20%-40% of available memory), and I only need small parts of a file loaded into memory at a time (around 1-2% of the file). I was thinking that a FileStream would be the best option, but I'm not sure. I need to give a starting point (in bytes) and a length (in bytes) and copy that region into a byte[]. Access to the file might need to be shared between threads and will be at random spots in the file (non-linear access). I also need it to be fast.
The project already has unsafe methods, so feel free to suggest things from the more dangerous side of C#.
Comments (3)
A FileStream will allow you to seek to the portion of the file you want, no problem. It's the recommended way to do it in C#, and it's fast.
Sharing between threads: You will need to create a lock to prevent other threads from changing the FileStream position while you're trying to read from it. The simplest way to do this:
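A minimal sketch of that locking pattern, not a definitive implementation. The member-level `fsLock` object is the one this answer refers to; the class name `LockedFileReader` and the method name `ReadRegion` are assumptions made for illustration. The loop is there because `FileStream.Read` may return fewer bytes than requested.

```csharp
using System;
using System.IO;

// Hypothetical wrapper: one shared FileStream, guarded by a lock.
public sealed class LockedFileReader : IDisposable
{
    // Member-level lock object; every access to fs must take this lock.
    private readonly object fsLock = new object();
    private readonly FileStream fs;

    public LockedFileReader(string path)
    {
        // Open read-only and allow other readers to open the same file.
        fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
    }

    // Copies `length` bytes starting at byte offset `start` into a new array.
    public byte[] ReadRegion(long start, int length)
    {
        var buffer = new byte[length];
        lock (fsLock)
        {
            // Seek and read must happen under the same lock, otherwise
            // another thread could move the position in between.
            fs.Seek(start, SeekOrigin.Begin);
            int total = 0;
            while (total < length)
            {
                int n = fs.Read(buffer, total, length - total);
                if (n == 0)
                {
                    throw new EndOfStreamException(
                        "Requested region extends past the end of the file.");
                }
                total += n;
            }
        }
        return buffer;
    }

    public void Dispose()
    {
        fs.Dispose();
    }
}
```

Each thread would then call something like `reader.ReadRegion(offset, count)` on a shared instance; the array is allocated outside the lock so only the seek and read are serialized.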
Add try..catch statements and other code as necessary. Everywhere you access this FileStream, put a lock on the member-level variable fsLock. This will keep other methods from reading or manipulating the file pointer while you're trying to read.
Speed-wise, I think you'll find you're limited by disk access speeds, not code.
You'll have to think through all the issues around multi-threaded file access: who initializes/opens the file, who closes it, etc. There's a lot of ground to cover.
I know nothing about the structure of these files, but reading a portion of a file with FileStream or similar sounds like the best and fastest way to do it.
You will not need to copy the byte[] since FileStream can read directly into a byte array.
It sounds like you might know more about the structure of the file, which could bring up additional techniques as well. But if you need to read only a portion of the file, then this would probably be the way to do it.
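On the point that FileStream can read directly into a byte array: a small sketch of filling a caller-supplied buffer in place, so no intermediate array is created. The names `RegionReader` and `ReadInto` are hypothetical, and the caller is assumed to handle any locking if the stream is shared between threads.

```csharp
using System.IO;

public static class RegionReader
{
    // Reads up to `count` bytes starting at file offset `start` straight into
    // `destination` at `destOffset`. Returns the number of bytes actually read,
    // which is smaller than `count` only if the end of the file is reached.
    public static int ReadInto(FileStream fs, long start,
                               byte[] destination, int destOffset, int count)
    {
        fs.Seek(start, SeekOrigin.Begin);
        int total = 0;
        while (total < count)
        {
            // Read may return fewer bytes than requested, so keep going.
            int n = fs.Read(destination, destOffset + total, count - total);
            if (n == 0)
            {
                break; // end of file
            }
            total += n;
        }
        return total;
    }
}
```

Reusing one destination buffer per thread this way avoids allocating a new array for every region.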
If you are using .NET 4, look into using memory-mapped files in the System.IO.MemoryMappedFiles namespace.
They are perfect for reading small chunks out of large files. There are samples in the MSDN documentation.
You can also do this in earlier versions of .NET, but then you need to wrap the Win32 API (or use http://winterdom.com/dev/net).
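A rough sketch of the memory-mapped approach, assuming .NET 4 or later. The wrapper class `MappedFileReader` and its `ReadRegion` method are made-up names for illustration; the calls themselves (`MemoryMappedFile.CreateFromFile`, `CreateViewAccessor`, `ReadArray`) come from System.IO.MemoryMappedFiles. Each read maps its own view, so there is no shared file position to protect.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Hypothetical wrapper: maps the file once and creates a small view per read.
public sealed class MappedFileReader : IDisposable
{
    private readonly MemoryMappedFile mmf;

    public MappedFileReader(string path)
    {
        // mapName = null (unnamed map), capacity = 0 (use the file's size).
        mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read);
    }

    // Copies `length` bytes starting at byte offset `start` into a new array.
    public byte[] ReadRegion(long start, int length)
    {
        var buffer = new byte[length];
        // Map only the requested window rather than the whole file.
        using (var view = mmf.CreateViewAccessor(
            start, length, MemoryMappedFileAccess.Read))
        {
            view.ReadArray(0, buffer, 0, length);
        }
        return buffer;
    }

    public void Dispose()
    {
        mmf.Dispose();
    }
}
```

Whether this beats a locked FileStream depends on the access pattern and the platform, so it is worth measuring both on the real files.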