Can a 2D byte array be made into one huge contiguous byte array?
I have a very large 2D byte array in memory:
byte MyBA = new byte[int.MaxValue][10];
Is there any way (possibly unsafe) to make C# treat this as one huge contiguous byte array? I want to do this so that I can pass it to a MemoryStream and then to a BinaryReader.
MyReader = new BinaryReader(MemoryStream(*MyBA)) //Syntax obviously made-up here
Comments (6)
I do not believe .NET provides this, but it should be fairly easy (though I have not tested it) to write your own implementation of System.IO.Stream that seamlessly switches between backing arrays.
Another way to work around the size limit of 2^31 bytes is UnmanagedMemoryStream, which implements System.IO.Stream on top of an unmanaged memory buffer (which might be as large as the OS supports). I have not tested this either.
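To make the UnmanagedMemoryStream idea concrete, here is a minimal sketch. It is not the answerer's original code. `HGlobalBuffer` and `UnmanagedStreamSketch` are hypothetical names introduced for illustration, and the sketch assumes the memory is allocated with `Marshal.AllocHGlobal` and wrapped in a `SafeBuffer` so that no `unsafe` block is needed.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical helper: owns a block of unmanaged memory from Marshal.AllocHGlobal.
public sealed class HGlobalBuffer : SafeBuffer
{
    public HGlobalBuffer(long sizeInBytes) : base(true)
    {
        SetHandle(Marshal.AllocHGlobal(new IntPtr(sizeInBytes)));
        Initialize((ulong)sizeInBytes); // tells SafeBuffer how many bytes are usable
    }

    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        return true;
    }
}

public static class UnmanagedStreamSketch
{
    // Writes an int and a long into unmanaged memory through a stream,
    // rewinds, reads both back with BinaryReader, and returns their sum.
    public static long RoundTrip(long sizeInBytes)
    {
        using (var buffer = new HGlobalBuffer(sizeInBytes))
        using (var stream = new UnmanagedMemoryStream(buffer, 0, sizeInBytes, FileAccess.ReadWrite))
        {
            var writer = new BinaryWriter(stream);
            writer.Write(123);          // 4 bytes
            writer.Write(4567890123L);  // 8 bytes
            writer.Flush();

            stream.Position = 0;
            var reader = new BinaryReader(stream);
            int first = reader.ReadInt32();
            long second = reader.ReadInt64();
            return first + second;
        }
    }
}
```

The stream's length is a `long`, so the buffer is not bound by the managed array limit. The data still has to be copied into the unmanaged block first.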
Agreed. In any case, the array size itself is limited.
If you really need to work with huge arrays through a stream, write your own custom memory stream class.
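A custom stream of the kind suggested here could look roughly like the sketch below. `JaggedArrayStream` is a hypothetical name. The sketch is read-only and assumes every row of the jagged array has the same length (10 in the question), so a position maps to a row and column by division and remainder.

```csharp
using System;
using System.IO;

// Hypothetical read-only stream that presents a byte[][] as one byte sequence.
// Assumes all rows have the same length.
public sealed class JaggedArrayStream : Stream
{
    private readonly byte[][] _rows;
    private readonly int _rowLength;
    private long _position;

    public JaggedArrayStream(byte[][] rows)
    {
        if (rows == null) throw new ArgumentNullException(nameof(rows));
        _rows = rows;
        _rowLength = rows.Length > 0 ? rows[0].Length : 0;
    }

    public override bool CanRead => true;
    public override bool CanSeek => true;
    public override bool CanWrite => false;
    public override long Length => (long)_rows.Length * _rowLength;

    public override long Position
    {
        get => _position;
        set
        {
            if (value < 0) throw new ArgumentOutOfRangeException(nameof(value));
            _position = value;
        }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (count > 0 && _position < Length)
        {
            int row = (int)(_position / _rowLength);
            int col = (int)(_position % _rowLength);
            int n = Math.Min(count, _rowLength - col); // stay inside the current row
            Buffer.BlockCopy(_rows[row], col, buffer, offset, n);
            _position += n;
            offset += n;
            count -= n;
            total += n;
        }
        return total; // 0 signals end of stream
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        long target = origin == SeekOrigin.Begin ? offset
                    : origin == SeekOrigin.Current ? _position + offset
                    : Length + offset;
        Position = target;
        return _position;
    }

    public override void Flush() { }
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}
```

Because it derives from `Stream`, it can be handed straight to `new BinaryReader(new JaggedArrayStream(rows))` without copying the data.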
I think you can use a linear structure instead of a 2D one, as follows.
Instead of byte[int.MaxValue][10] you can have byte[int.MaxValue*10]. You would address the item at [4,5] as 10*(4-1)+(5-1). (The general formula is (i-1)*number of columns+(j-1).)
Of course, you could use the other convention.
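The index arithmetic can be written as a small helper. `LinearIndex.ToOffset` is a hypothetical name, and the sketch only shows the mapping: a single managed array cannot actually hold int.MaxValue*10 elements, so the flat buffer would still have to live somewhere else.

```csharp
using System;

public static class LinearIndex
{
    // Maps a 1-based (i, j) position in a grid with `columns` columns
    // to a 0-based offset in a flat, row-major buffer.
    public static long ToOffset(long i, long j, long columns)
    {
        return (i - 1) * columns + (j - 1);
    }
}
```

With 10 columns, `LinearIndex.ToOffset(4, 5, 10)` is 3*10 + 4 = 34.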
If I understand your question correctly, you have a massive file that you want to read into memory and then process, but you can't, because the amount of data in the file exceeds the capacity of any single-dimensional array.
You mentioned that speed is important and that you have multiple threads running in parallel to process the data as quickly as possible. If you are going to have to partition the data for each thread anyway, why not base the number of threads on the number of byte[int.MaxValue] buffers required to cover everything?
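As a rough illustration of that partitioning, the sketch below splits a total length into consecutive ranges no larger than a chosen buffer size. `ChunkPlanner.Plan` is a hypothetical helper, and the buffer size is a parameter because the largest byte array the runtime will allocate is slightly smaller than int.MaxValue.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkPlanner
{
    // Splits [0, totalLength) into consecutive (Offset, Length) ranges,
    // each at most maxChunk bytes long. One range could go to each thread.
    public static List<(long Offset, int Length)> Plan(long totalLength, int maxChunk)
    {
        if (maxChunk <= 0) throw new ArgumentOutOfRangeException(nameof(maxChunk));
        var ranges = new List<(long Offset, int Length)>();
        for (long offset = 0; offset < totalLength; offset += maxChunk)
        {
            int length = (int)Math.Min(maxChunk, totalLength - offset);
            ranges.Add((offset, length));
        }
        return ranges;
    }
}
```

For example, `ChunkPlanner.Plan(25, 10)` yields three ranges: (0, 10), (10, 10), and (20, 5).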
You can create a MemoryStream and then pass the array in row by row using the Write method.
EDIT:
The limit of a MemoryStream is certainly the amount of memory available to your application. There may be a lower limit than that, but if you need more memory you should consider changing your overall architecture. For example, you could process your data in chunks, or swap data out to a file.
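The row-by-row approach might look like this. `RowWriter.ToStream` is a hypothetical helper. Note that it copies every row, and that MemoryStream keeps its data in a single byte[] internally, so this only works while the total stays under the array size limit.

```csharp
using System;
using System.IO;

public static class RowWriter
{
    // Copies each row of a jagged array into one MemoryStream
    // and rewinds it so it is ready for a BinaryReader.
    public static MemoryStream ToStream(byte[][] rows)
    {
        var stream = new MemoryStream();
        foreach (byte[] row in rows)
        {
            stream.Write(row, 0, row.Length);
        }
        stream.Position = 0;
        return stream;
    }
}
```

Usage would be along the lines of `var reader = new BinaryReader(RowWriter.ToStream(rows));`.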
If you are using .NET Framework 4.0, you have the option of working with a MemoryMappedFile. Memory-mapped files can be backed by a physical file or by the Windows swap file. They act like an in-memory stream, transparently swapping data to and from the backing storage if and when required.
If you are not using Framework 4.0, you can still use this option, but you will need to either write your own wrapper or find an existing one. I expect there are plenty on The Code Project.
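A minimal sketch of the memory-mapped option follows, assuming a mapping with no file on disk (created with `MemoryMappedFile.CreateNew` and a null map name) and a view stream over the whole mapping. `MappedFileSketch` is a hypothetical name, and a real use would more likely map an existing data file.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

public static class MappedFileSketch
{
    // Creates an unnamed mapping of `capacity` bytes, writes a value through
    // a view stream, rewinds, and reads it back with BinaryReader.
    public static long RoundTrip(long capacity, long value)
    {
        using (var mmf = MemoryMappedFile.CreateNew(null, capacity))
        using (var stream = mmf.CreateViewStream())
        {
            var writer = new BinaryWriter(stream);
            writer.Write(value);
            writer.Flush();

            stream.Position = 0;
            var reader = new BinaryReader(stream);
            return reader.ReadInt64();
        }
    }
}
```

The view stream is an ordinary `Stream`, so it fits the BinaryReader use case from the question. For data larger than the available address space, several smaller views over different offsets would be needed.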