Can a 2D byte array be treated as one huge contiguous byte array?

Posted 2024-09-17 20:43:25

I have an extremely large 2D bytearray in memory,

byte MyBA = new byte[int.MaxValue][10];

Is there any way (probably unsafe) that I can fool C# into thinking this is one huge contiguous byte array? I want to do this so that I can pass it to a MemoryStream and then to a BinaryReader.

MyReader = new BinaryReader(MemoryStream(*MyBA)) //Syntax obviously made-up here

Comments (6)

夏花。依旧 2024-09-24 20:43:25

I do not believe .NET provides this, but it should be fairly easy to write your own System.IO.Stream implementation that seamlessly switches between backing arrays. Here are the (untested) basics:

using System;
using System.IO;

// Read-only, forward-only stream that presents a jagged byte array as one sequence.
public class MultiArrayMemoryStream : Stream
{
    private readonly byte[][] _arrays;
    private long _position;    // total number of bytes read so far
    private int _arrayNumber;  // index of the array currently being read
    private int _posInArray;   // read offset inside that array

    public MultiArrayMemoryStream(byte[][] arrays)
    {
        if (arrays == null) throw new ArgumentNullException("arrays");
        _arrays = arrays;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = 0;
        while (read < count && _arrayNumber < _arrays.Length)
        {
            byte[] current = _arrays[_arrayNumber];
            // Copy the smaller of: bytes still requested, bytes left in the current array.
            int chunk = Math.Min(count - read, current.Length - _posInArray);
            Buffer.BlockCopy(current, _posInArray, buffer, offset + read, chunk);
            read += chunk;
            _posInArray += chunk;
            _position += chunk;
            if (_posInArray == current.Length)
            {
                // Current array is exhausted; continue with the next one.
                _arrayNumber++;
                _posInArray = 0;
            }
        }
        return read; // 0 signals end of stream
    }

    public override long Length
    {
        get
        {
            long res = 0;
            for (int i = 0; i < _arrays.Length; i++)
            {
                res += _arrays[i].Length;
            }
            return res;
        }
    }

    public override long Position
    {
        get { return _position; }
        set { throw new NotSupportedException(); }
    }

    public override bool CanRead { get { return true; } }

    public override bool CanSeek { get { return false; } }

    public override bool CanWrite { get { return false; } }

    public override void Flush() { }

    // Stream.Seek returns long, so the override must as well.
    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotSupportedException();
    }

    public override void SetLength(long value)
    {
        throw new NotSupportedException();
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        throw new NotSupportedException();
    }
}
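
To illustrate how a stream like this gets handed to a BinaryReader, here is a minimal, self-contained sketch. It uses a stripped-down, hypothetical RowStream class (not part of .NET, and deliberately smaller than the class above) plus made-up sample data in which one Int32 straddles two rows. Treat it as an illustration under those assumptions rather than a finished implementation.

```csharp
using System;
using System.IO;

// Hypothetical minimal read-only stream over a jagged array (illustration only).
public sealed class RowStream : Stream
{
    private readonly byte[][] _rows;
    private int _row, _col;
    private long _position;

    public RowStream(byte[][] rows) { _rows = rows; }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int read = 0;
        while (read < count && _row < _rows.Length)
        {
            // Copy as much as the current row can still provide.
            int chunk = Math.Min(count - read, _rows[_row].Length - _col);
            Buffer.BlockCopy(_rows[_row], _col, buffer, offset + read, chunk);
            read += chunk; _col += chunk; _position += chunk;
            if (_col == _rows[_row].Length) { _row++; _col = 0; }
        }
        return read;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { return _position; }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

public static class RowStreamDemo
{
    // Reads one little-endian Int32 whose four bytes span two rows.
    public static int ReadAcrossRows()
    {
        byte[][] rows = { new byte[] { 0x01, 0x02 }, new byte[] { 0x03, 0x04 } };
        using (var reader = new BinaryReader(new RowStream(rows)))
        {
            return reader.ReadInt32(); // bytes 01 02 03 04 -> 0x04030201
        }
    }
}
```

The reader never sees the row boundary: BinaryReader only calls Read, so the row switch inside Read is all that is needed for forward-only access.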

Another way to work around the 2^31-byte size limit is UnmanagedMemoryStream, which implements System.IO.Stream on top of an unmanaged memory buffer (which can be as large as the OS supports). Something like this might work (untested):

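// Needs System, System.IO and System.Runtime.InteropServices; must run in an unsafe context (compile with /unsafe).
IntPtr buffer = IntPtr.Zero;
try
{
    using (var fileStream = new FileStream("data",
        FileMode.Open,
        FileAccess.Read,
        FileShare.Read,
        16 * 1024,
        FileOptions.SequentialScan))
    {
        long length = fileStream.Length;
        buffer = Marshal.AllocHGlobal(new IntPtr(length));
        using (var memoryStream = new UnmanagedMemoryStream((byte*) buffer.ToPointer(), length, length, FileAccess.ReadWrite))
        {
            fileStream.CopyTo(memoryStream);
            memoryStream.Seek(0, SeekOrigin.Begin);
            // work with the UnmanagedMemoryStream
        }
    }
}
finally
{
    // Release the unmanaged buffer even if opening or copying the file fails.
    if (buffer != IntPtr.Zero) Marshal.FreeHGlobal(buffer);
}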

遗忘曾经 2024-09-24 20:43:25

Agreed. In any case, the array size itself is limited.

If you really need to work with huge arrays through a stream, write your own custom memory stream class.

瑾兮 2024-09-24 20:43:25

I think you can use a linear structure instead of a 2D structure, using the following approach.

Instead of byte[int.MaxValue][10] you can have a single flat array with rows*10 elements. With 1-based indices, you would address the item at [4,5] as 10*(4-1)+(5-1) = 34; the general formula is (i-1)*(number of columns)+(j-1). Note that this only works while the total element count fits in one array: int.MaxValue*10 elements is more than a single .NET array can hold.

Of course, you could use another convention.
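
The index formula above can be sketched as a pair of small helpers. The class and method names (FlatIndex, FromOneBased, FromZeroBased) are made up for illustration, and the 1000-row array is just a small stand-in size.

```csharp
using System;

public static class FlatIndex
{
    // 1-based (i, j) -> offset in a row-major flat array: (i-1)*columns + (j-1).
    public static long FromOneBased(long i, long j, long columns)
    {
        return (i - 1) * columns + (j - 1);
    }

    // 0-based variant of the same mapping: row*columns + col.
    public static long FromZeroBased(long row, long col, long columns)
    {
        return row * columns + col;
    }

    // Stores a value through one convention and reads it back through the other.
    public static byte RoundTrip()
    {
        const int columns = 10;
        byte[] flat = new byte[1000 * columns]; // 1000 rows of 10 bytes (hypothetical size)
        flat[FromOneBased(4, 5, columns)] = 42;
        return flat[FromZeroBased(3, 4, columns)]; // same cell, 0-based
    }
}
```

Both conventions address the same cell; the only difference is whether the caller counts from 1 or from 0.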

阳光下慵懒的猫 2024-09-24 20:43:25

If I understand your question correctly, you've got a massive file that you want to read into memory and then process. But you can't do this because the amount of data in the file exceeds the capacity of any single-dimensional array.

You mentioned that speed is important, and that you have multiple threads running in parallel to process the data as quickly as possible. If you're going to have to partition the data for each thread anyway, why not base the number of threads on the number of byte[int.MaxValue] buffers required to cover everything?
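
A rough sketch of that partitioning arithmetic follows. The Partitioner class and its method names are hypothetical, and the buffer size is a parameter because in practice the largest allocatable byte[] is somewhat smaller than int.MaxValue.

```csharp
using System;

public static class Partitioner
{
    // Number of buffers of at most maxBufferSize bytes needed to hold totalBytes.
    public static int BufferCount(long totalBytes, int maxBufferSize)
    {
        return checked((int)((totalBytes + maxBufferSize - 1) / maxBufferSize));
    }

    // Size of the buffer at the given 0-based index; only the last one may be shorter.
    public static int BufferSize(long totalBytes, int maxBufferSize, int index)
    {
        long remaining = totalBytes - (long)index * maxBufferSize;
        return (int)Math.Min(remaining, maxBufferSize);
    }
}
```

For example, 25 bytes split into buffers of at most 10 bytes gives three buffers of 10, 10 and 5 bytes, so three worker threads under this scheme.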

对你的占有欲 2024-09-24 20:43:25

You can create a MemoryStream and then write the array into it row by row using the Write method.
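
A minimal sketch of that row-by-row copy is shown here, with a hypothetical helper named RowCopy.ToMemoryStream. It assumes the combined data fits in a MemoryStream, whose capacity is an int and therefore cannot exceed int.MaxValue bytes, and it duplicates the data in memory.

```csharp
using System;
using System.IO;

public static class RowCopy
{
    // Copies every row into one MemoryStream and rewinds it for reading.
    public static MemoryStream ToMemoryStream(byte[][] rows)
    {
        long total = 0;
        foreach (byte[] row in rows) total += row.Length;
        if (total > int.MaxValue)
            throw new ArgumentException("Data does not fit in a single MemoryStream.", "rows");

        var stream = new MemoryStream((int)total); // pre-size to avoid regrowth
        foreach (byte[] row in rows) stream.Write(row, 0, row.Length);
        stream.Position = 0;
        return stream;
    }
}
```

The returned stream can be passed straight to a BinaryReader, but for data on the scale described in the question the size check above would reject it.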

EDIT:
The limit of a MemoryStream is ultimately the amount of memory available to your application. There may be a lower limit than that, but if you need more memory, you should consider changing your overall architecture. For example, you could process your data in chunks, or swap data out to a file.

偷得浮生 2024-09-24 20:43:25

If you are using .NET Framework 4.0, you have the option of working with a MemoryMappedFile. Memory-mapped files can be backed by a physical file or by the Windows page file. They act like an in-memory stream, transparently swapping data to and from the backing storage if and when required.

If you are not using .NET Framework 4.0, you can still use this option, but you will need to either write your own wrapper or find an existing one. I expect there are plenty on The Code Project.
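
As a small sketch of the memory-mapped approach, the following creates a mapping that is not backed by a named file, writes two integers through a view stream, and reads them back with a BinaryReader. The MappedDemo class and the 1024-byte capacity are placeholders chosen for illustration; real use would size the mapping for the actual data or map an existing file.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

public static class MappedDemo
{
    // Writes two Int32 values into an unnamed mapping, then reads them back and sums them.
    public static int WriteThenReadSum()
    {
        const long capacity = 1024; // placeholder size for the sketch
        using (MemoryMappedFile mmf = MemoryMappedFile.CreateNew(null, capacity))
        {
            using (MemoryMappedViewStream view = mmf.CreateViewStream())
            using (var writer = new BinaryWriter(view))
            {
                writer.Write(40);
                writer.Write(2);
            }

            // A second view over the same mapping sees the data written above.
            using (MemoryMappedViewStream view = mmf.CreateViewStream())
            using (var reader = new BinaryReader(view))
            {
                return reader.ReadInt32() + reader.ReadInt32();
            }
        }
    }
}
```

Because the view is an ordinary Stream, the BinaryReader code from the question works against it unchanged.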
