Using arrays of fields instead of massive numbers of objects

Posted on 2024-11-19 10:15:49

In light of this article, I am wondering what people's experiences are with storing massive datasets (say, >10,000,000 objects) in memory using arrays to store data fields instead of instantiating millions of objects and racking up the per-object memory overhead (say, 12-24 bytes per object, depending on which article you read). Data per property varies from item to item, so I can't use a strict Flyweight pattern, but I would envision something similar.

My idea of this sort of representation is that one has a 'template object'...

class Thing
{
  public double A;
  public double B;
  public int    C;
  public string D;
}

And then a container object with a method of creating an object on request...

class ContainerOfThings
{
  double[] ContainerA;
  double[] ContainerB;
  int[]    ContainerC;
  string[] ContainerD;

  public ContainerOfThings(int total)
  {
    // create one array per field, each sized to the total number of items
    ContainerA = new double[total];
    ContainerB = new double[total];
    ContainerC = new int[total];
    ContainerD = new string[total];
  }

  public Thing GetThingAtPosition(int position)
  {
     Thing thing = new Thing(); //probably best done as a factory instead
     thing.A = ContainerA[position];
     thing.B = ContainerB[position];
     thing.C = ContainerC[position];
     thing.D = ContainerD[position];

     return thing;
  }
}

So that's a simple strategy but not very versatile, for example one can't create a subset (as a List) of 'Thing' without duplicating data and defeating the purpose of array field storage. I haven't been able to find good examples, so I would appreciate either links or code snippets of better ways to handle this scenario from someone who's done it...or a better idea.

Comments (8)

作业与我同在 2024-11-26 10:15:49

It depends on your concrete scenario. Depending on how often your objects are created, you can:

  1. If the objects are serializable, save them in a MemoryMappedFile (trading medium/low performance for low memory consumption).

  2. Map the fields between different objects: I mean, if objects initially have default values, keep all the defaults in a separate shared base and only allocate new space when a value becomes different from the default one. (This naturally makes sense for reference types.)

  3. Another solution is to save the objects to an SQLite database. This is much easier to manage than MemoryMappedFiles, as you can use simple SQL.
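
As a rough illustration of option 1, here is a minimal sketch that stores fixed-size records in an anonymous memory-mapped file. The names `MappedThings` and `MappedThingsDemo`, the 20-byte record layout (two doubles and an int), and the omission of the string field are assumptions made for the example, not something the answer specifies.

```csharp
using System;
using System.IO.MemoryMappedFiles;

// Hypothetical fixed-size record: two doubles and an int (the string field is left out for brevity).
static class MappedThings
{
    public const int RecordSize = sizeof(double) * 2 + sizeof(int); // 20 bytes

    public static void Write(MemoryMappedViewAccessor view, int index, double a, double b, int c)
    {
        long offset = (long)index * RecordSize; // record i starts at i * RecordSize
        view.Write(offset, a);
        view.Write(offset + 8, b);
        view.Write(offset + 16, c);
    }

    public static (double A, double B, int C) Read(MemoryMappedViewAccessor view, int index)
    {
        long offset = (long)index * RecordSize;
        return (view.ReadDouble(offset), view.ReadDouble(offset + 8), view.ReadInt32(offset + 16));
    }
}

static class MappedThingsDemo
{
    public static (double A, double B, int C) Run()
    {
        const int count = 1000;
        // A null map name creates an anonymous mapping that is not shared by name.
        using (var file = MemoryMappedFile.CreateNew(null, (long)count * MappedThings.RecordSize))
        using (var view = file.CreateViewAccessor())
        {
            MappedThings.Write(view, 42, 1.5, 2.5, 7);
            return MappedThings.Read(view, 42);
        }
    }
}
```

For data that must outlive the process, `MemoryMappedFile.CreateFromFile` would be the file-backed counterpart; variable-length strings would need a separate region or a fixed maximum length.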

The choice is up to you, as it depends on your concrete project requirements.

Regards.

篱下浅笙歌 2024-11-26 10:15:49

In light of this article, I am wondering what people's experiences are with storing massive datasets (say, >10,000,000 objects) in-memory using arrays to store data fields instead of instantiating millions of objects and racking up the memory overhead...

I guess there are several ways to approach this, and indeed you are onto a possible solution to limit the data in memory. However, I'm not sure that reducing your structure by even 24? bytes is going to do you a whole lot of good. Your structure is around 79 bytes (for a 15 char string) = 8 + 8 + 4 + 24? + 4 + 1 + (2 * character length), so your total gain is at best 25%. That doesn't seem very useful, since you'd have to be in a position where 10 million * 80 bytes fits in memory and 10 million * 100 bytes does not. That would mean that you're designing a solution that is on the edge of disaster: too many large strings, or too many records, or some other program hogging memory, and your machine is out of memory.

If you need to support random access to n small records, where n = 10 million, then you should aim to design for at least 2n or 10n. Perhaps you're already considering this in your 10 million? Either way, there are plenty of technologies that can support this type of data being accessed.

One possibility: if the string is limited in max length (ml) to a reasonable size (say 255), then you can go to a simple ISAM store. Each record would be 8 + 8 + 4 + 255 bytes, and you can simply offset into a flat file to read them. If the record size is variable or possibly large, then you will want to use a different storage format for this and store offsets into the file.
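
The fixed-size record idea can be sketched roughly as follows. This assumes a 275-byte record (8 + 8 + 4 + 255) with the string stored as zero-padded ASCII; `FlatFileStore` and its method names are hypothetical, and any seekable `Stream` (for example a `FileStream`) can stand in for the flat file.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical flat-file store: record i lives at byte offset i * RecordSize.
static class FlatFileStore
{
    const int MaxStringBytes = 255;
    public const int RecordSize = 8 + 8 + 4 + MaxStringBytes; // 275 bytes per record

    public static void Write(Stream file, int index, double a, double b, int c, string d)
    {
        byte[] encoded = Encoding.ASCII.GetBytes(d);
        if (encoded.Length > MaxStringBytes) throw new ArgumentException("string too long");
        byte[] text = new byte[MaxStringBytes];          // zero-padded to the fixed width
        Array.Copy(encoded, text, encoded.Length);

        file.Seek((long)index * RecordSize, SeekOrigin.Begin);
        var writer = new BinaryWriter(file, Encoding.ASCII, leaveOpen: true);
        writer.Write(a);
        writer.Write(b);
        writer.Write(c);
        writer.Write(text);                              // raw bytes, no length prefix
        writer.Flush();
    }

    public static (double A, double B, int C, string D) Read(Stream file, int index)
    {
        file.Seek((long)index * RecordSize, SeekOrigin.Begin);
        var reader = new BinaryReader(file, Encoding.ASCII, leaveOpen: true);
        double a = reader.ReadDouble();
        double b = reader.ReadDouble();
        int c = reader.ReadInt32();
        string d = Encoding.ASCII.GetString(reader.ReadBytes(MaxStringBytes)).TrimEnd('\0');
        return (a, b, c, d);
    }
}
```

Because every record has the same size, random access needs no index at all; the trade-off is that short strings still occupy the full 255 bytes.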

Another possibility: if you're looking up values by some key, then I would recommend something like an embedded database or a BTree, one where you can disable some of the disk consistency to gain performance. As it happens, I wrote a BPlusTree for client-side caches of large volumes of data. Detailed information on using the B+Tree is here.

琴流音 2024-11-26 10:15:49

Actually, the ADO.NET DataTable uses a similar approach to store its data. Maybe you should look at how it is implemented there.
So, you'll need a DataRow-like object that internally holds a pointer to the table and the index of the row data. This would be the most lightweight solution, I believe.
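
A minimal sketch of such a DataRow-like handle, assuming the four fields from the question. `ThingTable` and `ThingRow` are hypothetical names, and this is not how DataTable itself is implemented. The row is a small struct holding only a table reference and an index, so reads and writes go straight to the column arrays.

```csharp
using System;
using System.Collections.Generic;

// Column-oriented storage: one array per field.
class ThingTable
{
    public readonly double[] A;
    public readonly double[] B;
    public readonly int[] C;
    public readonly string[] D;
    public int Count { get; }

    public ThingTable(int total)
    {
        Count = total;
        A = new double[total];
        B = new double[total];
        C = new int[total];
        D = new string[total];
    }

    // Hands out a lightweight view of row `index`; no field data is copied.
    public ThingRow this[int index] => new ThingRow(this, index);

    // A subset is just a list of row handles pointing back into the same arrays.
    public List<ThingRow> Where(Func<ThingRow, bool> predicate)
    {
        var subset = new List<ThingRow>();
        for (int i = 0; i < Count; i++)
        {
            var row = this[i];
            if (predicate(row)) subset.Add(row);
        }
        return subset;
    }
}

// Value type: a table reference plus an index, with no per-row heap object.
struct ThingRow
{
    private readonly ThingTable table;
    private readonly int index;

    public ThingRow(ThingTable table, int index)
    {
        this.table = table;
        this.index = index;
    }

    public double A { get => table.A[index]; set => table.A[index] = value; }
    public double B { get => table.B[index]; set => table.B[index] = value; }
    public int C { get => table.C[index]; set => table.C[index] = value; }
    public string D { get => table.D[index]; set => table.D[index] = value; }
}
```

This also addresses the subset concern from the question: a `List<ThingRow>` (or even a plain `List<int>` of indices) refers to rows without duplicating the underlying field data.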

In your case:
a) If you construct the Thing each time you call the GetThingAtPosition method, you create the object on the heap, which duplicates information that is already in your table, plus the "object overhead" data.

b) If you need to access each item in your ContainerOfThings, the required memory will be doubled, plus 12 bytes of object overhead for each object. In such a scenario it would be better to have a simple array of things without creating them on the fly.

浮云落日 2024-11-26 10:15:49

Your question implies there is a problem. Has the memory usage proved to be a problem?

If it is 100 bytes per item, then it sounds like 1GB. So I'm wondering about the app and whether this is a problem. Is the app to run on a dedicated 64-bit box with, say, 8GB of RAM?

If there is a fear, you could test the fear by an integration test. Instantiate say 20 million of these items and run some performance tests.
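
One rough way to run such a test is to measure the managed heap before and after allocating the items. The sketch below assumes a hypothetical `PlainThing` class mirroring the question's fields and uses `GC.GetTotalMemory`, which gives only an approximate figure; pass a count such as 20,000,000 for the full-scale experiment.

```csharp
using System;

// Hypothetical stand-in for the question's Thing class.
class PlainThing
{
    public double A;
    public double B;
    public int C;
    public string D;
}

static class MemoryProbe
{
    // Approximate managed bytes consumed per item when every item is its own object.
    public static double BytesPerObject(int count)
    {
        long before = GC.GetTotalMemory(forceFullCollection: true);
        var things = new PlainThing[count];
        for (int i = 0; i < count; i++)
            things[i] = new PlainThing { A = i, B = i, C = i };
        long after = GC.GetTotalMemory(forceFullCollection: true);
        GC.KeepAlive(things); // keep the array reachable until after the measurement
        return (after - before) / (double)count;
    }

    // Same data held as one array per field, for comparison.
    public static double BytesPerArrayRow(int count)
    {
        long before = GC.GetTotalMemory(forceFullCollection: true);
        var a = new double[count];
        var b = new double[count];
        var c = new int[count];
        var d = new string[count];
        long after = GC.GetTotalMemory(forceFullCollection: true);
        GC.KeepAlive(a); GC.KeepAlive(b); GC.KeepAlive(c); GC.KeepAlive(d);
        return (after - before) / (double)count;
    }
}
```

Comparing the two numbers on the target machine shows whether the per-object overhead actually matters for the application before any redesign is attempted.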

But of course it all comes down to the app domain. I have had specialised apps that use more RAM than this and have worked fine. The cost of hardware is often way less than the cost of software (yeah, it comes down to the app domain again).

See ya

酒废 2024-11-26 10:15:49

Unfortunately, OO can't abstract away the performance issues (saturation of bandwidth being one). It's a convenient paradigm, but it comes with limitations.

I like your idea, and I use this as well... and guess what, we're not the first to think of this ;-). I've found that it does require a bit of a mind shift though.

May I refer you to the J community? See:

http://www.JSoftware.com

That's not a C# (or Java) group. They're a good bunch. Typically the array needs to be treated as a first-class object. In C#, it's not nearly as flexible. It can be a frustrating structure to work with in C#.

There are various OO patterns for large dataset problems... but if you are asking a question like this, probably it is time to go a little more functional. Or at least functional for problem solving / prototyping.

无畏 2024-11-26 10:15:49

I've done such a thing for the rapidSTORM project, where several million sparsely populated objects need to be cached (localization microscopy). While I can't really give you good code snippets (too many dependencies), I found that the implementation was very quick and straightforward with Boost Fusion. Fusionized the structure, built a vector for each element type, and then wrote a quite straightforward accessor for that vector that reconstructed each element.

(D'oh, I just noticed that you tagged the question, but maybe my C++ answer helps as well)

游魂 2024-11-26 10:15:49

[update 2011-07-19]

there's a new version available now: http://www.mediafire.com/file/74fxj7u1n0ppcq9/MemStorageDemo-6639584-2011_07_19-12_47_00.zip

i am still trying to debug some of the refcounting, which is annoying, but from a fresh xUnit session i am able to run a test that creates 10 million objects. (it occurs to me i have the string size cut down right now for testing, but i had it running 10 million with strings of variable length from 3 to 15 bytes; i haven't had a chance to try bigger than that yet.) on my system i was going from approx ~1.95G load to ~2.35G with the 10 million objects, and i still haven't done anything about the strings other than a very simple backing class that uses actual managed strings.

anyway, so, i think it's fairly well working, although there's definitely optimizing left to be done on the backing store, and i also think a bit of work can be done on the iterators if necessary, depending on how much data you process at once. unfortunately just not going to be able to look at it again until probably late tomorrow or next day.

anyway, so here's the basic ideas:

  1. MemoryArray class: uses unmanaged memory allocated by Marshal.AllocHGlobal() to store the structures. i was testing with one big MemoryArray, but member-wise there are only a few fields and i think as long as you kept the array size fairly large it's not going to be much of a difference in memory consumption to split it up. MemoryArray implements IEnumerable<MemoryArrayItem>, which is what i've been using to fill the arrays in testing.

MemoryArray is intended for holding the regularly-sized chunks of data from any object being backed. there are some things you can do with the enumerators using pointer math that i have not yet implemented. i am currently returning new objects every time, so that's a big chunk of the size i believe. in the prototype class i based this prototype on, i was able to use very regular pointer math for doing traversals, however i think the way i was doing it would be mainly useful for very quick traversals and probably not for interoperability.

MemoryArray has just a standard indexer that grabs the requested element using pointer math on the IntPtr that represents the head of the unmanaged data, which is allocated upon construction. i have also implemented a notional 2D indexer, where you can pass in an array of ints representing dimensions to the array, and then can perform an A[x,y] on it. it's just a quick little example of how it could work.

one thing i haven't implemented yet is any sort of subsectioning, but i do think a subsection is appropriate for the project so i will probably implement that when i get a chance.

  2. MemoryArrayEnumerator class: i chose to implement actual enumerators instead of enumerator functions. the enumerator class basically just takes a MemoryArray and then provides the stock functions for enumerators to return the actual MemoryArrayItem objects.

  3. MemoryArrayItem class: this class doesn't do much other than just hold the appropriate pointer information based off the start and position in the array. it's the specific classes that get implemented on top of (to the side of?) this object that actually do the pointer stuff to get the data out.

then there are a few more support classes: MemoryStringArray is the variable-sized memory backing chunk that only does strings for the time being, and then there's the auto disposing class (AutoDisposer) and a generic class to handle the attaches and detaches (AutoReference<>).
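
The core of the MemoryArray idea described above, a block of unmanaged memory plus an indexer that does the pointer math, can be sketched roughly like this. It is a simplified stand-in, not the code from the linked download: `UnmanagedRecordArray` and `RecordFields` are hypothetical names, and it uses the `Marshal` read/write helpers instead of `unsafe` pointers so it compiles without the unsafe switch.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical, simplified stand-in for the MemoryArray class described above.
sealed class UnmanagedRecordArray : IDisposable
{
    private IntPtr head;                 // start of the unmanaged block
    public int RecordSize { get; }
    public int Count { get; }

    public UnmanagedRecordArray(int count, int recordSize)
    {
        Count = count;
        RecordSize = recordSize;
        head = Marshal.AllocHGlobal(new IntPtr((long)count * recordSize));
    }

    // Pointer math: address of record i = head + i * RecordSize.
    public IntPtr this[int index]
    {
        get
        {
            if (index < 0 || index >= Count) throw new IndexOutOfRangeException();
            return new IntPtr(head.ToInt64() + (long)index * RecordSize);
        }
    }

    public void Dispose()
    {
        if (head != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(head);   // unmanaged memory is not garbage collected
            head = IntPtr.Zero;
        }
    }
}

// Field access by byte offset within a record, e.g. double at 0 and 8, int at 16.
static class RecordFields
{
    public static void SetDouble(IntPtr record, int offset, double value) =>
        Marshal.WriteInt64(record, offset, BitConverter.DoubleToInt64Bits(value));

    public static double GetDouble(IntPtr record, int offset) =>
        BitConverter.Int64BitsToDouble(Marshal.ReadInt64(record, offset));

    public static void SetInt(IntPtr record, int offset, int value) =>
        Marshal.WriteInt32(record, offset, value);

    public static int GetInt(IntPtr record, int offset) =>
        Marshal.ReadInt32(record, offset);
}
```

The memory returned by `Marshal.AllocHGlobal` is not zeroed, so every field should be written before it is read, and the array must be disposed explicitly to avoid leaking the block.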

now, on top of this foundation are the specific classes. each of the three types (array/enumerator/item) are implemented specifically for the objects you're looking at. in the more massive version of this project that i gave up on and that this is a spike for, i have more generic handling of the offsets and such so that you're not so tied to concrete classes, but even as they are they are pretty useful; my original implementation was all concrete classes with no real base.

currently i have them all implemented as separate classes that you pass references to; so, TestArray class is passed a MemoryArray class in its constructor. same with enumerator and item. i know there would be some processing power to be saved there, and i think there's a good possibility for space savings as well, if i can figure out a good way to implement them as descendants rather than just having a copy of the underlying classes. i wanted to get the basic feel for it first though, and that seemed to be the most straightforward way to go. the issue is that it's another layer of indirection.

TestArray and TestArrayEnumerator turned out not to do too much other than just pass through the functionality of MemoryArray and MemoryArrayEnumerator. the main issue in those classes is just getting the pointers carved up and passed out and into the items that are using it.

but, so TestArrayItem is where the pointers are actually switched into real data; here's the file. i snipped a big section of comments that goes through some options for a better way to handle the variable-length backing store (still in the actual file in the link given above), and please excuse the multitude of comments where i leave myself notes about what i'm thinking as i work on it :)

TestArrayItem.cs

// ----------------------------------------------
//      rights lavished upon all with love
//         see 'license/unlicense.txt'
//   ♥ 2011, shelley butterfly - public domain
// ----------------------------------------------

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace MemStorageDemo
{
    public unsafe class TestArrayItem
    {
        public static MemoryStringArray s_MemoryArrayStore = new MemoryStringArray();

        public static int TestArrayItemSize_bytes =
            sizeof(double) * 2
            + sizeof(int)
            + sizeof(int);

        // hard-coding here; this is another place that things could be a little more generic if you wanted and if 
        // performance permitted; for instance, creating a dictionary of offsets based on index.  also, perhaps
        // a dictionary with key strings to allow indexing using field names of the object.
        private enum EFieldOffset
        {
            DoubleTheFirstOffset       =  0,
            DoubleTheSecondOffset      =  8,
            IntTheFirstOffset          = 16,
            StringTheFirstHandleOffset = 20
        }

        private MemoryArrayItem myMemoryArrayItem;
        private MemoryStringArray myStringStore;

        // constructor that uses the static string array store
        public TestArrayItem(MemoryArrayItem parMemoryArrayItem) :
            this(parMemoryArrayItem, s_MemoryArrayStore)
        {
        }

        // constructor for getting the item at its memory block without any initialization (e.g. existing item)
        public TestArrayItem(MemoryArrayItem parMemoryArrayItem, MemoryStringArray parStringStore)
        {
            myMemoryArrayItem = parMemoryArrayItem;
            myStringStore = parStringStore;
        }

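        // constructor for getting the item at its memory block and initializing it (e.g. adding new items)
        public TestArrayItem(MemoryArrayItem parMemoryArrayItem, double parDoubleTheFirst, double parDoubleTheSecond, int parIntTheFirst, string parStringTheFirst)
        {
            myMemoryArrayItem = parMemoryArrayItem;
            // fall back to the static string store; otherwise myStringStore is null when the string is assigned below
            myStringStore = s_MemoryArrayStore;

            DoubleTheFirst = parDoubleTheFirst;
            DoubleTheSecond = parDoubleTheSecond;
            IntTheFirst = parIntTheFirst;
            // new item: the handle slot holds no valid handle yet, so add the string rather than modify one
            CreateStringTheFirst(parStringTheFirst);
        }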

        // if you end up in a situation where the compiler isn't giving you equivalent performance to just doing
        // the array math directly in the properties, you could always just do the math directly in the properties.
        //
        // it reads much cleaner the way i have it set up, and there's a lot less code duplication, so without 
        // actually determining empirically that i needed to do so, i would stick with the function calls.
        private IntPtr GetPointerAtOffset(EFieldOffset parFieldOffset)
            { return myMemoryArrayItem.ObjectPointer + (int)parFieldOffset; }

        private double* DoubleTheFirstPtr 
            { get { return (double*)GetPointerAtOffset(EFieldOffset.DoubleTheFirstOffset); } }
        public double DoubleTheFirst
        {
            get
            {
                return *DoubleTheFirstPtr;
            }

            set
            {
                *DoubleTheFirstPtr = value;
            }
        }

        private double* DoubleTheSecondPtr
            { get { return (double*)GetPointerAtOffset(EFieldOffset.DoubleTheSecondOffset); } }
        public double DoubleTheSecond
        {
            get
            {
                return *DoubleTheSecondPtr;
            }
            set
            {
                *DoubleTheSecondPtr = value;
            }
        }

        // ahh wishing for a preprocessor about now
        private int* IntTheFirstPtr
            { get { return (int*)GetPointerAtOffset(EFieldOffset.IntTheFirstOffset); } }
        public int IntTheFirst
        {
            get
            {
                return *IntTheFirstPtr;
            }
            set
            {
                *IntTheFirstPtr = value;
            }
        }

        // okay since we're using the StringArray backing store in the example, we just need to get the
        // pointer stored in our blocks, and then copy the data from that address 
        private int* StringTheFirstHandlePtr 
            { get { return (int*)GetPointerAtOffset(EFieldOffset.StringTheFirstHandleOffset); } }
        public string StringTheFirst
        {
            get
            {
                return myStringStore.GetString(*StringTheFirstHandlePtr);
            }
            set
            {
                myStringStore.ModifyString(*StringTheFirstHandlePtr, value);
            }
        }

        public void CreateStringTheFirst(string WithValue)
        {
            *StringTheFirstHandlePtr = myStringStore.AddString(WithValue);
        }

        public override string ToString()
        {
            // cast the pointers to long rather than int so 64-bit addresses are not truncated (or overflow) when formatted
            return string.Format("{0:X8}: {{ {1:0.000}, {2:0.000}, {3}, {4} }} {5:X8}", (long)DoubleTheFirstPtr, DoubleTheFirst, DoubleTheSecond, IntTheFirst, StringTheFirst, (long)myMemoryArrayItem.ObjectPointer);
        }
    }
}

so, that's the real magic; just basically implementing functions that figure out the right pointers based on the info about the fields. as it stands i think it's a pretty good candidate for code gen, well, assuming i get the attach/detach stuff working right. i hid a lot of having to do manual memory management using auto-type pointers and i think it will be worth the debug in the long run...

anyway, that's about it, i'll hopefully be back with internet this evening or tomorrow, and i'll do my best to check in. unfortunately about to go give the cable modem back to the cable company so won't be in business unless we go to a mcdonalds or something :) hope this has been some sort of help; i'll debug the issues with it so that there's at least a functional base to work from; i know this isn't the first time i've thought about writing a library like this and i imagine others have too.


previous content

i have used something like this for using a COM library that i created to interop with the pre-compiled win32/64 FFTW dll. for this question we really need something a little more generic than what i had, so i started working on something last week that would be sufficient as a decent generic library for these types of uses, with extensible memory management, multi-dimensions, slicing, etc.

well, i finally admitted to myself yesterday that (a) it was going to be another few days before it was ready and (b) i needed a spike solution to figure out some of the last bits anyway. so, i decided to just do a first cut at a lower level of abstraction that attempts to meet the needs addressed in your question. unfortunately we are about to move and i have to do some packing, but i think it's probably enough to address your question.

i am going to continue work on the example this week and i will update here with new code and i will attempt to distill enough info from it to make a post that explains it, but if you're still interested, here's the last i'll probably be able to post until later in the week:

http://www.mediafire.com/file/a7yq53ls18q7bvf/EfficientStorage-6639584.zip

it's just a VS2010 solution with the basic necessities for doing what i think meets your needs. there are lots of comments sprinkled through, but feel free to ask questions and i will check back as soon as i have internet... either way it's been a fun project.

after completing the example i intend to finish off the first iteration full library and release it somewhere; i will update here with the link when i have it.

obligatory warning: it compiles but has not been tested; i'm sure there are issues, it's a pretty big topic.

窝囊感情。 2024-11-26 10:15:49

You can make an array of System.Array with one element for each property in your type. The size of each of these sub-arrays is equal to the number of objects you have. Property access would be:

masterArray[propertyIndex][objectIndex]

This will allow you to use value type arrays instead of arrays of object.
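
A minimal sketch of that layout, assuming the four fields from the question (`ColumnStore` is a hypothetical name). Note that `System.Array` has no C# indexer, so in practice each column is cast back to its concrete array type, or read through `GetValue`, which boxes value types.

```csharp
using System;

static class ColumnStore
{
    // One sub-array per property; every sub-array has one slot per object.
    public static Array[] Create(int count) => new Array[]
    {
        new double[count], // property 0: A
        new double[count], // property 1: B
        new int[count],    // property 2: C
        new string[count], // property 3: D
    };

    // Typed access: cast the column once, then index it without boxing.
    public static T[] Column<T>(Array[] masterArray, int propertyIndex) =>
        (T[])masterArray[propertyIndex];
}
```

For example, `ColumnStore.Column<double>(masterArray, 0)[objectIndex]` reads property A of one object, while `masterArray[0].GetValue(objectIndex)` returns the same value boxed as `object`.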
