在 C#/.NEt 中，动态类型占用的空间是否比对象少？

发布于 2024-10-15 05:22:58 字数 528 浏览 2 评论 0原文

我有一个控制台应用程序，允许用户指定要处理的变量。这些变量有三种类型：字符串、双精度型和长整型（其中双精度型和长整型是迄今为止最常用的类型）。用户可以指定他们喜欢的任何变量并以任何顺序，所以我的系统必须能够处理它。为此，在我的应用程序中，我一直将它们存储为对象，然后根据需要强制转换/取消强制转换它们。例如：

public class UnitResponse
{
    public object Value { get; set; }
}

我的理解是装箱对象比标准值类型占用更多的内存（大约 12 字节）。

我的问题是：使用dynamic关键字来存储这些值会更有效吗？它可能会解决装箱/拆箱问题，如果它更有效，这将如何影响性能？

编辑

为了提供一些上下文并防止“你确定你使用了足够的 RAM 来担心这个”，在我最坏的情况下，我有 420,000,000 个数据点需要担心（60 个变量 * 7,000,000 条记录）。这是我保留的有关每个变量的一堆其他数据（包括一些布尔值等）的补充。所以减少内存确实会产生巨大的影响。

原文

I have a console application that allows the users to specify variables to process. These variables come in three flavors: string, double and long (with double and long being by far the most commonly used types). The user can specify whatever variables they like and in whatever order so my system has to be able to handle that. To this end in my application I had been storing these as object and then casting/uncasting them as required. for example:

public class UnitResponse
{
    public object Value { get; set; }
}

My understanding was that boxed objects take up a bit more memory (about 12 bytes) than a standard value type.

My question is: would it be more efficient to use the dynamic keyword to store these values? It might get around the boxing/unboxing issue, and if it is more efficient how would this impact performance?

EDIT

To provide some context and prevent the "are you sure you're using enough RAM to worry about this" in my worst case I have 420,000,000 datapoints to worry about (60 variables * 7,000,000 records). This is in addition to a bunch of other data I keep about each variable (including a few booleans, etc.). So reducing memory does have a HUGE impact.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

女中豪杰 2024-10-22 05:22:58

好的，所以这里真正的问题是“我有一个巨大的数据集存储在内存中，如何在时间和内存空间上优化其性能？”

几点想法：

你讨厌和害怕拳击是绝对正确的。拳击的成本很高。首先，是的，装箱的对象会占用额外的内存。其次，装箱对象存储在堆上，而不是堆栈或寄存器中。第三，它们被垃圾收集；这些对象中的每一个都必须在 GC 时被询问，以查看它是否包含对另一个对象的引用，但它永远不会，而且这会在 GC 线程上花费大量时间。几乎可以肯定，您需要采取一些措施来避免拳击。

动态不是吗？这是拳击加上很多其他开销。（与其他动态调度系统相比，C# 的动态速度非常快，但从绝对值来看，它并不快或很小）。

这很恶心，但您可以考虑使用一个结构，其布局在各个字段之间共享内存 - 就像 C 中的联合。这样做真的很恶心并且一点也不安全但在这种情况下它会有所帮助。在网络上搜索“StructLayoutAttribute”；你会找到教程。

真的是长的、双的还是字符串的？不能是 int、float 或 string？这些数据是否真的超过数十亿数量级或精确到小数点后 15 位？ int 和 float 不能满足 99% 的情况吗？它们的尺寸只有一半。

通常我不建议使用 float 而不是 double，因为这是一种错误的经济；当人们拥有一个数字时，通常会以这种方式节省开支，就像节省四个字节就会产生影响一样。 4200 万个浮点数和 4200 万个双精度数之间的差异相当大。

您可以利用的数据是否存在规律性？例如，假设在 4200 万条记录中，每个 long 仅有 100000 个实际值，每个 double 有 100000 个值，每个字符串有 100000 个值。在这种情况下，您可以为长整型、双精度型和字符串创建某种索引存储，然后每个记录都会获得一个整数，其中低位是索引，前两位指示要从哪个存储中取出它。现在您有 4200 万条记录，每条记录都包含一个 int，并且这些值以某种非常紧凑的形式存储在其他地方。
将布尔值存储为字节中的位；编写属性来进行位移以将它们取出。这样可以节省几个字节。
请记住，内存实际上是磁盘空间； RAM 只是其上的一个方便的缓存。如果数据集太大而无法保存在 RAM 中，那么某些东西会将其分页回磁盘并稍后读回；那可能是你，也可能是操作系统。您可能比操作系统更了解数据位置。您可以以某种方便的可分页形式（例如 b 树）将数据写入磁盘，并且可以更有效地将数据保存在磁盘上，并且仅在需要时将其放入内存中。

OK, so the real question here is "I've got a freakin' enormous data set that I am storing in memory, how do I optimize its performance in both time and memory space?"

Several thoughts:

You are absolutely right to hate and fear boxing. Boxing has big costs. First, yes, boxed objects take up extra memory. Second, boxed objects get stored on the heap, not on the stack or in registers. Third, they are garbage collected; every single one of those objects has to be interrogated at GC time to see if it contains a reference to another object, which it never will, and that's a lot of time on the GC thread. You almost certainly need to do something to avoid boxing.

Dynamic ain't it; it's boxing plus a whole lot of other overhead. (C#'s dynamic is very fast compared to other dynamic dispatch systems, but it is not fast or small in absolute terms).

It's gross, but you could consider using a struct whose layout shares memory between the various fields - like a union in C. Doing so is really really gross and not at all safe but it can help in situations like these. Do a web search for "StructLayoutAttribute"; you'll find tutorials.

Long, double or string, really? Can't be int, float or string? Is the data really either in excess of several billion in magnitude or accurate to 15 decimal places? Wouldn't int and float do the job for 99% of the cases? They're half the size.

Normally I don't recommend using float over double because its a false economy; people often economise this way when they have ONE number, like the savings of four bytes is going to make the difference. The difference between 42 million floats and 42 million doubles is considerable.

Is there regularity in the data that you can exploit? For example, suppose that of your 42 million records, there are only 100000 actual values for, say, each long, 100000 values for each double, and 100000 values for each string. In that case, you make an indexed storage of some sort for the longs, doubles and strings, and then each record gets an integer where the low bits are the index, and the top two bits indicate which storage to get it out of. Now you have 42 million records each containing an int, and the values are stored away in some nicely compact form somewhere else.
Store the booleans as bits in a byte; write properties to do the bit shifting to get 'em out. Save yourself several bytes that way.
Remember that memory is actually disk space; RAM is just a convenient cache on top of it. If the data set is going to be too large to keep in RAM then something is going to page it back out to disk and read it back in later; that could be you or it could be the operating system. It is possible that you know more about your data locality than the operating system does. You could write your data to disk in some conveniently pageable form (like a b-tree) and be more efficient about keeping stuff on disk and only bringing it in to memory when you need it.

回复收藏 0 原文