Memory footprint of Haskell data types

Posted 2024-09-10 04:31:31

How can I find the actual amount of memory required to store a value of some data type in Haskell (mostly with GHC)? Is it possible to evaluate it at runtime (e.g. in GHCi) or is it possible to estimate memory requirements of a compound data type from its components?

In general, if memory requirements of types a and b are known, what is the memory overhead of algebraic data types such as:

data Uno = Uno a
data Due = Due a b

For example, how many bytes in memory do these values occupy?

1 :: Int8
1 :: Integer
2^100 :: Integer
\x -> x + 1
(1 :: Int8, 2 :: Int8)
[1] :: [Int8]
Just (1 :: Int8)
Nothing

I understand that actual memory allocation is higher due to delayed garbage collection. It may be significantly different due to lazy evaluation (and thunk size is not related to the size of the value). The question is, given a data type, how much memory does its value take when fully evaluated?

I found there is a :set +s option in GHCi to see memory stats, but it is not clear how to estimate the memory footprint of a single value.

风和你 2024-09-17 04:31:31

(The following applies to GHC, other compilers may use different storage conventions)

Rule of thumb: a constructor costs one word for a header, and one word for each field. Exception: a constructor with no fields (like Nothing or True) takes no space, because GHC creates a single instance of these constructors and shares it amongst all uses.

A word is 4 bytes on a 32-bit machine, and 8 bytes on a 64-bit machine.

So e.g.

data Uno = Uno a
data Due = Due a b

an Uno takes 2 words, and a Due takes 3.
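As a rough sketch of the rule of thumb, the arithmetic can be written down directly. The names wordBytes, conWords and conBytes are made up for this illustration and are not GHC APIs. The estimate covers only the constructor cell itself, not the values its fields point to.

```haskell
import Data.Bits (finiteBitSize)

-- | Bytes per machine word on the platform running this code.
-- Assumes GHC, where Int is one machine word wide.
wordBytes :: Int
wordBytes = finiteBitSize (0 :: Int) `div` 8

-- | Rule-of-thumb heap words for a constructor with the given number of
-- fields: one header word plus one word per field. Nullary constructors
-- are shared, so they are counted as costing nothing.
conWords :: Int -> Int
conWords 0 = 0
conWords n = 1 + n

-- | The same estimate expressed in bytes.
conBytes :: Int -> Int
conBytes n = conWords n * wordBytes
```

Here conWords 1 is 2 (an Uno) and conWords 2 is 3 (a Due); on a 64-bit machine conBytes 2 comes to 24 bytes.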

The Int type is defined as

data Int = I# Int#

Int# takes one word, so an Int takes 2 words in total (header plus the Int# field). Most unboxed types take one word, the exceptions being Int64#, Word64#, and Double# (on a 32-bit machine), which take 2. GHC actually has a cache of small values of type Int and Char, so in many cases these take no heap space at all. A String only requires space for the list cells, unless you use Chars > 255.
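Applying the same rule to the values from the question gives the estimates below. This is a simplified model written for illustration, not something GHC provides: the Layout type and its functions are hypothetical, sharing between fields is ignored, and the small-value cache is ignored as well.

```haskell
-- | A simplified model of a fully evaluated value's heap layout.
data Layout
  = Con [Layout]  -- constructor whose fields are the listed layouts
  | Raw Int       -- unboxed data occupying this many words inline

-- | Estimated heap words for a value and everything it points to.
totalWords :: Layout -> Int
totalWords (Raw n)  = n
totalWords (Con []) = 0                        -- nullary: shared, no cost
totalWords (Con fs) = 1 + sum (map fieldWords fs)

-- | Words one field contributes: unboxed data sits inline, anything else
-- costs a one-word pointer plus the size of the object it points to.
fieldWords :: Layout -> Int
fieldWords (Raw n) = n
fieldWords child   = 1 + totalWords child

boxedInt8, pairOfInt8, singletonList, justInt8, nothing :: Layout
boxedInt8     = Con [Raw 1]                 -- 1 :: Int8
pairOfInt8    = Con [boxedInt8, boxedInt8]  -- (1 :: Int8, 2 :: Int8)
singletonList = Con [boxedInt8, Con []]     -- [1] :: [Int8]
justInt8      = Con [boxedInt8]             -- Just (1 :: Int8)
nothing       = Con []                      -- Nothing
```

Under these assumptions totalWords yields 2 for the Int8, 7 for the pair, 5 for the one-element list, 4 for the Just, and 0 for Nothing.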

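An Int8 has identical representation to Int. Integer is defined like this (in the integer-gmp implementation current at the time of writing; newer GHC releases name the constructors differently but keep the split between small and large values):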

data Integer
  = S# Int#                            -- small integers
  | J# Int# ByteArray#                 -- large integers

so a small Integer (S#) takes 2 words, but a large integer takes a variable amount of space depending on its value. A ByteArray# takes 2 words (header + size) plus space for the array itself.
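For a large Integer, a back-of-the-envelope estimate can be derived from the layout described above: 3 words for the J# constructor (header plus two fields), 2 words for the ByteArray# header and size, and one word per machine-word-sized limb of the magnitude. The sketch below assumes exactly that layout and that the array holds just enough limbs for the value; the function names are invented for this example.

```haskell
import Data.Bits (finiteBitSize)

-- | Bits per machine word (Int is one word wide in GHC).
wordBits :: Int
wordBits = finiteBitSize (0 :: Int)

-- | Number of bits needed to write the magnitude of n in binary.
bitLength :: Integer -> Int
bitLength n = length (takeWhile (> 0) (iterate (`div` 2) (abs n)))

-- | Whether n fits in a machine Int, i.e. would use the small constructor.
fitsInInt :: Integer -> Bool
fitsInInt n =
  n >= toInteger (minBound :: Int) && n <= toInteger (maxBound :: Int)

-- | Estimated heap words for a fully evaluated Integer.
integerWords :: Integer -> Int
integerWords n
  | fitsInInt n = 2                -- S#: header + Int#
  | otherwise   = 3 + 2 + limbs    -- J# cell + ByteArray# overhead + data
  where
    limbs = (bitLength n + wordBits - 1) `div` wordBits
```

On a 64-bit machine, 2^100 needs 101 bits and therefore 2 limbs, so integerWords (2^100) comes to 7 words, or 56 bytes.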

Note that a constructor defined with newtype is free. newtype is purely a compile-time idea, and it takes up no space and costs no instructions at run time.
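One way to see this in code is that a newtype and the type it wraps can be converted with coerce from Data.Coerce, which relies on the two types sharing a runtime representation. The Age type and the two helper functions below are hypothetical names chosen for the example.

```haskell
import Data.Coerce (coerce)

-- | A wrapper that exists only at compile time.
newtype Age = Age Int

-- | Reinterpret a list of Age as a list of Int without traversing it;
-- coerce is allowed here because Age and Int share a representation.
agesToInts :: [Age] -> [Int]
agesToInts = coerce

-- | The reverse direction, equally free at run time.
intsToAges :: [Int] -> [Age]
intsToAges = coerce
```

By the same reasoning, a value of type Age is estimated at the same 2 words as the Int it wraps.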

More details in The Layout of Heap Objects in the GHC Commentary.
