fromJust 定义中的模式匹配

发布于 2024-11-14 11:21:43 字数 664 浏览 5 评论 0原文

Data.Maybe 中的函数 fromJust 是这样定义的：

fromJust          :: Maybe a -> a
fromJust Nothing  = error "Maybe.fromJust: Nothing"
fromJust (Just x) = x

根据我对模式匹配（匹配从上到下进行）的理解，我会改变两者的顺序定义。因为通常 Nothing 部分在 it-is-sure-a-Just 情况下不匹配，但在达到第二个定义之前总是会检查它。

您能澄清我的推理错误吗？谢谢。

编辑：

示例：假设我有一个文件，每行包含一百万个 Int 类型的数字，并且您在我的程序中需要这些数字（作为 Int，而不是 String）对于其他东西。

import qualified Data.ByteString.Lazy.Char8 as L

readInt = fst . fromJust . L.readInt
-- more stuff

根据上面的 fromJust 定义，我需要更多时间来读取数字，不是吗？

原文

The function fromJust in Data.Maybe is defined in this way:

fromJust          :: Maybe a -> a
fromJust Nothing  = error "Maybe.fromJust: Nothing"
fromJust (Just x) = x

According to my understanding of pattern matching (matching proceeds from top to bottom) I would change the order of the two definitions. Since usually the Nothing-part doesn't match in a it-is-sure-a-Just situation, but it is always checked before the second definition is reached.

Can you please clarify my error in reasoning? Thanks.

Edit:

Example:
Suppose I have a file with a million numbers of type Int per line and y need this numbers (as Int, not String) in my program for other stuff.

import qualified Data.ByteString.Lazy.Char8 as L

readInt = fst . fromJust . L.readInt
-- more stuff

With the above definition of fromJust I would need more time to read the numbers, wouldn't I?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

硬不硬你别怂 2024-11-21 11:21:43

我认为这个问题是关于性能的。虽然语义模式匹配是从上到下测试的，但大多数 Haskell 编译器会将 ADT 构造函数上的匹配优化为相当于 C switch 语句。

您可以认为 ADT 的数据表示有一个“标签”，它说明它是使用哪个构造函数创建的，以及每个参数的指针。例如，Nothing 可能表示为 0 (null)，Just 42 表示为 1（指向 42 的指针）代码>.

然后在这样的函数中：

squash :: Maybe (Maybe Int) -> Int
squash (Just Nothing) = 0
squash Nothing = 0
squash (Just (Just x)) = x

编译器将建立一个决策树：

squash x = 
   check tag of x:
       0 -> 0
       1 y -> check tag of y:
           0 -> 0
           1 z -> z

其中每个标记都是由跳转表或其他东西计算的，因此将 0 作为 1 检查并不昂贵。请注意，无论我们定义中模式的原始顺序是什么，都会生成相同的决策树。

然而，当使用防护而不是在构造函数上匹配时，很可能会从上到下检查模式（编译器必须非常聪明才能优化它）。因此，如果我们以这种神秘的方式编写 fromJust ：

fromJust x
    | isNothing x = error "fromJust: Nothing"
    | isJust x = case x of { Just y -> y }

那么这可能会依次检查每个构造函数，并且我们可以通过切换案例的顺序来优化。幸运的是，以一种重要的方式编写是很麻烦的:-)。

I think this question is about performance. While semantically pattern matching is tested top-to-bottom, most Haskell compilers will optimize matching on an ADT's constructors to the equivalent of a C switch statement.

You can think of the data representation for an ADT has a "tag" which says which constructor it was made with, together with a pointer for each argument. So for example, Nothing might be represented as 0 (null) and Just 42 represented as 1 (pointer to 42).

Then in a function like this:

squash :: Maybe (Maybe Int) -> Int
squash (Just Nothing) = 0
squash Nothing = 0
squash (Just (Just x)) = x

The compiler would set up a decision tree:

squash x = 
   check tag of x:
       0 -> 0
       1 y -> check tag of y:
           0 -> 0
           1 z -> z

Where each tag is computed by a jump table or something, so it is no more expensive to check against 0 as 1. Note that this same decision tree would be made no matter what the original order of the patterns in our definition.

However, when using guards instead of matching on a constructor, the patterns are most likely checked from top to bottom (the compiler would have to be very smart to optimize that). So if we wrote fromJust in this arcane way:

fromJust x
    | isNothing x = error "fromJust: Nothing"
    | isJust x = case x of { Just y -> y }

Then this would probably check for each constructor in turn, and we could optimize by switching the order of the cases. Fortunately, writing in a way that this matters is cumbersome :-).

回复收藏 0 原文

诺曦 2024-11-21 11:21:43

这里需要意识到两件重要的事情：首先，Haskell 编译器完全意识到任何值只能是 Nothing 或 Just x。因此它只会测试其中之一。其次，GHC 使用指针标记，使其能够非常快速地区分指向各种构造函数的指针（详细信息）。

结果，GHC 为模式匹配生成了非常好的代码。下面是 fromJust 返回代码的样子，直接从程序集转储中获取：

        andq $7,%rax
        cmpq $2,%rax
        jae .LcO4

它获取返回值，然后使用位掩码操作提取标记（二进制中的 7 = 111）。如果此标记为 2，则代码知道它指向 Just x 构造函数。因此它会跳转到适当的代码。否则，它只会继续处理 Nothing 的代码。

我认为甚至可以利用程序中只有一个Nothing闭包的知识来进一步优化。因此，您可以用单个地址比较来替换它，从而在实际代码中保存位屏蔽操作甚至寄存器。不知道为什么 GHC 不这样做。

There are two important things to realize here: First, the Haskell compiler is fully aware that any value can only either be Nothing or Just x. Therefore it will only ever test for one of them. Second, GHC uses pointer tagging that allows it to distinguish between pointers pointing at various constructors very fast (details).

As a result, GHC generates quite nice code for the pattern match. Here's what the fromJust return code looks like, taken directly from the assembly dump:

        andq $7,%rax
        cmpq $2,%rax
        jae .LcO4

This takes the return value, then extracts the tag using a bit mask operation (7 = 111 in binary). If this tag is 2, then the code knows that it points to a Just x constructor. Therefore it jumps to the appropriate code. Otherwise, it just continues to the code dealing with Nothing.

I think this could even be optimized further using the knowledge that there is only ever one Nothing closure in the program. Therefore you could replace that with a single address comparison, saving the bit-masking operations and even a register in the actual code. Not sure why GHC doesn't do this.

回复收藏 0 原文