在惰性函数语言的模板实例化评估器中实现“case”表达式是否困难？

发布于 2025-01-14 11:07:15 字数 2169 浏览 0 评论 0原文

我正在关注“实现函数式语言：SPJ 的教程”，我被困在练习 2.18（第 70 页）上，复制如下。这是关于书中描述的简单惰性函数语言的模板实例化评估器的章节（类似于迷你 Miranda/Haskell）：

练习 2.18。为什么很难将 case 表达式引入模板实例化机？（提示：想想instantiate会对case表达式做什么。）

本教程接着介绍解构结构化数据的几个不太通用版本的实现： if 原语、casePair 原语和 caseList 原语。我还没有完成本节的实现（第 2 章标记 5），但我不明白为什么单独实现这些比实现单个 case 原语要容易得多。

我能提供的唯一合理的解释是，最通用的 case 形式在替代数量（要匹配的标签数量）和数量（结构化数据的参数数量）方面都是可变的。所有上述原语都是固定数量的，并且具有已知数量的替代方案。然而，我不明白为什么这会使实施变得更加困难。

case 语句的实例化非常简单：

实例化受检者。
~~实例化每个替代方案的主体表达式。（如果我们替换为未评估的分支，这可能有点浪费。）~~（我现在注意到这可能是一个问题，将在答案中发布。）

将结果封装在新的节点类型中，

NCase，其中：数据节点 = NAp Addr Addr
          | ...
          | NCases [(Int, [名称], 地址)]

从操作上来说，case 语句的减少非常简单。

检查参数是否被评估。
- 如果不是，则将其设为新堆栈并将当前堆栈推送到转储。（类似于评估任何原语的参数。）
如果评估了参数，则搜索具有匹配标签的替代项。
- 如果没有找到具有匹配标签的替代项，则抛出错误（不详尽的 case 分支）。
使用结构化数据参数增强的环境来实例化匹配替代方案的主体。（例如，在 case Pack {0, 2} 3 4 in <0> ab -> a + b 中，使用环境 [ 实例化 a + b a <- 3, b <- 4])

对于包含替代列表的情况 (NCases)，可能必须引入新的节点类型，但这并不是太劝阻。

我找到了一个 GitHub 存储库 @bollu/timi ，它似乎也按照本教程实现模板实例化评估器。有一个名为 “缺少 lambda 和 case” 的部分，它将缺少通用 case 语句归因于以下原因：

Case 要求我们有一些模式匹配/解构的概念，而这在本机中是不存在的。

然而，在本教程中没有模式匹配的概念；而是没有模式匹配的概念。我们只是通过标签号（整数）进行匹配，所以我不确定这个解释是否有效。

_{除此之外，部分是为了我自己：a在本教程的下一章中，关于对 case 语句的特殊处理，提出了非常类似的问题（涉及 G 机器而不是模板实例化）。}

原文

I'm following "Implementing functional languages: a tutorial" by SPJ, and I'm stuck on Exercise 2.18 (page 70), reproduced below. This is in the chapter about a template-instantiation evaluator for the simple lazy functional language described in the book (similar to a mini Miranda/Haskell):

Exercise 2.18. Why is it hard to introduce case expressions into the template instantiation machine?
(Hint: think about what instantiate would do with a case expression.)

The tutorial then goes on to cover an implementation of several less-general versions of destructuring structured data: an if primitive, a casePair primitive, and a caseList primitive. I haven't yet done the implementation for this section (Chapter 2 Mark 5), but I don't see why implementing these separately would be significantly easier than implementing a single case primitive.

The only plausible explanations I can offer is that the most generic case form is variadic in both number of alternatives (number of tags to match against) and arity (number of arguments to the structured data). All of the above primitives are fixed-arity and have a known number of alternatives. I don't see why this would make implementation significantly more difficult, however.

The instantiation of the case statement is straightforward:

Instantiate the scrutinee.
~~Instantiate the body expression of each alternative. (This may be somewhat wasteful if we substitute in unevaluated branches.)~~ (I notice now this may be a problem, will post in an answer.)

Encapsulate the result in a new node type, NCase, where:

data Node = NAp Addr Addr
          | ...
          | NCase [(Int, [Name], Addr)]

Operationally, the reduction of the case statement is straightforward.

Check if the argument is evaluated.
- If not, make it the new stack and push the current stack to the dump. (Similar to evaluating the argument of any primitive.)
If the argument is evaluated, then search for an alternative with a matching tag.
- If no alternative with a matching tag is found, then throw an error (inexhaustive case branches).
Instantiate the body of the matching alternative with the environment augmented with the structured data arguments. (E.g., in case Pack {0, 2} 3 4 in <0> a b -> a + b, instantiate a + b with environment [a <- 3, b <- 4])

A new node type would likely have to be introduced for case (NCase) containing the list of alternatives, but that's not too dissuading.

I found a GitHub repository @bollu/timi which seems to implement a template-instantiation evaluator also following this tutorial. There is a section called "Lack of lambda and case", which attributes the lack of a generic case statement to the following reason:

Case requires us to have some notion of pattern matching / destructuring which is not present in this machine.

However, in this tutorial there is no notion of pattern-matching; we would simply be matching by tag number (an integer), so I'm not sure if this explanation is valid.

_{Aside, partly for myself: a very similar question was asked about special treatment for case statements in the next chapter of the tutorial (concerning G-machines rather than template-instantiation).}

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

乖乖哒 2025-01-21 11:07:15

我想我在扩展问题中的推理时已经弄清楚了。我将在这里发布给后代，但如果有人有更容易理解的解释或能够纠正我，我将很乐意接受。

困难在于实例化步骤执行所有变量替换，并且这与评估（step 函数）分开进行。问题是正如bollu在原始问题中链接的GitHub存储库中所说的< /a>：在实例化时解构结构化数据并不容易。这使得实例化所有替代方案的主体变得困难。

为了说明这一点，请考虑 let 表达式的实例化。其工作原理如下：

实例化每个新的绑定表达式。
使用新绑定增强当前环境。
使用增强的表达式实例化身体。

但是，现在考虑 case 表达式的情况。我们要做的是：

实例化受检者。（最终应计算为 Pack {m, n} a0 a1 ... an 形式）
对于每个备选方案（每个选项的形式为 b0 b1 ... bn -> body)，使用新绑定增强环境 ([b0 <- a0, b1 <- a1, ..., bn <- an] 和然后实例化主体）

问题出在这两个步骤之间的某个地方：在受检者上调用 instantiate 会导致实例化 Addr，但我们不容易访问 a1, a2, ... an 用于在实例化时增强环境。虽然如果审查者是字面 Pack 值，则这可能是可能的，但如果它需要进一步评估（例如，是调用超级组合器的评估结果），那么我们需要首先评估它。

为了巩固我自己的理解，我想回答另一个问题：基元 if、casePair 和 caseList 是如何工作的避免这个问题？

if 可以轻松避免这个问题，因为布尔值是空值。 casePair 和 caseList 通过使用 thunk 推迟变量绑定来避免此问题；一旦 thunk 被调用，主体表达式就会被实例化，也就是在审查者被求值之后。

可能的解决方案：

我认为，如果我们在结构化数据对象上定义解构基元运算符，则可能可以解决这个问题。即，(Pack {m, n} a0 a1 ... an).3 将计算为 a3。
在这种情况下，我们可以做的是调用instantiate scrut，这将为我们提供地址scrutAddr，然后我们可以使用新的绑定来增强环境[b0 <- (NAp .0 scrut), b1 <- (NAp .1 scrut), ..., bn <- (NAp .n scrut）]。
问题似乎在于实例化（替换）和评估是分开的。如果变量不是与求值分开实例化，而是在绑定/使用时添加到环境中/从环境中查找，那么这不会成为问题。这就好像我们将 case 语句的主体放入 thunk 中，以便在对 scrutinee 求值后进行实例化，这与 casePair 和 caseList 的做法类似。

我还没有研究过这些替代解决方案，也没有研究过它们会带来多少额外的工作。

I think I figured it out while I was expanding on my reasoning in the question. I'll post here for posterity, but if someone has a more understandable explanation or is able to correct me I'll be happy to accept it.

The difficulty lies in the fact that the instantiate step performs all of the variable substitutions, and this happens separately from evaluation (the step function). The problem is as bollu says in the GitHub repository linked in the original question: it is not easy to destructure structured data at instantiation time. This makes it difficult to instantiate the bodies of all of the alternatives.

To illustrate this, consider the instantiation of let expressions. This works like so:

Instantiate each new binding expression.
Augment the current environment with the new bindings.
Instantiate the body with the augmented expression.

However, now consider the case of case expressions. What we want to do is:

Instantiate the scrutinee. (Which should eventually evaluate to the form Pack {m, n} a0 a1 ... an)
For each alternative (each of which has the form <m> b0 b1 ... bn -> body), augment the environment with the new bindings ([b0 <- a0, b1 <- a1, ..., bn <- an] and then instantiate the body of the alternative.)

The problem lies somewhere in between the two steps: calling instantiate on the scrutinee results in the instantiated Addr, but we don't readily have access to a1, a2, ... an to augment the environment with at instantiation time. While this might be possible if the scrutinee was a literal Pack value, if it needed further evaluation (e.g., was the evaluated result of a call to a supercombinator) then we would need to first evaluate it.

To solidify my own understanding, I'd like to answer the additional question: How do the primitives if, casePair, and caseList avoid this problem?

if trivially avoids this problem because boolean values are nullary. casePair and caseList avoid this problem by deferring the variable bindings using thunk(s); the body expressions get instantiated once the thunk is called, which is after the scrutinee is evaluated.

Possible solutions:

I'm thinking that it might be possible to get around this if we define a destructuring primitive operator on structured data objects. I.e., (Pack {m, n} a0 a1 ... an).3 would evaluate to a3.
In this case, what we could do is call instantiate scrut which would give us the address scrutAddr, and we could then augment the environment with new bindings [b0 <- (NAp .0 scrut), b1 <- (NAp .1 scrut), ..., bn <- (NAp .n scrut)].
The issue seems to lie in the fact that instantiation (substitution) and evaluation are separated. If variables were not instantiated separately from evaluation but rather added to/looked up from the environment upon binding/usage, then this would not be a problem. This is as if we placed the bodies of the case statements into thunks to be instantiated after the scrutinee is evaluated, which is similar to what casePair and caseList do.

I haven't worked through either of these alternate solutions or how much extra work they would incur.

回复收藏 0 原文

~没有更多了~