在惰性函数语言的模板实例化评估器中实现“case”表达式是否困难?
我正在关注“实现函数式语言:SPJ 的教程”,我被困在练习 2.18(第 70 页)上,复制如下。这是关于书中描述的简单惰性函数语言的模板实例化评估器的章节(类似于迷你 Miranda/Haskell):
练习 2.18。为什么很难将
case
表达式引入模板实例化机? (提示:想想instantiate
会对case
表达式做什么。)
本教程接着介绍解构结构化数据的几个不太通用版本的实现: if
原语、casePair
原语和 caseList
原语。我还没有完成本节的实现(第 2 章标记 5),但我不明白为什么单独实现这些比实现单个 case
原语要容易得多。
我能提供的唯一合理的解释是,最通用的 case
形式在替代数量(要匹配的标签数量)和数量(结构化数据的参数数量)方面都是可变的。所有上述原语都是固定数量的,并且具有已知数量的替代方案。然而,我不明白为什么这会使实施变得更加困难。
case
语句的实例化非常简单:
- 实例化受检者。
实例化每个替代方案的主体表达式。 (如果我们替换为未评估的分支,这可能有点浪费。)(我现在注意到这可能是一个问题,将在答案中发布。)- 将结果封装在新的节点类型中,
NCase,其中:
数据节点 = NAp Addr Addr | ... | NCases [(Int, [名称], 地址)]
从操作上来说,case
语句的减少非常简单。
- 检查参数是否被评估。
- 如果不是,则将其设为新堆栈并将当前堆栈推送到转储。 (类似于评估任何原语的参数。)
- 如果评估了参数,则搜索具有匹配标签的替代项。
- 如果没有找到具有匹配标签的替代项,则抛出错误(不详尽的 case 分支)。
- 使用结构化数据参数增强的环境来实例化匹配替代方案的主体。 (例如,在
case Pack {0, 2} 3 4 in <0> ab -> a + b
中,使用环境[ 实例化
)a + b
a <- 3, b <- 4]
对于包含替代列表的情况 (NCases
),可能必须引入新的节点类型,但这并不是太劝阻。
我找到了一个 GitHub 存储库 @bollu/timi ,它似乎也按照本教程实现模板实例化评估器。有一个名为 “缺少 lambda 和 case” 的部分,它将缺少通用 case
语句归因于以下原因:
Case
要求我们有一些模式匹配/解构的概念,而这在本机中是不存在的。
然而,在本教程中没有模式匹配的概念;而是没有模式匹配的概念。我们只是通过标签号(整数)进行匹配,所以我不确定这个解释是否有效。
除此之外,部分是为了我自己:a在本教程的下一章中,关于对 case
语句的特殊处理,提出了非常类似的问题(涉及 G 机器而不是模板实例化)。
I'm following "Implementing functional languages: a tutorial" by SPJ, and I'm stuck on Exercise 2.18 (page 70), reproduced below. This is in the chapter about a template-instantiation evaluator for the simple lazy functional language described in the book (similar to a mini Miranda/Haskell):
Exercise 2.18. Why is it hard to introduce
case
expressions into the template instantiation machine?
(Hint: think about whatinstantiate
would do with acase
expression.)
The tutorial then goes on to cover an implementation of several less-general versions of destructuring structured data: an if
primitive, a casePair
primitive, and a caseList
primitive. I haven't yet done the implementation for this section (Chapter 2 Mark 5), but I don't see why implementing these separately would be significantly easier than implementing a single case
primitive.
The only plausible explanations I can offer is that the most generic case
form is variadic in both number of alternatives (number of tags to match against) and arity (number of arguments to the structured data). All of the above primitives are fixed-arity and have a known number of alternatives. I don't see why this would make implementation significantly more difficult, however.
The instantiation of the case
statement is straightforward:
- Instantiate the scrutinee.
Instantiate the body expression of each alternative. (This may be somewhat wasteful if we substitute in unevaluated branches.)(I notice now this may be a problem, will post in an answer.)- Encapsulate the result in a new node type,
NCase
, where:data Node = NAp Addr Addr | ... | NCase [(Int, [Name], Addr)]
Operationally, the reduction of the case
statement is straightforward.
- Check if the argument is evaluated.
- If not, make it the new stack and push the current stack to the dump. (Similar to evaluating the argument of any primitive.)
- If the argument is evaluated, then search for an alternative with a matching tag.
- If no alternative with a matching tag is found, then throw an error (inexhaustive case branches).
- Instantiate the body of the matching alternative with the environment augmented with the structured data arguments. (E.g., in
case Pack {0, 2} 3 4 in <0> a b -> a + b
, instantiatea + b
with environment[a <- 3, b <- 4]
)
A new node type would likely have to be introduced for case (NCase
) containing the list of alternatives, but that's not too dissuading.
I found a GitHub repository @bollu/timi which seems to implement a template-instantiation evaluator also following this tutorial. There is a section called "Lack of lambda and case", which attributes the lack of a generic case
statement to the following reason:
Case
requires us to have some notion of pattern matching / destructuring which is not present in this machine.
However, in this tutorial there is no notion of pattern-matching; we would simply be matching by tag number (an integer), so I'm not sure if this explanation is valid.
Aside, partly for myself: a very similar question was asked about special treatment for case
statements in the next chapter of the tutorial (concerning G-machines rather than template-instantiation).
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我想我在扩展问题中的推理时已经弄清楚了。我将在这里发布给后代,但如果有人有更容易理解的解释或能够纠正我,我将很乐意接受。
困难在于实例化步骤执行所有变量替换,并且这与评估(
step
函数)分开进行。问题是正如bollu在原始问题中链接的GitHub存储库中所说的< /a>:在实例化时解构结构化数据并不容易。这使得实例化所有替代方案的主体变得困难。为了说明这一点,请考虑
let
表达式的实例化。其工作原理如下:但是,现在考虑
case
表达式的情况。我们要做的是:Pack {m, n} a0 a1 ... an
形式)b0 b1 ... bn -> body
),使用新绑定增强环境 ([b0 <- a0, b1 <- a1, ..., bn <- an]
和然后实例化主体)问题出在这两个步骤之间的某个地方:在受检者上调用
instantiate
会导致实例化Addr
,但我们不容易访问a1
,a2
, ...an
用于在实例化时增强环境。虽然如果审查者是字面Pack
值,则这可能是可能的,但如果它需要进一步评估(例如,是调用超级组合器的评估结果),那么我们需要首先评估它。为了巩固我自己的理解,我想回答另一个问题:基元
if
、casePair
和caseList
是如何工作的避免这个问题?if
可以轻松避免这个问题,因为布尔值是空值。casePair
和caseList
通过使用 thunk 推迟变量绑定来避免此问题;一旦 thunk 被调用,主体表达式就会被实例化,也就是在审查者被求值之后。可能的解决方案:
我认为,如果我们在结构化数据对象上定义解构基元运算符,则可能可以解决这个问题。即,
(Pack {m, n} a0 a1 ... an).3
将计算为a3
。在这种情况下,我们可以做的是调用
instantiate scrut
,这将为我们提供地址scrutAddr
,然后我们可以使用新的绑定来增强环境[b0 <- (NAp .0 scrut), b1 <- (NAp .1 scrut), ..., bn <- (NAp .n scrut)]
。问题似乎在于实例化(替换)和评估是分开的。如果变量不是与求值分开实例化,而是在绑定/使用时添加到环境中/从环境中查找,那么这不会成为问题。这就好像我们将 case 语句的主体放入 thunk 中,以便在对 scrutinee 求值后进行实例化,这与
casePair
和caseList
的做法类似。我还没有研究过这些替代解决方案,也没有研究过它们会带来多少额外的工作。
I think I figured it out while I was expanding on my reasoning in the question. I'll post here for posterity, but if someone has a more understandable explanation or is able to correct me I'll be happy to accept it.
The difficulty lies in the fact that the
instantiate
step performs all of the variable substitutions, and this happens separately from evaluation (thestep
function). The problem is as bollu says in the GitHub repository linked in the original question: it is not easy to destructure structured data at instantiation time. This makes it difficult to instantiate the bodies of all of the alternatives.To illustrate this, consider the instantiation of
let
expressions. This works like so:However, now consider the case of
case
expressions. What we want to do is:Pack {m, n} a0 a1 ... an
)<m> b0 b1 ... bn -> body
), augment the environment with the new bindings ([b0 <- a0, b1 <- a1, ..., bn <- an]
and then instantiate the body of the alternative.)The problem lies somewhere in between the two steps: calling
instantiate
on the scrutinee results in the instantiatedAddr
, but we don't readily have access toa1
,a2
, ...an
to augment the environment with at instantiation time. While this might be possible if the scrutinee was a literalPack
value, if it needed further evaluation (e.g., was the evaluated result of a call to a supercombinator) then we would need to first evaluate it.To solidify my own understanding, I'd like to answer the additional question: How do the primitives
if
,casePair
, andcaseList
avoid this problem?if
trivially avoids this problem because boolean values are nullary.casePair
andcaseList
avoid this problem by deferring the variable bindings using thunk(s); the body expressions get instantiated once the thunk is called, which is after the scrutinee is evaluated.Possible solutions:
I'm thinking that it might be possible to get around this if we define a destructuring primitive operator on structured data objects. I.e.,
(Pack {m, n} a0 a1 ... an).3
would evaluate toa3
.In this case, what we could do is call
instantiate scrut
which would give us the addressscrutAddr
, and we could then augment the environment with new bindings[b0 <- (NAp .0 scrut), b1 <- (NAp .1 scrut), ..., bn <- (NAp .n scrut)]
.The issue seems to lie in the fact that instantiation (substitution) and evaluation are separated. If variables were not instantiated separately from evaluation but rather added to/looked up from the environment upon binding/usage, then this would not be a problem. This is as if we placed the bodies of the case statements into thunks to be instantiated after the scrutinee is evaluated, which is similar to what
casePair
andcaseList
do.I haven't worked through either of these alternate solutions or how much extra work they would incur.