Lambda 演算前驱函数约简步骤
我对 lambda 演算中前驱函数的维基百科描述感到困惑。
维基百科的内容如下:
PRED := λn.λf.λx. n (λg.λh. h (g f)) (λu.x) (λu.u)
有人可以逐步解释还原过程吗?
谢谢。
I am getting stuck with the Wikipedia description of the predecessor function in lambda calculus.
What Wikipedia says is the following:
PRED := λn.λf.λx. n (λg.λh. h (g f)) (λu.x) (λu.u)
Can someone explain reduction processes step-by-step?
Thanks.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
好吧,丘奇数字的想法是使用函数对“数据”进行编码,对吧?其工作方式是通过您要执行的某些通用操作来表示一个值。因此,我们也可以朝另一个方向走,这有时会让事情变得更清楚。
教堂数字是自然数的一元表示。因此,我们使用
Z
表示零,使用Sn
表示n
的后继。现在我们可以这样计数:Z
、SZ
、SSZ
、SSSZ
... 等效的 Church 数字为两个参数 - 第一个对应于S
,第二个对应于Z
- 然后使用它们来构造上述模式。因此,给定参数f
和x
,我们可以这样计数:x
,f x
,f ( fx)
,f (f (fx))
...让我们看看 PRED 做了什么。
首先,它创建一个带有三个参数的 lambda——
n
是我们想要的 Church 数字,当然,这意味着f
和x
> 是结果数字的参数,这意味着 lambda 的主体将f
应用于x
一次,比n
少一次会。接下来,它将
n
应用于三个 参数。这是棘手的部分。第二个参数对应于前面的
Z
,是λu.x
——一个忽略一个参数并返回x
的常量函数。第一个参数对应于前面的
S
,是λgh.h (gf)
。我们可以将其重写为λg。 (λh.h (gf))
来反映只有最外层 lambda 被应用n
次的事实。此函数的作用是获取到目前为止的累积结果g
并返回一个带有一个参数的新函数,该函数将该参数应用于应用于f
g 的代码>.当然,这绝对令人困惑。那么...这是怎么回事?考虑用
S
和Z
直接替换。在非零数Sn
中,n
对应于绑定到g
的参数。因此,记住f
和x
绑定在外部范围内,我们可以这样计数:λu.x
,λh。 h ((λu.x) f)
,λh'. h' ((λh.h ((λu.x) f)) f)
... 执行明显的约简,我们得到:λu.x
,λh.h' ((λh.h ((λu.x) f)) f)
... h x ,λh'。 h' (fx)
...这里的模式是函数被“向内”传递一层,此时S
将应用它,而Z
将忽略它。因此,除了最外层之外,我们对每个S
都应用了一次f
。第三个参数只是恒等函数,由最外层的
S
尽职尽责地应用,返回最终结果 -f
应用的次数比的次数少一次S
层n
对应。Ok, so the idea of Church numerals is to encode "data" using functions, right? The way that works is by representing a value by some generic operation you'd perform with it. We can therefore go in the other direction as well, which can sometimes make things clearer.
Church numerals are a unary representation of the natural numbers. So, let's use
Z
to mean zero andSn
to represent the successor ofn
. Now we can count like this:Z
,SZ
,SSZ
,SSSZ
... The equivalent Church numeral takes two arguments--the first corresponding toS
, and second toZ
--then uses them to construct the above pattern. So given argumentsf
andx
, we can count like this:x
,f x
,f (f x)
,f (f (f x))
...Let's look at what PRED does.
First, it creates a lambda taking three arguments--
n
is the Church numeral whose predecessor we want, of course, which means thatf
andx
are the arguments to the resulting numeral, which thus means that the body of that lambda will bef
applied tox
one time fewer thann
would.Next, it applies
n
to three arguments. This is the tricky part.The second argument, that corresponds to
Z
from earlier, isλu.x
--a constant function that ignores one argument and returnsx
.The first argument, that corresponds to
S
from earlier, isλgh.h (g f)
. We can rewrite this asλg. (λh.h (g f))
to reflect the fact that only the outermost lambda is being appliedn
times. What this function does is take the accumulated result so far asg
and return a new function taking one argument, which applies that argument tog
applied tof
. Which is absolutely baffling, of course.So... what's going on here? Consider the direct substitution with
S
andZ
. In a non-zero numberSn
, then
corresponds to the argument bound tog
. So, remembering thatf
andx
are bound in an outside scope, we can count like this:λu.x
,λh. h ((λu.x) f)
,λh'. h' ((λh. h ((λu.x) f)) f)
... Performing the obvious reductions, we get this:λu.x
,λh. h x
,λh'. h' (f x)
... The pattern here is that a function is being passed "inward" one layer, at which point anS
will apply it, while aZ
will ignore it. So we get one application off
for eachS
except the outermost.The third argument is simply the identity function, which is dutifully applied by the outermost
S
, returning the final result--f
applied one fewer times than the number ofS
layersn
corresponds to.麦肯的回答很好地解释了这一点。让我们以 Pred 3 = 2 为例:
考虑表达式:n (λgh.h (gf)) (λu.x)。设 K = (λgh.h (gf))
对于 n = 0,我们编码
0 = λfx.x
,因此当我们对(λfx.x)(λgh. h(gf))
表示(λgh.h(gf))
被替换 0 次。进一步进行 beta 缩减后,我们得到:λfx.(λu.x)(λu.u)
缩减为
λfx.x
,其中
λfx.x = 0
代码>,如预期。对于 n = 1,我们应用 K 1 次:
(λgh.h (gf)) (λu.x)
=> λh。 h((λu.x) f)
=> λh。 h x
对于 n = 2,我们应用 K 2 次:
(λgh.h (gf)) (λh.hx)
=> λh。 h ((λh.hx) f)
=> λh。 h (fx)
对于 n = 3,我们应用 K 3 次:
(λgh.h (gf)) (λh.h (fx))
=> λh.h ((λh.h (fx)) f)
=> λh.h (f (fx))
最后,我们对这个结果应用 id 函数,我们得到
λh.h (f (fx)) (λu.u)
=> (λu.u)(f (fx))
=> f (fx)
这是数字 2 的定义。
基于列表的实现可能更容易理解,但需要许多中间步骤。所以在我看来,它不如教会最初的实施那么好。
McCann's answer explains it pretty well. Let's take a concrete example for Pred 3 = 2:
Consider expression: n (λgh.h (g f)) (λu.x). Let K = (λgh.h (g f))
For n = 0, we encode
0 = λfx.x
, so when we apply the beta reduction for(λfx.x)(λgh.h(gf))
means(λgh.h(gf))
is replaced 0 times. After further beta-reduction we get:λfx.(λu.x)(λu.u)
reduces to
λfx.x
where
λfx.x = 0
, as expected.For n = 1, we apply K for 1 times:
(λgh.h (g f)) (λu.x)
=> λh. h((λu.x) f)
=> λh. h x
For n = 2, we apply K for 2 times:
(λgh.h (g f)) (λh. h x)
=> λh. h ((λh. h x) f)
=> λh. h (f x)
For n = 3, we apply K for 3 times:
(λgh.h (g f)) (λh. h (f x))
=> λh.h ((λh. h (f x)) f)
=> λh.h (f (f x))
Finally, we take this result and apply an id function to it, we got
λh.h (f (f x)) (λu.u)
=> (λu.u)(f (f x))
=> f (f x)
This is the definition of number 2.
The list based implementation might be easier to understand, but it takes many intermediate steps. So it is not as nice as the Church's original implementation IMO.
阅读了之前的答案(好的答案)后,我想给出自己对此事的看法,希望它对某人有所帮助(欢迎指正)。我将举一个例子。
首先,我想在定义中添加一些括号,以使一切对我来说更加清晰。让我们将给定的公式重新定义为:
让我们还定义三个有助于示例的丘奇数字:
为了理解其工作原理,让我们首先关注公式的这一部分:
从这里,我们可以得出以下结论:
n
是丘奇数字,要应用的函数是λgλh.h (gf)
,起始数据是λu.x
请注意,让我们尝试一个例子:
让我们首先关注数字的减少(我们之前解释过的部分):
减少为:
最终为:
所以,我们有:
再次减少:
正如您在减少中看到的,我们结束减少应用该功能一次,谢谢巧妙地使用函数。
使用 add1 作为
f
和 0 作为x
,我们得到:希望这会有所帮助。
After Reading the previous answers (good ones), I’d like to give my own vision of the matter in hope it helps someone (corrections are welcomed). I’ll use an example.
First off, I’d like to add some parenthesis to the definition that made everything clearer to me. Let’s redifine the given formula to:
Let’s also define three Church numerals that will help with the example:
In order to understand how this works, let's focus first on this part of the formula:
From here, we can extract this conclusions:
n
is a Church numeral, the function to be applied isλgλh.h (g f)
and the starting data isλu.x
With this in mind, let's try an example:
Let's focus first on the reduction of the numeral (the part we explained before):
Which reduces to:
Ending up with:
So, we have:
Reducing again:
As you can see in the reductions, we end up applying the function one time less thanks to a clever way of using functions.
Using add1 as
f
and 0 asx
, we get:Hope this helps.
将此定义
分为 4 部分:
现在,忽略
D
。根据丘奇数字的定义,AB C
为B^n C
:将B
的n
次折叠应用到>C。
现在将
B
视为一台将一个输入转换为一个输出的机器。其输入g
的形式为λh.h *
,当附加f
时,变为(λh.h *) f = f *
。这会在*
中再添加一个f
应用程序。然后,结果f *
前面加上λh.h
,成为λh.h (f *)
。您会看到这样的模式:
B
的每次应用都会将λh.h *
转换为λh.h (f *)
。如果我们将λh.h x
作为开始项,我们将使用λh.h (f^nx)
作为结束项(在n
之后)B
的应用)。然而,起始项是
C = (λu.x)
,当附加f
时,变为(λu.x) f = x
,然后加上λh.h
成为λh.h x
。因此,我们在第一次应用B
之后而不是之前获得了λh.h x
。这就是为什么我们将λh.h (f^(n-1) x)
作为最终术语:f
的第一个应用被忽略。最后,将
λh.h (f^(n-1) x)
代入D = (λu.u)
(即恒等式),得到f^ (n-1) x 。那是:
Split this definition
into 4 parts:
For now, ignore
D
. By definition of Church numerals,A B C
isB^n C
: Applyn
folds ofB
toC
.Now treat
B
like a machine that turns one input into one output. Its inputg
has formλh.h *
, when appended byf
, becomes(λh.h *) f = f *
. This adds one more application off
to*
. The resultf *
is then prepended byλh.h
to becomeλh.h (f *)
.You see the pattern: Each application of
B
turnsλh.h *
intoλh.h (f *)
. If we hadλh.h x
as the begin term, we would haveλh.h (f^n x)
as the end term (aftern
applications ofB
).However, the begin term is
C = (λu.x)
, when appended byf
, becomes(λu.x) f = x
, then prepended byλh.h
to becomeλh.h x
. So we hadλh.h x
after, not before, the first application ofB
. This is why we haveλh.h (f^(n-1) x)
as the end term: The first application off
was ignored.Finally, apply
λh.h (f^(n-1) x)
toD = (λu.u)
, which is identity, to getf^(n-1) x
. That is:您可以尝试从延续的角度来理解前驱函数(不是我最喜欢的)的定义。
为了稍微简化一下问题,让我们考虑以下变体
,您可以将 S 替换为 f,将 0 替换为 x。
函数体在参数 N 上迭代变换 M 次。参数 N 是类型为 (nat -> nat) -> nat 的函数。 nat 期望 nat 的延续并返回 nat。最初,N = λu.0,即忽略延续并仅返回 0。
让我们将 N 称为当前计算。
函数M:(nat -> nat) -> nat)-> (nat -> nat)-> nat 如下修改计算 g:(nat -> nat)->nat。
它接受输入一个延续 h,并将其应用于
将当前计算 g 与 S 继续下去的结果。
由于初始计算忽略了继续,因此在一次应用 M 后,我们得到计算结果 (λh.h 0),然后得到 (λh.h (S 0)),依此类推。
最后,我们将计算应用于恒等延拓
提取结果。
You can try to understand this definition of the predecessor function (not my favourite one) in terms of continuations.
To simplify the matter a bit, let us consider the following variant
then, you can replace S with f, and 0 with x.
The body of the function iterates n times a transformation M over an argument N. The argument N is a function of type (nat -> nat) -> nat that expects a continuation for nat and returns a nat. Initially, N = λu.0, that is it ignores the continuation and just returns 0.
Let us call N the current computation.
The function M: (nat -> nat) -> nat) -> (nat -> nat) -> nat modifies the computation g: (nat -> nat)->nat as follows.
It takes in input a continuation h, and applies it to the
result of continuing the current computation g with S.
Since the initial computation ignored the continuation, after one application of M we get the computation (λh.h 0), then (λh.h (S 0)), and so on.
At the end, we apply the computation to the identity continuation
to extract the result.
我将在上述好的基础上添加我的解释,主要是为了我自己的理解。这是 PRED 的定义:
第一个点右侧的内容应该是 f 应用于 x 的 (n-1) 次组合:f^(n-1)(x)。
让我们通过逐步理解表达式来看看为什么会出现这种情况。
λu.x 是 x 处的常数函数。我们将其表示为 const_x。
λu.u 是恒等函数。我们称之为 id。
λg (λh.h (gf)) 是一个我们需要理解的奇怪函数。我们称之为 F。
好吧,PRED 告诉我们在常量函数上评估 F 的 n 次组合,然后在恒等函数上评估结果。
让我们仔细看看 F:
F 将 g 发送到 g(f) 处进行评估。
让我们用 ev_y 来表示对 y 值的评估。
也就是说,ev_y := λh.hy
所以
现在我们弄清楚 F^n const_x 是什么。
类似
地,
等等:
现在,
这就是我们想要的。
超级傻。这个想法是将做某件事 n 次变成做 f n-1 次。解决方案是将 F n 次应用于 const_x 以获得
ev_{f^(n-1)(x)},然后通过评估恒等函数来提取 f^(n-1)(x)。
I'll add my explanation to the above good ones, mostly for the sake of my own understanding. Here's the definition of PRED again:
The stuff on the right side of the first dot is supposed to be the (n-1) fold composition of f applied to x: f^(n-1)(x).
Let's see why this is the case by incrementally grokking the expression.
λu.x is the constant function valued at x. Let's just denote it const_x.
λu.u is the identity function. Let's call it id.
λg (λh.h (g f)) is a weird function that we need to understand. Let's call it F.
Ok, so PRED tells us to evaluate the n-fold composition of F on the constant function and then to evaluate the result on the identity function.
Let's take a closer look at F:
F sends g to evaluation at g(f).
Let's denote evaluation at value y by ev_y.
That is, ev_y := λh.h y
So
Now we figure out what F^n const_x is.
and
Similarly,
and so on:
Now,
which is what we wanted.
Super goofy. The idea is to turn doing something n times into doing f n-1 times. The solution is to apply F n times to const_x to obtain
ev_{f^(n-1)(x)} and then to extract f^(n-1)(x) by evaluating at the identity function.
Cyker的回答让我明白了
这个可以补充。我会尽力专注于困难的部分。
(回顾)数字的定义如下:
对表达式的某个 n 进行求值
,得出以下结果(取决于 n 的值):
函数
λg.λh.h(gf)
能够通过将h
转换为f
并在前面添加新的h
将一个转换为下一个。将
h
视为一个临时占位符,可以通过用f
替换它来确认它,并且可以通过用恒等函数替换它来擦除它,这就是在最开始发生的情况结尾。另请注意 0 和 1 是如何产生零的:
编辑:我的目的是分享让我理解 pred 函数的提示,但问题实际上是要求 beta 减少步骤。
因此,提供的 pred 函数已经是最简单的形式,因此您无法对其进行 beta 缩减。换句话说,n 被应用于三个项,但我们还不知道有关 n 的任何信息。因此,为了简化表达式,我们需要决定一个输入。我们可以尝试使用数字 3,看看结果是否是数字 2,两者都按上述方式进行编码。这些步骤本质上是机械的,但它们可以以不同的顺序完成,并且仍然产生相同的结果。在这里,我展示了一个对人类有意义的顺序,其他顺序混淆了设计的意图。
The answer by Cyker made me understand this
This can complement it. I'll try to concentrate on the hard part.
(recap) The numerals are defined like so:
The evaluation with a certain n of the expression
gives us these (depending on the value of n):
The function
λg.λh.h(gf)
is able to turn one into the next by converting theh
to anf
and adding a newh
in front.Think of the
h
as a temporal placeholder that can be confirmed by substituting it withf
and can be erased by substituting it with the identity function, which is what happens at the very end.Also note how the 0 and the 1 both yield zero:
Edit: My intent is to share the hints that made me understand the pred function but the question is actually asking for the beta reduction steps.
As such, the provided pred function is already in it’s simplest form, so you can’t beta reduce it. In other words, n is being applied to three terms but we don’t know anything about n yet. So in order to simplify the expression we need to decide an input. We could try with the number three and see if the result is number two, both encoded as described above. The steps are mechanical in nature but they can be done in different order, and still yield the same result. Here I show an order that makes sense for a human being, other sequences obfuscate the intent of the design.