如果编译一个不需要输入的程序会发生什么？（Haskell IO 纯度问题（再次））

发布于 2024-12-19 13:14:39 字数 593 浏览 2 评论 0原文

当使用任何参数调用 putStrLn 时，将始终返回 IO () 类型的值。我同意这很纯粹，我可以处理。但它是参照透明的吗？我认为是这样，因为对于任何给定的输入，您都可以用 IO () 替换函数调用，这会在标准输出中抛出正确的字符串。

因此，我对 putStrLn 很满意，但在不带参数调用时，getLine 可以返回任意数量的内容，只要它们是 IO String 类型即可。这既不纯粹也不参照透明，对吗？

愚蠢的迂腐问题，它可能不会改变我编写代码的方式，但我真的想一劳永逸地解决这个问题。（我知道 IO monad 会正确排序，这不是我的问题）

这给我带来了另一个问题。编译器是否足够聪明，能够识别不需要输入的程序？例如，假设我编译

main = putStrLn . show $ map (+1) [1..10]

GHC 是否足够智能，可以将该程序简化为 IO () ，从而导致 [2,3,4,5,6,7,8,9,10,11]打印出来？或者它仍然可以解决问题并在运行时评估/执行所有内容？对于不需要输入的任意程序也是如此。 GHC 是否采用了整个程序是引用透明的并且可以简单地用它的值替换的事实？

原文

putStrLn when called with any arguments will always return a value of type IO (). I agree that's pure, I can handle that. But is it referentially transparent? I think so, because for any given input you could replace the function call with an IO () which would throw the correct string at stdout.

So I'm cool with putStrLn, but getLine when called with no arguments could return any number of things provided they are of type IO String. That is neither pure nor referentially transparent right?

Silly pedantic question and it's probably not going to change how I write my code, but I really want to nail this once and for all. (I understand that the IO monad will sequence things correctly, that's not my issue)

This raises another question for me. Is the compiler smart enough to recognise a program that takes no input? For example say I compile

main = putStrLn . show $ map (+1) [1..10]

Is GHC smart enough to reduce that program to the IO () that causes [2,3,4,5,6,7,8,9,10,11] to be printed out? Or does it still work it out and evaluate/execute everything at runtime? Same goes for arbitrary programs where Input is not required. Does GHC employ the fact that the entire program is referentially transparent and can simply be replaced with it's value?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

無心 2024-12-26 13:14:39

我认为这里有两个问题。

IO 引用透明吗
GHC 会在编译时减少任意表达式

看看 IO 的类型，你可以想象它通过依赖没有数据构造函数的神秘值 RealWorld 来模拟引用透明，并且通过使每个语句依赖于最后一个语句（在单线程世界中）。对于 IO String 来说，这是一个围绕 RealWorld -> 的新类型包装器。 (RealWorld, String)...这是一个函数，而不是一个值。在没有 Monad 实例的情况下使用 IO 会使这一点变得特别明显，而且令人痛苦。

Prelude GHC.Types> :info IO
newtype IO a
  = IO (GHC.Prim.State# GHC.Prim.RealWorld
        -> (# GHC.Prim.State# GHC.Prim.RealWorld, a #))

至于GHC的优化，在这种情况下它不会在编译时将列表缩减为字符串。 GHC 7.2.1 生成的优化代码会延迟生成一个列表，对结果进行映射 (+1)，将列表转换为字符串，最后将其打印到控制台。与您的示例中的内容几乎完全相同。

I think there are two questions here.

Is IO referentially transparent
Will GHC reduce arbitrary expressions at compile time

Looking at the type of IO, you can imagine that it emulates referential transparency by relying on mysterious value RealWorld that does not have a data constructor, and by making each statement depend on the last (in a single threaded world). In the case of an IO String, this is a newtype wrapper around RealWorld -> (RealWorld, String)... which is a function, not a value. Using IO without the Monad instance makes this particularly, and painfully, obvious.

Prelude GHC.Types> :info IO
newtype IO a
  = IO (GHC.Prim.State# GHC.Prim.RealWorld
        -> (# GHC.Prim.State# GHC.Prim.RealWorld, a #))

As for GHC's optimization, in this case it does not reduce the list to a string at compile time. The optimized code produced by GHC 7.2.1 lazily generates a list, maps (+1) over the results, converts the list to a string, and finally prints it to the console. Pretty much exactly as it reads in your example.

回复收藏 0 原文

笨笨の傻瓜 2024-12-26 13:14:39

是的，这些一元函数是纯粹的引用透明的，因为替换规则仍然适用于它们。

在 Haskell 中，以下两个程序是等效的。

main = (puStrLn "17" >> puStrLn "17")
main = let x = putStrLn "17" in (x >> x)

在“正常”语言中，第二个示例只会打印一次，作为计算 x 的副作用。当您意识到 IO() 类型的值实际上并不是副作用计算，而是实际上是这样一个描述时，这两个程序实际上相同的方式就会变得更加清晰。计算，您可以将其用作构建块来构建更大的计算。

Yes, those monadic functions are pure referentialy transparent since the substitution rule still applies to them.

In Haskell, the following two programs are equivalent

main = (puStrLn "17" >> puStrLn "17")
main = let x = putStrLn "17" in (x >> x)

In a "normal" language the second example would only print once, as a side-effect of evaluating x. The way the two programs are actually the same becomes a little clearer when you realize that a value of type IO() is not actually a side-effecting computation but is actually a description of such a computation that you can use as building block to build larger computations from.

回复收藏 0 原文

飘逸的'云 2024-12-26 13:14:39

getLine :: IO String 是纯粹的；它的值是从标准输入读取并返回*一个字符串的 IO 操作。 getLine 始终等于该值。

*我在这里使用“返回”这个词是因为没有更好的词。

维基百科将引用透明度定义为：

如果一个表达式可以用它的值替换而不改变程序的行为（换句话说，在相同的输入上产生具有相同效果和输出的程序），则该表达式被认为是引用透明的。

所以 getLine 也是引用透明的。尽管我想不出一种很好的方法来以其他方式表达其“价值”，以达到“用其值替换表达式”的目的。

另外，对于诸如“putStrLn 当使用任何参数调用时将始终返回 IO ()”这样的语句应该小心一些。 IO () 是一个类型，而不是一个值。对于每个 s :: String，putStrLn s 都是一个 IO () 类型的值，是的。但这个值是多少，当然取决于s。

（此外，如果排除那些不安全的东西，一切都是纯粹且引用透明的，尤其是getLine。）

回复收藏 0 原文

倾城泪 2024-12-26 13:14:39

让我回答问题的第二部分（我已经在之前的问题中回答了第一部分）。只要不改变程序的语义，编译器就可以自由地对表达式执行任何操作。因此，您必须询问有关特定编译器的问题才能使其有意义。 ghc吗？不，不是当前版本。有没有编译器可以做到这一点？是的，有。

回复收藏 0 原文

夜血缘 2024-12-26 13:14:39

我不确定这是否有帮助（如果它只会让更多人感到困惑，我提前道歉），但是在 Mercury 中使 IO 引用透明的方式是显式地将 io 类型的值传递给所有 IO - 执行代码，还必须返回 io 类型的新值。

输入io 表示调用代码之前的“世界状态”。程序之外的整个世界；磁盘内容、屏幕上打印的内容、用户将要输入的内容、将从网络接收的内容等等。

输出 io 表示调用代码后的世界状态。输入 io 和输出 io 之间的差异包含该代码对世界所做的更改（理论上加上外部发生的所有其他事情）。

Mercury 的模式系统确保 io 类型的值唯一；它们只有其中之一，因此您不能将相同的 io 值传递给两个不同的 IO 执行过程。您将一个 io 传递到一个过程中，使其对您毫无用处，然后收到一个新的返回。

当然，现实世界的真实状态不会编码为io类型的值；而是被编码为io类型的值。事实上，io 的底层是完全空的！根本没有任何信息被传递！但是io值代表世界的状态。

您可以将 IO monad 中的函数视为执行相同的操作。它们采用额外的隐式世界状态参数，并返回额外的隐式世界状态值。 IO monad 实现处理将此额外输出传递给下一个函数。这使得 IO monad 非常像 State monad；很容易看出该 monad 中的 get 是纯粹的，即使它看起来不带任何参数。

按照这种理解，main 在程序运行之前接收世界的初始状态，并通过将其线程化到程序中的所有 IO 代码，将其转换为程序运行后的世界状态。

而且因为您自己无法获得最新的值，所以您无法在其他代码中间启动自己的小 IO 链。这就是确保纯度的原因，因为事实上我们不可能凭空出现一个拥有自己国家的全新世界。

所以 getLine :: IO String 可以被认为是类似于 getLine :: World -> 的东西。（世界，字符串）。它是纯粹的，因为它在不同的时间被调用并返回不同的字符串，每次它都会收到一个不同 World。

无论您考虑的是 IO 操作的值，还是在函数之间传递的世界状态，或任何其他机制，所有这些构造都是代表性的。在底层，所有 IO 都是用不纯的代码实现的，因为这就是世界的运作方式；当您写入文件时，您已经更改了磁盘的状态。但我们可以在更高的抽象层次上表示这一点，让您以不同的方式思考它。

打个比方，您可以使用搜索树或哈希表或无数其他方式来实现映射。但是实现它后，当您推理使用映射的代码时，您不会考虑左右子树或存储桶和哈希，而是考虑映射的抽象。

如果我们能够以保持纯度和引用透明性的方式表示 IO，那么我们就可以将任何需要引用透明性的推理应用到使用这种表示的代码中。这使得适用于此类代码的所有数学都可以发挥作用（其中大部分用于实现纯度强制语言的高级编译器），甚至对于执行 IO 的程序也是如此。

关于第二个问题的快速附录。理论上，GHC 可以将输入程序简化为输出。但我不相信它会非常努力地这样做，因为这通常是不可判定的。想象一个程序不接受任何输入，但生成一个无限列表，然后打印它的最后 3 个元素。理论上，任何不依赖于其输入的程序都可以简化为其输出，但为了做到这一点，编译器必须执行与在编译时执行程序等效的操作。因此，要完全普遍做到这一点，您必须对程序有时在编译时进入无限循环感到高兴。几乎每个程序都依赖于它的输入，因此即使尝试这样做也不会获得太多好处。

通过识别程序中不依赖于任何输入的部分并用结果替换它们，可以获得一些东西。这称为部分评估，这是一个活跃的研究课题，但它也非常困难，并且没有一刀切的解决方案。为此，您必须能够识别程序中不会使编译器陷入无限循环并试图找出它们返回的内容的区域，并且您必须决定是否删除一些需要花费一些时间的代码。如果意味着将其返回的数百兆字节数据结构嵌入到程序二进制文件中，那么运行时的秒数是一个足够好的好处。您必须完成所有这些分析，而无需花费数小时来编译中等复杂的程序。

I'm not sure if this will help (I apologise in advance if it only confuses more), but the way IO is made referentially transparent in Mercury is to explicitly pass a value of type io to all IO-performing code, which must also return a new value of type io.

The input io represents "the state of the world" just before the code is called. The entire world outside the program; disk contents, what's printed on the screen, what the user is about to type, what's about to be received from the network, everything.

The output io represents the state of the world just after the code is called. The difference between the input io and the output io contains the changes to the world that were made by that code (plus everything else that happened externally, in theory).

Mercury's mode system ensures that values of type io are unique; there's only ever one of them, so you can't pass the same io value to two different IO-performing procedures. You pass an io into a procedure, rendering it useless to you, then receive a new one back.

Of course the real state of the actual world isn't encoded into values of type io; in fact under the hood io is completely empty! There's no information being passed at all! But the io values represent the state of the world.

You can think of functions in the IO monad as doing the same. They take an extra implicit state-of-the-world argument, and return an extra implicit state-of-the-world value. The IO monad implementation handles passing this extra output on to the next function. This makes the IO monad very like the State monad; it's easy to see that get in that monad is pure, even though it appears to take no arguments.

In that understanding, main receives the initial state of the world before your program runs and transforms it into the state of the world after the program is run, by threading it through all the IO code in your program.

And because you can't get a state-of-the-world value yourself, you have no way of starting your own little IO chain in the middle of other code. This is what ensures purity, since in actual fact we can't have a brand new world with its own state spring out of nowhere.

So getLine :: IO String can be thought of as something like getLine :: World -> (World, String). It's pure, because all those different times it's called and returns different strings it received a different World each time.

Whether you think about values that are IO actions, or the state-of-the-world being passed around between functions, or any other mechanism, all these constructs are representational. Under the hood, all IO is implemented with impure code, because that's how the world works; when you write to a file, you have changed the state of the disk. But we can represent this at a higher level of abstraction, allowing you to think about it differently.

An analogy is that you can implement a map with search trees or hash tables or a zillion other ways. But having implemented it, when you're reasoning about code that uses the map, you don't think about left and right sub-trees or buckets and hashes, you think about the abstraction that is a map.

If we can represent IO in a way that maintains purity and referential transparency, then we can apply any reasoning that requires referential transparency to code using this representation. This allows all the mathematics that applies to such code to work (much of which is used in the implementation of advanced compilers for purity-enforced languages), even for programs that perform IO.

And a quick addendum about your second question. GHC could theoretically reduce that input program to just the output. I don't believe it tries terribly hard to do so though, because this is undecidable in general. Imagine a program that took no input but generated an infinite list and then printed its last 3 elements. Theoretically any program that is not dependent on its input can be reduced to its output, but in order to do that the compiler has to do something equivalent to executing the program at compile time. So to do this fully generally you'd have to be happy for your programs to sometimes go into infinite loops at compile time. And almost every program is dependent on its input, so there's not a lot to be gained by even trying to do this.

There is something to be gained by identifying parts of programs that aren't dependent on any input and replacing them with their result. This is called partial evaluation, and it's an active research topic, but it's also very hard and there's no one-size-fits-all solution. To do it, you have to be able to identify areas of the program that won't send the compiler into an infinite loop trying to figure out what they return, and you have to make make decisions about whether removing some code that takes a few seconds at run time is a good enough benefit if it means embedding the multi-hundred-megabyte data structure it returns in the program binary. And you have to do all this analysis without taking hours to compile moderately complex programs.

回复收藏 0 原文