Purity of functions that produce a ByteString (or any object with a ForeignPtr component)
Since ByteString has a constructor containing a ForeignPtr:

    data ByteString = PS {-# UNPACK #-} !(ForeignPtr Word8) -- payload
                         {-# UNPACK #-} !Int                -- offset
                         {-# UNPACK #-} !Int                -- length
If I have a function that returns a ByteString, then given an input, say a constant Word8, the function will return a ByteString with a non-deterministic ForeignPtr value, since what that value will be is determined by the memory manager.

So does that mean that a function that returns a ByteString is not pure? Obviously that isn't the case, if you have used the ByteString and Vector libraries; surely it would have been discussed widely if it were (and would hopefully show up at the top of a Google search). How is that purity enforced?

I ask because I am curious what subtle points are involved in using ByteString and Vector objects from the GHC compiler's perspective, given the ForeignPtr member in their constructors.
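The scenario in the question can be made concrete with the public Data.ByteString API. This is a minimal sketch; fromByte and observe are hypothetical names chosen for illustration. The exported pure functions (length, unpack, == and so on) look only at the bytes, so a caller cannot tell whether the two calls below produced separate buffers or one shared buffer, and GHC is free to do either (for example through common-subexpression elimination).

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- The question's scenario: a ByteString built from a constant Word8.
-- Each call may allocate its own buffer (a different ForeignPtr).
fromByte :: Word8 -> B.ByteString
fromByte w = B.replicate 4 w

-- What a caller sees through the exported API: the length and the bytes.
observe :: B.ByteString -> (Int, [Word8])
observe bs = (B.length bs, B.unpack bs)

-- Two results for the same input are indistinguishable:
-- (==) on ByteString compares contents, not pointers.
indistinguishable :: Bool
indistinguishable =
  observe (fromByte 7) == observe (fromByte 7)
    && fromByte 7 == fromByte 7
```

Loaded in GHCi, indistinguishable evaluates to True however GHC chose to allocate the buffers.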
There is no way to observe the value of the pointer inside the ForeignPtr from outside the Data.ByteString module. Its implementation is internally impure but externally pure: it maintains the invariants that purity requires, as long as you cannot see inside the ByteString constructor, and you can't, because Data.ByteString does not export it. (The Data.ByteString.Internal module does expose it, for low-level code that is prepared to take on the same obligations.)

This is a common technique in Haskell: implement something with unsafe techniques under the hood, but expose a pure interface. You get the performance and power that unsafe techniques bring without compromising Haskell's safety. (Of course, the implementation modules can have bugs, but do you think ByteString would be less likely to leak its abstraction if it were written in C? :))

As far as the subtle points go, if you're talking from a user's perspective, don't worry: you can use any function the ByteString and Vector libraries export, as long as its name doesn't start with unsafe. Both are very mature, well-tested libraries, so you shouldn't run into any purity problems at all; if you do, that's a bug in the library, and you should report it.

As far as writing your own code that provides external safety on top of an unsafe internal implementation goes, the rule is very simple: maintain referential transparency.
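Functions whose names start with unsafe are the ones that hand this obligation to the caller. As a minimal sketch, the hypothetical safeAt below wraps Data.ByteString.Unsafe.unsafeIndex, which skips the bounds check: with an out-of-range index it may read arbitrary memory or crash, so its result is no longer determined by its arguments. safeAt checks the index first, which makes it total and referentially transparent again.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU
import Data.Word (Word8)

-- BU.unsafeIndex performs no bounds check, so the caller must guarantee
-- 0 <= i < length. safeAt (a hypothetical wrapper) discharges that
-- obligation itself, so its result depends only on its arguments.
safeAt :: B.ByteString -> Int -> Maybe Word8
safeAt bs i
  | i >= 0 && i < B.length bs = Just (BU.unsafeIndex bs i)
  | otherwise = Nothing
```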
Taking ByteString as an example, the functions that construct ByteStrings use unsafePerformIO to allocate blocks of data, which they then mutate and put in the constructor. If we exported the constructor, user code would be able to get at the ForeignPtr. Is this problematic? To determine whether it is, we need to find a pure function (i.e. not in IO) that lets us distinguish two ForeignPtrs allocated this way. A quick glance at the documentation shows that there is such a function: instance Eq (ForeignPtr a) would let us distinguish them. So we must not allow user code to access the ForeignPtr, and the easiest way to do this is to not export the constructor.

In summary: when you use an unsafe mechanism to implement something, verify that the impurity it introduces cannot leak outside of the module, e.g. by inspecting the values you produce with it.
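The recipe above can be sketched in miniature. The Blob type and the names fromList, toList and samePayload are hypothetical, not the bytestring library's API, and the real implementation differs in many details (offsets, slicing, fusion). fromList allocates a private buffer inside unsafePerformIO, fills it, and freezes it into the constructor; the Eq instance compares contents. samePayload shows the leak described above: it compares the ForeignPtrs with their own Eq instance, so for two separately built but equal Blobs its answer depends on whether GHC happened to share the calls. That is why it, like the constructor, must stay private.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrArray, withForeignPtr)
import Foreign.Storable (peekElemOff, pokeElemOff)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical ByteString-like type: payload and length.
-- In a real library this would live in its own module exporting only
-- Blob (abstractly, without its constructor), fromList and toList.
data Blob = Blob !(ForeignPtr Word8) !Int

-- Impure inside, pure outside: allocate a fresh buffer, fill it, and wrap it.
-- Nobody sees the buffer before it is filled, and it is never mutated
-- afterwards, so the result depends only on ws.
fromList :: [Word8] -> Blob
fromList ws = unsafePerformIO $ do
  let n = length ws
  fp <- mallocForeignPtrArray n
  withForeignPtr fp $ \p ->
    mapM_ (\(i, w) -> pokeElemOff p i w) (zip [0 ..] ws)
  return (Blob fp n)

-- Reading the immutable buffer back is likewise observably pure.
toList :: Blob -> [Word8]
toList (Blob fp n) = unsafePerformIO $
  withForeignPtr fp $ \p -> mapM (peekElemOff p) [0 .. n - 1]

-- Comparing contents keeps Eq observationally pure.
instance Eq Blob where
  a == b = toList a == toList b

-- The leak: pointer equality via Eq (ForeignPtr a). Must NOT be exported.
samePayload :: Blob -> Blob -> Bool
samePayload (Blob p _) (Blob q _) = p == q
```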
As far as compiler issues go, you shouldn't really have to worry about them. While these functions are unsafe, beyond violating purity they shouldn't let you do anything more dangerous than what you can already do in the IO monad. Generally, if you want to do something that could produce really unexpected results, you have to go out of your way to do so. For instance, you can use unsafeDupablePerformIO if you can deal with the possibility of two threads evaluating the same thunk of the form unsafeDupablePerformIO m simultaneously; unsafePerformIO is slightly slower than unsafeDupablePerformIO because it prevents this from happening. (With GHC, thunks in your program can be evaluated by two threads simultaneously during normal execution. This is normally not a problem, since evaluating the same pure value twice should by definition have no adverse side effects, but when writing unsafe code it's something you have to take into account.)

The GHC documentation for unsafePerformIO and unsafeDupablePerformIO (in System.IO.Unsafe) details some pitfalls you might run into; so does the documentation for unsafeCoerce#, which should be used through its portable name, Unsafe.Coerce.unsafeCoerce.
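To illustrate the unsafeDupablePerformIO trade-off, here is a hedged sketch built around a hypothetical ByteString-like Blob type (Blob, fromListDup and toListDup are illustrative names, not a library API). Filling a fresh private buffer is safe to duplicate: if two threads race on the same thunk, each allocates and fills its own buffer with the same bytes, and either result is indistinguishable through the pure interface; the only cost is wasted work. An action that is not idempotent, such as bumping a shared counter or freeing a buffer, would not be safe under duplication.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrArray, withForeignPtr)
import Foreign.Storable (peekElemOff, pokeElemOff)
import System.IO.Unsafe (unsafeDupablePerformIO)

-- Hypothetical ByteString-like type: payload and length.
data Blob = Blob !(ForeignPtr Word8) !Int

-- Duplicable: if two threads evaluate this thunk at once, each allocates and
-- fills its own private buffer with the same bytes. Either result is fine,
-- and any extra buffer is reclaimed by the garbage collector as usual.
fromListDup :: [Word8] -> Blob
fromListDup ws = unsafeDupablePerformIO $ do
  let n = length ws
  fp <- mallocForeignPtrArray n
  withForeignPtr fp $ \p ->
    mapM_ (\(i, w) -> pokeElemOff p i w) (zip [0 ..] ws)
  return (Blob fp n)

-- Reading an immutable buffer is idempotent, so duplication is harmless here too.
toListDup :: Blob -> [Word8]
toListDup (Blob fp n) = unsafeDupablePerformIO $
  withForeignPtr fp $ \p -> mapM (peekElemOff p) [0 .. n - 1]

-- By contrast, an action like "increment a shared counter" or "free this
-- buffer" must not run twice; it would need unsafePerformIO, which guards
-- against duplicate evaluation, or, better, should stay in IO.
```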