Haskell 中的随机枢轴快速排序

发布于 2024-10-20 19:20:55 字数 551 浏览 2 评论 0原文

是否有可能在 Haskell 中实现快速排序(使用 RANDOM-PIVOT),但仍然有一个简单的 Ord a =>; [a]->[a] 签名?

我开始了解 Monad,目前,我将 monad 解释为某种“命令模式”,这对于 IO 非常有用。

所以,我知道返回随机数的函数实际上应该返回像 IO 这样的一元值,因为否则它会破坏引用透明度。我还明白,应该没有办法从返回的一元值中“提取”随机整数,因为否则,它会再次破坏引用透明度。

但是,我仍然认为应该可以实现“纯”[a]->[a] 快速排序函数,即使它使用随机枢轴,因为它是引用透明的。从我的角度来看,随机枢轴只是一个实现细节,不应该改变函数的签名

OBS:我实际上对特定的快速排序问题并不感兴趣(所以,我不想听起来粗鲁,但我'我不是在寻找“使用合并排序”“随机枢轴在实践中不会提高性能”之类的答案)我实际上对如何实现“纯”感兴趣' 函数内部使用“不纯”函数,在快速排序之类的情况下,我可以确保该函数实际上是一个纯函数。

快速排序只是一个很好的例子。

Is it possible to implement a quicksort in Haskell (with RANDOM-PIVOT) that still has a simple Ord a => [a]->[a] signature?

I'm starting to understand Monads, and, for now, I'm kind of interpreting monads as somethink like a 'command pattern', which works great for IO.

So, I understand that a function that returns a random number should actually return a monadic value like IO, because, otherwise, it would break referential transparency. I also understand that there should be no way to 'extract' the random integer from the returned monadic value, because, otherwise, it would, again, break referential transparency.

But yet, I still think that it should be possible to implement a 'pure' [a]->[a] quicksort function, even if it uses random pivot, because, it IS referential transparent. From my point of view, the random pivot is just a implementation detail, and shouldn't change the function's signature

OBS: I'm not actually interested in the specific quicksort problem (so, I don't want to sound rude but I'm not looking for "use mergesort" or "random pivot doesn't increase performance in practice" kind of answers) I'm actually interested in how to implement a 'pure' function that uses 'impure' functions inside it, in cases like quicksort, where I can assure that the function actually is a pure one.

Quicksort is just a good example.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

南薇 2024-10-27 19:20:55

您做出了错误的假设,即选择枢轴点只是一个实现细节。考虑集合上的部分排序。就像卡上的快速排序一样,其中

卡 a <如果面值较小但要评估布尔值,则卡片 b:

  4 spades < 4 hearts (false)
  4 hearts < 4 spades (false)
  4 hearts = 4 spades (false)

在这种情况下,主元的选择将决定卡片的最终顺序。 以完全相同的方式

对于像这样的函数,

a = get random integer  
b = a + 3
print b 

由 a 确定。如果您随机选择某些内容,那么您的计算是或可能是不确定的。

You are making a false assumption that picking the pivot point is just an implementation detail. Consider a partial ordering on a set. Like a quicksort on cards where

card a < card b if the face value is less but if you were to evaluate booleans:

  4 spades < 4 hearts (false)
  4 hearts < 4 spades (false)
  4 hearts = 4 spades (false)

In that case the choice of pivots would determine the final ordering of the cards. In precisely the same way

for a function like

a = get random integer  
b = a + 3
print b 

is determined by a. If you are randomly choosing something then your computation is or could be non deterministic.

旧伤还要旧人安 2024-10-27 19:20:55

好的,看看这个。

选择从 hashable 包 复制的部分,以及 voodoo magic language pragmas

{-# LANGUAGE FlexibleInstances, UndecidableInstances, NoMonomorphismRestriction, OverlappingInstances #-}

import System.Random (mkStdGen, next, split)
import Data.List (foldl')
import Data.Bits (shiftL, xor)

class Hashable a where
    hash :: a -> Int

instance (Integral a) => Hashable a where
    hash = fromIntegral

instance Hashable Char where
    hash = fromEnum

instance (Hashable a) => Hashable [a] where
    hash = foldl' combine 0 . map hash

-- ask the authors of the hashable package about this if interested
combine h1 h2 = (h1 + h1 `shiftL` 5) `xor` h2

OK,所以现在我们可以采取任何 Hashable 的列表,并将其转换为 Int。我在这里提供了 CharIntegral a 实例,更多更好的实例位于可哈希包中,它还允许加盐等。

这只是我们可以制作一个数字生成器。

genFromHashable = mkStdGen . hash

现在是有趣的部分。让我们编写一个带有随机数生成器、比较器函数和列表的函数。然后,我们将通过咨询生成器来选择一个主元,并咨询比较器来对列表进行分区,从而对列表进行排序。

qSortByGen _ _ [] = []
qSortByGen g f xs = qSortByGen g'' f l ++ mid ++ qSortByGen g''' f r
    where (l, mid, r) = partition (`f` pivot) xs
          pivot = xs !! (pivotLoc `mod` length xs)
          (pivotLoc, g') = next g
          (g'', g''') = split g'

partition f = foldl' step ([],[],[])
    where step (l,mid,r) x = case f x of
              LT -> (x:l,mid,r)
              EQ -> (l,x:mid,r)
              GT -> (l,mid,x:r)

库函数next从生成器中获取一个Int,并生成一个新的生成器。 split 将生成器分叉为两个不同的生成器。

我的函数partition使用f :: a ->排序 将列表划分为三个列表。如果你了解折叠的话,应该就很清楚了。 (请注意,它不会保留子列表中元素的初始顺序;它会颠倒它们。如果这是一个问题,使用foldr 可以解决这个问题。)qSortByGen 的工作方式就像我之前所说的:查阅枢轴的生成器,对列表进行分区,分叉生成器以在两个递归调用中使用,对左侧和右侧进行递归排序,并将其全部连接在一起。

从这里可以很容易地编写方便的函数,

qSortBy f xs = qSortByGen (genFromHashable xs) f xs
qSort = qSortBy compare

请注意最终函数的签名。

ghci> :t qSort
qSort :: (Ord a, Hashable a) => [a] -> [a]

列表内的类型必须同时实现 HashableOrd。这是您所要求的“纯”功能,并且有一个合乎逻辑的附加要求。越通用的功能对其要求的限制越少。

ghci> :t qSortBy
qSortBy :: (Hashable a) => (a -> a -> Ordering) -> [a] -> [a]
ghci> :t qSortByGen
qSortByGen
  :: (System.Random.RandomGen t) =>
     t -> (a -> a -> Ordering) -> [a] -> [a]

最终说明

对于所有输入,qSort 的行为方式完全相同。 “随机”主元选择是。事实上,确定性。但它通过散列列表然后播种随机数生成器而变得模糊,使其对我来说足够“随机”。 ;)

qSort 也仅适用于长度小于 maxBound :: Int 的列表,ghci 告诉我是 9,223,372,036,854,775,807。我认为负索引会出现问题,但在我的临时测试中我还没有遇到它。


或者,您可以使用 IO monad 来实现“更真实”的随机性。

qSortIO xs = do g <- getStdGen -- add getStdGen to your imports
                return $ qSortByGen g compare xs


ghci> :t qSortIO
qSortIO :: (Ord a) => [a] -> IO [a]
ghci> qSortIO "Hello world"
" Hdellloorw"
ghci> qSort "Hello world"
" Hdellloorw"

OK, check this out.

Select portions copied form the hashable package, and voodoo magic language pragmas

{-# LANGUAGE FlexibleInstances, UndecidableInstances, NoMonomorphismRestriction, OverlappingInstances #-}

import System.Random (mkStdGen, next, split)
import Data.List (foldl')
import Data.Bits (shiftL, xor)

class Hashable a where
    hash :: a -> Int

instance (Integral a) => Hashable a where
    hash = fromIntegral

instance Hashable Char where
    hash = fromEnum

instance (Hashable a) => Hashable [a] where
    hash = foldl' combine 0 . map hash

-- ask the authors of the hashable package about this if interested
combine h1 h2 = (h1 + h1 `shiftL` 5) `xor` h2

OK, so now we can take a list of anything Hashable and turn it into an Int. I've provided Char and Integral a instances here, more and better instances are in the hashable packge, which also allows salting and stuff.

This is all just so we can make a number generator.

genFromHashable = mkStdGen . hash

So now the fun part. Let's write a function that takes a random number generator, a comparator function, and a list. Then we'll sort the list by consulting the generator to select a pivot, and the comparator to partition the list.

qSortByGen _ _ [] = []
qSortByGen g f xs = qSortByGen g'' f l ++ mid ++ qSortByGen g''' f r
    where (l, mid, r) = partition (`f` pivot) xs
          pivot = xs !! (pivotLoc `mod` length xs)
          (pivotLoc, g') = next g
          (g'', g''') = split g'

partition f = foldl' step ([],[],[])
    where step (l,mid,r) x = case f x of
              LT -> (x:l,mid,r)
              EQ -> (l,x:mid,r)
              GT -> (l,mid,x:r)

Library functions: next grabs an Int from the generator, and produces a new generator. split forks the generator into two distinct generators.

My functions: partition uses f :: a -> Ordering to partition the list into three lists. If you know folds, it should be quite clear. (Note that it does not preserve the initial ordering of the elements in the sublists; it reverses them. Using a foldr could remedy this were it an issue.) qSortByGen works just like I said before: consult the generator for the pivot, partition the list, fork the generator for use in the two recursive calls, recursively sort the left and right sides, and concatenate it all together.

Convenience functions are easy to compose from here

qSortBy f xs = qSortByGen (genFromHashable xs) f xs
qSort = qSortBy compare

Notice the final function's signature.

ghci> :t qSort
qSort :: (Ord a, Hashable a) => [a] -> [a]

The type inside the list must implement both Hashable and Ord. There's the "pure" function you were asking for, with one logical added requirement. The more general functions are less restrictive in their requirements.

ghci> :t qSortBy
qSortBy :: (Hashable a) => (a -> a -> Ordering) -> [a] -> [a]
ghci> :t qSortByGen
qSortByGen
  :: (System.Random.RandomGen t) =>
     t -> (a -> a -> Ordering) -> [a] -> [a]

Final notes

qSort will behave exactly the same way for all inputs. The "random" pivot selection is. in fact, deterministic. But it is obscured by hashing the list and then seeding a random number generator, making it "random" enough for me. ;)

qSort also only works for lists with length less than maxBound :: Int, which ghci tells me is 9,223,372,036,854,775,807. I thought there would be an issue with negative indexes, but in my ad-hoc testing I haven't run into it yet.


Or, you can just live with the IO monad for "truer" randomness.

qSortIO xs = do g <- getStdGen -- add getStdGen to your imports
                return $ qSortByGen g compare xs


ghci> :t qSortIO
qSortIO :: (Ord a) => [a] -> IO [a]
ghci> qSortIO "Hello world"
" Hdellloorw"
ghci> qSort "Hello world"
" Hdellloorw"
还在原地等你 2024-10-27 19:20:55

在这种情况下,如果您知道该函数是引用透明的,但无法向编译器证明这一点,则可以使用函数 unsafePerformIO :: IO a -> a 来自模块 Data.Unsafe

例如,您可以使用 unsafePerformIO 来获取初始随机状态,然后仅使用此状态执行任何操作。

但请注意:如果不是真正需要,请不要使用它。即便如此,也要三思而后行。 unsafePerformIO 在某种程度上是万恶之源,因为它的后果可能是戏剧性的 - 从强制不同类型到使用此函数使 RTS 崩溃,任何事情都有可能发生。

In such cases, where you know that the function is referentially transparent, but you can't proof it to the compiler, you may use the function unsafePerformIO :: IO a -> a from the module Data.Unsafe.

For instance, you may use unsafePerformIO to get an initial random state and then do anything using just this state.

But please notice: Don't use it if it's not really needed. And even then, think twice about it. unsafePerformIO is somewhat the root of all evil, since it's consequences can be dramatical - anything is possible from coercing different types to crashing the RTS using this function.

如日中天 2024-10-27 19:20:55

Haskell 提供了 ST monad 执行非引用透明操作并获得引用透明结果。

请注意,它不会强制引用透明度;它只是确保潜在的非引用透明临时状态不会泄漏。没有什么可以阻止您返回以不可重现的方式重新排列的经过操作的纯输入数据。最好的办法是以 ST 和纯粹的方式实现相同的事情,并使用 QuickCheck 在随机输入上比较它们。

Haskell provides the ST monad to perform non-referentially-transparent actions with a referentially transparent result.

Note that it doesn't enforce referential transparency; it just insures that potentially non-referentially-transparent temporary state can't leak out. Nothing can prevent you from returning manipulated pure input data that was rearranged in a non-reproducible way. Best is to implement the same thing in both ST and pure ways and use QuickCheck to compare them on random inputs.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文