How do I handle an infinite list of IO objects in Haskell?

Posted on 2024-12-09 18:00:45

I'm writing a program that reads from a list of files. Each file either contains a link to the next file or marks that it's the end of the chain.

Since I'm new to Haskell, it seemed like the idiomatic way to handle this was a lazy list of possible files. To this end, I have

getFirstFile :: String -> DataFile
getNextFile :: Maybe DataFile -> Maybe DataFile

loadFiles :: String -> [Maybe DataFile]
loadFiles = iterate getNextFile . Just . getFirstFile

getFiles :: String -> [DataFile]
getFiles = map fromJust . takeWhile isJust . loadFiles
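To confirm that the pure version behaves as intended, here is a minimal runnable sketch. The `DataFile` record, the `disk` table and the `load` helper are hypothetical stand-ins that fake the file contents in memory; `loadFiles` and `getFiles` are exactly the definitions above.

```haskell
import Control.Monad (join)
import Data.Maybe (fromJust, isJust)

-- Hypothetical representation: a file's name plus the optional name
-- of the next file in the chain.
data DataFile = DataFile { fileName :: String, nextLink :: Maybe String }
  deriving (Show, Eq)

-- Hypothetical in-memory "disk": file name -> link to the next file.
disk :: [(String, Maybe String)]
disk = [("a.dat", Just "b.dat"), ("b.dat", Just "c.dat"), ("c.dat", Nothing)]

-- Build a DataFile; an unknown name is treated as having no link.
load :: String -> DataFile
load name = DataFile name (join (lookup name disk))

getFirstFile :: String -> DataFile
getFirstFile = load

-- Follow the previous file's link; Nothing in means Nothing out.
getNextFile :: Maybe DataFile -> Maybe DataFile
getNextFile mf = load <$> (mf >>= nextLink)

loadFiles :: String -> [Maybe DataFile]
loadFiles = iterate getNextFile . Just . getFirstFile

getFiles :: String -> [DataFile]
getFiles = map fromJust . takeWhile isJust . loadFiles
```

Laziness is what makes this terminate: `takeWhile isJust` stops pulling from the infinite `iterate` list at the first `Nothing`.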

So far, so good. The only problem is that, since getFirstFile and getNextFile both need to open files, I need their results to be in the IO monad. This gives the modified form of

getFirstFile :: String -> IO DataFile
getNextFile :: Maybe DataFile -> IO (Maybe DataFile)

loadFiles :: String -> [IO (Maybe DataFile)]
loadFiles = iterate (getNextFile =<<) . fmap Just . getFirstFile

getFiles :: String -> IO [DataFile]
getFiles = liftM (map fromJust . takeWhile isJust) . sequence . loadFiles

The problem with this is that, since iterate returns an infinite list, sequence becomes an infinite loop. I'm not sure how to proceed from here. Is there a lazier form of sequence that won't hit all of the list elements? Should I be rejiggering the map and takeWhile to be operating inside the IO monad for each list element? Or do I need to drop the whole infinite list process and write a recursive function to terminate the list manually?

亂 2024-12-16 18:00:45

A step in the right direction

What puzzles me is getNextFile. Step into a simplified world with me, where we're not dealing with IO yet. The type is Maybe DataFile -> Maybe DataFile. In my opinion, this should simply be DataFile -> Maybe DataFile, and I will operate under the assumption that this adjustment is possible. And that looks like a good candidate for unfoldr. The first thing I am going to do is make my own simplified version of unfoldr, which is less general but simpler to use.

import Data.List

-- unfoldr :: (b -> Maybe (a,b)) -> b -> [a]
myUnfoldr :: (a -> Maybe a) -> a -> [a]
myUnfoldr f v = v : unfoldr (fmap tuplefy . f) v
  where tuplefy x = (x,x)

Now the type f :: a -> Maybe a matches getNextFile :: DataFile -> Maybe DataFile

getFiles :: String -> [DataFile]
getFiles = myUnfoldr getNextFile . getFirstFile

Beautiful, right? unfoldr is a lot like iterate, except once it hits Nothing, it terminates the list.
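Here is the same idea as a runnable sketch, again assuming a hypothetical in-memory `disk` table in place of real files so the example is self-contained. Only the `DataFile -> Maybe DataFile` signature of `getNextFile` and `myUnfoldr` come from the text; the record fields and the `load` helper are illustrative.

```haskell
import Control.Monad (join)
import Data.List (unfoldr)

-- Hypothetical representation of a file and its optional link.
data DataFile = DataFile { fileName :: String, nextLink :: Maybe String }
  deriving (Show, Eq)

-- Hypothetical in-memory "disk": file name -> link to the next file.
disk :: [(String, Maybe String)]
disk = [("a.dat", Just "b.dat"), ("b.dat", Just "c.dat"), ("c.dat", Nothing)]

load :: String -> DataFile
load name = DataFile name (join (lookup name disk))

-- The simplified unfold from this answer: each new seed is also the output.
myUnfoldr :: (a -> Maybe a) -> a -> [a]
myUnfoldr f v = v : unfoldr (fmap tuplefy . f) v
  where tuplefy x = (x, x)

getFirstFile :: String -> DataFile
getFirstFile = load

-- Adjusted signature: the input is always a real file.
getNextFile :: DataFile -> Maybe DataFile
getNextFile = fmap load . nextLink

getFiles :: String -> [DataFile]
getFiles = myUnfoldr getNextFile . getFirstFile
```

After `c.dat`, `nextLink` is `Nothing`, so the unfold simply stops instead of producing an infinite tail of `Nothing`s that has to be filtered out.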

Now, we have a problem. IO. How can we do the same thing with IO thrown in there? Don't even think about The Function Which Shall Not Be Named. We need a beefed up unfoldr to handle this. Fortunately, the source for unfoldr is available to us.

unfoldr      :: (b -> Maybe (a, b)) -> b -> [a]
unfoldr f b  =
  case f b of
   Just (a,new_b) -> a : unfoldr f new_b
   Nothing        -> []

Now what do we need? A healthy dose of IO. liftM2 unfoldr almost gets us the right type, but won't quite cut it this time.

An actual solution

unfoldrM :: Monad m => (b -> m (Maybe (a, b))) -> b -> m [a]
unfoldrM f b = do
  res <- f b
  case res of
    Just (a, b') -> do
      bs <- unfoldrM f b'
      return $ a : bs
    Nothing -> return []

It is a rather straightforward transformation; I wonder if there is some combinator that could accomplish the same.

Fun fact: we can now define unfoldr f b = runIdentity $ unfoldrM (return . f) b
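The fun fact is easy to check. The sketch below repeats `unfoldrM` from above, rebuilds a pure unfold by running it in `Identity`, and compares it with `Data.List.unfoldr`; `countdown` is just an illustrative step function.

```haskell
import Data.Functor.Identity (runIdentity)
import qualified Data.List as L

-- unfoldrM as defined in this answer.
unfoldrM :: Monad m => (b -> m (Maybe (a, b))) -> b -> m [a]
unfoldrM f b = do
  res <- f b
  case res of
    Just (a, b') -> do
      bs <- unfoldrM f b'
      return (a : bs)
    Nothing -> return []

-- Pure unfoldr recovered by running unfoldrM in the Identity monad.
unfoldr' :: (b -> Maybe (a, b)) -> b -> [a]
unfoldr' f b = runIdentity (unfoldrM (return . f) b)

-- Illustrative step function: emit n, then continue from n - 1 until 0.
countdown :: Int -> Maybe (Int, Int)
countdown 0 = Nothing
countdown n = Just (n, n - 1)
```

One difference worth keeping in mind: in a strict monad such as IO, `unfoldrM` runs every step before it returns the list, so it suits a finite chain like the one in the question rather than a genuinely infinite stream.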

Let's again define a simplified myUnfoldrM; we just have to sprinkle a liftM in there:

myUnfoldrM :: Monad m => (a -> m (Maybe a)) -> a -> m [a]
myUnfoldrM f v = (v:) `liftM` unfoldrM (liftM (fmap tuplefy) . f) v
  where tuplefy x = (x,x)

And now we're all set, just like before.

getFirstFile :: String -> IO DataFile
getNextFile :: DataFile -> IO (Maybe DataFile)

getFiles :: String -> IO [DataFile]
getFiles str = do
  firstFile <- getFirstFile str
  myUnfoldrM getNextFile firstFile

-- alternatively, to make it look like before
getFiles' :: String -> IO [DataFile]
getFiles' = myUnfoldrM getNextFile <=< getFirstFile

By the way, I typechecked all of these with data DataFile = NoClueWhatGoesHere, and the type signatures for getFirstFile and getNextFile, with their definitions set to undefined.
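For something that actually runs against real files, here is a minimal sketch. It assumes a hypothetical file format in which the first line of each file is either the path of the next file or the marker `END`; `parseLink` and the `DataFile` fields are illustrative, while `unfoldrM`, `myUnfoldrM` and the `<=<` composition are the ones from this answer.

```haskell
import Control.Monad (liftM, (<=<))

-- Hypothetical record: the file's path and the path it links to, if any.
data DataFile = DataFile { filePath :: FilePath, nextLink :: Maybe FilePath }
  deriving (Show, Eq)

-- Hypothetical format: first line is the next path, or "END".
parseLink :: String -> Maybe FilePath
parseLink contents = case lines contents of
  ("END" : _) -> Nothing
  (next : _)  -> Just next
  []          -> Nothing -- treat an empty file as the end of the chain

getFirstFile :: FilePath -> IO DataFile
getFirstFile path = do
  contents <- readFile path
  -- Force the whole string so the lazily read handle is closed now.
  length contents `seq` return (DataFile path (parseLink contents))

getNextFile :: DataFile -> IO (Maybe DataFile)
getNextFile df = case nextLink df of
  Nothing   -> return Nothing
  Just next -> fmap Just (getFirstFile next)

unfoldrM :: Monad m => (b -> m (Maybe (a, b))) -> b -> m [a]
unfoldrM f b = do
  res <- f b
  case res of
    Just (a, b') -> (a :) `liftM` unfoldrM f b'
    Nothing      -> return []

myUnfoldrM :: Monad m => (a -> m (Maybe a)) -> a -> m [a]
myUnfoldrM f v = (v :) `liftM` unfoldrM (liftM (fmap tuplefy) . f) v
  where tuplefy x = (x, x)

getFiles :: FilePath -> IO [DataFile]
getFiles = myUnfoldrM getNextFile <=< getFirstFile
```

Forcing `length contents` is there because `readFile` is itself lazy I/O: if only the first line were inspected, each handle could stay open longer than expected.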

[edit] changed myUnfoldr and myUnfoldrM to behave more like iterate, including the initial value in the list of results.

[edit] Additional insight on unfolds:

If you have a hard time wrapping your head around unfolds, the Collatz sequence is possibly one of the simplest examples.

collatz :: Integral a => a -> Maybe a
collatz 1 = Nothing -- the sequence ends when you hit 1
collatz n | even n    = Just $ n `div` 2
          | otherwise = Just $ 3 * n + 1

collatzSequence :: Integral a => a -> [a]
collatzSequence = myUnfoldr collatz

Remember, myUnfoldr is a simplified unfold for the cases where the "next seed" and the "current output value" are the same, as is the case for collatz. This behavior should be easy to see given myUnfoldr's simple definition in terms of unfoldr and tuplefy x = (x,x).

ghci> collatzSequence 9
[9,28,14,7,22,11,34,17,52,26,13,40,20,10,5,16,8,4,2,1]
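Since `myUnfoldrM` works in any monad, not just IO, the Collatz example can also run in `Either` to reject bad input. This matters because `collatz 0` keeps returning `Just 0`, so `collatzSequence 0` never ends. The guarded `collatzE` below is a hypothetical variant for illustration, not part of the original answer.

```haskell
import Control.Monad (liftM)

-- unfoldrM and myUnfoldrM as defined in this answer.
unfoldrM :: Monad m => (b -> m (Maybe (a, b))) -> b -> m [a]
unfoldrM f b = do
  res <- f b
  case res of
    Just (a, b') -> (a :) `liftM` unfoldrM f b'
    Nothing      -> return []

myUnfoldrM :: Monad m => (a -> m (Maybe a)) -> a -> m [a]
myUnfoldrM f v = (v :) `liftM` unfoldrM (liftM (fmap tuplefy) . f) v
  where tuplefy x = (x, x)

-- Hypothetical guarded step: fail on n <= 0 instead of looping forever.
collatzE :: Integer -> Either String (Maybe Integer)
collatzE n
  | n <= 0    = Left ("not a positive integer: " ++ show n)
  | n == 1    = Right Nothing -- the sequence ends at 1
  | even n    = Right (Just (n `div` 2))
  | otherwise = Right (Just (3 * n + 1))

collatzSequenceE :: Integer -> Either String [Integer]
collatzSequenceE = myUnfoldrM collatzE
```

`collatzSequenceE 6` gives `Right [6,3,10,5,16,8,4,2,1]`, while `collatzSequenceE 0` short-circuits to a `Left` on the very first step.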

More, mostly unrelated thoughts

The rest has absolutely nothing to do with the question, but I just couldn't resist musing. We can define myUnfoldr in terms of myUnfoldrM:

myUnfoldr f v = runIdentity $ myUnfoldrM (return . f) v

Look familiar? We can even abstract this pattern:

sinkM :: ((a -> Identity b) -> a -> Identity c) -> (a -> b) -> a -> c
sinkM hof f = runIdentity . hof (return . f)

unfoldr = sinkM unfoldrM
myUnfoldr = sinkM myUnfoldrM

sinkM should work to "sink" (the opposite of "lift") any function of the form

Monad m => (a -> m b) -> a -> m c

since the m in those functions can be instantiated to the Identity monad that sinkM expects. However, I don't see anything that sinkM would actually be useful for.
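As a small check that the type trick works, the sketch below defines a hypothetical monadic combinator `applyTwiceM` of the shape `Monad m => (a -> m b) -> a -> m c` and sinks it into a pure function. It only illustrates the typing; it is not a claim that `sinkM` has practical uses beyond that.

```haskell
import Data.Functor.Identity (Identity, runIdentity)

-- sinkM as defined in this answer.
sinkM :: ((a -> Identity b) -> a -> Identity c) -> (a -> b) -> a -> c
sinkM hof f = runIdentity . hof (return . f)

-- Hypothetical monadic combinator: run the step twice in sequence.
applyTwiceM :: Monad m => (a -> m a) -> a -> m a
applyTwiceM f x = f x >>= f

-- Instantiating m to Identity and sinking gives the pure version.
applyTwice :: (a -> a) -> a -> a
applyTwice = sinkM applyTwiceM
```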


書生途 2024-12-16 18:00:45

sequenceWhile :: Monad m => (a -> Bool) -> [m a] -> m [a]
sequenceWhile _ [] = return []
sequenceWhile p (m:ms) = do
  x <- m
  if p x
    then liftM (x:) $ sequenceWhile p ms
    else return []

Yields:

getFiles = liftM (map fromJust) . sequenceWhile isJust . loadFiles
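To see that `sequenceWhile` really stops early, the sketch below feeds it an infinite list of IO actions and counts how many of them run. `step` is a hypothetical stand-in for reading one file: it returns `Just i` for the first three items and `Nothing` afterwards.

```haskell
import Control.Monad (liftM)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)
import Data.Maybe (fromJust, isJust)

-- sequenceWhile as defined in this answer.
sequenceWhile :: Monad m => (a -> Bool) -> [m a] -> m [a]
sequenceWhile _ [] = return []
sequenceWhile p (m:ms) = do
  x <- m
  if p x
    then liftM (x:) $ sequenceWhile p ms
    else return []

-- Hypothetical step: count the call, and "end the chain" after item 3.
step :: IORef Int -> Int -> IO (Maybe Int)
step calls i = do
  modifyIORef calls (+ 1)
  return (if i <= 3 then Just i else Nothing)

-- An infinite list of actions, run only up to the first Nothing.
firstItems :: IORef Int -> IO [Int]
firstItems calls =
  liftM (map fromJust) (sequenceWhile isJust (map (step calls) [1 ..]))
```

Exactly four actions run: three that produce `Just` and the one whose `Nothing` ends the chain. One caveat when plugging this into the question's `loadFiles`: `iterate (getNextFile =<<)` builds each action by extending the previous one, so the k-th action re-runs `getFirstFile` and all earlier `getNextFile` calls, and the number of file reads grows quadratically with the length of the chain.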


不即不离 2024-12-16 18:00:45

As you have noticed, IO results can't be lazy, so you can't (easily) build an infinite list using IO. There is a way out, however, in unsafeInterleaveIO; with this, you can do something like:

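import System.IO.Unsafe (unsafeInterleaveIO)

-- processFile and nextFile stand in for your own file-handling functions.
ioList startFile = do
    v <- processFile startFile
    -- The recursive call is deferred until something forces `continuation`.
    continuation <- unsafeInterleaveIO (nextFile startFile >>= ioList)
    return (v:continuation)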

It's important to be careful here, though - you've just deferred the results of ioList to some unpredictable time in the future. It may never be run at all, in fact. So be very careful when you're being Clever™ like this.
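The deferral is easy to observe. In this sketch, a hypothetical toy `ioList` walks an endless chain of integers and logs each element to an `IORef` when it is "processed"; only the elements that pure code actually demands ever get processed.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical toy version: "processing" element n appends it to a log,
-- and the "next file" after n is simply n + 1. The chain never ends.
ioList :: IORef [Int] -> Int -> IO [Int]
ioList logRef n = do
  modifyIORef logRef (++ [n]) -- runs immediately for this element
  -- Deferred: runs only when something forces `rest`.
  rest <- unsafeInterleaveIO (ioList logRef (n + 1))
  return (n : rest)
```

Right after `ioList logRef 1` returns, the log holds only `[1]`; evaluating `take 3` on the result afterwards pulls in `2` and `3`. The effects happen whenever pure code gets around to demanding the values, which is exactly the unpredictability described above.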

Personally, I would just build a manual recursive function.
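A manual recursive version is short. This sketch keeps the question's original signatures for `getFirstFile` and `getNextFile` and fakes the files with a hypothetical in-memory table, so only the `getFiles` loop is the interesting part.

```haskell
import Control.Monad (join)

-- Hypothetical representation of a file and its optional link.
data DataFile = DataFile { fileName :: String, nextLink :: Maybe String }
  deriving (Show, Eq)

-- Hypothetical in-memory "disk": file name -> link to the next file.
disk :: [(String, Maybe String)]
disk = [("a.dat", Just "b.dat"), ("b.dat", Just "c.dat"), ("c.dat", Nothing)]

-- Hypothetical stub: a real version would open and parse the file.
getFirstFile :: String -> IO DataFile
getFirstFile name = return (DataFile name (join (lookup name disk)))

-- The question's signature: Nothing in, or no link, means end of chain.
getNextFile :: Maybe DataFile -> IO (Maybe DataFile)
getNextFile Nothing   = return Nothing
getNextFile (Just df) = case nextLink df of
  Nothing   -> return Nothing
  Just name -> Just <$> getFirstFile name

-- Plain recursion: keep the current file, fetch the next, stop on Nothing.
getFiles :: String -> IO [DataFile]
getFiles name = getFirstFile name >>= go
  where
    go df = do
      next <- getNextFile (Just df)
      case next of
        Nothing  -> return [df]
        Just df' -> (df :) <$> go df'
```

Each file is opened exactly once, and the loop ends as soon as `getNextFile` returns `Nothing`, with no infinite list involved.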


小伙你站住 2024-12-16 18:00:45

Laziness and I/O are a tricky combination. Using unsafeInterleaveIO is one way to produce lazy lists in the IO monad (and this is the technique used by the standard getContents, readFile and friends). However, as convenient as this is, it exposes pure code to possible I/O errors and makes releasing resources (such as file handles) non-deterministic. This is why most "serious" Haskell applications (especially those concerned with efficiency) nowadays use things called Enumerators and Iteratees for streaming I/O. One library in Hackage that implements this concept is enumerator.

You are probably fine with using lazy I/O in your application, but I thought I'd still give this as an example of another way to approach these kinds of problems. You can find more in-depth tutorials about iteratees here and here.

For example, your stream of DataFiles could be implemented as an Enumerator like this:

import Data.Enumerator
import Control.Monad.IO.Class (liftIO)

iterFiles :: String -> Enumerator DataFile IO b
iterFiles s = first where
    first (Continue k) = do
        file <- liftIO $ getFirstFile s
        k (Chunks [file]) >>== next file
    first step = returnI step

    next prev (Continue k) = do
        file <- liftIO $ getNextFile (Just prev)
        case file of
            Nothing -> k EOF
            Just df -> k (Chunks [df]) >>== next df
    next _ step = returnI step
