过滤/分支枚举

发布于 2024-11-04 19:53:15 字数 1678 浏览 0 评论 0 原文

我正在使用 enumerator-0.4.10,并且我需要分发处理 传入流的不同部分到不同的迭代器(我是 解析一个巨大的XML文件,不同的子树有不同的 处理逻辑)。一次只有一个 iteratee 处于活动状态 因为子树不相交。

我写了一个简单的例子来过滤流并传递结果 到一个迭代者;请参阅下文。然而,如果有多个嵌套 iteratee 在我看来,我不能再使用 enumeratee 了。我是吗 需要编写我自己的多枚举来保存多个内部 迭代者?还有更好的想法吗?

这是我的(初学者)单个嵌套迭代器的代码:

module Main ( main ) where

import qualified Data.Enumerator as E ( Enumeratee, Step(..), Stream(..),
  checkDone, checkDoneEx, continue, enumList, joinI, run_, yield )
import Data.Enumerator ( ($$), (>>==) )
import qualified Data.Enumerator.List as EL ( consume )

-- cribbed from EL.concatMap
concatMapAccum :: Monad m => (s -> ao -> (s, [ai])) -> s ->
E.Enumeratee ao ai m b
concatMapAccum f s0 = E.checkDone (E.continue . step s0)
  where
    step _ k E.EOF = E.yield (E.Continue k) E.EOF
    step s k (E.Chunks xs) = loop s k xs
    loop s k [] = E.continue (step s k)
    loop s k (x:xs) = case f s x of
      (s', ais) -> k (E.Chunks $ ais) >>==
        E.checkDoneEx (E.Chunks xs) (\k' -> loop s' k' xs)

passFromTo :: Monad m => ((a -> Bool), (a -> Bool)) -> Bool -> E.Enumeratee a a m b
passFromTo (from, to) pass0 =
  concatMapAccum updatePass pass0
    where
      updatePass pass el = case (pass, from el, to el) of
        (True, _, to_el) -> (not to_el, [el])
        (False, True, _) -> (True, [el])
        (False, False, _) -> (False, [])

main :: IO()
main = do
  E.run_ (E.enumList 3 [1..20] $$
    E.joinI $ passFromTo ((\e -> e == 3 || e == 13), (\e -> e == 7 || e == 17)) False $$
    EL.consume) >>= print

$ ./dist/build/StatefulEnumeratee/StatefulEnumeratee
[3,4,5,6,7,13,14,15,16,17]

I am using enumerator-0.4.10, and I need to distribute processing of
different parts of the incoming stream to different iteratees (I am
parsing a huge XML file, and different sub-trees have different
processing logic). Only a single iteratee will be active at a time
since the sub-trees don't intersect.

I wrote a simple example that filters the stream and passes the result
to one iteratee; please see below. However, with multiple nested
iteratees it seems to me that I can no longer use an enumeratee. Do I
need to write my own multi-enumeratee that holds multiple inner
iteratees? Any better ideas?

Here is my (beginner's) code for a single nested iteratee:

module Main ( main ) where

import qualified Data.Enumerator as E ( Enumeratee, Step(..), Stream(..),
  checkDone, checkDoneEx, continue, enumList, joinI, run_, yield )
import Data.Enumerator ( ($), (>>==) )
import qualified Data.Enumerator.List as EL ( consume )

-- cribbed from EL.concatMap
concatMapAccum :: Monad m => (s -> ao -> (s, [ai])) -> s ->
E.Enumeratee ao ai m b
concatMapAccum f s0 = E.checkDone (E.continue . step s0)
  where
    step _ k E.EOF = E.yield (E.Continue k) E.EOF
    step s k (E.Chunks xs) = loop s k xs
    loop s k [] = E.continue (step s k)
    loop s k (x:xs) = case f s x of
      (s', ais) -> k (E.Chunks $ ais) >>==
        E.checkDoneEx (E.Chunks xs) (\k' -> loop s' k' xs)

passFromTo :: Monad m => ((a -> Bool), (a -> Bool)) -> Bool -> E.Enumeratee a a m b
passFromTo (from, to) pass0 =
  concatMapAccum updatePass pass0
    where
      updatePass pass el = case (pass, from el, to el) of
        (True, _, to_el) -> (not to_el, [el])
        (False, True, _) -> (True, [el])
        (False, False, _) -> (False, [])

main :: IO()
main = do
  E.run_ (E.enumList 3 [1..20] $
    E.joinI $ passFromTo ((\e -> e == 3 || e == 13), (\e -> e == 7 || e == 17)) False $
    EL.consume) >>= print

$ ./dist/build/StatefulEnumeratee/StatefulEnumeratee
[3,4,5,6,7,13,14,15,16,17]

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

疯狂的代价 2024-11-11 19:53:15

是的,您需要一个将流传递给多个迭代器的枚举器,例如 Data.Iteratee.sequence_Data.Iteratee.Parallel .psequence_ 来自 iteratee-0.8.6。 sequence_ 获取要同时运行的迭代列表,并通过 mapM 处理该列表中的每个输入块。 psequence_ 采用类似的参数,但在单独的 forkIO 线程中运行每个输入迭代器。

在过去的一年里,haskell-cafe 和 iteratee 邮件列表对此进行了一些讨论,例如:http://www.haskell.org/pipermail/haskell-cafe/2011-January/088319.html 需要注意的主要事情是处理来自内部迭代器的错误:在你的在应用程序中,如果一个内部迭代器失败,您是否想要终止所有迭代器或仅终止该迭代器,以及您想要[如何]传播这些错误。

Yes, you need an enumeratee that passes the stream to multiple iteratees, like Data.Iteratee.sequence_ and Data.Iteratee.Parallel.psequence_ from iteratee-0.8.6. sequence_ takes a list of iteratees to run simultaneously, and handles each input chunk by mapM across that list. psequence_ takes similar arguments, but runs each input iteratee in a separate forkIO thread.

There has been some discussion on haskell-cafe and the iteratee mailing lists about these over the past year, eg: http://www.haskell.org/pipermail/haskell-cafe/2011-January/088319.html The main thing to be careful about is handling errors from the inner iteratees: in your application, if one inner iteratee fails do you want to terminate all iteratees or just that one, and [how] do you want to propagate those errors.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文