How do I keep "state" when doing SAX parsing in Haskell (HaXml)?

Posted 2024-12-15 18:14:10

I'm a total newbie at Haskell, and for my first real problem
with Haskell I'm trying to parse a huge XML file with HaXml SAX parsing.

The big problem I'm running into is how to figure out what the enclosing
element tag of any particular "charData" SaxElement is. If I were doing
this in an imperative language, I would just have a stateful Array
object that maintains the element tag stack as SAX events happen. I
would push an element name to the stack when a "SAX.SaxElementOpen" is
encountered, and pop one off when "SAX.SaxElementClose" is encountered.
Then if I got a "SAX.SaxCharData" event/element, I could just look at
the top of the stack to see what tag it was enclosed in.

Now that I am trying to solve this problem in Haskell, I have no idea
how to get around the lack of global stateful variables. I only have a
vague notion of what Monads do, so if they are the solution, I could use
a tip or two.

Here is hopefully enough code to show how far I've gotten:

module Main where

import qualified Text.XML.HaXml.SAX as SAX
import Text.XML.HaXml
import Data.Maybe
import Text.XML.HaXml.Namespaces

main :: IO ()
main = do
    let inputFilename = "/path/to/file.xml"
    content <- readFile inputFilename
    -- saxParse returns the event stream plus an optional error message
    let (elements, _parseError) = SAX.saxParse inputFilename content
    mapM_ putStrLn (summarizeElements elements)

summarizeElements :: [SAX.SaxElement] -> [String]
summarizeElements = filter (not . null) . map summarizeElement

summarizeElement :: SAX.SaxElement -> String
summarizeElement element = case element of
    (SAX.SaxElementOpen name attrs)  -> myProcessElem name attrs
    (SAX.SaxCharData charData)       -> myProcessCharData charData 
    (SAX.SaxElementTag name attrs)  -> myProcessElem name attrs
    _ -> ""


Comments (1)

千寻… 2024-12-22 18:14:10

The problem here is that map does not carry state as you wish. A straightforward approach is to write what you want as a recursive function that passes state through the recursive calls. You will need to decide what type of value you keep on your state stack, but then it's just a matter of...

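-- MyStack, myProcessElem and pushPop are placeholders you still have to define.
go :: MyStack -> [SAX.SaxElement] -> [String]
go _ []     = []
go s (e:es) = myProcessElem s e : go s' es   -- process e with the current stack
  where s' = pushPop e s                     -- stack after seeing event e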
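
To make the recursion concrete, here is a minimal sketch of the same idea that pairs each text chunk with its enclosing tag. It is an illustration under assumptions, not HaXml code: the `Event` type is a hypothetical stand-in for the `SAX.SaxElement` constructors mentioned in the question (attributes are dropped), and `TagStack`, `labelText` and `example` are names made up for this sketch. With the real library you would match on `SAX.SaxElementOpen`, `SAX.SaxElementClose`, `SAX.SaxElementTag` and `SAX.SaxCharData` instead.

```haskell
import Data.Char (isSpace)

-- Hypothetical stand-in for the SAX.SaxElement constructors used in the
-- question, so the sketch loads without HaXml installed.
data Event
  = Open String      -- plays the role of SAX.SaxElementOpen name attrs
  | Close String     -- plays the role of SAX.SaxElementClose name
  | EmptyTag String  -- plays the role of SAX.SaxElementTag name attrs
  | Chars String     -- plays the role of SAX.SaxCharData text
  deriving (Show, Eq)

-- The "state": names of the currently open elements, innermost first.
type TagStack = [String]

-- Pair every non-blank text chunk with the tag that directly encloses it.
-- The stack is never stored anywhere; it is just an extra argument that
-- each recursive call hands on to the next one.
labelText :: [Event] -> [(Maybe String, String)]
labelText = go []
  where
    go :: TagStack -> [Event] -> [(Maybe String, String)]
    go _ [] = []
    go stack (e:es) = case e of
      Open name  -> go (name : stack) es   -- push
      Close _    -> go (drop 1 stack) es   -- pop (safe on an empty stack)
      EmptyTag _ -> go stack es            -- opens and closes at once
      Chars txt
        | all isSpace txt -> go stack es   -- skip whitespace-only text
        | otherwise       -> (enclosing stack, txt) : go stack es

    -- Top of the stack, or Nothing for text outside any element.
    enclosing []      = Nothing
    enclosing (t : _) = Just t

-- A small hand-written event list to try the function on.
example :: [(Maybe String, String)]
example = labelText
  [ Open "book", Open "title", Chars "Dune", Close "title"
  , Chars "\n  ", Open "author", Chars "Herbert", Close "author"
  , Close "book" ]
```

Loading this in GHCi and evaluating `example` shows each text chunk next to the name on top of the stack at that point. If you would rather not write the recursion by hand, `mapAccumL` from `Data.List` or the `State` monad from `Control.Monad.State` package up the same "thread a value through the list" pattern.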