Is it possible to extend a pure function with IO code?

Posted on 2024-10-09 20:40:29

I've written a simple XML parser in Haskell.
The function convertXML receives the contents of an XML file and returns a list of extracted values that are processed further.

One attribute of an XML tag also contains the URL of a product image, and I would like to extend the function so that it also downloads the image when that tag is found.

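import Data.Char (toUpper)
import Data.Maybe (fromJust)
import Text.XML.Light
import Text.XML.Light.Lexer (XmlSource)

-- trim :: String -> String is the author's own helper (not shown).

convertXML ::  (Text.XML.Light.Lexer.XmlSource s) => s -> [String]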
convertXML xml = productToCSV products
    where
        productToCSV [] = []
        productToCSV (x:xs) = (getFields x) ++ (productToCSV
                                (elChildren x)) ++ (productToCSV xs)
        getFields elm = case (qName . elName) elm of
                            "product" -> [attrField "uid", attrField "code"]
                            "name" -> [trim $ strContent elm]
                            "annotation" -> [trim $ strContent elm]
                            "text" -> [trim $ strContent elm]
                            "category" -> [attrField "uid", attrField "name"]
                            "manufacturer" -> [attrField "uid",
                                                attrField "name"]
                            "file" -> [getImgName]
                            _ -> []
            where
                attrField fldName = trim . fromJust $
                                        findAttr (unqual fldName) elm
                getImgName = if (map toUpper $ attrField "type") == "FULL"
                                then
                                    -- here I need some IO code
                                    -- to download an image
                                    -- fetchFile :: String -> IO String
                                    attrField "file"
                                else []
        products = findElements (unqual "product") productsTree
        productsTree = fromJust $ findElement (unqual "products") xmlTree
        xmlTree = fromJust $ parseXMLDoc xml

Any idea how to insert IO code in the getImgName function, or do I have to rewrite the convertXML function completely as an impure version?

UPDATE II
The final version of the convertXML function, using the hybrid pure/impure but clean approach suggested by Carl. The second component of the returned pair is an IO action that downloads the images, saves them to disk, and returns the list of local paths where they are stored.

import Control.Monad (liftM2)
import Data.Char (toUpper)
import Data.Maybe (fromJust)
import Text.XML.Light
import Text.XML.Light.Lexer (XmlSource)

-- trim, fetchFile, saveFile and imagesDir are the author's own helpers (not shown).

convertXML ::  (Text.XML.Light.Lexer.XmlSource s) => s -> ([String], IO [String])
convertXML xml = productToCSV products
    where
        -- Walk the tree: this element's fields, then its children, then its siblings.
        productToCSV :: [Element] -> ([String], IO [String])
        productToCSV [] = ([], return [])
        productToCSV (x:xs) = storeFields (getFields x)
                            ( storeFields (productToCSV (elChildren x))
                                (productToCSV xs) )
        getFields elm = case (qName . elName) elm of
                            "product" -> ([attrField "uid", attrField "code"], return [])
                            "name" -> ([trim $ strContent elm], return [])
                            "annotation" -> ([trim $ strContent elm], return [])
                            "text" -> ([trim $ strContent elm], return [])
                            "category" -> ([attrField "uid", attrField "name"], return [])
                            "manufacturer" -> ([attrField "uid",
                                                attrField "name"], return [])
                            "file" -> getImg
                            _ -> ([], return [])
            where
                attrField fldName = trim . fromJust $
                                        findAttr (unqual fldName) elm
                getImg = if (map toUpper $ attrField "type") == "FULL"
                            then
                                -- Download the image, save it, and report its local path.
                                ( [fName], fetchFile url >>=
                                    saveFile localPath >>
                                    return [localPath] )
                            else ([], return [])
                    where
                        fName = attrField "file"
                        localPath = imagesDir ++ "/" ++ fName
                        url = attrField "folderUrl" ++ "/" ++ fName

        -- Append the pure field lists; run both IO actions and append their paths.
        storeFields (x1s, y1s) (x2s, y2s) = (x1s ++ x2s, liftM2 (++) y1s y2s)
        products = findElements (unqual "product") productsTree
        productsTree = fromJust $ findElement (unqual "products") xmlTree
        xmlTree = fromJust $ parseXMLDoc xml
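A side observation on this version: in current versions of base, `([String], IO [String])` is already a Monoid (pairs combine component-wise, and `IO a` is a Monoid whenever `a` is, running both actions and appending their results). So `storeFields` behaves like `(<>)` and `([], return [])` like `mempty`, and the same pre-order walk can be written with `foldMap`. The sketch below assumes a hypothetical `Node` type in place of the xml package's `Element`, so it needs only base, and takes the downloader as a parameter; it illustrates the idea rather than reproducing the author's code.

```haskell
import Data.Char (toUpper)
import Data.Maybe (fromMaybe)

-- Hypothetical stand-in for Text.XML.Light's Element, so the sketch needs
-- only base: tag name, attributes, children.
data Node = Node String [(String, String)] [Node]

-- ([String], IO [String]) is a Monoid: pairs combine component-wise, and
-- IO a is a Monoid whenever a is (run both actions, append the results).
-- So storeFields is (<>) and ([], return []) is mempty.
convert :: (String -> IO FilePath) -> [Node] -> ([String], IO [String])
convert download = foldMap go
  where
    -- Pre-order, like productToCSV: the node itself, then its children;
    -- foldMap over the list then moves on to the siblings.
    go (Node name attrs kids) = here <> foldMap go kids
      where
        attr k = fromMaybe "" (lookup k attrs)
        here = case name of
          "product" -> ([attr "uid", attr "code"], mempty)
          "file"
            | map toUpper (attr "type") == "FULL" ->
                ([attr "file"], fmap (: []) (download (attr "file")))
          _ -> mempty
```

The caller can use the field list immediately and run the IO component only when the images are actually wanted.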

Comments (3)

遇到 2024-10-16 20:40:30

The better approach would be to have the function return the list of files to download as part of the result:

convertXML ::  (Text.XML.Light.Lexer.XmlSource s) => s -> ([String], [URL])

and download them in a separate function.
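A minimal sketch of this split, assuming `URL` is just a `String` alias (the answer does not define it) and using a hypothetical `Node` type in place of the xml package's `Element`: `convert` stays pure and returns both the fields and the URLs, and `downloadAll` does the IO afterwards with a fetch function supplied by the caller.

```haskell
import Data.Maybe (fromMaybe)

-- The thread never defines URL; a plain String alias is assumed here.
type URL = String

-- Hypothetical stand-in for an XML element: tag, attributes, children.
data Node = Node String [(String, String)] [Node]

-- Pure: collects the CSV fields and the URLs to fetch. No IO at all.
convert :: [Node] -> ([String], [URL])
convert = foldMap go
  where
    go (Node name attrs kids) = here <> foldMap go kids
      where
        attr k = fromMaybe "" (lookup k attrs)
        here = case name of
          "product" -> ([attr "uid", attr "code"], [])
          "file" -> ([attr "file"], [attr "folderUrl" ++ "/" ++ attr "file"])
          _ -> ([], [])

-- Impure, kept separate. The fetcher is a hypothetical parameter (an HTTP
-- client call in real code), so the pure part can be tested offline.
downloadAll :: (URL -> IO FilePath) -> [URL] -> IO [FilePath]
downloadAll fetch = mapM fetch
```

A caller would write `let (fields, urls) = convert nodes` and then `paths <- downloadAll fetch urls`, so parsing and downloading can be tested and changed independently.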

是你 2024-10-16 20:40:30

The entire point of the type system in Haskell is that you can't do IO except with IO actions, i.e. values of type IO a. There are ways to violate this (such as unsafePerformIO), but they run the risk of behaving entirely unlike what you'd expect, due to interactions with optimizations and lazy evaluation. So until you understand why IO works the way it does, don't try to make it work differently.

But a very important consequence of this design is that IO actions are first class. With a bit of cleverness, you could write your function like this:

convertXML ::  (Text.XML.Light.Lexer.XmlSource s) => s -> ([String], IO [Image])

The second item in the pair would be an IO action that, when executed, would give a list of the images present. That would avoid the need to have image loading code outside of convertXML, and it would allow you to do IO only if you actually needed the images.
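The claim that IO actions are first-class values can be checked directly. In this sketch, which assumes a hypothetical `Image` type and simplifies the XML to `(type, file)` pairs, `convert` returns the pure field list together with an action that loads only the FULL images; `demo` counts loader calls with an `IORef` to show that building the pair performs no IO and the loads happen only when the action is run.

```haskell
import Data.IORef

-- Hypothetical image type; the thread never defines Image.
newtype Image = Image FilePath deriving (Eq, Show)

-- Pure function whose second component is an IO action, i.e. a value that
-- describes IO. Building the pair runs nothing; the loader is only called
-- when the caller executes that action.
convert :: (FilePath -> IO Image) -> [(String, FilePath)] -> ([String], IO [Image])
convert load entries =
  ( map snd entries
  , mapM load [file | (kind, file) <- entries, kind == "FULL"] )

-- Usage: count loader calls to show that nothing runs until the action does.
demo :: IO (Int, Int, [Image])
demo = do
  calls <- newIORef 0
  let load f = modifyIORef calls (+ 1) >> return (Image f)
      (_, getImages) = convert load [("FULL", "a.jpg"), ("THUMB", "b.jpg")]
  before <- readIORef calls -- still 0: building the pair did no IO
  imgs <- getImages         -- the load happens here
  after <- readIORef calls  -- 1: only the FULL entry was loaded
  return (before, after, imgs)
```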

从来不烧饼 2024-10-16 20:40:30

I basically see two approaches:

  1. Let the function also return a list of the images it found, and process them with an impure function afterwards. Laziness will do the rest.
  2. Make the whole beast impure.

I generally prefer the first approach.
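For comparison, approach 2 can be sketched as a traversal that runs entirely in IO, downloading each FULL image as soon as its "file" tag is visited. As in a pure version, `Node` here is a hypothetical stand-in for the xml package's `Element`, and the fetch function is a hypothetical parameter.

```haskell
import Data.Char (toUpper)
import Data.Maybe (fromMaybe)

-- Hypothetical stand-in for an XML element: tag, attributes, children.
data Node = Node String [(String, String)] [Node]

-- The whole walk lives in IO, so every caller must run it in IO.
convertIO :: (String -> IO ()) -> [Node] -> IO [String]
convertIO fetch nodes = concat <$> mapM go nodes
  where
    go (Node name attrs kids) = do
        here <- case name of
          "product" -> return [attr "uid", attr "code"]
          "file"
            | map toUpper (attr "type") == "FULL" -> do
                fetch (attr "file") -- the download happens mid-traversal
                return [attr "file"]
          _ -> return []
        rest <- convertIO fetch kids
        return (here ++ rest)
      where
        attr k = fromMaybe "" (lookup k attrs)
```

The cost of this design is that the field list can no longer be obtained without running the downloads, which is one reason to prefer the first approach.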
