Using Parsec with Data.Text

Posted on 2024-09-30 04:23:26

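Using Parsec 3.1, it is possible to parse several types of input:

[Char] with Text.Parsec.String
Data.ByteString with Text.Parsec.ByteString
Data.ByteString.Lazy with Text.Parsec.ByteString.Lazy

I don't see anything for the Data.Text module. I want to parse Unicode content without suffering from the inefficiencies of String, so I created the following module, based on the Text.Parsec.ByteString module: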

{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses #-}
{-# OPTIONS_GHC -fno-warn-orphans #-}

module Text.Parsec.Text
    ( Parser, GenParser
    ) where

import Text.Parsec.Prim

import qualified Data.Text as T

instance (Monad m) => Stream T.Text m Char where
    uncons = return . T.uncons

type Parser = Parsec T.Text ()
type GenParser t st = Parsec T.Text st

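Does it make sense to do so?
Is this compatible with the rest of the Parsec API?

Additional comments:

I had to add the {-# LANGUAGE NoMonomorphismRestriction #-} pragma to my parsing modules to make this work.

Parsing Text is one thing; building an AST with Text is another. I also need to pack my Strings before returning them: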

module TestText where

import qualified Data.Text as T

import Text.Parsec
import Text.Parsec.Prim
import Text.Parsec.Text

input = T.pack "xxxxxxxxxxxxxxyyyyxxxxxxxxxp"

parser = do
  x1 <- many1 (char 'x')
  y <- many1 (char 'y')
  x2 <- many1 (char 'x')
  return (T.pack x1, T.pack y, T.pack x2)

test = runParser parser () "test" input
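
The two additional comments can be illustrated together. This is only a sketch under assumptions: it supposes that the pragma was needed because the top-level parsers had no type signatures, and that Text.Parsec.Text is either the module from this question or the one shipped with parsec 3.1.2 and later. The names run1 and triple are hypothetical and not part of Parsec.

```haskell
import qualified Data.Text as T
import Text.Parsec (ParseError, char, many1, parse)
import Text.Parsec.Text (Parser)

-- Hypothetical helper: a non-empty run of one character, packed into Text
-- right where it is parsed.
run1 :: Char -> Parser T.Text
run1 c = T.pack <$> many1 (char c)

-- The explicit signature fixes the stream type to Text, so nothing is left
-- for the monomorphism restriction to reject.
triple :: Parser (T.Text, T.Text, T.Text)
triple = (,,) <$> run1 'x' <*> run1 'y' <*> run1 'x'

test :: Either ParseError (T.Text, T.Text, T.Text)
test = parse triple "test" (T.pack "xxxxxxxxxxxxxxyyyyxxxxxxxxxp")
```

Packing inside a small helper keeps the T.pack calls out of the main parser body; whether that is cleaner than packing once at the end is a matter of taste.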



往事风中埋 2024-10-07 04:23:26

As of Parsec 3.1.2, support for Data.Text is built in!
See http://hackage.haskell.org/package/parsec-3.1.2

If you are stuck with an older version, the code snippets in the other answers are helpful too.


∝单色的世界 2024-10-07 04:23:26

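That looks like exactly what you need to do.

It should be compatible with the rest of Parsec, including the Parsec.Char parsers.

If you're using Cabal to build your program, please put an upper bound of parsec-3.1 in your package description, in case the maintainer decides to include that instance in a future version of Parsec.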
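
As a rough illustration of that compatibility, the sketch below runs ordinary Text.Parsec.Char combinators (letter, char) over Text input. It assumes Text.Parsec.Text comes either from the question's module or from parsec 3.1.2 and later; word, wordsP and parseWords are hypothetical names.

```haskell
import qualified Data.Text as T
import Text.Parsec (ParseError, char, eof, letter, many1, parse, sepBy1)
import Text.Parsec.Text (Parser)

-- A word is a non-empty run of letters; 'letter' accepts any Unicode
-- alphabetic character, not just ASCII.
word :: Parser T.Text
word = T.pack <$> many1 letter

-- One or more words separated by single spaces, consuming the whole input.
wordsP :: Parser [T.Text]
wordsP = (word `sepBy1` char ' ') <* eof

parseWords :: T.Text -> Either ParseError [T.Text]
parseWords = parse wordsP "input"
```

For example, parseWords (T.pack "Grüße aus Köln") should split the input into three Text values without any special handling of the non-ASCII characters.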


谜兔 2024-10-07 04:23:26

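I added a function parseFromUtf8File to help read UTF-8 encoded files efficiently. It works flawlessly with umlaut characters. The function's type matches parseFromFile from Text.Parsec.ByteString. This version uses strict ByteStrings.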

-- A derivative work of
-- http://stackoverflow.com/questions/4064532/using-parsec-with-data-text

{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses #-}
{-# OPTIONS_GHC -fno-warn-orphans #-}

module Text.Parsec.Text
    ( Parser, GenParser, parseFromUtf8File
    ) where

import Text.Parsec.Prim
import qualified Data.Text as T
import qualified Data.ByteString as B
import Data.Text.Encoding
import Text.Parsec.Error

instance (Monad m) => Stream T.Text m Char where
    uncons = return . T.uncons

type Parser = Parsec T.Text ()
type GenParser t st = Parsec T.Text st

-- | @parseFromUtf8File p filePath@ runs a 'T.Text' parser @p@ on the
-- input read from @filePath@ using 'B.readFile' and decoded as UTF-8.
-- Returns either a 'ParseError' ('Left') or a value of type @a@
-- ('Right').
--
-- >  main    = do{ result <- parseFromUtf8File numbers "digits.txt"
-- >              ; case result of
-- >                  Left err  -> print err
-- >                  Right xs  -> print (sum xs)
-- >              }
parseFromUtf8File :: Parser a -> String -> IO (Either ParseError a)
parseFromUtf8File p fname = do
  raw <- B.readFile fname
  let input = decodeUtf8 raw
  return (runP p () fname input)
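
One caveat: decodeUtf8 throws an exception when the file is not valid UTF-8. A possible variant, sketched here under the assumption that decoding failures should be reported as values, uses decodeUtf8' from Data.Text.Encoding instead. The Utf8ParseError type and the names parseUtf8Bytes, parseFromUtf8FileSafe and demo are hypothetical and not part of Parsec or text.

```haskell
import qualified Data.ByteString as B
import Data.Text.Encoding (decodeUtf8')
import Data.Text.Encoding.Error (UnicodeException)
import Text.Parsec (ParseError, letter, many1, runParser)
import Text.Parsec.Text (Parser)

-- Hypothetical error type that keeps decoding failures apart from
-- parse failures.
data Utf8ParseError
  = DecodeFailed UnicodeException
  | ParseFailed ParseError
  deriving Show

-- Pure core: decode the bytes strictly, then run the parser on the Text.
parseUtf8Bytes :: Parser a -> FilePath -> B.ByteString -> Either Utf8ParseError a
parseUtf8Bytes p name raw =
  case decodeUtf8' raw of
    Left e      -> Left (DecodeFailed e)
    Right input -> either (Left . ParseFailed) Right (runParser p () name input)

-- File wrapper with the same argument order as parseFromUtf8File.
parseFromUtf8FileSafe :: Parser a -> FilePath -> IO (Either Utf8ParseError a)
parseFromUtf8FileSafe p fname = parseUtf8Bytes p fname <$> B.readFile fname

-- Small usage example: parse a run of letters from raw bytes.
demo :: B.ByteString -> Either Utf8ParseError String
demo = parseUtf8Bytes (many1 letter) "demo"
```

Splitting the pure decoding and parsing step from the file read also makes the function easy to exercise without touching the file system.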

