Using Parsec with Data.Text
Using Parsec 3.1, it is possible to parse several types of input:
- [Char] with Text.Parsec.String
- Data.ByteString with Text.Parsec.ByteString
- Data.ByteString.Lazy with Text.Parsec.ByteString.Lazy

I don't see anything for the Data.Text module. I want to parse Unicode content without suffering from the inefficiencies of String. So I've created the following module, based on the Text.Parsec.ByteString module:

    {-# LANGUAGE FlexibleInstances, MultiParamTypeClasses #-}
    {-# OPTIONS_GHC -fno-warn-orphans #-}
    module Text.Parsec.Text
        ( Parser, GenParser
        ) where

    import Text.Parsec.Prim
    import qualified Data.Text as T

    -- Orphan instance: lets Parsec consume a Text one Char at a time.
    instance (Monad m) => Stream T.Text m Char where
        uncons = return . T.uncons

    -- Same aliases as Text.Parsec.ByteString, with Text as the input stream.
    type Parser = Parsec T.Text ()
    type GenParser t st = Parsec T.Text st

- Does it make sense to do so?
- Is this compatible with the rest of the Parsec API?

Additional comments:

I had to add the {-# LANGUAGE NoMonomorphismRestriction #-} pragma to my parsing modules to make it work.

Parsing Text is one thing; building an AST with Text is another. I will also need to pack my Strings before returning them:
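
    {-# LANGUAGE NoMonomorphismRestriction #-}
    module TestText where

    -- Qualified import, so Data.Text names cannot clash with the Prelude.
    import qualified Data.Text as T
    import Text.Parsec
    import Text.Parsec.Prim
    import Text.Parsec.Text

    input = T.pack "xxxxxxxxxxxxxxyyyyxxxxxxxxxp"

    parser = do
        x1 <- many1 (char 'x')
        y  <- many1 (char 'y')
        x2 <- many1 (char 'x')
        -- many1 yields a String, so each result is packed into Text by hand.
        return (T.pack x1, T.pack y, T.pack x2)

    test = runParser parser () "test" input
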
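Packing every result by hand gets repetitive as a grammar grows. One possible way to keep the conversion in a single place is a small wrapper combinator. The following is only a sketch: it assumes parsec 3.1.2 or later (where Text.Parsec.Text ships with the library), and the names many1Text, runs and parseRuns are hypothetical helpers, not part of Parsec.

```haskell
import qualified Data.Text as T
import Text.Parsec (ParseError, char, many1, parse)
import Text.Parsec.Text (Parser)  -- assumption: parsec >= 3.1.2

-- Hypothetical helper (not part of Parsec): run a parser one or more
-- times and pack the collected characters into a Text.
many1Text :: Parser Char -> Parser T.Text
many1Text p = T.pack <$> many1 p

-- The parser from the question, with no T.pack at the call sites.
runs :: Parser (T.Text, T.Text, T.Text)
runs = (,,) <$> many1Text (char 'x')
            <*> many1Text (char 'y')
            <*> many1Text (char 'x')

-- Run the parser on a Text input; trailing input is left unconsumed.
parseRuns :: T.Text -> Either ParseError (T.Text, T.Text, T.Text)
parseRuns = parse runs "test"
```

The intermediate String is still built inside many1, so this tidies the code rather than removing the allocation.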

Comments (3)

Since Parsec 3.1.2, support for Data.Text is built in!
See http://hackage.haskell.org/package/parsec-3.1.2
If you are stuck with an older version, the code snippets in the other answers are helpful too.
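
As an illustration of the built-in support, here is a minimal sketch that parses Unicode words directly from a Text value. It assumes parsec 3.1.2 or later is installed; wordsP is a hypothetical name chosen for this example.

```haskell
import qualified Data.Text as T
import Text.Parsec (eof, letter, many1, parse, spaces)
import Text.Parsec.Text (Parser)  -- shipped with parsec >= 3.1.2

-- Parse whitespace-separated words up to the end of the input.
-- `letter` is based on Data.Char.isAlpha, so non-ASCII letters
-- such as umlauts are accepted.
wordsP :: Parser [T.Text]
wordsP = spaces *> many1 (word <* spaces) <* eof
  where
    word = T.pack <$> many1 letter
```

In GHCi, something like parse wordsP "" (T.pack "Grüße aus Köln") should yield a Right holding the three words. No orphan Stream instance is needed here, and defining one alongside the built-in instance would be rejected as a duplicate.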

That looks like exactly what you need to do.
It should be compatible with the rest of Parsec, including the Parsec.Char parsers.
If you're using Cabal to build your program, please put an upper bound of parsec-3.1 in your package description, in case the maintainer decides to include that instance in a future version of Parsec.

I added a function, parseFromUtf8File, to help read UTF-8 encoded files efficiently. It works flawlessly with umlaut characters. The function's type matches that of parseFromFile from Text.Parsec.ByteString. This version uses strict ByteStrings.
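
The answer's own code is not reproduced here, so the following is only a sketch of how such a function might look, not the original implementation. It assumes parsec 3.1.2 or later for Text.Parsec.Text and uses decodeUtf8 from Data.Text.Encoding; firstWord is a hypothetical example parser.

```haskell
import qualified Data.ByteString as B
import Data.Text.Encoding (decodeUtf8)
import Text.Parsec (ParseError, letter, many1, parse)
import Text.Parsec.Text (Parser)  -- assumption: parsec >= 3.1.2

-- Sketch only: read the file as a strict ByteString, decode it as
-- UTF-8, and run the parser on the resulting Text. The file path is
-- used as the source name in error messages.
-- Note: decodeUtf8 throws an exception on invalid UTF-8 input.
parseFromUtf8File :: Parser a -> FilePath -> IO (Either ParseError a)
parseFromUtf8File p path = do
    raw <- B.readFile path
    return (parse p path (decodeUtf8 raw))

-- Hypothetical example parser: the leading run of letters in a file.
firstWord :: Parser String
firstWord = many1 letter
```

Because the whole file is read strictly before decoding, this suits inputs that fit comfortably in memory.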