Parsec equivalent of attoparsec's `inClass`

Posted on 2024-12-22 22:25:32

I am translating some code from attoparsec to Parsec, because the parser needs to produce better error messages. The attoparsec code uses inClass (and notInClass) extensively. Is there a similar function for Parsec that lets me translate inClass occurrences mechanically? Hayoo and Hoogle didn't offer any insight into the matter.

```haskell
inClass :: String -> Char -> Bool
```

inClass "a-c'-)0-3-" is equivalent to \ x -> elem x "abc'()0123-", but the latter is inefficient and tedious to write for large ranges.

I will reimplement the function myself if nothing else is available.

野却迷人 2024-12-29 22:25:32

There isn't any such combinator; if there were, it would be in Text.Parsec.Char (which is where all the standard parser combinator functions that involve Char are defined). You should be able to define it fairly easily.

I don't think you'll be able to match the performance of attoparsec's implementation, though; it relies on the internal FastSet type, which only works with 8-bit characters. Of course, if you don't need Unicode support, that might not be a problem, but the code for FastSet implies you'll get unpredictable results if you pass it Chars greater than '\255', so if you want to reuse the FastSet-based solution, you'll at least have to read the strings you're parsing in binary mode. (You'll also have to copy the implementation of FastSet into your program, as it's not exported.)
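If you do need Unicode, one alternative to FastSet (a swapped-in technique, not what attoparsec does) is to store the class as disjoint ranges in a `Data.Map` keyed by lower bound and answer membership with a single `lookupLE`. This is a minimal sketch; the names `RangeSet`, `classRanges`, `fromClass` and `memberRS` are hypothetical, and it assumes the class syntax described in the question.

```haskell
import Data.List (sortOn)
import qualified Data.Map.Strict as M

-- Hypothetical names throughout; this is not attoparsec's FastSet,
-- just one Unicode-safe alternative.
type RangeSet = M.Map Char Char  -- lower bound -> upper bound

-- Class syntax as described in the question: "x-y" is a range; any other
-- character, including a leading or trailing '-', stands for itself.
classRanges :: String -> [(Char, Char)]
classRanges (a:'-':b:xs) = (a, b) : classRanges xs
classRanges (x:xs)       = (x, x) : classRanges xs
classRanges []           = []

-- Drop empty ranges (lo > hi), sort by lower bound, and merge overlaps so
-- the ranges are disjoint and the keys strictly increasing.
fromClass :: String -> RangeSet
fromClass = M.fromDistinctAscList . merge . sortOn fst
          . filter (\(lo, hi) -> lo <= hi) . classRanges
  where
    merge ((a, b) : (c, d) : rest)
      | c <= b    = merge ((a, max b d) : rest)
      | otherwise = (a, b) : merge ((c, d) : rest)
    merge rs = rs

-- The only range that can contain c is the one with the greatest lower
-- bound <= c, so membership is one O(log n) lookup for any Char.
memberRS :: Char -> RangeSet -> Bool
memberRS c rs = case M.lookupLE c rs of
  Just (_, hi) -> c <= hi
  Nothing      -> False
```

To use it with Parsec, build the set once, e.g. `let idSet = fromClass "a-zA-Z_"` and then `satisfy (\c -> memberRS c idSet)`, so the map is not rebuilt on every character.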

If your range strings are short, then a simple solution like this is likely to be pretty fast:

```haskell
type Range = (Char, Char)

inClass :: String -> Char -> Bool
inClass = inClass' . parseClass

parseClass :: String -> [Range]
parseClass "" = []
parseClass (a:'-':b:xs) = (a, b) : parseClass xs
parseClass (x:xs) = (x, x) : parseClass xs

inClass' :: [Range] -> Char -> Bool
inClass' cls c = any (\(a,b) -> c >= a && c <= b) cls
```
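Since the question is about better error messages, here is a sketch of how these predicates might be plugged into Parsec's `satisfy`, with `<?>` naming the class in the "expecting ..." part of the error. `charInClass`, `charNotInClass` and `notInClass` are hypothetical names for illustration, not part of Parsec.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

type Range = (Char, Char)

-- The list-of-ranges version from this answer.
parseClass :: String -> [Range]
parseClass ""           = []
parseClass (a:'-':b:xs) = (a, b) : parseClass xs
parseClass (x:xs)       = (x, x) : parseClass xs

inClass' :: [Range] -> Char -> Bool
inClass' cls c = any (\(a, b) -> c >= a && c <= b) cls

inClass :: String -> Char -> Bool
inClass = inClass' . parseClass

-- Hypothetical counterpart of attoparsec's notInClass.
notInClass :: String -> Char -> Bool
notInClass s = not . inClass s

-- Hypothetical wrappers: satisfy does the matching, and <?> labels the
-- class so a failure reports which characters were expected.
charInClass :: String -> Parser Char
charInClass s = satisfy (inClass s) <?> ("a character in [" ++ s ++ "]")

charNotInClass :: String -> Parser Char
charNotInClass s = satisfy (notInClass s) <?> ("a character not in [" ++ s ++ "]")
```

With these, an attoparsec `satisfy (inClass "...")` can be translated more or less mechanically to `charInClass "..."`.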

You could even try something like this, which should be at least as efficient as the version above (including when a single inClass s is called many times) while also avoiding the list-traversal overhead:

```haskell
inClass :: String -> Char -> Bool
inClass "" = const False
inClass (a:'-':b:xs) = \c -> (c >= a && c <= b) || f c where f = inClass xs
inClass (x:xs) = \c -> c == x || f c where f = inClass xs
```

(Note that the recursive call is moved out of the lambda into a where binding; I don't know whether GHC can or will do this itself.)
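A quick way to convince yourself that the two versions behave identically is to compare them over every Latin-1 character for a few class strings. In this sketch the two implementations are renamed `inClassList` and `inClassFn` (hypothetical names) so they can coexist, and `isIdentChar` is a hypothetical example showing the predicate bound once, so the closure chain is built a single time and then reused.

```haskell
type Range = (Char, Char)

parseClass :: String -> [Range]
parseClass ""           = []
parseClass (a:'-':b:xs) = (a, b) : parseClass xs
parseClass (x:xs)       = (x, x) : parseClass xs

-- First version: a list of ranges scanned with any on each call.
inClassList :: String -> Char -> Bool
inClassList s = \c -> any (\(a, b) -> c >= a && c <= b) cls
  where cls = parseClass s

-- Second version: the class string becomes a chain of closures.
inClassFn :: String -> Char -> Bool
inClassFn "" = const False
inClassFn (a:'-':b:xs) = \c -> (c >= a && c <= b) || f c where f = inClassFn xs
inClassFn (x:xs)       = \c -> c == x || f c where f = inClassFn xs

-- True if both versions classify every Latin-1 character the same way.
agree :: String -> Bool
agree s = all (\c -> p c == q c) ['\0' .. '\255']
  where p = inClassList s
        q = inClassFn s

-- Bound once at top level, so the closures are built once and shared.
isIdentChar :: Char -> Bool
isIdentChar = inClassFn "a-zA-Z0-9_"
```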

离鸿 2024-12-29 22:25:32

No, there's no equivalent in parsec. You have to write it yourself. I see two main options,

  1. parse the inClass syntax to create a String from it, to use with oneOf
  2. parse it to create a function to pass to satisfy

The former is, of course, a special case of the latter, and it will be less efficient if your class contains long ranges. But it's probably a bit easier to implement.
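The two options can be sketched side by side. `expandClass` and `classPred` are hypothetical names; option 1 spells the class out as an explicit String for `oneOf`, while option 2 tests the ranges directly inside `satisfy`, so its cost depends on the number of ranges rather than on how wide they are.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Option 1: expand the class into an explicit list for oneOf.
-- [a .. b] is empty when a > b, so a reversed range matches nothing.
expandClass :: String -> String
expandClass (a:'-':b:xs) = [a .. b] ++ expandClass xs
expandClass (x:xs)       = x : expandClass xs
expandClass []           = []

viaOneOf :: String -> Parser Char
viaOneOf s = oneOf (expandClass s)

-- Option 2: build a predicate for satisfy that checks each range bound.
classPred :: String -> Char -> Bool
classPred (a:'-':b:xs) c = (a <= c && c <= b) || classPred xs c
classPred (x:xs)       c = c == x || classPred xs c
classPred []           _ = False

viaSatisfy :: String -> Parser Char
viaSatisfy s = satisfy (classPred s)
```

For the question's example, `expandClass "a-c'-)0-3-"` yields exactly `"abc'()0123-"`; for a wide range such as `"\0-\65535"` it would produce a 65,536-character list, which is where option 1 loses out.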

```haskell
(|||) :: (a -> Bool) -> (a -> Bool) -> a -> Bool
p ||| q = \x -> p x || q x
(&&&) :: (a -> Bool) -> (a -> Bool) -> a -> Bool
p &&& q = \x -> p x && q x

parseClass :: String -> Char -> Bool
parseClass (l:'-':h:more) = ((>= l) &&& (<= h)) ||| parseClass more
parseClass (c:cs) = (== c) ||| parseClass cs
parseClass [] = const False
```

is a simple-minded possibility.
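The same combinators can also be written with `liftA2`, because functions form an Applicative where `liftA2 f p q = \x -> f (p x) (q x)`; `(||)` and `(&&)` still short-circuit. This sketch also adds fixity declarations (an addition, mirroring those of `&&` and `||`) so the parentheses can be dropped. If you also import Control.Arrow, its unrelated `(&&&)` will clash with this one, so hide it or rename the operator.

```haskell
import Control.Applicative (liftA2)

-- Same behaviour as the hand-written versions above.
(|||), (&&&) :: (a -> Bool) -> (a -> Bool) -> a -> Bool
(|||) = liftA2 (||)
(&&&) = liftA2 (&&)

-- Fixities matching (&&) (infixr 3) and (||) (infixr 2).
infixr 3 &&&
infixr 2 |||

parseClass :: String -> Char -> Bool
parseClass (l:'-':h:more) = (>= l) &&& (<= h) ||| parseClass more
parseClass (c:cs)         = (== c) ||| parseClass cs
parseClass []             = const False
```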