Simultaneous matching in FParsec
I'm trying to parse the following into lines and fields. Lines are delimited by '\n' and fields are delimited by '|'.
abcd|efgh|ijkl
mnopq\|rst|uvwxy
za|bcd
efg|hijk|lmnop
I can define the following:
let displayCharacter = satisfy (fun c -> ' ' <= c && c <= '~')
let escapedDC = pchar '\\' >>. displayCharacter
let test1 =
    run (manyChars (escapedDC <|> displayCharacter)) "asdf\|efgh|ijkl"
// Success: "asdf|efgh|ijkl"
However, the following cannot exclude '|' from a field, because displayCharacter itself accepts '|':
let fields = sepBy (manyChars (escapedDC <|> displayCharacter)) (pchar '|')
These delimiters are context-sensitive, so I want to avoid hard-coding them into displayCharacter: '|' is a display character that only needs escaping in certain contexts.
If I try to define a single field with manyCharsTill, then I need to terminate on anyOf "|\n" to account for the final element on a line, but this reads all of the lines into a single line.
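A likely reason the manyCharsTill attempt merges the lines is that the terminator parser consumes the newline, so the line-level parser never sees it. Below is a minimal sketch, assuming the terminator is wrapped in followedBy so that it only peeks at the input. The names fieldEnd, field, line and lines are illustrative, and treating end of input as a field terminator is an added assumption.

```fsharp
#r "nuget: FParsec"
open FParsec

// Any printable ASCII character, as in the question.
let displayCharacter : Parser<char, unit> =
    satisfy (fun c -> ' ' <= c && c <= '~')

// A backslash escapes the display character that follows it.
let escapedDC : Parser<char, unit> = pchar '\\' >>. displayCharacter

// Peek for a delimiter or end of input without consuming anything.
let fieldEnd : Parser<unit, unit> = followedBy (skipAnyOf "|\n" <|> eof)

// Read characters until a delimiter is next; the delimiter stays in the input.
let field : Parser<string, unit> =
    manyCharsTill (escapedDC <|> displayCharacter) fieldEnd

let line : Parser<string list, unit> = sepBy field (pchar '|')
let lines : Parser<string list list, unit> = sepBy line newline .>> eof

let input = "abcd|efgh|ijkl\nmnopq\\|rst|uvwxy\nza|bcd\nefg|hijk|lmnop"

match run lines input with
| Success (rows, _, _) -> rows |> List.iter (printfn "%A")
| Failure (message, _, _) -> printfn "%s" message
// ["abcd"; "efgh"; "ijkl"]
// ["mnopq|rst"; "uvwxy"]
// ["za"; "bcd"]
// ["efg"; "hijk"; "lmnop"]
```

Because the delimiter is left in the input, sepBy can consume '|' between fields and newline between lines, so the two levels stay separate.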
I may have further subdelimiters beyond '|' that are supported in certain contexts. For this reason, it seems messy to have to define versions of displayCharacter and escapedDC for every case. Using lookahead seems cleaner, or perhaps a parser called both that requires two parsers to match simultaneously. Something like:
manyCharsSepBy (escapedDC <|> displayCharacter) (pchar '|')
or
let contextualDisplayCharacter1 = both displayCharacter (satisfy ((<>) '|'))
Is there an easier way to accomplish this? Or is my implied BNF simply flawed, such that fixing it would make the translation easy?
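The both idea can be approximated with FParsec's lookahead primitives followedBy and notFollowedBy, which test a parser at the current position without consuming input. This is only a sketch: both, butNot and fieldExcluding are hypothetical helpers, not part of FParsec, and both checks only that the two parsers succeed at the same position, not that they consume the same amount of input.

```fsharp
#r "nuget: FParsec"
open FParsec

// Any printable ASCII character, as in the question.
let displayCharacter : Parser<char, unit> =
    satisfy (fun c -> ' ' <= c && c <= '~')

// A backslash escapes the display character that follows it.
let escapedDC : Parser<char, unit> = pchar '\\' >>. displayCharacter

// Hypothetical helper: q must match here (checked by lookahead), then p runs.
let both (p: Parser<'a, unit>) (q: Parser<'b, unit>) : Parser<'a, unit> =
    followedBy q >>. p

// Hypothetical helper: p runs only if `excluded` does not match here.
let butNot (excluded: Parser<'a, unit>) (p: Parser<'b, unit>) : Parser<'b, unit> =
    notFollowedBy excluded >>. p

// The example from the question, written with the sketch of `both`.
let contextualDisplayCharacter1 : Parser<char, unit> =
    both displayCharacter (satisfy ((<>) '|'))

// A field whose delimiter is supplied by the context instead of being
// hard-coded into displayCharacter.
let fieldExcluding (delimiter: Parser<'a, unit>) : Parser<string, unit> =
    manyChars (escapedDC <|> butNot delimiter displayCharacter)

let fields : Parser<string list, unit> =
    sepBy (fieldExcluding (pchar '|')) (pchar '|')

match run fields "mnopq\\|rst|uvwxy" with
| Success (result, _, _) -> printfn "%A" result   // ["mnopq|rst"; "uvwxy"]
| Failure (message, _, _) -> printfn "%s" message
```

Passing the delimiter as a parser keeps displayCharacter and escapedDC unchanged across contexts, at the cost of one extra lookahead per character.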
============
This is the best I can come up with, but I would like to know from the experts whether it is the most flexible approach.
let displayCharacter (excludeDelimiters : string) =
    satisfy (fun c ->
        ' ' <= c && c <= '~' && not (Seq.exists ((=) c) excludeDelimiters))
let escapedDisplayCharacter = pchar '\\' >>. displayCharacter ""
let field =
    manyChars (escapedDisplayCharacter <|> displayCharacter "|")
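To see how this parameterised version behaves when the delimiter changes with context, here is a small usage sketch. The helper delimited and the ';' subdelimiter are assumptions made for illustration; they do not appear in the definitions above.

```fsharp
#r "nuget: FParsec"
open FParsec

// Printable ASCII minus the delimiters that are active in the current context.
let displayCharacter (excludeDelimiters: string) : Parser<char, unit> =
    satisfy (fun c ->
        ' ' <= c && c <= '~' && not (Seq.exists ((=) c) excludeDelimiters))

// After a backslash, any display character is allowed, delimiters included.
let escapedDisplayCharacter : Parser<char, unit> =
    pchar '\\' >>. displayCharacter ""

// Hypothetical helper: values separated by `sep`, where an unescaped `sep`
// always ends the current value.
let delimited (sep: char) : Parser<string list, unit> =
    let value =
        manyChars (escapedDisplayCharacter <|> displayCharacter (string sep))
    sepBy value (pchar sep)

// '|' is the delimiter in this context ...
match run (delimited '|') "mnopq\\|rst|uvwxy" with
| Success (result, _, _) -> printfn "%A" result   // ["mnopq|rst"; "uvwxy"]
| Failure (message, _, _) -> printfn "%s" message

// ... while here ';' is the delimiter and '|' is an ordinary character.
match run (delimited ';') "a|b;c\\;d" with
| Success (result, _, _) -> printfn "%A" result   // ["a|b"; "c;d"]
| Failure (message, _, _) -> printfn "%s" message
```

Only the string of excluded characters varies between contexts, so no per-context copies of the two character parsers are needed.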