Simultaneous matching in FParsec

Posted on 2024-12-21 11:41:36

I'm trying to parse the following input into lines and fields. Lines are delimited by '\n' and fields are delimited by '|'.

abcd|efgh|ijkl
mnopq\|rst|uvwxy
za|bcd
efg|hijk|lmnop

I can define the following:

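open FParsec

// Any printable ASCII character, from space through '~'
let displayCharacter = satisfy (fun c -> ' ' <= c && c <= '~')
// A backslash followed by any display character yields that character
let escapedDC = pchar '\\' >>. displayCharacter
let test1 =
    run (manyChars (escapedDC <|> displayCharacter)) "asdf\|efgh|ijkl"
    // Success: "asdf|efgh|ijkl"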

However, let fields = sepBy (manyChars (escapedDC <|> displayCharacter)) (pchar '|') cannot exclude the '|' from the field: '|' is itself a display character, so manyChars consumes it before sepBy ever looks for a separator. These delimiters are context-sensitive, so I want to avoid hard-coding them into displayCharacter; '|' is a display character that only needs escaping in certain contexts.
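To make the failure concrete, here is a small check. It assumes FParsec is referenced and fixes the user state to unit; the names greedyFields and greedyResult are introduced only for this illustration.

```fsharp
// In a script (.fsx), first reference the package: #r "nuget: FParsec"
open FParsec

let displayCharacter : Parser<char, unit> = satisfy (fun c -> ' ' <= c && c <= '~')
let escapedDC : Parser<char, unit> = pchar '\\' >>. displayCharacter

// '|' satisfies displayCharacter, so the element parser swallows the separators
let greedyFields : Parser<string list, unit> =
    sepBy (manyChars (escapedDC <|> displayCharacter)) (pchar '|')

let greedyResult = run greedyFields "abcd|efgh|ijkl"
// Success: ["abcd|efgh|ijkl"]   (one field instead of three)
```

The parse succeeds, which is what makes the problem easy to miss: the result is simply a single oversized field.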

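If I try to define a single field with manyCharsTill, then I need to use anyOf "|\n" as the terminator to account for the final field on a line. But manyCharsTill consumes the terminator, so the distinction between '|' and '\n' is lost and all of the lines are read into one line.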
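One possible way around this, sketched under the assumption that the terminator should be inspected but not consumed, is to wrap it in lookAhead so that the enclosing parsers still see the '|' or '\n'. The names fieldEnd, field, line, table and tableResult are introduced only for this sketch; lookAhead, skipAnyOf, eof, newline and sepBy are FParsec functions.

```fsharp
// In a script (.fsx), first reference the package: #r "nuget: FParsec"
open FParsec

let displayCharacter : Parser<char, unit> = satisfy (fun c -> ' ' <= c && c <= '~')
let escapedDC : Parser<char, unit> = pchar '\\' >>. displayCharacter

// Succeeds in front of '|', a newline, or end of input, without consuming anything
let fieldEnd : Parser<unit, unit> = lookAhead (skipAnyOf "|\n" <|> eof)

// Characters up to (but not including) the next delimiter
let field : Parser<string, unit> =
    manyCharsTill (escapedDC <|> displayCharacter) fieldEnd

let line : Parser<string list, unit> = sepBy field (pchar '|')
let table : Parser<string list list, unit> = sepBy line newline .>> eof

let tableResult = run table "abcd|efgh\nmn\\|op|qr"
// Success: [["abcd"; "efgh"]; ["mn|op"; "qr"]]
```

Because manyCharsTill tries the terminator before each character, an escaped "\|" is still read as a literal '|': the terminator check fails on the backslash and escapedDC then consumes both characters.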

I may have further subdelimiters beyond '|' that are supported in certain contexts. For this reason, it seems messy to define versions of displayCharacter and escapedDC for every case. Using lookahead features seems cleaner, or perhaps a parser called both that requires two parsers to match simultaneously:

manyCharsSepBy (escapedDC <|> displayCharacter) (pchar '|')

or

let contextualDisplayCharacter1 = both displayCharacter (satisfy ((<>) '|'))
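FParsec does not ship a parser named both, but a negative lookahead with notFollowedBy gives a similar effect. The sketch below is one possible formulation, not an established idiom: butNot is a hypothetical helper, and fieldChar, fields and fieldsResult are names chosen for this example.

```fsharp
// In a script (.fsx), first reference the package: #r "nuget: FParsec"
open FParsec

let displayCharacter : Parser<char, unit> = satisfy (fun c -> ' ' <= c && c <= '~')
let escapedDC : Parser<char, unit> = pchar '\\' >>. displayCharacter

// Hypothetical helper: run p only where `excluded` would not match at this position
let butNot (excluded : Parser<'a, unit>) (p : Parser<'b, unit>) : Parser<'b, unit> =
    notFollowedBy excluded >>. p

// The delimiter is supplied by the context instead of being baked into displayCharacter
let fieldChar : Parser<char, unit> =
    escapedDC <|> butNot (pchar '|') displayCharacter

let fields : Parser<string list, unit> =
    sepBy (manyChars fieldChar) (pchar '|')

let fieldsResult = run fields "mnopq\\|rst|uvwxy"
// Success: ["mnopq|rst"; "uvwxy"]
```

Since notFollowedBy never consumes input, a failed check lets manyChars stop cleanly at the delimiter, and a different context can pass a different excluded parser without touching displayCharacter or escapedDC.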

Is there an easier way to accomplish this? Or perhaps my implied BNF is flawed, and a corrected grammar would translate easily?

============

This is the best I can come up with, but I would like to know from the experts if it is the most flexible way.

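let displayCharacter (excludeDelimiters : string) : Parser<char, unit> =
    // A printable ASCII character that is not one of the excluded delimiters
    satisfy (fun c -> ' ' <= c && c <= '~' && not (Seq.exists ((=) c) excludeDelimiters))
// A backslash escapes any display character, delimiters included
let escapedDisplayCharacter : Parser<char, unit> = pchar '\\' >>. displayCharacter ""

let field : Parser<string, unit> =
    manyChars (escapedDisplayCharacter <|> displayCharacter "|")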
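For completeness, this is how the approach above might extend to the full sample input. It is a sketch under the same assumptions (user state fixed to unit); fieldsSepBy, table, sample and tableResult are hypothetical names added for illustration.

```fsharp
// In a script (.fsx), first reference the package: #r "nuget: FParsec"
open FParsec

let displayCharacter (excludeDelimiters : string) : Parser<char, unit> =
    satisfy (fun c -> ' ' <= c && c <= '~' && not (Seq.exists ((=) c) excludeDelimiters))

let escapedDisplayCharacter : Parser<char, unit> = pchar '\\' >>. displayCharacter ""

// Hypothetical helper: fields separated by `delimiter`, which must be escaped inside a field
let fieldsSepBy (delimiter : char) : Parser<string list, unit> =
    let field = manyChars (escapedDisplayCharacter <|> displayCharacter (string delimiter))
    sepBy field (pchar delimiter)

// '\n' is below ' ', so it is never a display character and always ends a line
let table : Parser<string list list, unit> = sepBy (fieldsSepBy '|') newline .>> eof

let sample = "abcd|efgh|ijkl\nmnopq\\|rst|uvwxy\nza|bcd\nefg|hijk|lmnop"
let tableResult = run table sample
// Success: [["abcd"; "efgh"; "ijkl"]; ["mnopq|rst"; "uvwxy"]; ["za"; "bcd"];
//           ["efg"; "hijk"; "lmnop"]]
```

Passing the delimiter as an argument keeps a single definition of the character parsers while letting each context choose what must be escaped.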
