Parsing a string with a grammar without lookahead?
Got this text:
Want this || Not this
The line may also look like this:
Want this | Not this
with a single pipe.
I'm using this grammar to parse it:
grammar HC {
token TOP { <pre> <divider> <post> }
token pre { \N*? <?before <divider>> }
token divider { <[|]> ** 1..2 }
token post { \N* }
}
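For reference, a minimal sketch of running this lookahead-based grammar against both sample lines. The loop and the output formatting are just illustration, not part of the original question.

```raku
grammar HC {
    token TOP     { <pre> <divider> <post> }
    # Frugal \N*? grows one character at a time until a divider is next.
    token pre     { \N*? <?before <divider>> }
    token divider { <[|]> ** 1..2 }
    token post    { \N* }
}

for 'Want this || Not this', 'Want this | Not this' -> $line {
    my $m = HC.parse($line);
    # Stringify each capture; surrounding whitespace is still included.
    say ~$m<pre>, ' / ', ~$m<divider>, ' / ', ~$m<post>;
}
```

Note that the captures keep the spaces around the divider. Nothing here trims them.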
Is there a better way to do this? I'd love to be able to do something more like this:
grammar HC {
token TOP { <pre> <divider> <post> }
token pre { \N*? }
token divider { <[|]> ** 1..2 }
token post { \N* }
}
But this does not work. And if I do this:
grammar HC {
token TOP { <pre>* <divider> <post> }
token pre { \N }
token divider { <[|]> ** 1..2 }
token post { \N* }
}
Each character before divider gets its own <pre>
capture. Thanks.
Comments (2)
As always, TIMTOWTDI.
You can. Just switch the first two rule declarations (TOP and pre) from token to regex. This works because regex disables :ratchet (unlike token and rule, which enable it). (Explaining why you need to switch it off for both rules is beyond my paygrade, certainly for tonight, and quite possibly till someone else explains why to me so I can pretend I knew all along.)
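A sketch of what that switch looks like when applied to the grammar from the question: only the declarators of TOP and pre change. One plausible reading of why both have to change (my interpretation, not something confirmed above) is that pre must be able to offer longer matches when asked, and TOP must be willing to go back and ask for them. With :ratchet on either side, that retry never happens.

```raku
grammar HC {
    # regex (no :ratchet): TOP may backtrack into <pre> when <divider> fails.
    regex TOP     { <pre> <divider> <post> }
    # regex (no :ratchet): the frugal match can be extended on backtracking.
    regex pre     { \N*? }
    # These two never need backtracking, so they stay as tokens.
    token divider { <[|]> ** 1..2 }
    token post    { \N* }
}

my $m = HC.parse('Want this || Not this');
say ~$m<pre>, ' / ', ~$m<divider>, ' / ', ~$m<post>;
```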
On the one-capture-per-character problem: by default, "calling a named regex installs a named capture with the same name" [... couple sentences later:] "If no capture is desired, a leading dot or ampersand will suppress it". So change <pre> to <.pre>. Next, you can manually add a named capture by wrapping a pattern in $<name>=[pattern]. So to capture the whole string matched by consecutive calls of the pre rule, wrap the non-capturing pattern (<.pre>*?) in $<pre>=[...].
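Putting those two steps together, the grammar might look like the sketch below. It assumes the frugal form <.pre>*? mentioned above, with everything else taken unchanged from the question.

```raku
grammar HC {
    # <.pre> calls the rule without capturing; $<pre>=[...] then captures
    # the whole run of characters as a single named capture.
    token TOP     { $<pre>=[<.pre>*?] <divider> <post> }
    token pre     { \N }
    token divider { <[|]> ** 1..2 }
    token post    { \N* }
}

my $m = HC.parse('Want this || Not this');
say ~$m<pre>;   # one string, not one capture per character
```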
OK, I tried
use Grammar::Tracer;
(our best friend!) and got this from your original and from the first answer with regexes ... both seemed wrong to me ... This gives me the feeling that your combination of pre and divider is not converging. So I altered the code to this (with a more definitive definition of pre) ...
and got this ...
Sooo, I conclude that (i) using Grammar::Tracer to inspect the operation of grammars is a must-do, (ii) a loose definition like the original, which requires the parser to test on every char boundary, should be avoided, (iii) especially if the divider is hard to pin down.
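The altered grammar itself did not survive above, so the following is only a hypothetical illustration of what a "more definitive" pre could look like. It is an assumption, not the code that was actually posted: pre is defined as "anything that is not a pipe or a newline", so a single greedy pass finds the boundary without any backtracking or lookahead.

```raku
grammar HC {
    token TOP     { <pre> <divider> <post> }
    # Hypothetical tighter definition: stop at the first pipe (or newline).
    token pre     { <-[|\n]>* }
    token divider { <[|]> ** 1..2 }
    token post    { \N* }
}

my $m = HC.parse('Want this | Not this');
say ~$m<pre>, ' / ', ~$m<divider>, ' / ', ~$m<post>;
```

This only works when the text before the divider can never contain a pipe, which seems to hold for the sample lines.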
I have the wider feeling that a Grammar (parser) may not be well suited to the underlying raw data structure and that a set of regexes may be a better approach.
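As a rough sketch of the plain-regex route, a single regex with named captures may be enough for lines of this shape. The names $line-re and split-line are made up for this example.

```raku
# One regex with three named captures; no grammar involved.
my $line-re = / ^ $<pre>=[ <-[|]>* ] $<divider>=[ '|' ** 1..2 ] $<post>=[ \N* ] $ /;

# Hypothetical helper: returns a hash of the three parts, or Nil if no match.
sub split-line(Str $line) {
    my $m = $line.match($line-re) or return Nil;
    %( pre => ~$m<pre>, divider => ~$m<divider>, post => ~$m<post> )
}

say split-line('Want this || Not this');
say split-line('Want this | Not this');
```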
I failed to work out how to use <.ws> (or an equivalent) to trim the whitespace from the captured results.
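One possible way to keep the whitespace out of the captures, sketched here with \h* (horizontal whitespace) rather than <.ws>: match the padding in TOP, outside the captured rules, and let pre's lookahead skip over it. This is an untested-in-the-thread variation on the question's grammar, and it only handles the spaces around the divider. Calling .trim on each capture afterwards is the simpler alternative.

```raku
grammar HC {
    # Whitespace around the divider is matched here, outside any capture.
    token TOP     { <pre> \h* <divider> \h* <post> }
    # Stop as soon as only optional whitespace separates us from a divider.
    token pre     { \N*? <?before \h* <divider>> }
    token divider { <[|]> ** 1..2 }
    token post    { \N* }
}

my $match = HC.parse('Want this || Not this');
say $match<pre>.Str.raku;
say $match<post>.Str.raku;
```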