Parse and charset: why doesn't my script work?
I want to extract attribute1 and attribute3 values only. I don't understand why charset doesn't seem to work in my case to "skip" any other attributes (attribute3 is not extracted as I would like):
content: {<tag attribute1="valueattribute1" attribute2="valueattribute2" attribute3="valueattribute3">
</tag>
<tag attribute2="valueattribute21" attribute1="valueattribute11" >
</tag>
}
attribute1: [{attribute1="} copy valueattribute1 to {"} thru {"}]
attribute3: [{attribute3="} copy valueattribute3 to {"} thru {"}]
spacer: charset reduce [tab newline #" "]
letter: complement spacer
to-space: [some letter | end]
attributes-rule: [(valueattribute1: none valueattribute3: none) [attribute1 | none] any letter [attribute3 | none] (print valueattribute1 print valueattribute3)
| [attribute3 | none] any letter [attribute1 | none] (print valueattribute3 print valueattribute1
valueattribute1: none valueattribute3: none
)
| none
]
rule: [any [to {<tag } thru {<tag } attributes-rule {>} to {</tag>} thru {</tag>}] to end]
parse content rule
The output is:
>> parse content rule
valueattribute1
none
== true
>>
Firstly, you're not using parse/all. In Rebol 2 that means whitespace has been effectively stripped out before the parse runs. That's not true in Rebol 3: if your parse rules are in block format (as they are here), then /all is implied.

(Note: There seemed to be consensus that Rebol 3 would throw out the non-block form of parse rules, in favor of the split function for those "minimal" parse scenarios. That would get rid of /all entirely. No action has yet been taken on this, unfortunately.)

Secondly, your code has bugs, which I'm not going to spend time sorting out. (That's mostly because I think using Rebol's parse to process XML/HTML is a fairly silly idea :P)

But don't forget you have an important tool. If you use a set-word in the parse rule, it captures the current parse position into a variable. You can then print it out and see where you are. Change the part of attributes-rule where you first say any letter to pos: (print pos) any letter and look at what gets printed. See the leading space? Your rules right before the any letter put you at a space... and since you said any letter was ok, no letters are fine too, and everything's thrown off.

(Note: Rebol 3 has an even better debugging tool: the word ??. When you put it in the parse block, it tells you what token/rule you're currently processing as well as the state of the input. With this tool you can more easily find out what's going on... though it's really buggy on r3 mac intel right now.)

Additionally, if you're not using copy, then your pattern of to X thru X is unnecessary; you can achieve the same thing with just thru X. If you do want a copy, you can use the briefer copy Y to X X, or, if X is just a single symbol, the clearer copy Y to X skip.

In places where you see yourself writing repetitive code, remember that Rebol can go a step above by using compose etc.
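To make the set-word debugging trick and the copy Y to X skip shortcut concrete, here is a minimal sketch. It assumes Rebol 2 with parse/all; the shortened sample string and the word names sample, value1 and pos are made up for illustration and are not from the original post.

```rebol
REBOL [Title: "Parse position debugging sketch"]

; Shortened sample input (an assumption, not the asker's full content).
sample: {<tag attribute1="valueattribute1" attribute2="valueattribute2">}

parse/all sample [
    thru {<tag }
    ; copy up to the closing quote, then step over it with skip
    ; (this replaces the longer: to {"} thru {"})
    {attribute1="} copy value1 to {"} skip
    ; set-word inside a parse rule: remember the current input position
    pos:
    ; mold makes a leading space in the remaining input visible
    (print ["value1:" value1] print ["rest:" mold pos])
    to end
]
```

Printing the molded position right before a rule that misbehaves shows exactly which character the next rule has to match, which is usually enough to spot a stray space or quote.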
Short answer: [any letter] eats your attribute3="..." because the #"^"" character is, by your definition, a letter. Additionally, you may have problems when there is no attribute2: your generic second-attribute rule will then eat attribute3, and your attribute3 rule will have nothing left to match. It is better to be explicit that there is either an optional attribute2 or an optional anything-but-attribute3.
Also, parse without the /all refinement ignores spaces (or at least is very unwieldy where spaces are concerned); /all is highly recommended for this type of parsing.
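One possible way to avoid the "generic rule eats the attribute I wanted" problem is to parse every attribute with a single generic rule and only keep the values whose names you care about. The following is a rough sketch under that assumption, written for Rebol 2 with parse/all; the names name-char, attribute, tag-rule, results, value1 and value3 are hypothetical, and this is not the code the asker ended up with.

```rebol
REBOL [Title: "Attribute extraction sketch"]

content: {<tag attribute1="valueattribute1" attribute2="valueattribute2" attribute3="valueattribute3">
</tag>
<tag attribute2="valueattribute21" attribute1="valueattribute11" >
</tag>
}

spacer: charset reduce [tab newline #" "]
; characters allowed in an attribute name: anything except whitespace, =, " and >
name-char: complement union spacer charset {=">}

results: copy []

; generic attribute: name="value"; only attribute1 and attribute3 are kept
attribute: [
    copy name some name-char {="} copy value to {"} skip
    (
        if name = "attribute1" [value1: value]
        if name = "attribute3" [value3: value]
    )
]

tag-rule: [
    thru {<tag}
    (value1: value3: none)            ; reset for each tag
    any [some spacer attribute]       ; attributes in any order
    any spacer {>}
    (repend results [value1 value3])  ; one pair per tag, none if missing
]

parse/all content [any tag-rule to end]
probe results
```

Because the quote and the equals sign are excluded from name-char, the generic rule cannot run past the end of an attribute, and a missing attribute3 simply leaves value3 as none for that tag. Note that thru {<tag} would also match longer tag names that start with "tag", so a stricter rule would be needed for real markup.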
Adding parse/all didn't seem to change anything. In the end this seems to work (using a set-word has indeed been a great help for debugging!). What do you think?
which outputs: