Parsing and charset: why doesn't my script work?

Posted 2024-08-24 15:01:22

I want to extract only the attribute1 and attribute3 values. I don't understand why my charset doesn't seem to "skip" the other attributes in this case (attribute3 is not extracted as I would like):

content: {<tag attribute1="valueattribute1" attribute2="valueattribute2" attribute3="valueattribute3">
</tag>
<tag attribute2="valueattribute21" attribute1="valueattribute11" >
</tag>
}


attribute1: [{attribute1="} copy valueattribute1 to {"} thru {"}]
attribute3: [{attribute3="} copy valueattribute3 to {"} thru {"}]

spacer: charset reduce [tab newline #" "]
letter: complement spacer 
to-space: [some letter | end]

attributes-rule: [(valueattribute1: none valueattribute3: none) [attribute1 | none] any letter [attribute3 | none] (print valueattribute1 print valueattribute3)
| [attribute3 | none] any letter [attribute1 | none] (print valueattribute3 print valueattribute1
valueattribute1: none valueattribute3: none
)
| none
]

rule: [any [to {<tag } thru {<tag } attributes-rule {>} to {</tag>} thru {</tag>}] to end]

parse content rule

The output is:

>> parse content rule
valueattribute1
none
== true
>>

秋叶绚丽 2024-08-31 15:01:22

Firstly, you're not using parse/all. In Rebol 2 that means parse effectively ignores whitespace in the input while it matches your rules. That's not true in Rebol 3: if your parse rules are in block format (as they are here), /all is implied.

(Note: There seemed to be consensus that Rebol 3 would throw out the non-block form of parse rules, in favor of the split function for those "minimal" parse scenarios. That would get rid of /all entirely. No action has yet been taken on this, unfortunately.)
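
A minimal sketch of that whitespace difference, assuming a Rebol 2 interpreter (the names loose, strict-fail and strict-ok are only illustrative):

```rebol
REBOL [Title: "parse vs parse/all (sketch, assumes Rebol 2)"]

; Without /all, Rebol 2's parse skips whitespace between rule tokens,
; so "a" followed by "b" matches even though a space separates them.
loose: parse "a b" ["a" "b"]

; With /all every character counts, including the space.
strict-fail: parse/all "a b" ["a" "b"]
strict-ok: parse/all "a b" ["a" " " "b"]

print [loose strict-fail strict-ok]   ; true false true
```

With /all, any whitespace between attributes has to be matched (or skipped) by your own rules.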

Secondly your code has bugs, which I'm not going to spend time sorting out. (That's mostly because I think using Rebol's parse to process XML/HTML is a fairly silly idea :P)

But don't forget you have an important tool. If you put a set-word in a parse rule, it captures the current parse position in that variable. You can then print it out and see where you are. Change the part of attributes-rule where you first say any letter to pos: (print pos) any letter, and you'll see this:

>> parse/all content rule
 attribute2="valueattribute2" attribute3="valueattribute3">
</tag>
<tag attribute2="valueattribute21" attribute1="valueattribute11" >
</tag>

valueattribute1
none
== true

See the leading space? The rules just before any letter leave you sitting on a space, which is not a letter. Since any also accepts zero matches, any letter succeeds without consuming anything, and everything after that is thrown off.
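
A tiny reproduction of that trap, assuming Rebol 2 with parse/all and reusing the question's spacer/letter charsets (pos is just an illustrative variable). The any keyword is satisfied by zero matches, while some needs at least one:

```rebol
REBOL []

spacer: charset reduce [tab newline #" "]
letter: complement spacer

; The input starts with a space, which is not a letter.
; ANY accepts zero repetitions, so it succeeds without consuming anything.
parse/all " attribute3" [any letter pos: to end]
probe pos   ; " attribute3"  (still sitting on the space)

; SOME demands at least one letter, so here the whole rule fails.
probe parse/all " attribute3" [some letter to end]   ; false
```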

(Note: Rebol 3 has an even better debugging tool...the word ??. When you put it in the parse block it tells you what token/rule you're currently processing as well as the state of the input. With this tool you can more easily find out what's going on:

>> parse "hello world" ["hello" ?? space ?? "world"]
space: " world"
"world": "world"
== true

...though it's really buggy on r3 mac intel right now.)

Additionally, if you're not using copy, your pattern of to X thru X is unnecessary: thru X alone does the same thing. If you do want to copy, the briefer copy Y to X X works, or, if X is just a single character, you can write the clearer copy Y to X skip.
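
A small sketch of both shortcuts, assuming Rebol 2 with parse/all; sample, a, b and value are illustrative names:

```rebol
REBOL []

sample: {attribute1="foo" attribute2="bar"}

; Skipping: TO X THRU X ends up at the same position as plain THRU X.
parse/all sample [to {attribute1="} thru {attribute1="} a: to end]
parse/all sample [thru {attribute1="} b: to end]
print equal? index? a index? b   ; true

; Copying: stop just before the closing quote, then SKIP that one character.
parse/all sample [thru {attribute1="} copy value to {"} skip to end]
probe value   ; "foo"
```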

Wherever you find yourself writing repetitive code, remember that Rebol lets you go a step further with compose and similar functions:

>> temp: [thru (rejoin [{attribute} num {="}]) 
          copy (to-word rejoin [{valueattribute} num]) to {"} thru {"}]

>> num: 1
>> attribute1: compose temp
== [thru {attribute1="} copy valueattribute1 to {"} thru {"}]

>> num: 2
>> attribute2: compose temp
== [thru {attribute2="} copy valueattribute2 to {"} thru {"}]
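
Taking the same idea one step further, a hypothetical helper (make-attr-rule is not part of Rebol) can wrap the compose call so each rule takes one line. This sketch assumes Rebol 2; the pattern ends in {="} so the copy starts right after the opening quote, and skip consumes the closing quote:

```rebol
REBOL []

; Hypothetical helper: build one attribute rule per number with COMPOSE.
; TO-WORD creates the target word (e.g. valueattribute3) in the global context.
make-attr-rule: func [num [integer!]] [
    compose [
        thru (rejoin [{attribute} num {="}])
        copy (to-word rejoin [{valueattribute} num]) to {"} skip
    ]
]

attribute1: make-attr-rule 1
attribute3: make-attr-rule 3

parse/all {<tag attribute3="xyz">} [thru {<tag } attribute3 to end]
probe valueattribute3   ; "xyz"
```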

贱人配狗天长地久 2024-08-31 15:01:22

Short answer: [any letter] eats your attribute3="...", because the #"^"" (double quote) character is, by your definition, a letter. Additionally, you may run into problems when there is no attribute2: your generic second-attribute rule will then eat attribute3, and your attribute3 rule will have nothing left to match. It is better to be explicit, either with an optional attribute2 or with an optional "anything but attribute3".

attribute1="foo"       attribute2="bar" attribute3="foobar" 
<- attribute1="..." -> <-     any letter                 -> <- attribute3="..." ->

Also, parse without the /all refinement ignores spaces (or is at least very unwieldy where spaces are concerned), so /all is highly recommended for this type of parsing.
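
Along the lines of "anything but attribute3", here is a sketch that does not depend on attribute order at all: it walks every name="value" pair inside a tag and keeps only the names it wants. It assumes Rebol 2 with parse/all, and name-char, attr, tag-rule, found and results are illustrative names. The name charset excludes the double quote explicitly, since counting #"^"" as a letter is what caused the trouble:

```rebol
REBOL []

content: {<tag attribute1="v1" attribute2="v2" attribute3="v3">
</tag>
<tag attribute2="v21" attribute1="v11" >
</tag>
}

spacer: charset reduce [tab newline #" "]
; Characters allowed in an attribute name: anything except whitespace,
; "=", ">" and the double quote #"^"".
name-char: complement charset reduce [tab newline #" " #"=" #">" #"^""]

results: copy []
attr: [
    copy name some name-char {="} copy value to {"} skip
    ; Keep the pair only if it is one of the wanted attributes.
    (if find ["attribute1" "attribute3"] name [repend found [name value]])
]
tag-rule: [
    {<tag} (found: copy [])
    any [some spacer | attr]
    {>} (append/only results found)
]
parse/all content [any [tag-rule | skip]]
probe results
; [["attribute1" "v1" "attribute3" "v3"] ["attribute1" "v11"]]
```

Because every attribute is consumed by the same generic rule, a missing attribute2 (or any other extra attribute) no longer shifts what the wanted rules see.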

我乃一代侩神 2024-08-31 15:01:22

Adding parse/all didn't seem to change anything. In the end this seems to work (using a set-word has indeed been a great help for debugging!). What do you think?

content: {<tag attribute1="valueattribute1" attribute2="valueattribute2" attribute3="valueattribute3">
</tag>
<tag attribute2="valueattribute21" attribute1="valueattribute11" >
</tag>
}


attribute1: [to {attribute1="} thru {attribute1="} copy valueattribute1 to {"} thru {"}]
attribute3: [to {attribute3="} thru {attribute3="} copy valueattribute3 to {"} thru {"}]

letter: charset reduce ["ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890="]

attributes-rule: [(valueattribute1: none valueattribute3: none) 
[attribute1 | none] any letter pos: 
[attribute3 | none] (print valueattribute1 print valueattribute3)
| [attribute3 | none] any letter [attribute1 | none] (print valueattribute3 print valueattribute1
valueattribute1: none valueattribute3: none
)
| none
]

rule: [any [to {<tag } thru {<tag } attributes-rule {>} to {</tag>} thru {</tag>}] to end]

parse/all content rule

which outputs:

>> parse/all content rule
valueattribute1
valueattribute3
valueattribute11
none
== true
>>
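
One caveat with this version: to {attribute3="} searches forward through the rest of the input, not just the current tag, so a tag without attribute3 could pick up the value from a later tag. A sketch that avoids this, assuming Rebol 2 (attr-value, tag-text and results are hypothetical names), copies each tag's text first and searches only inside it:

```rebol
REBOL []

content: {<tag attribute1="valueattribute1" attribute2="valueattribute2" attribute3="valueattribute3">
</tag>
<tag attribute2="valueattribute21" attribute1="valueattribute11" >
</tag>
}

; Hypothetical helper: find one attribute's value inside a single tag's
; text only. Returns NONE when the attribute is absent from that tag.
attr-value: func [source [string!] name [string!] /local pattern value] [
    value: none
    pattern: rejoin [name {="}]
    parse/all source [thru pattern copy value to {"} to end]
    value
]

results: copy []
parse/all content [
    any [
        ; Copy everything between "<tag " and the closing ">" of this tag.
        thru {<tag } copy tag-text to {>} skip
        (repend results [attr-value tag-text "attribute1" attr-value tag-text "attribute3"])
    ]
    to end
]
probe results
; ["valueattribute1" "valueattribute3" "valueattribute11" none]
```

This still assumes no attribute value contains a ">" character; for real HTML or XML a dedicated parser is the safer choice.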