Parse and charset: why my script doesn't work

Posted 2024-08-24 15:01:22

I want to extract attribute1 and attribute3 values only. I don't understand why charset doesn't seem to work in my case to "skip" any other attributes (attribute3 is not extracted as I would like):

content: {<tag attribute1="valueattribute1" attribute2="valueattribute2" attribute3="valueattribute3">
</tag>
<tag attribute2="valueattribute21" attribute1="valueattribute11" >
</tag>
}


attribute1: [{attribute1="} copy valueattribute1 to {"} thru {"}]
attribute3: [{attribute3="} copy valueattribute3 to {"} thru {"}]

spacer: charset reduce [tab newline #" "]
letter: complement spacer 
to-space: [some letter | end]

attributes-rule: [(valueattribute1: none valueattribute3: none) [attribute1 | none] any letter [attribute3 | none] (print valueattribute1 print valueattribute3)
| [attribute3 | none] any letter [attribute1 | none] (print valueattribute3 print valueattribute1
valueattribute1: none valueattribute3: none
)
| none
]

rule: [any [to {<tag } thru {<tag } attributes-rule {>} to {</tag>} thru {</tag>}] to end]

parse content rule

output is

>> parse content rule
valueattribute1
none
== true
>>

秋叶绚丽 2024-08-31 15:01:22

Firstly you're not using parse/all. In Rebol 2 that means that whitespace has been effectively stripped out before the parse runs. That's not true in Rebol 3: if your parse rules are in block format (as you are doing here) then /all is implied.

(Note: There seemed to be consensus that Rebol 3 would throw out the non-block form of parse rules, in favor of the split function for those "minimal" parse scenarios. That would get rid of /all entirely. No action has yet been taken on this, unfortunately.)

Secondly your code has bugs, which I'm not going to spend time sorting out. (That's mostly because I think using Rebol's parse to process XML/HTML is a fairly silly idea :P)

But don't forget you have an important tool. If you use a set-word in the parse rule, it captures the current parse position into a variable. You can then print it out and see where you are. Change the part of attributes-rule where you first say any letter to pos: (print pos) any letter and you'll see this:

>> parse/all content rule
 attribute2="valueattribute2" attribute3="valueattribute3">
</tag>
<tag attribute2="valueattribute21" attribute1="valueattribute11" >
</tag>

valueattribute1
none
== true

See the leading space? Your rules right before the any letter put you at a space... and since you said any letter was ok, no letters are fine, and everything's thrown off.
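A minimal sketch of that failure, assuming Rebol 2 (where parse/all exists) and reusing the spacer and letter charsets from the question. The input string "a b" is made up for illustration:

```rebol
spacer: charset reduce [tab newline #" "]
letter: complement spacer

; After "a" the position is at the space. `any letter` matches zero
; characters there, so "b" is then tried against " b" and fails.
print parse/all "a b" ["a" any letter "b"]    ; false

; Consuming the separator explicitly lets the rule continue.
print parse/all "a b" ["a" any spacer "b"]    ; true
```

The same thing happens in attributes-rule: after attribute1 the input sits on a space, any letter consumes nothing, and attribute3 is tested against text that starts with " attribute2=".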

(Note: Rebol 3 has an even better debugging tool...the word ??. When you put it in the parse block it tells you what token/rule you're currently processing as well as the state of the input. With this tool you can more easily find out what's going on:

>> parse "hello world" ["hello" ?? space ?? "world"]
space: " world"
"world": "world"
== true

...though it's really buggy on r3 mac intel right now.)

Additionally, if you're not using copy, then your to X thru X pattern is unnecessary; you can achieve the same thing with just thru X. If you do want a copy, you can use the briefer copy Y to X X, or, if X is just a single character, the clearer copy Y to X skip.
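As a small sketch of those two shortcuts, assuming Rebol 2 and a made-up input string (text and value1 are names chosen for this example):

```rebol
text: {attribute1="foo" attribute2="bar"}

; `thru X` alone moves past X, so no `to X` is needed in front of it.
; `copy value1 to {"}` stops just before the closing quote, and `skip`
; steps over that single character.
parse/all text [thru {attribute1="} copy value1 to {"} skip to end]

print value1    ; foo
```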

In places where you see yourself writing repetitive code, remember that Rebol can go a step above by using compose etc:

>> temp: [thru (rejoin [{attribute} num {="}]) 
          copy (to-word rejoin [{valueattribute} num]) to {"} thru {"}]

>> num: 1
>> attribute1: compose temp
== [thru {attribute1="} copy valueattribute1 to {"} thru {"}]

>> num: 2
>> attribute2: compose temp
== [thru {attribute2="} copy valueattribute2 to {"} thru {"}]

贱人配狗天长地久 2024-08-31 15:01:22

Short answer: [any letter] eats your attribute3="..." because the #"^"" character is, by your definition, a 'letter. Additionally, you may have problems where there is no attribute2: your generic second-attribute rule will then eat attribute3, and your attribute3 rule will have nothing left to match. It is better to be explicit that there is either an optional attribute2 or an optional anything-but-attribute3.

attribute1="foo"       attribute2="bar" attribute3="foobar" 
<- attribute1="..." -> <-     any letter                 -> <- attribute3="..." ->

Also, 'parse without the /all refinement ignores spaces (or at least is very unwieldy where spaces are concerned); /all is highly recommended for this type of parsing.
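One way to make the "anything-but-attribute1-or-attribute3" idea explicit is sketched below. It assumes Rebol 2 with parse/all and attribute values that never contain an embedded double quote. The names name-char and other-attribute are hypothetical helpers introduced for this sketch, not part of the original code:

```rebol
content: {<tag attribute1="valueattribute1" attribute2="valueattribute2" attribute3="valueattribute3">
</tag>
<tag attribute2="valueattribute21" attribute1="valueattribute11" >
</tag>
}

spacer: charset reduce [tab newline #" "]
; characters allowed in an attribute name (assumption: anything except
; whitespace, "=" and ">")
name-char: complement charset reduce [tab newline #" " #"=" #">"]

attribute1: [{attribute1="} copy valueattribute1 to {"} skip]
attribute3: [{attribute3="} copy valueattribute3 to {"} skip]
; any other attribute is matched and discarded
other-attribute: [some name-char {="} to {"} skip]

attributes-rule: [
    (valueattribute1: none valueattribute3: none)
    any [any spacer [attribute1 | attribute3 | other-attribute]]
    any spacer
    (print [valueattribute1 valueattribute3])
]

rule: [any [thru {<tag } attributes-rule {>} thru {</tag>}] to end]

parse/all content rule
; valueattribute1 valueattribute3
; valueattribute11 none
```

Looping over one attribute at a time means the order of attributes no longer matters, so the two mirrored alternatives in the original attributes-rule are not needed.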

我乃一代侩神 2024-08-31 15:01:22

When I added parse/all, it didn't seem to change anything. In the end this seems to work (using a set-word was indeed a great help for debugging!). What do you think?

content: {<tag attribute1="valueattribute1" attribute2="valueattribute2" attribute3="valueattribute3">
</tag>
<tag attribute2="valueattribute21" attribute1="valueattribute11" >
</tag>
}


attribute1: [to {attribute1="} thru {attribute1="} copy valueattribute1 to {"} thru {"}]
attribute3: [to {attribute3="} thru {attribute3="} copy valueattribute3 to {"} thru {"}]

letter: charset reduce ["ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890="]

attributes-rule: [(valueattribute1: none valueattribute3: none) 
[attribute1 | none] any letter pos: 
[attribute3 | none] (print valueattribute1 print valueattribute3)
| [attribute3 | none] any letter [attribute1 | none] (print valueattribute3 print valueattribute1
valueattribute1: none valueattribute3: none
)
| none
]

rule: [any [to {<tag } thru {<tag } attributes-rule {>} to {</tag>} thru {</tag>}] to end]

parse/all content rule

which outputs:

>> parse/all content rule
valueattribute1
valueattribute3
valueattribute11
none
== true
>>
