Parsing wikiText with regex in Java

Posted 2024-11-13 09:03:41

Given a wikiText string such as:

{{ValueDescription
    |key=highway
    |value=secondary
    |image=Image:Meyenburg-L134.jpg
    |description=A highway linking large towns.
    |onNode=no
    |onWay=yes
    |onArea=no
    |combination=
    * {{Tag|name}}
    * {{Tag|ref}}
    |implies=
    * {{Tag|motorcar||yes}}
    }}

I'd like to parse the templates ValueDescription and Tag in Java/Groovy.
I tried the regex /\{\{\s*Tag(.+)\}\}/ and it works (it returns |name, |ref and |motorcar||yes), but
/\{\{\s*ValueDescription(.+)\}\}/ doesn't match anything (it should return all the text above).
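A likely explanation for the difference: without any flags, `.` in java.util.regex does not match line terminators. Each `{{Tag...}}` fits on one line, so its pattern matches; `{{ValueDescription` and its closing `}}` are on different lines, so `(.+)` can never reach them. The sketch below reproduces this. The class name, the helper methods and the trimmed sample (with `}}` at column 0) are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineBoundDot {
    // Trimmed sample from the question; "}}" at column 0 is an assumption
    static final String WIKI = String.join("\n",
            "{{ValueDescription",
            "|key=highway",
            "|combination=",
            "* {{Tag|name}}",
            "* {{Tag|ref}}",
            "|implies=",
            "* {{Tag|motorcar||yes}}",
            "}}");

    // Without DOTALL, '.' stops at '\n', so every match must fit on a single line
    static List<String> tagArgs(String text) {
        Matcher m = Pattern.compile("\\{\\{\\s*Tag(.+)\\}\\}").matcher(text);
        List<String> args = new ArrayList<>();
        while (m.find()) {
            args.add(m.group(1));
        }
        return args;
    }

    // "{{ValueDescription" and its closing "}}" sit on different lines, so this finds nothing
    static boolean valueDescriptionFound(String text) {
        return Pattern.compile("\\{\\{\\s*ValueDescription(.+)\\}\\}").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(tagArgs(WIKI));               // [|name, |ref, |motorcar||yes]
        System.out.println(valueDescriptionFound(WIKI)); // false
    }
}
```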

The expected output

Is there a way to skip nested templates in the regex?

Ideally I would rather use a simple wikiText-to-XML tool, but I couldn't find anything like that.

Thanks!
Mulone

Comments (2)

转角预定愛 2024-11-20 09:03:41

Arbitrarily nested templates can't be matched with a regular expression, because unbounded nesting makes the language non-regular. You need something capable of handling a context-free grammar; ANTLR is a fine option.
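For the narrow case of balanced `{{ ... }}` pairs, a full grammar may be more than needed: a depth counter (the simplest form of the stack a context-free parser keeps) is enough to find top-level templates, and calling it again on a body finds the nested ones. This is a swapped-in technique, not ANTLR, and a minimal sketch under stated assumptions: it ignores triple-brace parameters (`{{{...}}}`), `<nowiki>`, comments and links. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class TemplateScanner {
    /** Returns the inner text of every top-level {{...}} in text, honouring nesting. */
    static List<String> topLevelTemplates(String text) {
        List<String> result = new ArrayList<>();
        int depth = 0;   // how many "{{" are currently open
        int start = -1;  // index where the current top-level body begins
        int i = 0;
        while (i < text.length() - 1) {
            if (text.startsWith("{{", i)) {
                if (depth == 0) {
                    start = i + 2; // body starts after the opening braces
                }
                depth++;
                i += 2;
            } else if (text.startsWith("}}", i) && depth > 0) {
                depth--;
                if (depth == 0) {
                    result.add(text.substring(start, i)); // body ends before "}}"
                }
                i += 2;
            } else {
                i++;
            }
        }
        return result;
    }

    /** Hypothetical helper: template name = text before the first '|', trimmed. */
    static String name(String body) {
        int bar = body.indexOf('|');
        return (bar < 0 ? body : body.substring(0, bar)).trim();
    }

    public static void main(String[] args) {
        String wiki = "{{ValueDescription\n|key=highway\n* {{Tag|name}}\n* {{Tag|ref}}\n"
                + "|implies=\n* {{Tag|motorcar||yes}}\n}}";
        for (String outer : topLevelTemplates(wiki)) {
            System.out.println(name(outer));                 // ValueDescription
            for (String inner : topLevelTemplates(outer)) {  // one level deeper
                System.out.println("  " + name(inner) + " -> " + inner);
            }
        }
    }
}
```

Because the scanner only counts braces, it does not care whether `}}` is indented or on its own line, which is the assumption the regex-based answers depend on.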

北方。的韩爷 2024-11-20 09:03:41

Create your regex pattern with the Pattern.DOTALL flag, which lets . also match line terminators:

Pattern p = Pattern.compile("\\{\\{\\s*ValueDescription(.+)\\}\\}", Pattern.DOTALL);

Sample Code:

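import java.util.regex.Matcher;
import java.util.regex.Pattern;

// DOTALL lets . cross line breaks; the greedy (.+) then extends to the last "}}" in str
Pattern p = Pattern.compile("\\{\\{\\s*ValueDescription(.+)\\}\\}", Pattern.DOTALL);
Matcher m = p.matcher(str);
while (m.find())
    System.out.println("Matched: [" + m.group(1) + ']');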

OUTPUT

Matched: [
|key=highway
|value=secondary
|image=Image:Meyenburg-L134.jpg
|description=A highway linking large towns.
|onNode=no
|onWay=yes
|onArea=no
|combination=
* {{Tag|name}}
* {{Tag|ref}}
|implies=
* {{Tag|motorcar||yes}}
]

Update

Assuming the closing }} of each {{ValueDescription appears at the start of its own line, the following pattern, which uses the reluctant quantifier (.+?), captures multiple ValueDescription templates:

Pattern p = Pattern.compile("\\{\\{\\s*ValueDescription(.+?)\n\\}\\}", Pattern.DOTALL);
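To see why the update matters, the sketch below runs both patterns over two templates. The greedy `(.+)` with DOTALL produces one match spanning both templates, while the reluctant `(.+?)` followed by `\n}}` stops at the first `}}` that begins a line, giving one match per template. The class name, the `bodies` helper and the input are hypothetical; the input assumes `}}` at column 0, as the update itself does. If the closing braces are indented, the `\n\}\}` part would not match as written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Greedy: (.+) runs to the LAST "}}" in the input
    static final Pattern GREEDY =
            Pattern.compile("\\{\\{\\s*ValueDescription(.+)\\}\\}", Pattern.DOTALL);
    // Reluctant: (.+?) stops at the FIRST newline followed by "}}"
    static final Pattern RELUCTANT =
            Pattern.compile("\\{\\{\\s*ValueDescription(.+?)\n\\}\\}", Pattern.DOTALL);

    // Collects group(1) of every match of p in text
    static List<String> bodies(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String wiki = "{{ValueDescription\n|key=highway\n* {{Tag|name}}\n}}\n"
                + "{{ValueDescription\n|key=railway\n}}";
        System.out.println(bodies(GREEDY, wiki).size());    // 1: one match spans both templates
        System.out.println(bodies(RELUCTANT, wiki).size()); // 2: one match per template
    }
}
```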