How can I fix this wiki link parsing regular expression?

Posted 2024-10-27 18:55:16 · 663 characters · 4 views · 0 comments

I've got an old wiki that I'm converting to a new wiki which uses Markdown and the [[]] wiki link format. Unfortunately, the old wiki is really old and had many ways of producing links, including CamelCase and single-bracket ([]) wiki links, among others.

I'm converting with regular expressions in sed, and I use the following regular expression to convert stand-alone CamelCase links to double-bracket ([[]]) wiki links:

s/([^[|])([A-Z][a-z]+[A-Z][A-Za-z]+)([^]|])/\1\[\[\2\]\]\3/g

Unfortunately, the one problem with the above (in my attempt to not convert CamelCase in existing single-bracket wiki links, since there's a mix of both) is that something like [BluetoothConnection|UsingBluetoothIndex] will get converted to [BluetoothConnection|Using[[BluetoothInde]]x].

How can I resolve this issue and force the match to be more greedy and therefore fail and not make a substitution in that case? If sed's enhanced regular expressions turn out to be too limiting, I'm willing to pass through perl instead of sed.

Comments (1)

夜灵血窟げ 2024-11-03 18:55:16

Alright, can you try this:

$ echo "UsingBluetoothIndex" | sed -E 's!([^\[\|]?)([A-Z][a-z]+[A-Z][A-Za-z]+)($|\b|[]|])!\1\[\[\2\]\]\3!g'
Output: [[UsingBluetoothIndex]]

$ echo "[BluetoothConnection|UsingBluetoothIndex]" | sed -E 's!([^\[\|]?)([A-Z][a-z]+[A-Z][A-Za-z]+)($|\b|[]|])!\1\[\[\2\]\]\3!g'
Output: [[[BluetoothConnection]]|[[UsingBluetoothIndex]]]

Update:

Alright, I believe I now have a regex for your problem, using Perl's negative lookahead assertions. Here it is:

perl -pe 's#(^|\b)((?![|\[])[A-Z][a-z]+[A-Z][A-Za-z]+(?![|\]]))($|\b)#\[\[\2\]\]#g'

echo "BluetoothConnection" | perl -pe 's#(^|\b)((?![|\[])[A-Z][a-z]+[A-Z][A-Za-z]+(?![|\]]))($|\b)#\[\[\2\]\]#g'
Output: [[BluetoothConnection]]

echo "[BluetoothConnection|UsingBluetoothIndex]" | perl -pe 's#(^|\b)((?![|\[])[A-Z][a-z]+[A-Z][A-Za-z]+(?![|\]]))($|\b)#\[\[\2\]\]#g'
Output: [BluetoothConnection|UsingBluetoothIndex]

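All it does is check that the CamelCase text is not immediately followed by | or ]; only then is it enclosed in [[ and ]]. The \b anchors keep the regex from backtracking to a shorter match inside the word. Note that the leading (?![|\[]) is a lookahead placed right before [A-Z], so it never actually inspects the preceding character; a lookbehind, (?<![|\[]), would be needed to reject text that starts right after | or [.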
