Fixing up regexes to work around an ICU/RegexKitLite bug

Posted 2024-10-17 12:26:38

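I'm using RegexKitLite, which in turn uses ICU as its engine. Despite what the documentation says, a regex like /x*/ matches the empty string when searched against "xxxxxxxxxx". In other words, it behaves the way /x*?/ would. I would like to work around this bug when it is present, and I am considering rewriting every unescaped * as + whenever a regex match returns a zero-length result. My naive guess is that a regex with + in place of * will always return a subset of the correct results. What unintended consequences would this have? Am I on the right track?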

FWIW, ICU also provides the *+ operator, but it doesn't work either.

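Edit: I should have been clearer: this is the search field of an interactive application. I have no control over the regexes users type in. The broken * support appears to be a bug in ICU. I certainly wish I didn't need to include that POS in my code, but it's the only game in town.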


你好，陌生人 2024-10-24 12:26:38

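If you simply change every * quantifier to a +, the regex will fail in those instances where the * should have matched zero occurrences. In other words, the problem will have morphed from always matching zero to never matching zero. If you ask me, it's useless either way.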
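To make that failure mode concrete, here is a minimal sketch. It uses Python's `re` module as a stand-in for ICU (an assumption for illustration only; the two engines differ, but `*` and `+` are documented with the same meaning in both). The pattern `ab*c` and the sample strings are made up for the example.

```python
import re

# Python's re module stands in for ICU here (assumption for illustration).
star = re.compile(r"ab*c")   # zero or more 'b'
plus = re.compile(r"ab+c")   # naive rewrite: one or more 'b'

samples = ["ac", "abc", "abbbc"]

for text in samples:
    star_hit = star.search(text) is not None
    plus_hit = plus.search(text) is not None
    print(f"{text!r}: star={star_hit} plus={plus_hit}")

# Strings that the original pattern finds but the naive rewrite misses.
lost = [t for t in samples if star.search(t) and not plus.search(t)]
print("lost by the rewrite:", lost)
```

The rewritten pattern does return a subset of the correct results, as the question guessed, but the missing part is exactly the set of matches that needed zero occurrences.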

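However, you might be able to handle the zero-occurrences case separately, with a negative lookahead. For example, x* could be rewritten as (?:(?!x)|x+). It's hideous, I know, but it's the most self-contained fix I can envision at the moment. You would have to do this for possessive stars (*+) as well, but not for reluctant stars (*?).

Here it is in table form: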

BEFORE       AFTER
x*           (?:(?!x)|x+)
x*+          (?:(?!x)|x++)
x*?          x*?
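As a rough sanity check of the first row, the sketch below compares x* with its rewritten form. It again uses Python's `re` module in place of ICU (an assumption, since a correct engine is needed for the comparison), and the sample strings are arbitrary. It also probes one case where the star sits inside a larger pattern.

```python
import re

# Python's re module stands in for a correctly working engine (assumption).
original = re.compile(r"x*")
rewritten = re.compile(r"(?:(?!x)|x+)")

def spans(regex, text):
    """Return the (start, end) span of every match found by finditer."""
    return [m.span() for m in regex.finditer(text)]

samples = ["", "x", "xxxxxxxxxx", "axxb", "abc", "xxaxx"]

# As standalone patterns, both forms should report identical matches.
standalone_agree = all(spans(original, s) == spans(rewritten, s) for s in samples)

# The string from the question: the rewrite consumes the whole run of x's.
first = rewritten.search("xxxxxxxxxx").span()

# Caveat: when the star must backtrack to zero occurrences even though
# an 'x' follows, the rewritten form cannot give up its last 'x'.
embedded_original = re.search(r"x*x", "x")
embedded_rewritten = re.search(r"(?:(?!x)|x+)x", "x")

print(standalone_agree, first, embedded_original, embedded_rewritten)
```

In this stand-in engine the two forms agree as standalone patterns, but `x*x` matches "x" while the rewritten version does not, so the rewrite is best treated as an approximation when the quantified atom can also match what follows it.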

More complex atoms would need to have their own parentheses preserved:

(?:xyz)*     (?:(?!(?:xyz))|(?:xyz)+)

You could probably drop them inside the lookahead, but they don't hurt anything except readability, and that's a lost cause anyway. :D If the {min,} and {min,max} forms are affected too, they would get the same treatment (with the same modifications for possessive variants):

x{0,}        same as x*
x{0,n}       (?:(?!x)|x{1,n})
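The rows above all follow one template, which can be sketched as a small string-building helper. `rewrite_zero_min` is a hypothetical name, not part of RegexKitLite or ICU, and it assumes the caller has already isolated a single atom such as `x` or `(?:xyz)`; locating atoms and unescaped quantifiers in arbitrary user input is not attempted here.

```python
import re
from typing import Optional

def rewrite_zero_min(atom: str, upper: Optional[int] = None,
                     possessive: bool = False) -> str:
    """Build the workaround for atom* (upper is None) or atom{0,upper}.

    Hypothetical helper: 'atom' must already be one complete regex atom.
    """
    quant = "+" if upper is None else "{1,%d}" % upper
    if possessive:
        quant += "+"  # possessive variant, e.g. x++ or x{1,n}+
    return "(?:(?!%s)|%s%s)" % (atom, atom, quant)

# Reproduce the table rows as strings.
print(rewrite_zero_min("x"))                   # x*
print(rewrite_zero_min("x", possessive=True))  # x*+
print(rewrite_zero_min("(?:xyz)"))             # (?:xyz)*
print(rewrite_zero_min("x", upper=3))          # x{0,3}

# Behaviour check in Python's re (a stand-in for ICU, by assumption).
group_span = re.match(rewrite_zero_min("(?:xyz)"), "xyzxyzab").span()
bounded_span = re.match(rewrite_zero_min("x", upper=3), "xxxxx").span()
```

Only the non-possessive forms are compiled in the check, because possessive quantifiers are not accepted by older versions of Python's `re`.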

It occurs to me that conditionals--(?(condition)yes-pattern|no-pattern)--would be a perfect fit here; unfortunately, ICU doesn't seem to support them.

