Negative backreferences in MySQL REGEXP

Posted on 2024-09-30 23:15:02

The MySQL manual is not very detailed about which expressions it supports, so I am not sure whether the following is possible with MySQL at all.

I am trying to create a query with RLIKE that matches the following.

The task is to select all sentences that contain at least two different words from a given sentence.

Let's say I have these words to use in the regex:

hello, dog

I have the following sentences in the database:

hello from dog
hello hello cat
dog says hello
dog dog goes away
big bad dog

Of these, I want to match only:

hello from dog
dog says hello

For now I have it like this:

SELECT *
FROM test
WHERE 
test RLIKE '(hello|dog).*(hello|dog)'

The problem is that I also get these unwanted rows:

hello hello cat
dog dog goes away

So I guess I need a negative backreference right before the second (hello|dog).

In pseudo code it would look like this:

RLIKE '(hello OR dog) anything can be here (hello OR dog, but not the word which already was in the previous group)'

So it could look like this:

'(hello|dog).*(negative backreference to the 1st group goes here)(hello|dog)'

Can such a negative backreference be done in a MySQL regex?
Or is there a better way to write a regex that does the same thing? The query will be generated by C++ code, so it shouldn't be too complex to generate.

Comments (1)

魄砕の薆 2024-10-07 23:15:02

MySQL uses a POSIX Extended Regular Expression (POSIX ERE) engine and therefore doesn't support backreferences at all. It also doesn't support lookaround, which you would need in order to construct a single regex that handles this. (This applies to MySQL before 8.0.4; later versions use the ICU regex library, which supports both.)

Therefore you have to spell out all possible combinations:

hello.*dog|dog.*hello
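
Since the query is generated from C++, the spelled-out pattern can be built with two nested loops. The following is only a sketch: `buildPairPattern` is a hypothetical helper name, and it assumes the words are distinct and contain no regex metacharacters or quotes (real code would need to escape them).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds "a.*b|b.*a|..." covering every ordered pair
// of different words. Assumes the words are distinct and contain no regex
// metacharacters; escaping is left out of this sketch.
std::string buildPairPattern(const std::vector<std::string>& words) {
    std::string pattern;
    for (std::size_t i = 0; i < words.size(); ++i) {
        for (std::size_t j = 0; j < words.size(); ++j) {
            if (i == j) continue;  // a word must not be paired with itself
            if (!pattern.empty()) pattern += '|';
            pattern += words[i] + ".*" + words[j];
        }
    }
    return pattern;
}
```

For the words hello and dog this yields exactly the pattern above. The number of alternatives is n(n-1) for n words, which is why the approach stops scaling quickly.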

Of course, this gets unwieldy as the number of candidate words increases, so regular expressions are not the right tool for this in MySQL unless you can install and use LIB_MYSQLUDF_PREG.
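
One regex-free alternative, which is not part of the answer above and is offered only as a sketch, relies on MySQL evaluating a comparison to 1 or 0: sum one LIKE test per word and require the total to reach the minimum. The generated condition grows linearly with the number of words. `buildCountCondition` is a hypothetical helper name, and the code assumes the words contain no quotes, `%` or `_`; real code should escape them or bind them as parameters.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds a WHERE condition that counts how many of the
// given words occur in `column` and compares that count with `minMatches`.
// Assumes plain words (no quotes, % or _); escaping is omitted in this sketch.
std::string buildCountCondition(const std::string& column,
                                const std::vector<std::string>& words,
                                int minMatches) {
    if (words.empty()) return "FALSE";  // nothing to match against
    std::string sum;
    for (const std::string& word : words) {
        if (!sum.empty()) sum += " + ";
        // In MySQL each LIKE comparison evaluates to 1 (match) or 0.
        sum += "(" + column + " LIKE '%" + word + "%')";
    }
    return "(" + sum + ") >= " + std::to_string(minMatches);
}
```

With the column `test`, the words hello and dog, and a minimum of 2, the result can be placed after `WHERE` in the original query. A row such as "hello hello cat" scores 1 and is excluded, because a repeated word is still counted once.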
