Negative backreferences in MySQL REGEXP
The MySQL manual is not very detailed about which expressions it supports, so I am not sure whether the following is possible with MySQL at all.
I am trying to create a query with RLIKE which matches the following.
The task is to get from SQL all the sentences that contain at least two different words from a given set of words.
Let's say I have these words to use in the regex:
hello, dog
I have the following sentences in the database:
hello from dog
hello hello cat
dog says hello
dog dog goes away
big bad dog
Of those, I want to match only:
hello from dog
dog says hello
For now I have it like this:
SELECT *
FROM test
WHERE
test RLIKE '(hello|dog).*(hello|dog)'
The problem is that I also get these unwanted ones:
hello hello cat
dog dog goes away
So I guess I need a negative backreference right before the second (hello|dog).
In pseudocode it would look like this:
RLIKE '(hello OR dog) anything can be here (hello OR dog, but not the word which already was in the previous group)'
So it could look like this:
'(hello|dog).*(negative backreference to the 1st group goes here)(hello|dog)'
Can such a negative backreference be done in a MySQL regex?
Or is there a better way to write a regex that does the same thing? Keep in mind that the query will be generated by some C++ code, so it shouldn't be too complex to generate.
Comments (1)
MySQL (before version 8.0.4, which switched to the ICU regex library) uses a POSIX Extended Regular Expression (POSIX ERE) engine and therefore doesn't support backreferences at all. It also doesn't support lookaround, which you would need in order to construct a single regex that could handle this.
Therefore you have to spell out all possible combinations; for the two words above, that is test RLIKE 'hello.*dog|dog.*hello'.
Of course, this will get unwieldy if the number of match candidates increases, so regular expressions are not the right tool for this in MySQL, unless you can install/use LIB_MYSQLUDF_PREG.
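Since the question mentions that the query is generated from C++, here is a minimal sketch of how the "spell out all combinations" pattern could be built. The function name buildPairPattern is made up for illustration, and the sketch assumes the words contain no regex metacharacters or quotes (escape them or bind the pattern as a parameter before using it in SQL). Like the pattern in the question, it does not enforce word boundaries.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds a POSIX ERE alternation that matches text
// containing two *different* words from `words`, in either order.
// For {"hello", "dog"} the result is: hello.*dog|dog.*hello
// Assumption: the words are plain text without regex metacharacters.
std::string buildPairPattern(const std::vector<std::string>& words) {
    std::string pattern;
    for (std::size_t i = 0; i < words.size(); ++i) {
        for (std::size_t j = 0; j < words.size(); ++j) {
            // Skip pairs made of the same word (covers i == j and
            // duplicate entries), so "hello.*hello" is never emitted.
            if (words[i] == words[j]) continue;
            if (!pattern.empty()) pattern += '|';
            pattern += words[i] + ".*" + words[j];
        }
    }
    return pattern;
}
```

The returned string would go into the RLIKE clause in place of '(hello|dog).*(hello|dog)'. For n distinct words it produces n*(n-1) alternatives, which is exactly why this approach gets unwieldy as the word list grows.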