Oracle REGEXP_LIKE and word boundaries
I am having a problem with matching word boundaries with REGEXP_LIKE. The following query returns a single row, as expected.
select 1 from dual
where regexp_like('DOES TEST WORK HERE','TEST');
But I want to match on word boundaries as well. So, adding the "\b" characters gives this query
select 1 from dual
where regexp_like('DOES TEST WORK HERE','\bTEST\b');
Running this returns zero rows. Any ideas?
Comments (3)
I believe you want to try a pattern along the lines of
(^|\s)test(\s|$)
because \b does not appear on this list: Perl-influenced Extensions in Oracle Regular Expressions.
The \s makes sure that test starts and ends with whitespace. This is not sufficient, however, since the string test could also appear at the very start or end of the string being matched. Therefore, I use alternation (indicated by |) with ^ for start of string and $ for end of string.
Update (after 3+ years):
As it happens, I needed this functionality today, and it appears to me that an even better regular expression is
(^|\s|\W)test($|\s|\W)
(The missing \b regular expression special character in Oracle).
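Applied to the query from the question, the pattern above might look like the following sketch. It assumes a release where the Perl-influenced classes \s and \W are available (Oracle 10g Release 2 or later), and it passes the 'i' match parameter so that the lowercase test in the pattern matches the uppercase TEST in the data. The second query and its literal string are illustrative additions, not part of the original answer.

```sql
-- Whole-word match without \b: require a delimiter (or a string edge)
-- on each side of the word. 'i' makes the match case-insensitive.
select 1 from dual
where regexp_like('DOES TEST WORK HERE', '(^|\s|\W)test($|\s|\W)', 'i');
-- expected: one row

-- TEST embedded in a longer word (CONTEST) has a word character in
-- front of it, so neither ^ nor \s nor \W can match there.
select 1 from dual
where regexp_like('DOES CONTEST WORK HERE', '(^|\s|\W)test($|\s|\W)', 'i');
-- expected: no rows
```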
The shortest regex that can check for a whole word in Oracle is
(^|\W)test($|\W)
Details
(^|\W) - a capturing group matching either
    ^ - start of string
    | - or
    \W - a non-word char
test - a word
($|\W) - a capturing group matching either
    $ - end of string
    | - or
    \W - a non-word char.
Note that \W matches any char except letters, digits and _. If you want to match a word that can appear between _ (underscores), you need a slightly different pattern:
(^|[^[:alnum:]])test($|[^[:alnum:]])
The [^[:alnum:]] negated bracket expression matches any char except alphanumeric chars, so it also matches _, and _test_ will be matched with this pattern.
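To see the difference between the two delimiter classes side by side, a small comparison query can help. This is only a sketch: the three sample strings and the column aliases (s, w_pattern, alnum_pattern) are made up for illustration.

```sql
-- Compare \W (non-word: excludes letters, digits and _) with
-- [^[:alnum:]] (non-alphanumeric: _ counts as a delimiter).
select s,
       case when regexp_like(s, '(^|\W)test($|\W)')
            then 'Y' else 'N' end as w_pattern,
       case when regexp_like(s, '(^|[^[:alnum:]])test($|[^[:alnum:]])')
            then 'Y' else 'N' end as alnum_pattern
from (
  select 'a test here' as s from dual union all
  select 'a_test_here'      from dual union all
  select 'contest'          from dual
);
-- expected:
--   a test here   Y  Y   (spaces delimit the word for both patterns)
--   a_test_here   N  Y   (_ is a word char, but not alphanumeric)
--   contest       N  N   (test is preceded by a letter)
```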
In general, I would stick with René's solution, the exception being when you need the match to be zero-length, i.e. you don't want to actually capture the non-word character at the beginning/end.
For example, if our string is
test test
then, in an engine that supports \b, (\b)test(\b) will match twice, but (^|\s|\W)test($|\s|\W) will only match the first occurrence, because the first match consumes the space that the second occurrence would need as its leading delimiter. At least, that's certainly the case if you try to use regexp_substr.
Example
SELECT regexp_substr('test test', '(^|\s|\W)test($|\s|\W)', 1, 1, 'i'),
       regexp_substr('test test', '(^|\s|\W)test($|\s|\W)', 1, 2, 'i')
FROM dual;
Returns
test |NULL
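Two possible workarounds for the consumed-delimiter problem are sketched below. Neither comes from the answers above. The first assumes Oracle 11g or later, where regexp_substr accepts a sixth subexpression argument, and returns only the word without its delimiters. The second doubles every non-word character before counting, so that adjacent words no longer compete for the same delimiter; this is a swapped-in technique, and regexp_count likewise requires 11g or later.

```sql
-- 1) Return only the word itself: the word is wrapped in its own group,
--    and the sixth argument (2) selects that second subexpression.
select regexp_substr('test test', '(^|\W)(test)($|\W)', 1, 1, 'i', 2) as word
from dual;
-- expected: test   (no trailing space)

-- 2) Count whole-word occurrences. Without preprocessing, the first
--    match consumes the only space, so the second word is missed.
select regexp_count('test test', '(^|\W)test($|\W)', 1, 'i') as plain_count,
       regexp_count(regexp_replace('test test', '(\W)', '\1\1'),
                    '(^|\W)test($|\W)', 1, 'i')              as doubled_count
from dual;
-- expected: plain_count = 1, doubled_count = 2
```

Doubling the delimiters changes the string being searched, so it suits counting or existence checks rather than extracting positions from the original text.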