REGEX for whole-word matching
OK, so I am confused (obviously).
I'm trying to return rows (from Oracle) where a text field contains a complete word, not just the substring.
A simple example is the word 'I'.
Show me all rows where the string contains the word 'I', but not simply where 'I' is a substring somewhere, as in '%I%'.
So I wrote what I thought would be a simple regex:
select REGEXP_INSTR(upper(description), '\bI\b') from mytab;
expecting that 'I' would be detected at word boundaries. I get no results (or rather, the result 0 for each row).
What I expect:
- 'I am the Administrator' -> 1
- 'I'm the administrator' -> 0
- 'Am I the administrator' -> 1
- 'It is the infamous administrator' -> 0
- 'The administrator, tis I' -> 1
Isn't \b supposed to find the contained string by word boundary?
TIA
Comments (3)
I believe that \b is not supported by your flavor of regex:
http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14251/adfns_regexp.htm#i1007670
Therefore you could match on whitespace or the ends of the string instead, to at least ensure that your "word" is separated by some whitespace or is the whole string.
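A minimal sketch of the whitespace-delimited approach described here. It assumes the table and column names from the question (mytab, description) and uses the POSIX class [[:space:]]; the exact pattern is an illustration, not necessarily the one the answerer posted.

```sql
-- Sketch only: mytab and description are the names used in the question.
-- [[:space:]] is the POSIX whitespace class; (^|...) and (...|$) let the
-- word also match at the very start or very end of the string.
SELECT description
FROM   mytab
WHERE  REGEXP_LIKE(UPPER(description), '(^|[[:space:]])I([[:space:]]|$)');
```

If REGEXP_INSTR is used with this pattern instead, the returned position can point at the whitespace before the word rather than at the word itself, so it is safer to test the result with > 0 than to compare it to 1.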
Oracle doesn't support word boundary anchors, but even if it did, you wouldn't get the desired result: \b matches between an alphanumeric character and a non-alphanumeric character. The exact definition of what an alnum is differs between implementations, but in most flavors, it's [A-Za-z0-9_] (.NET also considers Unicode letters/digits). So there are two boundaries around the I in %I%. If you define your word boundary as "whitespace before/after the word", then you could match whitespace or the start/end of the string on either side of the word instead.
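To check the "whitespace before/after the word" definition against the sample strings from the question, something like the following could be run. The pattern and the column aliases (s, has_word_i) are assumptions made for illustration; the inline strings are the asker's examples.

```sql
-- Sketch: flag each sample string with 1 if it contains 'I' as a
-- whitespace-delimited word (or at the start/end of the string), else 0.
SELECT s,
       CASE
         WHEN REGEXP_LIKE(UPPER(s), '(^|[[:space:]])I([[:space:]]|$)')
         THEN 1
         ELSE 0
       END AS has_word_i
FROM (
  SELECT 'I am the Administrator'           AS s FROM dual UNION ALL  -- 1
  SELECT 'I''m the administrator'                FROM dual UNION ALL  -- 0
  SELECT 'Am I the administrator'                FROM dual UNION ALL  -- 1
  SELECT 'It is the infamous administrator'      FROM dual UNION ALL  -- 0
  SELECT 'The administrator, tis I'              FROM dual            -- 1
);
```

Because only whitespace counts as a delimiter here, "I'm" is rejected, which is what the question asks for. The trade-off is that an 'I' directly followed by punctuation, such as "I," or "I.", is rejected as well.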
Oracle native regex support is limited. \b or \< cannot be used as word delimiters. You may want Oracle Text for word search.