REGEX for complete-word matching

Posted 2024-12-12 16:26:34

OK, so I am confused (obviously).

I'm trying to return rows (from Oracle) where a text field contains a complete word, not just a substring.

A simple example is the word 'I'.

Show me all rows where the string contains the word 'I', but not simply where 'I' appears somewhere as a substring, as in '%I%'.

So I wrote what I thought would be a simple regex:

select REGEXP_INSTR(upper(description), '\bI\b') from mytab;

I expected the I to be detected by the word boundaries. Instead I get no results (or rather, the result is 0 for every row).

What I expect:

  • 'I am the Administrator' -> 1
  • 'I'm the administrator' -> 0
  • 'Am I the administrator' -> 1
  • 'It is the infamous administrator' -> 0
  • 'The administrator, tis I' -> 1
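For reference, a plain substring test (which is what LIKE '%I%' amounts to) accepts every one of these rows. The Python sketch below shows this. Python is used only because it runs without a database. `substring_match` is a hypothetical helper that upper-cases the text, as the query above does.

```python
# Sample rows from the question, with the result the asker expects.
samples = {
    "I am the Administrator": True,
    "I'm the administrator": False,
    "Am I the administrator": True,
    "It is the infamous administrator": False,
    "The administrator, tis I": True,
}

def substring_match(text: str) -> bool:
    """Hypothetical helper: roughly UPPER(description) LIKE '%I%'."""
    return "I" in text.upper()

for text, expected in samples.items():
    print(f"{text!r}: substring={substring_match(text)}, expected={expected}")
```

Every row reports `substring=True`. So the two rows that should be excluded ('I'm the administrator' and 'It is the infamous administrator') slip through.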

Isn't \b supposed to find the word by its word boundaries?

Thanks in advance.

Answers (3)

十二 2024-12-19 16:26:34

I believe \b is not supported by your regex flavor:

http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14251/adfns_regexp.htm#i1007670

Therefore, you could do something like:

(^|\s)word(\s|$)

This at least ensures that your "word" is delimited by whitespace or by the start or end of the string.
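The sketch below checks this pattern against the question's examples. It uses Python's `re` module as a stand-in for Oracle's regex engine. That is an assumption, but it is a reasonable one for this pattern, since `^`, `$`, alternation and `\s` behave essentially the same way in both. `contains_word` is a hypothetical helper name. In Oracle itself the same pattern string would go into REGEXP_LIKE or REGEXP_INSTR.

```python
import re

# Sample rows from the question, with the result the asker expects.
samples = {
    "I am the Administrator": True,
    "I'm the administrator": False,
    "Am I the administrator": True,
    "It is the infamous administrator": False,
    "The administrator, tis I": True,
}

def contains_word(text: str, word: str) -> bool:
    """Hypothetical helper: True if `word` occurs delimited by whitespace
    or by the start/end of `text`, compared case-insensitively."""
    # Same shape as the answer's (^|\s)word(\s|$); re.escape protects
    # against regex metacharacters in `word`.
    pattern = r"(^|\s)" + re.escape(word.upper()) + r"(\s|$)"
    # Upper-case the text, like UPPER(description) in the question.
    return re.search(pattern, text.upper()) is not None

for text, expected in samples.items():
    print(f"{text!r}: got {contains_word(text, 'I')}, expected {expected}")
```

All five rows come out as the question expects. Punctuation directly after the word (for example 'tis I.') is not treated as a delimiter by this pattern. You would have to widen the delimiter set if that matters.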

悍妇囚夫 2024-12-19 16:26:34

Oracle doesn't support word-boundary anchors, but even if it did, you wouldn't get the desired result. \b matches between an alphanumeric ("word") character and a non-alphanumeric character, or at the start/end of the string next to a word character. The exact definition of an alnum differs between implementations, but in most flavors it's [A-Za-z0-9_] (.NET also considers Unicode letters/digits).

So there are two boundaries around the I in %I%. The same goes for I'm: the apostrophe is not a word character, so \bI\b would match that row even though you expect 0.
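Python's `re` module does support `\b`, so it can stand in to illustrate this point (Oracle's engine cannot be used this way). The test strings are upper-cased versions of some of the question's examples, plus `%I%`.

```python
import re

# \b is a zero-width assertion between a word character ([A-Za-z0-9_], plus
# other Unicode letters/digits for str patterns in Python 3) and a non-word
# character or the start/end of the string.
WORD_BOUNDARY = re.compile(r"\bI\b")

tests = [
    "I AM THE ADMINISTRATOR",            # should match
    "I'M THE ADMINISTRATOR",             # question expects no match
    "%I%",                               # the example from this answer
    "IT IS THE INFAMOUS ADMINISTRATOR",  # I only appears inside words
]

for text in tests:
    print(f"{text!r}: {WORD_BOUNDARY.search(text) is not None}")
```

The second line prints True: the apostrophe after I counts as a boundary, which is exactly why `\b` would not give the result the question asks for.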

If you define your word boundary as "whitespace before/after the word", then you could use

(^|\s)I(\s|$)

which would also work at the start/end of the string.
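One practical detail applies when this pattern is used with REGEXP_INSTR. The match includes the leading whitespace, so it starts at the space, not at the I. REGEXP_INSTR with its default arguments returns the 1-based position where the match starts (0 if there is none). So 'Am I the administrator' would give 3 rather than the 1 listed in the question. The Python sketch below mimics that position arithmetic. `instr_like` is a hypothetical helper, and Python's `re` is only a stand-in for Oracle's engine. If all you need is to filter rows, a condition such as REGEXP_LIKE(UPPER(description), '(^|\s)I(\s|$)') in the WHERE clause avoids dealing with positions at all.

```python
import re

PATTERN = re.compile(r"(^|\s)I(\s|$)")

def instr_like(text: str) -> int:
    """Hypothetical helper: 1-based start of the first match, 0 if none,
    mimicking what REGEXP_INSTR returns with its default arguments."""
    m = PATTERN.search(text.upper())
    return m.start() + 1 if m else 0

for text in [
    "I am the Administrator",
    "Am I the administrator",
    "The administrator, tis I",
    "I'm the administrator",
]:
    print(f"{text!r}: {instr_like(text)}")
```

Only the first row reports 1. The other matches report the position of the space in front of I, and the last row reports 0.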

鸠书 2024-12-19 16:26:34

Oracle's native regex support is limited: \b and \< cannot be used as word delimiters. You may want to use Oracle Text for word searches.
