Does the regular expression support in the re module include word boundaries (\b)?

Posted 2024-09-28 10:44:19


While I was trying to learn a little more about regular expressions, a tutorial suggested using \b to match a word boundary. However, the following snippet in the Python interpreter does not work as expected:

>>> import re
>>> x = 'one two three'
>>> y = re.search("\btwo\b", x)

If anything had matched, y would be a match object, but it is None.

Is the \b expression not supported in Python, or am I using it wrong?


Comments (5)

霞映澄塘 2024-10-05 10:44:19


You should be using raw strings in your code:

>>> x = 'one two three'
>>> y = re.search(r"\btwo\b", x)
>>> y
<_sre.SRE_Match object at 0x100418a58>
>>> 

Also, you could build the pattern from a variable and compile it case-insensitively:

word = 'two'
re.compile(r'\b%s\b' % word, re.I)

Output:

>>> word = 'two'
>>> k = re.compile(r'\b%s\b' % word, re.I)
>>> x = 'one two three'
>>> y = k.search(x)
>>> y
<_sre.SRE_Match object at 0x100418850>
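One caveat with interpolating a word into a pattern: if the word contains regex metacharacters such as `.` or `+`, they are interpreted as regex syntax. A minimal sketch, assuming Python 3, that wraps the word in `re.escape()` first; the helper name `word_pattern` is hypothetical and only for illustration.

```python
import re

def word_pattern(word, flags=re.IGNORECASE):
    """Hypothetical helper: compile a pattern matching `word` as a whole word.

    re.escape() makes regex metacharacters in `word` (e.g. '.') match literally.
    """
    return re.compile(r"\b%s\b" % re.escape(word), flags)

k = word_pattern("two")
print(k.search("one TWO three"))  # matches 'TWO' because of re.IGNORECASE

# Unescaped, "c.t" would also match "cat"; escaped, only a literal "c.t" matches.
print(word_pattern("c.t").search("a cat sat"))  # None
print(word_pattern("c.t").search("a c.t sat"))  # matches 'c.t'
```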

烟柳画桥 2024-10-05 10:44:19


This will work: re.search(r"\btwo\b", x)

When you write "\b" in an ordinary (non-raw) Python string literal, it is a single character: the backspace "\x08". Either escape the backslash like this:

"\\b"

or write a raw string like this:

r"\b"

懵少女 2024-10-05 10:44:19


Just to explicitly explain why re.search("\btwo\b", x) doesn't work, it's because \b in a Python string is shorthand for a backspace character.

print("foo\bbar")
fobar

So the pattern "\btwo\b" is looking for a backspace, followed by "two", followed by another backspace, which the string you're searching in (x = 'one two three') doesn't contain.

To allow re.search (or re.compile) to interpret the sequence \b as a word boundary, either escape the backslashes ("\\btwo\\b") or use a raw string to create your pattern (r"\btwo\b").
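To confirm that the non-raw pattern really is searching for backspaces, here is a small sketch (Python 3 assumed): it fails on the original text but matches a string that actually contains backspace characters.

```python
import re

pattern = "\btwo\b"  # non-raw literal: really "\x08two\x08"
assert pattern == "\x08two\x08"

# The original text has no backspaces, so nothing matches.
assert re.search(pattern, "one two three") is None

# If the text really contains backspaces, the same pattern does match.
m = re.search(pattern, "one\btwo\bthree")
print(repr(m.group()))  # '\x08two\x08'
```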

浊酒尽余欢 2024-10-05 10:44:19


Python documentation

https://docs.python.org/2/library/re.html#regular-expression-syntax

\b

Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. For example, r'\bfoo\b' matches 'foo', 'foo.', '(foo)', 'bar foo baz' but not 'foobar' or 'foo3'. Inside a character range, \b represents the backspace character, for compatibility with Python’s string literals.
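The documented examples can be checked directly. This is a sketch assuming Python 3: the linked page is the Python 2 documentation, and in Python 3 `str` patterns are Unicode-aware by default, with `re.ASCII` narrowing `\w` (and therefore `\b`) to ASCII letters, digits and underscore. The `café` example is illustrative, not from the docs.

```python
import re

pattern = re.compile(r"\bfoo\b")

# The examples quoted in the documentation above.
texts = ["foo", "foo.", "(foo)", "bar foo baz", "foobar", "foo3"]
for text in texts:
    print(f"{text!r:15} -> {bool(pattern.search(text))}")
# The first four print True; 'foobar' and 'foo3' print False.

# Inside a character class, \b is the backspace character instead.
assert re.search(r"[\b]", "a\bb") is not None
assert re.search(r"[\b]", "a b") is None

# Unicode-aware by default: 'é' counts as a word character...
assert re.search(r"\bcafé\b", "un café noir") is not None
# ...but under re.ASCII it does not, so there is no boundary after it.
assert re.search(r"\bcafé\b", "un café noir", re.ASCII) is None
```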

笑脸一如从前 2024-10-05 10:44:19


Just a note: when building the pattern from a variable, this will not work:

import re

x = 'one two three'
dy = "two"
y = re.search(r"\b" + dy + "\b", x)  # the trailing "\b" is a backspace
print(y) # None

Use r"\b" on both the left and the right:

import re

x = 'one two three'
dy = "two"
y = re.search(r"\b" + dy + r"\b", x)
print(y) # <re.Match object; span=(4, 7), match='two'>
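A sketch of why the mixed version fails and a compact alternative, assuming Python 3.6+ for f-strings: a raw f-string (`rf"..."`) keeps both backslashes literal while still interpolating the variable, and `re.escape()` is added here as an extra precaution so any metacharacters in `dy` are matched literally.

```python
import re

x = 'one two three'
dy = "two"

# Mixing raw and non-raw pieces: the trailing "\b" is a backspace, not \b.
broken = r"\b" + dy + "\b"
assert broken.endswith("\x08")

# A raw f-string keeps both "\b" sequences as backslash + "b".
fixed = rf"\b{re.escape(dy)}\b"
assert fixed == r"\btwo\b"
print(re.search(fixed, x))  # match object for 'two'
```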