Python regex - r prefix
Can anyone explain why example 1 below works when the r prefix is not used?
I thought the r prefix must be used whenever escape sequences are used.
Examples 2 and 3 demonstrate this.
# example 1
import re
print (re.sub('\s+', ' ', 'hello there there'))
# prints 'hello there there' - not expected as r prefix is not used
# example 2
import re
print (re.sub(r'(\b\w+)(\s+\1\b)+', r'\1', 'hello there there'))
# prints 'hello there' - as expected as r prefix is used
# example 3
import re
print (re.sub('(\b\w+)(\s+\1\b)+', '\1', 'hello there there'))
# prints 'hello there there' - as expected as r prefix is not used
Comments (5)
Because \ begins an escape sequence only when what follows forms a valid escape sequence. '\s' is not one, so the backslash stays in the string and the pattern reaches re unchanged.
Never rely on raw strings for path literals, as raw strings have some rather peculiar inner workings that are known to have bitten people. In particular, a raw string literal cannot end in a single backslash.
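The code that originally accompanied this answer is not preserved here, so the following is only a rough sketch of the raw-string quirk. The directory names are made up for illustration, and compile() is used just so the invalid literal can be shown as source text without breaking the script.

```python
from pathlib import PureWindowsPath

# r'C:\temp\' is not a valid literal: the final backslash is paired with
# the closing quote, so the string never terminates.
bad_source = "path = r'C:\\temp\\'"   # source text: path = r'C:\temp\'
try:
    compile(bad_source, '<example>', 'exec')
    raw_literal_ok = True
except SyntaxError:
    raw_literal_ok = False
print(raw_literal_ok)  # False

# Two spellings that avoid the problem (hypothetical directory names):
doubled = 'C:\\temp\\'                               # doubled backslashes
joined = str(PureWindowsPath('C:/', 'temp', 'new'))  # let pathlib build it
print(doubled)  # C:\temp\
print(joined)   # C:\temp\new
```

Building the path from components sidesteps the question of which backslashes need escaping.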
The 'r' means that the following is a "raw string", i.e. backslash characters are treated literally instead of signifying special treatment of the following character.
http://docs.python.org/reference/lexical_analysis.html#literals
So '\n' is a single newline, and r'\n' is two characters: a backslash and the letter 'n'. Another way to write it would be '\\n', because the first backslash escapes the second.
An equivalent way of writing example 2 without the r prefix is to double every backslash: '(\\b\\w+)(\\s+\\1\\b)+' for the pattern and '\\1' for the replacement.
Because of the way Python treats characters that are not valid escape characters, not all of those double backslashes are necessary, e.g. '\s' == '\\s'. However, the same is not true for '\b' and '\\b'. My preference is to be explicit and double all the backslashes.
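The comparisons in this answer can be checked directly. This is a minimal sketch: parse_literal is a helper name introduced here for illustration only. It evaluates each literal from source text with warnings silenced, because recent Python versions warn about unrecognized escapes such as '\s' (DeprecationWarning since 3.6, SyntaxWarning since 3.12), and a later version may reject them outright.

```python
import ast
import warnings

def parse_literal(source):
    """Evaluate a string literal given as source text (hypothetical helper)."""
    with warnings.catch_warnings():
        warnings.simplefilter('ignore')  # hide the invalid-escape warning
        return ast.literal_eval(source)

newline = parse_literal(r"'\n'")      # recognized escape: one character
raw_n = parse_literal(r"r'\n'")       # raw string: backslash + 'n'
doubled_n = parse_literal(r"'\\n'")   # escaped backslash + 'n'
print(len(newline), len(raw_n), len(doubled_n))  # 1 2 2
print(raw_n == doubled_n)                        # True

# '\s' is not a recognized escape, so the backslash survives...
s_same = parse_literal(r"'\s'") == parse_literal(r"'\\s'")
# ...but '\b' is one (backspace), so the two spellings differ.
b_same = parse_literal(r"'\b'") == parse_literal(r"'\\b'")
print(s_same, b_same)  # True False
```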
Not all sequences involving backslashes are escape sequences. \t and \f are, for example, but \s is not. In a non-raw string literal, any \ that is not part of an escape sequence is seen as just another \, so '\s' is the same string as '\\s'.
\b is an escape sequence (backspace), however, so example 3 fails. (And yes, some people consider this behaviour rather unfortunate.)
Try that:
Check the example below: