Why can't Python's raw string literals end with a single backslash?
Technically, any odd number of backslashes, as described in the documentation.
>>> r'\'
  File "<stdin>", line 1
    r'\'
       ^
SyntaxError: EOL while scanning string literal
>>> r'\\'
'\\\\'
>>> r'\\\'
  File "<stdin>", line 1
    r'\\\'
         ^
SyntaxError: EOL while scanning string literal
It seems like the parser could just treat backslashes in raw strings as regular characters (isn't that what raw strings are all about?), but I'm probably missing something obvious.
The common misconception about Python's raw strings is that a backslash inside a raw string is just a regular character like all the others. It is NOT. The key to understanding this is what the Python documentation says about raw strings: a character following a backslash is included in the string without change, and all backslashes are left in the string.
So any character following a backslash is part of the raw string. Once the parser enters a raw string (a non-Unicode one) and encounters a backslash, it knows there are 2 characters (the backslash and the character following it).
This way, r'abc\'d' consists of a, b, c, \, ' and d, and r'abc\'' consists of a, b, c, \ and '.
The last case to consider is r'abc\'. According to the documentation, the parser cannot find a closing quote here, because the last quote is part of the string. In other words, a backslash cannot be the last character, as it would 'devour' the closing quote of the string.
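The behaviour described above can be checked directly. The following is a minimal sketch; the variable names and the "<example>" filename are only illustrative, and the failing literal is built with chr(92) so that the example itself stays valid Python.

```python
# Split a few raw literals into their individual characters.
chars_plain = list(r'abc\d')     # backslash followed by d: both are kept
chars_quote = list(r'abc\'d')    # backslash followed by a quote: both are kept
chars_tail = list(r'abc\'')      # the quote after the backslash is content

# The source text r'abc\' has no closing quote left, so it does not compile.
source = "r'abc" + chr(92) + "'"
try:
    compile(source, "<example>", "eval")
    compiles = True
except SyntaxError:
    compiles = False

print(chars_plain, chars_quote, chars_tail, compiles)
```

In each case the backslash stays in the string, and the character after it is never treated as the end of the literal.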
The reason is explained in the part of that section which I highlighted in bold:
So raw strings are not 100% raw, there is still some rudimentary backslash-processing.
That's the way it is! I see it as one of those small defects in Python!
I don't think there's a good reason for it, but it's definitely not a parsing problem; it's really easy to parse raw strings with \ as the last character.
The catch is, if you allow \ to be the last character in a raw string, then you won't be able to put " inside a raw string. It seems Python went with allowing " instead of allowing \ as the last character.
However, this shouldn't cause any trouble.
If you're worried about not being able to easily write Windows folder paths such as c:\mypath\ then worry not, for you can represent them as r"C:\mypath", and if you need to append a subdirectory name, don't do it with string concatenation, for it's not the right way to do it anyway! Use os.path.join.
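As a rough illustration of the os.path.join advice: os.path is platform-dependent, so this sketch uses ntpath (the Windows flavour of os.path, importable on any platform) to get the same result everywhere. The name "subdir" is a made-up example.

```python
import ntpath  # Windows implementation of os.path, available on every platform

base = r"C:\mypath"                       # raw literal without a trailing backslash
with_subdir = ntpath.join(base, "subdir")  # the separator is inserted for us
with_trailing = ntpath.join(base, "")      # an empty last part yields a trailing separator

print(with_subdir)
print(with_trailing)
```

On Windows itself, os.path.join behaves the same way, so the trailing backslash never has to appear in a raw literal.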
To end a raw string with a backslash, I suggest this trick:
It uses the implicit concatenation of string literals in Python and concatenates one string delimited with double quotes with another that is delimited by single quotes. Ugly, but works.
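A minimal sketch of that kind of implicit concatenation, with an illustrative path, might look like this:

```python
# Two adjacent literals are joined at compile time:
# a raw, double-quoted part and a normal, single-quoted escaped backslash.
folder = r"c:\test" '\\'

print(folder)
```

The second literal is not raw, so '\\' is one escaped backslash, and the combined string ends with exactly one backslash.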
Another trick is to use chr(92) as it evaluates to "\".
I recently had to clean a string of backslashes and the following did the trick:
I realize that this does not take care of the "why" but the thread attracts many people looking for a solution to an immediate problem.
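A small sketch of the chr(92) approach, assuming the goal is to strip every backslash from a string or to append one; the variable names and sample text are hypothetical:

```python
backslash = chr(92)   # code point 92 is the backslash character

dirty = r"some\text\with\backslashes"
clean = dirty.replace(backslash, "")   # remove every backslash

folder = r"C:\mypath" + backslash      # append one trailing backslash

print(clean)
print(folder)
```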
Since \" is allowed inside a raw string, it can't be used to identify the end of the string literal.
Why not stop parsing the string literal when you encounter the first "?
If that was the case, then \" wouldn't be allowed inside the string literal. But it is.
The reason why r'\' is syntactically incorrect is that, although the string expression is raw, the quote characters used (single or double) always have to be escaped, since they would otherwise mark the end of the string. So if you want to express a single quote inside a single-quoted string, there is no other way than using \'. The same applies to double quotes. But you could use:
Another user who has since deleted their answer (not sure if they'd like to be credited) suggested that the Python language designers may be able to simplify the parser design by using the same parsing rules and expanding escaped characters to raw form as an afterthought (if the literal was marked as raw).
I thought it was an interesting idea and am including it as community wiki for posterity.
Given the confusion around the arbitrary-seeming restriction against an odd number of backslashes at the end of a Python raw string, it's fair to say that this is a design mistake or a legacy issue originating in a desire for a simpler parser.
While workarounds (such as r'C:\some\path' '\\', yielding 'C:\\some\\path\\' in Python notation, or C:\some\path\ verbatim) are simple, it's counterintuitive to need them. For comparison, let's have a look at C++ and Perl.
In C++, we can straightforwardly use raw string literal syntax
to get the following output:
If we want to use the closing delimiter (above: ")") within the string literal, we can even extend the syntax in an ad-hoc way to R"delimiterString(quotedMaterial)delimiterString". For example, R"asdf(some random delimiters: ( } [ ] { ) < > just for fun)asdf" produces the string some random delimiters: ( } [ ] { ) < > just for fun in the output. (Ain't that a good use of "asdf"!)
In Perl, this code
will output the following:
This is a test.\This is another test.
Replacing the first line by
would lead to an error message:
Can't find string terminator "}" anywhere before EOF at main.pl line 1.
However, Perl treating a pre-delimiter \ as an escape character doesn't prevent the user from having an odd number of backslashes at the end of the resulting string; e.g. to place 3 backslashes \\\ at the end of $str, simply end the code with 6 backslashes: my $str = q{This is a test.\\\\\\};. Importantly, while we need to double the backslashes in the input, there is no Python-like inconsistent-seeming syntactic restriction.
Another way of looking at things is that these 3 languages use different ways to address the parsing issue of interaction between escape characters and closing delimiters:
r'stringWithoutFinalBackslash' '\\'
¹ The custom delimiterString itself cannot be more than 16 characters long, but that's hardly a limitation.
² If you need the delimiter itself, just escape it with \.
However, to be fair in a comparison to Python, we need to acknowledge that (1) C++ didn't have such string literals until C++11 and is famously hard to parse, and (2) Perl is even harder to parse.
Naive raw strings
The naive idea of a raw string is that I can put whatever I want between the quotes and it will mean itself.
Unfortunately, this does not work, because if the "whatever" happens to contain a quote, the raw string would end at that point. It is simply impossible to put "whatever I want" between fixed delimiters, because some of it could look like the terminating delimiter, no matter what that delimiter is.
Real-world raw strings (variant 1)
One possible approach to this problem would be to say that I can put whatever I want between the quotes, as long as it does not contain the quote itself.
This restriction sounds harsh, until one recognizes that Python's large offering of quotes can accommodate most situations with this rule. The following are all valid Python quotes: ', ", ''' and """.
With this many possibilities for the delimiter, almost anything can be made to work. About the only exception would be if the string literal is supposed to contain a complete list of all allowed Python quotes.
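To make the choice of delimiters concrete, here is a small sketch using the four quote styles on made-up sample texts; in each case the delimiter is picked so that it does not occur in the content:

```python
# Pick a delimiter that does not occur in the text, so no escaping is needed.
has_single = r"it's a backslash: \d"             # contains ', so use "
has_double = r'say "hi" to \d'                   # contains ", so use '
has_both = r'''both ' and " plus \d'''           # contains both, so use '''
has_both_alt = r"""both ' and " plus \d"""       # or, equivalently, """

print(has_single, has_double, has_both, has_both_alt, sep="\n")
```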
Real-world raw strings (variant 2, as in Python)
Python, however, takes a different route, using an extended version of the above rule. It effectively states that a quote is allowed between the delimiters after all, as long as I put a backslash before it, and that the backslash then stays in the string.
So the Python approach is, in a sense, even more liberal than variant 1 above, but it has the side effect of "mis"interpreting the closing quote as part of the string if the last intended character of the string is a backslash.
Variant 2 is not helpful:
If I want the quote in my string but not the backslash, the allowed version of my string literal will not be what I need. However, given the three other kinds of quotes I have at my disposal, I will probably just pick one of those and my problem will be solved, so this is not a problematic case.
If I want my string to end with a backslash, I am at a loss. I need to resort to concatenating a non-raw string literal containing the backslash.
Conclusion
After writing this, I agree with several of the other posters that variant 1 would have been easier to understand and to accept, and therefore more Pythonic. That's life!
Coming from C, it's pretty clear to me that a single \ works as an escape character, allowing you to put special characters such as newlines, tabs and quotes into strings.
That does indeed disallow \ as the last character, since it will escape the " and make the parser choke. But as pointed out earlier, \ is legal.
Some tips:
1) If you need to manipulate backslashes in paths, then the standard Python module os.path is your friend. For example:
2) If you want to build strings with backslashes in them BUT without a backslash at the END of the string, then a raw string is your friend (use the 'r' prefix before your string literal). For example:
3) If you need to prefix a string in a variable X with a backslash, then you can do this:
4) If you need to create a string with a backslash at the end, then combine tips 2 and 3:
Now lilypond_statement contains
"\DisplayLilyMusic \upper"
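One way tips 3 and 4 could be combined is sketched below. Only lilypond_statement and its final value come from the post; the other names (backslash, voice_name, lilypond_display) and the exact steps are assumptions made for illustration.

```python
# Tip 3: a raw literal cannot end with a backslash, so put a space after it
# and keep only the first character.
backslash = r'\ '[0]

# Tip 4: build the statement from raw pieces plus the single backslash.
voice_name = 'upper'                        # hypothetical variable
lilypond_display = r'\DisplayLilyMusic '    # raw literal ending with a space
lilypond_statement = lilypond_display + backslash + voice_name

print(lilypond_statement)
```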
Long live Python! :)
n3on
I encountered this problem and found a partial solution which is good for some cases. Although the interpreter displays a string that ends in a backslash with the backslash doubled (that is only its repr), the string can be serialized and saved in a text file with a single backslash at the end. Therefore, if what you need is to save text ending in a single backslash on your computer, it is possible:
BTW, this does not carry over to JSON: if you dump the string using Python's json library, the backslash is escaped and written as \\.
Finally, I work with Spyder, and I noticed that if I open the variable in Spyder's text editor by double-clicking on its name in the variable explorer, it is presented with a single backslash and can be copied to the clipboard that way (it's not very helpful for most needs, but maybe for some).
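The file and JSON behaviour can be checked with a short sketch like the one below. The file name and sample text are hypothetical, and a temporary directory is used so that nothing is left behind.

```python
import json
import os
import tempfile

text = "a string" + chr(92)   # ends with exactly one backslash

# Write the string to a text file and read it back unchanged.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_file.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    with open(path, encoding="utf-8") as handle:
        round_trip = handle.read()

# JSON escapes the backslash in its encoded form, but decoding restores it.
encoded = json.dumps(text)
decoded = json.loads(encoded)

print(repr(round_trip), encoded, repr(decoded))
```

The doubled backslash in the JSON text is part of the JSON encoding, not of the string itself: decoding it gives back the single trailing backslash.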