Why can't Python's raw string literals end with a single backslash?

Posted 2024-07-15 06:45:14, 510 words, 10 views, 0 comments

Technically, any odd number of backslashes, as described in the documentation.

>>> r'\'
  File "<stdin>", line 1
    r'\'
       ^
SyntaxError: EOL while scanning string literal
>>> r'\\'
'\\\\'
>>> r'\\\'
  File "<stdin>", line 1
    r'\\\'
         ^
SyntaxError: EOL while scanning string literal

It seems like the parser could just treat backslashes in raw strings as regular characters (isn't that what raw strings are all about?), but I'm probably missing something obvious.
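
To reproduce the pattern above without typing broken literals by hand, one option is to pass candidate literals to the built-in compile() and catch the SyntaxError. This is only a sketch: the helper name is_valid_literal is made up for illustration, and chr(92) stands in for a backslash so the example itself needs no tricky escaping.

```python
def is_valid_literal(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<literal>", "eval")
        return True
    except SyntaxError:
        return False


BACKSLASH = chr(92)

# Build raw-string literals that end in 1, 2, 3 and 4 backslashes.
for count in range(1, 5):
    literal = "r'" + BACKSLASH * count + "'"
    print(count, literal, is_valid_literal(literal))
```

Odd counts are rejected and even counts compile, which matches the REPL session in the question.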

Comments (14)

爱,才寂寞 2024-07-22 06:45:14

The whole misconception about Python's raw strings is that most people think a backslash (within a raw string) is just a regular character like any other. It is NOT. The key to understanding this is the following sentence from Python's documentation:

When an 'r' or 'R' prefix is present, a character following a
backslash is included in the string without change, and all
backslashes are left in the string

So any character following a backslash is part of the raw string. Once the parser enters a raw string (a non-Unicode one) and encounters a backslash, it knows there are 2 characters (a backslash and the character following it).

This way:

r'abc\d' comprises a, b, c, \, d

r'abc\'d' comprises a, b, c, \, ', d

r'abc\'' comprises a, b, c, \, '

and:

r'abc\' comprises a, b, c, \, ' but there is no terminating quote now.

The last case shows that, according to the documentation, the parser cannot find a closing quote, because the last quote you see above is part of the string; i.e., a backslash cannot come last here, as it would 'devour' the string's closing character.
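
The rule described in this answer can be sketched as a tiny scanner. This is not CPython's tokenizer, just a simplified illustration that assumes a single-line literal with the r prefix already stripped; the function name find_closing_quote is hypothetical.

```python
def find_closing_quote(source, quote="'"):
    """Return the index of the closing quote in `source`, or -1 if none.

    `source` starts at the opening quote of the literal.
    """
    i = 1  # skip the opening quote
    while i < len(source):
        ch = source[i]
        if ch == chr(92):
            # A backslash always takes the next character with it,
            # even when that character is the quote.
            i += 2
        elif ch == quote:
            return i
        else:
            i += 1
    return -1


BS = chr(92)
print(find_closing_quote("'abc" + BS + "d'"))   # literal text: 'abc\d'
print(find_closing_quote("'abc" + BS + "''"))   # literal text: 'abc\''
print(find_closing_quote("'abc" + BS + "'"))    # literal text: 'abc\'
```

The first two calls find the closing quote at index 6; the third returns -1 because the only remaining quote was consumed together with the backslash.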

残花月 2024-07-22 06:45:14

The reason is explained in the part of that section which I highlighted in bold:

String quotes can be escaped with a
backslash,
but the backslash remains
in the string; for example, r"\"" is a
valid string literal consisting of two
characters: a backslash and a double
quote; r"\" is not a valid string
literal (even a raw string cannot end
in an odd number of backslashes).
Specifically, a raw string cannot end
in a single backslash (since the
backslash would escape the following
quote character). Note also that a
single backslash followed by a newline
is interpreted as those two characters
as part of the string, not as a line
continuation.

So raw strings are not 100% raw; there is still some rudimentary backslash processing.
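
The two claims in the quoted passage can be checked directly. A small sketch (the variable names are only illustrative):

```python
# r"\"" is a valid literal holding two characters: a backslash and a quote.
two_chars = r"\""
print(len(two_chars))
print(two_chars == chr(92) + '"')

# In a raw string, backslash + newline stays in the string as two characters.
raw_text = r"""line one\
line two"""

# In an ordinary string, the same sequence is a line continuation and vanishes.
plain_text = """line one\
line two"""

print(len(raw_text), len(plain_text))
```

Here raw_text keeps both the backslash and the newline, so it is two characters longer than plain_text.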

水水月牙 2024-07-22 06:45:14

That's the way it is! I see it as one of those small defects in python!

I don't think there's a good reason for it, but it's definitely not a parsing limitation; it's really easy to parse raw strings with \ as the last character.

The catch is, if you allow \ to be the last character in a raw string then you won't be able to put " inside a raw string. It seems python went with allowing " instead of allowing \ as the last character.

However, this shouldn't cause any trouble.

If you're worried about not being able to easily write Windows folder paths such as c:\mypath\, worry not: you can represent them as r"C:\mypath", and if you need to append a subdirectory name, don't do it with string concatenation, which isn't the right way to do it anyway. Use os.path.join:

>>> import os
>>> os.path.join(r"C:\mypath", "subfolder")
'C:\\mypath\\subfolder'
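
Note that os.path.join uses the separator of the platform it runs on, so the output above is what you would see on Windows. As a sketch that behaves the same on any platform, the standard library's ntpath module (the Windows flavour of os.path) and pathlib.PureWindowsPath can be used instead; the path values here are just examples.

```python
import ntpath
from pathlib import PureWindowsPath

base = r"C:\mypath"

# Join with Windows rules regardless of the host platform.
joined = ntpath.join(base, "subfolder")
print(joined)

# An empty last component yields a trailing separator,
# which sidesteps the raw-string limitation entirely.
with_trailing = ntpath.join(base, "")
print(with_trailing)

# pathlib offers the same join through the / operator.
as_path = PureWindowsPath(base) / "subfolder"
print(as_path)
```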
你与昨日 2024-07-22 06:45:14

To end a raw string with a backslash, you can use this trick:

>>> print(r"c:\test" '\\')
c:\test\

It uses the implicit concatenation of string literals in Python and concatenates one string delimited with double quotes with another that is delimited by single quotes. Ugly, but works.
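
A slightly fuller sketch of the same trick, comparing the result with the fully escaped spelling (the variable names are only illustrative):

```python
# Adjacent string literals are joined at compile time, so a raw literal
# can be followed by an ordinary one that supplies the final backslash.
path = r"c:\test" "\\"

# The same string written as a single non-raw literal.
same = "c:\\test\\"

print(path)
print(path == same)
print(len(path))
```

The raw part keeps \t as a backslash followed by t rather than a tab, and the result is eight characters long, ending in exactly one backslash.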

伪装你 2024-07-22 06:45:14

Another trick is to use chr(92) as it evaluates to "\".

I recently had to clean a string of backslashes and the following did the trick:

CleanString = DirtyString.replace(chr(92),'')

I realize that this does not take care of the "why" but the thread attracts many people looking for a solution to an immediate problem.
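
For completeness, a runnable version of the idea, with made-up sample data:

```python
BACKSLASH = chr(92)  # code point 92 is the backslash character

# chr(92) and the escaped literal are the same one-character string.
print(BACKSLASH == "\\")

dirty = r"a\b\c"                      # sample input: a, \, b, \, c
clean = dirty.replace(BACKSLASH, "")  # strip every backslash
print(clean)
```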

且行且努力 2024-07-22 06:45:14

Since \" is allowed inside a raw string, that sequence can't be used to identify the end of the string literal.

Why not stop parsing the string literal when you encounter the first "?

If that was the case, then \" wouldn't be allowed inside the string literal. But it is.

单身情人 2024-07-22 06:45:14

The reason r'\' is syntactically incorrect is that, although the string expression is raw, the quotes used (single or double) always have to be escaped, since they would otherwise mark the end of the string. So if you want to express a single quote inside a single-quoted string, there is no other way than using \'. The same applies to double quotes.

But you could use:

'\\'
江湖彼岸 2024-07-22 06:45:14

Another user who has since deleted their answer (not sure if they'd like to be credited) suggested that the Python language designers may have been able to simplify the parser design by using the same parsing rules for both kinds of literal and expanding escaped characters back to raw form as an afterthought (if the literal was marked as raw).

I thought it was an interesting idea and am including it as community wiki for posterity.

撞了怀 2024-07-22 06:45:14

Given the confusion around the arbitrary-seeming restriction against an odd number of backslashes at the end of a Python raw-string, it's fair to say that this is a design mistake or legacy issue originating in a desire to have a simpler parser.

While workarounds (such as r'C:\some\path' '\\' yielding (in Python notation:) 'C:\\some\\path\\' or (verbatim:) C:\some\path\) are simple, it's counterintuitive to be needing them. For comparison, let's have a look at C++ and Perl.


In C++, we can straightforwardly use raw string literal syntax

#include <iostream>

int main() {
    std::cout << R"(Hello World!)" << std::endl;
    std::cout << R"(Hello World!\)" << std::endl;
    std::cout << R"(Hello World!\\)" << std::endl;
    std::cout << R"(Hello World!\\\)" << std::endl;
}

to get the following output:

Hello World!
Hello World!\
Hello World!\\
Hello World!\\\

If we want to use the closing delimiter (above: )) within the string literal, we can even extend the syntax in an ad-hoc way to R"delimiterString(quotedMaterial)delimiterString". For example, R"asdf(some random delimiters: ( } [ ] { ) < > just for fun)asdf" produces the string some random delimiters: ( } [ ] { ) < > just for fun in the output. (Ain't that a good use of "asdf"!)


In Perl, this code

my $str = q{This is a test.\\};
print ($str);
print ("This is another test.\n");

will output the following: This is a test.\This is another test.

Replacing the first line by

my $str = q{This is a test.\};

would lead to an error message: Can't find string terminator "}" anywhere before EOF at main.pl line 1.

However, Perl treating a pre-delimiter \ as an escape character doesn't prevent the user from having an odd number of backslashes at the end of the resulting string; e.g., to place 3 backslashes \\\ at the end of $str, simply end the code with 6 backslashes: my $str = q{This is a test.\\\\\\};. Importantly, while we need to double the backslashes in the input, there is no Python-like inconsistent-seeming syntactic restriction.


Another way of looking at things is that these 3 languages use different ways to address the parsing issue of interaction between escape characters and closing delimiters:

  • Python: disallows an odd number of backslashes just before the closing delimiter; a simple workaround is r'stringWithoutFinalBackslash' '\\'
  • C++: allows essentially¹ everything between the delimiters
  • Perl: allows essentially² everything between the delimiters, but backslashes need to be consistently doubled

¹ The custom delimiterString itself cannot be more than 16 characters long, but that's hardly a limitation.

² If you need the delimiter itself, just escape it with \.

However, to be fair in a comparison to Python, we need to acknowledge that (1) C++ didn't have such string literals until C++11 and is famously hard to parse and (2) Perl is even harder to parse.

影子的影子 2024-07-22 06:45:14

Naive raw strings

The naive idea of a raw string is

If I put an r in front of a pair of quotes,
I can put whatever I want between the quotes
and it will mean itself.

Unfortunately, this does not work, because if the whatever
happens to contain a quote, the raw string would end at that point.

It is simply impossible that I can put "whatever I want"
between fixed delimiters, because some of it could look like
the terminating delimiter -- no matter what that delimiter is.

Real-world raw strings (variant 1)

One possible approach to this problem would be to say

If I put an r in front of a pair of quotes,
I can put whatever I want between the quotes
as long as it does not contain a quote
and it will mean itself.

This restriction sounds harsh, until one recognizes that
Python's large offering of quotes can accommodate most situations
with this rule. The following are all valid Python quotes:

'
"
'''
"""

With this many possibilities for the delimiter, almost anything
can be made to work.
About the only exception would be if the string
literal is supposed to contain a complete list of all allowed
Python quotes.
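
As a small sketch of how far the choice of delimiter alone gets you (the sample strings are made up):

```python
# Pick a delimiter that does not occur in the content.
apostrophe = r"it's"             # double quotes hold a single quote
speech = r'say "hi"'             # single quotes hold double quotes
both = r'''it's "quoted"'''      # triple quotes hold both kinds

for text in (apostrophe, speech, both):
    print(text, len(text))
```

None of these needs a backslash, so each literal means exactly what is written between its delimiters.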

Real-world raw strings (variant 2, as in Python)

Python, however, takes a different route using
an extended version of the above rule.
It effectively states

If I put an r in front of a pair of quotes,
I can put whatever I want between the quotes
as long as it does not contain a quote
and it will mean itself.
If I insist on including a quote, even that is allowed,
but I have to put a backslash before it.

So the Python approach is, in a sense, even more liberal
than variant 1 above -- but it has the side effect of
"mis"interpreting the closing quote as part of the string
if the last intended character of the string is a backslash.

Variant 2 is unhelpful in two cases:

  • If I want the quote in my string,
    but not the backslash, the allowed version of my string literal
    will not be what I need.
    However, given the three different other kinds of quotes I have
    at my disposal, I will probably just pick one of those and my
    problem will be solved -- so this is not a problematic case.
  • The problematic case is this one:
    If I want my string to end with a backslash, I am at a loss.
    I need to resort to concatenating a non-raw string literal
    containing the backslash.

Conclusion

After writing this, I agree with several of the other posters
that variant 1 would have been easier to understand and to accept
and therefore more Pythonic. That's life!

卷耳 2024-07-22 06:45:14

Coming from C, it's pretty clear to me that a single \ works as an escape character, allowing you to put special characters such as newlines, tabs and quotes into strings.

That does indeed disallow \ as the last character, since it will escape the " and make the parser choke. But as pointed out earlier, \ is legal.

寄人书 2024-07-22 06:45:14

Some tips:

1) If you need to manipulate backslashes in paths, the standard Python module os.path is your friend. For example:

os.path.normpath('c:/folder1/')

2) If you want to build strings containing backslashes BUT without a backslash at the END of the string, a raw string is your friend (use the 'r' prefix before your string literal). For example:

r'\one \two \three'

3) If you need to prefix a string in a variable X with a backslash, you can do this:

X='dummy'
bs=r'\ ' # don't forget the space after backslash or you will get EOL error
X2=bs[0]+X  # X2 now contains \dummy

4) If you need to create a string with a backslash at the end, combine tips 2 and 3:

voice_name='upper'
lilypond_display=r'\DisplayLilyMusic \ ' # don't forget the space at the end
lilypond_statement=lilypond_display[:-1]+voice_name

Now lilypond_statement contains "\DisplayLilyMusic \upper".

Long live Python! :)

n3on

小忆控 2024-07-22 06:45:14

Despite its role, even a raw string cannot end in a single
backslash, because the backslash escapes the following quote
character—you still must escape the surrounding quote character to
embed it in the string. That is, r"...\" is not a valid string
literal—a raw string cannot end in an odd number of backslashes.
If you need to end a raw string with a single backslash, you can use
two and slice off the second.
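
The "use two and slice off the second" suggestion looks like this in practice; the path is just an example value.

```python
# Write two trailing backslashes (an even count is allowed),
# then drop the last character.
with_slash = r"C:\some\path\\"[:-1]

print(with_slash)
print(with_slash.endswith(chr(92)))
```

Unlike literal concatenation, the slice happens on the finished string, so the same approach also works on a raw string that has already been stored in a variable.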

无敌元气妹 2024-07-22 06:45:14

I encountered this problem and found a partial solution that is good for some cases. Although Python displays a string ending in a backslash with the backslash doubled (that is only its repr), the string can be saved in a text file with a single backslash at the end. So if what you need is to save text ending in a single backslash on your computer, it is possible:

x = 'a string\\' 
x
'a string\\' 

# Now save it in a text file and it will appear with a single backslash:

with open("my_file.txt", 'w') as h:
    h.write(x)

By the way, this does not carry over to JSON: if you dump the string with Python's json library, the backslash is written out escaped (doubled).

Finally, I work with Spyder, and I noticed that if I open the variable in Spyder's text editor by double-clicking its name in the variable explorer, it is presented with a single backslash and can be copied to the clipboard that way (not very helpful for most needs, but maybe for some).
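
A sketch that separates what the string contains from how it is displayed. It uses an in-memory io.StringIO buffer in place of my_file.txt so that it runs without touching the disk; that substitution is an assumption made for the example.

```python
import io
import json

x = 'a string\\'

print(repr(x))   # the repr doubles the backslash for display
print(x)         # the actual text ends in one backslash
print(len(x))    # 8 characters of text plus 1 backslash

# Writing the string stores exactly one backslash.
buffer = io.StringIO()
buffer.write(x)
print(buffer.getvalue().endswith(chr(92)))

# JSON has its own escaping: the encoded text doubles the backslash,
# but decoding restores the original string.
encoded = json.dumps(x)
print(encoded)
print(json.loads(encoded) == x)
```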
