Getting warnings for Python string literals not prefixed with 'u'
To follow best practices for Unicode in Python, you should prefix all literals for strings of characters with 'u'. Is there a tool available (preferably PyDev-compatible) that warns you when you forget it?
Comments (3)
No, not really.

You should prefix literals for strings of characters with 'u'. But not all strings are strings of characters. When you are talking to byte-based components, such as network services or binary files, you need to use byte strings.

For example: want to write a Unicode string into a PNG file? Not sensible. Want to base64-decode the string Y2Fm6Q==? You can't reasonably use a Unicode string here; base64 is explicitly about bytes.

Sure, Python 2 will often let you get away with passing a unicode string where a byte string is expected, but only by automatically encoding to ASCII. If the string contains non-ASCII characters, you are going to get a UnicodeError just as surely as if you had used bytes where unicode was expected. "Unicode is right, bytes are wrong" is a damaging myth. Both kinds of string need to be manipulated.

If you are concerned about the transition to Python 3, you should certainly mark up your character strings as u'', but you should then also mark up your explicitly-bytes strings as b''. Strings where it doesn't matter can be left as '' and allowed to change from byte strings to unicode strings on Python 3. There are lots of cases where Python 2 used bytes and Python 3 uses Unicode, and where that change is appropriate. But there are still plenty of cases where you really do need to be talking bytes, and having those become unicode on Python 3 will cause problems.

(The only problem with this is that the b'' syntax requires Python 2.6 or later, so using it makes your code incompatible with earlier versions.)
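
To make the bytes-versus-characters point concrete, here is a minimal sketch using the base64 string from the answer. The helper names try_decode and try_encode are made up for illustration, and treating the decoded bytes as Latin-1 is only an assumption: the bytes themselves do not say which encoding was meant.

```python
import base64

# base64 works on bytes, so the input literal is marked b''.
raw = base64.b64decode(b'Y2Fm6Q==')      # the four bytes 63 61 66 e9


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeError:
        return None


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the text cannot be represented."""
    try:
        return text.encode(encoding)
    except UnicodeError:
        return None


as_ascii = try_decode(raw, 'ascii')      # None: 0xe9 is not an ASCII byte
as_latin1 = try_decode(raw, 'latin-1')   # u'caf\xe9', assuming Latin-1 was intended

# Python 2's implicit unicode-to-bytes coercion amounts to encoding as ASCII,
# which is why it fails on the first non-ASCII character.
implicit = try_encode(u'caf\xe9', 'ascii')   # None
explicit = try_encode(u'caf\xe9', 'utf-8')   # b'caf\xc3\xa9'
```

The result of b64decode is only meaningful as text once an encoding has been chosen explicitly, which is the reason the answer treats the two kinds of string as separate things.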

You might want to write such a warning-generator tool by parsing Python source code using the parser or dis built-in modules. You may also consider adding such a feature to pylint.
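
As a rough sketch of what such a checker could look like: the version below uses the standard tokenize module rather than parser or dis, because tokens keep the literal's prefix exactly as written in the source. That substitution, the function name find_unprefixed_strings, and the sample source are all assumptions made for illustration. It is written for Python 3 and does not try to distinguish docstrings from other literals.

```python
import io
import tokenize


def find_unprefixed_strings(source):
    """Return (line, column, literal) for each string literal with no prefix."""
    hits = []
    readline = io.StringIO(source).readline
    for tok_type, tok_string, start, _end, _line in tokenize.generate_tokens(readline):
        # A literal with no prefix starts directly with a quote character;
        # u'...', b'...' and r'...' start with a letter instead.
        if tok_type == tokenize.STRING and tok_string[0] in '"\'':
            hits.append((start[0], start[1], tok_string))
    return hits


sample = "a = 'plain'\nb = u'text'\nc = b'raw'\n"
warnings_found = find_unprefixed_strings(sample)
```

Each hit carries a 1-based line number and a 0-based column, so the results could be printed in whatever format an editor or a pylint-style report expects.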

KennyTM's comment should be posted as an answer:

from __future__ import unicode_literals

This future declaration can be used in Python 2.6 and 2.7 and enables Python 3's string syntax, so that unprefixed string literals are Unicode strings and byte strings require a b prefix.
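
The future declaration with this effect is `from __future__ import unicode_literals`. A small sketch of how a module might use it follows; the variable names are illustrative only. On Python 3 the import is accepted but changes nothing, since unprefixed literals are already Unicode there.

```python
# Must be the first statement in the module (after any docstring or comments).
from __future__ import unicode_literals

text = 'caf\xe9'       # unprefixed: a unicode string once the import is active
data = b'caf\xe9'      # explicitly bytes on every version from 2.6 onward

text_type = type(u'')  # unicode on Python 2, str on Python 3
```

With this in place the missing-prefix problem is turned around: character strings need no marker, and only the explicitly-bytes literals have to be tagged with b.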