Python code with Unicode strings runs in Eclipse, but raises UnicodeEncodeError when run from the command line or IDLE
I've run into this many times: I decode/encode some Unicode string in Eclipse (PyDev) and it runs fine, just as I expect, but when I launch the same script from the command line (for example), I get encoding errors.
Is there a simple explanation for this? Is Eclipse doing something to the Unicode strings, or handling them in some different way?
EDIT:
Example:
value = u'\u2019'.decode( 'utf-8', 'ignore' )
return value
This works in Eclipse (PyDev), but not if I run it in IDLE or on the command line:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 135: ordinal not in range(128)
Just wanted to add why it worked in PyDev: it has a special sitecustomize that customizes Python through sys.setdefaultencoding to use the encoding of the PyDev console.
Note that the response from bobince is correct: if you have a unicode string, you have to use the encode() method to turn it into a byte string (you'd use decode() if you had a byte string and wanted to turn it into a unicode string).
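To see how two environments differ, it can help to print the encodings the interpreter would fall back on in each one. The following is only a minimal sketch: describe_encodings is a hypothetical helper name, not something from PyDev, and it just reports what the sys module exposes.

```python
import sys


def describe_encodings():
    """Return the encodings this interpreter would fall back on.

    Hypothetical helper for illustration; it only reads values from sys.
    """
    return {
        # Encoding used for implicit str/unicode conversions.
        'default': sys.getdefaultencoding(),
        # Encoding of the console, if the stream reports one.
        'stdout': getattr(sys.stdout, 'encoding', None),
    }


encodings = describe_encodings()
print(encodings)
```

Running this once inside PyDev and once from the command line should show whether the two environments disagree, which would account for a script behaving differently in each.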
Byte strings are DECODED into Unicode strings.
Unicode strings are ENCODED into byte strings.
So if you say someunicodestring.decode, it tries to coerce the Unicode string to a byte string in order to be able to decode it (back to Unicode!). Being an implicit conversion, this encoding step will plump for the default encoding, which may differ between environments and is likely to be the 'safe' value ascii, which will certainly produce the error you mention, as ASCII can't contain the character U+2019. It's almost never a good idea to rely on the default encoding.
So it doesn't make sense to try to decode a Unicode string. I'm pretty sure you mean to encode it instead, with encode('utf-8'). (ignore is redundant for encoding to UTF-8 as there is no character that this encoding can't represent.)
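As a rough illustration of the encode/decode direction described above, here is a small sketch that names the encoding explicitly instead of relying on the default. The variable names are made up for the example, and it sticks to calls that behave the same on Python 2 and Python 3.

```python
# U+2019 is RIGHT SINGLE QUOTATION MARK, the character from the question.
text = u'\u2019'

# Unicode -> bytes: encode, naming the encoding explicitly.
utf8_bytes = text.encode('utf-8')

# bytes -> Unicode: decode with the same encoding.
round_tripped = utf8_bytes.decode('utf-8')

# ASCII has no representation for U+2019, so encoding to it fails.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Only ASCII-safe values are printed, so this runs on any console.
print(repr(utf8_bytes), round_tripped == text, ascii_failed)
```

With the encoding spelled out at both steps, the result no longer depends on which environment launches the script.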