Python code with Unicode strings runs in Eclipse, but raises UnicodeEncodeError when run from the command line or IDLE.

Posted on 2024-11-30 05:45:31

I've run into this a lot: I decode/encode some Unicode string in Eclipse (PyDev) and it runs fine, just as I expected, but when I launch the same script from the command line (for example), I get encoding errors.

Is there a simple explanation for this? Is Eclipse doing something to the Unicode, or handling it in some different way?

EDIT:

Example:

value = u'\u2019'.decode( 'utf-8', 'ignore' )
return value

This works in Eclipse (PyDev) but not if I run it in IDLE or on the command line.

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 135: ordinal not in range(128)

时光病人 2024-12-07 05:45:31

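Just wanted to add why it works in PyDev: it has a special sitecustomize that customizes Python, through sys.setdefaultencoding, to use the encoding of the PyDev console.

Note that bobince's response is correct: if you have a unicode string, you have to use the encode() method to turn it into a proper (byte) string, and decode() if you have a byte string and want to turn it into unicode.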
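
If you want to compare the two environments yourself, something along these lines may help. It is only a rough sketch: describe_encodings is a hypothetical helper name (not part of PyDev or Python), and the last lines just show the explicit encode()/decode() directions mentioned above, using UTF-8 as an assumed target encoding.

```python
import sys


def describe_encodings():
    """Return the encodings this interpreter reports (hypothetical helper)."""
    return {
        'default': sys.getdefaultencoding(),
        # Some environments replace sys.stdout with an object that has no
        # 'encoding' attribute, hence the getattr fallback.
        'stdout': getattr(sys.stdout, 'encoding', None),
    }


if __name__ == '__main__':
    # Run this once from PyDev and once from the command line, then compare.
    for name, value in sorted(describe_encodings().items()):
        print('%s: %s' % (name, value))

# Explicit conversions that do not depend on the default encoding.
text = u'\u2019'
data = text.encode('utf-8')      # Unicode string -> byte string
restored = data.decode('utf-8')  # byte string -> Unicode string
```

If the 'default' value differs between PyDev and the command line, that would be consistent with the sitecustomize explanation above.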

孤独陪着我 2024-12-07 05:45:31

value = u'\u2019'.decode( 'utf-8', 'ignore' )

Byte strings are DECODED into Unicode strings.

Unicode strings are ENCODED into byte strings.

So if you say someunicodestring.decode, it tries to coerce the Unicode string to a byte string, in order to be able to decode it (back to Unicode!). Being an implicit conversion, this encoding step will plump for the default encoding, which may differ between different environments, and is likely to be the ‘safe’ value ascii, which will certainly produce the error you mention as ASCII can't contain the character U+2019. It's almost never a good idea to rely on the default encoding.
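
To make the hidden step visible, here is a small sketch that spells out the two conversions by hand. The function name and its default_encoding parameter are hypothetical stand-ins for what the interpreter does implicitly, and the UTF-8 case assumes that is the encoding the PyDev console ends up using.

```python
text = u'\u2019'  # a Unicode string: RIGHT SINGLE QUOTATION MARK


def decode_unicode_explicitly(value, default_encoding='ascii'):
    """Mimic u'...'.decode('utf-8', 'ignore') with the implicit step written out.

    default_encoding is a hypothetical parameter standing in for the
    interpreter's default encoding.
    """
    raw = value.encode(default_encoding)  # the implicit coercion to bytes
    return raw.decode('utf-8', 'ignore')  # the decode that was asked for


try:
    decode_unicode_explicitly(text)
    outcome = 'no error'
except UnicodeEncodeError:
    # ASCII cannot represent U+2019, so the coercion step fails.
    outcome = 'UnicodeEncodeError'

# Assumption: with a UTF-8 default the coercion succeeds and the call round-trips.
round_trip = decode_unicode_explicitly(text, default_encoding='utf-8')
```

Note that the failure is an encode error even though the code called decode, which matches the traceback in the question.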

So it doesn't make sense to try to decode a Unicode string. I'm pretty sure you mean: