Why do ipython and python handle my string differently?
In python (2.7.1):
>>> x = u'$€%'
>>> x.find('%')
2
>>> len(x)
3
Whereas in ipython:
>>> x = u'$€%'
>>> x.find('%')
4
>>> len(x)
5
What's going on here?
edit: including the additional info requested from the comments below
ipython
>>> import sys, locale
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding(locale.getdefaultlocale()[1])
>>> sys.getdefaultencoding()
'UTF8'
>>> x = u'$€%'
>>> x
u'$\xe2\x82\xac%'
>>> print x
$â¬%
>>> len(x)
5
python
>>> import sys, locale
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding(locale.getdefaultlocale()[1])
>>> sys.getdefaultencoding()
'UTF8'
>>> x = u'$€%'
>>> x
u'$\u20ac%'
>>> print x
$€%
>>> len(x)
3
@nye17 It's officially not a good idea to ever call setdefaultencoding() (it is removed from sys at interpreter startup for a reason). One common culprit is gtk, which causes all kinds of problems, so if IPython has imported gtk, sys.getdefaultencoding() will return utf8. IPython does not set the default encoding itself.
@wim Can I ask what version of IPython you are using? Part of the major overhaul in 0.11 was fixing many unicode bugs, but more do crop up (mostly on Windows, now).
I ran your test case in IPython 0.11, and the behavior of IPython and Python does appear to be the same, so I think this bug is fixed.
Relevant values:
As for an explanation, essentially IPython didn't recognize that input could be unicode. In IPython 0.10, multibyte utf8 input is not respected, so each byte = 1 character, which you can see with:
Whereas what should happen, and what does happen in 0.11, is that y == x.decode(sys.stdin.encoding), not repr(y) == 'u'+repr(x), for a byte string x = '$€%' and the unicode literal y = u'$€%'.
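The code that originally accompanied this answer is not preserved here, so the following is only a minimal sketch of the byte-versus-character behaviour it describes. It assumes a UTF-8 terminal and uses latin-1 purely as a stand-in for "treat every byte as one character"; the variable names are illustrative and not taken from the answer.

```python
# The five UTF-8 bytes a terminal sends for the three characters
# dollar sign, euro sign, percent sign.
raw = b'$\xe2\x82\xac%'

# Roughly what IPython 0.10 did: every byte becomes one character.
# latin-1 is an assumption used only because it maps byte N to code point N.
bytewise = raw.decode('latin-1')

# What plain python and IPython 0.11 do: decode with the input encoding
# (assumed to be UTF-8 here).
decoded = raw.decode('utf-8')

print('%d %d' % (len(bytewise), bytewise.find(u'%')))  # 5 4
print('%d %d' % (len(decoded), decoded.find(u'%')))    # 3 2
```

The first pair matches the ipython session in the question (length 5, '%' at index 4), and the second matches the python session (length 3, '%' at index 2).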
If you check the default encoding in each interpreter (the question's edit does this with sys.getdefaultencoding()), I think you will get different results in python and ipython, possibly one being ascii and the other utf-8, so it should only be a matter of which default encoding each one is choosing.
The other test you can do is to enforce your locale's encoding as the default (the question's edit does this with reload(sys) followed by sys.setdefaultencoding(locale.getdefaultlocale()[1])), then try the test of x in your question.
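The snippet this answer originally pointed to is missing, so here is a hedged sketch of one way to compare the encodings each interpreter is using. The helper name report_encodings is hypothetical; it only reads values and does not call setdefaultencoding().

```python
import locale
import sys


def report_encodings():
    """Collect the encodings that influence how typed text is decoded."""
    return {
        # Encoding used for implicit str/unicode conversions.
        'default': sys.getdefaultencoding(),
        # Encoding of the terminal input; may be None if stdin is not a tty.
        'stdin': getattr(sys.stdin, 'encoding', None),
        # Encoding suggested by the user's locale settings.
        'locale': locale.getpreferredencoding(False),
    }


for name, value in sorted(report_encodings().items()):
    print('%s: %s' % (name, value))
```

Running this in both python and ipython and comparing the three values should show whether the two shells disagree about the input encoding.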