Python's os.path is choking on Hebrew filenames
I'm writing a script that has to move some files around, but unfortunately os.path doesn't seem to play well with internationalization. When I have files named in Hebrew, there are problems. Here's a screenshot of the contents of a directory:
(source: thegreenplace.net)
Now consider this code that goes over the files in this directory:
import os

files = os.listdir('test_source')
for f in files:
    # Build the full path, then check whether it exists on disk
    pf = os.path.join('test_source', f)
    print pf, os.path.exists(pf)
The output is:
test_source\ex True
test_source\joe True
test_source\mie.txt True
test_source\__()'''.txt True
test_source\????.txt False
Notice how os.path.exists thinks that the Hebrew-named file doesn't even exist? How can I fix this?
ActivePython 2.5.2 on Windows XP Home SP2
Hmm, after some digging it appears that when supplying os.listdir with a unicode string, this kind of works:
===>
Some important observations here:
os.listdir (and similar functions, like os.walk) should be passed a unicode string in order to work correctly with unicode paths. Here's a quote from the aforementioned link:
print wants an ASCII string, not unicode, so the path has to be encoded to ASCII.
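The point about passing a unicode string to os.listdir can be illustrated with a small sketch. It assumes Python 3, where an ordinary str is already Unicode (in the Python 2 of this thread the equivalent would be a u'...' literal), and it uses a scratch directory rather than the poster's test_source. The names HEBREW_NAME and list_both_ways are hypothetical and chosen only for this illustration.

```python
import os
import tempfile

# Hypothetical Hebrew file name (four letters plus ".txt"), written with
# escapes so this source file itself stays pure ASCII.
HEBREW_NAME = "\u05e9\u05dc\u05d5\u05dd.txt"


def list_both_ways(directory):
    """Return the directory listing as text names and as raw byte names."""
    text_names = sorted(os.listdir(directory))               # str in -> str out
    byte_names = sorted(os.listdir(os.fsencode(directory)))  # bytes in -> bytes out
    return text_names, byte_names


with tempfile.TemporaryDirectory() as tmp:
    # Create one file with a Hebrew name inside the scratch directory.
    with open(os.path.join(tmp, HEBREW_NAME), "w") as handle:
        handle.write("test")

    text_names, byte_names = list_both_ways(tmp)
    text_path = os.path.join(tmp, text_names[0])
    found = os.path.exists(text_path)

    # print() may fail or show "?" on a console that cannot display Hebrew,
    # so show an ASCII-safe form of the path instead.
    print(ascii(text_path), found)
```

The type of the argument decides the type of the result: a text path yields text names that still identify the file, which is why the existence check succeeds here.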
It looks like a Unicode vs. ASCII issue: os.listdir is returning a list of ASCII strings.
Edit: I tried it on Python 3.0, also on XP SP2, and os.listdir simply omitted the Hebrew filenames instead of listing them at all. According to the docs, this means it was unable to decode them:
It works like a charm using Python 2.5.1 on OS X:
Maybe that means that this has to do with Windows XP somehow?
EDIT: I also tried with unicode strings to try to mimic the Windows behaviour better:
That is in the Terminal (the stock OS X command prompt app). Using IDLE it still worked but didn't print the filename correctly. To make sure it really is unicode there, I checked:
A question mark is the more or less universal symbol displayed when a Unicode character can't be represented in a specific encoding. Your terminal or interactive session under Windows is probably using ASCII or ISO-8859-1 or something similar. So the actual string is unicode, but it gets translated to ???? when printed to the terminal. That's why it works for PEZ, who is using OS X.
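The substitution described here can be reproduced without any terminal involved. The sketch below assumes Python 3 and a hypothetical four-letter Hebrew file name; it does not claim to know which encoding the poster's Windows console was actually using.

```python
# Hypothetical Hebrew file name, written with escapes to keep this source ASCII.
hebrew_name = "\u05e9\u05dc\u05d5\u05dd.txt"

# Encoding to ASCII with errors="replace" swaps every unencodable character
# for a literal "?", which is what a limited console ends up displaying.
lossy = hebrew_name.encode("ascii", "replace")
print(lossy)  # b'????.txt'

# The replacement is one-way: decoding the bytes does not bring the name back.
round_trip = lossy.decode("ascii")
print(round_trip == hebrew_name)  # False
```

Because the conversion is lossy, a name that has already been turned into question marks no longer identifies the original file, so it is safer to keep the unicode path for file operations and encode only at the moment of printing.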