Python's os.path is choking on Hebrew filenames

Posted 2024-07-13 17:33:28

I'm writing a script that has to move some files around, but unfortunately it seems os.path doesn't play very well with internationalization. When I have files named in Hebrew, there are problems. Here's a screenshot of the contents of a directory:

(Screenshot of the directory contents; source: thegreenplace.net)

Now consider this code that goes over the files in this directory:

import os

# Passing a byte string makes os.listdir return byte-string names (Python 2)
files = os.listdir('test_source')

for f in files:
    pf = os.path.join('test_source', f)
    print pf, os.path.exists(pf)

The output is:

test_source\ex True
test_source\joe True
test_source\mie.txt True
test_source\__()'''.txt True
test_source\????.txt False

Notice how os.path.exists thinks that the Hebrew-named file doesn't even exist?
How can I fix this?

ActivePython 2.5.2 on Windows XP Home SP2

柳若烟 2024-07-20 17:33:28

Hmm, after some digging it appears that when supplying os.listdir a unicode string, this kinda works:

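import os

# A unicode path makes os.listdir return unicode names (Python 2)
files = os.listdir(u'test_source')

for f in files:
    pf = os.path.join(u'test_source', f)
    # Encode with 'replace' so printing cannot fail on the Hebrew characters
    print pf.encode('ascii', 'replace'), os.path.exists(pf)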

===>

test_source\ex True
test_source\joe True
test_source\mie.txt True
test_source\__()'''.txt True
test_source\????.txt True

Some important observations here:

Windows XP (like all NT derivatives) stores all filenames in Unicode.
os.listdir (and similar functions, like os.walk) should be passed a unicode string in order to work correctly with Unicode paths. Here's a quote from the aforementioned link:

"os.listdir(), which returns filenames, raises an issue: should it return the Unicode version of filenames, or should it return 8-bit strings containing the encoded versions? os.listdir() will do both, depending on whether you provided the directory path as an 8-bit string or a Unicode string. If you pass a Unicode string as the path, filenames will be decoded using the filesystem's encoding and a list of Unicode strings will be returned, while passing an 8-bit path will return the 8-bit versions of the filenames."
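The behaviour described in the quote can still be observed in Python 3, where the two string types are str and bytes. A minimal sketch, assuming Python 3 and a filesystem that can store the Hebrew name; the temporary directory and the helper name list_both_ways are made up for illustration and are not part of the original answer:

```python
import os
import tempfile


def list_both_ways(directory):
    """Return (names as str, names as bytes) for the same directory."""
    # A str path yields str names; a bytes path yields bytes names.
    text_names = sorted(os.listdir(directory))
    byte_names = sorted(os.listdir(os.fsencode(directory)))
    return text_names, byte_names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical Hebrew file name, written with escapes to stay ASCII-safe
        hebrew = "\u05e2\u05d1\u05e8\u05d9\u05ea.txt"
        open(os.path.join(tmp, hebrew), "w").close()

        text_names, byte_names = list_both_ways(tmp)
        for name in text_names:
            # ascii() keeps the output printable on any console
            print(ascii(name), os.path.exists(os.path.join(tmp, name)))
        print(byte_names)
```

The type of the argument decides the type of the result, which is the same rule the quote describes for 8-bit and Unicode strings in Python 2.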

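And lastly, print has to encode a unicode string for the console, and it raises UnicodeEncodeError when the console's encoding can't represent the characters, so the path is encoded to ascii with 'replace' before printing.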
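The encode-before-print step can be wrapped in a small helper. This is only a sketch, assuming Python 3; console_safe is a hypothetical name, and falling back to "ascii" when the stream reports no encoding is an assumption made here, not something the original answer states:

```python
import sys


def console_safe(text, encoding=None, errors="backslashreplace"):
    """Return text with characters the target encoding cannot represent escaped."""
    # Assumption: fall back to ASCII when the stream reports no encoding.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # Round-trip through the target encoding so the result is always printable there.
    return text.encode(encoding, errors).decode(encoding)


if __name__ == "__main__":
    path = "test_source\\\u05e2\u05d1\u05e8\u05d9\u05ea.txt"
    # Escapes keep the code points visible for debugging
    print(console_safe(path, "ascii"))
    # 'replace' reproduces the question marks from the answer's output
    print(console_safe(path, "ascii", "replace"))
```

Using "backslashreplace" instead of "replace" keeps the information about which characters were there, which can be more useful in logs than a row of question marks.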

忘羡 2024-07-20 17:33:28

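It looks like a Unicode vs ASCII issue: os.listdir is returning a list of ASCII strings.

Edit: I tried it on Python 3.0, also on XP SP2, and os.listdir simply omitted the Hebrew filenames instead of listing them at all.

According to the docs, this means it was unable to decode them:

"Note that when os.listdir() returns a list of strings, filenames that cannot be decoded properly are omitted rather than raising a UnicodeError."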

柠栀 2024-07-20 17:33:28

It works like a charm using Python 2.5.1 on OS X:

subdir/bar.txt True
subdir/foo.txt True
subdir/עִבְרִית.txt True

Maybe that means that this has to do with Windows XP somehow?

EDIT: I also tried with unicode strings to try to mimic the Windows behaviour better:

import os

# Same loop as in the question, but with unicode paths (Python 2)
for f in os.listdir(u'subdir'):
    pf = os.path.join(u'subdir', f)
    print pf, os.path.exists(pf)

subdir/bar.txt True
subdir/foo.txt True
subdir/עִבְרִית.txt True

That was in Terminal (the stock OS X terminal app). In IDLE it still worked but didn't print the filename correctly. To make sure it really is unicode there, I checked:

>>> os.listdir(u'subdir')[2]
u'\u05e2\u05b4\u05d1\u05b0\u05e8\u05b4\u05d9\u05ea.txt'

酒中人 2024-07-20 17:33:28

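The question mark is the more or less universal symbol displayed when a Unicode character can't be represented in a specific encoding. Your terminal or interactive session under Windows is probably using ASCII or ISO-8859-1 or something similar. So the actual string is unicode, but it gets translated to ???? when printed to the terminal. That's why it works for PEZ, who is using OS X.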
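One way to tell whether the question marks are only a display effect or are actually in the string is to check whether the name survives a round trip through a given encoding. A minimal Python 3 sketch; survives_encoding is a hypothetical helper and the encodings listed are only examples, not the ones the asker's system necessarily used:

```python
def survives_encoding(name, encoding):
    """True if name can be encoded to `encoding` and decoded back unchanged."""
    # 'replace' substitutes '?' for anything the encoding cannot represent,
    # so a lossy encoding produces a different string on the way back.
    return name.encode(encoding, "replace").decode(encoding) == name


if __name__ == "__main__":
    # Hypothetical Hebrew file name, written with escapes
    hebrew = "\u05e2\u05d1\u05e8\u05d9\u05ea.txt"
    for enc in ("ascii", "iso-8859-1", "cp1255", "utf-8"):
        print(enc, survives_encoding(hebrew, enc))
```

If a name does not survive, any byte-string path built from it contains literal '?' characters rather than the original letters. That would be consistent with os.path.exists returning False for the byte-string path in the question, though the thread does not confirm it.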
