How to iterate over non-English file names in PHP
I have a directory that contains several files, many of which have non-English names. I am using PHP on Windows 7.
I want to list the file names and their contents using PHP.
Currently I am using DirectoryIterator and file_get_contents. This works for English file names but not for non-English (for example Chinese) file names.
For example, I have file names like "एक और प्रोब्लेम.eml" and "hello 鶨鶖鵨鶣鎹鎣.eml".
DirectoryIterator is not able to get the file name using ->getFilename(), and file_get_contents is not able to open the file even if I hard-code the file name in its parameter.
How can I do it?
This is not possible. It's a limitation of PHP. PHP uses the multibyte versions of Windows APIs; you're limited to the characters your codepage can represent.
See this answer.
Directory contents:
Test file contents:
Test file results:
Debugger output:
Call stack (PHP 5.3.0):
Is it really a question mark?
Yes! It's character #63.
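As a small illustration of the point above: "?" is not a legal character in a Windows file name, so a "?" in a name returned by PHP suggests that a character could not be represented in the current code page and was substituted. The helper name looksMangled below is hypothetical and not part of the original answer; this is only a sketch.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// reports whether a file name returned by PHP contains a literal "?".
// Since "?" is reserved on Windows, its presence in a listed name hints
// that an unrepresentable character was replaced.
function looksMangled($fileName)
{
    return strpos($fileName, '?') !== false;
}

// The question mark is ASCII character 63.
var_dump(ord('?')); // int(63)
```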
Short reply:
Under Windows, you cannot access arbitrary file names with PHP; you are limited to those file names that can be represented in the currently selected "code page" (see "Regional and Language Options", "Format" panel, and the "Administrative" tab panel, "Language for non-Unicode programs").
Longer reply:
Windows has used UTF-16 for file names since Windows 2000, but PHP communicates with the underlying file system as a "non-Unicode aware program". This means that there is a current "code page table" that translates from PHP strings to UTF-16 strings and vice versa. From PHP, the current code page can be retrieved with setlocale() in the form "language_country.codepage", for example:
setlocale(LC_CTYPE, 0) ==> "english_United States.1252"
where 1252 is the Windows code page table currently selected from the control panel. File names retrieved from the file system are encoded using that code page, and file names generated from PHP must be encoded according to that code page. Things are further complicated by the fact that UTF-16 file names are translated to PHP strings using the "best-fit code page", which is an approximate representation of the actual characters or words, so you cannot trust file names and paths retrieved from the file system, as they might be arbitrarily mangled.
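To make the code page limitation concrete, here is a minimal sketch that reads the code page from the locale string and checks whether a UTF-8 file name survives a round trip through it. The helper names codePageFromLocale and isRepresentable are hypothetical, and the sketch assumes the iconv extension is available and recognizes the code page under a name like "CP1252"; when it does not, the check simply reports false.

```php
<?php
// Hypothetical helper: extracts the numeric code page from a Windows-style
// locale string such as "english_United States.1252" and returns it in the
// "CPnnnn" form that iconv commonly accepts. Returns null if the locale has
// no numeric suffix (for example "C" on Linux).
function codePageFromLocale($locale)
{
    if (is_string($locale) && preg_match('/\.(\d+)$/', $locale, $m)) {
        return 'CP' . $m[1];
    }
    return null;
}

// Hypothetical helper: true if the UTF-8 name can be converted to the code
// page and back without loss. Conversion errors are suppressed and treated
// as "not representable".
function isRepresentable($utf8Name, $codePage)
{
    $encoded = @iconv('UTF-8', $codePage, $utf8Name);
    if ($encoded === false) {
        return false;
    }
    return @iconv($codePage, 'UTF-8', $encoded) === $utf8Name;
}

// Passing "0" only queries the current locale; it does not change it.
$locale = setlocale(LC_CTYPE, '0');
$codePage = codePageFromLocale($locale);

if ($codePage === null) {
    echo "No numeric code page in the current locale.\n";
} else {
    $name = "hello 鶨鶖鵨鶣鎹鎣.eml";
    echo $codePage, ': ', isRepresentable($name, $codePage) ? 'ok' : 'not representable', "\n";
}
```

A name that fails this check is one that PHP, as described above, would not be able to pass to the file system intact under that code page.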
References:
http://en.wikipedia.org/wiki/Windows_code_page
What "Windows code pages" are.
https://bugs.php.net/bug.php?id=47096
More details about this issue.
To discover the files I have this script:
This will successfully find the file: 鶨鶖鵨鶣鎹鎣
I tried it on a Linux distro, though.
To read it you use:
Line by line:
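The answer's own snippets are not preserved here. As a rough sketch of the approach it describes (discovering files with DirectoryIterator and reading each one line by line), something like the following could be used. The function name listFilesWithContents and the directory path in the usage comment are hypothetical. As the answer notes, this kind of script was tried on Linux; on Windows the code page limitation described in the other answers may still apply.

```php
<?php
// Hypothetical helper (an assumption, not the original script): returns an
// array mapping each regular file name in $dir to an array of its lines.
function listFilesWithContents($dir)
{
    $result = array();
    foreach (new DirectoryIterator($dir) as $entry) {
        // Skip "." and "..", subdirectories, and anything that is not a file.
        if ($entry->isDot() || !$entry->isFile()) {
            continue;
        }
        $handle = @fopen($entry->getPathname(), 'rb');
        if ($handle === false) {
            continue; // file could not be opened
        }
        $lines = array();
        // Read the file line by line, stripping the trailing line break.
        while (($line = fgets($handle)) !== false) {
            $lines[] = rtrim($line, "\r\n");
        }
        fclose($handle);
        $result[$entry->getFilename()] = $lines;
    }
    return $result;
}

// Usage (the path is a placeholder):
// foreach (listFilesWithContents('/path/to/mail') as $name => $lines) {
//     echo $name, ': ', count($lines), " lines\n";
// }
```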