How to iterate over non-English file names in PHP

Posted on 2025-01-06 07:25:07 · 428 words · 4 views · 0 comments

I have a directory that contains several files, many of which have non-English names. I am using PHP on Windows 7.

I want to list the file names and their contents using PHP.

Currently I am using DirectoryIterator and file_get_contents. This works for English file names but not for non-English (e.g. Chinese) file names.

For example, I have filenames like "एक और प्रोब्लेम.eml", "hello 鶨鶖鵨鶣鎹鎣.eml".

  1. DirectoryIterator is not able to get the filename using ->getFilename()
  2. file_get_contents is also not able to open the file, even if I hard-code the file name in its parameter.

How can I do it?

Comments (3)

春庭雪 2025-01-13 07:25:07

This is not possible. It is a limitation of PHP: PHP calls the multibyte (ANSI) versions of the Windows APIs, so you are limited to the characters your code page can represent.

See this answer.

Directory contents:

D:\Users\Cataphract\Desktop\teste2>dir
 Volume in drive D is GRANDEDISCO
 Volume Serial Number is 945F-DB89

 Directory of D:\Users\Cataphract\Desktop\teste2

01-06-2010  17:16    <DIR>          .
01-06-2010  17:16    <DIR>          ..
01-06-2010  17:15                 0 coptic small letter shima follows ϭ.txt
01-06-2010  17:18                86 teste.php
               2 File(s)             86 bytes
               2 Dir(s)  12.178.505.728 bytes free

Test file contents:

<?php
exec('pause');
foreach (new DirectoryIterator(".") as $v) {
    echo $v."\n";
}

Test file results:

.
..
coptic small letter shima follows ?.txt
teste.php

Debugger output:

Call stack (PHP 5.3.0):

>   php5ts_debug.dll!readdir_r(DIR * dp=0x02f94068, dirent * entry=0x00a7e7cc, dirent * * result=0x00a7e7c0)  Line 80   C
    php5ts_debug.dll!php_plain_files_dirstream_read(_php_stream * stream=0x02b94280, char * buf=0x02b9437c, unsigned int count=260, void * * * tsrm_ls=0x028a15c0)  Line 820 + 0x17 bytes   C
    php5ts_debug.dll!_php_stream_read(_php_stream * stream=0x02b94280, char * buf=0x02b9437c, unsigned int size=260, void * * * tsrm_ls=0x028a15c0)  Line 603 + 0x1c bytes  C
    php5ts_debug.dll!_php_stream_readdir(_php_stream * dirstream=0x02b94280, _php_stream_dirent * ent=0x02b9437c, void * * * tsrm_ls=0x028a15c0)  Line 1806 + 0x16 bytes    C
    php5ts_debug.dll!spl_filesystem_dir_read(_spl_filesystem_object * intern=0x02b94340, void * * * tsrm_ls=0x028a15c0)  Line 199 + 0x20 bytes  C
    php5ts_debug.dll!spl_filesystem_dir_open(_spl_filesystem_object * intern=0x02b94340, char * path=0x02b957f0, void * * * tsrm_ls=0x028a15c0)  Line 238 + 0xd bytes   C
    php5ts_debug.dll!spl_filesystem_object_construct(int ht=1, _zval_struct * return_value=0x02b91f88, _zval_struct * * return_value_ptr=0x00000000, _zval_struct * this_ptr=0x02b92028, int return_value_used=0, void * * * tsrm_ls=0x028a15c0, long ctor_flags=0)  Line 645 + 0x11 bytes  C
    php5ts_debug.dll!zim_spl_DirectoryIterator___construct(int ht=1, _zval_struct * return_value=0x02b91f88, _zval_struct * * return_value_ptr=0x00000000, _zval_struct * this_ptr=0x02b92028, int return_value_used=0, void * * * tsrm_ls=0x028a15c0)  Line 658 + 0x1f bytes   C
    php5ts_debug.dll!zend_do_fcall_common_helper_SPEC(_zend_execute_data * execute_data=0x02bc0098, void * * * tsrm_ls=0x028a15c0)  Line 313 + 0x78 bytes   C
    php5ts_debug.dll!ZEND_DO_FCALL_BY_NAME_SPEC_HANDLER(_zend_execute_data * execute_data=0x02bc0098, void * * * tsrm_ls=0x028a15c0)  Line 423  C
    php5ts_debug.dll!execute(_zend_op_array * op_array=0x02b93888, void * * * tsrm_ls=0x028a15c0)  Line 104 + 0x11 bytes    C
    php5ts_debug.dll!zend_execute_scripts(int type=8, void * * * tsrm_ls=0x028a15c0, _zval_struct * * retval=0x00000000, int file_count=3, ...)  Line 1188 + 0x21 bytes C
    php5ts_debug.dll!php_execute_script(_zend_file_handle * primary_file=0x00a7fad4, void * * * tsrm_ls=0x028a15c0)  Line 2196 + 0x1b bytes C
    php.exe!main(int argc=2, char * * argv=0x028a14c0)  Line 1188 + 0x13 bytes  C
    php.exe!__tmainCRTStartup()  Line 555 + 0x19 bytes  C
    php.exe!mainCRTStartup()  Line 371  C

Is it really a question mark?

dp->fileinfo
{dwFileAttributes=32 ftCreationTime={...} ftLastAccessTime={...} ...}
    dwFileAttributes: 32
    ftCreationTime: {dwLowDateTime=2784934701 dwHighDateTime=30081445 }
    ftLastAccessTime: {dwLowDateTime=2784934701 dwHighDateTime=30081445 }
    ftLastWriteTime: {dwLowDateTime=2784934701 dwHighDateTime=30081445 }
    nFileSizeHigh: 0
    nFileSizeLow: 0
    dwReserved0: 3435973836
    dwReserved1: 3435973836
    cFileName: 0x02f9409c "coptic small letter shima follows ?.txt"
    cAlternateFileName: 0x02f941a0 "COPTIC~1.TXT"
dp->fileinfo.cFileName[34]
63 '?'

Yes! It's character #63.
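
The substitution shown above can also be spotted from PHP code. Windows does not allow '?' in a real file name, so a '?' in a name returned by DirectoryIterator suggests that the name was converted lossily. The following is a minimal sketch of that heuristic, not a complete solution; the function name splitMangledNames is hypothetical, and a name can be mangled without containing a '?', so the check may miss cases.

```php
<?php
/**
 * Split directory entry names into those that look intact and those that
 * look lossily converted. Heuristic only: it assumes the names came from a
 * Windows file system, where '?' is not allowed in a real file name.
 *
 * @param  iterable<string> $names entry names, e.g. collected from getFilename()
 * @return array{ok: list<string>, mangled: list<string>}
 */
function splitMangledNames(iterable $names): array
{
    $result = ['ok' => [], 'mangled' => []];
    foreach ($names as $name) {
        // A '?' here is taken as the replacement for an unrepresentable character.
        $key = (strpos($name, '?') !== false) ? 'mangled' : 'ok';
        $result[$key][] = $name;
    }
    return $result;
}

// Example using the names printed by the test script above.
$names = ['.', '..', 'coptic small letter shima follows ?.txt', 'teste.php'];
$split = splitMangledNames($names);
print_r($split['mangled']);
```

Names that end up in the 'mangled' list cannot be opened by that name, since the '?' no longer identifies the original character.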

莫多说 2025-01-13 07:25:07

Short reply:

Under Windows, you cannot access arbitrary file names with PHP; you are limited to file names that can be represented in the currently selected "code page" (see "Regional and Language Options", "Format" panel, and the "Administrative" tab, "Language for non-Unicode programs").

Longer reply:

Windows has used UTF-16 for file names since Windows 2000, but PHP communicates with the underlying file system as a "non-Unicode aware program". This means that a current "code page table" is used to translate PHP strings to UTF-16 strings and vice versa. From PHP, the current code page can be retrieved with setlocale(), in the form "language_country.codepage", for example:

setlocale(LC_CTYPE, 0) ==> "english_United States.1252"

where 1252 is the Windows code page currently selected in the Control Panel. File names retrieved from the file system are encoded using that code page, and file names generated from PHP must be encoded according to that code page. Things are further complicated by the fact that UTF-16 file names are translated to PHP strings using the "best-fit code page", which is an approximate representation of the actual characters/words, so you cannot trust file names and paths retrieved from the file system: they might be arbitrarily mangled.
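
As a rough illustration of the point above, the code page can be parsed out of the locale string and used to test whether a given UTF-8 name would survive conversion. This is only a sketch under stated assumptions: the helper names codePageFromLocale and isRepresentable are hypothetical, and it assumes the iconv extension is available and knows the code page under the name "CP" followed by the number (true for 1252, not necessarily for every code page).

```php
<?php
/**
 * Extract the numeric code page from a Windows-style locale string such as
 * "english_United States.1252". Returns null when there is no numeric
 * suffix (for example "C" or "en_US.UTF-8").
 */
function codePageFromLocale(string $locale): ?int
{
    return preg_match('/\.(\d+)$/', $locale, $m) === 1 ? (int) $m[1] : null;
}

/**
 * Check whether a UTF-8 file name survives a round trip through the given
 * code page. Assumption: iconv accepts the charset name "CP<number>".
 */
function isRepresentable(string $utf8Name, int $codePage): bool
{
    $charset = 'CP' . $codePage;
    // @ hides the notice iconv raises for characters it cannot convert.
    $encoded = @iconv('UTF-8', $charset, $utf8Name);
    if ($encoded === false) {
        return false;
    }
    // Comparing after converting back also catches silent substitutions.
    return @iconv($charset, 'UTF-8', $encoded) === $utf8Name;
}

// Passing "0" asks setlocale() for the current setting without changing it.
$codePage = codePageFromLocale((string) setlocale(LC_CTYPE, '0'));
var_dump($codePage);

var_dump(isRepresentable('teste.php', 1252));
var_dump(isRepresentable("shima \u{03ED}.txt", 1252)); // U+03ED is the Coptic letter shima
```

A check like this only tells you in advance which names PHP will not be able to address under the current code page; it does not make those files accessible.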

References:

http://en.wikipedia.org/wiki/Windows_code_page
What "Windows code pages" are.

https://bugs.php.net/bug.php?id=47096
More details about this issue.

甜味拾荒者 2025-01-13 07:25:07

To discover the files, I use this script:

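$directory = '.'; // placeholder: set this to the directory you want to list
$content = scandir($directory);
// Build an HTML <select> with one <option> per directory entry.
$list = "<select size=\"5\" name=\"file\" id=\"file\">\n";
foreach ($content as $entry) {
    $list .= '<option>' . htmlspecialchars($entry) . "</option>\n";
}
$list .= "</select>\n";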

This will successfully find the file: 鶨鶖鵨鶣鎹鎣
I only tried it on a Linux distro, though.

To read the file, you can use file(), which returns it line by line:

$lines = file('file.txt');
// Loop through the array of lines; print each line number and escape the
// line so that HTML source is displayed as text.
foreach ($lines as $line_num => $line) {
    // Drop htmlspecialchars() to output the line unescaped.
    print "Line #<b>{$line_num}</b> : " . htmlspecialchars($line) . "<br />\n";
}
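
The two snippets above can be combined into one helper that returns each file name together with its contents, which is what the question asks for. This is a minimal sketch, assuming an environment where PHP sees the real file names (as in the Linux test above); the function name readDirectoryFiles, the temporary directory, and the sample file are all illustrative.

```php
<?php
/**
 * Return [file name => contents] for the regular files in $directory.
 * Assumes PHP can see the real names, as in the Linux test above.
 *
 * @return array<string, string>
 */
function readDirectoryFiles(string $directory): array
{
    $files = [];
    $entries = scandir($directory);
    if ($entries === false) {
        return $files; // the directory could not be read
    }
    foreach ($entries as $entry) {
        $path = $directory . DIRECTORY_SEPARATOR . $entry;
        if (!is_file($path)) {
            continue; // skips ".", ".." and subdirectories
        }
        $contents = file_get_contents($path);
        if ($contents !== false) {
            $files[$entry] = $contents;
        }
    }
    return $files;
}

// Demo in a throwaway directory (path and sample file are illustrative).
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'names_' . uniqid();
mkdir($dir);
file_put_contents($dir . DIRECTORY_SEPARATOR . 'hello 鶨鶖鵨鶣鎹鎣.eml', "Subject: test\n");

$found = readDirectoryFiles($dir);
foreach ($found as $name => $contents) {
    // Escape both values before sending them to a browser.
    echo htmlspecialchars($name), ': ', htmlspecialchars($contents);
}
```

On the Windows setup described in the question, the same code is still subject to the code page limitation explained in the other answers.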