Make PHP pathinfo() return the correct filename if the filename is UTF-8
When using PHP's pathinfo() function on a filename known to be UTF-8, it does not return the correct value unless there are 'normal' characters in front of the special character.
Examples: pathinfo('aä.pdf')
returns:
Array
(
[dirname] => [the dir]
[basename] => aä.pdf
[extension] => pdf
[filename] => aä
)
which is fine and dandy, but pathinfo('äa.pdf')
returns:
Array
(
[dirname] => [the dir]
[basename] => a.pdf
[extension] => pdf
[filename] => a
)
Which is not quite what I was expecting. Even worse, pathinfo('ä.pdf')
returns:
Array
(
[dirname] => [the dir]
[basename] => .pdf
[extension] => pdf
[filename] =>
)
Why does it do this? This goes for all accented characters I have tested.
Comments (7)
Before using pathinfo().
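The snippet this answer pointed to is not preserved here. One plausible reading, offered as an assumption rather than the poster's actual code, is a setlocale() call: the PHP manual describes pathinfo() as locale-aware, so a matching UTF-8 locale has to be active for multibyte names to be parsed correctly. The locale names below are assumptions and must be installed on the system.

```php
<?php
// Assumption: at least one of these UTF-8 locales is installed on the host.
// setlocale() tries each name in turn and returns false if none can be set.
$locale = setlocale(LC_CTYPE, 'en_US.UTF-8', 'C.UTF-8');
if ($locale === false) {
    echo "No UTF-8 locale available; pathinfo() may still mis-parse names\n";
}

// With a UTF-8 locale active, the leading multibyte character should be kept.
$info = pathinfo('/tmp/ä.pdf');
print_r($info);
```

Setting only LC_CTYPE keeps number and date formatting untouched, which is usually what a web application wants.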
I have used these functions in PHP 5.3.3 - 5.3.18 to handle the UTF-8 issue in basename() and pathinfo().
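The functions this answer mentions are not preserved here. As a rough sketch of what a locale-independent replacement could look like, the hypothetical helper utf8_pathinfo() below splits the path on raw bytes. That is safe for UTF-8 because the bytes for '/' and '.' never occur inside a multibyte sequence. It is not the poster's code, and it does not handle trailing slashes the way pathinfo() does.

```php
<?php
// Hypothetical helper (name and behavior are assumptions, not the original answer).
// Splits on the last '/' and the last '.' without consulting the locale.
function utf8_pathinfo(string $path): array
{
    $slash = strrpos($path, '/');
    if ($slash === false) {
        $dirname = '.';                 // same convention as pathinfo()
        $basename = $path;
    } else {
        $dirname = $slash === 0 ? '/' : substr($path, 0, $slash);
        $basename = substr($path, $slash + 1);
    }

    $info = ['dirname' => $dirname, 'basename' => $basename];

    $dot = strrpos($basename, '.');
    if ($dot === false) {
        $info['filename'] = $basename;  // no extension key, as with pathinfo()
    } else {
        $info['extension'] = substr($basename, $dot + 1);
        $info['filename'] = substr($basename, 0, $dot);
    }
    return $info;
}

print_r(utf8_pathinfo('/tmp/ä.pdf'));
```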
A temporary workaround for this problem appears to be to make sure there is a 'normal' character in front of the accented characters, like so:
Note that we replace all occurrences of '/' with '/a', but this is okay, since we return starting at offset 1 of the result. Interestingly enough, the dirname part of pathinfo() does seem to work, so no workaround is needed there.
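The code for this workaround is not preserved here. A minimal sketch of the idea as described (prefix the path and every component with an ASCII character, then drop that character from the result) might look like the following. The function name is hypothetical, and paths ending in a slash are not handled.

```php
<?php
// Hypothetical reconstruction of the described workaround, not the original code.
function basename_with_padding(string $path): string
{
    // Put a 'normal' character in front of the path and of every component,
    // so basename() never sees a multibyte character at the start of a name.
    $padded = 'a' . str_replace('/', '/a', $path);

    // Drop the padding character again: return from offset 1 of the result.
    return substr(basename($padded), 1);
}

echo basename_with_padding('/tmp/ä.pdf'), "\n";
```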
Please refer to "pathinfo() cannot handle argument with special characters like german 'Umlaute'".
When processing ANSI characters, pathinfo() works correctly.
Based on this observation, we convert (encode) the input to ANSI characters and then still call pathinfo(), so that the rest of its behavior is kept.
Finally, we convert (decode) the output values back to the original format.
The demo is as below.
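The demo itself is missing from this copy. One way to sketch the encode, parse, decode idea is with rawurlencode() and rawurldecode(), which turn every non-ASCII byte into a %XX sequence. The choice of these functions and the helper name are assumptions, not necessarily what the answer used.

```php
<?php
// Hypothetical sketch of the encode -> pathinfo() -> decode approach.
function pathinfo_via_ascii(string $path): array
{
    // Encode to plain ASCII, but restore '/' so pathinfo() still sees separators.
    $encoded = str_replace('%2F', '/', rawurlencode($path));

    // pathinfo() now only has to deal with ASCII input.
    $parts = pathinfo($encoded);

    // Decode every returned component back to the original bytes.
    return array_map('rawurldecode', $parts);
}

print_r(pathinfo_via_ascii('/tmp/ä.pdf'));
```

Because rawurlencode() leaves '.' untouched, the extension is still split off by pathinfo() itself.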
As the doc shows,
and the example in the manual