Make PHP pathinfo() return the correct filename when the filename is UTF-8

Posted on 2024-10-07 22:39:15

When using PHP's pathinfo() function on a filename known to be UTF-8, it does not return the correct value unless there is a 'normal' (ASCII) character in front of the special character.

Examples:
pathinfo('aä.pdf') returns:

Array
(
[dirname] => [the dir]
[basename] => aä.pdf
[extension] => pdf
[filename] => aä
)  

which is fine and dandy, but pathinfo('äa.pdf') returns:

Array
(
[dirname] => [the dir]
[basename] => a.pdf
[extension] => pdf
[filename] => a
)  

which is not quite what I was expecting. Even worse, pathinfo('ä.pdf') returns:

Array
(
[dirname] => [the dir]
[basename] => .pdf
[extension] => pdf
[filename] => 
)  

Why does it do this? It happens with every accented character I have tested.

Comments (7)

浪荡不羁 2024-10-14 22:39:15

Set a UTF-8 locale before calling pathinfo():

setlocale(LC_ALL,'en_US.UTF-8');
pathinfo($OriginalName, PATHINFO_FILENAME);
pathinfo($OriginalName, PATHINFO_BASENAME);
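
The call above only helps if the `en_US.UTF-8` locale is actually installed; `setlocale()` returns `false` when it is not, and the result then stays unchanged. A minimal sketch that tries several candidate names and checks the result follows. The helper name `use_utf8_ctype` and the candidate list are assumptions, and it sets only `LC_CTYPE` on the assumption that character classification is the category that matters here.

```php
<?php
// Hypothetical helper: try a few common UTF-8 locale names and report
// which one (if any) the system accepted.
function use_utf8_ctype(array $candidates = ['C.UTF-8', 'en_US.UTF-8', 'en_US.utf8'])
{
    // setlocale() accepts an array of names and returns the first locale
    // it could set, or false if none of them is available.
    return setlocale(LC_CTYPE, $candidates);
}

$locale = use_utf8_ctype();
if ($locale === false) {
    // No UTF-8 locale installed: pathinfo() may still mangle multibyte names.
    echo "No UTF-8 locale available\n";
} else {
    echo "LC_CTYPE set to {$locale}\n";
    var_dump(pathinfo('ä.pdf', PATHINFO_FILENAME));
}
```

Checking the return value makes a missing locale visible instead of leaving the original symptom in place.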
£烟消云散 2024-10-14 22:39:15

I have used these functions in PHP 5.3.3 through 5.3.18 to work around the UTF-8 issue in basename() and pathinfo().

if (!function_exists("mb_basename"))
{
  function mb_basename($path)
  {
    $separator = " qq ";
    $path = preg_replace("/[^ ]/u", $separator."\$0".$separator, $path);
    $base = basename($path);
    $base = str_replace($separator, "", $base);
    return $base;
  }
}
if (!function_exists("mb_pathinfo"))
{
  function mb_pathinfo($path, $opt = "")
  {
    $separator = " qq ";
    $path = preg_replace("/[^ ]/u", $separator."\$0".$separator, $path);
    if ($opt == "") $pathinfo = pathinfo($path);
    else $pathinfo = pathinfo($path, $opt);

    if (is_array($pathinfo))
    {
      $pathinfo2 = $pathinfo;
      foreach($pathinfo2 as $key => $val)
      {
        $pathinfo[$key] = str_replace($separator, "", $val);
      }
    }
    else if (is_string($pathinfo)) $pathinfo = str_replace($separator, "", $pathinfo);
    return $pathinfo;
  }
}
灵芸 2024-10-14 22:39:15

A temporary work-around for this problem appears to be to make sure there is a 'normal' character in front of the accented characters, like so:

function getFilename($path)
{
    // if there's no '/', we're probably dealing with just a filename
    // so just put an 'a' in front of it
    if (strpos($path, '/') === false)
    {
        $path_parts = pathinfo('a'.$path);
    }
    else
    {
        $path= str_replace('/', '/a', $path);
        $path_parts = pathinfo($path);
    }
    return substr($path_parts["filename"],1);
}

Note that we replace all occurrences of '/' with '/a' but this is okay, since we return starting at offset 1 of the result. Interestingly enough, the dirname part of pathinfo() does seem to work, so no workaround is needed there.
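
Because UTF-8 never reuses the bytes for '/' or '.' inside a multibyte character, the name can also be split on raw bytes without calling pathinfo() at all, which sidesteps the locale question. The following is a minimal sketch under that assumption; `utf8_pathinfo` is a hypothetical helper, it handles only '/' as a separator, and it does not treat trailing slashes the way pathinfo() does.

```php
<?php
// Hypothetical helper, not part of PHP: split a path on raw bytes.
// In UTF-8 the bytes for '/' and '.' never occur inside a multibyte
// sequence, so plain byte-string functions are safe here.
function utf8_pathinfo(string $path): array
{
    $slash = strrpos($path, '/');
    $basename = $slash === false ? $path : substr($path, $slash + 1);

    if ($slash === false) {
        $dirname = '.';              // no directory part at all
    } elseif ($slash === 0) {
        $dirname = '/';              // file directly under the root
    } else {
        $dirname = substr($path, 0, $slash);
    }

    $info = ['dirname' => $dirname, 'basename' => $basename];

    // The extension is whatever follows the last dot of the basename.
    $dot = strrpos($basename, '.');
    if ($dot === false) {
        $info['filename'] = $basename;
    } else {
        $info['extension'] = substr($basename, $dot + 1);
        $info['filename']  = substr($basename, 0, $dot);
    }
    return $info;
}

print_r(utf8_pathinfo('/tmp/äa.pdf'));
```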

已下线请稍等 2024-10-14 22:39:15

pathinfo() works correctly when the path contains only ASCII characters.

Based on this, we can encode the input as ASCII characters and still let pathinfo() do all of the parsing.

Finally, we decode the output values back to their original form.

A demo follows.

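function _pathinfo($path, $options = null)
{
    // Percent-encode non-ASCII bytes, but keep '/' so dirname is still split.
    $path = str_replace('%2F', '/', urlencode($path));
    $parts = null === $options ? pathinfo($path) : pathinfo($path, $options);
    // With a flag, pathinfo() returns a single string instead of an array.
    if (is_string($parts)) {
        return urldecode($parts);
    }
    foreach ($parts as $field => $value) {
        $parts[$field] = urldecode($value);
    }
    return $parts;
}
// calling
_pathinfo('すtest.jpg');
_pathinfo('すtest.jpg', PATHINFO_EXTENSION);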
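
To see why the encoding step helps, it may be useful to look at the intermediate string: after urlencode() the name is pure ASCII, so pathinfo() has no multibyte sequences to misread. A small sketch, in which the variable names are only illustrative:

```php
<?php
// "\u{3059}" is the character す written as an escape, so this file's
// own encoding does not affect the result.
$name = "\u{3059}test.jpg";

// Every non-ASCII byte becomes a %XX escape; '.' is left untouched.
$encoded = urlencode($name);
echo $encoded, "\n";                      // %E3%81%99test.jpg

// pathinfo() now only sees ASCII, and decoding restores the original name.
$filename = pathinfo($encoded, PATHINFO_FILENAME);
echo urldecode($filename) === "\u{3059}test" ? "round trip ok\n" : "mismatch\n";
```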
旧人九事 2024-10-14 22:39:15
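private function _pathinfo($path, $options = PATHINFO_BASENAME) {
  // Only valid for a bare file name with PATHINFO_BASENAME or PATHINFO_FILENAME:
  // the leading space is prepended to the name and stripped from the result.
  $result = pathinfo(' ' . $path, $options);
  return substr($result, 1);
}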
傾旎 2024-10-14 22:39:15

As the PHP manual notes:

Caution

pathinfo() is locale aware, so for it to parse a path containing
multibyte characters correctly, the matching locale must be set using
the setlocale() function.

and the example in the manual
