How do I specify the encoding when processing a CSV file in PHP?

Posted 2024-08-17 17:45:33

<?php
$row = 1;
$handle = fopen ("test.csv","r");
while ($data = fgetcsv ($handle, 1000, ",")) {
    $num = count ($data);
    print "<p> $num fields in line $row: <br>\n";
    $row++;
    for ($c=0; $c < $num; $c++) {
        print $data[$c] . "<br>\n";
    }
}
fclose ($handle);
?> 

The above comes from the PHP manual, but I don't see where to specify the encoding (UTF-8, for example).

Comments (3)

写给空气的情书 2024-08-24 17:45:33

Try to change the locale.

As the note below the example on the manual page you linked says:

Note: Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8,
files in one-byte encoding are read wrong by this function.

A comment on the same page suggests this approach:

setlocale(LC_ALL, 'ja_JP.UTF8'); // for japanese locale

From setlocale():

Locale names can be found in RFC 1766 and ISO 639. Different systems have
different naming schemes for locales. […] On Windows, setlocale(LC_ALL, '') sets the
locale names from the system's regional/language settings (accessible via Control Panel).
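
The locale approach above can be sketched as follows. This is only an illustration: the helper name `readCsvWithLocale` and the candidate locale names are assumptions, and which locales are installed differs from system to system. `setlocale()` accepts several candidates and returns `false` if none of them can be set, so the result is worth checking before trusting the parsed data.

```php
<?php
/**
 * Read a comma-separated file after trying to switch LC_CTYPE to one of the
 * given locales. Hypothetical helper; the locale names passed in are guesses
 * that may not exist on every system.
 */
function readCsvWithLocale(string $path, array $locales): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    // Passing '0' only queries the current setting so it can be restored later.
    $previous = setlocale(LC_CTYPE, '0');
    // Tries each name in order; returns the one that was set, or false.
    $active = setlocale(LC_CTYPE, $locales);

    $rows = [];
    // All parsing parameters are passed explicitly (length 0 = no line limit).
    while (($data = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $rows[] = $data;
    }
    fclose($handle);

    // Put the previous LC_CTYPE back so the rest of the script is unaffected.
    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous);
    }

    return ['locale' => $active, 'rows' => $rows];
}
```

A call such as `readCsvWithLocale('test.csv', ['en_US.UTF-8', 'en_US.utf8', 'C.UTF-8'])` returns the rows together with the locale that was actually applied; a `false` in the `locale` key means none of the candidates was available.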

街道布景 2024-08-24 17:45:33

One thing to watch for is the Unicode byte order mark (BOM). The BOM is the character U+FEFF, which UTF-8 encodes as three bytes (0xef, 0xbb and 0xbf) at the very beginning of the text file. In UTF-16 it indicates the byte order. In UTF-8 it is not really necessary.

So you need to detect the three bytes and remove the BOM. Below is a simplified example on how to detect and remove the three bytes.

$str = file_get_contents('file.utf8.csv');
$bom = pack("CCC", 0xef, 0xbb, 0xbf);
if (0 == strncmp($str, $bom, 3)) {
    echo "BOM detected - file is UTF-8\n";
    $str = substr($str, 3);
}

That's all
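
The same check can be done on a file handle, so the file can still be read line by line with `fgetcsv()` instead of being loaded whole. A minimal sketch, where `openCsvSkippingBom` is a made-up helper name rather than anything from PHP itself:

```php
<?php
/**
 * Open a file for reading and leave the pointer just past a UTF-8 BOM,
 * if one is present. Hypothetical helper for illustration only.
 *
 * @return resource
 */
function openCsvSkippingBom(string $path)
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    // A UTF-8 BOM is exactly the bytes EF BB BF at offset 0.
    if (fread($handle, 3) !== "\xEF\xBB\xBF") {
        // No BOM: go back so the first three bytes are not lost.
        rewind($handle);
    }

    return $handle;
}
```

The returned handle can be passed straight to `fgetcsv()` in the usual `while` loop; the first field of the first row then no longer starts with the three BOM bytes.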

小耗子 2024-08-24 17:45:33

Try this:

<?php
$handle = fopen("specialchars.csv", "r");
echo '<table border="1"><tr><td>First name</td><td>Last name</td></tr>';
while (($data = fgetcsv($handle, 1000, ";")) !== false) {
    // added: convert every field to UTF-8 (utf8_encode() assumes ISO-8859-1 input)
    $data = array_map("utf8_encode", $data);
    echo "<tr>";
    foreach ($data as $field) {
        // output data, escaped so special characters cannot break the HTML
        echo "<td>" . htmlspecialchars($field) . "</td>";
    }
    echo "</tr>";
}
echo "</table>";
fclose($handle);
?>
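
Note that `utf8_encode()` is deprecated as of PHP 8.2. One possible replacement is `mb_convert_encoding()`, sketched below under two assumptions: the mbstring extension is loaded, and the file really is ISO-8859-1. The helper name `rowToUtf8` is invented for this example.

```php
<?php
/**
 * Convert every field of one CSV row to UTF-8.
 * Hypothetical helper; the default source encoding is an assumption and
 * must match how the file was actually saved.
 */
function rowToUtf8(array $row, string $sourceEncoding = 'ISO-8859-1'): array
{
    return array_map(
        // fgetcsv() yields a null field for blank lines, so pass null through.
        static fn ($field) => $field === null
            ? null
            : mb_convert_encoding($field, 'UTF-8', $sourceEncoding),
        $row
    );
}
```

Inside the loop, `$data = rowToUtf8($data);` would take the place of the `array_map("utf8_encode", $data)` line. For a file saved as Windows-1252, pass `'Windows-1252'` as the second argument.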