First character of a field missing in CSV

Posted on 2024-11-02 18:08:44

I'm working on a CSV import script in PHP. It works fine, except for foreign (non-ASCII) characters at the beginning of a field.

The code looks like this:

$teljing = array();
if (($handle = fopen($filename, "r")) !== FALSE)
{
     // Read one CSV record per iteration (1000 = maximum line length)
     while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
         $teljing[] = $data;
     }

     fclose($handle);
}

Here is a data sample that shows the issue:

føroyskir stavir, "Kr. 201,50"
óvirkin ting, "Kr. 100,00"

This results in the following:

array 
(
     [0] => array 
          (
                 [0] => 'føroyskir stavir',
                 [1] => 'Kr. 201,50'
          )
     [1] => array 
          (
                 [0] => 'virkin ting', <--- Should be 'óvirkin ting'
                 [1] => 'Kr. 100,00'
          )
)

I have seen this behavior mentioned in some of the user comments on php.net, and I have tried ini_set('auto_detect_line_endings', TRUE); to detect line endings. No success.

Anyone familiar with this issue?

Edit:

Thank you, AJ, this issue is now solved. Setting the locale before reading the file was the solution:

setlocale(LC_ALL, 'en_US.UTF-8');
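A minimal sketch of that fix as a reusable helper. Assumptions: the file is UTF-8, and one of the listed locale names is installed (names differ between systems; locale -a shows what is available). It sets only LC_CTYPE, the category that governs how fgetcsv() splits multibyte characters, rather than LC_ALL, which would also change categories such as LC_COLLATE and LC_NUMERIC. The names use_utf8_ctype and read_utf8_csv are hypothetical.

```php
<?php
/**
 * Try to switch LC_CTYPE to a UTF-8 locale. The candidate names below are an
 * assumption: installed locale names differ between systems.
 *
 * @return string|false The locale that was set, or false if none is installed.
 */
function use_utf8_ctype()
{
    // setlocale() tries each array element in turn and returns the first
    // one the system accepts, or false if none of them is available.
    return setlocale(LC_CTYPE, array('en_US.UTF-8', 'en_US.utf8', 'C.UTF-8'));
}

/**
 * Read all rows of a UTF-8 CSV file into an array of arrays
 * (the loop from the question, with the locale set first).
 */
function read_utf8_csv($filename)
{
    use_utf8_ctype();

    $rows = array();
    $handle = fopen($filename, 'r');
    if ($handle === false) {
        return $rows;
    }
    // Enclosure and escape are passed explicitly so the call does not
    // depend on the function's default parameter values.
    while (($data = fgetcsv($handle, 1000, ',', '"', '\\')) !== false) {
        $rows[] = $data;
    }
    fclose($handle);
    return $rows;
}
```

Checking the return value of use_utf8_ctype() is worthwhile: if it is false, no UTF-8 locale was set and the dropped-first-character problem may reappear.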

Comments (2)

怎樣才叫好 2024-11-09 18:08:44

From the PHP manual for fgetcsv():

"Note: Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function."

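The note means the locale and the file's encoding have to agree. If the CSV was actually saved in a one-byte encoding such as ISO-8859-1 while LC_CTYPE is UTF-8, one option is to convert the bytes to UTF-8 before parsing. A minimal sketch, assuming the input really is ISO-8859-1 (Windows-1252 assigns other characters to bytes 0x80-0x9F, which this would mis-convert); latin1_to_utf8 is a hypothetical helper name. It avoids PHP's utf8_encode(), which did the same conversion but is deprecated as of PHP 8.2; when the mbstring extension is installed, mb_convert_encoding($line, 'UTF-8', 'ISO-8859-1') is the usual alternative.

```php
<?php
/**
 * Convert ISO-8859-1 (Latin-1) bytes to UTF-8 without mbstring or iconv.
 * Each Latin-1 byte value equals its Unicode code point, so every byte
 * >= 0x80 becomes a two-byte UTF-8 sequence.
 */
function latin1_to_utf8($bytes)
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= $bytes[$i];               // ASCII is unchanged
        } else {
            $out .= chr(0xC0 | ($b >> 6))     // leading byte 110xxxxx
                  . chr(0x80 | ($b & 0x3F));  // continuation byte 10xxxxxx
        }
    }
    return $out;
}
```

A converted line can then be split with str_getcsv(latin1_to_utf8($line), ',', '"', '\\') while a UTF-8 LC_CTYPE is in effect.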
烛影斜 2024-11-09 18:08:44

Copied from the PHP.net/fgetcsv comments:

kent at marketruler dot com, 04-Feb-2010 11:18:

Note that fgetcsv, at least in PHP 5.3 or previous, will NOT work with UTF-16 encoded files. Your options are to convert the entire file to ISO-8859-1 (or latin1), or to read it line by line, convert each line to ISO-8859-1, and then use str_getcsv (or a backwards-compatible implementation of it). If you need to read non-Latin alphabets, it is probably best to convert to UTF-8.

See str_getcsv for a backwards-compatible version of it for PHP < 5.3, and see utf8_decode for a function written by Rasmus Andersson which provides utf16_decode. The modification I added is for the fact that the BOM (byte order mark) appears at the top of the file, but not on subsequent lines. So you need to store the endianness and pass it back in each time you decode a subsequent line. This modified version hands the detected endianness back through its second (by-reference) parameter when it was not already known:

<?php
/**
 * Decode UTF-16 encoded strings.
 *
 * Can handle both BOM'ed data and un-BOM'ed data.
 * Assumes Big-Endian byte order if no BOM is available.
 * From: http://php.net/manual/en/function.utf8-decode.php
 *
 * @param   string  $str  UTF-16 encoded data to decode.
 * @param   bool    $be   Byte order (true = big-endian). Set from the BOM when
 *                        one is present; pass the stored value for later lines.
 * @return  string  ISO-8859-1 encoded data (code points above U+00FF become '?').
 * @access  public
 * @version 0.1 / 2005-01-19
 * @author  Rasmus Andersson {@link http://rasmusandersson.se/}
 * @package Groupies
 */
function utf16_decode($str, &$be=null) {
    if (strlen($str) < 2) {
        return $str;
    }
    // String offsets use [] (the {} syntax was removed in PHP 8.0)
    $c0 = ord($str[0]);
    $c1 = ord($str[1]);
    $start = 0;
    if ($c0 == 0xFE && $c1 == 0xFF) {
        $be = true;
        $start = 2;
    } else if ($c0 == 0xFF && $c1 == 0xFE) {
        $start = 2;
        $be = false;
    }
    if ($be === null) {
        $be = true;
    }
    $len = strlen($str);
    $newstr = '';
    // Each UTF-16 code unit is two bytes; a dangling odd byte is ignored
    for ($i = $start; $i + 1 < $len; $i += 2) {
        if ($be) {
            $val = ord($str[$i])   << 8;   // high byte comes first
            $val += ord($str[$i+1]);
        } else {
            $val = ord($str[$i+1]) << 8;   // low byte comes first
            $val += ord($str[$i]);
        }
        // chr() only covers 0-255, so characters beyond Latin-1 are replaced
        $newstr .= ($val <= 0xFF) ? chr($val) : '?';
    }
    return $newstr;
}
?>

Trying the "setlocale" trick did not work for me, e.g.

<?php
setlocale(LC_CTYPE, "en.UTF16");
$line = fgetcsv($file, ...);
?>

But that's perhaps because my platform didn't support it. However, fgetcsv only supports single characters for the delimiter, etc., and complains if you pass in a UTF-16 version of said character, so I gave up on that rather quickly.

Hope this is helpful to someone out there.
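The comment recommends converting to UTF-8 when non-Latin alphabets are involved. Converting the whole file in one pass also sidesteps the endianness bookkeeping, because the BOM is read only once. A minimal sketch, assuming the entire file fits in memory and defaulting to big-endian when there is no BOM, as the quoted function does; utf16_to_utf8, codepoint_to_utf8 and parse_utf16_csv are hypothetical names. Unlike utf16_decode, it keeps characters above U+00FF and combines surrogate pairs. The result is parsed from a php://memory stream so that quoted fields spanning several lines still work; a UTF-8 LC_CTYPE should still be set, as in the accepted fix.

```php
<?php
/** Encode one Unicode code point as UTF-8 bytes. */
function codepoint_to_utf8($cp)
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

/** Decode a complete UTF-16 byte string (BOM optional) to UTF-8. */
function utf16_to_utf8($bytes)
{
    $be = true;   // no BOM: assume big-endian
    $i = 0;
    $bom = substr($bytes, 0, 2);
    if ($bom === "\xFE\xFF") {
        $i = 2;
    } elseif ($bom === "\xFF\xFE") {
        $be = false;
        $i = 2;
    }
    $out = '';
    $len = strlen($bytes);
    for (; $i + 1 < $len; $i += 2) {
        $cp = (ord($bytes[$be ? $i : $i + 1]) << 8) | ord($bytes[$be ? $i + 1 : $i]);
        // A high surrogate (D800-DBFF) combines with a following low surrogate
        if ($cp >= 0xD800 && $cp <= 0xDBFF && $i + 3 < $len) {
            $next = (ord($bytes[$be ? $i + 2 : $i + 3]) << 8)
                  | ord($bytes[$be ? $i + 3 : $i + 2]);
            if ($next >= 0xDC00 && $next <= 0xDFFF) {
                $cp = 0x10000 + (($cp - 0xD800) << 10) + ($next - 0xDC00);
                $i += 2;
            }
        }
        if ($cp >= 0xD800 && $cp <= 0xDFFF) {
            $cp = 0xFFFD;   // unpaired surrogate: use the replacement character
        }
        $out .= codepoint_to_utf8($cp);
    }
    return $out;
}

/** Convert UTF-16 CSV data to UTF-8, then parse every row with fgetcsv(). */
function parse_utf16_csv($bytes)
{
    $stream = fopen('php://memory', 'w+');
    fwrite($stream, utf16_to_utf8($bytes));
    rewind($stream);
    $rows = array();
    // Length 0 means no line-length limit
    while (($data = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
        $rows[] = $data;
    }
    fclose($stream);
    return $rows;
}
```

For example, parse_utf16_csv(file_get_contents($filename)) returns the same array-of-rows shape as the loop in the question.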
