Missing first character of fields in CSV

Posted on 2024-11-02 18:08:44

I'm working on a CSV import script in PHP. It works fine, except that non-ASCII characters at the beginning of a field are dropped.

The code looks like this:

if (($handle = fopen($filename, "r")) !== FALSE)
{
     while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) 
         $teljing[] = $data;

     fclose($handle);
}

Here is a data example showing my issue

føroyskir stavir, "Kr. 201,50"
óvirkin ting, "Kr. 100,00"

This will result in the following

array 
(
     [0] => array 
          (
                 [0] => 'føroyskir stavir',
                 [1] => 'Kr. 201,50'
          )
     [1] => array 
          (
                 [0] => 'virkin ting', <--- Should be 'óvirkin ting'
                 [1] => 'Kr. 100,00'
          )
)

I have seen this behavior documented in some comments on php.net, and I have tried ini_set('auto_detect_line_endings', TRUE); to detect line endings, without success.

Anyone familiar with this issue?

Edit:

Thank you, AJ. This issue is now solved.

setlocale(LC_ALL, 'en_US.UTF-8');

was the solution.
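For anyone landing here later, this is a minimal sketch of how the locale fix might be wired into the import loop. It assumes a UTF-8 encoded file; the locale names in the list and the helper name read_csv_rows are illustrative assumptions, and it narrows the change to LC_CTYPE rather than LC_ALL so that number formatting is left alone.

```php
<?php
/**
 * Hypothetical helper: read every row of a CSV file into an array.
 */
function read_csv_rows(string $filename): array
{
    $handle = fopen($filename, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $filename");
    }
    $rows = [];
    // Length 0 means "no line length limit"; delimiter, enclosure and
    // escape are passed explicitly rather than relying on defaults.
    while (($data = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $rows[] = $data;
    }
    fclose($handle);
    return $rows;
}

// setlocale() tries each candidate in order and returns false if none
// is installed. Both names are assumptions about the host system.
$locale = setlocale(LC_CTYPE, ['en_US.UTF-8', 'C.UTF-8']);
if ($locale === false) {
    error_log('No UTF-8 locale available; multibyte fields may be misread.');
}
```

Checking the return value matters because setlocale() fails silently when the requested locale is not installed, which would leave the original problem in place.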

怎樣才叫好 2024-11-09 18:08:44

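From the PHP manual for fgetcsv():

"Note: Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function."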
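If the file is in a one-byte encoding rather than UTF-8, one option is to convert each line to UTF-8 before parsing it. The following is only a sketch: the helper name to_utf8 is made up, it assumes that anything that is not valid UTF-8 is ISO-8859-1, and it relies on the iconv extension being enabled.

```php
<?php
/**
 * Hypothetical helper: return $line as UTF-8.
 * Assumption: input that is not valid UTF-8 is ISO-8859-1.
 */
function to_utf8(string $line): string
{
    // With the /u modifier, preg_match() returns 1 only when the
    // subject is valid UTF-8 and false when it is not.
    if (preg_match('//u', $line) === 1) {
        return $line;
    }
    $converted = iconv('ISO-8859-1', 'UTF-8', $line);
    if ($converted === false) {
        throw new RuntimeException('Could not convert line to UTF-8');
    }
    return $converted;
}

// "\xF3" is the ISO-8859-1 byte for the letter o with acute accent.
$latin1 = "\xF3virkin ting,\"Kr. 100,00\"";
$fields = str_getcsv(to_utf8($latin1), ',', '"', '\\');
```

This keeps the data and a UTF-8 locale consistent with each other, which is the mismatch the manual note warns about.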

烛影斜 2024-11-09 18:08:44

Copied from the PHP.net/fgetcsv comments:

kent at marketruler dot com
04-Feb-2010 11:18

Note that fgetcsv, at least in PHP 5.3 or earlier, will NOT work with UTF-16 encoded files. Your options are to convert the entire file to ISO-8859-1 (or latin1), or to convert it line by line into ISO-8859-1 and then use str_getcsv (or a backwards-compatible implementation). If you need to read non-Latin alphabets, it is probably best to convert to UTF-8.

See str_getcsv for a backwards-compatible version of it for PHP < 5.3, and see utf8_decode for a function written by Rasmus Andersson which provides utf16_decode. The modification I added handles the fact that the BOM appears at the top of the file but not on subsequent lines, so you need to store the endianness and pass it back in when decoding each subsequent line. This modified version returns the endianness through the $be parameter when it is not already available:

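<?php
/**
 * Decode UTF-16 encoded strings.
 *
 * Can handle both BOM'ed data and un-BOM'ed data.
 * Assumes Big-Endian byte order if no BOM is available.
 * From: http://php.net/manual/en/function.utf8-decode.php
 *
 * @param   string     $str  UTF-16 encoded data to decode.
 * @param   bool|null  $be   Byte order (true = big-endian). Set from the BOM
 *                           when one is present, so it can be reused for later lines.
 * @return  string  ISO-8859-1 encoded data (code points above U+00FF are truncated).
 * @access  public
 * @version 0.1 / 2005-01-19
 * @author  Rasmus Andersson {@link http://rasmusandersson.se/}
 * @package Groupies
 */
function utf16_decode($str, &$be=null) {
    if (strlen($str) < 2) {
        return $str;
    }
    // Square brackets replace the old $str{0} offset syntax, which PHP 8 removed.
    $c0 = ord($str[0]);
    $c1 = ord($str[1]);
    $start = 0;
    if ($c0 == 0xFE && $c1 == 0xFF) {
        // Big-endian BOM
        $be = true;
        $start = 2;
    } else if ($c0 == 0xFF && $c1 == 0xFE) {
        // Little-endian BOM
        $start = 2;
        $be = false;
    }
    if ($be === null) {
        $be = true;
    }
    $len = strlen($str);
    $newstr = '';
    // Stop before a dangling odd byte so $i + 1 is always a valid offset.
    for ($i = $start; $i + 1 < $len; $i += 2) {
        if ($be) {
            // The high byte is shifted by 8 bits (the original shifted by 4).
            $val = ord($str[$i])   << 8;
            $val += ord($str[$i+1]);
        } else {
            $val = ord($str[$i+1]) << 8;
            $val += ord($str[$i]);
        }
        // U+2028 (LINE SEPARATOR) becomes "\n"; chr() keeps only the low byte otherwise.
        $newstr .= ($val == 0x2028) ? "\n" : chr($val);
    }
    return $newstr;
}
?>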

Trying the "setlocale" trick did not work for me, e.g.

<?php
setlocale(LC_CTYPE, "en.UTF16");
$line = fgetcsv($file, ...)
?>

But that's perhaps because my platform didn't support it. However, fgetcsv only supports single characters for the delimiter, etc., and complains if you pass in a UTF-16 version of said character, so I gave up on that rather quickly.

Hope this is helpful to someone out there.
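As an alternative to decoding by hand, the "convert to UTF-8" option mentioned above can be sketched with iconv. This is an illustration under assumptions, not the commenter's code: the function name read_utf16_csv is made up, the iconv extension must be enabled, and big-endian is assumed when there is no BOM, mirroring utf16_decode.

```php
<?php
/**
 * Hypothetical helper: parse UTF-16 CSV bytes by converting them to UTF-8 first.
 */
function read_utf16_csv(string $bytes): array
{
    // Pick the byte order from the BOM and strip it.
    // Without a BOM, big-endian is assumed.
    $bom = substr($bytes, 0, 2);
    $from = 'UTF-16BE';
    if ($bom === "\xFF\xFE") {
        $from = 'UTF-16LE';
        $bytes = substr($bytes, 2);
    } elseif ($bom === "\xFE\xFF") {
        $bytes = substr($bytes, 2);
    }

    $utf8 = iconv($from, 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new RuntimeException('Input is not valid ' . $from);
    }

    // Feed the converted text through an in-memory stream so that
    // fgetcsv() can still handle quoted fields that span lines.
    $stream = fopen('php://temp', 'r+');
    fwrite($stream, $utf8);
    rewind($stream);

    $rows = [];
    while (($row = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($stream);
    return $rows;
}
```

Converting the whole input once avoids having to carry the byte order from line to line. The trade-off is that the entire file is held in memory, so this suits small imports better than very large ones.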
