PHP CSV parser eats characters such as "é" and "í" at the start of a field

Posted 2024-10-27 12:31:26


I am trying to parse a CSV file in PHP. My problem is the following: if a field starts with "é" or "í", the parser eats those characters from the start of the field.

The problem only occurs on my host; it does not occur when using XAMPP locally (newer PHP version). The PHP version on my host with the bug is: 5.2.6-1+lenny9

The code is nothing but one line of fgetcsv.

while (($program = fgetcsv($handle, 0, ',', '"')) !== FALSE) {...}

This code already outputs the "eaten" version, for example when viewed by print_r.

Is there anything I can do? It must be a bug in PHP that has been fixed since then. One workaround I found is to force quoting by putting a comma at the end of a field (my CSV source, Google Spreadsheets, automatically wraps a field in " " if it contains a comma). Then I could write a function that deletes the last character if it is a comma (any help on this?).

Is it (or was it) a known bug in PHP, and are there any solutions for it? If not, can you help me with the delete-last-character-if-it's-a-comma function?
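A minimal sketch of the delete-last-character-if-it's-a-comma helper asked about above. The function name strip_trailing_comma is hypothetical, not from the original post. It checks only the last byte, which is safe for UTF-8 input because a comma is a single ASCII byte and never appears inside a multibyte sequence.

```php
<?php
// Hypothetical helper (the name is an assumption for illustration):
// removes exactly one trailing comma from a field, if there is one.
function strip_trailing_comma($field)
{
    if (substr($field, -1) === ',') {
        // Drop the final byte, which we just confirmed is a comma.
        return substr($field, 0, -1);
    }
    // No trailing comma: return the field unchanged.
    return $field;
}
```

It could be applied to every field of a parsed row with array_map('strip_trailing_comma', $program).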


_畞蕅 2024-11-03 12:31:26


Your actual problem is that the webserver runs under a locale that does not allow multibyte charsets. If the locale is set to C, I get the same result:

<?php print_r(str_getcsv("ée, íi, zz, bb, "));

$ LC_ALL=C php test_getcsv.php

This cuts off the é and í in the fields:

[0] => e
[1] => i
[2] => zz

But when I run it like this:

$ LC_ALL=de_DE.UTF-8 php test_getcsv.php

I get the correct results:

[0] => ée
[1] => íi
[2] => zz

You will need to investigate which locales are available on your server, then use setlocale(LC_ALL, "xy_zz.UTF-8") at the start of your script.
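A minimal sketch of that advice, assuming the server has at least one of the UTF-8 locales listed below installed. The candidate names are guesses and need to be adjusted to whatever the host actually provides. setlocale() accepts an array of names, tries them in order, and returns the one it applied, or false if none is available.

```php
<?php
// Candidate locale names are assumptions; availability varies by server.
$candidates = array('en_US.UTF-8', 'en_US.utf8', 'de_DE.UTF-8', 'C.UTF-8');

// Returns the name of the locale that was applied, or false if none matched.
$applied = setlocale(LC_ALL, $candidates);

if ($applied === false) {
    echo "No UTF-8 locale from the candidate list is installed\n";
} else {
    echo "Using locale: $applied\n";
}

// Parse a small in-memory CSV with the same fgetcsv loop as in the question.
$handle = fopen('php://memory', 'r+');
fwrite($handle, "ée,íi,zz\n");
rewind($handle);

$rows = array();
while (($program = fgetcsv($handle, 0, ',', '"')) !== false) {
    $rows[] = $program;
}
fclose($handle);

print_r($rows);
```

Checking the return value matters here: if setlocale() returns false, the script is still running under the old locale and the leading characters may still be dropped.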
