PHP str_getcsv removes umlauts

Posted 2024-11-18 14:48:10 · 1605 characters · 7 views · 0 comments

I encountered a little problem when parsing CSV strings that contain German umlauts (ä, ö, ü, Ä, Ö, Ü) in PHP.

Assume the following CSV input string:

w;x;y;z
48;OSL;Oslo Stock Exchange;B
49;OTB;Österreichische Termin- und Optionenbörse;C
50;VIE;Wiener Börse;D

And here is the PHP code used to parse the string and create an array containing the data from the CSV string:

public static function parseCSV($csvString) {
    $rows = str_getcsv($csvString, "\n");
    // Remove headers ..
    $header = array_shift($rows);
    $cols = str_getcsv($header, ';');
    if(!$cols || count($cols)!=4) {
        return null;
    }
    // Parse rows ..
    $data = array();
    foreach($rows as $row) {
        $cols = str_getcsv($row, ';');
        $data[] = array('w'=>$cols[0], 'x'=>$cols[1], 'y'=>$cols[2], 'z'=>$cols[3]);
    }
    if(count($data)>0) {
        return $data;
    }
    return null;
}

Calling the above function with the given CSV string results in:

Array
(
    [0] => Array
        (
            [w] => 48
            [x] => OSL
            [y] => Oslo Stock Exchange
            [z] => B
        )

    [1] => Array
        (
            [w] => 49
            [x] => OTB
            [y] => sterreichische Termin- und Optionenbörse
            [z] => C
        )

    [2] => Array
        (
            [w] => 50
            [x] => VIE
            [y] => Wiener Börse
            [z] => D
        )
)

Note that the second entry is missing the Ö.
This only happens if the umlaut is placed directly after the column separator character.
It also happens if more than one umlaut is placed in sequence, e.g. "ÖÖÖsterreich" -> "sterreich".
The CSV string is sent using an HTML form, so the content gets URL-encoded.
I use a Linux server with UTF-8 encoding, and the CSV string looks correct before parsing.

Any ideas?


迷你仙 2024-11-25 14:48:10

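Assuming that fgetcsv (http://php.net/manual/en/function.fgetcsv.php) works the same way as str_getcsv(), then to quote the manual page:

"Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function."

So you should try setting the locale with setlocale:
http://php.net/manual/en/function.setlocale.php

If that doesn't work, try enabling multibyte overloading:
http://www.php.net/manual/en/mbstring.overload.php

Or, even better, use a standard framework library such as the Zend or Symfony libraries to extract the data.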
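The setlocale suggestion could be tried along the following lines. This is only a sketch: the helper name parseLineWithUtf8Locale and the candidate locale names are assumptions, the locales must actually be installed on the server (`locale -a` lists them), and whether the locale affects str_getcsv at all may depend on the PHP version.

```php
<?php
// Hypothetical helper (not from the original post): parse one CSV line after
// trying to switch LC_CTYPE to a UTF-8 locale, then restore the old locale.
function parseLineWithUtf8Locale(string $line, string $delimiter = ';'): array
{
    // Passing '0' only queries the current LC_CTYPE setting.
    $previous = setlocale(LC_CTYPE, '0');

    // Candidates are tried in order; the result is false if none is installed.
    // These locale names are assumptions and vary between systems.
    $applied = setlocale(LC_CTYPE, 'de_DE.UTF-8', 'en_US.UTF-8', 'C.UTF-8');

    $fields = str_getcsv($line, $delimiter, '"', '\\');

    // Restore the previous locale only if we actually changed it.
    if ($applied !== false && $previous !== false) {
        setlocale(LC_CTYPE, $previous);
    }

    return $fields;
}

print_r(parseLineWithUtf8Locale('49;OTB;Österreichische Termin- und Optionenbörse;C'));
```

If setlocale returns false for every candidate, the line is parsed under the unchanged locale, so it is worth checking which locales the server provides before relying on this.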


美胚控场 2024-11-25 14:48:10

I had a similar issue with the ï character in some data that originated from Microsoft Excel, saved out as a CSV (yes, with UTF-8 encoding selected in the "Web Options" part of the "Save As..." dialog). Still, this appears not to be the same UTF-8 encoding that str_getcsv expects.

I now run everything through iconv first and it works fine; there seems to be something up with Excel's idea of a CSV file:

iconv -f windows-1252 -t utf8 source.csv > output.csv
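When the data arrives as a string rather than a file, the same conversion can be done inside PHP before parsing. The sketch below assumes the mbstring extension is available and that any input which is not valid UTF-8 is Windows-1252, matching the iconv command above; the helper name toUtf8 is made up for illustration.

```php
<?php
// Hypothetical helper (not from the original post): return $raw as UTF-8.
// Assumption: anything that is not already valid UTF-8 is Windows-1252.
function toUtf8(string $raw): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');
}

// "\xEF" is the single-byte Windows-1252 encoding of ï.
$line   = toUtf8("1;na\xEFve;x");
$fields = str_getcsv($line, ';', '"', '\\');
print_r($fields);
```

Checking validity first avoids double-encoding input that is already UTF-8, which would otherwise turn each umlaut into two garbage characters.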

About the author: 绮筵
