Importing a CSV with line breaks inside the actual fields

Posted on 2024-10-27 01:03:47

I am using PHP to import a CSV file that originates from an Excel spreadsheet. Some of the fields contain line breaks, so when I reopen the CSV in Excel or the OpenOffice spreadsheet, it misinterprets where the line breaks should happen.

In my script, too, using fgetcsv to go through each line, it breaks lines where it shouldn't.

I could cleanse the data manually, but a) that would take ages, as it's a 10k-line file, and b) the data is exported from a client's existing piece of software.

Any ideas on how to solve this automatically during the import process? I would have thought delimiting the fields would have sorted it out, but it does not.

Comments (6)

鸠魁 2024-11-03 01:03:47

The accepted answer didn't solve the problem for me, but I eventually found this CSV parser library on Google Code that works well for multiline fields in CSVs.

parsecsv-for-php:
https://github.com/parsecsv/parsecsv-for-php


For historical purposes, the original project home was:
http://code.google.com/p/parsecsv-for-php/

痴意少年 2024-11-03 01:03:47

I had that problem too and did not find a way to read the data correctly.

In my case it was a one-time import, so I wrote a script that searched for all line breaks within a column and replaced them with something like #####. Then I imported the data and replaced that marker with line breaks again.

If you need a regular import, you could write your own CSV parser that handles the problem. If the text columns are enclosed in double quotes, you could treat everything between two double quotes as one column (with a check for escaped " within the content).
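The hand-written parser idea above can be sketched roughly as follows. This is only a minimal illustration, assuming comma-delimited input, double quotes as the enclosure, and a doubled quote ("") as the escape for a quote inside a field; parseCsvString() is a made-up helper name, not a PHP built-in.

```php
<?php
/**
 * Minimal CSV parser sketch: everything between two double quotes is one
 * field, and a doubled quote ("") inside a quoted field is an escaped quote.
 * parseCsvString() is a hypothetical helper name.
 */
function parseCsvString(string $text, string $delimiter = ','): array
{
    $rows = [];
    $row = [];
    $field = '';
    $inQuotes = false;
    $length = strlen($text);

    for ($i = 0; $i < $length; $i++) {
        $char = $text[$i];

        if ($inQuotes) {
            if ($char === '"') {
                if ($i + 1 < $length && $text[$i + 1] === '"') {
                    $field .= '"';      // escaped quote inside the field
                    $i++;
                } else {
                    $inQuotes = false;  // closing quote
                }
            } else {
                $field .= $char;        // line breaks are kept as field content
            }
            continue;
        }

        if ($char === '"') {
            $inQuotes = true;
        } elseif ($char === $delimiter) {
            $row[] = $field;
            $field = '';
        } elseif ($char === "\n" || $char === "\r") {
            // Treat \r\n as a single record separator.
            if ($char === "\r" && $i + 1 < $length && $text[$i + 1] === "\n") {
                $i++;
            }
            $row[] = $field;
            $rows[] = $row;
            $row = [];
            $field = '';
        } else {
            $field .= $char;
        }
    }

    // Flush the last record when the input has no trailing line break.
    if ($field !== '' || $row !== []) {
        $row[] = $field;
        $rows[] = $row;
    }

    return $rows;
}

$rows = parseCsvString("field1,\"field2-part1\nfield2-part2\",field3\nx,\"say \"\"hi\"\"\",z\n");
```

Here $rows holds two records, and the second column of the first record still contains its line break. The sketch reads the whole input into memory, so for very large files a streaming variant would be needed.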

究竟谁懂我的在乎 2024-11-03 01:03:47

It's an old thread, but I encountered this problem and solved it with a regex, so you can avoid pulling in a library just for that. The code here is in PHP, but it can be adapted to other languages.

$parsedCSV = preg_replace('/(,|\n|^)"(?:([^\n"]*)\n([^\n"]*))*"/', '$1"$2 $3"', $parsedCSV);

This solution assumes that the fields containing a line break are enclosed in double quotes, which seems to be a valid assumption, at least from what I have seen so far. Also, the opening double quote should follow a , or be placed at the start of a new line (or of the first line).

Example:

field1,"field2-part1\nfield2-part2",field3

Here the \n is replaced by a space, so the result would be:

field1,"field2-part1 field2-part2",field3

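The regex is meant to handle multiple line breaks as well, but because $2 and $3 sit inside a repeated group, they only keep the text captured by the last repetition, so a field with two or more line breaks loses its earlier segments. For such fields, a callback-based replacement (preg_replace_callback) is safer.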

This might not be efficient if the content is very large, but it helps in many cases and the idea can be reused, perhaps optimized by processing smaller chunks (you would then need to handle fields that are cut at the boundaries of the fixed-size buffers).
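As a variation on the same idea, here is a sketch that uses preg_replace_callback instead of numbered backreferences, so that every line break inside a quoted field is replaced, however many there are. It makes the same assumptions as above (the opening quote follows a , or starts a line) and additionally assumes that a quote inside a field is escaped by doubling it; flattenQuotedLineBreaks() is a hypothetical helper name.

```php
<?php
/**
 * Replace line breaks inside double-quoted CSV fields with a space.
 * flattenQuotedLineBreaks() is a hypothetical helper name.
 */
function flattenQuotedLineBreaks(string $csv): string
{
    // (,|\n|^)          the quote must open a field
    // "((?:[^"]|"")*)"  a quoted field; "" counts as an escaped quote
    $pattern = '/(,|\n|^)"((?:[^"]|"")*)"/';

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Replace every kind of line break inside the field content.
            $content = str_replace(["\r\n", "\r", "\n"], ' ', $m[2]);
            return $m[1] . '"' . $content . '"';
        },
        $csv
    );

    if ($result === null) {
        // preg_replace_callback returns null on a PCRE error.
        throw new RuntimeException('Regex failed: ' . preg_last_error());
    }

    return $result;
}

$flattened = flattenQuotedLineBreaks("field1,\"part1\npart2\npart3\",field3\n");
```

The line breaks that separate records are left alone because they are not inside a matched quoted field. Like the original one-liner, this processes the whole content in one go, so the same caveat about very large inputs applies.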

魔法唧唧 2024-11-03 01:03:47

My solution is the following:

$string = nl2br($string);

http://php.net/manual/en/function.nl2br.php

Once you get down to the individual cell (string) level, run it on the string and it will insert an HTML line break (<br />) before each newline for you.
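A small sketch of what that could look like during an import. It assumes the multiline fields are enclosed in double quotes, in which case fgetcsv returns them as a single cell with the line breaks intact; the sample data and the in-memory stream are made up for illustration.

```php
<?php
// Hypothetical sample data: the second record has a quoted field
// that contains a line break.
$csv = "id,note\n1,\"first line\nsecond line\"\n";

// Use an in-memory stream so the example needs no file on disk.
$handle = fopen('php://memory', 'r+');
fwrite($handle, $csv);
rewind($handle);

$rows = [];
while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
    if ($row === [null]) {
        continue; // fgetcsv reports a blank line as an array holding null
    }
    // Apply nl2br to every cell of the record.
    $rows[] = array_map('nl2br', $row);
}
fclose($handle);
```

Note that nl2br keeps the original newline character and only adds the <br /> in front of it, so this is mainly useful when the cell is going to be displayed as HTML.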

叹沉浮 2024-11-03 01:03:47

Although this is an old question, the answer might still be relevant to people.
There is now a newer, framework-independent library, http://csv.thephpleague.com/, which supports newline characters in fields as well as some filtering.

隐诗 2024-11-03 01:03:47

Yes, you need to find that comma and replace it with a special character sequence, such as a combination of {()}, and finally replace that sequence with the , you were originally looking for.

Hope that helps.
