Simple Perl split() and regular expression question

Posted 2024-11-17 01:49:57

Possible Duplicate:
How can I parse quoted CSV in Perl with a regex?

I am attempting to take a CSV file and import each row into an array (where each element represents a column). The format of the CSV file is very simple:

item1,item2,item3
nextrowitem1,item2,item3
"items,with,commas","are,in,quotes"

I imported the CSV file using:

open(my $fh, '<', 'test.csv') or die "Cannot open test.csv: $!";
my @lines = <$fh>;
close($fh);

Then I looped through it using:

foreach (@lines) {
    chomp;    # strip the trailing newline so the last field is clean
    my @items = split(/regular expression/);
    # Do stuff with @items array
}

(Note that you do not need to write split(/regular expression/, $string), because split() operates on $_ when no string is supplied.)
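A minimal check of that note: with only a pattern argument, split works on $_, so inside a foreach loop (which aliases $_ to each element) the implicit and explicit calls below return the same fields. The sample string is the question's first row.

```perl
use strict;
use warnings;

my (@implicit, @explicit);
for ("item1,item2,item3") {          # foreach aliases $_ to each element
    @implicit = split /,/;           # no string given: split uses $_
    @explicit = split /,/, $_;       # same call with $_ spelled out
}
print join("|", @implicit), "\n";    # prints item1|item2|item3
```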

Before this, I tested with a CSV file in which none of the items contained commas, using the simple split(/,/). That worked just fine, so there is nothing wrong with the file, with reading it, or with the loop after the split. However, when I hit items that contained a comma, they were (understandably) divided like so:

1 => "items
2 => with
3 => commas"
4 => "are
5 => in
6 => quotes"

Instead of the desired:

1 => items,with,commas
2 => are,in,quotes

Can anyone help me develop a regular expression to split each line correctly? Basically, if an item starts with a quote ("), the split must wait until the closing quote followed by a comma (","). If the item does not start with a quote, the split happens at the next comma (,).


Comments (3)

暗地喜欢 2024-11-24 01:49:57

Try reading the documentation for Text::CSV, a module that already does this. The problem with parsing CSV with a regular expression is that you have to handle quoted separators like "," (which you pointed out) as well as plain , separators.
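A minimal Text::CSV sketch of that suggestion. Text::CSV is a CPAN module, not core Perl, so it may need installing; it uses Text::CSV_XS as its backend when that is installed and falls back to the pure-Perl Text::CSV_PP otherwise, so the same code covers the Text::CSV_XS suggestion below. To keep the example self-contained, an in-memory filehandle on a string holding the question's sample rows stands in for test.csv; in the question's setting you would open test.csv instead.

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN module, not part of core Perl

# binary => 1 allows bytes outside printable ASCII (e.g. newlines inside
# quoted fields); auto_diag => 1 makes parse errors warn automatically.
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 })
    or die "Cannot create Text::CSV object: " . Text::CSV->error_diag();

# In-memory stand-in for test.csv, using the question's sample rows.
my $data = qq{item1,item2,item3\nnextrowitem1,item2,item3\n"items,with,commas","are,in,quotes"\n};
open(my $fh, '<', \$data) or die "Cannot open in-memory data: $!";

my @rows;
while (my $row = $csv->getline($fh)) {
    push @rows, [@$row];    # copy the array ref: one element per column
}
close($fh);

print join(" | ", @$_), "\n" for @rows;
```

The parser strips the enclosing quotes, so the third row comes back as two columns, items,with,commas and are,in,quotes.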

故人如初 2024-11-24 01:49:57

Just use Text::CSV_XS instead...

心意如水 2024-11-24 01:49:57

See my post that solves this problem for more detail.

The pattern ^(?:(?:"((?:""|[^"])+)"|([^,]*))(?:$|,))+$ will match the whole line; you can then use the matched captures to get your data out (without the quotes).
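One caveat when doing this in Perl: a capture group inside a quantifier such as (...)+ holds only the text from its last iteration, so after the whole-line match you cannot read every field from $1 and $2. A minimal sketch, assuming one CSV record per line (no embedded newlines) and that "" inside a quoted field is an escaped quote (the usual CSV convention, which the pattern already accepts): keep the whole-line pattern as a validity check, then loop over its per-field part with \G and //g. The only change to that per-field part is capturing the separator as (,|$) so the loop can tell when it has reached the end of the line. The third, unquoted field in the sample line is an illustrative addition.

```perl
use strict;
use warnings;

my $line = q{"items,with,commas","are,in,quotes",plain};

# The whole-line pattern, unchanged: it confirms that the line is a
# sequence of quoted or unquoted fields.
my $whole_line = qr/^(?:(?:"((?:""|[^"])+)"|([^,]*))(?:$|,))+$/;
die "not a valid CSV line\n" unless $line =~ $whole_line;

# A capture group inside a quantifier keeps only its last iteration, so
# extract the fields by repeating the per-field part with //g instead.
# The separator is captured as $3 so the loop can stop at end of line.
my @fields;
while ($line =~ /\G(?:"((?:""|[^"])+)"|([^,]*))(,|$)/g) {
    my ($quoted, $bare, $sep) = ($1, $2, $3);   # copy before the next match
    my $field = defined $quoted ? $quoted : $bare;
    $field =~ s/""/"/g if defined $quoted;      # "" inside quotes means "
    push @fields, $field;
    last if $sep eq '';                         # matched end of line
}
print "$_\n" for @fields;
```

The captures are copied into lexicals first because a successful s/// would otherwise reset $1, $2 and $3 before the end-of-line check.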
