Simple Perl split() and regular expression question
Possible Duplicate:
How can I parse quoted CSV in Perl with a regex?
I am attempting to take a CSV file and import each row into an array (where each element represents a column). The format of a CSV file is very simple:
item1,item2,item3
nextrowitem1,item2,item3
"items,with,commas","are,in,quotes"
I imported the CSV file using:
open(FILE, "<", "test.csv") or die "Cannot open test.csv: $!";
@lines = <FILE>;
Then I looped through it using:
foreach(@lines){
@items = split(/regular expression/);
# Do stuff with @items array
}
(Note that you do not need to write split(/regular expression/, $string); because split() operates on $_ if no string is supplied.)
Before this, I tested with a CSV file in which none of the items contained commas, using the simple regular expression split(/,/). This worked just fine, so there is nothing wrong with the file, reading it, or my loop after this regular expression. However, when I hit items that contained a comma, they understandably got divided like so:
1 => "items
2 => with
3 => commas"
4 => "are
5 => in
6 => quotes"
Instead of the desired:
1 => items,with,commas
2 => are,in,quotes
Can anyone help me develop a regular expression to split this array correctly? Basically, if the item starts with a quote ("), it needs to wait until "," to split. If the item does not start with a quote, it needs to wait until , to split.
Comments (3)
Try reading Text::CSV as a possible option that already does this. The problem with parsing CSV using a regular expression is that you have to look for things like "," (which you indicated) as well as a plain , separator.

Just use Text::CSV_XS instead...
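For anyone wondering what the module route looks like, here is a minimal sketch using Text::CSV (a CPAN module, which uses Text::CSV_XS when that is installed). It assumes the three sample rows from the question; the array name @rows is only for illustration, and a real script would read the lines from test.csv instead.

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN module, not part of core Perl

# Sample rows taken from the question; a real script would read test.csv.
my @lines = (
    'item1,item2,item3',
    'nextrowitem1,item2,item3',
    '"items,with,commas","are,in,quotes"',
);

my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot create parser: " . Text::CSV->error_diag();

my @rows;    # illustrative name: one array reference per parsed line
foreach my $line (@lines) {
    # parse() returns false when the line is not well-formed CSV.
    $csv->parse($line)
        or die "Could not parse line: " . $csv->error_input();
    my @items = $csv->fields();    # one element per column, quotes removed
    push @rows, \@items;
}

print join(' | ', @$_), "\n" for @rows;
```

The third row should come back as two items, items,with,commas and are,in,quotes, which is the result the question asks for.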
See my post that solves this problem for more detail.
^(?:(?:"((?:""|[^"])+)"|([^,]*))(?:$|,))+$
will match the whole line, then you can use the matched captures to get your data out (without the quotes).
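One caveat if you go the regex route in Perl: a capture group inside a repeated group keeps only the text from its last repetition, so a single match over the whole line will not hand back every field. A rough sketch of one way around that is to match one field at a time with /g and \G. The helper name split_csv_line is made up for this example, the pattern is a variation on the one above rather than the poster's exact method, and it assumes one record per line with no embedded newlines.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): split one CSV line.
sub split_csv_line {
    my ($line) = @_;
    chomp $line;
    $line .= ',';    # every field, including the last, now ends with a comma

    my @fields;
    # \G anchors each match where the previous one ended, so fields are
    # consumed left to right, one per loop iteration.
    while ($line =~ /\G(?:"((?:""|[^"])*)"|([^,]*)),/g) {
        if (defined $1) {
            (my $field = $1) =~ s/""/"/g;    # unescape doubled quotes
            push @fields, $field;
        }
        else {
            push @fields, $2;                # unquoted field, taken as-is
        }
    }
    return @fields;
}

my @items = split_csv_line('"items,with,commas","are,in,quotes"');
print "$_\n" for @items;
```

Appending a comma before matching means the pattern never has to match an empty string at the end of the line, which would otherwise add a spurious empty field. For anything beyond simple files, the modules mentioned in the other comments are the safer choice.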