How do I correctly parse a tab/space-delimited file with quoted strings in Perl?
I need to parse tab/space-delimited files with many columns in Perl. Some of the values are large strings enclosed in double quotes, and these strings can contain any characters, including tabs and spaces.
When I parse the lines with the split function, it splits these quoted strings as well. How can I make Perl treat a string within " " as a single column entry?
A simple example is:
12 345546.67677 "Hello World!!!" -567.55656 0.5465767 "Hello_Again; "
Use the Text::CSV library, which handles all the edge cases for you. It lets you set the delimiter via its sep_char option.
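The original listing for the Text::CSV approach did not survive on this page, so the following is only a minimal sketch of what it might look like. It assumes the input is strictly tab-delimited (exactly one tab between fields) and that Text::CSV is installed from CPAN. The sample line is hypothetical, built so that one quoted field contains a literal tab.

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN module, not part of core Perl

# Assumption: fields are separated by exactly one tab.
# binary => 1 permits arbitrary characters inside quoted fields.
my $csv = Text::CSV->new({ sep_char => "\t", binary => 1 })
    or die "Cannot create parser: " . Text::CSV->error_diag;

# Hypothetical sample line; the last quoted field contains a tab.
my $line = join "\t",
    '12', '345546.67677', '"Hello World!!!"', '-567.55656', qq{"Hello\tAgain; "};

$csv->parse($line) or die "Parse failed: " . $csv->error_diag;
my @fields = $csv->fields;    # quotes are stripped from quoted fields

print "[$_]\n" for @fields;
```

For a whole file, the same parser object can read rows from a filehandle with getline instead of calling parse on each line by hand.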
Note that you say tab/space delimited. If the delimiters are mixed and/or you have to treat consecutive spaces as one, Text::ParseWords might be easier.
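As a rough sketch of the Text::ParseWords route (not the answer's original listing, which is missing here), the quotewords function can split the sample line from the question on runs of whitespace while keeping quoted strings intact. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);    # ships with core Perl

# The sample line from the question.
my $line = '12 345546.67677 "Hello World!!!" -567.55656 0.5465767 "Hello_Again; "';

# '\s+' treats any run of spaces or tabs as a single delimiter;
# the second argument 0 means the surrounding quotes are not kept.
my @fields = quotewords('\s+', 0, $line);

print "[$_]\n" for @fields;
# [12]
# [345546.67677]
# [Hello World!!!]
# [-567.55656]
# [0.5465767]
# [Hello_Again; ]
```

Two caveats with this approach: quotewords also treats single quotes as quoting characters and backslashes as escapes, and it returns an empty list when a line has unbalanced quotes, so it is worth checking the result before using it.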
Other possibilities are Regexp::Common::balanced and Text::Balanced.