A question about text parsing in Perl

Posted on 2024-10-02 11:36:42

I want to parse a line like this:

S1,F2  title including several white spaces  (abbr) single,Here<->There,reply

and I want the following output:

1
2
title including several white spaces
abbr
single
Here22There  # identify <-> and translate it to 22
reply

How should I parse the line above?

Method 1.
I plan to split the whole line into four segments and then parse each segment individually:

segment1. S1,F2

segment2. title including several white spaces

segment3. abbr

segment4. single,Here<->There,reply
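A minimal sketch of Method 1 in Perl, assuming that segment 1 never contains whitespace and that the `(abbr)` parentheses are the only parentheses on the line. A first regex cuts the line into the four segments; each segment is then parsed on its own, and `<->` is translated to `22` in segment 4. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = 'S1,F2  title including several white spaces  (abbr) single,Here<->There,reply';

# Step 1: cut the line into the four segments.
# Assumption: segment 1 contains no whitespace, and the "(abbr)"
# parentheses are the only ones on the line.
my ($seg1, $seg2, $seg3, $seg4) =
    $line =~ /^(\S+)\s+(.*?)\s*\(([^)]*)\)\s*(.*)$/
    or die "unexpected format: $line\n";

# Step 2: parse each segment on its own.
my ($s_num, $f_num) = $seg1 =~ /^S(\d+),F(\d+)$/
    or die "unexpected segment 1: $seg1\n";

my @rest = split /,/, $seg4;    # ('single', 'Here<->There', 'reply')
s/<->/22/g for @rest;           # translate <-> to 22 in place

my @fields = ($s_num, $f_num, $seg2, $seg3, @rest);
print "$_\n" for @fields;
```

One advantage of this layout is that if the format of a single segment changes, only the line that parses that segment needs to be touched.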

Method 2.
I just write one complex regular expression that parses the whole line.

Which method makes more sense in practice?

Any comments or suggestions are appreciated.

Comments (2)

我喜欢麦丽素 2024-10-09 11:36:42

Assuming your input is always in the format specified, you could use a regex like:

^S(\d+),F(\d+)\s+(.*?)\s*\((.*?)\)\s+(.*?),(.*?),(.*)$
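Below is a runnable sketch that applies this regex to the sample line, written with the `/x` modifier so that each capture group can carry a comment. It captures the title without the spaces that precede `(`, and it adds the `<->` to `22` translation the question asks for, which the regex alone does not do.

```perl
use strict;
use warnings;

my $line = 'S1,F2  title including several white spaces  (abbr) single,Here<->There,reply';

# Same pattern, spread over several lines with /x so each part can be commented.
my @fields = $line =~ m{
    ^ S (\d+) , F (\d+) \s+   # fields 1 and 2: the numbers after S and F
    (.*?) \s* \(              # field 3: the title, minus the spaces before the parenthesis
    (.*?) \) \s+              # field 4: the text inside the parentheses
    (.*?) ,                   # field 5: text up to the first comma
    (.*?) ,                   # field 6: text up to the second comma
    (.*) $                    # field 7: the rest of the line
}x or die "line does not match the expected format\n";

s/<->/22/g for @fields;       # translate every <-> to 22
print "$_\n" for @fields;
```

The single regex is compact, but it assumes exactly two commas after the parentheses; a line without the trailing `,reply` part would not match at all.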

幻梦 2024-10-09 11:36:42

Regarding your first method, you can first split the string on commas:

my $line =
    'S1,F2  title including several white spaces  (abbr) single,Here<->There,reply';
# Splitting on commas gives: 'S1', 'F2  title ... (abbr) single', 'Here<->There', 'reply'
my ($field1, $field2, $field3, $field4) = split /,/, $line;

and then apply a regex to each of the fields containing the substrings S1 and F2  title including several white spaces  (abbr) single:

my ($field5) = $field1 =~ /S(\d+)/;    # the number after S
# the number after F, the title (without trailing spaces), abbr, single
my ($field6, $field7, $field8, $field9) =
    $field2 =~ m/^F(\d+)\s+(.*?)\s*\((.*?)\)\s+(.*?)$/;

This works for all of the following strings and avoids having to write one complex regular expression:

S1,F2  title including several white spaces  (abbr) single,Here<->There,reply
S1,F2  title including several white spaces  (abbr) single,Here<->There
S1,F2  title including several white spaces  (abbr) single,Here<->There,[reply]
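The sketch below runs this split-then-regex approach over the three sample strings. Two details are assumptions rather than part of the answer: a missing fourth field is turned into an empty string with `//` (available since Perl 5.10), and `<->` in the third field is translated to `22` as the question requires.

```perl
use strict;
use warnings;

my @lines = (
    'S1,F2  title including several white spaces  (abbr) single,Here<->There,reply',
    'S1,F2  title including several white spaces  (abbr) single,Here<->There',
    'S1,F2  title including several white spaces  (abbr) single,Here<->There,[reply]',
);

my @results;
for my $line (@lines) {
    # Split on commas first; the 4th field is absent for the second line.
    my ($field1, $field2, $field3, $field4) = split /,/, $line;

    my ($s_num) = $field1 =~ /^S(\d+)$/;
    my ($f_num, $title, $abbr, $word) =
        $field2 =~ /^F(\d+)\s+(.*?)\s*\((.*?)\)\s+(.*?)$/;

    (my $pair = $field3) =~ s/<->/22/g;   # Here<->There becomes Here22There
    my $last = $field4 // '';              # empty string when there is no 4th field

    push @results, [$s_num, $f_num, $title, $abbr, $word, $pair, $last];
}

print join(' | ', @$_), "\n" for @results;
```

Because each regex only sees one comma-separated field, a missing or oddly formatted trailing field does not stop the earlier fields from being parsed.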