How do I read part of each line that matches a condition in Perl?
Sample Data:
603 Some garbage data not related to me, 55, 113 ->
1-ENST0000 This is sample data blh blah blah blahhhh
2-ENSBTAP0 This is also some other sample data
21-ENADT)$ DO NOT WANT TO READ THIS LINE.
3-ENSGALP0 This is third sample data
node #4 This is 4th sample data
node #5 This is 5th sample data
This is also part of the input file but i dont wish to read this.
Branch -> 05 13,
44, 1,1,4,1
17, 1150
637 YYYYYY: 2 : %
EDIT: In the above data, the column width is fixed for the sections, but there might be some sections I do not wish to read. The sample data above has been edited to reflect that.
So in this input file I want to read the contents of the first section, '1-ENST0000', into an array, the contents of '2-ENSBTAP0' into a separate array, and so on.
I am having trouble coming up with a regex that defines the pattern: the first three lines have <someNumber>-ENS<someOtherStuff>, and then there can also be node #<some number here>.
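Here is a minimal sketch of such a pattern. It assumes column 1 is either <number>-ENS followed by non-blank characters, or node #<number>. The sample lines are copied from the question, and the variable names are illustrative only. '21-ENADT)$' is rejected because it lacks "ENS", and the garbage lines are rejected because they match neither form.

```perl
use strict;
use warnings;

# Sample lines from the question (whitespace as it appears above).
my @lines = (
    '603 Some garbage data not related to me, 55, 113 ->',
    '1-ENST0000 This is sample data blh blah blah blahhhh',
    '2-ENSBTAP0 This is also some other sample data',
    '21-ENADT)$ DO NOT WANT TO READ THIS LINE.',
    '3-ENSGALP0 This is third sample data',
    'node #4 This is 4th sample data',
    'node #5 This is 5th sample data',
    'This is also part of the input file but i dont wish to read this.',
    'Branch -> 05 13,',
);

# Column 1 is either <number>-ENS<non-blank stuff> or "node #<number>";
# whatever follows the separating whitespace is captured as the content.
my $wanted = qr/^(?:\d+-ENS\S*|node\s+#\d+)\s+(.*)$/;

my @selected = grep { /$wanted/ } @lines;
print "$_\n" for @selected;
```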
Is this really a fixed-column file? If so, then don't bother with regexps. Just split at the column width, perhaps trimming trailing whitespace from column 1.
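If the file really is fixed-column, unpack can do the split. This sketch assumes, hypothetically, that column 1 is padded to 11 characters. That width is a guess, so adjust $WIDTH to the real layout. The A template letter strips trailing whitespace, which handles the trimming of column 1.

```perl
use strict;
use warnings;

# Hypothetical input: column 1 is assumed to be padded to 11 characters.
my $data = <<'END';
1-ENST0000 This is sample data blh blah blah blahhhh
2-ENSBTAP0 This is also some other sample data
node #4    This is 4th sample data
END

my $WIDTH = 11;    # assumed width of column 1, separator included
my @rows;

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
while (my $line = <$fh>) {
    chomp $line;
    # 'A' fields drop trailing whitespace, so column 1 comes back trimmed.
    my ($key, $rest) = unpack "A$WIDTH A*", $line;
    push @rows, [ $key, $rest ];
}
close $fh;

printf "%-12s|%s\n", @$_ for @rows;
```

Splitting by position means a line like "node #4" needs no special regex case, as long as the padding really is fixed.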
OK, based on your later comment, this is a little different from the previous question. Also, I now realize that "node #54" is a valid entry in the first column.
Update: I now also realize you do not need the first column.
Update: In general, you neither want to nor need to deal with character arrays in Perl.
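Here is an illustration of the point about character arrays. The operations one might use a character array for work directly on a Perl string through length, substr, index and split. This is a short sketch using one of the sample lines.

```perl
use strict;
use warnings;

my $content = 'This is sample data blh blah blah blahhhh';

# A plain string already supports what a char array would be used for.
my $len   = length $content;            # number of characters
my $first = substr $content, 0, 4;      # "This"
my $pos   = index $content, 'sample';   # character offset of a substring
my @words = split ' ', $content;        # whitespace-separated pieces

# Only if you truly need individual characters:
my @chars = split //, $content;

print "$len $first $pos ", scalar(@words), ' ', scalar(@chars), "\n";
```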
Update: Now that you have clarified what should and should not be skipped, here is a version that handles that. Add patterns to taste in the if condition.
As for learning how to fish, I recommend you read everything related in perldoc perltoc.
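Here is a minimal sketch of a skip-or-keep loop. It is an illustration, not the answerer's original code. It assumes the wanted lines are exactly those whose first column looks like <number>-ENS... or node #<number>, and it skips every other line. Column 1 is dropped. The remaining words of each kept line go into a separate array, stored as array references in @sections. The example reads from an in-memory string only to stay self-contained; in practice you would open the real file.

```perl
use strict;
use warnings;

my $data = <<'END';
603 Some garbage data not related to me, 55, 113 ->
1-ENST0000 This is sample data blh blah blah blahhhh
2-ENSBTAP0 This is also some other sample data
21-ENADT)$ DO NOT WANT TO READ THIS LINE.
3-ENSGALP0 This is third sample data
node #4 This is 4th sample data
node #5 This is 5th sample data
This is also part of the input file but i dont wish to read this.
Branch -> 05 13,
44, 1,1,4,1
17, 1150
637 YYYYYY: 2 : %
END

my @sections;    # one array reference per wanted line

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
while (my $line = <$fh>) {
    chomp $line;
    # Keep only lines whose column 1 is <n>-ENS... or node #<n>;
    # add alternatives inside (?:...) to accept more kinds of lines.
    if (my ($content) = $line =~ /^(?:\d+-ENS\S*|node\s+#\d+)\s+(.*)$/) {
        push @sections, [ split ' ', $content ];    # column 1 is dropped
    }
}
close $fh;

for my $i (0 .. $#sections) {
    print "section $i: @{ $sections[$i] }\n";
}
```

An array of array references is the idiomatic Perl stand-in for "a separate array per section". It avoids creating a new named variable for each line.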