Perl: deleting lines from a file

Posted on 2024-12-10 21:37:07

I have a file that looks like this:

ATOM 2517 O   VAL 160 8.337  12.679  -2.487
ATOM 2518 OXT VAL 160 7.646  12.461  -0.386
TER 
ATOM 2519 N   VAL 161 -14.431  5.789 -25.371
ATOM 2520 H1  VAL 161 -15.336  5.698 -25.811
ATOM 2521 H2  VAL 161 -13.416 10.529  17.708
ATOM 2522 H3  VAL 161 -14.363  9.436  18.498
ATOM 2523 CA  VAL 161   4.400  9.233  16.454
ATOM 2524 HA  VAL 161   3.390  9.170  16.047

I have to remove the "TER" line, the line before it, and the 3 lines that follow the line immediately after "TER", so that the file is continuous, like this:

ATOM 2517 O   VAL 160   8.337 12.679  -2.487
ATOM 2519 N   VAL 161 -14.431  5.789 -25.371
ATOM 2523 CA  VAL 161   4.400  9.233  16.454
ATOM 2524 HA  VAL 161   3.390  9.170  16.047

莳間冲淡了誓言ζ 2024-12-17 21:37:07

A simple line-by-line script.

Usage: perl -i.bak script.pl fileglob

E.g. perl -i.bak script.pl File*MINvac.pdb

This will alter the original files and save a backup of each one with the extension .bak. The script expects every TER line to be followed by at least four more lines in the same file; if a TER appears closer to the end, it stops reading at end of file rather than warning or running into the next file.

If you do not wish to save backups (use caution, since the changes are irreversible!), use -i instead.

Code:

#!/usr/bin/perl
use v5.10;
use strict;
use warnings;

my $prev;                       # last line read but not yet printed
while (<>) {
    if (/^TER/) {
        $prev = undef;          # drop the line before TER
        unless (eof) {
            print scalar <>;    # keep the line right after TER
        }
        for my $skip (1 .. 3) { # skip the next 3 lines
            last if eof;
            <>;
        }
    } else {
        print $prev if defined $prev;
        $prev = $_;
    }
    if (eof) {                  # end of current file: flush the held line
        print $prev if defined $prev;
        $prev = undef;
    }
}
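
For comparison, here is a minimal Python sketch of the same "hold one line back" idea, combined with a backup copy that gives a similar effect to -i.bak. The names drop_ter_blocks and rewrite_with_backup are hypothetical, chosen only for this illustration, and the sketch copies the file before overwriting it, which is an assumption about how to get the backup and not how Perl implements -i.

```python
import os
import shutil
import tempfile


def drop_ter_blocks(lines):
    """Drop TER, the line before it, and the 3 lines after the line that follows it."""
    kept = []
    stream = iter(lines)
    prev = None                      # held back until we know TER does not follow
    for line in stream:
        if line.startswith("TER"):
            prev = None              # discard the line before TER
            after = next(stream, None)
            if after is not None:
                kept.append(after)   # keep the line right after TER
            for _ in range(3):       # skip the next 3 lines, if present
                next(stream, None)
        else:
            if prev is not None:
                kept.append(prev)
            prev = line
    if prev is not None:
        kept.append(prev)            # flush the last held line
    return kept


def rewrite_with_backup(path, suffix=".bak"):
    """Copy path to path + suffix, then overwrite path with the filtered lines."""
    shutil.copy2(path, path + suffix)
    with open(path) as handle:
        lines = handle.readlines()
    with open(path, "w") as handle:
        handle.writelines(drop_ter_blocks(lines))


# Demo on a throwaway file holding the sample from the question.
sample = (
    "ATOM 2517 O   VAL 160 8.337  12.679  -2.487\n"
    "ATOM 2518 OXT VAL 160 7.646  12.461  -0.386\n"
    "TER \n"
    "ATOM 2519 N   VAL 161 -14.431  5.789 -25.371\n"
    "ATOM 2520 H1  VAL 161 -15.336  5.698 -25.811\n"
    "ATOM 2521 H2  VAL 161 -13.416 10.529  17.708\n"
    "ATOM 2522 H3  VAL 161 -14.363  9.436  18.498\n"
    "ATOM 2523 CA  VAL 161   4.400  9.233  16.454\n"
    "ATOM 2524 HA  VAL 161   3.390  9.170  16.047\n"
)
with tempfile.TemporaryDirectory() as workdir:
    demo_path = os.path.join(workdir, "demo.pdb")
    with open(demo_path, "w") as handle:
        handle.write(sample)
    rewrite_with_backup(demo_path)
    with open(demo_path) as handle:
        result = handle.read()
    with open(demo_path + ".bak") as handle:
        backup_text = handle.read()
print(result, end="")
```

On the sample from the question this should leave only atoms 2517, 2519, 2523 and 2524 in the rewritten file, while the .bak copy keeps the original contents.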

英雄似剑 2024-12-17 21:37:07

I realized I was supposed to write it in Perl, but by then I had already written it in Python. I'm posting it anyway as it may prove useful; I don't see any harm in that.

#!/usr/bin/env python3
import glob
import os
import sys

try:
    directory = sys.argv[1]
except IndexError:
    print("Usage: " + sys.argv[0] + " dir")
    print("Example: " + sys.argv[0] + " /home/user/dir/")
    sys.exit(1)

for path in glob.glob(os.path.join(directory, 'File*_*MINvac.pdb')):
    with open(path, "r") as fin:
        content = fin.readlines()

    i = 0
    while i < len(content):
        if content[i].startswith("TER"):
            # Drop the 3 lines that follow the line right after TER.
            del content[i + 2:i + 5]
            # Drop TER itself and, if there is one, the line before it.
            start = max(i - 1, 0)
            del content[start:i + 1]
            # content[start] is now the kept line after TER; resume after it.
            i = start + 1
        else:
            i += 1

    with open(path, "w") as fout:
        fout.writelines(content)

Edit: Added support for multiple files, like the OP wanted.

丢了幸福的猪 2024-12-17 21:37:07

So, for each set of 6 consecutive lines, you want to discard all but the third line if the second line is a TER?

TIMTOWTDI, but this should work:

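my @queue;    # sliding window of up to 6 lines
while (<>) {
    push @queue, $_;
    # Window is (before, TER, keep, skip, skip, skip): keep only the third line.
    # \b instead of $ so that "TER" followed by a space or more fields still matches.
    @queue = $queue[2]  if @queue == 6 and $queue[1] =~ /^TER\b/;
    print shift @queue  if @queue == 6;
}
print @queue;  # flush the rest; assumes no TER in the last 4 lines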
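
The six-line window can also be sketched in Python with collections.deque, which may make the buffering easier to follow. This is only an illustration under the same assumptions as the Perl version: a TER in the last 4 lines, or as the very first line, passes through untouched. The function name drop_ter_window is hypothetical.

```python
from collections import deque


def drop_ter_window(lines):
    """Yield lines, collapsing each (before, TER, keep, skip, skip, skip) window to keep."""
    window = deque()
    for line in lines:
        window.append(line)
        if len(window) == 6:
            if window[1].startswith("TER"):
                kept = window[2]     # only the line right after TER survives
                window.clear()
                window.append(kept)
            else:
                yield window.popleft()
    yield from window                # flush whatever is still buffered


sample = [
    "ATOM 2517 O   VAL 160 8.337  12.679  -2.487\n",
    "ATOM 2518 OXT VAL 160 7.646  12.461  -0.386\n",
    "TER \n",
    "ATOM 2519 N   VAL 161 -14.431  5.789 -25.371\n",
    "ATOM 2520 H1  VAL 161 -15.336  5.698 -25.811\n",
    "ATOM 2521 H2  VAL 161 -13.416 10.529  17.708\n",
    "ATOM 2522 H3  VAL 161 -14.363  9.436  18.498\n",
    "ATOM 2523 CA  VAL 161   4.400  9.233  16.454\n",
    "ATOM 2524 HA  VAL 161   3.390  9.170  16.047\n",
]
result = list(drop_ter_window(sample))
print("".join(result), end="")
```

The window never holds more than six lines, so memory use stays constant no matter how large the input file is.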

终止放荡 2024-12-17 21:37:07

use strict;
use warnings;
use Tie::File;

my @array;

tie @array, 'Tie::File', 'myFile.txt' or die "Unable to tie file";

my %unwanted = map  { $_ => 1 }                # Hashify ...
               map  { $_-1, $_, $_+2 .. $_+4 } # ... the five lines ...
               grep { $array[$_] =~ /^TER/ }   # ... around 'TER'  ...
               0 .. $#array ;                  # ... in the file

# Remove the unwanted lines
@array = map { $array[$_] } grep { ! $unwanted{$_} } 0 .. $#array;

untie @array;  # The end
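
The answer above is code only. As far as the code shows, the idea is to collect the indices of every unwanted line first and then keep the lines whose index is not in that set. A minimal Python sketch of that index arithmetic follows; unwanted_indices and filter_lines are hypothetical names, and the sketch works on an in-memory list instead of a tied file.

```python
def unwanted_indices(lines):
    """Return the set of indices to drop: i-1, i, and i+2..i+4 for each TER at index i."""
    unwanted = set()
    for i, line in enumerate(lines):
        if line.startswith("TER"):
            unwanted.update([i - 1, i])           # the line before TER, and TER itself
            unwanted.update(range(i + 2, i + 5))  # the 3 lines after the kept line
    return unwanted


def filter_lines(lines):
    """Keep only the lines whose index was not marked as unwanted."""
    unwanted = unwanted_indices(lines)
    return [line for i, line in enumerate(lines) if i not in unwanted]


sample = [
    "ATOM 2517 O   VAL 160 8.337  12.679  -2.487\n",
    "ATOM 2518 OXT VAL 160 7.646  12.461  -0.386\n",
    "TER \n",
    "ATOM 2519 N   VAL 161 -14.431  5.789 -25.371\n",
    "ATOM 2520 H1  VAL 161 -15.336  5.698 -25.811\n",
    "ATOM 2521 H2  VAL 161 -13.416 10.529  17.708\n",
    "ATOM 2522 H3  VAL 161 -14.363  9.436  18.498\n",
    "ATOM 2523 CA  VAL 161   4.400  9.233  16.454\n",
    "ATOM 2524 HA  VAL 161   3.390  9.170  16.047\n",
]
marked = unwanted_indices(sample)
result = filter_lines(sample)
print("".join(result), end="")
```

Indices that fall outside the list (for example -1 when TER is the first line) are harmless here, because they are only ever used for membership tests during the filter step.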
