Split a file based on a pattern in two consecutive lines

Posted 2024-12-01 23:02:10

I have files with the following format:

ATOM   3736  CB  THR A 486      -6.552 153.891  -7.922  1.00115.15           C  
ATOM   3737  OG1 THR A 486      -6.756 154.842  -6.866  1.00114.94           O  
ATOM   3738  CG2 THR A 486      -7.867 153.727  -8.636  1.00115.11           C  
ATOM   3739  OXT THR A 486      -4.978 151.257  -9.140  1.00115.13           O  
HETATM10351  C1  NAG A 203      33.671  87.279  39.456  0.50 90.22           C  
HETATM10483  C1  NAG A 702      28.025 104.269 -27.569  0.50 92.75           C    
ATOM   3736  CB  THR B 486      -6.552  86.240   7.922  1.00115.15           C  
ATOM   3737  OG1 THR B 486      -6.756  85.289   6.866  1.00114.94           O  
ATOM   3738  CG2 THR B 486      -7.867  86.404   8.636  1.00115.11           C  
ATOM   3739  OXT THR B 486      -4.978  88.874   9.140  1.00115.13           O  
HETATM10351  C1  NAG B 203      33.671 152.852 -39.456  0.50 90.22           C  
HETATM10639  C2  FUC B 402     -48.168 162.221 -22.404  0.50103.03           C

I would like to split the file after each line starting with HETATM*, but only if the next line starts with ATOM. The new files should be called $basename_$column, where $basename is the base name of the input file and $column is the character at position 22-23 (either A or B in the example). I can't figure out how to check two consecutive lines to determine the split point.


森林很绿却致人迷途 2024-12-08 23:02:10

Here's an awk version:

awk 'NR==1{n=$5} /^HETATM/{f=1} f && /^ATOM/{n=$5; f=0} {print > ("file" n ".txt")}' file

Use the built-in FILENAME variable in place of the literal "file" to build the output names from the input file name.
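The one-liner tracks state with a flag instead of looking at two lines at once. For comparison, here is a minimal Python sketch that checks each pair of consecutive lines directly, which is what the question asks about. The function name `split_after_hetatm` and the shortened sample list are illustrative assumptions, not part of the original answer.

```python
def split_after_hetatm(lines):
    """Group lines into chunks, starting a new chunk wherever a HETATM
    line is immediately followed by an ATOM line."""
    chunks = []
    current = []
    prev = None
    for line in lines:
        # Compare the previous line with the current one to find a split point.
        if prev is not None and prev.startswith("HETATM") and line.startswith("ATOM"):
            chunks.append(current)
            current = []
        current.append(line)
        prev = line
    if current:
        chunks.append(current)
    return chunks


# A shortened version of the sample data from the question.
sample = [
    "ATOM   3739  OXT THR A 486      -4.978 151.257  -9.140  1.00115.13           O",
    "HETATM10351  C1  NAG A 203      33.671  87.279  39.456  0.50 90.22           C",
    "HETATM10483  C1  NAG A 702      28.025 104.269 -27.569  0.50 92.75           C",
    "ATOM   3736  CB  THR B 486      -6.552  86.240   7.922  1.00115.15           C",
    "HETATM10351  C1  NAG B 203      33.671 152.852 -39.456  0.50 90.22           C",
]

for chunk in split_after_hetatm(sample):
    # Index 21 (column 22) holds the chain letter in these records.
    print(chunk[0][21], len(chunk))
# prints:
# A 3
# B 2
```

Remembering only the previous line keeps memory use constant, so the same loop should work when iterating over an open file instead of a list.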


南风起 2024-12-08 23:02:10

Here's a simple Python solution with no error checking. It should work in Python 2 or 3; change the first line to match your environment. Don't take this as an example of good coding style.

Edited for unique file names.

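#!/usr/bin/env python2.4

import os.path
import sys

fname = sys.argv[1]
bname = os.path.basename(fname)

fin = open(fname)

fout = None    # current output file, opened on the first input line
ct = 0         # running counter that keeps output file names unique
flag = False   # True once a HETATM line has been seen in the current chunk

for line in fin:
    if line[:6] == 'HETATM':
        flag = True
    # Start a new file on the very first line, or on the first ATOM line
    # that follows one or more HETATM lines.
    if (not fout) or (flag and line[:4] == 'ATOM'):
        if fout:
            fout.close()
        ct += 1
        # line[21:22] is column 22 of the record (A or B in the example).
        fout = open(bname + '_' + line[21:22] + str(ct), 'w')
        flag = False
    fout.write(line)

fin.close()
if fout:  # stays None if the input file was empty
    fout.close()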
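For readers who want the exact `$basename_$column` naming from the question, the same logic can be wrapped in a function. This is only a sketch under two assumptions that the answer above does not make: that "$basename" means the file name without its extension, and that each chain appears as a single contiguous block (otherwise a later chunk would overwrite an earlier file, which is what the counter above guards against). The name `split_pdb` and the `out_dir` parameter are hypothetical.

```python
import os
import tempfile


def split_pdb(path, out_dir):
    """Write one file per chunk into out_dir and return the file names."""
    # Assumption: "$basename" is the input file name without its extension.
    base = os.path.splitext(os.path.basename(path))[0]
    names = []
    out = None
    seen_hetatm = False
    with open(path) as fin:
        for line in fin:
            if line.startswith("HETATM"):
                seen_hetatm = True
            # New chunk: first line of the file, or ATOM right after HETATM lines.
            if out is None or (seen_hetatm and line.startswith("ATOM")):
                if out is not None:
                    out.close()
                # Index 21 (column 22) is the chain letter in the sample data.
                name = "%s_%s" % (base, line[21:22])
                names.append(name)
                out = open(os.path.join(out_dir, name), "w")
                seen_hetatm = False
            out.write(line)
    if out is not None:
        out.close()
    return names


# Small demonstration on a temporary copy of a few sample lines.
sample = "\n".join([
    "ATOM   3739  OXT THR A 486      -4.978 151.257  -9.140  1.00115.13           O",
    "HETATM10351  C1  NAG A 203      33.671  87.279  39.456  0.50 90.22           C",
    "ATOM   3736  CB  THR B 486      -6.552  86.240   7.922  1.00115.15           C",
    "HETATM10351  C1  NAG B 203      33.671 152.852 -39.456  0.50 90.22           C",
]) + "\n"

work = tempfile.mkdtemp()
src = os.path.join(work, "example.pdb")
with open(src, "w") as f:
    f.write(sample)

names = split_pdb(src, work)
print(names)  # ['example_A', 'example_B']
```

Returning the list of names makes the function easy to check without inspecting the output directory by hand.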
