How do I count the number of lines between two marker lines in a file with Python?

Posted on 2024-11-06 13:42:12


Hi, I'm new to Python and I'm using Python 3.2.
I have a file in a format like this:

Number of segment pairs = 108570; number of pairwise comparisons = 54234
'+' means given segment; '-' means reverse complement

Overlaps            Containments  No. of Constraints Supporting Overlap

******************* Contig 1 ********************

 E_180+

 E_97-

******************* Contig 2 ********************

E_254+

                    E_264+ is in E_254+

E_276+

******************* Contig 3 ********************

E_256-

E_179-

I want to count the number of non-empty lines between the ***** Contig # ***** marker lines,
and I want to get a result like this:

contig1=2
contig2=3
contig3=2


吲‖鸣 2024-11-13 13:42:12


Probably, it's best to use regular expressions here. You can try the following:

import re
with open(filename) as f:  # filename: path to the input file
    text = f.read()
pairs = re.findall(r'\*+ (Contig \d+) \*+\n([^*]*)', text)

pairs is a list of tuples of the form ('Contig x', '...').
The second component of each tuple contains the text that follows the marker line, up to the next asterisk.

Afterwards, you could count the number of '\n' in those texts; most easily this can be done via a list comprehension:

[(contig, txt.count('\n')) for (contig,txt) in pairs]

Edit: if you don't want to count empty lines, count only the lines that contain something other than whitespace:

[(contig, sum(1 for line in txt.splitlines() if line.strip())) for (contig, txt) in pairs]
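Putting the pieces of this answer together, here is a minimal sketch of the regex approach. It is not the answerer's exact code: it assumes the file content is already held in a string (the sample text below is a shortened stand-in for the real file), and count_contig_lines is a hypothetical helper name. It counts only lines that contain non-whitespace characters.

```python
import re

# Shortened stand-in for the input file (assumption: same layout as in the question).
sample = """Overlaps            Containments  No. of Constraints Supporting Overlap

******************* Contig 1 ********************

 E_180+

 E_97-

******************* Contig 2 ********************

E_254+

                    E_264+ is in E_254+

E_276+

******************* Contig 3 ********************

E_256-

E_179-
"""


def count_contig_lines(text):
    """Return [(contig_name, non_blank_line_count), ...], one entry per marker."""
    # Each match is (marker name, text up to the next asterisk).
    pairs = re.findall(r'\*+ (Contig \d+) \*+\n([^*]*)', text)
    return [(contig, sum(1 for line in txt.splitlines() if line.strip()))
            for contig, txt in pairs]


counts = count_contig_lines(sample)
for contig, n in counts:
    # Format as in the question, e.g. "contig1=2".
    print('{0}={1}'.format(contig.replace(' ', '').lower(), n))
```

Note that the pattern [^*]* stops at the first asterisk, so this sketch would cut a block short if a data line itself contained a '*'.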


赴月观长安 2024-11-13 13:42:12

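def give(filename):
    """Yield ('Contig N=count', contained_lines) for each contig block."""
    with open(filename) as f:
        # Skip the header up to the first marker line.
        for line in f:
            if 'Contig' in line:
                category = line.strip('* \r\n')
                break
        cnt = 0
        aim = []
        # Continue reading from the line after the first marker.
        for line in f:
            if 'Contig' in line:
                # A new marker closes the previous block.
                yield (category + '=' + str(cnt), aim)
                category = line.strip('* \r\n')
                cnt = 0
                aim = []
            elif line.strip():
                # Non-blank line inside the current block.
                cnt += 1
                if 'is in' in line:
                    aim.append(line.strip())
        # Emit the last block, which has no closing marker.
        yield (category + '=' + str(cnt), aim)


for a, b in give('input.txt'):
    print(a)
    if b:
        print(b)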

result

Contig 1=2
Contig 2=3
['E_264+ is in E_254+']
Contig 3=2

The function give() isn't a normal function; it is a generator function. See the Python documentation on generators, and if you have questions, I will answer.
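As a small illustration of what "generator function" means, here is a sketch that is separate from give(): non_blank is a hypothetical helper, and the input list stands in for lines read from a file. Calling a generator function runs nothing at first; the body executes step by step as values are requested.

```python
def non_blank(lines):
    """Generator: yield each non-blank line, stripped, one at a time."""
    for line in lines:
        if line.strip():
            yield line.strip()


gen = non_blank([' E_180+\n', '\n', ' E_97-\n'])
first = next(gen)   # runs the body up to the first yield
rest = list(gen)    # resumes after that yield and collects the remaining items
print(first, rest)
```

This is why give() can be used directly in a for loop: each iteration resumes the function until its next yield.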

strip() is a string method that removes characters from the beginning and the end of a string.

When used without an argument, strip() removes whitespace (that is, \f \n \r \t \v and the space character). When a string is passed as the argument, every leading and trailing character of the treated string that appears in that argument is removed, and stripping stops on each side at the first character that is not in the argument. Characters in the middle of the string are left alone. The order of characters in the argument doesn't matter: the argument doesn't designate a string to match but a set of characters to remove.
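The behaviour of strip() described here can be checked with a short sketch. The marker string is copied from the sample file; the variable names are only illustrative.

```python
marker = '******************* Contig 2 ********************\n'

# No argument: only surrounding whitespace is removed, the asterisks stay.
no_arg = marker.strip()

# With an argument: '*', ' ', '\r' and '\n' are removed from both ends.
# The argument is a set of characters, so its order is irrelevant.
name = marker.strip('* \r\n')
same = marker.strip('\n\r *')

# Characters in the middle are never touched, only the two ends.
inner = 'a*b*'.strip('*')

print(repr(no_arg))
print(name, same, inner)
```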

line.strip() is a way to test whether a line contains any non-whitespace characters: the result is an empty string, which is false, for a blank line.

It is important that elif line.strip(): comes after if 'Contig' in line: and that it is written elif rather than if. With a plain if, line.strip() would also be true for a marker line, which would then be counted as data, for example

******** Contig 2 *********\n
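The effect of the branch order can be seen in a small sketch. The list of lines is a made-up stand-in for part of the file, and the two loops differ only in elif versus a second if.

```python
lines = ['******** Contig 2 *********\n', 'E_254+\n', '\n', 'E_276+\n']

# Branch order as in give(): marker lines are handled first and never counted.
cnt_elif = 0
for line in lines:
    if 'Contig' in line:
        pass                 # marker line: start of a new contig
    elif line.strip():
        cnt_elif += 1

# Two independent ifs: the marker line is non-blank, so it is counted as well.
cnt_if = 0
for line in lines:
    if 'Contig' in line:
        pass
    if line.strip():
        cnt_if += 1

print(cnt_elif, cnt_if)
```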

I suppose that you will be interested in the content of lines like this one:

            E_264+ is in E_254+

because this is the kind of line that makes a difference in the counts.
So I edited my code so that the function give() also yields the content of these lines.
