Python: how to grab a given number of lines after a match

Posted on 2024-11-17 04:48:27

Let's say I have an input text file of the following format:

Section1 Heading    Number of lines: n1
Line 1
Line 2
...
Line n1
Maybe some irrelevant lines

Section2 Heading    Number of lines: n2
Line 1
Line 2
...
Line n2

where certain sections of the file start with a header line that specifies how many lines are in that section. Each section heading has a different name.

I have written a regular expression that will match the header line based on the header name the user searches for each section, parse it, and then return the number n1/n2/etc that tells me how many lines are in the section. I have been trying to use a for-in loop to read through each line until a counter reaches n1, but it hasn't worked out so far.

Here's my question: how do I return just a certain number of lines following a matched line when that number is given in the match and different for each section? I'm new to programming, and I appreciate any help.

EDIT: Okay, here's the relevant code that I have so far:

import re
print
fname = raw_input("Enter filename: ")
toolname = raw_input("Enter toolname: ")

def findcounter(fname, toolname):
        logfile = open(fname, "r")

        pat = 'SUCCESS Number of lines :'
        #headers all have that format
        for line in logfile:
                if toolname in line:
                    if pat in line:
                            s=line

        pattern = re.compile(r"""(?P<name>.*?)     #starting name
                             \s*SUCCESS        #whitespace and success
                             \s*Number\s*of\s*lines  #whitespace and strings
                             \s*\:\s*(?P<n1>.*)""",re.VERBOSE)
        match = pattern.match(s)
        name = match.group("name")
        n1 = int(match.group("n1"))
        #after matching line, I attempt to loop through the next n1 lines
        lcount = 0
        for line in logfile:
             if line == match:
                    while lcount <= n1:
                                match.append(line)
                                lcount += 1
                                return result

The file itself is pretty long, and there are lots of irrelevant lines interspersed between the sections I'm interested in. What I'm not too sure about is how to specify printing the lines directly after a matched line.

走过海棠暮 2024-11-24 04:48:27

# f is a file object
# n1 is how many lines to read
lines = [f.readline() for i in range(n1)]
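
A minimal sketch of how this might be combined with the header search, assuming headers look like the sample in the question (`<name>    Number of lines: n`). The function name `read_section` and the `HEADER` regex are hypothetical and not taken from the question. It uses `itertools.islice` on the same iterator that found the header, so reading resumes on the line directly after the match.

```python
import io
import re
from itertools import islice

# Hypothetical header pattern: "Number of lines", a colon, then a count.
HEADER = re.compile(r"Number\s+of\s+lines\s*:\s*(\d+)")

def read_section(lines, toolname):
    """Return the n lines following the first header that mentions toolname.

    `lines` is any iterator of lines (for example an open file object).
    Returns None if no matching header is found.
    """
    it = iter(lines)
    for line in it:
        match = HEADER.search(line)
        if match and toolname in line:
            n = int(match.group(1))
            # islice continues from the line right after the header.
            return [l.rstrip("\n") for l in islice(it, n)]
    return None

# Usage with an in-memory file standing in for the real log:
sample = io.StringIO(
    "Section1 Heading    Number of lines: 2\n"
    "Line 1\n"
    "Line 2\n"
    "Maybe some irrelevant lines\n"
)
section = read_section(sample, "Section1")
print(section)
```

Because the header loop and `islice` share one iterator, the file position never has to be tracked by hand, and the irrelevant lines after the section are simply left unread.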

梦纸 2024-11-24 04:48:27

You can put logic like this in a generator:

def take(seq, n):
    """ gets n items from a sequence """
    return [next(seq) for i in range(n)]

def getblocks(lines):
    # `it` is an iterator and knows where we are in the list of lines.
    it = iter(lines)
    for line in it:
        try:
            # try to find the header:
            sec, heading, num = line.split()
            num = int(num)
        except ValueError:
            # didn't work, try the next line
            continue

        # we got a header, so take the next lines
        yield take(it, num) 

#test
data = """
Section1 Heading  3
Line 1
Line 2
Line 3

Maybe some irrelevant lines

Section2 Heading 2
Line 1
Line 2
""".splitlines()

print(list(getblocks(data)))
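
One caveat with `take` above: if the input ends before `num` lines have been read, `next(seq)` raises `StopIteration` inside the generator, which Python 3.7 and later turn into a `RuntimeError`. A possible variant is sketched here under two assumptions: headers follow the `Number of lines: n` format from the question, and a truncated section should simply yield a shorter block. The `HEADER` regex and the name `get_blocks` are hypothetical.

```python
import re
from itertools import islice

# Hypothetical header pattern: a name, "Number of lines", a colon, then a count.
HEADER = re.compile(r"^(?P<name>.*?)\s+Number\s+of\s+lines\s*:\s*(?P<n>\d+)\s*$")

def get_blocks(lines):
    """Yield (name, block) pairs, one per header found in lines."""
    it = iter(lines)
    for line in it:
        match = HEADER.match(line)
        if match is None:
            continue  # not a header, try the next line
        n = int(match.group("n"))
        # islice stops quietly if fewer than n lines remain.
        yield match.group("name"), list(islice(it, n))

data = """
Section1 Heading    Number of lines: 3
Line 1
Line 2
Line 3

Maybe some irrelevant lines

Section2 Heading    Number of lines: 2
Line 1
Line 2
""".splitlines()

blocks = list(get_blocks(data))
print(blocks)
```

Yielding the section name alongside each block makes it straightforward to keep only the section the user asked for.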
