Text file parsing problem in Python

Posted on 2024-12-01 08:33:31

I am new to Python, and I am trying to delete the lines in a text file that contain the word "Lett.". Here is a sample of the text file I am trying to parse:

<A>Lamb</A> <W>Let. Moxon</W>
<A>Lamb</A> <W>Danger Confound. Mor. w. Personal Deformity</W>
<A>Lamb</A> <W>Gentle Giantess</W>
<A>Lamb</A> <W>Lett., to Wordsw.</W>
<A>Lamb</A> <W>Lett., to Procter</W>
<A>Lamb</A> <W>Let. to Old Gentleman</W>
<A>Lamb</A> <W>Elia Ser.</W>
<A>Lamb</A> <W>Let. to T. Manning</W>

I know how to open the file, but I am unsure how to find the matching text and then delete that line. Any help would be greatly appreciated.
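A minimal sketch of just the matching step, with the sample lines above held in an in-memory list (the variable names are illustrative). `"Lett." in line` is a plain, case-sensitive substring test, so lines containing "Let." (one t) are kept:

```python
# The sample lines from the question, held in memory for illustration
sample = [
    "<A>Lamb</A> <W>Let. Moxon</W>",
    "<A>Lamb</A> <W>Danger Confound. Mor. w. Personal Deformity</W>",
    "<A>Lamb</A> <W>Gentle Giantess</W>",
    "<A>Lamb</A> <W>Lett., to Wordsw.</W>",
    "<A>Lamb</A> <W>Lett., to Procter</W>",
    "<A>Lamb</A> <W>Let. to Old Gentleman</W>",
    "<A>Lamb</A> <W>Elia Ser.</W>",
    "<A>Lamb</A> <W>Let. to T. Manning</W>",
]

# Keep a line only if the exact substring "Lett." does not occur in it
kept = [line for line in sample if "Lett." not in line]
for line in kept:
    print(line)
```

Six of the eight lines survive; only the two "Lett., to ..." lines are dropped.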

Comments (5)

节枝 2024-12-08 08:33:31
# Print every line that does not contain "Lett."
with open("myfile.txt") as f:
    for line in f:
        if "Lett." not in line:
            print(line, end="")  # the line already ends with a newline

Or, if you want to write the result back to the file:

# Read all lines first, then reopen the same file for writing
with open("myfile.txt") as f:
    lines = f.readlines()

with open("myfile.txt", "w") as f:
    for line in lines:
        if "Lett." not in line:
            f.write(line)
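The standard library's `fileinput` module can also do the read, filter, and rewrite in one pass (this is a different technique from the answer above, not part of it). With `inplace=True`, the original file is moved aside and anything printed during iteration becomes the new file contents. A sketch; `delete_matching_lines` is a hypothetical helper name:

```python
import fileinput

def delete_matching_lines(path, needle="Lett."):
    """Hypothetical helper: remove every line of `path` that contains `needle`."""
    # inplace=True renames the original to a temporary backup and redirects
    # stdout to a new file at `path`, so whatever is printed is what remains.
    with fileinput.input(files=(path,), inplace=True) as f:
        for line in f:
            if needle not in line:
                print(line, end="")  # `line` already ends with its newline
```

Usage: `delete_matching_lines("myfile.txt")`. Because no `backup` suffix is passed, the temporary backup is deleted when iteration finishes.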
感悟人生的甜 2024-12-08 08:33:31
# Open the input file for reading and a separate file for the results
with open('in.txt') as text, open('out.txt', 'w') as out:
    # Go through the input file line by line
    for line in text:
        if 'Lett.' not in line:  ### This is the crucial line.
            # Copy the line to the output only if 'Lett.' is not in it
            out.write(line)
# Leaving the with-block closes both files, which flushes the output to disk
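If the goal is to replace in.txt itself rather than produce a separate out.txt, one common pattern (an addition here, not from this answer) is to write the filtered lines to a temporary file in the same directory and then swap it in with `os.replace`, so an interruption mid-write never leaves a half-written original. A sketch; `remove_lines_atomically` is a hypothetical name, and note that `tempfile.mkstemp` creates the file with owner-only permissions, which the replaced file then has:

```python
import os
import tempfile

def remove_lines_atomically(path, needle="Lett."):
    """Hypothetical helper: drop lines containing `needle`, replacing `path` in one step."""
    # Create the temp file next to the target so os.replace is a same-filesystem rename
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        # Wrap fd first so it is closed even if opening `path` fails
        with os.fdopen(fd, "w") as dst, open(path) as src:
            for line in src:
                if needle not in line:
                    dst.write(line)
        # Rename over the original; atomic on POSIX when successful
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temp file, then re-raise the original error
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```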
花开半夏魅人心 2024-12-08 08:33:31

I have a general streaming editor framework for this kind of stuff. I load the file into memory, apply changes to the in-memory list of lines, and write out the file if changes were made.

I have boilerplate that looks like this:

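# sed_util is the author's own module holding the primitives shown below
from sed_util import delete_range, insert_range, append_range, replace_range

def sed(filename):
    modified = False  # set to True in the "magic" section when lines change

    # Load file into memory (rstrip() drops the newline and any trailing whitespace)
    with open(filename) as f:
        lines = [line.rstrip() for line in f]

    # magic here...

    # Write the file back only if something changed
    if modified:
        with open(filename, "w") as f:
            for line in lines:
                f.write(line + "\n")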

And in the # magic here section, I have either:

  1. modifications to individual lines, like:

    lines[i] = change_line(lines[i])

  2. calls to my sed utilities for inserting, appending, and replacing lines, like:

    lines = delete_range(lines, some_range)

The latter uses primitives like these:

def delete_range(lines, r):
    """
    >>> a = list(range(10))
    >>> b = delete_range(a, (1, 3))
    >>> b
    [0, 4, 5, 6, 7, 8, 9]
    """
    start, end = r  # both ends inclusive (unlike replace_range, whose end is exclusive)
    assert start <= end
    return [line for i, line in enumerate(lines) if not (start <= i <= end)]

def insert_range(lines, line_no, new_lines):
    """
    >>> a = list(range(10))
    >>> b = list(range(11, 13))
    >>> c = insert_range(a, 3, b)
    >>> c
    [0, 1, 2, 11, 12, 3, 4, 5, 6, 7, 8, 9]
    >>> c = insert_range(a, 0, b)
    >>> c
    [11, 12, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> c = insert_range(a, 9, b)
    >>> c
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 9]
    """
    assert 0 <= line_no < len(lines)
    return lines[0:line_no] + new_lines + lines[line_no:]

def append_range(lines, line_no, new_lines):
    """
    >>> a = list(range(10))
    >>> b = list(range(11, 13))
    >>> c = append_range(a, 3, b)
    >>> c
    [0, 1, 2, 3, 11, 12, 4, 5, 6, 7, 8, 9]
    >>> c = append_range(a, 0, b)
    >>> c
    [0, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> c = append_range(a, 9, b)
    >>> c
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]
    """
    assert 0 <= line_no < len(lines)
    return lines[0:line_no+1] + new_lines + lines[line_no+1:]

def replace_range(lines, line_nos, new_lines):
    """
    >>> a = list(range(10))
    >>> b = list(range(11, 13))
    >>> c = replace_range(a, (0, 2), b)
    >>> c
    [11, 12, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> c = replace_range(a, (8, 10), b)
    >>> c
    [0, 1, 2, 3, 4, 5, 6, 7, 11, 12]
    >>> c = replace_range(a, (0, 10), b)
    >>> c
    [11, 12]
    >>> c = replace_range(a, (0, 10), [])
    >>> c
    []
    >>> c = replace_range(a, (0, 9), [])
    >>> c
    [9]
    """
    start, end = line_nos  # start inclusive, end exclusive: lines[start:end] is replaced
    return lines[:start] + new_lines + lines[end:]

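def find_line(lines, regex):
    # Index of the first line that regex matches *at its start* (regex.match),
    # or None if no line matches
    for i, line in enumerate(lines):
        if regex.match(line):
            return i
    return None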

if __name__ == '__main__':
    import doctest
    doctest.testmod()

The tests work on arrays of integers, for clarity, but the transformations work for arrays of strings, too.
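One catch when applying find_line to strings like the question's: `regex.match` only matches at the beginning of a line, and every sample line starts with `<A>Lamb</A>`, so a pattern for "Lett." would never be found. `regex.search` scans the whole line. A sketch of a variant; `find_line_anywhere` is a hypothetical name, not one of the author's utilities:

```python
import re

def find_line_anywhere(lines, regex):
    """Variant of find_line that uses search(), so the pattern may occur
    anywhere in the line. Returns the first matching index, or None."""
    for i, line in enumerate(lines):
        if regex.search(line):
            return i
    return None

lines = [
    "<A>Lamb</A> <W>Let. Moxon</W>",
    "<A>Lamb</A> <W>Lett., to Wordsw.</W>",
]
pattern = re.compile(r"Lett\.")  # escape the dot so it matches a literal "."

print(pattern.match(lines[1]))             # None: match() only tries position 0
print(find_line_anywhere(lines, pattern))  # 1
```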

Generally, I scan the list of lines to identify changes I want to apply, usually with regular expressions, and then I apply the changes on matching data. Today, for example, I ended up making about 2000 line changes across 150 files.
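The scan-then-apply approach can be sketched with delete_range (copied from above so the block runs on its own). One detail matters: once a line is deleted, every later index shifts down by one, so the recorded deletions are applied from the bottom up. `delete_matching` is a hypothetical helper, not part of the author's module:

```python
import re

def delete_range(lines, r):
    # Same as the primitive above: drop indices start..end, both inclusive
    start, end = r
    assert start <= end
    return [line for i, line in enumerate(lines) if not (start <= i <= end)]

def delete_matching(lines, regex):
    """Hypothetical helper: delete every line in which regex finds a match."""
    # 1) Scan: record the index of each matching line
    hits = [i for i, line in enumerate(lines) if regex.search(line)]
    # 2) Apply: go bottom-up so the indices still to be deleted stay valid
    for i in reversed(hits):
        lines = delete_range(lines, (i, i))
    return lines, len(hits)

sample = [
    "<A>Lamb</A> <W>Let. Moxon</W>",
    "<A>Lamb</A> <W>Lett., to Wordsw.</W>",
    "<A>Lamb</A> <W>Lett., to Procter</W>",
    "<A>Lamb</A> <W>Elia Ser.</W>",
]
kept, n = delete_matching(sample, re.compile(r"Lett\."))
print(n)     # 2
print(kept)  # ['<A>Lamb</A> <W>Let. Moxon</W>', '<A>Lamb</A> <W>Elia Ser.</W>']
```

Because delete_range returns a new list, the input list `sample` is left unchanged.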

This works better than sed when you need to apply multiline patterns or additional logic to identify whether a change is applicable.
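Applied to the question's task, the boilerplate might be filled in as follows. This is a sketch: the function name `sed_delete_lett`, the `needle` parameter, and the return value are hypothetical additions. The "magic" step is a simple filter, and the file is rewritten only if at least one line was dropped. Like the original boilerplate, it uses `rstrip()`, which also strips trailing whitespace from the lines that are kept:

```python
def sed_delete_lett(filename, needle="Lett."):
    """Hypothetical instance of the sed() boilerplate: drop lines containing `needle`.
    Returns True if the file was rewritten."""
    modified = False

    # Load file into memory (rstrip() removes the newline and trailing whitespace)
    with open(filename) as f:
        lines = [line.rstrip() for line in f]

    # "magic here": keep only the lines without the marker
    kept = [line for line in lines if needle not in line]
    if len(kept) != len(lines):
        lines = kept
        modified = True

    # Write the file back only if something changed
    if modified:
        with open(filename, "w") as f:
            for line in lines:
                f.write(line + "\n")
    return modified
```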

玉环 2024-12-08 08:33:31

with open(fname) as f:  # closes the file once the list is built
    return [l for l in f if 'Lett' not in l]

计㈡愣 2024-12-08 08:33:31
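result = ''
# Read the input line by line; the file is closed when the block ends
with open('in.txt') as src:
    for line in src:
        # `in` is case-sensitive, so lowercase the line before comparing with 'lett'
        if 'lett' not in line.lower():
            result += line
# Mode 'a' appends to any existing out.txt; use 'w' to overwrite it instead
with open('out.txt', 'a') as f:
    f.write(result)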
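The `in` operator compares characters exactly, so case matters: the lowercase 'lett' does not occur in "Lett., to Wordsw.". A small sketch of two case-insensitive checks, lowercasing the line first or using `re.IGNORECASE`:

```python
import re

line = "<A>Lamb</A> <W>Lett., to Wordsw.</W>"

print("lett" in line)                                   # False: `in` is case-sensitive
print("lett" in line.lower())                           # True: compare in lowercase
print(bool(re.search(r"lett\.", line, re.IGNORECASE)))  # True
```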