How to return certain text from a file (Python, new to programming)

Posted on 2025-02-01 21:30:14 | Word count: 1763 | Views: 1 | Comments: 0

I have two text files:

file1.txt:

random words random words

*** BEGINNING AT ***

Need these words

*** ENDING AT ***

more random words

file2.txt:

random words random words

more random words *** BEGINNING AT ***

Don't need these words

*** BEGINNING AT ***

Need these words here

*** ENDING AT ***

These words should be ignored

So far I have developed this function:

import re

"""function that returns a list of lower case words
that are within the region of interest"""

def get_certain_words_from_file(filename):
    """defines the given region and returns a list of words inside"""
    with open(filename, 'r') as file:
        lines = file.read()
        list_of_lines = lines.splitlines()
        index_start = 0
        index_end = 0
        for i in range(len(list_of_lines)):
            if list_of_lines[i].startswith('***BEGINNING AT '):
                index_start += i
            if list_of_lines[i].startswith('*** ENDING AT'):
                index_end += i
        valid_lines = list_of_lines[index_start : index_end]
        valid_lines = "".join(str(x) for x in valid_lines)
        valid_lines = valid_lines.lower()
        valid_lines = valid_lines.split()

        valid_words = []
        words_on_line = []
        for line in valid_lines:
            words_on_line = re.findall("[a-z]+[-'][a-z]+|[a-z]+[']?|[a-z]+", line)
        for word in words_on_line:
            valid_words.append(word)
        return valid_words

filename = "file2.txt"
words = get_certain_words_from_file(filename)
print(filename, "loaded ok.")
print("{} valid words found.".format(len(words)))
print("Valid word list:")
for word in words:
    print(word)

The current output is:

file2.txt loaded ok.

0 valid words found.

Valid word list:

But I'm trying to get:

file2.txt loaded ok.

4 valid words found.

Valid word list:

need

these

words

here

My thinking is that something is wrong with the first section, but I'm new to Python and to programming as a whole, so I'm not too sure.

Anything would help, thanks!


Comments (1)

春花秋月 2025-02-08 21:30:14

From what I see, you don't need regex to find the lines you care about; you can use Python's "in" operator to test whether a word occurs in a line.

filename = 'input_file'
valid_words = ['word1', 'word2']

with open(filename, 'r') as file:
    lines = file.read()
    list_of_lines = lines.splitlines()
    lines_with_word = []
    for line in list_of_lines:
        for word in valid_words:
            if word in line:
                lines_with_word.append(line)
    print(lines_with_word)
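
To connect this back to the question, here is a minimal sketch of how the same "in" check could locate the marker lines and pull out the words between them. It is one possible approach, not the asker's or the answerer's exact method. The names BEGIN_MARKER, END_MARKER, get_words_in_region and get_words_from_file are hypothetical, the marker strings are copied from the sample files (note the space after "***", which the startswith('***BEGINNING AT ') test in the question does not have), and the word pattern is a simplified stand-in for the regex in the question. It assumes that a later BEGINNING AT line restarts the region, as the expected output for file2.txt suggests.

```python
import re

# Assumed marker text, copied from the sample files in the question.
BEGIN_MARKER = "*** BEGINNING AT"
END_MARKER = "*** ENDING AT"


def get_words_in_region(lines):
    """Return lower-case words found between the last BEGIN marker
    and the first END marker that follows it."""
    start = None
    end = None
    for i, line in enumerate(lines):
        if BEGIN_MARKER in line:
            # A later BEGIN marker restarts the region.
            start = i
            end = None
        elif END_MARKER in line and start is not None and end is None:
            end = i
    if start is None or end is None:
        return []  # no complete region found

    words = []
    for line in lines[start + 1:end]:
        # Simplified pattern: letters, optionally joined by - or '.
        words.extend(re.findall(r"[a-z]+(?:[-'][a-z]+)*", line.lower()))
    return words


def get_words_from_file(filename):
    """Read a file and return the words inside its marked region."""
    with open(filename, "r") as file:
        return get_words_in_region(file.read().splitlines())


# The contents of file2.txt from the question, as a list of lines.
sample_lines = [
    "random words random words",
    "more random words *** BEGINNING AT ***",
    "Don't need these words",
    "*** BEGINNING AT ***",
    "Need these words here",
    "*** ENDING AT ***",
    "These words should be ignored",
]
print(get_words_in_region(sample_lines))  # ['need', 'these', 'words', 'here']
```

Two details differ from the code in the question: the words are collected inside the loop over lines, so earlier lines are not overwritten by later ones, and the indices are assigned rather than accumulated with +=.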