How to return certain text in a file, Python - new to programming
I have two text files:
file1.txt:
random words random words
*** BEGINNING AT ***
Need these words
*** ENDING AT ***
more random words
file2.txt:
random words random words
more random words *** BEGINNING AT ***
Don't need these words
*** BEGINNING AT ***
Need these words here
*** ENDING AT ***
These words should be ignored
So far I have developed this function:
import re

"""function that returns a list of lower case words
that are within the region of interest"""

def get_certain_words_from_file(filename):
    """defines the given region and returns a list of words inside"""
    with open(filename, 'r') as file:
        lines = file.read()
    list_of_lines = lines.splitlines()
    index_start = 0
    index_end = 0
    for i in range(len(list_of_lines)):
        if list_of_lines[i].startswith('***BEGINNING AT '):
            index_start += i
        if list_of_lines[i].startswith('*** ENDING AT'):
            index_end += i
    valid_lines = list_of_lines[index_start : index_end]
    valid_lines = "".join(str(x) for x in valid_lines)
    valid_lines = valid_lines.lower()
    valid_lines = valid_lines.split()
    valid_words = []
    words_on_line = []
    for line in valid_lines:
        words_on_line = re.findall("[a-z]+[-'][a-z]+|[a-z]+[']?|[a-z]+", line)
        for word in words_on_line:
            valid_words.append(word)
    return valid_words

filename = "file2.txt"
words = get_certain_words_from_file(filename)
print(filename, "loaded ok.")
print("{} valid words found.".format(len(words)))
print("Valid word list:")
for word in words:
    print(word)
The current output is:
file2.txt loaded ok.
0 valid words found.
Valid word list:
But I'm trying to get:
file2.txt loaded ok.
4 valid words found.
Valid word list:
need
these
words
here
My thinking is that something is wrong with the first section, but I'm new to Python and programming as a whole, so I'm not too sure.
Anything would help, thanks!
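
One way the first section could be repaired, offered as a minimal sketch rather than a definitive answer. Judging from the code as posted, the start marker string '***BEGINNING AT ' has no space after the asterisks, so it would never match a line such as '*** BEGINNING AT ***'. The slice also includes the marker line itself, and "".join glues the last word of one line onto the first word of the next. The sketch assumes that a marker only counts when it starts a line (which is what the expected output for file2.txt implies) and that the region runs from the most recent start marker to the first end marker after it. The names START_MARKER, END_MARKER, WORD_PATTERN and get_words_from_lines are introduced here for illustration only; the regular expression is the one from the question.

```python
import re

# Assumed marker prefixes, taken from the sample files in the question.
START_MARKER = "*** BEGINNING AT"
END_MARKER = "*** ENDING AT"
# Same pattern as in the question: hyphenated/apostrophe words or plain words.
WORD_PATTERN = re.compile(r"[a-z]+[-'][a-z]+|[a-z]+[']?|[a-z]+")


def get_words_from_lines(lines):
    """Return the lower-case words between the start and end marker lines."""
    index_start = None
    index_end = None
    for i, line in enumerate(lines):
        if line.startswith(START_MARKER):
            index_start = i  # assign, not +=, so a later marker replaces an earlier one
        elif index_start is not None and line.startswith(END_MARKER):
            index_end = i
            break  # stop at the first end marker after a start marker
    if index_start is None or index_end is None:
        return []  # no complete region found
    words = []
    # index_start + 1 skips the marker line itself.
    for line in lines[index_start + 1:index_end]:
        words.extend(WORD_PATTERN.findall(line.lower()))
    return words


def get_words_from_file(filename):
    """Read the file and return the words inside the region of interest."""
    with open(filename, "r") as file:
        return get_words_from_lines(file.read().splitlines())


# In-memory copy of file2.txt from the question, so the demo needs no file.
file2_lines = [
    "random words random words",
    "more random words *** BEGINNING AT ***",
    "Don't need these words",
    "*** BEGINNING AT ***",
    "Need these words here",
    "*** ENDING AT ***",
    "These words should be ignored",
]
words = get_words_from_lines(file2_lines)
print("{} valid words found.".format(len(words)))  # 4 valid words found.
for word in words:
    print(word)
```

Keeping the marker logic in get_words_from_lines makes it easy to check against the sample text without touching the disk; get_words_from_file("file2.txt") should then return the four words listed above, assuming the file on disk matches the sample.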