How do you open and read a large file with Python?
The basic task is to write a function, get_words_from_file(filename), that returns a list of the lower-case words within the region of interest. We are given a regular expression, "[a-z]+[-'][a-z]+|[a-z]+[']?|[a-z]+", that finds all words meeting this definition. My code works on some of the tests but fails on a larger file, so I think I'm opening the file incorrectly for bigger files.
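To show what the given pattern actually matches, here is a small sketch run on a made-up sample string (the sample text and the name sample_words are assumptions for illustration, not part of the task). The first alternative catches words with an internal hyphen or apostrophe, the second catches plain words with an optional trailing apostrophe.

```python
import re

# The pattern quoted in the question, unchanged.
pattern = "[a-z]+[-'][a-z]+|[a-z]+[']?|[a-z]+"

# Hypothetical sample line, lower-cased first as in the question's code.
sample = "Don't stop: the well-known dogs' bark!"
sample_words = re.findall(pattern, sample.lower())

print(sample_words)
# ["don't", 'stop', 'the', 'well-known', "dogs'", 'bark']
```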
I have no problem with these tests:
filename = "abc.txt"
words2 = get_words_from_file(filename)
print(filename, "loaded ok.")
print("{} valid words found.".format(len(words2)))
print("Valid word list:")
print("\n".join(words2))
#or
filename = "synthetic.txt"
words = get_words_from_file(filename)
print(filename, "loaded ok.")
print("{} valid words found.".format(len(words)))
print("Valid word list:")
for word in words:
    print(word)
# My code:
import re

def get_words_from_file(filename):
    """Returns a list of lower case words that are within the region of interest:
    every word in the text file, but not any of the punctuation."""
    with open(filename, 'r', encoding='utf-8') as file:
        flag = False
        words = []
        for line in file:
            if(str(line).strip()=="*** START OF"):
                flag=True
            elif(str(line).strip()=="*** END "):
                flag=False
                break
            elif(flag):
                new_line = line.lower()
                words_on_line = re.findall("[a-z]+[-'][a-z]+|[a-z]+[']?|[a-z]+",
                                           new_line)
                words.extend(words_on_line)
        return words
Any help would be awesome!
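One thing worth checking before blaming how the file is opened: str.strip() removes trailing whitespace, so line.strip() can never equal "*** END " (which ends in a space), and the end marker is never detected. The start test also requires the whole line to be exactly "*** START OF". Assuming the larger file uses Project Gutenberg-style marker lines such as "*** START OF THIS PROJECT GUTENBERG EBOOK ... ***" (an assumption, since the file itself is not shown), that test would never match either. A minimal sketch that compares line prefixes with startswith instead, keeping the given pattern as-is:

```python
import re

# The pattern given in the question, compiled once for reuse.
WORD_PATTERN = re.compile(r"[a-z]+[-'][a-z]+|[a-z]+[']?|[a-z]+")

def get_words_from_file(filename):
    """Return the lower-case words found between the start and end marker lines."""
    words = []
    in_region = False
    with open(filename, "r", encoding="utf-8") as file:
        for line in file:
            stripped = line.strip()
            # Assumed marker format: the line only needs to begin with the prefix.
            if stripped.startswith("*** START OF"):
                in_region = True   # the region begins on the following line
            elif stripped.startswith("*** END"):
                break              # stop reading once the end marker is reached
            elif in_region:
                words.extend(WORD_PATTERN.findall(line.lower()))
    return words
```

Iterating over the file object line by line, as the original code already does, reads only one line at a time, so file size on its own should not be the problem.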