KeyError when calculating TF-IDF
I want to calculate the document frequencies of the terms in a set of text documents. First I built a term dictionary for each document and calculated the term frequencies. Those steps work fine, but when I call the function below it raises an error:
def computeDF(docList):
    df = {}
    df = dict.fromkeys(docList[0].keys(), 0)
    for doc in docList:
        for word, val in doc.items():
            if val > 0:
                df[word] += 1
    for word, val in df.items():
        df[word] = float(val)
    return df
I call the function like this:
dictList = []
for i in range(N):
    # creating dictionary for all documents
    tokens = processed_text[i]
    dictionary = dict.fromkeys(tokens, 0)
    # calculation of term frequencies for all documents
    for word in tokens:
        dictionary[word] += 1
    tf = termFreq(dictionary, tokens)
    dictList.append(dictionary)
df = computeDF(dictList)
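As an aside, the counting loop above (dict.fromkeys followed by += 1) builds the same per-document term counts that collections.Counter produces. A minimal sketch, assuming processed_text is a list of token lists as in the question; the sample tokens are made up for illustration:

```python
from collections import Counter

# Hypothetical stand-in for the question's processed_text: one token list per document.
processed_text = [
    ["arizona", "flagstaff", "arizona"],
    ["phoenix", "arizona"],
]

dictList = []
for tokens in processed_text:
    # Counter maps each token to how often it occurs in this document,
    # the same result as dict.fromkeys(tokens, 0) plus the += 1 loop.
    dictList.append(dict(Counter(tokens)))

print(dictList)  # [{'arizona': 2, 'flagstaff': 1}, {'phoenix': 1, 'arizona': 1}]
```

Either way, each dictionary only holds the words of its own document, which matters for computeDF.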
I pass the function a list of 10 dictionaries, one per document (N = 10, the number of documents), because it works with a list object. dictList looks like this and continues the same way for every document:
[screenshot of dictList]
Error:
  line 155, in <module>
    df = computeDF(dictList)
  line 134, in computeDF
    df[word] += 1
KeyError: 'flagstaff'
The function works when I try it in a different Python file with the same object types. I don't understand what the problem is. How can I fix this?
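The traceback can be reproduced with two tiny documents. computeDF (copied from the question) seeds df only with the keys of docList[0], so it succeeds when the first document happens to contain every word (which may be the case for the test data in the other file) and raises KeyError as soon as a later document contains a word the first one lacks. The documents below are made up for illustration:

```python
def computeDF(docList):
    # Same logic as in the question: df starts with only the first document's words.
    df = dict.fromkeys(docList[0].keys(), 0)
    for doc in docList:
        for word, val in doc.items():
            if val > 0:
                df[word] += 1  # KeyError if word never appeared in docList[0]
    for word, val in df.items():
        df[word] = float(val)
    return df

# Works: every word in the second document also appears in the first.
print(computeDF([{"arizona": 2, "flagstaff": 1}, {"arizona": 1}]))
# {'arizona': 2.0, 'flagstaff': 1.0}

# Fails: 'flagstaff' appears only in the second document.
try:
    computeDF([{"arizona": 2}, {"arizona": 1, "flagstaff": 1}])
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'flagstaff'
```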
Answer:
Where you have
df = dict.fromkeys(docList[0].keys(), 0)
you need something like […]. That way you have keys for all your docs, not just the first one. If you want to do it in one line, you can do it like this: […]
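The answer's own code did not survive, so here is a minimal sketch of the idea it describes, not the answerer's exact code: seed df with the union of the keys of all documents instead of only docList[0]. The commented one-liner uses set().union. The second function, computeDF_counter, is a hypothetical alternative that avoids pre-seeding by using collections.Counter, which returns 0 for missing keys.

```python
from collections import Counter

def computeDF(docList):
    # Collect the keys of all documents, not just docList[0].
    all_words = set()
    for doc in docList:
        all_words.update(doc.keys())
    df = dict.fromkeys(all_words, 0)
    # One-line equivalent of the lines above:
    # df = dict.fromkeys(set().union(*(doc.keys() for doc in docList)), 0)
    for doc in docList:
        for word, val in doc.items():
            if val > 0:
                df[word] += 1
    return {word: float(val) for word, val in df.items()}

def computeDF_counter(docList):
    # Hypothetical alternative: a missing key in a Counter reads as 0,
    # so df[word] += 1 never raises KeyError.
    df = Counter()
    for doc in docList:
        for word, val in doc.items():
            if val > 0:
                df[word] += 1
    return {word: float(val) for word, val in df.items()}

docs = [{"arizona": 2}, {"arizona": 1, "flagstaff": 1}]
print(sorted(computeDF(docs).items()))  # [('arizona', 2.0), ('flagstaff', 1.0)]
print(computeDF_counter(docs))          # {'arizona': 2.0, 'flagstaff': 1.0}
```

One difference: computeDF keeps words whose count is 0 in every document (with value 0.0), while computeDF_counter leaves them out. With counts built from token lists, as in the question, no count is ever 0, so both give the same result.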