KeyError in TF-IDF calculation

Posted 2025-01-15 14:25:56


I want to calculate the document frequencies of text documents. First I created the term dictionary and calculated the term frequencies. I have no problems with these steps, but when I try to use the function below it gives an error:

def computeDF(docList):
    df = {}
    df = dict.fromkeys(docList[0].keys(), 0)
    
    for doc in docList:
        for word, val in doc.items():
            if val > 0:
                df[word] += 1

    for word, val in df.items():
        df[word] = float(val)

    return df
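To see why the error happens, here is a minimal reproduction that reuses the function above unchanged on two small, made-up term-count dictionaries (the words are hypothetical sample data). Because df is seeded only with the keys of docList[0], the first word that appears only in a later document has no entry in df, and df[word] += 1 raises KeyError.

```python
def computeDF(docList):
    # Same logic as the question: df is seeded with the FIRST document's keys only.
    df = dict.fromkeys(docList[0].keys(), 0)
    for doc in docList:
        for word, val in doc.items():
            if val > 0:
                df[word] += 1  # KeyError for any word missing from docList[0]
    for word, val in df.items():
        df[word] = float(val)
    return df


# Hypothetical term-count dictionaries; only the second one contains "flagstaff".
docs = [
    {"city": 1, "arizona": 2},
    {"city": 1, "flagstaff": 3},
]

try:
    computeDF(docs)
except KeyError as err:
    print("KeyError:", err)  # prints: KeyError: 'flagstaff'
```

This would also explain why the function can work in another file: if the first dictionary there happens to contain every word of the other documents, no KeyError occurs.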

I called the function like this:

dictList = []
for i in range(N):
    # creating dictionary for all documents
    tokens = processed_text[i]
    dictionary = dict.fromkeys(tokens,0)

    # calculation of term frequencies for all documents
    for word in tokens:
        dictionary[word] += 1
        tf = termFreq(dictionary, tokens)
        dictList.append(dictionary)

    df = computeDF(dictList)
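Separately from the KeyError: as the loop is shown here (the indentation may have changed when the post was copied), tf = termFreq(...) and dictList.append(dictionary) sit inside the for word in tokens loop, so each document's dictionary is appended once per token and termFreq runs once per token. computeDF is also called on every pass of the outer loop. A sketch of the presumably intended structure, assuming processed_text is a list of token lists; the sample data is made up, and term_freq is a hypothetical stand-in for the asker's termFreq, which the post does not show.

```python
from collections import Counter

# Hypothetical stand-in for processed_text: one list of tokens per document.
processed_text = [
    ["flagstaff", "is", "in", "arizona"],
    ["phoenix", "is", "in", "arizona", "arizona"],
]
N = len(processed_text)


def term_freq(counts, tokens):
    # Hypothetical replacement for termFreq: raw count divided by document length.
    return {word: count / len(tokens) for word, count in counts.items()}


dictList = []
tfList = []
for i in range(N):
    tokens = processed_text[i]
    counts = Counter(tokens)             # term counts for this document
    dictList.append(dict(counts))        # appended once per document
    tfList.append(term_freq(counts, tokens))

# Compute DF once here, after the loop, with a computeDF whose df
# dictionary covers the words of every document (not only dictList[0]).
```

With this structure dictList holds exactly N dictionaries, one per document. Note that the question's computeDF would still raise KeyError on this sample, because "phoenix" is not in the first document.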

I called the function with a list of 10 dictionaries, because it works with list objects.

N = 10 (number of documents)
dictList continues like this: [image: dictList]

Error:

line 155, in <module>
    df = computeDF(dictList)
line 134, in computeDF
    df[word] += 1
KeyError: 'flagstaff'

It works when I try the function in a different Python file with the same object types. I don't understand what the problem is. How can I solve this?


Answer by 动听の歌, 2025-01-22 14:25:57


Where you have df = dict.fromkeys(docList[0].keys(), 0), you need something like:

keys = set()
for doc in docList:
    keys = keys.union(set(doc.keys()))
df = dict.fromkeys(keys, 0)  # seed df with every document's keys, not only docList[0]'s
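Put together, the fix gives a computeDF along the lines of this sketch: first collect the union of all documents' keys, then build df from that union with dict.fromkeys(keys, 0), then count as before. The two sample dictionaries are made-up data.

```python
def computeDF(docList):
    # Collect the vocabulary of ALL documents, not only docList[0].
    keys = set()
    for doc in docList:
        keys = keys.union(doc.keys())
    df = dict.fromkeys(keys, 0)

    # Count the number of documents in which each word has a positive count.
    for doc in docList:
        for word, val in doc.items():
            if val > 0:
                df[word] += 1

    # Convert the counts to floats, as the original function does.
    return {word: float(val) for word, val in df.items()}


docs = [
    {"city": 1, "arizona": 2},
    {"city": 1, "flagstaff": 3},
]
print(computeDF(docs))
```

Because df is now built from a set, its key order is not guaranteed; look values up by word rather than relying on order.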

That way you have keys for all your docs, not just the first one. If you want to do it in one line, you can do it like this:

keys = set().union(*[set(doc.keys()) for doc in docList])
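As an alternative not taken from the answer, collections.Counter can do the counting directly. A missing key simply counts as 0, so no separate vocabulary pass is needed. The function name compute_df and the sample data are hypothetical.

```python
from collections import Counter


def compute_df(docList):
    # For each word, count how many documents give it a positive term count.
    # Counter treats missing keys as 0, so no separate vocabulary pass is needed.
    counts = Counter(
        word for doc in docList for word, val in doc.items() if val > 0
    )
    # Return floats, as the question's computeDF does.
    return {word: float(n) for word, n in counts.items()}


docs = [
    {"city": 1, "arizona": 2},
    {"city": 1, "flagstaff": 3},
]
df = compute_df(docs)
print(df["city"], df["flagstaff"])  # prints: 2.0 1.0
```

One difference is that a word whose count is 0 in every document is left out here, while the answer's version keeps it with 0.0. In the question's loop every word gets a count of at least 1, so in practice the two should agree. Also, because iterating over a dict yields its keys, the one-liner above can be shortened to keys = set().union(*docList).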


About the author: 地狱即天堂
