Error "unhashable type: 'numpy.ndarray'" in Counter function

Posted on 2025-01-12 00:41:00

I'm trying to use Counter to get the most common word in my train dataset, but it doesn't work with a list of lists.

This is my code:

from collections import Counter

# Inverse transformation of the train dataset
X_train_t = vectorizer.inverse_transform(X_train)
print(Counter(X_train_t).most_common()[-1])

But I get the error unhashable type: 'numpy.ndarray'.

type(X_train_t) reports <class 'list'>.


Comments (1)

初见你 2025-01-19 00:41:00

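inverse_transform() returns a list of NumPy arrays, one per document, in the form [array(['this', 'is', 'the', 'first', 'document'], dtype='<U8'), ...]. Counter has to hash every element it counts, and NumPy arrays are not hashable, which is why passing this list directly raises the error.
Join the arrays with numpy.concatenate() to obtain a single array of words in the form ['this' 'is' 'the' 'first' 'document' 'this' ...].

Counter itself does not sort anything; most_common() returns the (word, count) pairs in descending order of count. So [0] gives you the most frequent word, while the [-1] in your code gives the least frequent one.

Here is a full working example: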

from collections import Counter

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]
vectorizer = CountVectorizer()

X_train = vectorizer.fit_transform(corpus)

# inverse_transform returns a list with one array of terms per document
X_train_t = vectorizer.inverse_transform(X_train)
# Flatten the list of arrays into a single 1-D array of words
X_train_t = np.concatenate(X_train_t, axis=0)
# most_common() is sorted by descending count, so [0] is the most frequent word
print(Counter(X_train_t).most_common()[0])
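
As a possible alternative to concatenating, the per-document sequences can be flattened lazily with itertools.chain.from_iterable before counting. The sketch below is only an illustration, not part of the original answer: docs_terms is hypothetical stand-in data that uses plain lists in place of the NumPy arrays returned by inverse_transform(), so it runs with the standard library alone. Lists are unhashable too, so the same kind of TypeError appears when the outer list is counted directly.

```python
from collections import Counter
from itertools import chain

# Hypothetical stand-in for vectorizer.inverse_transform(X_train):
# one sequence of terms per document (plain lists instead of NumPy arrays).
docs_terms = [
    ["this", "is", "the", "first", "document"],
    ["this", "document", "is", "the", "second"],
    ["and", "this", "is", "the", "third", "one"],
    ["is", "this", "the", "first", "document"],
]

# Counting the outer list fails: each element is an unhashable sequence.
error_message = None
try:
    Counter(docs_terms)
except TypeError as exc:
    error_message = str(exc)

# Flatten the nested sequences lazily, then count the individual terms.
word_counts = Counter(chain.from_iterable(docs_terms))

# most_common(1) returns a one-element list holding the top (word, count) pair.
most_common_word, most_common_count = word_counts.most_common(1)[0]
# The last entry of the full ranking is a least frequent word.
least_common_word, least_common_count = word_counts.most_common()[-1]

print(most_common_word, most_common_count)
print(least_common_word, least_common_count)
```

Because each document contributes a term at most once here, the counts are the number of documents containing the word rather than its total number of occurrences. Several words can tie for the top count, so the single word reported is just one of them.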