Error "unhashable type: 'numpy.ndarray'" when using Counter

Posted on 2025-01-12 00:41:00

I'm trying to use Counter to get the most common word in my training dataset, but I can't get it to work with a list of lists.

This is my code:

from collections import Counter

# Inverse transformation of the train dataset
X_train_t = vectorizer.inverse_transform(X_train)
print(Counter(X_train_t).most_common()[-1])

But I get TypeError: unhashable type: 'numpy.ndarray'.

type(X_train_t) returns <class 'list'>.

初见你 2025-01-19 00:41:00

inverse_transform() returns a list of NumPy arrays, one per document, of the form [array(['this', 'is', 'the', 'first', 'document'], dtype='<U8'), ...]. Counter needs hashable elements, and a numpy.ndarray is not hashable, which is what raises the error.
Join the arrays with numpy.concatenate() to obtain a single flat array of the form ['this' 'is' 'the' 'first' 'document' 'this' ...].
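
To make the failure and the fix concrete, here is a minimal sketch that does not need scikit-learn. The hand-built `docs` list is an assumption standing in for the output of inverse_transform(); its terms are illustrative only.

```python
from collections import Counter

import numpy as np

# Hypothetical stand-in for vectorizer.inverse_transform(X_train):
# a plain Python list whose elements are NumPy arrays of terms.
docs = [
    np.array(["document", "first", "is", "the", "this"]),
    np.array(["document", "is", "second", "the", "this"]),
    np.array(["and", "is", "one", "the", "third", "this"]),
]

# Counter stores each element as a dictionary key, so it must be hashable.
# NumPy arrays are not hashable, which triggers the TypeError.
try:
    Counter(docs)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Flatten the list of arrays into one 1-D array of strings, then count.
flat = np.concatenate(docs)
counts = Counter(flat.tolist())  # .tolist() gives plain Python str keys

print(error_message)
print(counts["document"], counts["is"])
```

Converting with .tolist() is optional; it only keeps the Counter keys as built-in strings rather than NumPy string scalars.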

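Counter.most_common() returns the entries sorted by count in descending order, so [0] gives you the most frequent word, whereas the [-1] in your code gives the least frequent one. Note that inverse_transform() lists each term at most once per document, so the counts here are the number of documents that contain each word.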
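
The ordering of most_common() can be checked with a small standard-library sketch. The word list below is made up for illustration and is not taken from the question's data.

```python
from collections import Counter

# Illustrative word list; the words and counts are made up for this sketch.
words = ["the", "is", "the", "this", "is", "the", "one"]
counts = Counter(words)

# All (word, count) pairs, ordered from highest count to lowest.
ranked = counts.most_common()

most_frequent = ranked[0]           # first entry: highest count
least_frequent = ranked[-1]         # last entry: lowest count
top_one = counts.most_common(1)[0]  # same as ranked[0], without the full list

print(most_frequent, least_frequent, top_one)
```

If only the top entry is needed, most_common(1) avoids building the full ranked list.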

Here is a full working example:

from collections import Counter

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]
vectorizer = CountVectorizer()

X_train = vectorizer.fit_transform(corpus)

# inverse_transform() returns a list with one array of terms per document
X_train_t = vectorizer.inverse_transform(X_train)
# Flatten the list of arrays into a single 1-D array of hashable strings
X_train_t = np.concatenate(X_train_t, axis=0)
# most_common() is sorted by count, highest first, so [0] is the top entry
print(Counter(X_train_t).most_common()[0])
