WordCloud visualising the most common terms from a pandas dataframe
I have a subset of a pandas dataframe like so:
df = pd.DataFrame({'viz_words': ['palace boat painting reel',
                                 'paintings painting gallery',
                                 'biscuits cake gallery cafe']})
>>> df
                    viz_words
0   palace boat painting reel
1  paintings painting gallery
2  biscuits cake gallery cafe
I am trying to take all the words from the viz_words column and create a wordcloud that draws higher-frequency terms larger and lower-frequency terms smaller.
I have this so far, but the issue is that it only visualises the "common" terms among viz_words, which means it misses a lot of terms.
import matplotlib.pyplot as plt
import pandas as pd
from wordcloud import WordCloud

# build a {word: count} dict over every word in the column
dictionary_small_scale = dict([tuple(x) for x in df.viz_words.str.split(expand=True)
                               .stack().value_counts().reset_index().values])

# generate wordcloud and plot using matplotlib
wordcloud = WordCloud()
wordcloud.generate_from_frequencies(dictionary_small_scale)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
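For reference, the frequency dict itself can be built more directly with pandas alone. A minimal sketch (variable name `frequencies` is my own; `Series.explode` requires pandas >= 0.25), which confirms every word in the column is counted, not just the shared ones:

```python
import pandas as pd

df = pd.DataFrame({'viz_words': ['palace boat painting reel',
                                 'paintings painting gallery',
                                 'biscuits cake gallery cafe']})

# split each row into words, flatten to one word per element, then count
frequencies = df['viz_words'].str.split().explode().value_counts().to_dict()
# 'painting' and 'gallery' each appear twice; the other seven words once
```

The resulting dict can then be passed to `WordCloud().generate_from_frequencies(frequencies)` as in the code above; note that WordCloud also caps the number of drawn words via its `max_words` parameter (default 200), which may be worth checking if terms still go missing.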