How do I calculate the probability of each document under each topic in BERTopic?

Posted on 2025-01-31 02:27:00

I am trying to use BERTopic to analyze the topic distribution of my documents. After fitting BERTopic, I would like to calculate, for each document, its probability under each topic. How should I do it?

from bertopic import BERTopic

# define model (vectorizer_model is assumed to be defined earlier)
model = BERTopic(verbose=True,
                 vectorizer_model=vectorizer_model,
                 embedding_model='paraphrase-MiniLM-L3-v2',
                 min_topic_size=50,
                 nr_topics=10)

# train model
headline_topics, _ = model.fit_transform(df1.review_processed3)

# examine one of the topics
freq = model.get_topic_info()  # assumption: freq was not defined in the original post
a_topic = freq.iloc[0]["Topic"]  # Select the 1st topic
model.get_topic(a_topic)  # Show the words and their c-TF-IDF scores

Below are the words and their c-TF-IDF scores for one of the topics:
image 1

How can I turn this result into a topic distribution like the one below, so that I can calculate the topic distribution scores and identify the main topic?
image 2

二智少女 2025-02-07 02:27:00

First, to compute probabilities, add calculate_probabilities=True to your model definition (this can slow down topic extraction if you have many documents, > 100,000).

# define model
model = BERTopic(verbose=True,
                 vectorizer_model=vectorizer_model,
                 embedding_model='paraphrase-MiniLM-L3-v2',
                 min_topic_size=50,
                 nr_topics=10,
                 calculate_probabilities=True)

Then, when calling fit_transform, save the probabilities instead of discarding them:

headline_topics, probs = model.fit_transform(df1.review_processed3)
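
Before building a table from `probs`, it can help to confirm what it looks like. The sketch below uses a small hand-made NumPy array as a stand-in for the real `probs` (the numbers are invented, and `summarize_probs` is a hypothetical helper, not part of BERTopic). It assumes that with calculate_probabilities=True `probs` has one row per document and one column per topic, and that without the flag only a single value per document comes back; it is worth checking this against your installed version.

```python
import numpy as np


def summarize_probs(probs, n_docs):
    """Hypothetical helper: report whether probs is a full document-topic matrix."""
    probs = np.asarray(probs)
    if probs.ndim != 2:
        # Only one value per document, so there is no per-topic breakdown.
        return {"full_matrix": False, "n_topics": None}
    if probs.shape[0] != n_docs:
        raise ValueError("probs must have one row per document")
    return {"full_matrix": True, "n_topics": probs.shape[1]}


# Stand-in for the output of fit_transform: 3 documents, 4 topics (invented values).
example_probs = np.array([
    [0.70, 0.10, 0.05, 0.05],
    [0.02, 0.03, 0.90, 0.01],
    [0.25, 0.30, 0.20, 0.10],
])

summary = summarize_probs(example_probs, n_docs=3)
print(summary)
```

With your own model you would pass the real `probs` and `len(df1.review_processed3)` instead of the stand-in values.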

Now you can create a pandas DataFrame that shows, for each document, its probability under each topic.

import pandas as pd

# one row per document, one column per topic
probs_df = pd.DataFrame(probs)
# highest topic probability for each document
probs_df['main percentage'] = probs_df.max(axis=1)
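
The question also asks how to identify the main topic, not only its probability. Here is a minimal sketch of one way to do that. It uses an invented 3-document, 4-topic array in place of the real `probs`, and it assumes that column i corresponds to topic i. The names `topic_cols`, `dist_df`, `main_topic` and `main_probability` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the probs returned by fit_transform (invented values).
probs = np.array([
    [0.70, 0.10, 0.05, 0.05],
    [0.02, 0.03, 0.90, 0.01],
    [0.25, 0.30, 0.20, 0.10],
])

# One named column per topic; assumes column i holds the probability of topic i.
topic_cols = [f"topic_{i}" for i in range(probs.shape[1])]
dist_df = pd.DataFrame(probs, columns=topic_cols)

# Compute both summaries from the topic columns only, so the newly added
# columns never take part in the max/idxmax.
dist_df["main_topic"] = dist_df[topic_cols].idxmax(axis=1)
dist_df["main_probability"] = dist_df[topic_cols].max(axis=1)

print(dist_df)
```

Restricting the computation to `topic_cols` matters once extra columns exist: calling `max(axis=1)` or `idxmax(axis=1)` on the whole frame after adding summary columns would mix those columns into the result.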
