How to calculate the probabilities of each document under the respective topics in BERTopic?
I am trying to use BERTopic to analyze the topic distribution of documents. After running BERTopic, I would like to calculate the probabilities of each document under the respective topics. How should I do it?
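```python
from bertopic import BERTopic

# vectorizer_model (e.g. a CountVectorizer) and df1 (a pandas DataFrame with a
# preprocessed text column) are defined earlier in the asker's code

# define model
model = BERTopic(verbose=True,
                 vectorizer_model=vectorizer_model,
                 embedding_model='paraphrase-MiniLM-L3-v2',
                 min_topic_size=50,
                 nr_topics=10)
# train model
headline_topics, _ = model.fit_transform(df1.review_processed3)
# examine one of the topics
freq = model.get_topic_info()  # overview of topics: id, size, name (row 0 may be outlier topic -1)
a_topic = freq.iloc[0]["Topic"]  # Select the 1st topic
model.get_topic(a_topic)  # Show the words and their c-TF-IDF scores
```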
Below are the words and their c-TF-IDF scores for one of the topics:
[Image 1: the words of one topic with their c-TF-IDF scores]
How should I turn this result into a topic distribution like the one below, so that I can calculate the topic distribution scores and also identify the main topic?
[Image 2: desired topic distribution table]
Answer
First, to compute probabilities, you have to add `calculate_probabilities=True` to your model definition (this could slow down topic extraction if you have many documents, i.e. more than 100,000). Then, when calling `fit_transform`, save the probabilities it returns instead of discarding them. Now you can create a pandas DataFrame that shows the probabilities under the respective topics for each document.
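A minimal sketch of both steps, not necessarily the answerer's exact code. It assumes a recent BERTopic version in which `fit_transform` returns `(topics, probs)` and, with `calculate_probabilities=True`, `probs` is an `(n_docs, n_topics)` array whose column i holds the probability of topic i, with no column for the outlier topic -1. The helper names `fit_with_probabilities` and `topic_distribution` and the `topic_*`, `main_topic` and `main_probability` column names are hypothetical.

```python
import numpy as np
import pandas as pd


def fit_with_probabilities(docs, vectorizer_model=None):
    """Fit BERTopic with the asker's settings and keep the probability matrix.

    BERTopic is imported inside the function so that topic_distribution()
    below can be used (and tested) without BERTopic installed.
    """
    from bertopic import BERTopic

    model = BERTopic(
        verbose=True,
        vectorizer_model=vectorizer_model,
        embedding_model="paraphrase-MiniLM-L3-v2",
        min_topic_size=50,
        nr_topics=10,
        calculate_probabilities=True,  # return the full document-topic matrix
    )
    # keep the second return value instead of discarding it with `_`
    topics, probs = model.fit_transform(docs)
    return model, topics, probs


def topic_distribution(probs):
    """Turn an (n_docs, n_topics) probability matrix into a DataFrame.

    Assumption: column i of `probs` is the probability of topic i.
    """
    probs = np.asarray(probs)
    df = pd.DataFrame(probs, columns=[f"topic_{i}" for i in range(probs.shape[1])])
    df["main_topic"] = probs.argmax(axis=1)  # index of the most probable topic per row
    df["main_probability"] = probs.max(axis=1)  # its probability
    return df


if __name__ == "__main__":
    # toy 3-document, 3-topic matrix standing in for a real `probs`
    toy = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.3, 0.6],
                    [0.2, 0.5, 0.3]])
    print(topic_distribution(toy))
```

With the asker's data this would be called as `model, topics, probs = fit_with_probabilities(df1.review_processed3.tolist(), vectorizer_model)` followed by `dist = topic_distribution(probs)`. Taking the row-wise maximum is one simple way to pick a main topic; note that it can differ from the label in `topics`, for example for documents that BERTopic assigned to the outlier topic -1.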