Runtime error when running Gensim's LDA model on Windows
Okay, so I know this is a Windows error and not a Gensim error. Based on previous examples on the internet and other comments/solutions from Stack Overflow, I came up with the code below. However, the code never reaches the line that prints the coherence score. The details are: Windows 10, Visual Studio Code, Python 3.8.13.
My question: any idea how to fix this, or what am I doing wrong?
from multiprocessing import Process, freeze_support
import re

from sklearn.datasets import fetch_20newsgroups
from gensim.models.coherencemodel import CoherenceModel
from gensim.corpora.dictionary import Dictionary


def main():
    print("start main")
    texts, _ = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'), return_X_y=True)
    # Raw string avoids the invalid-escape warning for '\w'
    tokenizer = lambda s: re.findall(r'\w+', s.lower())
    texts = [tokenizer(t) for t in texts]

    # Creating some random topics
    topics = [
        ['space', 'planet', 'mars', 'galaxy'],
        ['cold', 'medicine', 'doctor', 'health', 'water'],
        ['cats', 'health', 'keyboard', 'car', 'banana'],
        ['windows', 'mac', 'computer', 'operating', 'system'],
    ]

    # Creating a dictionary with the vocabulary
    word2id = Dictionary(texts)

    # Coherence model
    cm = CoherenceModel(topics=topics, texts=texts, coherence='c_v', dictionary=word2id)
    coherence_per_topic = cm.get_coherence()
    print("coherence", coherence_per_topic)


if __name__ == '__main__':
    freeze_support()
    Process(target=main).start()