Runtime error when running Gensim's LDA model on Windows

Posted 2025-01-30 08:46:08


Okay, so I know this is a Windows error and not a Gensim error. Based on previous examples on the internet and other comments/solutions on Stack Overflow, I came up with the code below. However, the code never reaches the line that prints the coherence score. The details are: Windows 10, Visual Studio Code, Python 3.8.13.

My question: does anyone have an idea how to fix this, or what I am doing wrong?

from multiprocessing import Process, freeze_support
import re
from sklearn.datasets import fetch_20newsgroups
from gensim.models.coherencemodel import CoherenceModel
from gensim.corpora.dictionary import Dictionary

def main():
    print("start main")
    texts, _ = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'), return_X_y=True)
    # Raw string so that '\w' reaches the regex engine instead of being an invalid string escape
    tokenizer = lambda s: re.findall(r'\w+', s.lower())
    texts = [tokenizer(t) for t in texts]

    # Creating some random topics
    topics = [['space', 'planet', 'mars', 'galaxy'],
              ['cold', 'medicine', 'doctor', 'health', 'water'],
              ['cats', 'health', 'keyboard', 'car', 'banana'],
              ['windows', 'mac', 'computer', 'operating', 'system']]

    # Creating a dictionary with the vocabulary
    word2id = Dictionary(texts)

    # Coherence model; with the default processes=-1, gensim estimates the 'c_v' probabilities in several worker processes
    cm = CoherenceModel(topics=topics, texts=texts, coherence='c_v', dictionary=word2id)

    # get_coherence() returns one aggregated score over all topics
    coherence = cm.get_coherence()
    print("coherence", coherence)

if __name__ == '__main__':
    freeze_support()
    Process(target=main).start()    
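For comparison, here is a minimal, gensim-free sketch of the entry-point pattern usually recommended for multiprocessing under Windows' "spawn" start method: the `if __name__ == '__main__':` guard calls `main()` directly, and any function sent to worker processes is defined at module top level. The worker `count_docs_containing`, the `tokenize` helper and the three sample sentences are hypothetical stand-ins for the per-word document counting that `CoherenceModel` distributes across processes; they are not part of gensim's API. The sketch forces the "spawn" context so it behaves the same way on Linux and macOS as it does on Windows.

```python
import multiprocessing as mp
import re


def tokenize(text):
    # Raw string: r"\w+" matches runs of word characters
    return re.findall(r"\w+", text.lower())


def count_docs_containing(args):
    # Hypothetical worker: it must live at module top level so that the
    # spawned processes can unpickle it by name
    word, docs = args
    return word, sum(1 for doc in docs if word in doc)


def main():
    docs = [set(tokenize(t)) for t in (
        "Mars is a planet in our galaxy.",
        "The doctor prescribed medicine for the cold.",
        "Space probes orbit the planet Mars.",
    )]
    topic = ["space", "planet", "mars", "galaxy"]
    # 'spawn' is the only start method available on Windows; forcing it
    # here makes the sketch behave the same way on Linux and macOS
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        counts = dict(pool.map(count_docs_containing, [(w, docs) for w in topic]))
    print("document counts", counts)
    return counts


if __name__ == "__main__":
    # Only has an effect in frozen Windows executables; harmless otherwise
    mp.freeze_support()
    # Call main() directly. Spawned workers import this file as
    # "__mp_main__", so this block does not run again inside them and no
    # extra Process wrapper is needed.
    main()
```

Applied to the script above, the analogous change would be replacing `Process(target=main).start()` with a plain `main()` call. Another option to try is passing `processes=1` to `CoherenceModel`, so that gensim estimates the probabilities in a single process. Both are suggestions to experiment with, not confirmed fixes for this particular hang.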
