Runtime error when running Gensim's LDA model on Windows

Posted on 2025-01-30 08:46:08, 1328 characters, 2 views, 0 comments

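Okay, so I know this is a Windows error and not a Gensim error. Based on earlier examples on the internet and other comments and solutions on Stack Overflow, I came up with the code below. However, the code never reaches the line that prints the coherence score. Details: Windows 10, Visual Code, Python 3.8.13.

Any idea how to fix this, or what am I doing wrong?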

from multiprocessing import Process, freeze_support
import re
from sklearn.datasets import fetch_20newsgroups
from gensim.models.coherencemodel import CoherenceModel
from gensim.corpora.dictionary import Dictionary

def main():
    print("start main")
    texts, _ = fetch_20newsgroups( subset='all', remove=('headers', 'footers', 'quotes'), return_X_y=True )
    # Raw string so that \w is passed to the regex engine as written
    tokenizer = lambda s: re.findall(r'\w+', s.lower())
    texts = [tokenizer(t) for t in texts]

    # Creating some random topics
    topics = [ ['space', 'planet', 'mars', 'galaxy'],
            ['cold', 'medicine', 'doctor', 'health', 'water'],
            ['cats', 'health', 'keyboard', 'car', 'banana'],
            ['windows', 'mac', 'computer', 'operating', 'system']
            ]

    # Creating a dictionary with the vocabulary
    word2id = Dictionary( texts )

    # Coherence model
    cm = CoherenceModel(topics=topics, texts=texts, coherence='c_v', dictionary=word2id)

    coherence_per_topic = cm.get_coherence()
    print("coherence", coherence_per_topic)

if __name__ == '__main__':
    freeze_support()
    Process(target=main).start()
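
For context: on Windows, multiprocessing starts child processes by spawning a fresh interpreter that re-imports the main module, so anything that starts processes needs to sit behind the `if __name__ == '__main__':` guard. A pattern that is often suggested is to call `main()` directly from the guarded block instead of wrapping it in another `Process`, and to fall back to a serial code path while debugging. The sketch below uses only the standard library to illustrate that pattern; `square`, `run_job`, and the `processes` switch are hypothetical names, not part of the code above. `CoherenceModel` also accepts a `processes` argument, but whether passing `processes=1` resolves the problem described here is an assumption that the post does not confirm.

```python
from multiprocessing import Pool, freeze_support


def square(x):
    # Worker functions live at module level so that spawned child
    # processes can import them by name.
    return x * x


def run_job(values, processes=1):
    """Square each value, serially or in a worker pool (hypothetical helper)."""
    values = list(values)
    if processes <= 1:
        # Serial path: no child processes are created at all.
        return [square(v) for v in values]
    # Parallel path: under the "spawn" start method (the Windows default)
    # this should only be reached from code behind the __main__ guard.
    with Pool(processes=processes) as pool:
        return pool.map(square, values)


def main(processes=1):
    print("start main")
    print("squares", run_job(range(5), processes=processes))


if __name__ == '__main__':
    # Only has an effect in frozen Windows executables; harmless otherwise.
    freeze_support()
    # Call main() directly rather than via Process(target=main).start().
    main()
```

Calling `main(processes=2)` from the guarded block exercises the pool path instead of the serial one.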

