Using topic modelling or another NLP approach, is it possible to define which words go into a topic/category to get a better-defined topic model?
I have a problem where I am using topic modelling and considering both LDA and LSA approaches, but I have found that some of the topics are not defined as accurately as I would like. Is it possible to assign words to topics myself to help the machine learn better and more easily? If not, what other techniques could I use to counter this problem?
As explained above, I have tried LDA and LSA for topic modelling and found LDA to perform best, with a coherence score of 0.46, and I have renamed the topics accordingly. However, the words in the topics do not reflect the topic names, so the model needs tuning.
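For context on the 0.46 figure: a topic-coherence score measures how often a topic's top words co-occur in the corpus. The question does not say which measure produced it (gensim's `CoherenceModel`, for example, supports several, including `u_mass` and `c_v`), so the sketch below only illustrates one of them, UMass coherence, in plain Python. The function name `umass_coherence` and the toy documents are hypothetical.

```python
import math


def umass_coherence(top_words, documents):
    """UMass coherence of a single topic.

    top_words: the topic's top words, ordered from most to least probable.
    documents: the corpus as a list of token lists.
    Sums log((D(w_m, w_l) + 1) / D(w_l)) over every pair where w_l ranks
    above w_m, and D counts the documents containing all the given words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = doc_freq(top_words[l])
            if d_l == 0:
                continue  # word absent from the corpus: the term is undefined, so skip it
            score += math.log((doc_freq(top_words[m], top_words[l]) + 1) / d_l)
    return score


# Hypothetical toy corpus, already tokenized.
docs = [["great", "training"], ["great", "training", "company"], ["bad", "resolution"]]
print(umass_coherence(["training", "great", "company"], docs))
```

On this toy corpus the example prints log(1.5), about 0.405. Higher values mean the top words co-occur more often. Libraries may average over word pairs instead of summing, so absolute values are not comparable across implementations or across coherence measures.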
I have also looked into other NLP solutions, such as keyword extraction and named entity recognition (NER), but I do not think they are suitable for my problem.
If possible, I would like two levels of categorization, where level 1 is an overview and level 2 goes into more detail. Below is a loosely summarized customer-feedback example:
Level 1
Training
Communication
Technology
Products & Services
Other
Level 2
Internal
External
Resolution Good
Resolution Bad
Unclear feedback
Ideally, this is the format I would like the topic-modelling output to follow, but I am unsure whether it is viable.
Realistically, weighting the words in the text would work. For example:
'Great training from the company' would be categorized as Training (level 1) and Resolution Good (level 2). The words picked up here are "great" and "training", because they outweigh the other words in terms of categorization.
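The weighting idea above can be sketched without any topic model. Give each category a small, hand-written list of weighted seed words, score a comment against every category at each level, and pick the highest-scoring one. This is a rule-based keyword scorer, not LDA. The seed words, weights and the names `LEVEL1_SEEDS`, `LEVEL2_SEEDS` and `categorize` are hypothetical placeholders. Only some of the listed categories are covered (Internal/External are left out), and falling back to "Other" / "Unclear feedback" when nothing matches is an assumption.

```python
import re
from collections import Counter

# Hypothetical seed words and weights; in practice these would be curated by hand.
LEVEL1_SEEDS = {
    "Training": {"training": 2.0, "course": 1.0, "trainer": 1.0},
    "Communication": {"email": 1.0, "call": 1.0, "contact": 1.0},
    "Technology": {"system": 1.0, "software": 1.0, "app": 1.0},
    "Products & Services": {"product": 1.0, "service": 1.0, "delivery": 1.0},
}
LEVEL2_SEEDS = {
    "Resolution Good": {"great": 2.0, "excellent": 2.0, "helpful": 1.0},
    "Resolution Bad": {"poor": 2.0, "bad": 2.0, "slow": 1.0},
}


def tokenize(text):
    # Lower-case and keep only alphabetic tokens (apostrophes allowed).
    return re.findall(r"[a-z']+", text.lower())


def best_label(tokens, seeds, fallback):
    # Weighted count of seed words per category; the highest score wins.
    counts = Counter(tokens)
    scores = {
        label: sum(weight * counts[word] for word, weight in words.items())
        for label, words in seeds.items()
    }
    label, score = max(scores.items(), key=lambda item: item[1])
    return label if score > 0 else fallback


def categorize(text):
    # Each level is scored independently against its own seed lists.
    tokens = tokenize(text)
    return (
        best_label(tokens, LEVEL1_SEEDS, "Other"),
        best_label(tokens, LEVEL2_SEEDS, "Unclear feedback"),
    )


print(categorize("Great training from the company"))  # → ('Training', 'Resolution Good')
```

Because the weights are explicit, a word such as "great" can be made to outweigh the rest of the sentence, which is the behaviour the question describes. The trade-off is that every category's vocabulary has to be maintained by hand.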
Happy to provide further information if required.
Answers (2)
As you know, topic modelling is generally an unsupervised technique, so I find it hard to imagine solving your more complex problem (two levels of classification) with this approach alone. Topic modelling could instead be a first step that helps you with a subsequent supervised approach.
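A minimal sketch of such a supervised step, assuming you have a handful of hand-labelled comments (for instance, topic-model output that has been checked and corrected by hand). The answer does not name a classifier; multinomial Naive Bayes with add-one smoothing is used here only because it fits in plain Python. The class name `NaiveBayes` and the training texts and labels are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.label_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label in self.label_counts:
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(self.label_counts[label] / n_docs)
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][w] + 1) / (self.totals[label] + v))
            if score > best_score:
                best, best_score = label, score
        return best


# Hypothetical hand-labelled level-1 examples.
texts = [
    "great training course", "the trainer was helpful",
    "the app keeps crashing", "software update broke the system",
    "nobody answered my email", "no call back from support",
]
labels = ["Training", "Training", "Technology", "Technology", "Communication", "Communication"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("training was great"))
```

A second model trained on level-2 labels would give the second level of classification. With real data, a library classifier and many more labelled examples would be the more practical choice.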
In any case, if you want to provide some words to guide the topic-modelling task, there are at least two libraries worth a look:
Please share your updates on this task.
It seems that it is not possible to get the multiple levels my question asks for. However, a way around this is to run the topic-modelling approach twice to get two different levels. This requires more supervision in how the topic outputs are defined and what you are trying to capture in each topic.
After extensive research, the approach I found most useful was CorEx: https://github.com/gregversteeg/corex_topic
It allows you to define the number of topics yourself and, more importantly, the words you want in each topic. I found that this gave me the more supervised approach my question was asking about.