Hierarchical sentiment analysis with LingPipe
This is in the context of doing sentiment analysis with the LingPipe machine learning toolkit. I have to classify whether each sentence in a big paragraph has positive or negative sentiment. I know of the following approach in LingPipe:
- Classify the complete paragraph by its polarity: negative or positive.
Here, I still don't know the polarity at the sentence level; we are still at the paragraph level. How do I determine polarity at the sentence level, i.e., whether a given sentence in a paragraph is positive or negative? I know that LingPipe can classify whether a sentence is subjective or objective. So, using that capability, should I:
- First, train LingPipe on a large set of sentences labeled subjective/objective.
- Use the trained model to extract all subjective sentences from a test paragraph.
- Manually label the extracted subjective sentences as positive/negative and train a LingPipe polarity classifier on them.
- Then take a test subjective sentence (one that has passed through the trained subjective/objective model), feed it to the trained polarity model, and determine whether the statement is positive or negative?
Does the above approach work? We know that LingPipe can accept a large piece of text (a paragraph) for polarity classification. Will it do a good job if we pass it just a single subjective sentence? I am confused!
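The two-stage cascade proposed in the question can be sketched as follows. This is a minimal Python illustration, not LingPipe code: a small hand-written multinomial naive Bayes classifier stands in for LingPipe's classifiers, the regex sentence splitter is deliberately naive, the training sentences are made-up toy examples, and all function and class names are hypothetical. The point is the structure: a subjectivity model filters the sentences of a paragraph, and a separately trained polarity model labels only the sentences that pass.

```python
import math
import re
from collections import Counter

# Stand-in sketch, not LingPipe. All names here are hypothetical.


def tokenize(text):
    """Lowercase word tokens (naive; digits and punctuation are dropped)."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(paragraph):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""

    def __init__(self, examples):
        self.word_counts = {}        # category -> Counter of word frequencies
        self.doc_counts = Counter()  # category -> number of training texts
        self.vocab = set()
        for text, category in examples:
            words = tokenize(text)
            self.word_counts.setdefault(category, Counter()).update(words)
            self.doc_counts[category] += 1
            self.vocab.update(words)

    def classify(self, text):
        """Return the category with the highest log posterior (up to a constant)."""
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, float("-inf")
        for category, counts in self.word_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[category] / total_docs)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best, best_score = category, score
        return best


def sentence_polarities(paragraph, subjectivity_model, polarity_model):
    """Stage 1 keeps subjective sentences; stage 2 labels only those."""
    return [
        (sentence, polarity_model.classify(sentence))
        for sentence in split_sentences(paragraph)
        if subjectivity_model.classify(sentence) == "subjective"
    ]


# Toy, made-up training data; real use needs a large labeled corpus.
subjectivity_model = NaiveBayes([
    ("I loved this film", "subjective"),
    ("What a terrible boring mess", "subjective"),
    ("The acting was wonderful", "subjective"),
    ("I hated the ending", "subjective"),
    ("The film was released in 2009", "objective"),
    ("It runs for two hours", "objective"),
    ("The director was born in Ohio", "objective"),
    ("The story takes place in Paris", "objective"),
])
polarity_model = NaiveBayes([
    ("I loved this film", "positive"),
    ("The acting was wonderful", "positive"),
    ("A brilliant and moving story", "positive"),
    ("What a terrible boring mess", "negative"),
    ("I hated the ending", "negative"),
    ("A dull and awful script", "negative"),
])

paragraph = "The film was released in 2009. I loved the acting. The ending was terrible."
result = sentence_polarities(paragraph, subjectivity_model, polarity_model)
print(result)  # [('I loved the acting.', 'positive'), ('The ending was terrible.', 'negative')]
```

Because the polarity model in this sketch is trained on individual sentences, it sees inputs of roughly the same size at training and test time, which is one reasonable way to address the worry about passing a single sentence to a model that is usually fed whole paragraphs.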
Answers (2)
You might want to take a look at the multi-level analysis approaches in the literature, e.g.:
Li, S., et al. (2010). "Exploiting Combined Multi-level Model for Document Sentiment Analysis," 2010 International Conference on Pattern Recognition.
Yessenalina, A., et al. (2010). "Multi-level Structured Models for Document-level Sentiment Classification," Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1046–1056, MIT, Massachusetts, USA, 9–11 October 2010.
Multi-level analysis approaches are quite common in information retrieval, as in content indexing for vector space similarity search.
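The general idea of linking sentence-level and document-level sentiment can be illustrated with a very simple aggregation. This is not the model from either paper above; it is only a hand-written weighted average, and the function name and the (p_subjective, p_positive) input format are assumptions made for illustration. Each sentence votes for a polarity, and more subjective sentences get a bigger say in the document score.

```python
def document_polarity(sentence_scores):
    """Combine per-sentence scores into one document score in [-1, 1].

    sentence_scores: list of (p_subjective, p_positive) pairs, e.g. the
    probabilities reported by a subjectivity model and a polarity model.
    Each sentence votes 2 * p_positive - 1 (-1 = surely negative,
    +1 = surely positive), weighted by how subjective it is, so objective
    sentences contribute little.
    """
    total_weight = sum(p_subj for p_subj, _ in sentence_scores)
    if total_weight == 0:
        return 0.0  # no subjective content: treat the document as neutral
    weighted = sum(p_subj * (2 * p_pos - 1) for p_subj, p_pos in sentence_scores)
    return weighted / total_weight


# Hypothetical scores for a three-sentence document.
scores = [(0.1, 0.5), (0.9, 0.8), (0.8, 0.3)]
doc_score = document_polarity(scores)
print(round(doc_score, 3))
```

Here the mostly objective first sentence barely matters, and the strongly positive second sentence outweighs the moderately negative third, giving a slightly positive document score.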
Environments such as LingPipe are a good way to get started, but eventually you need to employ lower-level, finer-grained tools such as the ones yura suggested.
Most machine learning libraries, including LingPipe, are row-based (each object is a flat set of features). So if you want to do hierarchical classification with one of them, you should denormalize your data. For example, you can put paragraph features and sentence features in the same feature set. If you classify by words only, you can create features such as PARAGRAPH_WORDX=true and SENTENCE_WORDX=true.
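The denormalization described above can be sketched as a feature extractor that emits one flat row per sentence, with the enclosing paragraph's words copied into every row. This is a minimal Python illustration with hypothetical function names; the resulting dict would still need converting into whatever input format the chosen library's classifier expects, which is not shown.

```python
import re


def tokenize(text):
    """Lowercase word tokens (naive)."""
    return re.findall(r"[a-z']+", text.lower())


def denormalized_features(sentence, paragraph):
    """One flat feature row for a sentence, with its paragraph's words copied in.

    Paragraph words become PARAGRAPH_<WORD> features and the sentence's own
    words become SENTENCE_<WORD> features, following the naming in the answer.
    """
    features = {}
    for word in tokenize(paragraph):
        features["PARAGRAPH_" + word.upper()] = True
    for word in tokenize(sentence):
        features["SENTENCE_" + word.upper()] = True
    return features


paragraph = "Great cast. Weak plot."
row = denormalized_features("Weak plot.", paragraph)
print(sorted(row))
# ['PARAGRAPH_CAST', 'PARAGRAPH_GREAT', 'PARAGRAPH_PLOT', 'PARAGRAPH_WEAK', 'SENTENCE_PLOT', 'SENTENCE_WEAK']

# One row per sentence: rows share the paragraph features but differ in sentence features.
rows = [denormalized_features(s, paragraph) for s in ("Great cast.", "Weak plot.")]
```

The cost of this design is redundancy: paragraph-level features are repeated in every sentence row, but in exchange any ordinary flat classifier can use both levels of context at once.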
Some other toolkits let you express your model without denormalization; these are the so-called graphical models. Examples are CRFs, ACRFs, Markov models, etc., and you can find implementations of them in MALLET and Factorie.
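To illustrate what a graphical model adds, here is a hand-written sketch of the decoding (Viterbi) step shared by HMMs and linear-chain CRFs, applied to the sentences of one paragraph. It is not MALLET or Factorie code: the per-sentence and transition scores are made-up numbers, and learning those scores from data is not shown. It shows how a sentence's label can depend on its neighbors instead of each sentence being classified as an independent row.

```python
def viterbi(emissions, transitions, labels):
    """Highest-scoring label sequence for a chain of sentences.

    emissions[i][y]: score of label y for sentence i on its own
    transitions[(a, b)]: score for label b directly following label a
    Scores are treated as log-space values and simply added.
    """
    # best[i][y] = (score of the best path ending in label y at i, previous label)
    best = [{y: (emissions[0][y], None) for y in labels}]
    for i in range(1, len(emissions)):
        column = {}
        for y in labels:
            prev, score = max(
                ((p, best[i - 1][p][0] + transitions[(p, y)]) for p in labels),
                key=lambda item: item[1],
            )
            column[y] = (score + emissions[i][y], prev)
        best.append(column)
    # Trace back from the best final label.
    path = [max(labels, key=lambda y: best[-1][y][0])]
    for i in range(len(emissions) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


labels = ["pos", "neg"]
# Hypothetical per-sentence scores: the middle sentence leans slightly negative.
emissions = [
    {"pos": 2.0, "neg": 0.0},
    {"pos": 0.4, "neg": 0.5},
    {"pos": 1.5, "neg": 0.0},
]
# Hypothetical transition scores: "sticky" rewards adjacent sentences that agree.
sticky = {(a, b): (1.0 if a == b else 0.0) for a in labels for b in labels}
independent = {(a, b): 0.0 for a in labels for b in labels}

print(viterbi(emissions, independent, labels))  # ['pos', 'neg', 'pos']
print(viterbi(emissions, sticky, labels))       # ['pos', 'pos', 'pos']
```

With zero transition scores the result equals classifying each sentence on its own; once agreement between neighbors is rewarded, the weakly negative middle sentence is relabeled to match its context.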