I'm looking to train a naive Bayes classifier on some new data sources that haven't been used before. I've already looked at the Pang & Lee corpus of IMDB reviews and the MPQA opinion corpus. I'm looking for new web services that fit the following criteria.
- Easily classified: must have a like/dislike or 5-star rating
- Readily available
- Pertains to new material (less important than the first two)
Here are some candidates I have come up with on my own.
- Etsy API
- Rotten Tomatoes API
- Yelp API
Any other suggestions would be much appreciated =)
Pang & Lee's later work, "Opinion Mining and Sentiment Analysis" (2008), has a section on publicly available resources. It has links to those corpora.
Take a look at Sentiment140. It has a corpus that you can download and train with, and you can easily extend it to new tweets.
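Whichever source you pick, the training step looks roughly the same once each review has a rating. Here is a minimal sketch in plain Python of a multinomial naive Bayes with add-one smoothing. The names `label_from_stars`, `train`, and `predict` are hypothetical helpers, the sample reviews are made up for illustration, and the choice to treat 4-5 stars as positive, 1-2 as negative, and to skip 3-star reviews is an assumption, not something any of the APIs above prescribes.

```python
import math
from collections import Counter, defaultdict

def label_from_stars(stars):
    """Map a 1-5 star rating to a label. Skipping 3 stars is an assumption."""
    if stars >= 4:
        return "pos"
    if stars <= 2:
        return "neg"
    return None  # neutral reviews are left out of training

def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would handle punctuation.
    return text.lower().split()

def train(examples):
    """examples: iterable of (text, label) pairs. Returns count tables."""
    doc_counts = Counter()              # number of documents per label
    word_counts = defaultdict(Counter)  # token counts per label
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return {"doc_counts": doc_counts, "word_counts": word_counts, "vocab": vocab}

def predict(model, text):
    """Return the label with the highest log posterior for the text."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, float("-inf")
    for label, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokenize(text):
            if tok not in model["vocab"]:
                continue  # ignore tokens never seen in training
            # Add-one (Laplace) smoothed log likelihood
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up sample data standing in for reviews fetched from a rated source.
reviews = [
    ("great film loved it", 5),
    ("wonderful acting great story", 4),
    ("terrible plot boring film", 1),
    ("awful boring waste of time", 2),
    ("it was okay", 3),
]
labeled = [(text, label_from_stars(stars)) for text, stars in reviews]
labeled = [(text, label) for text, label in labeled if label is not None]

model = train(labeled)
print(predict(model, "great story"))
print(predict(model, "boring and awful"))
```

For a like/dislike source you would skip `label_from_stars` and pass the labels through directly.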