Looking for book and article references to get started with document classification
I am interested in doing a project on document classification. I have been looking for books that cover the text mining theory behind it, or for articles that describe the process of going from training data (documents labeled with categories and subcategories) to a system that predicts the class of a new document. There seem to be some (rather expensive!) titles available, but these are conference proceedings made up of articles on small, very specific topics. Can someone suggest books from the data mining literature that provide a good theoretical basis for a text mining project, specifically document classification, or articles that give an overview of this process?
Comments (1)
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze have a free information retrieval book, "Introduction to Information Retrieval". Try chapter 13, "Text classification and Naive Bayes".
See also the companion site for Manning and Schütze's NLP book, "Foundations of Statistical Natural Language Processing", specifically the links for the text categorization chapter.
Fabrizio Sebastiani wrote a useful tutorial on text categorization (PDF) and a review paper on machine learning for text categorization (PDF).
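To give a rough feel for the process the question describes (labeled training documents in, predicted class out), here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing, the kind of method covered in the chapter mentioned above. It is not taken from any of the referenced texts. The whitespace tokenizer, the function names `tokenize`, `train`, and `predict`, and the four toy documents are all assumptions made for illustration. Subcategories are not modeled separately here; one simple option would be to use flat labels such as "work/meetings".

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately naive (an assumption): lowercase and split on whitespace.
    return text.lower().split()


def train(labeled_docs):
    """Count documents per class and term occurrences per class.

    labeled_docs is an iterable of (text, label) pairs.
    """
    class_doc_counts = Counter()
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        class_doc_counts[label] += 1
        for token in tokenize(text):
            term_counts[label][token] += 1
            vocabulary.add(token)
    return class_doc_counts, term_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log posterior score."""
    class_doc_counts, term_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_doc_counts.items():
        # Log prior: fraction of training documents with this label.
        score = math.log(doc_count / total_docs)
        total_terms = sum(term_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # words never seen in training are ignored
            # Add-one (Laplace) smoothed term likelihood.
            count = term_counts[label][token]
            score += math.log((count + 1) / (total_terms + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training set, invented for this example.
training_data = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches now", "spam"),
    ("meeting agenda for monday", "work"),
    ("project meeting notes attached", "work"),
]
model = train(training_data)
print(predict(model, "buy cheap pills"))
print(predict(model, "monday project meeting"))
```

For a real project you would likely want a proper tokenizer, feature selection, and evaluation on held-out data, all of which the references above discuss in detail.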