Techniques for showing related content or articles
I've been trying to learn text mining and other related topics in the collective intelligence field. I'd like to build an app that scans through a document and shows related posts/articles on the page.
Which algorithm(s) would be helpful for retrieving the required information?
Thanks
/A
Comments (2)
A simple method is to count the uncommon words on the page and how many times each one occurs. The more often such a word shows up, the better it describes the content of the post. You can then use those words to look up other articles/posts.
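A rough sketch of that idea in Python, in case it helps. The stop-word list, the function names and the scoring rule (summing the shared word counts) are all assumptions made for illustration; they are not part of the answer above, and a real app would want a proper stop-word list and probably a weighting scheme such as TF-IDF.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to",
              "in", "on", "for", "it", "with", "that", "this"}

def keyword_counts(text):
    """Count how often each non-common word occurs in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def top_keywords(text, n=5):
    """Return the n most frequent non-common words."""
    return [word for word, _ in keyword_counts(text).most_common(n)]

def overlap_score(text_a, text_b):
    """Sum, over the words both texts share, the smaller of the two counts."""
    shared = keyword_counts(text_a) & keyword_counts(text_b)
    return sum(shared.values())

def related_posts(page, posts, limit=3):
    """Rank posts (a title -> body mapping) by keyword overlap with the page."""
    scored = [(overlap_score(page, body), title) for title, body in posts.items()]
    scored = [item for item in scored if item[0] > 0]  # drop unrelated posts
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first
    return [title for _, title in scored[:limit]]
```

Calling `related_posts(page_text, {"Some title": "post body", ...})` returns the titles of the posts that share the most keywords with the page, best match first.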
You can use the Resource Description Framework (RDF). RDF knowledge bases contain structured knowledge and the connections between the items in it. So you can fetch the RDF records for every word in the text and connect them into a graph. The nodes with the largest number of edges, and the root nodes if the graph is tree-like, will point to the theme of the document.
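To make the graph step a bit more concrete, here is a minimal sketch. It assumes the RDF records have already been fetched and reduced to (subject, predicate, object) triples; the `TRIPLES` data and the function names are made up for illustration, and no real RDF store or library is queried. It only ranks nodes by edge count and does not try to detect root nodes.

```python
from collections import defaultdict

# Hypothetical triples standing in for records fetched from an RDF store.
TRIPLES = [
    ("python", "isA", "programming_language"),
    ("java", "isA", "programming_language"),
    ("compiler", "relatedTo", "programming_language"),
    ("python", "usedFor", "text_mining"),
]

def build_graph(triples, words):
    """Connect subjects and objects of every triple that mentions a document word."""
    graph = defaultdict(set)
    for subject, _predicate, obj in triples:
        if subject in words or obj in words:
            graph[subject].add(obj)
            graph[obj].add(subject)
    return graph

def likely_themes(graph, n=1):
    """Return the n nodes with the most edges (ties broken alphabetically)."""
    ranked = sorted(graph, key=lambda node: (-len(graph[node]), node))
    return ranked[:n]
```

For a document containing the words `python`, `java` and `compiler`, `likely_themes(build_graph(TRIPLES, words))` picks the node that most of those words link to.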