Automatic document tagging
I have started working on a project in which I must tag documents with keywords. Doing this manually is hard and time-consuming, especially with thousands of documents, so I am planning to automate the process. I know the result will not be perfect, but it should at least produce some suggested tags.
The latest Firefox version implements a system like this: when you bookmark a page, it suggests some tags.
The Yahoo term extraction service is also a great example.
If anybody can help me with this problem, I would really appreciate it. If someone knows how the Firefox tagging system works, a little help there would be great too.
Comments (1)
Would a statistical algorithm work? Something Bayesian, perhaps? I know Bayesian filters are used for spam filtering, so maybe you can adapt one to suit your needs, training it on documents that already have tags.
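To make the Bayes idea a bit more concrete, here is a minimal sketch of a naive Bayes tagger with Laplace smoothing. It assumes you already have some hand-tagged documents to train on. The class name `NaiveBayesTagger`, its methods, and the `tokenize` helper are all made up for illustration, and this is not how Firefox or Yahoo do it.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTagger:
    """Hypothetical multinomial naive Bayes tagger (illustrative only)."""

    def __init__(self):
        self.tag_doc_counts = Counter()               # training documents per tag
        self.tag_word_counts = defaultdict(Counter)   # word counts per tag
        self.vocabulary = set()
        self.total_docs = 0

    def train(self, text, tags):
        """Record one already-tagged document."""
        words = tokenize(text)
        self.total_docs += 1
        self.vocabulary.update(words)
        for tag in tags:
            self.tag_doc_counts[tag] += 1
            self.tag_word_counts[tag].update(words)

    def suggest(self, text, max_tags=3):
        """Return up to max_tags tags, most probable first."""
        words = tokenize(text)
        vocab_size = max(len(self.vocabulary), 1)
        scores = {}
        for tag, doc_count in self.tag_doc_counts.items():
            word_counts = self.tag_word_counts[tag]
            total_words = sum(word_counts.values())
            # Log prior: share of training documents carrying this tag.
            score = math.log(doc_count / self.total_docs)
            for word in words:
                # Laplace (add-one) smoothing so unseen words do not zero out a tag.
                score += math.log((word_counts[word] + 1) / (total_words + vocab_size))
            scores[tag] = score
        return sorted(scores, key=scores.get, reverse=True)[:max_tags]


tagger = NaiveBayesTagger()
tagger.train("python code function import module", ["programming"])
tagger.train("recipe flour sugar oven bake", ["cooking"])
print(tagger.suggest("bake a cake in the oven with sugar", max_tags=1))
```

With only two training documents this is a toy, but the same structure should scale to a few thousand tagged documents. The quality of the suggestions depends heavily on how consistent the existing tags are.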
At the very least, you could suggest words that appear frequently in the document but are not common English words (that is, skip words such as he, she, I, and, it, then, or, etc.).
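The frequency idea could look roughly like this. It is only a sketch: the `STOP_WORDS` set is a tiny hand-picked sample (a real list would be much longer), and `suggest_tags` with its `max_tags` and `min_length` parameters is a name I made up for the example.

```python
import re
from collections import Counter

# Assumption: a very small sample of common English words to ignore.
STOP_WORDS = {
    "he", "she", "i", "and", "it", "then", "or", "the", "a", "an",
    "of", "to", "in", "is", "for", "on", "with", "that", "this",
}


def suggest_tags(text, max_tags=5, min_length=3):
    """Suggest the most frequent words that are not common English words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(
        word for word in words
        if word not in STOP_WORDS and len(word) >= min_length
    )
    return [word for word, _ in counts.most_common(max_tags)]


print(suggest_tags("The cat and the dog. The cat sat. Cat!", max_tags=2))
```

A natural refinement would be to weight each word by how rare it is across the whole document collection instead of relying on a fixed stop-word list, but the plain count is probably enough for first suggestions.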