Can the precision of a text-retrieval system ever reach 100%?
The formula for precision is:
retrieved_and_relevant/(retrieved_and_relevant+retrieved_and_irrelevant)
I am wondering whether the precision of a text-retrieval system will ever be anything other than 100%. My reasoning is that we programmers put a great deal of effort into making sure every piece of text in every document gets indexed. So when a query is sent to the text-retrieval system, it returns all the documents containing the query text. That would mean every retrieved document is a relevant document, which makes the precision 100%.
Is this true, or am I missing something?
Comments (1)
You're slightly confused about the concept behind precision.
A simple example would be searching for the terms "iraq war". Depending on how the search engine is designed, the results may or may not be what the user is looking for. Each returned document could be completely different and contain the exact search terms, yet still be irrelevant to what the user was looking for.
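To make this concrete, here is a minimal sketch of that situation. The toy corpus, the naive `retrieve` function, and the set of relevance judgments are all hypothetical, made up purely for illustration; a real engine and a real user would differ. It only shows that "contains the query terms" and "is relevant" are two separate things.

```python
def precision(retrieved, relevant):
    """retrieved_and_relevant / (retrieved_and_relevant + retrieved_and_irrelevant)."""
    retrieved = set(retrieved)
    if not retrieved:
        raise ValueError("precision is undefined when nothing is retrieved")
    retrieved_and_relevant = len(retrieved & set(relevant))
    return retrieved_and_relevant / len(retrieved)


def retrieve(query, docs):
    """Naive retrieval: return ids of documents containing every query term."""
    terms = query.lower().split()
    return [
        doc_id
        for doc_id, text in docs.items()
        if all(term in text.lower().split() for term in terms)
    ]


# Hypothetical toy corpus (not from any real collection).
docs = {
    "d1": "timeline of the iraq war from 2003 to 2011",
    "d2": "board game review: a war game set in ancient iraq",
    "d3": "recipe blog: my grandfather left iraq before the war",
    "d4": "causes and consequences of the iraq war",
    "d5": "weather report for baghdad",
}

retrieved = retrieve("iraq war", docs)

# Assumed judgments from a user who wanted material about the war itself.
relevant = {"d1", "d4"}

p = precision(retrieved, relevant)
print(retrieved, p)
```

Under these assumed judgments, four documents match the terms but only two are what the user wanted, so precision comes out at 0.5 even though nothing was missed during indexing.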
The search engine would definitely LIKE to have a precision of 100%, but that is very rarely the case.
Precision can ONLY be determined by the user who performs the search query, as they are the only one who knows without a doubt whether a result is relevant. It's definitely something to strive for, but don't expect it to always equal 100%.