Can the precision of a text-retrieval system reach 100%?

Posted 2024-10-08 08:48:28

The formula for precision is:

retrieved_and_relevant / (retrieved_and_relevant + retrieved_and_irrelevant)

Given this, I am wondering whether precision in a text-retrieval system can ever be anything other than 100%. My reasoning: we programmers put a great deal of effort into indexing every piece of text in every document. So when a query is submitted, the system returns all the documents that contain the query text. That would mean every retrieved document is relevant, which makes the precision 100%.

Is this true, or am I missing something?

Comments (1)

何以心动 2024-10-15 08:48:28

You're slightly confused about the concept behind precision.

A simple example is a search for the terms iraq war. Depending on how the search engine is designed, the results may or may not be what the user is looking for. It might return:

  • Wars that Iraq, the country, is involved in
  • A fictional story about a soldier in the current Iraq war
  • A news article that talks about various wars and their financial impact

Each of these documents could be completely different and still contain the exact search terms, yet be irrelevant to what the user was looking for.

The search engine would definitely like to have a precision of 100%, but that is very rarely the case.

Precision can only be determined by the user who performs the search, since they are the only one who knows for certain whether a result is relevant. It's definitely something to strive for, but don't expect it to always equal 100%.
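To make the point concrete, here is a minimal Python sketch of the iraq war example. Everything in it is an assumption made up for illustration: the four-document corpus, the `keyword_search` and `precision` helpers, and the user's relevance judgments. It shows only that a retriever can return every document containing the query terms and still score below 100%.

```python
import re

# Hypothetical toy corpus; document IDs and texts are invented for illustration.
DOCS = {
    "history": "An overview of every war Iraq has been involved in",
    "fiction": "A fictional story about a soldier in the Iraq war",
    "finance": "How each war, from Iraq to Vietnam, affected national budgets",
    "recipes": "Ten easy weeknight pasta recipes",
}

def keyword_search(query, docs):
    """Return IDs of documents that contain every query term (exact word match)."""
    terms = set(re.findall(r"\w+", query.lower()))
    hits = []
    for doc_id, text in docs.items():
        words = set(re.findall(r"\w+", text.lower()))
        if terms <= words:  # all query terms appear in the document
            hits.append(doc_id)
    return hits

def precision(retrieved, relevant):
    """retrieved_and_relevant / (retrieved_and_relevant + retrieved_and_irrelevant)."""
    retrieved = set(retrieved)
    if not retrieved:
        return None  # precision is undefined when nothing is retrieved
    return len(retrieved & set(relevant)) / len(retrieved)

retrieved = keyword_search("iraq war", DOCS)

# Assumed judgment: this particular user only wanted a factual history of Iraq's wars.
relevant_for_user = {"history"}

score = precision(retrieved, relevant_for_user)
print(retrieved, score)
```

All three retrieved documents contain both query terms, but under the assumed judgment only one of them is relevant, so precision is 1/3. A different user with a different information need would produce a different score for the same result list.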
