"Two-dimensional search" in Solr, or how to get the best items of a multivalued field "items"?

Posted on 2024-09-04 18:08:03

The title is a bit awkward, but I couldn't find a better one. My problem is as follows:

I have several users stored as documents, and for each document I store several key-value pairs or items (each of which has an id). Now, if I apply highlighting with hl.snippets=5, I can get the first 5 items. But every user could have several hundred items, so you will not get the 5 most relevant items. You will get the first 5 items ...

Another problem is that the highlighted text won't contain the id, so retrieving additional information about the highlighted item is ugly.

Example where items are emails:

user1 has item1 { text:"developers developers developers", id:1, title:"ms" }
          item2 { text:"c# development",                   id:2, title:"nice!" }
          ...
          item77 ...

user2 has item1 { text:"nice restaurant", id:3, title:"bla"}
          item2 { text:"best cafe",       id:4, title:"blup"}
          ...
          item223 ...

Now if I use highlighting for the text field and query against "restaurant" I get user2 and the text nice <b>restaurant</b>. But how can I determine the id of the highlighted text to display e.g. the title of this item? And what happens if more relevant items are listed at the end of the item-list? Highlighting won't display those ...

So how can I find the best items of a document that has many such items?

I added my two findings as answers, but as I will point out, each of them has its own drawbacks.

Could anyone point me to a better solution?

記柔刀 2024-09-11 18:08:03

One of my rules of thumb for designing Solr schemas is: the document is what you will search for.

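If you want to search for "items", then those "items" are your documents. How you store the other stuff (e.g. "users") is secondary. So the "users" could live in another index, as you mentioned, they could be "denormalized" (i.e. their information duplicated in each document), they could sit in a relational database, etc., depending on whether an RDBMS is available, how many "users" there are, how many fields those "users" have, and so on.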

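EDIT: now you explain that the "items" are emails, that a possible search is "restaurant X", and that you want to find the best "items" (emails). Therefore, the document is the email. The schema could be as simple as: (id, title, text, user).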
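To make the "the document is the email" idea concrete, here is a minimal sketch of flattening the user/item structure from the question into one flat record per email with the fields (id, title, text, user). The `users` dictionary and the function name `flatten_to_email_docs` are made up for illustration, and how the records are then sent to Solr is left out on purpose.

```python
# Hypothetical input: users with nested items, mirroring the question's example.
users = {
    "user1": [
        {"id": 1, "title": "ms", "text": "developers developers developers"},
        {"id": 2, "title": "nice!", "text": "c# development"},
    ],
    "user2": [
        {"id": 3, "title": "bla", "text": "nice restaurant"},
        {"id": 4, "title": "blup", "text": "best cafe"},
    ],
}


def flatten_to_email_docs(users):
    """Return one flat document per email, carrying the owning user's name."""
    docs = []
    for user, items in users.items():
        for item in items:
            docs.append({
                "id": item["id"],
                "title": item["title"],
                "text": item["text"],
                "user": user,  # denormalized: repeated on every email
            })
    return docs


email_docs = flatten_to_email_docs(users)
```

With this layout every search hit is a single email, so its id and title come back with the hit itself and relevance ranking applies to emails rather than to whole users.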

You could enable highlighting to get snippets of the "text" or "title" fields that match the "restaurant X" query.
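As a rough sketch, the highlighting request for the per-email schema could be built like this. The parameters `q`, `fl`, `hl`, `hl.fl` and `hl.snippets` are standard Solr request parameters; the helper name `highlight_query`, the choice of one snippet per field, and searching the `text` field explicitly are assumptions for the example.

```python
from urllib.parse import urlencode


def highlight_query(term, fields=("text", "title"), snippets=1):
    """Build the query string for a highlighted search over email documents."""
    params = {
        "q": f"text:{term}",         # assumption: search the "text" field
        "fl": "id,title,user",       # stored fields returned with each hit
        "hl": "true",                # switch highlighting on
        "hl.fl": ",".join(fields),   # fields to generate snippets for
        "hl.snippets": snippets,     # snippets per field per document
    }
    return urlencode(params)


# Query string to append to the core's select handler URL.
query_string = highlight_query("restaurant")
```

Because each hit is now one email, the snippets in the highlighting section of the response are grouped by the document's unique key, so the id of the highlighted text is known directly.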

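If you want to give the end user information about the users who wrote about "restaurant X", you could facet on the "user" field. The end user would then see that John wrote 10 emails about "restaurant X" and Robert wrote 6. The end user thinks "this John guy must know a lot about this restaurant", so he drills down into the search for "restaurant X" with the filter query user:John.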
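The two requests described here, faceting on "user" and then drilling down, might look roughly like the sketch below. `facet`, `facet.field`, `rows` and `fq` are standard Solr parameters; the helper names and the use of `rows=0` for the facet-only request are assumptions for illustration.

```python
from urllib.parse import urlencode


def facet_by_user_query(term):
    """Count matching emails per user without fetching the emails themselves."""
    return urlencode({
        "q": f"text:{term}",
        "rows": 0,               # only the facet counts are needed here
        "facet": "true",
        "facet.field": "user",   # one count per distinct user value
    })


def drill_down_query(term, user):
    """Same search, restricted to one user's emails via a filter query."""
    return urlencode({
        "q": f"text:{term}",
        "fq": f"user:{user}",    # filter query: narrows results, not scoring
    })


facet_qs = facet_by_user_query("restaurant")
drill_qs = drill_down_query("restaurant", "John")
```

For the facet counts to show whole user names, the "user" field would presumably need to be a non-tokenized (string) field.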
