How can I filter and sort sequenced documents that belong to multiple categories in Solr, without grouping?
I'm looking for some help and wisdom on how to properly design the schema for indexing documents in my situation. I have products that can belong to multiple categories. Within each category a product may or may not be sequenced. Ideally I'd like to keep just one unique document per product.
I'm using Solr 3.4.0 and currently have documents with this structure:
{
    productId : "1",
    sku : "ABC123",
    productName : "My Product",
    categorySequence : ["123-1", "456-7", "789-noseq", "000-noseq"],
    description : "Product description",
    rating : "4.36"
}
The categorySequence field is where I'm having trouble. It's a multi-valued field containing strings made up of the category id and the sequence of my product within that category, separated by a dash. Where the product is not sequenced in a category, I've arbitrarily appended "noseq".
Since my product can exist in multiple categories, I do a filter query on the categorySequence field like this:
fq=categorySequence:123-*
which is working for me to bring back only products which are in the category with the id "123".
However, I have now discovered that you can't sort on multi-valued fields. I was initially hoping this would be a quick way to sort the filtered products into the appropriate sequence.
I've seen some other suggestions on here involving grouping and having multiple documents for the same product. However, my products can exist in lots of categories, which as you can imagine would create a lot of documents.
I'm hoping to stick with a single document representing a single product. Can someone help point me in the right direction? I guess I'm basically looking at doing a filter and a sort on a two-dimensional field?
Comments (1)
We faced a similar issue, and here is what we implemented -
Field -
data fed to Solr -
You don't need to store a value for products that have no sort sequence in a category. Where those products land in the results can be handled with the sortMissingLast and sortMissingFirst attributes.
These per-category fields hold the position/sequence of the product within each category.
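As a rough illustration of the data described above, here is a minimal Python sketch that derives one sort field per sequenced category from the question's categorySequence values. The helper name `to_solr_doc` is hypothetical, and the `<categoryId>_sort_seq` naming is inferred from the `123_sort_seq` field used in the sort below. It also assumes the schema has a matching dynamic field (for example a `*_sort_seq` pattern on an integer type with sortMissingLast enabled), which is an assumption rather than something shown in this answer.

```python
def to_solr_doc(product):
    """Return a copy of the product with one <categoryId>_sort_seq field
    per category in which the product is actually sequenced.

    The field naming is an assumption based on the answer's sort parameter.
    """
    doc = dict(product)
    for entry in product.get("categorySequence", []):
        # "123-1" -> category_id "123", seq "1"
        category_id, _, seq = entry.partition("-")
        if seq.isdigit():
            # Only sequenced entries get a sort field; "noseq" entries are
            # left out so sortMissingLast/sortMissingFirst can place them.
            doc[category_id + "_sort_seq"] = int(seq)
    return doc


product = {
    "productId": "1",
    "sku": "ABC123",
    "categorySequence": ["123-1", "456-7", "789-noseq", "000-noseq"],
}
doc = to_solr_doc(product)
```

The categorySequence field is kept as-is so the existing filter query keeps working; only the sort moves to the single-valued per-category fields.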
Since you know the category id at query time, you can easily filter and sort the products:
fq=categorySequence:123-*&sort=123_sort_seq asc
You won't need to maintain multiple copies of the products.
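For completeness, a small sketch of how the filter and sort parameters above might be assembled for a given category id. The function name `category_query_params` is hypothetical, and the match-all `q=*:*` is an assumption, since the answer only shows the fq and sort parameters.

```python
from urllib.parse import urlencode


def category_query_params(category_id):
    """Build Solr request parameters that filter to one category and
    sort by that category's per-category sequence field."""
    return {
        "q": "*:*",  # assumption: match everything, let fq do the filtering
        "fq": "categorySequence:%s-*" % category_id,
        "sort": "%s_sort_seq asc" % category_id,
    }


params = category_query_params("123")
# URL-encoded form, ready to append to a select URL.
query_string = urlencode(params)
```

One possible variation: Solr's example schema recommends that field names not start with a digit, so a prefixed form such as `seq_123` may be the safer naming if the digit-leading name causes trouble in sort parsing. That is a suggestion on top of the answer, not part of it.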