In Solr, how do I query multivalued fields for a distinct set of values that belong together?
I basically want Solr to search each record of the multivalued field for my search parameters. Read on for my example:
I am using Solr to index my data. I have application data in parallel arrays (in the form of multi-valued fields) that match a given product. See the following example, where make, model, and year are multivalued fields:
<-solr record start->
sku: 1234
make: acura, acura, acura
model: integra, rsx, rsx
year: 1997, 2004, 2000
engine: 3.4, 4.5, 4.5
<-solr record end->
I am using filter queries (&fq=) to narrow my selections. The problem is, if someone looks up a 2000 Acura Integra, it will match the above record, but since the make, model, and year data is encoded in parallel, there actually is no 2000 Acura Integra for this product. Solr is matching the make in the make field, the model in the model field, and the year in the year field (as it should) and returning this result, and not respecting my parallelism.
My query looks like this so far:
fq=make:"acura"&fq=model:"integra"&fq=year:2000 (I would normally escape URL characters when I POST to Solr, this is just an example)
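To make the mismatch concrete, here is a minimal Python sketch (not Solr itself) that models the record above as parallel lists. The function names are hypothetical and exist only for illustration; the first mimics what the three independent filter queries check, the second checks what I actually want.

```python
# The example record, modelled as parallel lists (one entry per vehicle).
record = {
    "sku": "1234",
    "make": ["acura", "acura", "acura"],
    "model": ["integra", "rsx", "rsx"],
    "year": [1997, 2004, 2000],
}


def matches_per_field(doc, make, model, year):
    """Each filter looks at its own field only, like three separate fq clauses."""
    return make in doc["make"] and model in doc["model"] and year in doc["year"]


def matches_same_row(doc, make, model, year):
    """All three values must come from the same position in the parallel lists."""
    return (make, model, year) in zip(doc["make"], doc["model"], doc["year"])


print(matches_per_field(record, "acura", "integra", 2000))
print(matches_same_row(record, "acura", "integra", 2000))
```

The per-field check succeeds for a 2000 Acura Integra even though no single row contains that combination, which is the behaviour described above.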
So my solution was to create another multivalued field, called the summary field, in which I put all the make, model, year, and other data (like engine) together, separated by spaces. The quotation marks around each word are necessary so that terms with multiple words don't match search parameters inadvertently. The above example would now look like this:
<-solr record start->
sku: 1234
make: acura, acura, acura
model: integra, rsx, rsx
year: 1997, 2004, 2000
engine: 3.4, 4.5, 4.5
summary: "acura" "integra" "1997" "3.4", "acura" "rsx" "2004" "4.5", "acura" "rsx" "2000" "4.5"
<-solr record end->
I then add to my query the following:
summary:("acura" AND "integra" AND "2000")
I would expect that, once I add this to my query, the record would no longer come up, since there is no acura integra 2000 in the summary field. However, this doesn't work. The record still comes up. I am stumped. Does anyone have a solution to this problem? It's been killing me for days.
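A likely explanation, sketched in Python under a simplifying assumption: the values of a multivalued field are indexed as one token stream per document, so an AND query only requires each term to occur somewhere in the field, not inside the same value. The helper names below are hypothetical and the tokenizer is a plain whitespace split, not Solr's analysis chain.

```python
# The three summary values from the record above, with quotes dropped for brevity.
summary_values = [
    "acura integra 1997 3.4",
    "acura rsx 2004 4.5",
    "acura rsx 2000 4.5",
]


def and_query_matches(values, terms):
    """Simplified model of summary:(a AND b AND c): terms may come from any value."""
    all_tokens = {token for value in values for token in value.split()}
    return all(term in all_tokens for term in terms)


def same_value_matches(values, terms):
    """What was intended: every term must occur inside one and the same value."""
    return any(all(term in value.split() for term in terms) for value in values)


terms = ["acura", "integra", "2000"]
print(and_query_matches(summary_values, terms))
print(same_value_matches(summary_values, terms))
```

In this model the AND query finds "integra" in the first value and "2000" in the third, so the document still matches.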
I basically want Solr to search each record of the multivalued field for my search parameters. Is this possible? Is there a better way to do what I am trying to do?
Thanks
Comments (3)
It seems that your schema isn't quite right. You need to completely denormalize your data and create one document per vehicle. What a "vehicle" means depends on what kind of searches you will run. For example, a possible schema would be:
The summary field would be a copyField of make + model + years + engines.
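As a rough illustration of the denormalization idea, the following Python sketch splits the parallel arrays from the question into one document per vehicle before indexing. The `id` field and the choice of one document per make/model/year/engine row are assumptions made for this example; the right granularity depends on the searches you need.

```python
# The record from the question, as parallel lists.
record = {
    "sku": "1234",
    "make": ["acura", "acura", "acura"],
    "model": ["integra", "rsx", "rsx"],
    "year": [1997, 2004, 2000],
    "engine": ["3.4", "4.5", "4.5"],
}


def denormalize(rec):
    """Return one flat document per vehicle row (hypothetical document layout)."""
    rows = zip(rec["make"], rec["model"], rec["year"], rec["engine"])
    return [
        {
            "id": f'{rec["sku"]}-{index}',  # assumed unique key: sku plus row index
            "sku": rec["sku"],
            "make": make,
            "model": model,
            "year": year,
            "engine": engine,
        }
        for index, (make, model, year, engine) in enumerate(rows)
    ]


for doc in denormalize(record):
    print(doc)
```

With single-valued fields per document, the original filter queries on make, model, and year can only match when all three values belong to the same vehicle.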
I am still not sure how to maintain parallelism without a summary field, but I figured out how to do it with one. AND statements, I believe, search every record in the multivalued field for a match, so each AND'ed term can match a different row of the multivalued field, not necessarily the same row. Instead, put the exact terms you're looking for in a single phrase, in the same order in which you built the original summary record, and use the ~ operator.
Take a look at the following example:
The following are the contents of the summary field in one of the rows in the multivalued field, which I wish to match:
"Honda" "Accord" "2004" "3.5L"
Here is the query I will run:
summary_field:("\"Honda\" \"2004\"")
The above query alone will not work. I can have a function that puts user input from the application into the same order that the original summary field was built with, since users can enter the pieces of data (make, model, year) in any order. Even so, there may be other words between the terms I am trying to match. In the above example, I want to match Honda 2004 to that record, but Accord sits between them.
To get around this problem, simply use the ~n operator, where n is the maximum number of other terms allowed between the terms you are searching for. So if I instead use:
summary_field:("\"Honda\" \"2004\""~1)
I am saying that between Honda and 2004 there may be 1 other word. Therefore, the above query will match. Even if you add more terms to your summary field, as long as you query against it with the values in the same order, and the number after ~ is the maximum distance between 2 values, your query will match the correct summary field. So, if you have 20 fields that you add to your summary field to maintain parallelism, you simply need to use ~18, as that is the maximum possible distance, in the worst case, between words that could be picked by the user.
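The ordering function and the worst-case distance described above could be sketched in Python like this. `FIELD_ORDER` and `build_summary_query` are hypothetical names, and the field order is assumed to be the one used when the summary values were built.

```python
# Assumed order in which the summary values were assembled at index time.
FIELD_ORDER = ["make", "model", "year", "engine"]


def build_summary_query(user_input, field="summary_field"):
    """Order the user's terms like the summary field and add a worst-case ~n."""
    terms = [str(user_input[name]) for name in FIELD_ORDER if name in user_input]
    # Worst case: first and last field chosen, every other field lies in between.
    slop = max(len(FIELD_ORDER) - 2, 0)
    # Each term keeps its escaped quotes, matching the query style used above.
    phrase = " ".join(f'\\"{term}\\"' for term in terms)
    return f'{field}:("{phrase}"~{slop})'


# Input order does not matter; the output follows FIELD_ORDER.
print(build_summary_query({"year": 2004, "make": "Honda"}))
```

One caveat worth checking in your own schema: the number after ~ should stay smaller than the field type's positionIncrementGap, otherwise a phrase could in principle stretch across two neighbouring values of the multivalued field.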
Can you not just do a query as follows?
I.e., without the quotes around the make and model.