I have what I think is a simple Solr exercise, but I'm unsure what to use.
I have a field of names, e.g. Joe Smith, Jack Daniels and Steve. Each entry could be one name or two names. I want to be able to search this field such that if you search for "Danie" you get everything that has a first or last name starting with "Danie". Three example results would be "Danielle", "Steven Daniels", and "Danier Daniellson".
I would also like matches on the first name to be given preference.
So my two questions are: do I need to use a copyField and break the names up into first and last name? And what would my analyzer look like?
Edit: two additions to the search requirements:
1. Something like "Joe S" should return all users that look like "Joe S*"
2. If a user searches with an "&" character, it should be included in the search and not treated as an operator.
To solve the first part, I suggest the following:
Index your name field twice, into two fields that use different tokenizers.
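One possible pair of analyzers, sketched in Python so the matching behaviour can be checked without a running Solr. The hypothetical field `name_prefix` keeps the whole name as one lowercased token and indexes its prefixes (roughly what `solr.KeywordTokenizerFactory` followed by `solr.EdgeNGramFilterFactory` would do at index time), so only the start of the full name matches and "Joe S" works as a prefix. The hypothetical field `name_words` splits on whitespace first (like `solr.WhitespaceTokenizerFactory`) and indexes prefixes of every word, so last names match too. The field names, the gram limits and the additive toy score are assumptions for illustration; this is not how Lucene actually scores documents.

```python
NAMES = ["Danielle", "Steven Daniels", "Danier Daniellson",
         "Joe Smith", "Jack Daniels", "Steve"]


def edge_ngrams(text, min_gram=1, max_gram=25):
    """Leading substrings of text, like an edge n-gram filter with these limits."""
    return [text[:n] for n in range(min_gram, min(max_gram, len(text)) + 1)]


def analyze_whole(name):
    # Hypothetical field "name_prefix": the whole name is a single token,
    # so only prefixes of the full string (usually the first name) exist.
    return set(edge_ngrams(name.lower()))


def analyze_words(name):
    # Hypothetical field "name_words": split on whitespace, then index the
    # prefixes of every word, so a last name can match as well.
    tokens = set()
    for word in name.lower().split():
        tokens.update(edge_ngrams(word))
    return tokens


def score(name, query, prefix_boost=2.0):
    # Toy additive score: a hit in name_prefix counts prefix_boost,
    # a hit in name_words counts 1.0.
    q = query.lower()
    total = 0.0
    if q in analyze_whole(name):
        total += prefix_boost
    if q in analyze_words(name):
        total += 1.0
    return total


def search(query):
    """Return matching names, best score first (ties keep input order)."""
    scored = [(score(name, query), name) for name in NAMES]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [name for s, name in ranked if s > 0]


print(search("Danie"))
print(search("Joe S"))
```

Because `name_prefix` only matches at the beginning of the whole string, "Danielle" and "Danier Daniellson" rank above "Steven Daniels" and "Jack Daniels", which match only through `name_words`.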
You can find more about these tokenizers here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
After indexing the name into two fields with different tokenizers, use a boost to weight results from one of them (the one that gives preference to the first name), as explained here: http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_make_.22superman.22_in_the_title_field_score_higher_than_in_the_subject_field
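A sketch of the request side, assuming two hypothetical fields `name_prefix` (whole-name prefixes) and `name_words` (per-word prefixes) and a hypothetical Solr URL. The DisMax `qf` parameter lists the fields to search, and `^2.0` on `name_prefix` is the per-field boost described in the FAQ linked above, so first-name matches weigh twice as much. The boost value and the `name` field in `fl` are arbitrary assumptions to tune for your data.

```python
from urllib.parse import urlencode

# Hypothetical Solr select handler; adjust host, port and core name.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_name_query(user_input, first_name_boost=2.0):
    """Build a DisMax request URL that favours matches in the first-name field."""
    params = {
        "defType": "dismax",  # use the DisMax query parser
        "q": user_input,
        # Search both hypothetical fields; name_prefix matches are boosted.
        "qf": f"name_prefix^{first_name_boost} name_words",
        "fl": "name,score",  # hypothetical stored "name" field plus relevance score
    }
    # urlencode percent-encodes spaces, '^' and '&' so values survive the URL.
    return SOLR_SELECT + "?" + urlencode(params)


print(build_name_query("Danie"))
```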
For the second part (searching for "&"), either use the DisMax query parser http://wiki.apache.org/solr/DisMaxQParserPlugin or, when you make the request, send the "&" URL-encoded as %26 instead of a raw &.
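A small Python illustration of why a raw & breaks the request: in a query string, & separates parameters, so an unescaped & ends the `q` value early, while `urllib.parse.urlencode` sends it as %26. The search term "Jack & Jill" is made up for the example.

```python
from urllib.parse import urlencode, parse_qs

user_input = "Jack & Jill"  # made-up search term containing '&'

# Naive string building: the '&' is read as a parameter separator,
# so the q value ends at the '&'.
naive = "q=" + user_input

# Proper encoding: '&' becomes %26 and spaces become '+'.
encoded = urlencode({"q": user_input})

print(parse_qs(naive))    # q is cut off at the '&'
print(parse_qs(encoded))  # {'q': ['Jack & Jill']}
```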
Also you need to use a tokenizer that splits only on whitespace, such as solr.WhitespaceTokenizerFactory, so that characters like & are kept in the tokens.
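To see why the tokenizer choice matters, here is a Python approximation (not Solr code). Splitting only on whitespace keeps "&" as its own token or inside "AT&T", while a tokenizer that breaks on punctuation, approximated here by a `\w+` regex, discards it, so a search for "&" could never match. Real Solr tokenizers such as the StandardTokenizer follow more detailed rules than this regex.

```python
import re


def whitespace_tokens(text):
    """Split only on whitespace, like a whitespace tokenizer: '&' survives."""
    return text.split()


def word_char_tokens(text):
    """Keep only runs of word characters: a rough stand-in for tokenizers
    that break on punctuation, which drops '&' entirely."""
    return re.findall(r"\w+", text)


for name in ["Jack & Jill", "AT&T"]:
    print(name, whitespace_tokens(name), word_char_tokens(name))
```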