Multilingual queries in ElasticSearch
Let's say we have the following mapping in ElasticSearch.
{
  "content": {
    "properties": {
      "id": {
        "type": "string",
        "index": "not_analyzed",
        "store": "yes"
      },
      "locale_container": {
        "type": "object",
        "properties": {
          "english": {
            "type": "object",
            "properties": {
              "title": {
                "type": "string",
                "index_analyzer": "english",
                "search_analyzer": "english",
                "index": "analyzed",
                "term_vector": "with_positions_offsets",
                "store": "yes"
              },
              "text": {
                "type": "string",
                "index_analyzer": "english",
                "search_analyzer": "english",
                "index": "analyzed",
                "term_vector": "with_positions_offsets",
                "store": "yes"
              }
            }
          },
          "german": {
            "type": "object",
            "properties": {
              "title": {
                "type": "string",
                "index_analyzer": "german",
                "search_analyzer": "german",
                "index": "analyzed",
                "term_vector": "with_positions_offsets",
                "store": "yes"
              },
              "text": {
                "type": "string",
                "index_analyzer": "german",
                "search_analyzer": "german",
                "index": "analyzed",
                "term_vector": "with_positions_offsets",
                "store": "yes"
              }
            }
          },
          "russian": {
            "type": "object",
            "properties": {
              "title": {
                "type": "string",
                "index_analyzer": "russian",
                "search_analyzer": "russian",
                "index": "analyzed",
                "term_vector": "with_positions_offsets",
                "store": "yes"
              },
              "text": {
                "type": "string",
                "index_analyzer": "russian",
                "search_analyzer": "russian",
                "index": "analyzed",
                "term_vector": "with_positions_offsets",
                "store": "yes"
              }
            }
          },
          "italian": {
            "type": "object",
            "properties": {
              "title": {
                "type": "string",
                "index_analyzer": "italian",
                "search_analyzer": "italian",
                "index": "analyzed",
                "term_vector": "with_positions_offsets",
                "store": "yes"
              },
              "text": {
                "type": "string",
                "index_analyzer": "italian",
                "search_analyzer": "italian",
                "index": "analyzed",
                "term_vector": "with_positions_offsets",
                "store": "yes"
              }
            }
          }
        }
      }
    }
  }
}
When a particular user queries the index, we can take her culture from her settings, i.e. we know which analyzer to use. How can we formulate a query that searches only the "title" and "text" fields in her own language (let's say, German) and uses the German analyzer to tokenize the search query?
Comments (1)
I've simplified the example to use the standard analyzer for 'English' and the simple analyzer (no stopping) for 'French'. For a document like this:

The following queries do the trick:
locale_container.english.title:abc -> returns the document
locale_container.french.title:def -> returns the document as well
locale_container.english.title:to -> doesn't return anything, since 'to' is a stopword
locale_container.french.title:to -> returns the document
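
The field-qualified queries above generalize to the original mapping: once the user's language is known, the request only needs to name that language's fields. Below is a minimal sketch, in Python, of building such a request body. It assumes a query_string query with a "fields" list, and that ElasticSearch applies each named field's own search_analyzer when no analyzer is given in the query (which matches the stopword behavior shown above). The function name build_locale_query and the SUPPORTED_LANGUAGES set are made up for illustration. The code only builds the JSON body and does not talk to a cluster.

```python
import json

# Languages that have a sub-object under locale_container in the question's mapping.
SUPPORTED_LANGUAGES = {"english", "german", "russian", "italian"}


def build_locale_query(user_text, language):
    """Build a query_string request body restricted to one language's fields.

    Hypothetical helper: it only assembles a dict. Because the query names
    fields such as locale_container.german.title, the analyzer configured on
    those fields is expected to be used for the search text.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError("unsupported language: %s" % language)
    prefix = "locale_container.%s" % language
    return {
        "query": {
            "query_string": {
                "query": user_text,
                # Search only the title and text fields of the chosen language.
                "fields": [prefix + ".title", prefix + ".text"],
            }
        }
    }


if __name__ == "__main__":
    # Example: a user whose settings say German.
    print(json.dumps(build_locale_query("haus", "german"), indent=2))
```

The resulting body can be sent to the index's _search endpoint with whatever client you already use. Validating the language against a fixed set keeps user settings from producing arbitrary field names.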