Why does mapping numeric data to keyword reduce retrieval time in ElasticSearch?
I'm coming from a long-term SQL background -- NoSQL (and ElasticSearch) is very new to me.
An engineer on my team is constructing a new index for document storage, and they have mapped all short/int/long values to strings for use in term queries.
This surprised me, as a SQL index with a SmallInt/Int/BigInt key will perform much better than that same set of values turned into a VarChar(X) and indexed accordingly.
I was pointed to this article: https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html
Which has this comment:
Consider mapping a numeric identifier as a keyword if:
- You don’t plan to search for the identifier data using range queries.
- Fast retrieval is important. term query searches on keyword fields are often faster than term searches on numeric fields.
I'm happy to take this at face value, but I don't understand why this is.
Assuming an exact-match query (e.g. ID = 100), can anyone speak to the mechanics of ElasticSearch (or NoSQL in general) that would explain why a query against a stringified numeric value is faster than a query against the numeric value directly?
Comments (1)
Basically, keywords are stored in the inverted index, where lookups are very fast, which makes keyword the ideal type for term/s queries (i.e. exact match).
Numeric values, however, are stored in BKD trees (since ES 5/Lucene 6), which are more optimal than the inverted index for numeric values (https://www.elastic.co/blog/elasticsearch-5-0-0-released#data-structs) and are also optimized for range-like queries. The downside is that searching for an exact numerical value within a BKD tree is less performant than looking up the term in the inverted index.
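To make the contrast a bit more concrete, here is a toy model in Python. It is only an illustration under loud assumptions: the plain dict stands in for the inverted index and the sorted list stands in for a BKD tree, and neither resembles how Lucene actually implements them. The document data and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical documents: doc id -> numeric identifier.
docs = {0: 100, 1: 250, 2: 100, 3: 999}

# Keyword-style: map each term (the ID as a string) to its posting list.
inverted = defaultdict(list)
for doc_id, value in docs.items():
    inverted[str(value)].append(doc_id)


def term_lookup(term):
    """Exact match: one dictionary lookup returns the matching doc ids."""
    return inverted.get(term, [])


# Numeric-style: (value, doc_id) pairs sorted by value, a stand-in for a tree
# that is organized around value ranges rather than individual terms.
points = sorted((value, doc_id) for doc_id, value in docs.items())
values = [value for value, _ in points]


def range_lookup(low, high):
    """Range match: locate both ends of the range, then collect the slice."""
    start = bisect_left(values, low)
    end = bisect_right(values, high)
    return sorted(doc_id for _, doc_id in points[start:end])
```

In this sketch an exact match on the numeric side has to be expressed as the degenerate range `range_lookup(100, 100)`, while the keyword side answers it with a single lookup, `term_lookup("100")`. That mirrors the shape of the trade-off described above, not its real cost.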
So the takeaway is that if your IDs are numeric and you plan on querying them in ranges, map them with a numeric type like integer, etc. But if you plan on matching your IDs in a term/exact-like fashion, store them as strings with the keyword type.
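As a rough sketch of what that advice could look like in practice, the Python snippet below builds an index mapping and two request bodies as plain dictionaries. The field names customer_id and order_total and the helper functions are hypothetical, chosen only for illustration; the bodies follow the standard Elasticsearch mapping and query DSL shapes, and no client call is made here.

```python
import json

# Hypothetical index layout: customer_id is only ever matched exactly,
# order_total is queried by range.
mapping = {
    "mappings": {
        "properties": {
            "customer_id": {"type": "keyword"},
            "order_total": {"type": "integer"},
        }
    }
}


def exact_id_query(customer_id):
    """Build a term query; the ID is sent as a string to match the keyword field."""
    return {"query": {"term": {"customer_id": str(customer_id)}}}


def total_range_query(low, high):
    """Build an inclusive range query against the numeric field."""
    return {"query": {"range": {"order_total": {"gte": low, "lte": high}}}}


if __name__ == "__main__":
    print(json.dumps(mapping, indent=2))
    print(json.dumps(exact_id_query(100)))
    print(json.dumps(total_range_query(10, 50)))
```

The mapping body would be sent when creating the index and the query bodies to the search endpoint. Converting the ID with str() in one place keeps the application free to hold it as an integer everywhere else.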