Why does mapping numeric data to keyword improve retrieval time in Elasticsearch?

Posted on 2025-01-15 18:38:00

I'm coming from a long-term SQL background -- NoSQL (and ElasticSearch) is very new to me.

An engineer on my team is constructing a new index for document storage, and they have mapped all short/int/long values to strings for use in term queries.

This surprised me, as a SQL index with a SmallInt/Int/BigInt key will perform much better than the same set of values converted to VarChar(X) and indexed accordingly.

I was pointed to this article: https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html

Which has this comment:

Consider mapping a numeric identifier as a keyword if:

  • You don’t plan to search for the identifier data using range queries.
  • Fast retrieval is important. term query searches on keyword fields are often faster than term searches on numeric fields.

I'm happy to take this at face value, but I don't understand why this is.

Assuming an exact-match query (e.g. ID = 100), can anyone speak to the mechanics of Elasticsearch (or NoSQL in general) that would explain why a query against a stringified numeric value is faster than a query against the numeric value directly?

Comments (1)

流绪微梦 2025-01-22 18:38:00

Basically, keywords are stored in the inverted index, where lookups are very fast, which makes keyword the ideal type for term/terms queries (i.e. exact match).

Numeric values, however, are stored in BKD trees (since ES 5/Lucene 6; see https://www.elastic.co/blog/elasticsearch-5-0-0-released#data-structs), which suit numeric values better than the inverted index does and are also optimized for range-like queries.

The downside is that searching for an exact numerical value within a BKD tree is less performant than looking up the term in the inverted index.
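
To make the difference concrete, here is a rough toy model in Python. It is not how Lucene is implemented: the dictionary stands in for the terms dictionary plus postings lists, the sorted list of (value, doc ID) pairs stands in for a one-dimensional BKD tree, and the sample data and all function names are made up for illustration. The point it tries to show is that an exact match on a numeric field is handled as the degenerate range [v, v], and that matches come back ordered by value, so the doc IDs still have to be collected and ordered afterwards.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical toy data: doc ID -> numeric identifier.
docs = {0: 100, 1: 42, 2: 100, 3: 7, 4: 42, 5: 100}


def build_inverted_index(docs):
    """Map each term (the stringified value) to a list of doc IDs in doc ID order."""
    postings = defaultdict(list)
    for doc_id in sorted(docs):
        postings[str(docs[doc_id])].append(doc_id)
    return dict(postings)


def term_lookup(index, term):
    """Exact match: one dictionary lookup; the postings are already in doc ID order."""
    return index.get(term, [])


def build_points(docs):
    """Toy stand-in for a 1-D BKD tree: (value, doc ID) pairs sorted by value."""
    return sorted((value, doc_id) for doc_id, value in docs.items())


def range_lookup(points, low, high):
    """Find all doc IDs with low <= value <= high.

    Matches are found in value order, so the doc IDs are sorted at the end.
    """
    start = bisect_left(points, (low, -1))
    end = bisect_right(points, (high, float("inf")))
    return sorted(doc_id for _, doc_id in points[start:end])


def exact_numeric_lookup(points, value):
    """Exact match on a numeric field, expressed as the range [value, value]."""
    return range_lookup(points, value, value)


index = build_inverted_index(docs)
points = build_points(docs)
print(term_lookup(index, "100"))
print(exact_numeric_lookup(points, 100))
print(range_lookup(points, 7, 42))
```

Both lookups return the same documents for ID = 100; the toy only illustrates that the keyword path is a direct term lookup while the numeric path goes through range machinery. It says nothing about real-world timings.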

So the takeaway is this: if your IDs are numeric and you plan on querying them in ranges, map them with a numeric type such as integer. But if you plan on matching your IDs in a term/exact-match fashion, store them as strings with the keyword type.
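
As a minimal sketch of what that looks like in practice, the request bodies below are written as plain Python dictionaries so they can be inspected without a running cluster. The field names `customer_id` and `quantity` and the helper functions are hypothetical; only the mapping types (`keyword`, `integer`) and the `term` and `range` query shapes come from Elasticsearch itself.

```python
import json

# Hypothetical index mapping: an identifier that is only matched exactly
# is mapped as keyword, while a value queried by range stays numeric.
mapping = {
    "mappings": {
        "properties": {
            "customer_id": {"type": "keyword"},
            "quantity": {"type": "integer"},
        }
    }
}


def term_query(field, value):
    """Build an exact-match term query; the value is sent as a string."""
    return {"query": {"term": {field: {"value": str(value)}}}}


def range_query(field, gte, lte):
    """Build an inclusive range query for a numeric field."""
    return {"query": {"range": {field: {"gte": gte, "lte": lte}}}}


print(json.dumps(mapping, indent=2))
print(json.dumps(term_query("customer_id", 100), indent=2))
print(json.dumps(range_query("quantity", 1, 10), indent=2))
```

If you are unsure which access pattern will dominate, the numeric field types documentation also mentions using a multi-field to map the same data as both a keyword and a numeric type.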
