How should I think about search engine indexes?
I am using Elasticsearch and do not understand exactly what an index is. For example, if I have 3 models (a backpack, a shoe, and a glove), do I put each model in its own index, or do I index the attributes of each model, i.e. a shoe's laces, its sole, and so on?
I am also trying to understand whether searching across indices is slow. For example, if I index each attribute of my models and end up with, say, 20 indices, is a search that has to look at data in all of them slower than searching a single index that stores the same 20 attributes?
Comments (1)
In Elasticsearch, an index consists of one or more primary shards, where each shard is a Lucene instance. Each primary shard can have zero or more replicas, which give you high availability and increased search performance.
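To make the primary/replica relationship concrete, here is a minimal sketch that builds the settings portion of a create-index request body and counts how many shard copies result. `number_of_shards` and `number_of_replicas` are the Elasticsearch index settings; the helper names `index_settings` and `total_shard_copies` are made up for this illustration, and nothing here talks to a cluster.

```python
def index_settings(primaries: int, replicas: int) -> dict:
    """Build the settings part of a create-index request body."""
    if primaries < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the replica count cannot be negative")
    return {
        "settings": {
            "number_of_shards": primaries,     # primary shards in the index
            "number_of_replicas": replicas,    # replica copies per primary
        }
    }


def total_shard_copies(primaries: int, replicas: int) -> int:
    """Every primary exists once, plus `replicas` extra copies of it."""
    return primaries * (1 + replicas)


body = index_settings(primaries=5, replicas=1)
copies = total_shard_copies(primaries=5, replicas=1)
print(body)
print("shard copies in the cluster:", copies)
```

With 5 primaries and 1 replica each, the cluster holds 10 shard copies for that one index, which is why shard counts add up quickly.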
A single shard can hold a lot of data. However, with multiple shards it is easier to distribute the workload across multiple processors and multiple servers.
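As a rough illustration of how documents end up spread over several shards: Elasticsearch picks a shard by hashing a routing value (the document ID by default) and reducing it modulo the number of primary shards. The sketch below imitates that idea with `zlib.crc32` as a stand-in hash, which is an assumption for illustration and not the hash Elasticsearch actually uses; `pick_shard` and the document IDs are hypothetical.

```python
import zlib
from collections import Counter


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key to a shard number in [0, num_primary_shards)."""
    # crc32 is only a stand-in hash for this sketch.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


# Hypothetical document IDs, routed across 5 primary shards.
doc_ids = [f"doc-{i}" for i in range(1000)]
per_shard = Counter(pick_shard(doc_id, 5) for doc_id in doc_ids)

for shard, count in sorted(per_shard.items()):
    print(f"shard {shard}: {count} documents")
```

Because the same key always maps to the same shard, each shard can be indexed and searched independently, and the shards can live on different servers.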
That said, you need a balance. The right number of shards depends on your data and context. Shards aren't free, because each one consumes resources. So while having thousands of shards can be useful on a 100-node cluster, you don't want that many on a single node.
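A back-of-the-envelope calculation shows why the same shard count can be fine on a large cluster and a problem on one machine. The figure of 3000 shards is an arbitrary stand-in for "thousands", and `shards_per_node` is a hypothetical helper that assumes shards are balanced evenly.

```python
import math


def shards_per_node(total_shards: int, nodes: int) -> int:
    """Shards the busiest node holds if shards are spread evenly."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return math.ceil(total_shards / nodes)


big_cluster = shards_per_node(total_shards=3000, nodes=100)
single_node = shards_per_node(total_shards=3000, nodes=1)
print("per node on 100 nodes:", big_cluster)
print("per node on 1 node:", single_node)
```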
In Elasticsearch, as well as indices, you have the concept of types. Think of an index as being like a database, and a type as being like a table. (This applies to older releases: mapping types were deprecated in Elasticsearch 6.x and removed in later versions.)
Using different types adds no overhead, and it fits your example better than separate indices would.
You can still search across all types (or a selected list of types), across all indices (or a selected list of indices), or across any combination of the two.
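The combinations above map onto the search URL path used by older Elasticsearch versions that support types: comma-separated index names, then comma-separated type names, then `_search`, with `_all` standing for every index. The sketch below only builds the path string; `search_path` is a hypothetical helper, and the index and type names are taken from the question's example.

```python
from typing import Sequence


def search_path(indices: Sequence[str] = (), types: Sequence[str] = ()) -> str:
    """Build a search path of the form /<indices>/<types>/_search."""
    index_part = ",".join(indices)
    type_part = ",".join(types)
    if type_part and not index_part:
        # A type segment must follow an index segment; _all means every index.
        index_part = "_all"
    segments = [part for part in (index_part, type_part) if part]
    return "/" + "/".join(segments + ["_search"])


print(search_path())                                # every index, every type
print(search_path(["products"]))                    # one index, every type
print(search_path(["products"], ["shoe", "glove"])) # one index, two types
print(search_path(types=["shoe"]))                  # one type in every index
```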
Each type can have its own fields (like the columns in a table).
So in your example, I'd have one index containing 3 types, each with its own fields. Start with the default number of primary shards (5 in the versions this answer describes; Elasticsearch 7.0 and later default to 1) and the default number of replicas (1), and change them only once you understand your data better.
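As a sketch of what that single index might look like in an older Elasticsearch version that still supports multiple types, the following builds a create-index request body with one mapping per type. The index name `products` and every field name are assumptions invented for this example, chosen to echo the question's backpack, shoe, and glove; only the `settings`, `mappings`, and `properties` keys and the field types come from Elasticsearch itself.

```python
import json


def products_index_body() -> dict:
    """Request body for one index holding three types with their own fields."""
    return {
        "settings": {"number_of_shards": 5, "number_of_replicas": 1},
        "mappings": {
            # One entry per type; all field names below are hypothetical.
            "backpack": {
                "properties": {
                    "capacity_litres": {"type": "integer"},
                    "has_laptop_sleeve": {"type": "boolean"},
                }
            },
            "shoe": {
                "properties": {
                    "lace_length_cm": {"type": "float"},
                    "sole_thickness_mm": {"type": "float"},
                }
            },
            "glove": {
                "properties": {
                    "insulated": {"type": "boolean"},
                    "size": {"type": "integer"},
                }
            },
        },
    }


index_body = products_index_body()
# This JSON would be sent when creating the hypothetical "products" index.
print(json.dumps(index_body, indent=2))
```

Attributes such as laces and soles become fields of the `shoe` type here rather than indices of their own, which is the layout the answer recommends.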
Note: don't confuse an index in Elasticsearch with an index in a database.