How do index updates work in Solr and Elasticsearch?
I have an application that uses the Event Sourcing and Command Query Responsibility Segregation (CQRS) patterns. Development of the command part is complete, and I have to decide how to implement the query part.
My system deals with customer orders, so when an event arrives for an order, that order is processed with its orderId and order payload. The thing is, in this form the only way to query orders is by orderId, so I can't ask a question like "give me all the orders in the system with status OPEN".
That is what the query part is for. My candidate technologies for it are a classical solution like PostgreSQL or, more elegant in my opinion, Solr/Elasticsearch.
I have basic knowledge of and experience with Solr/Elasticsearch, and I want to use this opportunity to learn more, but here comes my dilemma. Another department in our company already works with Elasticsearch, and a colleague from that department told me that updates in Elasticsearch are not a good idea. I didn't quite understand his argument, so I'd like to describe what I am planning to do, so you can tell me whether it is a bad idea or whether Solr is better suited for it.
I am planning to send every status change of an order as an update to Elasticsearch, so it will look like the following.
id | Status | Customer | Items
---|---|---|---
orderId1 | order.SUBMITTED | order.Customer | order.Items
orderId1 | order.CHANGED | order.Customer1 | order.Items
orderId1 | order.PROCESSING | order.Customer1 | order.Items
orderId1 | order.ON_DELIVERY | order.Customer1 | order.Items
orderId1 | order.COMPLETE | order.Customer1 | order.Items
As you can see, I have to send several updates for the same orderId to Elasticsearch/Solr.
My colleague told me that indexed documents in Elasticsearch are immutable: when I send the order.SUBMITTED event to be indexed, it will create a document, but the order.CHANGED event will not update that document; it will create another one. I can't quite judge the consequences of this, either for my business case (I would ask for the orders of my Customer1 and see both the SUBMITTED and the CHANGED status, 2 records in the query response) or operationally (additional load and storage).
Did I understand the behaviour of Elasticsearch correctly? If yes, will Solr behave any differently?
If I understood correctly and both behave the same, can I design anything differently to help reach my goals?
Finally, I have no problem using PostgreSQL for this solution; I just thought Elasticsearch or Solr would be a more natural choice for this problem. What do you think?
Thanks for your answers.
Comments (1)
Your colleague is partially correct about updates in Elasticsearch (ES) being costly and documents being immutable, but that doesn't mean ES is unsuitable for systems with frequent updates. In fact, due to its scalability and distributed nature, it is a preferred choice and is used in high-throughput, low-latency systems (including search systems).
You have a few misconceptions, and I will try to explain them.
Suppose you index an order with status SUBMITTED and later update it to CHANGED. Even though the stored document is immutable, when you query the order status you get only the latest status (once a refresh has happened on the index; the default refresh interval in ES is 1 second). On an update, ES marks the old document as deleted (a soft delete that sets a deleted flag), so these soft-deleted documents are not returned by your searches. Permanent deletion of old documents happens later, during the merge process, as explained in the next paragraph.
The old version of the order, with status SUBMITTED, is removed from the index during the merge process, so old documents are physically deleted and your index size doesn't keep growing.
It is also very important to understand that this immutability provides a huge benefit for search/read performance: segments (which contain the documents in ES) can be used safely in a multi-threaded environment and can be cached, because they never change.
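
To make the behaviour described above concrete, here is a minimal toy model in Python. It is only a sketch under simplifying assumptions: `ToyIndex` and `Segment` are hypothetical names invented for illustration, and this is not how Elasticsearch or Lucene is actually implemented. It only mimics the three ideas from the answer: segments are never rewritten, an update soft-deletes the old copy, and a merge physically drops soft-deleted copies.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy segment: the stored docs never change, only the deleted markers do."""
    docs: tuple                                   # ((doc_id, source), ...)
    deleted: set = field(default_factory=set)     # positions marked as deleted


class ToyIndex:
    """Hypothetical, heavily simplified model of a segment-based index."""

    def __init__(self):
        self.segments = []
        self.buffer = []          # indexed, but not searchable until refresh()

    def index(self, doc_id, source):
        # Indexing with an existing id is an update; nothing is modified in place.
        self.buffer.append((doc_id, dict(source)))

    def refresh(self):
        """Make buffered documents searchable by writing a new segment."""
        if not self.buffer:
            return
        latest = {}
        for doc_id, source in self.buffer:
            latest[doc_id] = source               # last write per id wins
        # Soft-delete older copies of the same ids in existing segments.
        for seg in self.segments:
            for pos, (doc_id, _) in enumerate(seg.docs):
                if doc_id in latest:
                    seg.deleted.add(pos)
        self.segments.append(Segment(docs=tuple(latest.items())))
        self.buffer = []

    def search(self, **criteria):
        """Return live documents whose fields equal all given criteria."""
        hits = []
        for seg in self.segments:
            for pos, (doc_id, source) in enumerate(seg.docs):
                if pos in seg.deleted:
                    continue                      # soft-deleted copies are skipped
                if all(source.get(k) == v for k, v in criteria.items()):
                    hits.append((doc_id, source))
        return hits

    def merge(self):
        """Rewrite all segments into one, dropping soft-deleted copies."""
        live = tuple(
            doc
            for seg in self.segments
            for pos, doc in enumerate(seg.docs)
            if pos not in seg.deleted
        )
        self.segments = [Segment(docs=live)] if live else []

    def stored_docs(self):
        """Number of document copies physically held, deleted or not."""
        return sum(len(seg.docs) for seg in self.segments)


idx = ToyIndex()
for status in ["SUBMITTED", "CHANGED", "PROCESSING"]:
    idx.index("orderId1", {"customer": "Customer1", "status": status})
    idx.refresh()

hits = idx.search(customer="Customer1")
stored_before = idx.stored_docs()
idx.merge()
stored_after = idx.stored_docs()
print(hits, stored_before, stored_after)
```

In this toy run, the search for Customer1 returns a single hit with status PROCESSING even though three copies are physically stored before the merge; after the merge only one copy remains. The key assumption is that every status change is indexed under the same document id (here the orderId); if each event were indexed under a different id, each one would be a separate document and the query would return all of them.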