How to bulk upload relational data into RavenDB and convert it into aggregates?
I'm trying to get my head around how to do efficient bulk inserts of relational data into RavenDB, particularly when converting from relational data to aggregates.
Let's say we have two dump files of two tables: Orders and OrderItems. They're too big to load into memory, so I read them as streams. I can read through each table and create a document in RavenDB corresponding to each row. I can do this as bulk operations using batched requests. Easy and efficient so far.
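The streaming-and-batching step described above can be sketched in Python as follows. This is an illustration under assumptions: the dump is a CSV file with a header row, and `batch_size` and the `send_batch` helper mentioned in the comment are hypothetical. The actual request to RavenDB is left out because the question does not say which client is used.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, TextIO


def read_in_batches(dump: TextIO, batch_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield rows of a CSV dump in lists of at most batch_size rows.

    Only the current batch is held in memory, so the dump file itself
    can be much larger than available RAM.
    """
    reader = csv.DictReader(dump)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


# Usage with an in-memory stand-in for the Orders dump file.
orders_dump = io.StringIO("Id,Customer\n1,alice\n2,bob\n3,carol\n")
for batch in read_in_batches(orders_dump, batch_size=2):
    # Hypothetical: a send_batch(batch) helper would issue one batched
    # request per iteration here instead of printing.
    print([row["Id"] for row in batch])
```

Each loop iteration corresponds to one batched request, so the number of round trips is roughly the row count divided by `batch_size`.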
Then I want to transform this on the server, getting rid of the OrderItems and integrating them into their parent Order documents. How can I do this without thousands of roundtrips?
The answer seems to lie somewhere between set-based updates, live projections and denormalized updates, but I don't know where.
You're going to need to do this with denormalised updates and set-based updates. Take a look at the PATCH API to see what it offers. You only need the set-based updates if you plan on updating several docs at once; otherwise you can patch a known doc directly using the PATCH API.
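One way to keep the number of round trips low with denormalised updates is to group items by their parent before patching, so each Order receives one "append these items" patch rather than one patch per item. The sketch below only builds those groups locally. The `orders/<OrderId>` id convention and the `Product`/`Qty` fields are assumptions, and no RavenDB client call is shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_items_by_order(items: Iterable[Dict[str, str]]) -> Dict[str, List[Dict[str, object]]]:
    """Collect OrderItem rows under their parent Order document id.

    Each key corresponds to one patch ("append these items to Items"),
    so the number of patches equals the number of distinct orders,
    not the number of items.
    """
    patches: Dict[str, List[Dict[str, object]]] = defaultdict(list)
    for item in items:
        # Hypothetical id convention: "orders/<OrderId>".
        patches["orders/" + item["OrderId"]].append(
            {"Product": item["Product"], "Qty": int(item["Qty"])}
        )
    return dict(patches)


items = [
    {"OrderId": "1", "Product": "pen", "Qty": "2"},
    {"OrderId": "2", "Product": "ink", "Qty": "1"},
    {"OrderId": "1", "Product": "pad", "Qty": "5"},
]
print(group_items_by_order(items))
```

This groups within one batch of items; orders whose items are spread across several batches would simply receive more than one append.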
Live projections only help when you are fetching the results of a query/index; they don't change the docs themselves, only what is returned from the server to the client.
However, I'd recommend that, if possible, you combine each Order with its corresponding OrderItems in memory before you send them to RavenDB. You could still stream the data from the dump files, just use some caching if needed. This will be the simplest option.
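If both dumps can be exported sorted by numeric order id (for example with an ORDER BY in the export query; this is an assumption about the dumps), the in-memory combination can be done as a streaming merge join, which needs no cache beyond the current order. This is a swapped-in technique rather than the caching the answer mentions; unsorted dumps would still need caching or an external sort first. The field names `Id`, `OrderId` and `Items` are assumptions.

```python
import itertools
from typing import Dict, Iterable, Iterator, List


def merge_orders(orders: Iterable[Dict[str, str]],
                 items: Iterable[Dict[str, str]]) -> Iterator[Dict[str, object]]:
    """Yield one Order aggregate per order row, with its OrderItems embedded.

    Assumes orders are sorted by numeric "Id" and items by numeric "OrderId",
    so only the current order and its items are in memory at once.
    """
    groups = itertools.groupby(items, key=lambda row: int(row["OrderId"]))
    pending = next(groups, None)  # (order_id, items iterator), or None when exhausted
    for order in orders:
        order_id = int(order["Id"])
        # Skip item groups whose parent order never appeared (orphans).
        while pending is not None and pending[0] < order_id:
            pending = next(groups, None)
        embedded: List[Dict[str, str]] = []
        if pending is not None and pending[0] == order_id:
            embedded = list(pending[1])
            pending = next(groups, None)
        yield {**order, "Items": embedded}


orders = [{"Id": "1", "Customer": "alice"},
          {"Id": "2", "Customer": "bob"},
          {"Id": "10", "Customer": "carol"}]
items = [{"OrderId": "1", "Product": "pen"},
         {"OrderId": "1", "Product": "pad"},
         {"OrderId": "10", "Product": "ink"}]
for doc in merge_orders(orders, items):
    # Hypothetical: these aggregates would be written to RavenDB in batches.
    print(doc["Id"], [i["Product"] for i in doc["Items"]])
```

Comparing ids as integers matters here: as strings, "10" would sort before "2" and the merge would skip items.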
Updated
I've made some sample code that shows how to do this. This patches the Comments array/list within a particular Post doc, in this case "Posts/1".
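As an illustration of the effect of such a patch (not the answer's original sample code), the following local model shows what an "add to array" patch does to a document like "Posts/1": it appends one element to the Comments array and leaves the rest of the document untouched. `apply_add_patch` is a hypothetical helper, not part of any RavenDB client; the point of using the PATCH API is that this change is applied on the server, so the client does not have to load the document first.

```python
import copy
from typing import Any, Dict


def apply_add_patch(doc: Dict[str, Any], array_name: str, value: Any) -> Dict[str, Any]:
    """Return a copy of doc with value appended to doc[array_name].

    Local model only: creates the array if it is missing and does not
    modify the input document.
    """
    patched = copy.deepcopy(doc)
    patched.setdefault(array_name, []).append(value)
    return patched


post = {"Id": "Posts/1", "Title": "Hello",
        "Comments": [{"Author": "alice", "Text": "First"}]}
patched = apply_add_patch(post, "Comments", {"Author": "bob", "Text": "Second"})
print(patched["Comments"])
```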