Need help with versioning/migrating data
I'm working on a project where I'll be using Membase (think Memcached + persistence) as our persistence layer with a multi-node cluster.
We're using the Enyim client to talk to the cache and we're using binary serialization to serialize/deserialize the objects to and from the cache.
One of our concerns is how to manage changes to our data model effectively. If we were working with a normal SQL database, we could run an update script to update our tables.
Using Membase and dealing with cached binary objects we COULD grab all the cached objects and load both binaries:
- version of code which was used to serialize the cached objects
- new version of code which defines different properties
and effectively migrate the data that way, but that's hardly desirable when we could potentially have tens of millions of objects in the cache. Ideally, we'd like to migrate data only when necessary, with some iterative process that migrates version 1 data to version 2, then to version 3, and so on, but I'm struggling to think of a way to do this with binary data.
Just a shot in the dark: has anyone had experience dealing with this kind of problem before? We're more than happy to use other forms of serialization, and could simply store string data (compressed, maybe) in the cache and handle the serialization ourselves.
Thanks,
Comments (1)
Consider a repair-on-read paradigm, where the new version of your library knows how to recognize V1 and V2 objects, picks the appropriate deserializer based on the version the object was stored as, and then reserializes V1 objects to V2 format after touching them.
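To make the idea concrete, here is a minimal Python sketch of repair on read. It is not Enyim or Membase code: the cache is stood in for by a plain dict, the payload is a version-tagged, zlib-compressed JSON envelope (one of the "handle the serialization ourselves" options mentioned in the question), and the names `encode`, `decode`, `v1_to_v2` and `get_with_repair`, as well as the `name` to `first_name`/`last_name` schema change, are all hypothetical.

```python
import json
import zlib

CURRENT_VERSION = 2


def encode(version, fields):
    """Wrap the fields in a version-tagged, compressed JSON envelope."""
    envelope = {"v": version, "data": fields}
    return zlib.compress(json.dumps(envelope).encode("utf-8"))


def decode(blob):
    """Return (version, fields) from an envelope produced by encode()."""
    envelope = json.loads(zlib.decompress(blob).decode("utf-8"))
    return envelope["v"], envelope["data"]


def v1_to_v2(fields):
    # Hypothetical schema change: V1 stored a single 'name',
    # V2 stores 'first_name' and 'last_name'.
    first, _, last = fields["name"].partition(" ")
    upgraded = {k: v for k, v in fields.items() if k != "name"}
    upgraded["first_name"] = first
    upgraded["last_name"] = last
    return upgraded


# One upgrade function per source version; applied in order.
UPGRADES = {1: v1_to_v2}


def upgrade(version, fields):
    """Apply upgrade steps one version at a time up to CURRENT_VERSION."""
    while version < CURRENT_VERSION:
        fields = UPGRADES[version](fields)
        version += 1
    return fields


def get_with_repair(cache, key):
    """Read a value; if it is stored in an old format, upgrade and write it back."""
    blob = cache.get(key)
    if blob is None:
        return None
    version, fields = decode(blob)
    if version < CURRENT_VERSION:
        fields = upgrade(version, fields)
        cache[key] = encode(CURRENT_VERSION, fields)  # the "repair" step
    return fields
```

In a real multi-node cluster the write-back would probably want a compare-and-swap rather than a blind set, so that a concurrent update is not overwritten by the repaired copy.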
That way there's no need to batch-update all of your objects, but you will eventually migrate every object to V2 format. If needed, you can also run a background process that slowly picks up V1 objects and converts them to V2, so that the repair-on-read algorithm doesn't eventually have to deal with everything from V1 through Vn.
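The background process could look roughly like the sketch below. It rests on several assumptions that are not in the original answer: values are stored as JSON strings with a `v` version field, the application keeps its own list of keys to sweep (memcached-style stores generally do not offer key enumeration), the cache is modelled as a dict, and the `v1_to_v2`/`v2_to_v3` steps and field names are invented for illustration.

```python
import json

CURRENT_VERSION = 3


def v1_to_v2(data):
    # Hypothetical change: V2 added an optional 'email' field.
    data = dict(data)
    data.setdefault("email", None)
    return data


def v2_to_v3(data):
    # Hypothetical change: V3 renamed 'labels' to 'tags'.
    data = dict(data)
    data["tags"] = data.pop("labels", [])
    return data


# Each step upgrades exactly one version, so V1 -> V3 is just V1 -> V2 -> V3.
STEPS = {1: v1_to_v2, 2: v2_to_v3}


def migrate_record(raw):
    """Return (new_raw, changed) for one stored JSON string."""
    record = json.loads(raw)
    version, data = record["v"], record["data"]
    if version >= CURRENT_VERSION:
        return raw, False
    while version < CURRENT_VERSION:
        data = STEPS[version](data)
        version += 1
    return json.dumps({"v": version, "data": data}), True


def sweep(cache, keys, batch_size=100, pause=lambda: None):
    """Upgrade old records in small batches; returns how many were rewritten.

    'pause' is called after each batch so the sweep can be throttled
    (for example with a sleep) instead of hammering the cluster.
    """
    migrated = 0
    for start in range(0, len(keys), batch_size):
        for key in keys[start:start + batch_size]:
            raw = cache.get(key)
            if raw is None:  # expired or deleted since the key was listed
                continue
            new_raw, changed = migrate_record(raw)
            if changed:
                cache[key] = new_raw
                migrated += 1
        pause()
    return migrated
```

Because each step only knows about its immediate predecessor, adding a V4 later means writing one new function rather than touching the existing ones, and once a sweep reports zero rewritten records the oldest steps can be retired.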