版本化和索引的数据存储
我需要以易于索引的方式存储实体的所有版本,并且想知道是否有人输入了要使用的系统。
如果没有版本控制,系统只是一个关系数据库,例如每个人一行。如果该人的状态发生变化,该行就会发生变化以反映这一点。通过版本控制,条目应该以这样的方式更新,以便我们始终可以返回到以前的版本。如果我可以使用时态数据库,这将是免费的,我将能够询问“截至昨天下午 2 点居住在都柏林、年龄为 30 岁的所有人的状况如何”。不幸的是,似乎没有任何成熟的开源项目可以做到临时性。
执行此操作的一种非常糟糕的方法是在每次状态更改时插入一个新行。这会导致重复,因为一个人可以有很多字段,但每次更新只能更改一个字段。然后,为给定时间戳的每个人选择正确的版本也相当慢。
理论上,应该可以使用关系数据库和版本控制系统来模拟时态数据库,但这听起来相当可怕。
所以我想知道是否有人以前遇到过类似的事情以及他们是如何处理的?
更新 正如 Aaron 所建议的,这是我们当前使用的查询(在 mysql 中)。在我们有超过 200k 行的表上,速度肯定很慢。 (id = 表键,person_id = 每个人的 id,如果该人有很多修订,则重复)
从人员 p 中选择姓名,其中 p.id =(从人员中选择 max(id),其中 person_id = p.person_id 和时间戳 <= :timestamp)
更新 看起来最好的方法是使用临时数据库,但考虑到没有任何开源数据库,下一个最佳方法是每次更新存储一个新行。唯一的问题是重复未更改的列和缓慢的查询。
I have a requirement to store all versions of an entity in a easily indexed way and was wondering if anyone has input on what system to use.
Without versioning the system is simply a relational database with a row per, for example, person. If the person's state changes that row is changed to reflect this. With versioning the entry should be updated in such a way so that we can always go back to a previous version. If I could use a temporal database this would be free and I would be able to ask 'what is the state of all people as of yesterday at 2pm living in Dublin and aged 30'. Unfortunately there doesn't seem to be any mature open source projects that can do temporal.
A really nasty way to do this is just to insert a new row per state change. This leads to duplication, as a person can have many fields but only one changing per update. It is also then quite slow to select the correct version for every person given a timestamp.
In theory it should be possible to use a relational database and a version control system to mimic a temporal database but this sounds pretty horrendous.
So I was wondering if anyone has come across something similar before and how they approached it?
Update
As suggested by Aaron here's the query we currently use (in mysql). It's definitely slow on our table with >200k rows. (id = table key, person_id = id per person, duplicated if the person has many revisions)
select name from person p where p.id = (select max(id) from person where person_id = p.person_id and timestamp <= :timestamp)
Update
It looks like the best way to do this is with a temporal db but given that there aren't any open source ones out there the next best method is to store a new row per update. The only problem is duplication of unchanged columns and a slow query.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
有两种方法可以解决这个问题。两者都假设您总是插入新行。在每种情况下,您都必须插入一个时间戳(
创建
),它告诉您行何时被“修改”。第一种方法使用数字来计算您已有的实例数量。主键是对象键加上版本号。这种方法的问题似乎是您需要
select max(version)
来进行修改。实际上,这很少是一个问题,因为对于应用程序的所有更新,您必须首先加载人员的当前版本,对其进行修改(并增加版本),然后插入新行。因此,真正的问题是这种设计使得在数据库中运行更新变得困难(例如,将属性分配给许多用户)。下一个方法使用数据库中的链接。您为每个对象提供一个新键,而不是组合键,并且有一个包含下一个版本的键的
replacedBy
字段。这种方法可以轻松查找当前版本(...其中replacedBy 为 NULL
)。不过,更新是一个问题,因为您必须插入新行并更新现有行。为了解决这个问题,您可以添加一个后退指针(
previousVersion
)。这样,您可以插入新行,然后使用后退指针更新以前的版本。There are two ways to tackle this. Both assume that you always insert new rows. In every case, you must insert a timestamp (
created
) which tells you when a row was "modified".The first approach uses a number to count how many instances you already have. The primary key is the object key plus the version number. The problem with this approach seems to be that you'll need a
select max(version)
to make a modification. In practice, this is rarely an issue since for all updates from the app, you must first load the current version of the person, modify it (and increment the version) and then insert the new row. So the real problem is that this design makes it hard to run updates in the database (for example, assign a property to many users).The next approach uses links in the database. Instead of a composite key, you give each object a new key and you have a
replacedBy
field which contains the key of the next version. This approach makes it simple to find the current version (... where replacedBy is NULL
). Updates are a problem, though, since you must insert a new row and update an existing one.To solve this, you can add a back pointer (
previousVersion
). This way, you can insert the new rows and then use the back pointer to update the previous version.以下是对时态数据库文献的(有些过时)调查:http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.6988&rep=rep1&type=pdf
我建议花很多钱坐下来阅读这些参考文献和/或 Google 学术搜索,尝试找到一些适合您的数据模型的好技术。祝你好运!
Here is a (somewhat dated) survey of the literature on temporal databases: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.6988&rep=rep1&type=pdf
I would recommend spending a good while sitting down with those references and/or Google Scholar to try to find some good techniques that fit your data model. Good luck!