Version-controlled database with efficient use of diffs
I have a project involving a web voting system. The current values and related data are stored in several tables. Historical data will be an important aspect of this project, so I've also created audit tables to which current data will be moved on a regular basis.
I find this strategy highly inefficient. Even if I only archive data on a daily basis, the number of rows will become huge, because every archive copies all the current rows even if only 1 or 2 users made updates that day.
The next alternative I can think of is storing only the entries that have changed. This would mean building logic to automatically reconstruct a view of any given day. That means fewer stored rows, but considerable complexity.
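A minimal sketch of the "store only what changed" option, in Python, assuming a hypothetical change log in which each record holds one changed field of one entity plus the day it changed. Rebuilding the view of a given day is then a replay of every change up to that day. Deletions are not modeled here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Change:
    """One changed field of one entity (hypothetical change-log record)."""
    entity_id: int
    field: str
    value: object
    changed_on: date


def state_as_of(changes, day):
    """Rebuild {entity_id: {field: value}} as it stood at the end of `day`.

    Changes are replayed in date order, so a later change overwrites an
    earlier one for the same entity and field. sorted() is stable, so
    changes recorded on the same day are applied in log order.
    """
    state = {}
    for change in sorted(changes, key=lambda c: c.changed_on):
        if change.changed_on > day:
            break  # every remaining change is newer than `day`
        state.setdefault(change.entity_id, {})[change.field] = change.value
    return state


# Example log: poll 1 was renamed on Jan 3, poll 2 was created on Jan 2.
example_changes = [
    Change(1, "title", "Best pizza", date(2024, 1, 1)),
    Change(1, "title", "Best pizza topping", date(2024, 1, 3)),
    Change(2, "title", "Best editor", date(2024, 1, 2)),
]
```

Here `state_as_of(example_changes, date(2024, 1, 2))` returns `{1: {"title": "Best pizza"}, 2: {"title": "Best editor"}}`. Storage grows with the number of edits rather than the number of days, at the price of the replay logic.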
My final idea is a bit less conventional. Since the historical data will be used for reporting purposes, there's no need for web users to have quick access to it. I'm thinking that my DB could hold no historical data at all and only represent the current state. Then, daily, the entire DB could be loaded into objects (the number of users and the amount of data are relatively low) and serialized to something like XML or JSON. These files could be diffed against the previous day's and stored. In fact, SVN could do this for me. When I want the data for a given past day, the system would retrieve the version for that day and deserialize it into objects. This is obviously a costly operation, but performance is not much of a concern here. I'm considering using LINQ for this, which I think would simplify things. The serialization procedure would have to be carefully organized for the diff to work well.
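The "organized serialization" requirement can be sketched in Python as follows. This is a sketch under assumptions, not a full exporter: tables are given as plain `{table_name: [row_dict, ...]}` dictionaries, every row is assumed to have an `id` key, and the output is JSON Lines (one JSON object per line) rather than a single JSON document, so that a line-based diff (SVN's, or Python's `difflib` used here) sees an unchanged row as an identical line. Committing the files to SVN is left out.

```python
import difflib
import json


def snapshot(tables):
    """Serialize {table_name: [row_dict, ...]} as stable JSON Lines text.

    Tables are sorted by name, rows by their (assumed) "id" key, and
    sort_keys=True fixes key order, so the same data always yields the
    same lines regardless of the order it was loaded in.
    """
    lines = []
    for name in sorted(tables):
        for row in sorted(tables[name], key=lambda r: r["id"]):
            lines.append(json.dumps({"table": name, "row": row}, sort_keys=True))
    return "\n".join(lines) + "\n"


def load_snapshot(text):
    """Deserialize snapshot() output back into {table_name: [row_dict, ...]}."""
    tables = {}
    for line in text.splitlines():
        record = json.loads(line)
        tables.setdefault(record["table"], []).append(record["row"])
    return tables


def daily_diff(previous, current):
    """Unified diff between two snapshots; empty string if nothing changed."""
    return "".join(difflib.unified_diff(
        previous.splitlines(keepends=True),
        current.splitlines(keepends=True),
        fromfile="yesterday.jsonl",
        tofile="today.jsonl",
    ))


# Example: one vote changed from "A" to "C" overnight (rows loaded in a different order).
monday = {"votes": [{"id": 1, "poll": 7, "choice": "A"}, {"id": 2, "poll": 7, "choice": "B"}]}
tuesday = {"votes": [{"id": 2, "poll": 7, "choice": "B"}, {"id": 1, "poll": 7, "choice": "C"}]}
```

`daily_diff(snapshot(monday), snapshot(tuesday))` reports a single removed line and a single added line for vote 1, even though Tuesday's rows arrived in a different order. Note that a table with no rows produces no lines, so it disappears on a round trip through `load_snapshot`.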
Which approach would you take?
Thanks
Answers (3)
If you're basically wondering how revisions of data are stored in relational databases, then I would look into how wikis do it.
Wikis are all about keeping detailed revision history. They use simple relational databases for storage.
Consider Wikipedia's database schema.
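A rough sketch of the wiki idea using Python's built-in `sqlite3`: every edit is appended as a new row in a revision table, and the main table only points at the latest revision. This borrows the general page/revision split that MediaWiki uses but is not its actual schema; the `poll`/`poll_revision` tables and their columns are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE poll (
    id INTEGER PRIMARY KEY,
    latest_rev INTEGER                 -- points at the current revision
);
CREATE TABLE poll_revision (
    rev_id INTEGER PRIMARY KEY AUTOINCREMENT,
    poll_id INTEGER NOT NULL REFERENCES poll(id),
    title TEXT NOT NULL,
    saved_at TEXT NOT NULL             -- ISO-8601 text, so text order = time order
);
"""


def save(conn, poll_id, title, saved_at):
    """Append a new revision instead of updating in place, then move the pointer."""
    cur = conn.execute(
        "INSERT INTO poll_revision (poll_id, title, saved_at) VALUES (?, ?, ?)",
        (poll_id, title, saved_at),
    )
    # poll has no other columns, so REPLACE simply repoints latest_rev.
    conn.execute(
        "INSERT OR REPLACE INTO poll (id, latest_rev) VALUES (?, ?)",
        (poll_id, cur.lastrowid),
    )


def current_title(conn, poll_id):
    """Current state: follow the pointer to the latest revision."""
    row = conn.execute(
        "SELECT r.title FROM poll p JOIN poll_revision r ON r.rev_id = p.latest_rev "
        "WHERE p.id = ?",
        (poll_id,),
    ).fetchone()
    return row[0] if row else None


def title_as_of(conn, poll_id, when):
    """Historical state: the most recent revision saved at or before `when`."""
    row = conn.execute(
        "SELECT title FROM poll_revision WHERE poll_id = ? AND saved_at <= ? "
        "ORDER BY saved_at DESC, rev_id DESC LIMIT 1",
        (poll_id, when),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
save(conn, 1, "Favourite pizza", "2024-01-01T09:00:00")
save(conn, 1, "Favourite pizza topping", "2024-01-03T18:30:00")
```

Only the poll that was edited gains a row, so storage grows with the number of edits rather than with the number of days archived.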
All you've told us about your system is that it involves votes. As long as you store a timestamp for when each vote was cast, you should be able to generate a report describing the vote tally at any point in time... no?
For example, say I have a system that tallies favorite features (eyes, smile, butt, ...). If I want to know how many votes a particular feature had as of a particular date, I would simply count all the votes for that feature with a timestamp less than or equal to that date.
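The timestamp approach, sketched in Python with `sqlite3` and a hypothetical `vote` table. Timestamps are assumed to be stored as ISO-8601 text in one consistent format and time zone, so a plain string comparison orders them chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE vote (
           id INTEGER PRIMARY KEY,
           feature TEXT NOT NULL,
           cast_at TEXT NOT NULL   -- ISO-8601 timestamp of when the vote was cast
       )"""
)
conn.executemany(
    "INSERT INTO vote (feature, cast_at) VALUES (?, ?)",
    [
        ("eyes", "2024-01-01T10:00:00"),
        ("smile", "2024-01-02T12:00:00"),
        ("eyes", "2024-01-05T08:15:00"),
    ],
)


def tally_as_of(conn, as_of):
    """Count votes per feature among votes cast at or before `as_of`."""
    rows = conn.execute(
        "SELECT feature, COUNT(*) FROM vote WHERE cast_at <= ? "
        "GROUP BY feature ORDER BY feature",
        (as_of,),
    ).fetchall()
    return dict(rows)
```

Because the comparison is on strings, a bare date such as `"2024-01-02"` sorts before `"2024-01-02T12:00:00"` and would exclude votes cast later that day; pass an end-of-day timestamp (or compare with `<` against the next day) when asking for the tally as of a date.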
If you want to have a history of other things, then you would follow a similar approach.
I think this is the way it is done.
Have you considered using a real version control system rather than trying to shoehorn a database in its place? I myself am quite partial to git, but there are many options. They all have good support for differences between versions, and they tend to be well optimised for this kind of workload.