Database - Data versioning in a single table
I'm developing a CMS which has some version control features. It's based on a MySQL database.
The idea is to show public site visitors a "certain revision" of the data and backoffice users a preview of the "latest revision". Publishing something just means setting the "certain revision" equal to the latest one (and maybe deleting the data of old revisions).
I've read some Q&As about the topic on SO; most of them suggest that holding "old" and "new" rows in the same table is bad. But since I need to join tables, all of them "versioned", splitting old and new rows into different tables isn't ideal either: how should the app know whether "content" from a given revision is old or new, and hence whether to look for it in a "_history" table?
So I decided to use just one table for each "content type".
The design I used:
Every table holds a "revision INT NOT NULL" column (part of the primary key, together with an ID column).
Modifying something means inserting a new row with the modified values, an incremented revision, but the same ID.
Inserting something means inserting a new row with an incremented ID and an incremented revision.
Deleting something means inserting an empty row with the same ID, an incremented revision and a "tombstone" flag set to "true".
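The write rules above can be sketched as follows. This is a minimal sketch, not the author's code. SQLite, through Python's built-in sqlite3 module, stands in for MySQL so the example runs standalone. The helpers insert_view, modify_view and delete_view are hypothetical names. IDs and revisions are assumed to be global counters computed as MAX + 1. There is one deliberate deviation: the tombstone row copies page and order from the latest revision instead of being empty. The reason is that page is NOT NULL, and a WHERE page = :page filter would otherwise never see the deletion.

```python
import sqlite3

# Hypothetical stand-in for the MySQL `_views` table. SQLite has no BIT type,
# so the tombstone flag `_ts` is an INTEGER holding 0 or 1.
SCHEMA = """
CREATE TABLE _views (
    _id     INTEGER NOT NULL,
    _rev    INTEGER NOT NULL,
    _ts     INTEGER NOT NULL DEFAULT 0,
    page    INTEGER NOT NULL,
    `order` INTEGER NOT NULL,
    PRIMARY KEY (_id, _rev)
)
"""


def _next(db, column):
    # Assumption: IDs and revisions are global counters (MAX + 1). With
    # concurrent writers this needs a lock or a dedicated sequence table.
    return db.execute(f"SELECT COALESCE(MAX({column}), 0) + 1 FROM _views").fetchone()[0]


def insert_view(db, page, order):
    """Insert: new ID, new revision."""
    view_id, rev = _next(db, "_id"), _next(db, "_rev")
    db.execute("INSERT INTO _views (_id, _rev, page, `order`) VALUES (?, ?, ?, ?)",
               (view_id, rev, page, order))
    return view_id


def modify_view(db, view_id, page, order):
    """Modify: same ID, new revision, modified values."""
    db.execute("INSERT INTO _views (_id, _rev, page, `order`) VALUES (?, ?, ?, ?)",
               (view_id, _next(db, "_rev"), page, order))


def delete_view(db, view_id):
    """Delete: same ID, new revision, tombstone flag set."""
    # The tombstone copies `page` and `order` from the latest revision instead of
    # being empty, so that a `WHERE page = :page` filter still finds it.
    page, order = db.execute(
        "SELECT page, `order` FROM _views WHERE _id = ? ORDER BY _rev DESC LIMIT 1",
        (view_id,)).fetchone()
    db.execute("INSERT INTO _views (_id, _rev, _ts, page, `order`) VALUES (?, ?, 1, ?, ?)",
               (view_id, _next(db, "_rev"), page, order))


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
a = insert_view(db, page=1, order=10)   # _id 1, _rev 1
b = insert_view(db, page=1, order=20)   # _id 2, _rev 2
modify_view(db, a, page=1, order=30)    # _id 1, _rev 3
delete_view(db, b)                      # _id 2, _rev 4, tombstone
rows = db.execute("SELECT _id, _rev, _ts, page, `order` FROM _views ORDER BY _rev").fetchall()
```

Every operation is an INSERT, so the table only ever grows. Old revisions stay available until they are explicitly purged after publishing.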
Example: there are pages and there are "views" ("view" not in the MVC sense, but in an application-specific meaning). Views are versioned. One page has many views.
This is (part of) the "views" table:
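CREATE TABLE `_views` (
    `_id`   int(11) NOT NULL,    -- view ID, shared by all revisions of the same view
    `_rev`  int(11) NOT NULL,    -- revision number
    `_ts`   BIT(1) DEFAULT b'0', -- tombstone flag: 1 = deleted as of this revision
    `page`  int(11) NOT NULL,    -- page the view belongs to
    `order` int(11) NOT NULL,    -- display position within the page
    PRIMARY KEY (`_id`, `_rev`)
);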
I need to select all views that a page contains, up to a "certain revision", in the order specified by "order".
This query works:
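SELECT * FROM (
    -- every revision of the page's views up to the requested revision, newest first
    SELECT *
    FROM `_views`
    WHERE `page` = :page
      AND `_rev` <= :revision
    ORDER BY `_rev` DESC
) AS `all`
-- relies on GROUP BY keeping the first (newest) row of each group; MySQL does not
-- guarantee this, may ignore the inner ORDER BY, and rejects it under ONLY_FULL_GROUP_BY
GROUP BY `_id`
HAVING `_ts` = 0
ORDER BY `order`;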
The subquery selects all views of a page that were once "published" (whose revision is less than or equal to the "published" revision). The outer query reduces each group to its latest revision, removes the groups that end in a tombstone, and orders them by application-specific criteria.
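For comparison, the same semantics (the latest revision of each view up to a given revision, with tombstones removed) can be written as a groupwise-maximum join. With this form, the result no longer depends on which row GROUP BY happens to keep. It still uses a derived table, so it is not subquery-free; the gain is a well-defined result. This is a sketch: SQLite via Python's sqlite3 module stands in for MySQL, and the sample rows and the views_at helper are made up for illustration. The SQL text itself is plain SQL that MySQL should accept as well.

```python
import sqlite3

# Hypothetical SQLite stand-in for `_views` (_ts is INTEGER 0/1; SQLite has no BIT).
SCHEMA = """
CREATE TABLE _views (
    _id     INTEGER NOT NULL,
    _rev    INTEGER NOT NULL,
    _ts     INTEGER NOT NULL DEFAULT 0,
    page    INTEGER NOT NULL,
    `order` INTEGER NOT NULL,
    PRIMARY KEY (_id, _rev)
)
"""

# Groupwise maximum: for each _id on the page, find the highest revision that is
# <= :revision, join back to fetch exactly that row, then drop tombstoned views.
VIEWS_AT_REVISION = """
SELECT v._id, v._rev, v.page, v.`order`
FROM _views AS v
JOIN (
    SELECT _id, MAX(_rev) AS max_rev
    FROM _views
    WHERE page = :page AND _rev <= :revision
    GROUP BY _id
) AS latest ON latest._id = v._id AND latest.max_rev = v._rev
WHERE v._ts = 0
ORDER BY v.`order`
"""


def views_at(db, page, revision):
    return db.execute(VIEWS_AT_REVISION, {"page": page, "revision": revision}).fetchall()


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
db.executemany(
    "INSERT INTO _views (_id, _rev, _ts, page, `order`) VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 0, 1, 20),  # view 1 created on page 1
        (2, 2, 0, 1, 10),  # view 2 created on page 1
        (1, 3, 0, 1, 30),  # view 1 modified
        (2, 4, 1, 1, 10),  # view 2 deleted (tombstone)
        (3, 5, 0, 2, 5),   # view 3 on another page
    ],
)
```

Asking for page 1 at revision 3 returns view 2 (revision 2) and view 1 (revision 3), in display order. At revision 4, view 2 disappears because its newest row is a tombstone.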
Since scalability and performance are crucial for a CMS, is there a better, more elegant way than subqueries?
... or should I just focus on caching?
Answer:
Using subqueries to determine the current revision is not the best approach; you really don't want to go there.
A simpler method is to add a flag that marks the most current revision.
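A minimal sketch of such a flag, assuming a hypothetical _current column that is 1 only on the latest revision row of each _id, including tombstone rows. SQLite via Python's sqlite3 stands in for MySQL, and add_revision and current_views are illustrative names. Every new revision, deletions included, first clears the old flag with an UPDATE and then inserts the new row already flagged, both in one transaction. Reading the latest state becomes a plain filtered SELECT.

```python
import sqlite3

# Hypothetical schema: `_views` plus a `_current` flag (1 on the latest revision of each _id).
SCHEMA = """
CREATE TABLE _views (
    _id      INTEGER NOT NULL,
    _rev     INTEGER NOT NULL,
    _ts      INTEGER NOT NULL DEFAULT 0,
    _current INTEGER NOT NULL DEFAULT 0,
    page     INTEGER NOT NULL,
    `order`  INTEGER NOT NULL,
    PRIMARY KEY (_id, _rev)
)
"""


def add_revision(db, view_id, rev, page, order, ts=0):
    # Manual bookkeeping: clear the previous flag, insert the new row flagged as
    # current. `with db` commits both statements together, or rolls back on error.
    with db:
        db.execute("UPDATE _views SET _current = 0 WHERE _id = ? AND _current = 1",
                   (view_id,))
        db.execute("INSERT INTO _views (_id, _rev, _ts, _current, page, `order`) "
                   "VALUES (?, ?, ?, 1, ?, ?)", (view_id, rev, ts, page, order))


# Reading the latest state needs no subquery and no GROUP BY.
CURRENT_VIEWS = """
SELECT _id, _rev, `order` FROM _views
WHERE page = :page AND _current = 1 AND _ts = 0
ORDER BY `order`
"""


def current_views(db, page):
    return db.execute(CURRENT_VIEWS, {"page": page}).fetchall()


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
add_revision(db, 1, 1, page=1, order=20)
add_revision(db, 2, 2, page=1, order=10)
add_revision(db, 1, 3, page=1, order=30)        # modifies view 1
add_revision(db, 2, 4, page=1, order=10, ts=1)  # deletes view 2
```

This flag only answers "latest revision". Showing visitors an older published revision would still need the revision-bounded query, or a second flag maintained the same way (an assumption, not part of the original answer).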
This requires a manual UPDATE to set the _current flag whenever a new revision is added or the _ts flag changes. But at least it avoids executing the subquery on every page display.
As an alternative, you could still split your data into a _current and a _history table. You'd then just create a view over both for the cases where you need to join the result sets again.
Likewise, if you need to group them frequently, it might be possible to create a subtable of all active (non-tombstoned) revisions. However, that would incur even more manual micromanagement than a _current flag or a simple view over the _history table.
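The split-table variant with a view over both tables might look like this. It is a sketch under assumptions. The table names _views_current and _views_history and the view name _views_all are hypothetical, and SQLite via Python's sqlite3 stands in for MySQL. The save_revision helper moves the superseded row into the history table and stores the new one, all inside one transaction. A deleted view simply keeps its tombstone row as its current row.

```python
import sqlite3

# Hypothetical names: `_views_current` holds one row per _id (its latest revision),
# `_views_history` holds every superseded revision, and the view `_views_all`
# unions both for queries that still need all revisions.
SCHEMA = """
CREATE TABLE _views_current (
    _id INTEGER PRIMARY KEY, _rev INTEGER NOT NULL, _ts INTEGER NOT NULL DEFAULT 0,
    page INTEGER NOT NULL, `order` INTEGER NOT NULL
);
CREATE TABLE _views_history (
    _id INTEGER NOT NULL, _rev INTEGER NOT NULL, _ts INTEGER NOT NULL DEFAULT 0,
    page INTEGER NOT NULL, `order` INTEGER NOT NULL, PRIMARY KEY (_id, _rev)
);
CREATE VIEW _views_all AS
    SELECT _id, _rev, _ts, page, `order` FROM _views_current
    UNION ALL
    SELECT _id, _rev, _ts, page, `order` FROM _views_history;
"""


def save_revision(db, view_id, rev, page, order, ts=0):
    with db:  # one transaction: move the old row to history, then store the new one
        db.execute("INSERT INTO _views_history "
                   "SELECT _id, _rev, _ts, page, `order` FROM _views_current WHERE _id = ?",
                   (view_id,))
        db.execute("DELETE FROM _views_current WHERE _id = ?", (view_id,))
        db.execute("INSERT INTO _views_current (_id, _rev, _ts, page, `order`) "
                   "VALUES (?, ?, ?, ?, ?)", (view_id, rev, ts, page, order))


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
save_revision(db, 1, 1, page=1, order=20)
save_revision(db, 2, 2, page=1, order=10)
save_revision(db, 1, 3, page=1, order=30)        # modifies view 1
save_revision(db, 2, 4, page=1, order=10, ts=1)  # deletes view 2

# Live content: only the small current table is read.
live = db.execute("SELECT _id, _rev FROM _views_current "
                  "WHERE page = 1 AND _ts = 0 ORDER BY `order`").fetchall()
# All revisions, e.g. for the "up to a certain revision" query, go through the view.
every = db.execute("SELECT _id, _rev FROM _views_all ORDER BY _rev").fetchall()
```

Live pages touch only one row per view, while history and revision-bounded queries pay the cost of the UNION ALL.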