How do I set up revision/history tracking with an ORM?
I am trying to figure out how to set up a revision system for posts and other content. I figured that would mean it would need to work with a basic belongs_to/has_one/has_many/has_many_through ORM (any good ORM should support this).
I was thinking that I could have some tables like these (with matching models):
[[POST]] (has_many (text) through (revisions))
id
title
[[Revisions]] (belongs_to posts/text)
id
post_id
text_id
date
[[TEXT]]
id
body
user_id
That way I could join THROUGH the revisions table to get the latest TEXT body. But I'm kind of foggy on how it will all work. Has anyone set up something like this?
Basically, I need to be able to load an article and request the latest content entry.
// Get the post row
$post = new Model_Post($id);
// Get the latest revision (JOIN through revisions to TEXT) and print that body.
$post->text->body;
Being able to step back in time to previous revisions, and to remove revisions, would also be a big help.
At any rate, these are just ideas of how I think some kind of history tracking would work. I'm open to any form of tracking; I just want to know what the best practice is.
:EDIT:
It seems that, moving forward, two tables make the most sense. Since I plan to store two copies of the text, this will also help to save space. The first table, `posts`, will store the data of the current revision for fast reads without any joins. The `posts` `body` will be the value of the matching revision's `text` field, but processed through markdown/bbcode/tidy/etc. This will allow me to retain the original text (for the next edit) without having to store that text twice in one revision row (or having to re-parse it each time I display it).
So fetching will be ORM friendly. Then for creates/updates I will have to handle revisions separately and then just update the post object with the new current revision values.
CREATE TABLE IF NOT EXISTS `posts` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`published` tinyint(1) unsigned DEFAULT NULL,
`allow_comments` tinyint(1) unsigned DEFAULT NULL,
`user_id` int(11) NOT NULL,
`title` varchar(100) NOT NULL,
`body` text NOT NULL,
`created` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `published` (`published`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
CREATE TABLE IF NOT EXISTS `postsrevisions` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`post_id` int(10) unsigned NOT NULL,
`user_id` int(10) unsigned NOT NULL,
`is_current` tinyint(1) unsigned DEFAULT NULL,
`date` datetime NOT NULL,
`title` varchar(100) NOT NULL,
`text` text NOT NULL,
`image` varchar(200) NOT NULL,
PRIMARY KEY (`id`),
KEY `post_id` (`post_id`),
KEY `user_id` (`user_id`),
KEY `is_current` (`is_current`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
Comments (4)
Your `Revisions` table as you have shown it models a many-to-many relationship between `Posts` and `Text`. This is probably not what you want, unless a given row in `Text` may provide the content for multiple rows in `Posts`. This is not how most CMS architectures work.
You certainly don't need three tables. I have no idea why you think this is needed for 3NF. The point of 3NF is that an attribute should not depend on a non-key attribute; it doesn't say you should split into multiple tables needlessly.
So you might only need a one-to-many relationship between two tables: `Posts` and `Revisions`. That is, for each post, there can be multiple revisions, but a given revision applies to only one post. Others have suggested two alternatives for finding the current post:
- A flag column in `Revisions` to note the current revision. Changing the current revision is as simple as changing the flag to true in the desired revision and to false in the formerly current revision.
- A foreign key in `Posts` to the revision that is current for the given post. This is even simpler, because you can change the current revision in one update instead of two. But circular foreign key references can cause problems with backup and restore, cascading updates, etc.
You could even implement the revision system using a single table.
I'm not sure it's duplication to store the `title` with each revision, because the title could be revised as much as the text, couldn't it?
The column `is_current` should be either 1 or NULL. A unique constraint (for example, over `post_id` and `is_current` together) doesn't count NULLs, so each post can have only one row where `is_current` is 1 and an unlimited number of rows where it's NULL.
This does require updating two rows to make a revision current, but you gain some simplicity by reducing the model to a single table. This is a great advantage when you're using an ORM.
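The 1-or-NULL flag and the two-row update can be sketched as follows. This is not the schema from the answer, just a minimal illustration: the table name `post_revisions`, its columns, and the helper names are assumptions, and Python's built-in `sqlite3` is used in place of MySQL (both treat NULLs as distinct in a unique constraint).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE post_revisions (
        revision_id INTEGER PRIMARY KEY AUTOINCREMENT,
        post_id     INTEGER NOT NULL,
        is_current  INTEGER NULL CHECK (is_current IS NULL OR is_current = 1),
        title       TEXT NOT NULL,
        text        TEXT NOT NULL,
        UNIQUE (post_id, is_current)  -- NULL flags never collide; two 1s per post do
    )
""")


def add_revision(conn, post_id, title, text):
    """Insert a revision that is not (yet) current."""
    cur = conn.execute(
        "INSERT INTO post_revisions (post_id, title, text) VALUES (?, ?, ?)",
        (post_id, title, text),
    )
    return cur.lastrowid


def make_current(conn, post_id, revision_id):
    """Two updates in one transaction: clear the old flag, then set the new one."""
    with conn:
        conn.execute(
            "UPDATE post_revisions SET is_current = NULL "
            "WHERE post_id = ? AND is_current = 1",
            (post_id,),
        )
        conn.execute(
            "UPDATE post_revisions SET is_current = 1 "
            "WHERE post_id = ? AND revision_id = ?",
            (post_id, revision_id),
        )


first = add_revision(conn, 1, "Draft", "v1")
second = add_revision(conn, 1, "Draft", "v2")
make_current(conn, 1, first)
make_current(conn, 1, second)

# Flagging a second row for the same post without clearing the first is rejected.
try:
    conn.execute(
        "UPDATE post_revisions SET is_current = 1 WHERE revision_id = ?", (first,)
    )
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
```

Going back in time is then just `make_current` with an older `revision_id`, and removing a revision is a plain `DELETE` on a row whose flag is NULL.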
You can create a view to simplify the common case of querying current posts.
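One way such a view might look, again as a sketch rather than the answer's own SQL: the table `post_revisions`, the view name `current_posts`, and the sample rows are all assumptions, and SQLite (via Python's `sqlite3`) stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post_revisions (
        revision_id INTEGER PRIMARY KEY AUTOINCREMENT,
        post_id     INTEGER NOT NULL,
        is_current  INTEGER NULL,
        title       TEXT NOT NULL,
        text        TEXT NOT NULL,
        UNIQUE (post_id, is_current)
    );

    -- The view exposes exactly one row per post: its current revision.
    CREATE VIEW current_posts AS
        SELECT post_id, revision_id, title, text
        FROM post_revisions
        WHERE is_current = 1;

    INSERT INTO post_revisions (post_id, is_current, title, text) VALUES
        (1, NULL, 'Hello', 'old body'),
        (1, 1,    'Hello', 'new body'),
        (2, 1,    'Other', 'only body');
""")

# Callers query the view as if it were a plain posts table.
rows = conn.execute(
    "SELECT post_id, text FROM current_posts ORDER BY post_id"
).fetchall()
```

If the ORM in use can map a model onto a view, reads can go through `current_posts` while writes still target the underlying table; whether that mapping is supported depends on the ORM.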
Update: Re your updated question: I agree that proper relational design would encourage two tables so that you could make a few attributes of a `Post` invariant for all that post's revisions. But most ORM tools assume an entity exists in a single table, and ORMs are clumsy at joining rows from multiple tables to constitute a given entity. So I would say if using an ORM is a priority, you should store the posts and revisions in a single table. Sacrifice a little bit of relational correctness to support the assumptions of the ORM paradigm.
Another suggestion is to consider Dimensional Modeling. This is a school of database design to support OLAP and data warehousing. It uses denormalization judiciously, so you can usually organize data in a Star Schema. The main entity (the "Fact Table") is represented by a single table, so this would be a win for an ORM-centric application design.
You'd probably be better off in this case to put a CurrentTextID on your Post table to avoid having to figure out which revision is current (an alternative would be a flag on Revision, but I think a CurrentTextID on the post will give you easier queries).
With the CurrentTextID on the Post, your ORM should place a single property (CurrentText) on your Post class which would allow you to access the current text with essentially the statement you provided.
Your ORM should also give you some way to load the Revisions based on the Post; if you want more details about that, you should include information about which ORM you are using and how you have it configured.
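A rough sketch of the `CurrentTextID` idea, written against Python's built-in `sqlite3` rather than any particular ORM. The table names, the snake_case column `current_text_id`, and the helper functions are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        id              INTEGER PRIMARY KEY AUTOINCREMENT,
        title           TEXT NOT NULL,
        current_text_id INTEGER NULL  -- points at texts.id; NULL until the first save
    );
    CREATE TABLE texts (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,
        post_id INTEGER NOT NULL REFERENCES posts(id),
        body    TEXT NOT NULL,
        date    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
""")


def add_text(conn, post_id, body):
    """Insert a new text row and point the post at it in the same transaction."""
    with conn:
        cur = conn.execute(
            "INSERT INTO texts (post_id, body) VALUES (?, ?)", (post_id, body)
        )
        text_id = cur.lastrowid
        conn.execute(
            "UPDATE posts SET current_text_id = ? WHERE id = ?", (text_id, post_id)
        )
    return text_id


def current_body(conn, post_id):
    """One join from the post to its current text, with no flag to search for."""
    row = conn.execute(
        "SELECT t.body FROM posts p "
        "JOIN texts t ON t.id = p.current_text_id "
        "WHERE p.id = ?",
        (post_id,),
    ).fetchone()
    return row[0] if row else None


def revert_to(conn, post_id, text_id):
    """A single update switches the current revision; it only applies
    if the text row actually belongs to this post."""
    with conn:
        conn.execute(
            "UPDATE posts SET current_text_id = ? "
            "WHERE id = ? AND EXISTS "
            "(SELECT 1 FROM texts WHERE id = ? AND post_id = ?)",
            (text_id, post_id, text_id, post_id),
        )


post_id = conn.execute("INSERT INTO posts (title) VALUES ('Hello')").lastrowid
first = add_text(conn, post_id, "first body")
second = add_text(conn, post_id, "second body")
```

With a belongs_to-style relation from the post to its current text, `current_body` corresponds to what `$post->text->body` would do in the question.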
I think two tables would suffice here: a post table and its revisions. If you're not worried about duplicating data, a single (denormalized) table could also work.
For anyone interested, here is how WordPress handles revisions using a single MySQL posts table.
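As a rough illustration of that single-table approach: WordPress keeps each revision as an extra row in `wp_posts` with `post_type` set to `revision` and `post_parent` pointing at the original post. The sketch below uses a heavily reduced version of that table in SQLite via Python's `sqlite3`; the sample rows and the helper `revisions_of` are assumptions, and the real table has many more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced stand-in for wp_posts; only the columns needed for the example.
    CREATE TABLE wp_posts (
        ID           INTEGER PRIMARY KEY AUTOINCREMENT,
        post_parent  INTEGER NOT NULL DEFAULT 0,
        post_type    TEXT NOT NULL DEFAULT 'post',
        post_status  TEXT NOT NULL DEFAULT 'publish',
        post_title   TEXT NOT NULL,
        post_content TEXT NOT NULL,
        post_date    TEXT NOT NULL
    );
    INSERT INTO wp_posts
        (ID, post_parent, post_type, post_status, post_title, post_content, post_date)
    VALUES
        (1, 0, 'post',     'publish', 'Hello', 'current body', '2010-01-03 00:00:00'),
        (2, 1, 'revision', 'inherit', 'Hello', 'first draft',  '2010-01-01 00:00:00'),
        (3, 1, 'revision', 'inherit', 'Hello', 'second draft', '2010-01-02 00:00:00');
""")


def revisions_of(conn, post_id):
    """List a post's revision rows, newest first."""
    return conn.execute(
        "SELECT ID, post_content FROM wp_posts "
        "WHERE post_parent = ? AND post_type = 'revision' "
        "ORDER BY post_date DESC, ID DESC",
        (post_id,),
    ).fetchall()


history = revisions_of(conn, 1)
```

The live post is read with an ordinary filter on `post_type`, so normal page loads never touch the revision rows.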