Database content version control
I am interested in keeping a running history of every change which has happened on some tables in my database, thus being able to reconstruct historical states of the database for analysis purposes.
I am using Postgres, and this MVCC thing seems like something I should be able to exploit for this purpose, but I cannot find any documentation to support that. Can I do it? Is there a better way?
Any input is appreciated!
UPD
I have marked Denis' response as the answer, because he did in fact answer whether MVCC is what I want which was the question. However, the strategy I have settled on is detailed below in case anyone finds it useful:
The Postgres feature that does what I want: online backup/point in time recovery.
http://www.postgresql.org/docs/8.1/static/backup-online.html explains how to use this feature but essentially you can set this "write ahead log" to archive mode, take a snapshot of the database (say, before it goes live), then continually archive the WAL. You can then use log replay to recall the state of the database at any time, with the side benefit of having a warm standby if you choose (by continually replaying the new WALs on your standby server).
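As a rough sketch, the archiving setup described above comes down to a couple of postgresql.conf settings (the archive directory here is illustrative; note that the archive_mode switch only appeared in 8.3 — on 8.1, setting archive_command by itself enables archiving):

```
# postgresql.conf -- illustrative WAL-archiving settings
# archive_mode exists from 8.3 onward; on 8.1 archive_command alone enables archiving
archive_mode = on
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'
```

After that, take the base backup (pg_start_backup()/pg_stop_backup() around a filesystem copy) and let the archived WAL segments accumulate from there.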
Perhaps this method is not as elegant as other ways of keeping a history, since you need to actually build the database for every point in time you wish to query, however it looks extremely easy to set up and loses zero information. That means when I have the time to improve my handling of historical data, I'll have everything and will therefore be able to transform my clunky system to a more elegant system.
One key fact that makes this so perfect is that my "valid time" is the same as my "transaction time" for the specific application; if this were not the case I would only be capturing "transaction time".
Before I found out about the WAL, I was considering just taking daily snapshots or something but the large size requirement and data loss involved did not sit well with me.
For a quick way to get up and running without compromising my data retention from the outset, this seems like the perfect solution.
Comments (3)
Time Travel
PostgreSQL used to have just this feature, and called it "Time Travel". See the old documentation.
There's somewhat similar functionality in the spi contrib module (its timetravel example) that you might want to check out.
Composite type audit trigger
What I usually do instead is to use triggers to log changes along with timestamps to archival tables, and query against those. If the table structure isn't going to change you can use something like:
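A minimal sketch of such an archival setup, assuming a table named sometable (all names here are illustrative, not the answer's exact code; the history table holds the whole row as a composite value of the table's own row type):

```sql
-- History table: the operation, the full row as a composite value, and a timestamp.
CREATE TABLE sometable_history (
    operation   text        NOT NULL,  -- TG_OP: 'INSERT', 'UPDATE' or 'DELETE'
    row_data    sometable,             -- whole row; for DELETE, the old row
    changed_at  timestamptz NOT NULL
);

CREATE OR REPLACE FUNCTION sometable_versioning() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        -- NEW is not defined for DELETE, so archive OLD instead
        INSERT INTO sometable_history VALUES (TG_OP, OLD, current_timestamp);
        RETURN OLD;
    ELSE
        INSERT INTO sometable_history VALUES (TG_OP, NEW, current_timestamp);
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sometable_versioning_trg
AFTER INSERT OR UPDATE OR DELETE ON sometable
FOR EACH ROW EXECUTE PROCEDURE sometable_versioning();
```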
and your versioning trigger can just

insert into sometable_history(TG_OP, NEW, current_timestamp)

(with a different CASE for DELETE, where NEW is not defined).

hstore audit trigger
That gets painful if the schema changes to add new NOT NULL columns, though. If you expect to do anything like that, consider using an hstore to archive the columns instead of a composite type. I've added an implementation of that on the PostgreSQL wiki.

PITR
If you want to avoid impact on your master database (growing tables, etc), you can alternately use continuous archiving and point-in-time recovery to log WAL files that can, using a recovery.conf, be replayed to any moment in time. Note that WAL files are big and include not only the tuples you changed, but VACUUM activity and other details. You'll want to run them through clearxlogtail, since they can have garbage data on the end if they're partial segments from an archive timeout, and then you'll want to compress them heavily for long-term storage.
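For illustration, a minimal recovery.conf for replaying to a chosen moment might look like this (pre-PostgreSQL 12 style; the archive path and target timestamp are placeholders):

```
# recovery.conf -- illustrative PITR settings
restore_command      = 'cp /mnt/wal_archive/%f %p'
recovery_target_time = '2012-06-01 12:00:00'
```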
Not really. There are tools to see dead rows, but auto-vacuuming means those will eventually be reclaimed.
If I get your question right, you're looking into logging slowly changing dimensions.
You might find this recent related thread interesting:
Temporal database design, with a twist (live vs draft rows)
I'm not aware of any tools/products that are built for that purpose.
While this may not be exactly what you're asking for, you can configure Postgresql to log ddl changes. Setting the log_line_prefix parameter (try including %d, %m, and %u) and setting the log_statement parameter to ddl should give you a reasonable history of who made what ddl changes and when.
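For illustration, those two settings might look like this in postgresql.conf (the prefix format is just one reasonable choice using the escapes mentioned above):

```
# postgresql.conf -- illustrative DDL-logging settings
log_statement   = 'ddl'
log_line_prefix = '%m %u %d '   # timestamp with ms, user name, database name
```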
Having said that, I don't believe logging ddl to be foolproof. For example, consider a situation where:
Another option might be to log ddl as above but then have a watcher program perform a pg_dump of the database schema whenever a ddl entry gets logged. You could even compare the new dump with the previous dump and extract just the objects that were changed.