Postgresql PITR 备份:处理多个数据库的最佳实践?

发布于 2024-07-27 20:37:36 字数 988 浏览 8 评论 0原文

大家好,我有一个带有许多数据库的 postgresql 8.3 服务器。

实际上,我计划使用一个脚本来备份这些数据库,该脚本会将所有备份存储在与数据库同名的文件夹中,例如:

/mypath/backup/my_database1/
/mypath/backup/my_database2/
/mypath/backup/foo_database/

每天我每 2 小时进行 1 次转储,每天覆盖文件...例如,在 my_database1 文件夹中,我有:

my_database1.backup-00.sql  //backup made everyday at the 00.00 AM
my_database1.backup-02.sql  //backup made everyday at the 02.00 AM
my_database1.backup-04.sql  //backup made everyday at the 04.00 AM
my_database1.backup-06.sql  //backup made everyday at the 06.00 AM
my_database1.backup-08.sql  //backup made everyday at the 08.00 AM
my_database1.backup-10.sql  //backup made everyday at the 10.00 AM
[...and so on...]

这就是我实际上确保自己能够恢复丢失至少 2 小时数据的每个数据库的方式。

2个小时看起来还是太多了。

我查看了 postgresql pitr 的 WAL 文件,但是,这些文件似乎包含关于我的数据库的所有数据。

我需要分离这些文件,就像分离转储文件一样。

如何?

否则,还有另一个易于安装的备份程序,允许我仅恢复 10 秒前的 1 个备份,但无需每 10 秒创建一个转储文件?

Hy guys, i have a postgresql 8.3 server with many database.

Actually, im planning to backup those db with a script that will store all the backup in a folder with the same name of the db, for example:

/mypath/backup/my_database1/
/mypath/backup/my_database2/
/mypath/backup/foo_database/

Every day i make 1 dump each 2 hours, overwriting the files every day... for example, in the my_database1 folder i have:

my_database1.backup-00.sql  //backup made everyday at the 00.00 AM
my_database1.backup-02.sql  //backup made everyday at the 02.00 AM
my_database1.backup-04.sql  //backup made everyday at the 04.00 AM
my_database1.backup-06.sql  //backup made everyday at the 06.00 AM
my_database1.backup-08.sql  //backup made everyday at the 08.00 AM
my_database1.backup-10.sql  //backup made everyday at the 10.00 AM
[...and so on...]

This is how i actually assure myself to be able to restore everydatabase loosing at least 2 hours of data.

2 hours still looks too much.

I've got a look to the postgresql pitr trought the WAL files, but, those files seem to contain all the data about all my database.

I'll need to separate those files, in the same way i do separate the dump files.

How to?

Otherwise, there is another easy-to-install to have a backup procedure that allo me to restore just 1 backup at 10 seconds earlier, but without creating a dump file every 10 seconds?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

烟燃烟灭 2024-08-03 20:37:36

对于一个 PostgresSQL 实例来说这是不可能的。

您可以将 500 个表划分为多个实例,每个实例监听不同的端口,但这意味着它们不会有效地使用内存等资源(在一个实例中保留但未使用的内存不能被另一个实例使用)。

Slony 在这里也不起作用,因为它不复制 DDL 语句,例如删除表。

我建议同时执行这两项操作:

  • 继续进行 pg_dump 备份,但尝试使其平滑 - 限制 pg_dump io 带宽,这样它就不会损坏服务器,并连续运行它 - 当它完成最后一个数据库时,然后立即从第一个开始;

  • 另外设置PITR。

通过这种方式,您可以快速恢复单个数据库,但可能会丢失一些数据。 如果您决定无法承受丢失那么多数据,那么您可以将 PITR 备份恢复到临时位置(使用 fsync=off 并将 pg_xlog 符号链接到 ramdisk 以提高速度),从那里 pg_dump 受影响的数据库并将其恢复到您的主数据库数据库。

It is not possible with one instance of PostgresSQL.

You can divide your 500 tables between several instances, each listening on different port, but it would mean that they will not use resources like memory effectively (memory reserved but unused in one instance can not be used by another).

Slony will also not work here, as it does not replicate DDL statements, like dropping a table.

I'd recommend doing both:

  • continue to do your pg_dump backups, but try to smooth it - throttle pg_dump io bandwith, so it will not cripple a server, and run it continuously - when it finishes with the last database then immediately start with a first one;

  • additionally setup PITR.

This way you can restore a single database fast, but you can loose some data. If you'll decide that you cannot afford to loose that much data then you can restore your PITR backup to a temporary location (with fsync=off and pg_xlog symlinked to ramdisk for speed), pg_dump affected database from there and restore it to your main database.

蒗幽 2024-08-03 20:37:36

为什么要分离数据库?

PITR 的工作方式是不可能的,因为它在整个集群上工作。
在这种情况下,您可以做的是为每个数据库创建一个数据目录和一个单独的集群(但不推荐,因为它将需要不同的端口和 postmaster 实例)。

我相信使用 PITR 而不是常规转储的好处胜过为每个数据库单独备份,因此也许您可以重新考虑需要分离它的原因。

另一种方法可能是使用 Slony-I 设置一些复制,但这需要接收数据的单独机器(或实例)。 另一方面,这样您就可以拥有一个近乎实时的复制系统。

评论更新:

要从错误中恢复,例如删除表,PITR 将是完美的,因为您可以重播到特定时间。 然而,对于 500 个数据库,我知道这可能会产生很大的开销。 Slony-I 可能不会工作,因为它正在复制。 不确定它如何处理表删除。

我不知道你还有什么其他方法可以走。 我会做的可能仍然是 PITR,只是不犯任何错误;)。 抛开笑话不谈,根据错误发生的频率,这可能是一个解决方案:

  • 为 PITR 设置
  • 第二个实例以备用。
  • 当发生错误时,在第二个实例上重播恢复到该时间点。
  • 从该实例对受影响的数据库执行 pg_dump。
  • 在该数据库的生产实例上执行 pg_restore。

但是,它需要您准备好第二个实例,可以在同一服务器上,也可以在不同的服务器上(建议使用不同的服务器)。 此外,恢复时间会更长一些,因为它需要您进行一次额外的转储和恢复。

Why do you want to separate the databases?

The way the PITR works, it is not possible to do since it works on the complete cluster.
What you can do in that case is to create a data directory and a separate cluster for each of those databases (not recommended though since it will require different ports, and postmaster instances).

I believe that the benefits of using PITR instead of regular dumps outweigh having separate backups for each database, so perhaps you can re-think the reasons for why you need to separate it.

Another way could be to set up some replication with Slony-I but that would require a separate machine (or instance) that receives the data. On the other hand, that way you would have a replicated system in near real-time.

Update for comment:

To recover from mistakes, like deleting a table, PITR would be perfect since you can replay to a specific time. However, for 500 databases I understand that can be a lot of overhead. Slony-I would probably not work, since it is replicating. Not sure how it handles table deletions.

I am not aware of any other ways you can go. What I would do would probably still be going for PITR and just not do any mistakes ;). Jokes aside, depending how frequently mistakes are being made this could be a solution:

  • Set it up for PITR
  • have a second instance ready on standby.
  • When a mistake happens, replay the restore to the point in time on the second instance.
  • Do a pg_dump of the affected database from that instance.
  • Do a pg_restore on the production instance for that database.

However, it would require you to have a second instance ready, either on the same server or a different one (different is recommended). Also, the restore time would be a bit longer since it would require you to do one extra dump and restore.

小矜持 2024-08-03 20:37:36

我认为你这样做的方式是有缺陷的。 您应该拥有一个具有多个模式和角色的数据库。 然后就可以使用PITR了。 然而 PITR 并不能替代转储。

I think the way you are doing this is flawed. You should have one database with multiple schemas and roles. Then you can use PITR. However PITR is not a replacement for dumps.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文