Sorting a PostgreSQL database dump (pg_dump)
I am creating two pg_dumps, DUMP1 and DUMP2.
DUMP1 and DUMP2 are exactly the same, except DUMP2 was dumped in REVERSE order of DUMP1.
Is there any way that I can sort the two dumps so that the two dump files are exactly the same (when using a diff)?
I am using PHP and Linux. I tried using "sort" in Linux, but that does not work...
Thanks!
From your previous question, I assume that what you are really trying to do is compare two databases to see if they are the same, including the data.
As we saw there, pg_dump is not going to behave deterministically. The fact that one file is the reverse of the other is probably just coincidental.
Here is a way that you can do the total comparison including schema and data.
First, compare the schema using this method.
Second, compare the data by dumping it all to a file in an order that will be consistent. Order is guaranteed by first sorting the tables by name and then by sorting the data within each table by primary key column(s).
The query below generates the COPY statements. Running that query will give you a list of statements like
copy (select * from test order by a,b) to STDOUT;
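The generating query itself was omitted from this copy of the answer. A sketch that produces such statements from the PostgreSQL system catalogs (tables sorted by name, each table's rows ordered by its primary-key columns; tables without a primary key are skipped) might look like this:

```sql
-- A sketch, not the author's original query (which was omitted here):
-- emit one COPY statement per table, tables ordered by name,
-- each table's rows ordered by its primary-key columns.
SELECT 'copy (select * from ' || quote_ident(t.relname)
       || ' order by '
       || string_agg(quote_ident(a.attname), ',' ORDER BY k.ord)
       || ') to STDOUT;'
FROM pg_class t
JOIN pg_index i ON i.indrelid = t.oid AND i.indisprimary
JOIN LATERAL unnest(i.indkey) WITH ORDINALITY AS k(attnum, ord) ON true
JOIN pg_attribute a ON a.attrelid = t.oid AND a.attnum = k.attnum
WHERE t.relkind = 'r'
  AND t.relnamespace = 'public'::regnamespace
GROUP BY t.relname
ORDER BY t.relname;
```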
Put those all in a text file and run them through psql for each database, then compare the output files. You may need to tweak the output settings to COPY.
My solution was to write my own program to post-process the pg_dump output. Feel free to download PgDumpSort, which sorts the dump by primary key. With the Java default memory of 512MB it should work with up to 10 million records per table, since the record info (primary key values, file offsets) is held in memory.
You run this little Java program on the dump file; you get a file named "db-sorted.sql" by default, or you can specify an output file name. The sorted data then ends up in a file like "db-2013-06-06.sql".
Now you can create patches using diff
This allows you to create incremental backups, which are usually way smaller. To restore a file you have to apply the patch to the original file, e.g. with the patch utility. (Source code is inside the JAR file.)
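The answer's original command examples were omitted from this copy. A hypothetical workflow (the jar name, arguments, and file names below are assumptions, not documented PgDumpSort usage; check its documentation for the exact invocation) could look like:

```shell
# Hypothetical invocation -- jar name and arguments are assumptions:
java -jar PgDumpSort.jar db.sql db-sorted.sql

# Incremental backup: a patch between two sorted dumps
diff db-sorted-old.sql db-sorted-new.sql > db.patch

# Restore: apply the patch back onto the older sorted dump
patch -o db-restored.sql db-sorted-old.sql db.patch
```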
Here is another solution to the problem: https://github.com/tigra564/pgdump-sort
It allows sorting both DDL and DML including resetting volatile values (like sequence values) to some canonical values to minimize the resulting diff.
You can dump the data in CSV format in a determined order like this:
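The answer's original snippet was omitted from this copy. With placeholder names (table test, columns a and b), a deterministic CSV dump might look like:

```sql
-- Placeholder table/column names; adjust for your schema.
copy (select * from test order by a, b)
    to stdout with (format csv, header);
```

Run via psql against each database and diff the resulting files; psql's client-side \copy variant works the same way if you lack server-side file access.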
See the COPY documentation (v14).
It's probably not worth the effort to parse out the dump.
It will be far, far quicker to restore DUMP2 into a temporary database and dump the temporary in the right order.
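A sketch of that approach, assuming plain-format (SQL) dumps and placeholder database/file names; note that, as mentioned above, pg_dump's output order is not strictly guaranteed, so this works best with the same pg_dump version on the same machine:

```shell
# Restore the reversed dump into a scratch database, then re-dump it
# so both files come from the same pg_dump run order.
createdb tmpdb
psql -d tmpdb -f DUMP2.sql
pg_dump tmpdb > DUMP2-normalized.sql
diff DUMP1.sql DUMP2-normalized.sql
dropdb tmpdb
```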