Sorting a PostgreSQL database dump (pg_dump)

Posted 2024-08-20 20:57:12


I am creating two pg_dumps, DUMP1 and DUMP2.

DUMP1 and DUMP2 are exactly the same, except that DUMP2 was dumped in the REVERSE order of DUMP1.

Is there any way I can sort the two dumps so that the two dump files are exactly the same (when compared with diff)?

I am using PHP and Linux. I tried using "sort" on Linux, but that does not work...

Thanks!


Answers (5)

哑剧 2024-08-27 20:57:12


From your previous question, I assume that what you are really trying to do is compare two databases to see if they are the same, including the data.

As we saw there, pg_dump is not going to behave deterministically. The fact that one file is the reverse of the other is probably just coincidental.

Here is a way to do a complete comparison, covering both schema and data.

First, compare the schema using this method.

Second, compare the data by dumping it all to a file in an order that will be consistent. Order is guaranteed by first sorting the tables by name and then by sorting the data within each table by primary key column(s).

The query below generates the COPY statements.

-- Generate one COPY statement per table, ordering each table's rows
-- by its primary key columns so the output order is deterministic.
select
    'copy (select * from '||r.relname||' order by '||
    array_to_string(array_agg(a.attname), ',')||
    ') to STDOUT;'
from
    pg_class r,
    pg_constraint c,
    pg_attribute a
where
    r.oid = c.conrelid          -- constraint belongs to the table
    and r.oid = a.attrelid      -- attribute belongs to the table
    and a.attnum = ANY(conkey)  -- keep only the key's columns
    and contype = 'p'           -- primary-key constraints only
    and relkind = 'r'           -- ordinary tables only
group by
    r.relname
order by
    r.relname;

Running that query will give you a list of statements like copy (select * from test order by a,b) to STDOUT; Put them all in a text file, run them through psql against each database, and then compare the output files. You may need to tweak the output settings for COPY.
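As a minimal sketch of that workflow, assuming the generated statements were saved to copy_statements.sql and the two databases are named db1 and db2 (all of these names are placeholders):

# Run the generated COPY statements against both databases,
# capturing each one's data in a deterministic order.
psql -d db1 -f copy_statements.sql > db1_data.txt
psql -d db2 -f copy_statements.sql > db2_data.txt

# No output from diff means the data is identical.
diff db1_data.txt db2_data.txt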

呆萌少年 2024-08-27 20:57:12


My solution was to write my own program for processing the pg_dump output. Feel free to download PgDumpSort, which sorts the dump by primary key. With the default Java heap of 512 MB it should work with up to 10 million records per table, since the record info (primary key value and file offsets) is held in memory.

You invoke this little Java program, e.g. with

java -cp ./pgdumpsort.jar PgDumpSort db.sql

And you get a file named "db-sorted.sql", or specify the output file name:

java -cp ./pgdumpsort.jar PgDumpSort db.sql db-$(date +%F).sql

And the sorted data is in a file like "db-2013-06-06.sql"

Now you can create patches using diff

diff --speed-large-files -uN db-2013-06-05.sql db-2013-06-06.sql >db-0506.diff

This allows you to create incremental backups, which are usually much smaller. To restore a file, you apply the patch to the original file using

 patch -p1 < db-0506.diff

(The source code is inside the JAR file.)

巴黎盛开的樱花 2024-08-27 20:57:12


Here is another solution to the problem: https://github.com/tigra564/pgdump-sort

It allows sorting both DDL and DML, including resetting volatile values (like sequence values) to canonical values, to minimize the resulting diff.

白日梦 2024-08-27 20:57:12


If

  • performance is less important than order,
  • you only care about the data, not the schema,
  • and you are in a position to recreate both dumps (you don't have to work with existing dumps),

you can dump the data in CSV format in a deterministic order like this:

COPY (select * from your_table order by some_col) to stdout
      with csv header delimiter ',';

See the PostgreSQL documentation for COPY (v14).
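As a usage sketch, you could run that statement through psql against each database and diff the CSV files; your_table, some_col, db1, and db2 are all placeholders:

# Export the same table from both databases in a fixed row order.
psql -d db1 -c "copy (select * from your_table order by some_col) to stdout with csv header delimiter ','" > db1_table.csv
psql -d db2 -c "copy (select * from your_table order by some_col) to stdout with csv header delimiter ','" > db2_table.csv

# Identical files produce no diff output.
diff db1_table.csv db2_table.csv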

并安 2024-08-27 20:57:12


It's probably not worth the effort to parse out the dump.

It will be far, far quicker to restore DUMP2 into a temporary database and then dump the temporary database in the right order.
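A minimal sketch of that approach, assuming DUMP2 is a plain-format SQL dump named dump2.sql (the file and database names are placeholders; a custom-format dump would need pg_restore instead of psql):

# Restore the second dump into a scratch database, then re-dump it
# so both files come from the same pg_dump ordering.
createdb tmp_compare
psql -d tmp_compare -f dump2.sql
pg_dump tmp_compare > dump2_redumped.sql
dropdb tmp_compare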
