Sorting a PostgreSQL database dump (pg_dump)
I am creating two pg_dumps, DUMP1 and DUMP2.
DUMP1 and DUMP2 are exactly the same, except DUMP2 was dumped in REVERSE order of DUMP1.
Is there any way that I can sort the two dumps so that the two dump files are exactly the same (when using a diff)?
I am using PHP and Linux. I tried using "sort" in Linux, but that does not work...
Thanks!
From your previous question, I assume that what you are really trying to do is compare two databases to see if they are the same, including the data.
As we saw there, pg_dump is not going to behave deterministically. The fact that one file is the reverse of the other is probably just coincidental.
Here is a way that you can do the total comparison including schema and data.
First, compare the schema using this method.
Second, compare the data by dumping it all to a file in an order that will be consistent. Order is guaranteed by first sorting the tables by name and then by sorting the data within each table by primary key column(s).
The query below generates the COPY statements. Running that query will give you a list of statements like
copy (select * from test order by a,b) to STDOUT;
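The generating query itself was omitted from this copy of the answer. A sketch that produces such statements from the PostgreSQL system catalogs (tables sorted by name, each table's rows ordered by its primary-key columns; tables without a primary key are skipped) might look like this:

```sql
-- A sketch, not the author's original query (which was omitted here):
-- emit one COPY statement per table, tables ordered by name,
-- each table's rows ordered by its primary-key columns.
SELECT 'copy (select * from ' || quote_ident(t.relname)
       || ' order by '
       || string_agg(quote_ident(a.attname), ',' ORDER BY k.ord)
       || ') to STDOUT;'
FROM pg_class t
JOIN pg_index i ON i.indrelid = t.oid AND i.indisprimary
JOIN LATERAL unnest(i.indkey) WITH ORDINALITY AS k(attnum, ord) ON true
JOIN pg_attribute a ON a.attrelid = t.oid AND a.attnum = k.attnum
WHERE t.relkind = 'r'
  AND t.relnamespace = 'public'::regnamespace
GROUP BY t.relname
ORDER BY t.relname;
```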
Put those all in a text file and run them through psql for each database, then compare the output files. You may need to tweak the output settings to COPY.
My solution was to write my own program to post-process the pg_dump output. Feel free to download PgDumpSort, which sorts the dump by primary key. With the Java default memory of 512MB it should work with up to 10 million records per table, since the record info (primary key values, file offsets) is held in memory.
You run this little Java program on the dump file; you get a file named "db-sorted.sql" by default, or you can specify an output file name. The sorted data then ends up in a file like "db-2013-06-06.sql".
Now you can create patches using diff
This allows you to create incremental backups, which are usually way smaller. To restore a file you have to apply the patch to the original file, e.g. with the patch utility. (Source code is inside the JAR file.)
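The answer's original command examples were omitted from this copy. A hypothetical workflow (the jar name, arguments, and file names below are assumptions, not documented PgDumpSort usage; check its documentation for the exact invocation) could look like:

```shell
# Hypothetical invocation -- jar name and arguments are assumptions:
java -jar PgDumpSort.jar db.sql db-sorted.sql

# Incremental backup: a patch between two sorted dumps
diff db-sorted-old.sql db-sorted-new.sql > db.patch

# Restore: apply the patch back onto the older sorted dump
patch -o db-restored.sql db-sorted-old.sql db.patch
```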
Here is another solution to the problem: https://github.com/tigra564/pgdump-sort
It allows sorting both DDL and DML including resetting volatile values (like sequence values) to some canonical values to minimize the resulting diff.
You can dump the data in CSV format in a determined order like this:
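The answer's original snippet was omitted from this copy. With placeholder names (table test, columns a and b), a deterministic CSV dump might look like:

```sql
-- Placeholder table/column names; adjust for your schema.
copy (select * from test order by a, b)
    to stdout with (format csv, header);
```

Run via psql against each database and diff the resulting files; psql's client-side \copy variant works the same way if you lack server-side file access.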
See the COPY documentation (v14).
It's probably not worth the effort to parse out the dump.
It will be far, far quicker to restore DUMP2 into a temporary database and dump the temporary in the right order.
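A sketch of that approach, assuming plain-format (SQL) dumps and placeholder database/file names; note that, as mentioned above, pg_dump's output order is not strictly guaranteed, so this works best with the same pg_dump version on the same machine:

```shell
# Restore the reversed dump into a scratch database, then re-dump it
# so both files come from the same pg_dump run order.
createdb tmpdb
psql -d tmpdb -f DUMP2.sql
pg_dump tmpdb > DUMP2-normalized.sql
diff DUMP1.sql DUMP2-normalized.sql
dropdb tmpdb
```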