提取数据库样本 - Postgresql

发布于 2025-01-03 16:30:04 字数 234 浏览 1 评论 0原文

我们有一个半大型数据库,需要很长时间才能在我们的个人开发机器上本地导入。我想知道是否有任何工具可以连接到数据库,进行一些分析并提取数据库样本,同时通过传入的参数保留所有关系。类似于 pg_sample --rough_size 1000 --dbhost mydbhost --dbuname uname --dbpword pword 这将创建一个转储,我可以将其导入到我的数据库中以进行测试和开发。但不会花 45 分钟就能完成。

谢谢

we have a semi-large database that takes a long time to import locally on our individual dev machines. I was wondering if there were any tools that could connect to a database, do a little analysis and pull a sample of the database while keeping all the relations via a passed in parameter. Something like pg_sample --rough_size 1000 --dbhost mydbhost --dbuname uname --dbpword pword
this creates a dump that i can import into my database for testing and development. But won't take 45 minutes to do it.

Thanks

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

放我走吧 2025-01-10 16:30:04

我写了这样一个工具: https://github.com/mla/pg_sample

从自述文件中:

pg_sample 是一个实用程序,用于从
更大的 PostgreSQL 数据库。输出和命令行选项
与 pg_dump 备份实用程序非常相似(尽管只有
支持纯文本格式)。

生成的示例数据库包括原始数据库中的所有表,
保持引用完整性,并支持循环依赖。

I wrote such a tool: https://github.com/mla/pg_sample

From the README:

pg_sample is a utility for exporting a small, sample dataset from a
larger PostgreSQL database. The output and command-line options
closely resemble the pg_dump backup utility (although only the
plain-text format is supported).

The sample database produced includes all tables from the original,
maintains referential integrity, and supports circular dependencies.

爺獨霸怡葒院 2025-01-10 16:30:04

我建议您研究 pg_dump --format=custompg_restore -j 12custom 格式允许并行恢复(如果需要,可以转储到原始 SQL,因此您不会失去任何功能)。 pg_restore-j 参数使其并行运行恢复。这可以提供显着的加速。特别是如果你已经把你的maintenance_work_mem调大了。

由于 Erwin 上面概述的原因,它不会直接解决您提出的问题,并且因为它需要一定程度的知识来了解什么是重要的,而什么是不重要的,而这些根本不在模式中。这会占用开发人员的时间,通常应该在构建架构的同时完成。

I suggest you investigate pg_dump --format=custom and pg_restore -j 12. The custom format allows parallel restores (and can be dumped to raw SQL if desired, so you aren't loosing any functionality here). The -j parameter to pg_restore causes it to run the restore in parallel. This can provide a substantial speed up. Especially if you have cranked your maintenance_work_mem up nice and large.

It won't solve the problem you asked about directly for the reasons outlined by Erwin above, and because it requires a level of knowledge about what is important to represent and what isn't that simply isn't in the schema. It's something that takes up developer time and should generally be done at the same time the schema is being built in the first place.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文