Extracting a database sample - PostgreSQL
We have a semi-large database that takes a long time to import locally on our individual dev machines. I was wondering if there are any tools that can connect to a database, do a little analysis, and pull a sample of it while keeping all the relations intact, with the sample size controlled by a passed-in parameter. Something like:

pg_sample --rough_size 1000 --dbhost mydbhost --dbuname uname --dbpword pword

This would create a dump that I could import into my database for testing and development, but wouldn't take 45 minutes to do it.

Thanks
I wrote such a tool: https://github.com/mla/pg_sample
From the README:
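A hypothetical invocation sketch follows. The option names here are my best recollection of the project's README, not quoted from it, so verify them with `pg_sample --help` before relying on them; the database, host, and user names are placeholders.

```shell
# Sample roughly 1000 rows per table (pg_sample follows foreign keys so
# referenced rows are included and the sample stays relationally consistent).
# NOTE: option names are assumptions from memory of the README -- check
# `pg_sample --help` for the exact spelling on your version.
pg_sample --limit=1000 --host=mydbhost --username=uname mydb > sample_dump.sql

# Load the sample into a fresh local dev database:
createdb mydb_dev
psql -d mydb_dev -f sample_dump.sql
```

The appeal over a plain `pg_dump` subset is exactly the foreign-key chasing: picking N arbitrary rows per table on your own would leave dangling references.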
I suggest you investigate `pg_dump --format=custom` and `pg_restore -j 12`. The custom format allows parallel restores (and can be converted to raw SQL if desired, so you aren't losing any functionality here). The `-j` parameter to `pg_restore` causes it to run the restore in parallel. This can provide a substantial speed-up, especially if you have cranked your maintenance_work_mem up nice and large.

It won't solve the problem you asked about directly, for the reasons outlined by Erwin above, and because sampling requires a level of knowledge about what is important to represent and what isn't that simply isn't in the schema. That takes up developer time and should generally be done at the same time the schema is being built in the first place.
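A minimal sketch of that workflow, assuming placeholder database, host, and user names (these flags are standard `pg_dump`/`pg_restore` options):

```shell
# Dump in custom format: compressed, and the only format that
# supports parallel restore with pg_restore -j.
pg_dump --format=custom --host=mydbhost --username=uname \
        --file=mydb.dump mydb

# Restore locally with 12 parallel jobs. Raising maintenance_work_mem
# beforehand (postgresql.conf or ALTER SYSTEM) speeds up index builds.
createdb mydb_dev
pg_restore --jobs=12 --dbname=mydb_dev mydb.dump

# If you ever need plain SQL, the custom-format dump converts losslessly:
pg_restore -f mydb.sql mydb.dump
```

Note this still moves the full data set; it only shortens the restore step, not the transfer or the dump itself.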