How to generate dumps for various databases

Published 2024-10-01 14:32:23


I have a CSV file and want to generate dumps of the data for sqlite, mysql, postgres, oracle, and mssql.

Is there a common API (ideally Python based) to do this?

I could use an ORM to insert the data into each database and then export dumps, however that would require installing each database. It also seems a waste of resources - these CSV files are BIG.

I am wary of trying to craft the SQL myself because of the variations with each database. Ideally someone has already done this hard work, but I haven't found it yet.


满天都是小星星 2024-10-08 14:32:23


SQLAlchemy is a database library that (as well as ORM functionality) supports SQL generation in the dialects of all the different databases you mention (and more): http://www.sqlalchemy.org/docs/dialects/index.html

In normal use, you could create a SQL expression / instruction (using a schema.Table object), create a database engine, and then bind the instruction to the engine, to generate the SQL.

However, the engine is not strictly necessary; the dialects each have a compiler that can generate the SQL without a connection; the only caveat being that you need to stop it from generating bind parameters as it does by default:

from sqlalchemy.sql import expression
from sqlalchemy import schema, types
import csv

# example for mssql
from sqlalchemy.dialects.mssql import base
dialect = base.dialect()
compiler_cls = dialect.statement_compiler
class NonBindingSQLCompiler(compiler_cls):
    def _create_crud_bind_param(self, col, value, required=False):
        # Instead of creating a bind parameter, render the value as an
        # inline SQL literal.
        return self.render_literal_value(value, col.type)

recipe_table = schema.Table(
    "recipe", schema.MetaData(),
    schema.Column("name", types.String(50), primary_key=True),
    schema.Column("culture", types.String(50)))

for row in [{"name": "fudge", "culture": "america"}]:  # csv.DictReader(open("x.csv", "r")):
    insert = expression.insert(recipe_table, row, inline=True)
    c = NonBindingSQLCompiler(dialect, insert)
    c.compile()
    sql = str(c)
    print(sql)

The above example actually works; it assumes you know the target database table schema; it should be easily adaptable to import from a CSV and generate for multiple target database dialects.
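On newer SQLAlchemy versions (1.4+), the compiler subclass may not even be needed: the public `compile()` API accepts a `literal_binds` compile kwarg that inlines the values for you, and the same statement can be rendered for several dialects in one loop. A sketch of that variant, assuming SQLAlchemy is installed (the `recipe` table and sample row mirror the example above):

```python
# Render one INSERT per row with values inlined as literals, for several
# target dialects, using only the public SQLAlchemy compile() API.
from sqlalchemy import Column, MetaData, String, Table, insert
from sqlalchemy.dialects import mssql, mysql, postgresql, sqlite

recipe = Table(
    "recipe",
    MetaData(),
    Column("name", String(50), primary_key=True),
    Column("culture", String(50)),
)

DIALECTS = {
    "sqlite": sqlite.dialect(),
    "mysql": mysql.dialect(),
    "postgresql": postgresql.dialect(),
    "mssql": mssql.dialect(),
}

def render_inserts(rows, dialect):
    """Compile one INSERT statement per row, literals inlined."""
    out = []
    for row in rows:
        stmt = insert(recipe).values(**row)
        compiled = stmt.compile(
            dialect=dialect, compile_kwargs={"literal_binds": True})
        out.append(str(compiled) + ";")
    return out

rows = [{"name": "fudge", "culture": "america"}]
for name, dialect in DIALECTS.items():
    for line in render_inserts(rows, dialect):
        print(f"-- {name}: {line}")
```

Swapping `rows` for a `csv.DictReader` over your file gives the CSV-to-dump pipeline the answer describes, one output file per dialect.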

孤独患者 2024-10-08 14:32:23


I am no database wizard, but AFAIK there is no common API in Python that would do out of the box what you ask for. There is PEP 249, which defines the API that modules accessing databases should use, and which AFAIK is used at least by the MySQL and PostgreSQL Python modules (here and here); that could perhaps be a starting point.

The road I would attempt to follow myself - however - would be another one:

  1. Import the CSV into MySQL (this is just because MySQL is the one I know best and there is plenty of material on the net, as for example this very easy recipe, but you could follow the same procedure starting from another database).
  2. Generate the MySQL dump.
  3. Process the MySQL dump file in order to modify it to meet SQLite (and others) syntax.

The scripts for processing the dump file can be very compact, although they may be somewhat tricky if you use regexes for parsing the lines. Here's an example MySQL → SQLite script that I simply pasted from this page:

#!/bin/sh 
mysqldump --compact --compatible=ansi --default-character-set=binary mydbname | 
grep -v ' KEY "' | 
grep -v ' UNIQUE KEY "' | 
perl -e 'local $/;$_=<>;s/,\n\)/\n\)/gs;print "begin;\n";print;print "commit;\n"' | 
perl -pe ' 
if (/^(INSERT.+?)\(/) { 
$a=$1; 
s/\\'\''/'\'\''/g; 
s/\\n/\n/g; 
s/\),\(/\);\n$a\(/g; 
} 
' | 
sqlite3 output.db

You could write your script in Python (in which case you should have a look at re.compile for performance).
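A minimal Python sketch of the same dump-rewriting idea as the shell pipeline above, with the patterns precompiled via `re.compile`. The patterns are illustrative only (they cover the KEY-line removal, quote unescaping, and extended-insert splitting from the script), not a complete MySQL → SQLite translator:

```python
import re

# Precompiled patterns (re.compile) for speed on big dumps.
KEY_LINE = re.compile(r'^\s*(UNIQUE )?KEY "')   # index lines SQLite rejects
MULTI_ROW = re.compile(r'\),\(')                # extended-insert row separator
INSERT_HEAD = re.compile(r'^(INSERT[^(]+?)\(')

def mysql_to_sqlite(lines):
    """Rewrite a MySQL dump (produced with --compatible=ansi) for SQLite."""
    out = ["BEGIN;"]
    for line in lines:
        if KEY_LINE.search(line):
            continue                            # drop KEY / UNIQUE KEY lines
        m = INSERT_HEAD.match(line)
        if m:
            head = m.group(1)
            line = line.replace("\\'", "''")    # MySQL escape -> SQL standard
            # Split multi-row extended inserts into one INSERT per row.
            line = MULTI_ROW.sub(");\n" + head + "(", line)
        out.append(line)
    out.append("COMMIT;")
    return "\n".join(out)

dump = ['INSERT INTO "t" VALUES (1,\'a\'),(2,\'b\');']
print(mysql_to_sqlite(dump))
```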

The rationale behind my choice would be:

  1. I get the heavy lifting (importing, and therefore data-consistency checks, plus generating the starting SQL file) done for me by MySQL.
  2. I only need to have one database installed.
  3. I have full control over what is happening, and the possibility to fine-tune the process.
  4. I can structure my script in such a way that it will be very easy to extend it to other databases (basically I would structure it like a parser that recognises individual fields, plus a set of grammars - one for each database - that I can select via a command-line option).
  5. There is much more documentation on the differences between SQL flavours than on single-database import/export libraries.

EDIT: A template-based approach

If for any reason you don't feel confident enough to write the SQL yourself, you could use a sort of template-based script. Here's how I would do it:

  1. Import and generate a dump of the table in all the 4 DB you are planning to use.
  2. For each DB save the initial part of the dump (with the schema declaration and all the rest) and a single insert instruction.
  3. Write a Python script that - for each DB export - will output the "header" of the dump plus the same "saved line", into which you will programmatically substitute the values for each line in your CSV file.

The obvious drawback of this approach is that your "template" will only work for one table. The strongest point of it is that writing such a script would be extremely easy and quick.
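The template approach can be sketched in a few lines. The per-dialect "header" and INSERT snippets below are hypothetical stand-ins for what you would save from real dumps in step 2, and the escaping is the bare minimum (single quotes only); real data needs each dialect's full escaping rules:

```python
import csv
import io

# Hypothetical snippets saved from each database's dump (step 2 above).
TEMPLATES = {
    "sqlite": {
        "header": 'CREATE TABLE "recipe" ("name" TEXT, "culture" TEXT);',
        "insert": 'INSERT INTO "recipe" VALUES ({values});',
    },
    "mysql": {
        "header": "CREATE TABLE `recipe` (`name` VARCHAR(50), `culture` VARCHAR(50));",
        "insert": "INSERT INTO `recipe` VALUES ({values});",
    },
}

def quote(value):
    """Minimal SQL string literal: double any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"

def dump_from_csv(csv_text, dialect):
    """Emit the saved header, then one filled-in INSERT per CSV row."""
    tpl = TEMPLATES[dialect]
    lines = [tpl["header"]]
    for row in csv.reader(io.StringIO(csv_text)):
        lines.append(tpl["insert"].format(
            values=", ".join(quote(v) for v in row)))
    return "\n".join(lines)

print(dump_from_csv("fudge,america\ntoffee,england\n", "mysql"))
```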

HTH at least a bit!

无远思近则忧 2024-10-08 14:32:23


You could do this - Create SQL tables from CSV files

or Generate Insert Statements from CSV file

or try this Generate .sql from .csv python

Of course you might need to tweak the scripts mentioned to suit your needs.
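For the sqlite target specifically, the standard library alone can do both halves of this: create the table from the CSV header row, load the data, and emit the dump with `Connection.iterdump()`, with no server install. A minimal sketch (every column typed as TEXT for simplicity; table and column names are taken straight from the CSV):

```python
import csv
import io
import sqlite3

def load_csv_into_sqlite(csv_text, table, conn):
    """Create a table from the CSV header row and insert the data rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{c}" TEXT' for c in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
    return len(data)

conn = sqlite3.connect(":memory:")
load_csv_into_sqlite("name,culture\nfudge,america\n", "recipe", conn)

# iterdump() then yields the SQL dump line by line.
for line in conn.iterdump():
    print(line)
```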
