Best practices when migrating data from one database schema to another?



Often, when I am working on a project, I find myself looking at the database schema and having to export the data to work with a new schema.

A lot of the time the data stored in the database is fairly crude. What I mean by that is that it is stored with lots of unfiltered characters. I find myself writing custom PHP scripts to filter through this information and create a nice, clean UTF-8 CSV file that I then re-import into my new database.

I'd like to know if there are better ways to handle this?
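
The kind of cleanup script described above can be sketched in a few lines; the sketch below is in Python rather than PHP to match the code elsewhere in this thread, and the file names, source encoding, and "printable characters only" rule are only illustrative:

import csv

def clean(text):
    # keep printable characters plus tabs; drop control bytes and other junk
    return ''.join(ch for ch in text if ch.isprintable() or ch == '\t')

# 'raw_export.csv' and its latin-1 encoding are assumptions for the example
with open('raw_export.csv', encoding='latin-1', newline='') as src, \
     open('clean_export.csv', 'w', encoding='utf-8', newline='') as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        writer.writerow([clean(col) for col in row])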


Comments (3)

栀梦 2024-08-19 04:39:27


I would suggest using an ETL tool, or at least following ETL practices when moving data. Since you are already cleaning, you might as well follow the whole ECCD path -- extract, clean, conform, deliver. If you do your own cleaning, consider saving the intermediate CSV files for debugging and audit purposes.

1. Extract (as is, junk included) to file_1
2. Clean file_1 --> file_2
3. Conform file_2 --> file_3
4. Deliver file_3 --> DB tables

If you archive files 1-3 and keep versioned copies of your scripts, you will be able to backtrack in case of a bug.
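
To make the four steps concrete, here is a minimal Python sketch (not part of the original answer); it assumes step 1 has already produced file_1.csv, and the column layout, cleaning rule, and target table are invented for illustration:

import csv
import sqlite3  # stand-in for whichever target database you deliver to

def transform_file(src, dst, fn):
    # Read one CSV, apply fn to each row, and write the result to the next CSV,
    # keeping every intermediate file on disk for debugging and audit.
    with open(src, encoding='utf-8', errors='replace', newline='') as fin, \
         open(dst, 'w', encoding='utf-8', newline='') as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            writer.writerow(fn(row))

# 2. Clean: drop non-printable characters (file_1 -> file_2)
transform_file('file_1.csv', 'file_2.csv',
               lambda row: [''.join(ch for ch in col if ch.isprintable()) for col in row])

# 3. Conform: reshape columns to fit the new schema (file_2 -> file_3)
transform_file('file_2.csv', 'file_3.csv',
               lambda row: [row[0], row[1].strip().title(), row[2].lower()])

# 4. Deliver: load file_3 into the target table
conn = sqlite3.connect('new_db.sqlite')
with open('file_3.csv', encoding='utf-8', newline='') as f3:
    conn.executemany('INSERT INTO customers (id, name, email) VALUES (?, ?, ?)',
                     csv.reader(f3))
conn.commit()
conn.close()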

ETL tools -- like Microsoft SSIS, Oracle Data Integrator, Pentaho Data Integration -- connect to various data sources and offer plenty of transformation and profiling tasks.

多情出卖 2024-08-19 04:39:27


Not a direct answer to this one, but I once needed to quickly migrate a database and ended up using sqlautocode, a tool that autogenerates a (Python ORM) model from an existing database - the model uses the great sqlalchemy ORM library. It even generates some sample code to get you started... (see below)

Amazingly, it worked out of the box. You do not get a full migration, but an easy way to programmatically access all your tables (in Python).

I didn't do it on that project, but you could of course autogenerate your ORM layer for the target DB as well, and then write a script that transfers the right rows over into the desired structure.

Once you get your DB content into Python, you will be able to deal with u'unicode', even if it takes a few attempts, depending on the actual crudeness...

Example code:

# some example usage
from sqlalchemy import create_engine
# 'metadata' and 'customers' are defined in the model file that sqlautocode
# generated; the module name 'model' here is only illustrative
from model import metadata, customers

if __name__ == '__main__':
    db = create_engine(u'mysql://username:password@localhost/dbname')
    metadata.bind = db  # SQLAlchemy 0.x-style bound metadata, as generated at the time

    # fetch the first 10 rows from the customers table
    s = customers.select().limit(10)
    rs = s.execute()
    for row in rs:
        print(row)
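
Picking up the idea above of also generating (or reflecting) the target schema, a transfer script could look roughly like the sketch below; it is not from the original answer, and the connection strings, table names, and column mapping are placeholders:

# Rough sketch: copy rows from the old schema into the new one using
# SQLAlchemy 1.4+ table reflection (placeholder names throughout).
from sqlalchemy import MetaData, Table, create_engine, insert, select

src_engine = create_engine('mysql://username:password@localhost/old_db')
dst_engine = create_engine('mysql://username:password@localhost/new_db')

src_meta, dst_meta = MetaData(), MetaData()
old_customers = Table('customers', src_meta, autoload_with=src_engine)
new_customers = Table('customers', dst_meta, autoload_with=dst_engine)

with src_engine.connect() as src, dst_engine.begin() as dst:
    for row in src.execute(select(old_customers)):
        # map and clean columns to fit the desired structure
        dst.execute(insert(new_customers).values(
            name=row.name.strip(),
            email=row.email.lower(),
        ))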
榆西 2024-08-19 04:39:27


You can consider Logstash.

Logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (for searching, for example).

Logstash treats every single event/log like a pipe: input | filter | output.
Logstash has many input plugins to accept different sources/formats, and you can use filters to parse your source data and then output to whichever outputs/formats you need.
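
As a rough sketch of what such a pipeline might look like for this question -- pulling a table out of the source database, cleaning one field, and writing a clean CSV -- the configuration below uses the jdbc input, mutate filter, and csv output plugins; the paths, query, and field names are placeholders, not from the original answer:

input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/old_db"
    jdbc_user => "username"
    jdbc_password => "password"
    statement => "SELECT id, name, email FROM customers"
  }
}

filter {
  mutate {
    # strip non-printable characters from the name field
    gsub => [ "name", "[^[:print:]]", "" ]
  }
}

output {
  csv {
    path => "/tmp/customers_clean.csv"
    fields => [ "id", "name", "email" ]
  }
}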
