Best practices when migrating data from one database schema to another?



Often, when I am working on a project, I find myself looking at the database schema and having to export the data to work with a new schema.

A lot of the time the data stored in the database is fairly crude. What I mean by that is that it is stored with lots of unfiltered characters. I find myself writing custom PHP scripts to filter through this information and create a nice, clean UTF-8 CSV file that I then re-import into my new database.

I'd like to know if there are better ways to handle this?
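
The kind of cleanup script described above can be sketched in a few lines; the sketch below is in Python rather than PHP to match the code elsewhere in this thread, and the file names, source encoding, and "printable characters only" rule are only illustrative:

import csv

def clean(text):
    # keep printable characters plus tabs; drop control bytes and other junk
    return ''.join(ch for ch in text if ch.isprintable() or ch == '\t')

# 'raw_export.csv' and its latin-1 encoding are assumptions for the example
with open('raw_export.csv', encoding='latin-1', newline='') as src, \
     open('clean_export.csv', 'w', encoding='utf-8', newline='') as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        writer.writerow([clean(col) for col in row])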


Comments (3)

栀梦 2024-08-19 04:39:27


I would suggest using an ETL tool, or at least following ETL practices when moving data. Since you are already cleaning, you might as well follow the whole ECCD path -- extract, clean, conform, deliver. If you do your own cleaning, consider saving the intermediate CSV files for debugging and audit purposes.

1. Extract (as is, junk included) to file_1
2. Clean file_1 --> file_2
3. Conform file_2 --> file_3
4. Deliver file_3 --> DB tables

If you archive files 1-3 and keep versioned copies of your scripts, you will be able to backtrack in case of a bug.
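
To make the four steps concrete, here is a minimal Python sketch (not part of the original answer); it assumes step 1 has already produced file_1.csv, and the column layout, cleaning rule, and target table are invented for illustration:

import csv
import sqlite3  # stand-in for whichever target database you deliver to

def transform_file(src, dst, fn):
    # Read one CSV, apply fn to each row, and write the result to the next CSV,
    # keeping every intermediate file on disk for debugging and audit.
    with open(src, encoding='utf-8', errors='replace', newline='') as fin, \
         open(dst, 'w', encoding='utf-8', newline='') as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            writer.writerow(fn(row))

# 2. Clean: drop non-printable characters (file_1 -> file_2)
transform_file('file_1.csv', 'file_2.csv',
               lambda row: [''.join(ch for ch in col if ch.isprintable()) for col in row])

# 3. Conform: reshape columns to fit the new schema (file_2 -> file_3)
transform_file('file_2.csv', 'file_3.csv',
               lambda row: [row[0], row[1].strip().title(), row[2].lower()])

# 4. Deliver: load file_3 into the target table
conn = sqlite3.connect('new_db.sqlite')
with open('file_3.csv', encoding='utf-8', newline='') as f3:
    conn.executemany('INSERT INTO customers (id, name, email) VALUES (?, ?, ?)',
                     csv.reader(f3))
conn.commit()
conn.close()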

ETL tools -- like Microsoft SSIS, Oracle Data Integrator, Pentaho Data Integration -- connect to various data sources and offer plenty of transformation and profiling tasks.

多情出卖 2024-08-19 04:39:27


Not a direct answer to this one, but I once needed to quickly migrate a database and ended up using sqlautocode, a tool that autogenerates a (Python ORM) model from an existing database - the model uses the great sqlalchemy ORM library. It even generates some sample code to get you started... (see below)

Amazingly, it worked out of the box. You do not get a full migration, but an easy way to programmatically access all your tables (in Python).

I didn't do it on that project, but you could of course autogenerate your ORM layer for the target DB as well, and then write a script that transfers the right rows over into the desired structure.

Once you get your DB content into Python, you will be able to deal with u'unicode', even if it takes a few attempts, depending on the actual crudeness...

Example code:

# some example usage
from sqlalchemy import create_engine
# 'metadata' and 'customers' are defined in the model file that sqlautocode
# generated; the module name 'model' here is only illustrative
from model import metadata, customers

if __name__ == '__main__':
    db = create_engine(u'mysql://username:password@localhost/dbname')
    metadata.bind = db  # SQLAlchemy 0.x-style bound metadata, as generated at the time

    # fetch the first 10 rows from the customers table
    s = customers.select().limit(10)
    rs = s.execute()
    for row in rs:
        print(row)
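
Picking up the idea above of also generating (or reflecting) the target schema, a transfer script could look roughly like the sketch below; it is not from the original answer, and the connection strings, table names, and column mapping are placeholders:

# Rough sketch: copy rows from the old schema into the new one using
# SQLAlchemy 1.4+ table reflection (placeholder names throughout).
from sqlalchemy import MetaData, Table, create_engine, insert, select

src_engine = create_engine('mysql://username:password@localhost/old_db')
dst_engine = create_engine('mysql://username:password@localhost/new_db')

src_meta, dst_meta = MetaData(), MetaData()
old_customers = Table('customers', src_meta, autoload_with=src_engine)
new_customers = Table('customers', dst_meta, autoload_with=dst_engine)

with src_engine.connect() as src, dst_engine.begin() as dst:
    for row in src.execute(select(old_customers)):
        # map and clean columns to fit the desired structure
        dst.execute(insert(new_customers).values(
            name=row.name.strip(),
            email=row.email.lower(),
        ))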
榆西 2024-08-19 04:39:27


You can consider Logstash.

Logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use (for searching, for example).

Logstash treats every single event/log like a pipe: input | filter | output.
Logstash has many input plugins to accept different sources/formats, and you can use filters to parse your source data and then output to whichever outputs/formats you need.
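
As a rough sketch of what such a pipeline might look like for this question -- pulling a table out of the source database, cleaning one field, and writing a clean CSV -- the configuration below uses the jdbc input, mutate filter, and csv output plugins; the paths, query, and field names are placeholders, not from the original answer:

input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/old_db"
    jdbc_user => "username"
    jdbc_password => "password"
    statement => "SELECT id, name, email FROM customers"
  }
}

filter {
  mutate {
    # strip non-printable characters from the name field
    gsub => [ "name", "[^[:print:]]", "" ]
  }
}

output {
  csv {
    path => "/tmp/customers_clean.csv"
    fields => [ "id", "name", "email" ]
  }
}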
