What is the best practice for data conversion between applications?
I wonder if this might be too subjective a question for Stack Overflow, but I'll give it a go anyway.
Is there a common or best practice for data migration between applications? Let's say I have Application A, written in Java/J2EE and connected to a PostgreSQL database, and Application B, written in Ruby/Rails and connected to a MySQL database.
I want to migrate my data from Application A to Application B. The table structure, and thus the data model, of A is completely different from that of B, so I want to extract the information from A, change its structure, and insert it into B.
I also have existing information in Application B that is related to information from Application A, based, for example, on an ID common to both applications.
I tried writing a few fancy SQL scripts, but that went nowhere fast.
The last time I faced a project like this, I just wrote a big chunk of code to handle the migration. Is there perhaps a best practice for this? I reckon this is a job developers do quite often. Maybe there are tools or frameworks available?
2 Answers
There is probably not a single best practice, but once you pick an approach, there is a collection of best practices for it.
One strategy is to bring the data into the destination platform in the same (or a very close) model, and then transform it within the destination platform.
For instance, if the destination were SQL Server, I would create another database on the destination server holding straight data copies, table for table (data types are the main thing to watch out for there), and simply use queries against database2.user.table_names to populate the destination data model.
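Something like this is what I have in mind (every database, table, and column name below is made up for illustration):

    -- Staging database on the destination server; hypothetical names throughout.
    CREATE DATABASE database2;
    GO

    -- ...bulk-load each of Application A's tables into database2 as-is,
    -- e.g. with bcp or an import wizard, matching data types as closely
    -- as possible...

    -- Transform step: reshape the staged rows into the destination model.
    INSERT INTO appdb.dbo.accounts (external_id, display_name, created_at)
    SELECT c.customer_id,
           c.first_name + ' ' + c.last_name,
           c.signup_date
    FROM   database2.dbo.customers AS c;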
This eliminates the issues with a heterogeneous source/destination in whatever ETL tool you might be using, and it allows you to create some additional indexes on database2 that might be optimal for your conversion.
Also, your conversion would be in straight SQL, allowing joins to both source and destination simultaneously without any inter-server latency or bandwidth cost.
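That also covers your common-ID case: once the staged copy and the live schema sit on the same server, a single statement can join them on the ID the two applications share. A sketch, again with invented names:

    -- An extra index on the staging copy can speed the join up considerably.
    CREATE INDEX ix_customers_id ON database2.dbo.customers (customer_id);

    -- Pull data from the staged source rows into existing destination rows,
    -- matched on the common ID (hypothetical columns).
    UPDATE a
    SET    a.legacy_notes = s.notes
    FROM   appdb.dbo.accounts      AS a
    JOIN   database2.dbo.customers AS s
           ON s.customer_id = a.external_id;  -- the ID common to both apps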
If you have binary data in your tables or anything like that, obviously things get a lot more complicated.
Broad question, broad answer?

1. Get a copy of the data into the destination database engine.
2. Transform it to your heart's content.

This scales to automated processes by allowing step 1 to be based on the current contents of both the original and the copy.
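If step 1 has to be re-runnable, so an automated job can keep refreshing the copy from whatever the original currently holds, an upsert-style statement is one way to do it. A hypothetical SQL Server MERGE sketch, assuming a linked server named source_link pointing at Application A's database (all names invented):

    -- Re-runnable refresh of the staging copy: insert new rows, update
    -- changed ones, delete rows that have vanished from the original.
    MERGE database2.dbo.customers AS copy
    USING source_link.appA.public.customers AS orig
          ON copy.customer_id = orig.customer_id
    WHEN MATCHED THEN
        UPDATE SET copy.email = orig.email,
                   copy.notes = orig.notes
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (customer_id, email, notes)
        VALUES (orig.customer_id, orig.email, orig.notes)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;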