Best way to compare two database tables of different types and compare their data?
I have two database tables, one in MySQL and one in MSSQL. They hold similar data, and one is derived from the other. They live in two different databases because one belongs to a remote, managed system, while the local one is a Drupal installation that I use to show the data in a friendlier way through a custom module.
For example, I have a table of this sort of structure in MSSQL:
ID | Title | Description | Other fields I don't care about
From the data pulled out of this table I generate a table in MySQL:
local_id | remote_id | title | description
When the module is initialized, it selects from the MSSQL table, generates records, and populates the local database. remote_id holds the ID field from the MSSQL database, so the two records can be matched to each other.
I need to keep this data in sync: delete local records that no longer exist in the remote table, create records that do not yet exist locally, and update the information in all remaining rows.
The problem is that this seems to require at least two separate transactions, possibly with per-row transactions as well. Example:
To sync local to remote and remove records that no longer exist remotely:
SELECT remote_id FROM local_table;
For each remote_id ( SELECT ID, Title, Description FROM remote_table WHERE ID = remote_id )
    If record exists
        UPDATE local_table SET title = Title, description = Description WHERE remote_id = row_id
    Else
        DELETE FROM local_table WHERE remote_id = row_id
Then we need at least one other transaction to get new records (I could update here too if I didn't do it in the previous loop):
SELECT ID, Title, Description FROM remote_table;
For each ID ( SELECT remote_id FROM local_table )
    If does not exist
        INSERT INTO local_table (VALUES)
So that's a lot of database activity. It would be easier if both tables lived in the same kind of database, but as it stands this is the only way I know how to do it. Is there a better way? Could I just pull both result sets into associative arrays, compare them there, and run only the statements needed to delete and create records? I'm unsure.
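The associative-array idea in the question can be sketched as follows. This is a minimal illustration, not a tested Drupal implementation: the function name diff_rows and the sample rows are hypothetical, and both result sets are assumed to fit in memory as dictionaries keyed by the remote ID.

```python
def diff_rows(remote, local):
    """Compare two {id: (title, description)} dicts and return the work to do.

    remote: rows from the remote table, keyed by ID
    local:  rows from the local table, keyed by remote_id
    """
    remote_ids = set(remote)
    local_ids = set(local)

    to_insert = sorted(remote_ids - local_ids)   # on remote, missing locally
    to_delete = sorted(local_ids - remote_ids)   # local rows whose remote row is gone
    to_update = sorted(
        rid for rid in remote_ids & local_ids
        if remote[rid] != local[rid]             # present on both sides, but changed
    )
    return to_insert, to_update, to_delete


# Hypothetical sample data standing in for the two SELECT results.
remote_rows = {1: ("Intro", "First"), 2: ("Setup", "Changed"), 4: ("New", "Added")}
local_rows = {1: ("Intro", "First"), 2: ("Setup", "Old"), 3: ("Gone", "Removed")}

to_insert, to_update, to_delete = diff_rows(remote_rows, local_rows)
print(to_insert, to_update, to_delete)
```

With this shape, each database is read once, and only the rows that actually differ need a write on the local side.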
Comments (2)
There are a lot of ways to do this, depending on the systems you have.
The first assumption I am making is that you have two databases and want to sync data between them,
that is, the MSSQL database must pull data from MySQL and vice versa.
Your approach of using associative arrays is good, but what if there are 100 columns in the table? (Not so in your case, but the approach is not future-proof.)
To update one row you need to make n column comparisons, so with 100 rows there will be 100*n comparisons.
Have a look at MySQL's REPLACE and INSERT ... ON DUPLICATE KEY UPDATE statements, which might help you. I don't know whether MSSQL has equivalent clauses.
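As a rough sketch of the upsert idea: the MySQL statement is shown as a string only, and an in-memory SQLite database with its comparable ON CONFLICT clause stands in so the example runs without a server. The UNIQUE index on remote_id is an assumption (the question does not say one exists), as is a bundled SQLite of version 3.24 or later.

```python
import sqlite3

# The MySQL form the answer refers to (shown for reference, not executed here).
# It assumes a UNIQUE index on local_table.remote_id.
MYSQL_UPSERT = (
    "INSERT INTO local_table (remote_id, title, description) "
    "VALUES (%s, %s, %s) "
    "ON DUPLICATE KEY UPDATE title = VALUES(title), "
    "description = VALUES(description)"
)

# SQLite's comparable clause, used only so the sketch is runnable locally.
SQLITE_UPSERT = (
    "INSERT INTO local_table (remote_id, title, description) "
    "VALUES (?, ?, ?) "
    "ON CONFLICT(remote_id) DO UPDATE SET "
    "title = excluded.title, description = excluded.description"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE local_table ("
    "local_id INTEGER PRIMARY KEY, remote_id INTEGER UNIQUE, "
    "title TEXT, description TEXT)"
)
conn.execute(
    "INSERT INTO local_table (remote_id, title, description) "
    "VALUES (1, 'Old', 'Old text')"
)

# Hypothetical rows fetched from the remote table: one existing, one new.
remote_rows = [(1, "New", "New text"), (2, "Second", "Another row")]
conn.executemany(SQLITE_UPSERT, remote_rows)

upserted = conn.execute(
    "SELECT remote_id, title, description FROM local_table ORDER BY remote_id"
).fetchall()
print(upserted)
```

An upsert covers the insert and update cases in one statement per row, but rows deleted on the remote side still have to be removed separately.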
You can also do other things, such as adding a "last_updated" column to each database table. Whenever any column in a row gets updated, this timestamp field must be updated too.
This way you can tell whether a row in either table was updated (by comparing it to the timestamp you recorded earlier) and update only those rows.
The sync logic would run along those lines.
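The answer's own example did not survive on this page, so here is one possible sketch of the timestamp approach. It assumes a last_updated column can be added to the remote table (which may not be possible on a system you do not administer), and it uses an in-memory SQLite database as a stand-in for MSSQL; changed_since and the sample rows are hypothetical.

```python
import sqlite3

remote = sqlite3.connect(":memory:")  # stand-in for the MSSQL side
remote.execute(
    "CREATE TABLE remote_table ("
    "ID INTEGER PRIMARY KEY, Title TEXT, Description TEXT, last_updated TEXT)"
)
remote.executemany(
    "INSERT INTO remote_table VALUES (?, ?, ?, ?)",
    [
        (1, "Intro", "Unchanged", "2024-01-01 09:00:00"),
        (2, "Setup", "Edited", "2024-01-03 12:30:00"),
        (3, "New", "Added", "2024-01-04 08:15:00"),
    ],
)


def changed_since(conn, last_sync):
    """Return rows modified after last_sync.

    Timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text, which sorts
    chronologically, so a plain string comparison is enough here.
    """
    return conn.execute(
        "SELECT ID, Title, Description FROM remote_table "
        "WHERE last_updated > ? ORDER BY ID",
        (last_sync,),
    ).fetchall()


last_sync = "2024-01-02 00:00:00"  # time of the previous sync, stored locally
changed = changed_since(remote, last_sync)
print(changed)
```

Only the changed rows then need to be written locally. A timestamp cannot reveal rows that were deleted remotely, so deletions still need an ID comparison.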
Rather than doing row-by-row operations, you could do set-based operations.
In order to do that you'll need to add a linked server; see sp_addlinkedserver. You can create a link from SQL Server to any of the servers listed on that page. This includes any database that has an ODBC driver, which MySQL does.
I am not aware whether MySQL is capable of doing the reverse.
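To illustrate what set-based statements could look like, here is a sketch that swaps the linked server for a different arrangement: the remote rows are assumed to have been bulk-copied into a local staging table first. The table name remote_staging is hypothetical, and SQLite stands in for MySQL, so the exact syntax may need adjusting.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE local_table (
        local_id INTEGER PRIMARY KEY, remote_id INTEGER UNIQUE,
        title TEXT, description TEXT);
    CREATE TABLE remote_staging (
        id INTEGER PRIMARY KEY, title TEXT, description TEXT);
    INSERT INTO local_table (remote_id, title, description) VALUES
        (1, 'Intro', 'First'), (2, 'Setup', 'Old'), (3, 'Gone', 'Removed');
    INSERT INTO remote_staging VALUES
        (1, 'Intro', 'First'), (2, 'Setup', 'Changed'), (4, 'New', 'Added');
""")

with db:  # run all three statements in one transaction
    # 1. Remove local rows whose remote row no longer exists.
    db.execute(
        "DELETE FROM local_table "
        "WHERE remote_id NOT IN (SELECT id FROM remote_staging)"
    )
    # 2. Refresh every remaining row (all of them now have a staging match).
    db.execute("""
        UPDATE local_table SET
            title = (SELECT s.title FROM remote_staging s
                     WHERE s.id = local_table.remote_id),
            description = (SELECT s.description FROM remote_staging s
                           WHERE s.id = local_table.remote_id)
    """)
    # 3. Add remote rows that are not present locally yet.
    db.execute("""
        INSERT INTO local_table (remote_id, title, description)
        SELECT id, title, description FROM remote_staging
        WHERE id NOT IN (SELECT remote_id FROM local_table)
    """)

synced = db.execute(
    "SELECT remote_id, title, description FROM local_table ORDER BY remote_id"
).fetchall()
print(synced)
```

Three statements replace the per-row loops from the question, whatever the table size. With a linked server, the same joins could in principle run on the SQL Server side instead.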