Syncing MySQL table data from the clients to the master
I have run into a slight problem. The story goes as follows:
I have a document archive system (written in PHP) which runs at multiple clients (23 at present). Each client's system holds only its own documents. Every night, they all need to be 'synced' to a master database on site (a central server). I have access to each MySQL database from the central server, so connecting to them is no problem.
I have a script that connects to the client database and selects all the entries from a table where the sync column = '0000-00-00 00:00:00' (the default, indicating the row hasn't been synced). It then iterates through each record, inserts it into the central server, and sets the sync time on the client database record to the time the script was executed. This works, but the many per-record queries create a large overhead, and I have only just noticed the problem.
Each client can generate up to 2,000 - 3,000-odd documents a day. With numbers like these it is taking far too long (1 sec / 2 documents).
Is there a better solution to my problem? Preferably a PHP-scripted solution, as I need to write logs to check whether everything was successful.
Thanks
EDIT:
My current process is:
1. Select all the un-synced data
2. Begin transaction
3. Insert the record into the central database server
4. Select the document record from the client
5. Insert the document into the central database server
6. Update the sync column on the client
7. Update the sync column on the server
8. Commit transaction
This is a script run on the central server.
Now that I come to think of it, I can remove step 7 and make it part of step 5, but that won't reduce the processing time by much.
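Roughly, that loop looks like the following sketch (the documents table, its columns and the PDO connections are simplified placeholders rather than the real schema):

<?php
// Simplified sketch of the current per-record loop. $client and $central are
// PDO connections; `documents` and its columns are placeholders.
$client  = new PDO('mysql:host=client1.local;dbname=archive', 'user', 'pass');
$central = new PDO('mysql:host=localhost;dbname=archive', 'user', 'pass');

$now = date('Y-m-d H:i:s');
$unsynced = $client->query(
    "SELECT * FROM documents WHERE sync = '0000-00-00 00:00:00'"
);

$central->beginTransaction();
foreach ($unsynced as $row) {
    // One INSERT on the central server per document...
    $central->prepare(
        "INSERT INTO documents (id, client_id, title, body, sync)
         VALUES (?, ?, ?, ?, ?)"
    )->execute([$row['id'], $row['client_id'], $row['title'], $row['body'], $now]);

    // ...and one UPDATE back on the client per document, so every document
    // costs at least two network round trips.
    $client->prepare("UPDATE documents SET sync = ? WHERE id = ?")
           ->execute([$now, $row['id']]);
}
$central->commit();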
4 Answers
I'd suggest using auto_increment_increment to keep all the ids unique over all of the servers. Then, all you need to do is a
SELECT * FROM blah WHERE sync = '0000-00-00 00:00:00'
, and then generate the insert statements and execute them. You won't have to deal with any kind of conflict resolution for conflicting primary keys... As for the long query times, you need to look at the size of your data. If each record is sizable (a few hundred KB or more), it's going to take time...
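A rough sketch of how that might look: auto_increment_increment and auto_increment_offset are the real MySQL server variables, but the table name, columns and PDO connections below are illustrative placeholders.

<?php
// my.cnf on each client gets a distinct offset so auto-increment ids never
// collide across servers, e.g. on client #7:
//   auto_increment_increment = 100
//   auto_increment_offset    = 7

// $client and $central are PDO connections; `documents` and its columns are
// placeholders.
$rows = $client->query(
    "SELECT * FROM documents WHERE sync = '0000-00-00 00:00:00'"
)->fetchAll(PDO::FETCH_ASSOC);

if ($rows) {
    // Build one multi-row INSERT instead of one round trip per document
    // (in practice you would chunk this into batches of a few hundred rows).
    $placeholders = [];
    $values = [];
    $ids = [];
    foreach ($rows as $r) {
        $placeholders[] = '(?, ?, ?, ?)';
        array_push($values, $r['id'], $r['client_id'], $r['title'], $r['body']);
        $ids[] = $r['id'];
    }
    $central->prepare(
        'INSERT INTO documents (id, client_id, title, body) VALUES '
        . implode(', ', $placeholders)
    )->execute($values);

    // Mark everything as synced on the client in a single UPDATE.
    $in = implode(',', array_fill(0, count($ids), '?'));
    $client->prepare("UPDATE documents SET sync = NOW() WHERE id IN ($in)")
           ->execute($ids);
}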
One option may be to create a federated table for each child server's table. Then do the whole thing in SQL on the master.
INSERT INTO master_table SELECT * FROM child_1_table WHERE sync = '0000-00-00 00:00:00'
... You get to avoid pulling all of the data into PHP. You can still run some checks to make sure everything went well, and you can still log since everything is still executed from PHP land...
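A hedged sketch of that approach, run from PHP on the master; the FEDERATED connection string and column list are placeholders, and the table names follow the example above:

<?php
// Assumes a FEDERATED table was created on the master for each client, e.g.:
//   CREATE TABLE child_1_table (
//       id INT NOT NULL, client_id INT, title VARCHAR(255), body LONGTEXT,
//       sync DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
//       PRIMARY KEY (id)
//   ) ENGINE=FEDERATED
//     CONNECTION='mysql://sync_user:secret@client1.example.local:3306/archive/documents';
// All names and the connection string are placeholders.

$master = new PDO('mysql:host=localhost;dbname=archive', 'user', 'pass');
$master->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

try {
    // The copy happens entirely inside MySQL; no document data passes through PHP.
    $copied = $master->exec(
        "INSERT INTO master_table
         SELECT * FROM child_1_table
         WHERE sync = '0000-00-00 00:00:00'"
    );

    // Flag the copied rows on the client through the same federated table.
    // (In practice you would key this off the ids just copied, to avoid
    // flagging rows that arrived between the two statements.)
    $master->exec(
        "UPDATE child_1_table
         SET sync = NOW()
         WHERE sync = '0000-00-00 00:00:00'"
    );

    // Logging still lives in PHP.
    error_log(date('c') . " client 1: copied $copied documents");
} catch (PDOException $e) {
    error_log(date('c') . ' client 1 sync failed: ' . $e->getMessage());
}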
The basic method sounds OK - but taking 0.5 seconds to do one operation is ridiculously excessive - how much data are you pulling across the network? The entire image? Are you doing anything else in the operation? Is there an index on the sync column?
You could get a small benefit by doing an export of the un-synced data on the database:
This would require 3 scripts - 2 on the origin server (one to prepare and send the data, one to flag it as complete) and one on the replicated server to poll for the data and notify the outcome.
But this is probably not going to make big inroads into the processing time, which seems absurdly high if you are only replicating metadata about the image (rather than the image itself).
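A rough sketch of what that export could look like, assuming placeholder file paths, table and column names, and PDO connections (the index on the sync column mentioned above is also worth checking):

<?php
// $client runs against the origin (client) server, $central against the
// master. File paths, table and column names are placeholders.

// 1) On the origin: dump the un-synced rows to a CSV file in one statement.
$client->exec(
    "SELECT * FROM documents
     WHERE sync = '0000-00-00 00:00:00'
     INTO OUTFILE '/tmp/unsynced_docs.csv'
     FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
     LINES TERMINATED BY '\\n'"
);

// 2) After the file has been transferred to the central server (scp, rsync,
//    ...), load it in one statement instead of thousands of INSERTs.
$central->exec(
    "LOAD DATA INFILE '/tmp/unsynced_docs.csv'
     INTO TABLE documents
     FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
     LINES TERMINATED BY '\\n'"
);

// 3) Back on the origin: flag the exported rows as complete.
$client->exec(
    "UPDATE documents SET sync = NOW() WHERE sync = '0000-00-00 00:00:00'"
);

// Separately from the export, an index on the sync column keeps the
// "find un-synced rows" scan cheap:
//   ALTER TABLE documents ADD INDEX idx_sync (sync);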
C.
I know you prefer a PHP based solution, but you might want to check out Microsoft Sync Framework -
http://msdn.microsoft.com/en-in/sync/default(en-us).aspx
This will require the sync module to be written in .NET, but there is a huge advantage in terms of sync logic and exception handling (network failures, sync conflicts, etc.), which will save you time.
The framework handles non-SQL Server databases as well, as long as there is a database connector for .NET. MySQL should be supported quite easily - just take a sample from the following link -
http://code.msdn.microsoft.com/sync/Release/ProjectReleases.aspx?ReleaseId=4835
and adapt it to MySQL.
There's another possibility if you can't use the Sync Framework -
Is it possible for you to distribute the load throughout the day, instead of at the end of the day? Say, trigger synchronization every time 10 new documents come in or 10 edits are done? (This can be done if the synchronization is initiated from the client side.)
In case you want to move the sync logic to the server side, you can consider using message queues to send notifications from the clients to the server whenever a client needs to synchronize. The server can then pull the data. You can use an in-house service bus or on-demand platforms like Azure AppFabric/Amazon SQS for this.
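A minimal sketch of the client-initiated variant, assuming a PDO connection and placeholder table/column names; the actual copy routine is represented by a callable:

<?php
// Sketch only: after saving a document on the client, sync as soon as 10 or
// more un-synced rows have piled up, instead of waiting for the nightly run.
// `documents`/`sync` are placeholders; $pushToMaster stands in for whatever
// function already performs the copy to the central server.
function maybe_sync(PDO $client, callable $pushToMaster, $threshold = 10)
{
    $pending = (int) $client->query(
        "SELECT COUNT(*) FROM documents WHERE sync = '0000-00-00 00:00:00'"
    )->fetchColumn();

    if ($pending >= $threshold) {
        // Same copy logic as the nightly job, just over a much smaller batch.
        $pushToMaster($client);
    }
}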