How to run a big update on a live database?
A web project I'm coding for uses a complex CSV-to-MySQL converter to create its database. This means that to update the db content with the latest changes from the CSVs, a converter is run which truncates the relevant tables (but leaves others that are filled through the website) and refills them with the data from the CSVs.
Yes, this is not a great process but there are good reasons why this approach was chosen over a standard "working on the actual db" approach.
What I'm struggling with is working out the best way to run this update process without hurting the user experience too much. A few numbers to keep in mind:
1) This process has to be run regularly, in the range of every few weeks / once a month
2) The db converter currently takes about an hour and will probably take up to 15 hours in the future, at least if the predictions about the database growth are right (yes, ouch!)
3) The SQL dump of the full database is currently under 20MB (which allows easy import through phpMyAdmin) but will break that barrier soon enough. I guess this shouldn't be a problem, as I can just use SSH upload instead.
Here are some of the alternatives I've thought of, all of which also use a separate database with global settings (these settings are checked on every read/write on the site). Alternative 2 seems to be the worst, as it blocks read access for the whole duration of the conversion, which, as I said, can be quite long. All of them block write access for roughly the same amount of time, which is fine though; it does not prevent users from signing up or doing anything critical like that. I'm quite curious about the feasibility of the third alternative, as it theoretically allows shortening the downtime of the read functionality, since I don't have to upload a big dump.
Has anyone done something like this? I'd appreciate superior alternatives if they're out there, or any feedback on how to improve these and whether to choose 1 or 3. Thanks in advance :)
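The "separate database with global settings" gate described above can be sketched as follows. This is a minimal illustration using SQLite in place of MySQL; the table and column names (`globalsettings`, `readable`, `writeable`) are assumptions, since the question only gives the flag names `globalsettings_booleans_readable`/`_writeable`:

```python
import sqlite3

# Settings database checked on every read/write on the site.
settings = sqlite3.connect(":memory:")
settings.execute("CREATE TABLE globalsettings (readable INTEGER, writeable INTEGER)")
settings.execute("INSERT INTO globalsettings VALUES (1, 1)")

def can_read(conn):
    # Checked before serving any page that reads the converted tables.
    return conn.execute("SELECT readable FROM globalsettings").fetchone()[0] == 1

def can_write(conn):
    # Checked before any user-triggered write. Keeping this flag separate
    # from the read flag is what lets sign-ups keep working during most
    # of the update window.
    return conn.execute("SELECT writeable FROM globalsettings").fetchone()[0] == 1

# Entering maintenance: block writes first; reads are only blocked later,
# during the short import/swap step.
settings.execute("UPDATE globalsettings SET writeable = 0")
print(can_read(settings), can_write(settings))  # → True False
```

The key design point is that the two flags are independent, so each alternative below can keep reads open for as long as possible while writes are already frozen.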
Alternative 1
1) Set globalsettings_booleans_writeable to 0
2) Download current DB (SQL dump)
3) Import downloaded DB locally
4) Run converter (in update mode) on local database
5) Export local DB
6) Set globalsettings_booleans_readable to 0
7) Import exported DB online
8) Set globalsettings_booleans_readable to 1
9) Set globalsettings_booleans_writeable to 1
Alternative 2
1) Set globalsettings_booleans_writeable to 0
2) Set globalsettings_booleans_readable to 0
3) Run converter (in update mode) on live database
4) Set globalsettings_booleans_readable to 1
5) Set globalsettings_booleans_writeable to 1
Alternative 3
1) Set globalsettings_booleans_writeable to 0
2) Create a remote copy of the database
3) Run converter (in update mode) on remote copy
4) Set globalsettings_booleans_readable to 0
5) Replace remote original with remote copy (easy?)
6) Set globalsettings_booleans_readable to 1
7) Set globalsettings_booleans_writeable to 1
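Regarding step 5 of Alternative 3 ("Replace remote original with remote copy"): one way to do this per table is a rename swap, which in MySQL is a single atomic `RENAME TABLE` statement, so the read-blocked window shrinks to milliseconds per table. A sketch of the swap logic, using SQLite renames for illustration (table names are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "products" stands in for one of the CSV-driven tables; "products_new" is
# the converted copy built while the live table kept serving reads.
conn.execute("CREATE TABLE products (id INTEGER, name TEXT)")
conn.execute("INSERT INTO products VALUES (1, 'stale')")
conn.execute("CREATE TABLE products_new (id INTEGER, name TEXT)")
conn.execute("INSERT INTO products_new VALUES (1, 'fresh')")

# Swap: push the live table aside, promote the copy, drop the old data.
# In MySQL this is one atomic statement:
#   RENAME TABLE products TO products_old, products_new TO products;
conn.execute("ALTER TABLE products RENAME TO products_old")
conn.execute("ALTER TABLE products_new RENAME TO products")
conn.execute("DROP TABLE products_old")

print(conn.execute("SELECT name FROM products").fetchone()[0])  # → fresh
```

Note this only covers the table-replacement step; the remote copy in step 2 would still need to be created inside the same MySQL server (e.g. as `_new` tables) for the rename to apply.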
Comments (1)
It seems to me that a lot of exclusivity could be avoided by examining the CSVs to see which records would actually result in a change to the database. It seems like the CSV generator is the actual source of the data, and the database is simply a mirror of it, right?
If so, the CSV records that result in no change could be ignored, the d/b tables left untruncated, and the remaining CSV records run using Alternative 2, which presumably would take only a few minutes.
The major weakness of this approach arises if records are deleted at the source, since there would be no indication that the d/b needs to delete them locally.
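The diffing idea above, including the deletion weakness, can be sketched by keying both the CSV and the mirrored table on the record ID; comparing the two key sets also recovers deletions. All data and column names here are illustrative assumptions:

```python
import csv
import io

# Current state of the mirrored table, keyed by id (illustrative data).
db_rows = {1: "apple", 2: "banana", 3: "cherry"}

# Latest CSV from the generator; row 2 changed, row 3 is gone, row 4 is new.
csv_text = "id,name\n1,apple\n2,blueberry\n4,date\n"
csv_rows = {int(r["id"]): r["name"] for r in csv.DictReader(io.StringIO(csv_text))}

# Records needing an INSERT or UPDATE: present in the CSV but absent or
# different in the db. Unchanged rows (id 1) are skipped entirely.
changed = {k: v for k, v in csv_rows.items() if db_rows.get(k) != v}

# The "weak spot": ids in the db but no longer in the CSV must be deleted,
# which a pure change-detection pass would otherwise miss.
deleted = db_rows.keys() - csv_rows.keys()

print(changed)          # → {2: 'blueberry', 4: 'date'}
print(sorted(deleted))  # → [3]
```

With the deletion set computed explicitly like this, the truncate step becomes unnecessary and the Alternative 2 write-up (and its read-blocked window) shrinks to just the changed and deleted rows.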