在“上线”后调整数据库有多容易(或以其他方式)?

发布于 2024-08-25 00:50:01 字数 729 浏览 5 评论 0原文

看起来越来越像我必须在我有时间调整所有查询/表等之前,在我上线网站之前(已经比计划晚了 6 个月,所以尽管这不是理想的情况) - 事情就是这样)。

现在是不得不硬着头皮的情况了。这只是一个尝试计算出当我们“咬住它”时子弹有多大的例子。一旦数据库上线,显然我们不能随意更改数据,因为它是实时数据。我对大多数数据库模式相当有信心 - 例如,表大部分采用第三范式和第四范式,并且使用约束来确保数据完整性。我还在某些列上放入了一些索引,(我认为)将在查询中大量使用,尽管这是非常匆忙完成的并且没有经过测试 - 这是我担心的一点。

需要澄清的是,我并不是在谈论批发结构的变化。表格本身不太可能改变(如果有的话),但是几乎可以保证我必须在某个阶段调整表格(无论是亲自调整还是通过雇用某人)。

我想知道这是一项多大的任务。具体来说,假设数据库有几 GB(到目前为止大约有 300 个表)

假设 50% 的表需要在接下来的几个月内进行调整:

  1. 执行调整需要多长时间(我知道这是一个“如何long 是一段字符串”类型的问题) - 但是所需工作的主要决定因素是什么,以便我可以计算出可能需要多长时间?

  2. 是否可以在重新处理索引时锁定数据库的某些部分(或特定表),或者整个数据库是否需要离线? (我使用 mySQL 5.x 作为数据库)

  3. 我所描述的(在所有表完美调整之前上线)是否风险极高/不明智? (这是否证明到目前为止这给我带来的几个月的不眠之夜是合理的)?

It is looking increasingly like I'll have to go live before I have had the time to tweak all the queries/tables etc, before I go live with a website (already 6 months behind schedule, so all though this is not the ideal scenario - thats how things are).

Its now a case of having to bite the bullet. Its just a case of trying to work out how big that bullet will be when we come to 'biting it'. Once the databse goes live obviously we cant change the data on a whim, because its live data. I am fairly confident on the most of db schema - for e.g. the tables are in most 3 and 4th normal form, and constraints are used to ensure data integrity. I have also put in some indexes on some column that (I think) will be used a lot in queries though this was done quite hurridly and not tested - this is the bit I am worried about.

To clarify, I am not talking about wholesale structure change. The tables themselves are unlikely to change (if ever), however it is almost guaranteed that I will have to tune the tables at some stage (either personally or by hiring someone).

I want to know how much of a task this is. Specifically, assuming a database of a few gigabytes (so far roughly 300 tables)

Assuming 50% of the tables need tuning in the next few months:

  1. How long will it take to perform the tuning (I know this is a "how long is a piece of string" type question) - but what are the main determinants of the effort required, so I can work out how long it is likely to take?

  2. Is it possible to either lock sections of the database, (or specific tables) whilst the indexes are being reworked, or does the entire databse need to go offline? (I am using mySQL 5.x as the db)

  3. Is what I describe (going live before ALL tables perfectly tuned) outrageously risky/inadvisable ? (Does it justify the months of sleepless nights this has caused me so far) ?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

爱你不解释 2024-09-01 00:50:02
  1. 这取决于您要调整的内容。假设您要向几个表添加索引,或者将表类型从 MyISAM 更改为 InnoDB 等,然后使用足够大的表,这些事情可以在 5 到 10 分钟内完成,具体取决于您的硬件。不需要几个小时。也就是说,最好还是在半夜进行实时数据库调整。

  2. 您可以通过调用FLUSH TABLES WITH READ LOCK来获取读锁,但最好在您的应用程序中放置一条“我们正在维护”消息,持续 15-30 分钟。为了安全起见,

  3. 风险是情况所固有的,以及如果出现严重问题会发生什么。我通常采取更牛仔的方法并现场进行工作,特别是如果它们没有处于高负载下,这样我就可以轻松找到痛点并修复它们。如果这是一个关键任务系统,那么不可以,首先进行负载测试或其他任何措施,以确保您已做好准备。另外,请记住,您无法预见将遇到的所有问题。如果您的指数良好,那么您可能可以将其投入使用,看看需要做什么。

  1. It depends on what you're tuning. Let's say you're adding an index to a couple tables, or changing a table type from MyISAM to InnoDB or something, then with a large enough table, those things could be done in 5 to 10 minutes depending on your hardware. It won't take hours. That said, it's still best to do any live-db tuning in the middle of the night.

  2. You can grab a read lock by calling FLUSH TABLES WITH READ LOCK but it's probably better to put up a "we're doing maitenance" message in your app for the 15-30 mins you're doing it, just to be safe.

  3. The risk is inherent to the situation and what happens if there are serious problems. I usually take a more cowboy approach and take stuff live, especially if they aren't under high load so I can easily find pain points and fix them. If this is a mission critical system, then no, load test or whatever you can first to be sure you're as ready as you can be. Also, keep in mind that you cannot foresee all the issues you'll have. If your indexes are good, then you're probably okay to take it live and see what needs to be worked on.

隔纱相望 2024-09-01 00:50:01

一般来说,修复在上线后导致性能问题的不良数据库设计要困难得多,因为您必须处理现有记录。更糟糕的是,糟糕的设计可能要到上线几个月后才会显现出来,当时记录很多而不是很少。这就是为什么数据库在设计时应该考虑到性能(不,这不是过早的优化,有一些已知的技术通常比其他技术表现更好,并且应该在设计中考虑它们)并且数据库应该针对一组测试记录进行测试接近或超过几年后您将拥有的预期记录水平。

至于完全修复一个设计糟糕的数据库需要多长时间,几个月或几年。通常,最糟糕的部分是设计的核心部分(例如 EAV 表),并且几乎需要每个查询/sp/视图。 UDF要进行调整以转向更好的结构。然后,您必须确保所有记录都移动到新的更好的结构中。像这样的错误越早改正越好。将几千条记录迁移到新结构比迁移 1 亿条记录要好得多。

如果你的结构没问题,但你的查询很糟糕,那么你的情况会更好,因为你可以选择前十个表现最差的(选择不仅基于总运行时间,还基于时间 X 运行次数)并修复、冲洗和重复。

如果您正在修复一个糟糕的数据库,这本书可能会派上用场:

http ://www.amazon.com/Refactoring-Databases-Evolutionary-Database-Design/dp/0321293533/ref=sr_1_1?ie=UTF8&s=books&qid=1268158669&sr=8-1

In general it is much harder to fix a poor database design that is causing performance issues after going live becasue you have to deal with the existing records. Even worse, the poor design may not become apparent until months after going live when there are many records instead of a few. This is why databses should be designed with performance in mind (no this is not premature optimization, there are known techniques which generally perform better than other techniques and they shoulod be considered inthe design) and databases should be tested against a test set of records that is close to or more than the expected level of records you would have after a couple of years.

As to how long it will take to completely fix a badly designed database, months or years. Often the worst part is something that is central to the design (like say an EAV table) and which will require almost every query/sp/view. UDF to be adjusted to move to a better structure. You then have to ensure all records are moved to the new better structure. The sooner you can fix a mistake like this the better. Far better to move a couple of thousand records to a new structure than 100,000,000.

If your structure is ok but your queries are bad, you are better off as you can take the top ten worst performing (Choose based not just on total time to run but time X no of times run) and fix, rinse and repeat.

If you are in the midst of fixing a poor database, this book might come in handy:

http://www.amazon.com/Refactoring-Databases-Evolutionary-Database-Design/dp/0321293533/ref=sr_1_1?ie=UTF8&s=books&qid=1268158669&sr=8-1

世界等同你 2024-09-01 00:50:01

我会尝试至少在上线之前量化数据库的限制,以便至少您知道应用程序生成的活动何时接近该阈值。

您可能希望从应用程序模拟(尽可能自动地)数据库的典型使用情况,并检查它在中断之前可以处理多少并发用户/会话/事务等。这至少应该可以让您解决“不眠之夜”的问题。

至于原来的“这有多容易……?”问题的答案显然取决于很多因素。然而,上述分析无疑会有所帮助,因为至少您将能够判断您的数据库是否需要调整。

I would try at least to quantify the limits of the database before going live, so that at least you would know when the activity generated from your application is getting near to that threshold.

You may want to simulate (automatically as much as possible) the typical usage of the database from your application, and check how many concurrent users/sessions/transactions, etc it can handle before it breaks. This, at least, should allow you to solve the "sleepless nights" issue.

As for the original "How easy is it...?" question, the answer obviously depends on many factors. However, the above analysis would undoubtedly help, as at the very least you will be in a position to say whether your database requires tweaking or not.

凉世弥音 2024-09-01 00:50:01

为了回答标题问题,我想说在部署到生产环境后调整数据库相当容易。

在部署到任何环境后提高性能是一个好主意。生产工作和日程安排都增加了一些压力。我建议部署到 Prod,并让它按预期执行。然后开始测量:

  • 在不同时间(高峰时段与下班时段,如果您的应用程序中有这样的概念)运行 Report X 需要多长时间。
  • 用户在这些关键用例中使用该应用程序时的体验如何?

然后备份您的生产环境,并为自己创建一个预生产环境。在那里,您将能够运行升级脚本,以便能够衡量您遇到的“多长时间”类型的问题。索引创建、升级停机时间等。在调整查询等时,您将非常清楚它如何处理生产数据和数据。卷。当然,您不会因为让这些用户同时执行这些插入而受益。

保留多次迭代、失败的升级、新的/未准备好的问题等的备份。

在每次部署后继续进行备份,以便您可以测试数据库的下一轮改进。

To answer the title question, I'd say it's fairly easy to tune your DB after deploying into Production.

It's a great idea to be improving performance after deploying to any environment. Being Production adds a bit of pressure, along with the schedule. I'd suggest deploying to Prod, and let it perform as it will. Then start measuring:

  • how long to run Report X in different times (peak vs after-hours, if there is such a concept in your app).
  • what's the user's experience when using the app for those critical use-cases?

Then take a backup of your Prod environment, and create yourself a pre-Prod environment. There you'll be able to run your upgrade scripts to be able to measure the 'how long' type questions you have. Index creation, upgrade down-times, etc. When tuning queries, etc, you'll have a great idea of how it performs with production data & volumes. Granted, you won't have the benefits of having those users performing those inserts at the same time.

Keep that backup for multiple iterations, failed upgrades, new/unprepared-for issues, etc.

Keep making backups after each deployment, so that you can test the next round of improvements to your DB.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文