How do I deal with this scaling problem with soft-deleted items?
I have a database where most tables have a delete flag. The system soft-deletes items, so they are no longer accessible except, for example, by admins.
What worries me is that in a few years, when the tables are much larger, the overall speed of the system will be reduced.
What can I do to counteract effects like that?
- Do I index the delete field?
- Do I move the deleted data to an identical delete table and back when undeleted?
- Do I spread out the data over a few MySQL servers over time? (based on growth)
I'd appreciate any and all suggestions or stories.
UPDATE:
So partitioning seems to be the key to this. But wouldn't partitioning just create two "tables", one with the deleted items and one without them?
So over time the deleted partition will grow large, and the occasional fetches from it will be slow (and slower over time).
Would the speed difference be something I should worry about? I fetch most (if not all) data by some key value (some queries are searches, but those can be slow for this setup).
Comments (3)
I'd partition the table on the DELETE flag. The deleted rows will be physically kept elsewhere, but from SQL's point of view the table remains the same.
Oh, hell yes, index the delete field. You're going to be querying against it all the time, right? Compound indexes with other fields you query against a lot, like parent IDs, might also be a good idea.
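A compound index like that can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs anywhere, and the table name `items`, the columns `is_deleted` and `parent_id`, and the index name are all made up for the example. The planner check at the end is SQLite's `EXPLAIN QUERY PLAN`; on MySQL you would use `EXPLAIN` instead.

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER NOT NULL,"
    " is_deleted INTEGER NOT NULL DEFAULT 0,"
    " body TEXT)"
)
# Compound index: the delete flag plus a column that is filtered on often.
conn.execute(
    "CREATE INDEX idx_items_deleted_parent ON items (is_deleted, parent_id)"
)
# 1000 sample rows spread over 10 parents; every fourth row is soft-deleted.
conn.executemany(
    "INSERT INTO items (parent_id, is_deleted, body) VALUES (?, ?, ?)",
    [(i % 10, int(i % 4 == 0), f"row {i}") for i in range(1000)],
)


def live_children(parent_id):
    """Return the non-deleted rows under one parent."""
    return conn.execute(
        "SELECT id, body FROM items WHERE is_deleted = 0 AND parent_id = ?",
        (parent_id,),
    ).fetchall()


# Ask the planner how it would run the query; the last column of each
# result row is the human-readable detail text.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id, body FROM items WHERE is_deleted = 0 AND parent_id = 3"
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

Whether the flag or the other column should lead the index depends on your data and queries, so it is worth checking the plan on the real tables rather than trusting this sketch.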
Arguably, this decision could be made later if and only if performance problems actually appear. It very much depends on how many rows are added at what rate, your box specs, etc. Obviously, the level of abstraction in your application (and the limitations of any libraries you are using) will help determine how difficult such a change will be.
If it becomes a problem, or you are certain that it will be, start by partitioning on the deleted flag between two tables, one that holds current data and one that holds historical/deleted data. If, as you said, the "deleted" data will only be available to administrators, it is reasonable to suppose that (in most applications) the number of users querying it (here limited to admins) will not be large enough to cause a problem. This means that your admins might need to wait a little longer when searching that particular table, but your user base (arguably more important in most applications) will experience far less latency. If performance becomes unacceptable for the admins, you will likely want to index the user_id (or transaction_id or whatever) field you access the deleted records by. (I generally index every field by which I access the table, but at a certain scale there can be trade-offs regarding which indexes are most worthwhile.)
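For what it's worth, the two-table split could look roughly like this. It is a sketch under assumptions, not a recommendation for any particular schema: SQLite (via Python's sqlite3) stands in for MySQL, and the names `items`, `items_deleted`, `move_row`, `soft_delete` and `undelete` are hypothetical. The one idea it is meant to show is that the copy and the delete happen inside a single transaction, so a row is never in both tables or in neither.

```python
import sqlite3

# SQLite stands in for MySQL; `items` / `items_deleted` are hypothetical names.
conn = sqlite3.connect(":memory:")
for table in ("items", "items_deleted"):
    conn.execute(
        f"CREATE TABLE {table} ("
        " id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, body TEXT)"
    )
# Index the column admins would look deleted rows up by.
conn.execute("CREATE INDEX idx_deleted_user ON items_deleted (user_id)")


def move_row(row_id, source, target):
    """Copy one row into `target` and remove it from `source` atomically.

    `source` and `target` are only ever the two fixed table names above,
    never user input, so formatting them into the SQL is safe here.
    """
    with conn:  # commits on success, rolls back if anything raises
        row = conn.execute(
            f"SELECT id, user_id, body FROM {source} WHERE id = ?", (row_id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"row {row_id} not found in {source}")
        conn.execute(
            f"INSERT INTO {target} (id, user_id, body) VALUES (?, ?, ?)", row
        )
        conn.execute(f"DELETE FROM {source} WHERE id = ?", (row_id,))


def soft_delete(row_id):
    move_row(row_id, "items", "items_deleted")


def undelete(row_id):
    move_row(row_id, "items_deleted", "items")


with conn:
    conn.execute("INSERT INTO items (id, user_id, body) VALUES (1, 42, 'hello')")
soft_delete(1)
```

Calling `undelete(1)` afterwards moves the row back with the same id. In a real schema, foreign keys pointing at the moved row would need some thought, which this sketch ignores.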
Depending on how the data is accessed, there are other simple tricks you can employ. If the admin is looking for a specific record most of the time (as opposed to, say, reading a "history" or "log" of user activity), one can often assume that more recent records will be looked at more often than old ones. Some DBs include tuning options for making recent records easier to find than older records, but you'll have to look that up for your particular database. Failing that, you can do it manually. The easiest way would be to have an ancient_history table that contains all records older than n days, weeks or months, depending on your constraints and suspected usage patterns. Newer data then lives in a much smaller table. Even if the admin is going to "browse" all the records rather than search for a specific one, you can start by showing the most recent n days and have a link to see all days should they not find what they are looking for (e.g., most online banking applications let you browse transactions but show only the most recent 30 days of history unless you request otherwise).
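The manual ancient_history approach might be sketched like this. Again SQLite (through Python's sqlite3) is used only so the example runs as-is; the `history` table, its columns, and the `archive_older_than` helper are invented for illustration, and only the ancient_history name comes from the text above. The idea is a periodic job that moves everything older than a cutoff in one transaction.

```python
import sqlite3
from datetime import date, timedelta

# SQLite stands in for MySQL; `history` and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
for table in ("history", "ancient_history"):
    conn.execute(
        f"CREATE TABLE {table} ("
        " id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL,"
        " created_on TEXT NOT NULL)"  # ISO dates compare correctly as text
    )


def archive_older_than(days, today):
    """Move rows older than `days` days into ancient_history; return the count."""
    cutoff = (today - timedelta(days=days)).isoformat()
    with conn:  # one transaction: copy, then delete
        moved = conn.execute(
            "SELECT COUNT(*) FROM history WHERE created_on < ?", (cutoff,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO ancient_history (id, user_id, created_on) "
            "SELECT id, user_id, created_on FROM history WHERE created_on < ?",
            (cutoff,),
        )
        conn.execute("DELETE FROM history WHERE created_on < ?", (cutoff,))
    return moved


# Sample rows aged 0, 10, 29, 30, 31 and 400 days relative to a fixed "today".
today = date(2024, 1, 31)
with conn:
    conn.executemany(
        "INSERT INTO history (user_id, created_on) VALUES (?, ?)",
        [(1, (today - timedelta(days=d)).isoformat())
         for d in (0, 10, 29, 30, 31, 400)],
    )
moved = archive_older_than(30, today)
```

With a 30-day window, only the two rows that are 31 and 400 days old fall before the cutoff; the row that is exactly 30 days old stays in the small table because the comparison is strict.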
Hopefully you can avoid having to go a step further and shard on user_id or some such scheme. Depending on the scale of the rest of your application, you might have to do this anyway. Unless you are positive that you will need to, I strongly suggest using vertical partitioning first (e.g., keeping your forum_posts on a different machine from your sales_records), as it is FAR easier to set up and maintain. If you end up needing to shard on user_id, I suggest using Google ;-]
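To make the difference between the two concrete, here is a toy routing sketch. Everything in it is an assumption for illustration: the host names are invented, the shard count is arbitrary, and plain modulo routing is just the simplest scheme to show, not necessarily what you would deploy.

```python
# Vertical partitioning: whole tables live on different machines.
# Host names are hypothetical; the table names are the ones used above.
VERTICAL_HOSTS = {
    "forum_posts": "db-forum.example.internal",
    "sales_records": "db-sales.example.internal",
}

# Sharding: rows of one table are spread over several machines by user_id.
# These hosts are also hypothetical.
SHARD_HOSTS = [
    "db-shard0.example.internal",
    "db-shard1.example.internal",
    "db-shard2.example.internal",
]


def host_for_table(table: str) -> str:
    """Vertical partitioning: the table name alone decides the host."""
    return VERTICAL_HOSTS[table]


def host_for_user(user_id: int) -> str:
    """Sharding: the row's user_id decides the host (simple modulo routing)."""
    return SHARD_HOSTS[user_id % len(SHARD_HOSTS)]


print(host_for_table("forum_posts"))
print(host_for_user(7))
```

The vertical lookup never needs to know anything about the row, which is a large part of why it is easier to maintain. With modulo routing, changing the number of shards remaps most users to a different host, so real sharding setups usually need a more careful scheme than this.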
Good luck. BTW, I'm not a DBA so take this with a grain of salt.