Performance advantage of this data model?
I have a MySQL (InnoDB) table 'items' with the following characteristics:
- A large number of rows, and the count keeps increasing.
- A large number of columns of various data types, including 'text'.
- A primary key, 'item_id'.
There are additional requirements as follows:
- Need to query items based on their status
- Need to update status
The above two operations happen quite frequently.
Given the above scenario, I have two questions:
- Would making a separate table with two columns, namely 'item_id' and 'status', with 'item_id' as the primary key, provide increased performance?
- If the above is true, how am I going to tackle querying item_ids based on status?
I am inexperienced in handling databases. I hope you will bear with me :)
Answers (3)
This is called vertical segmentation. It is often used when a data entity has multiple access patterns that touch different subsets of the entity's attributes (table columns) at different frequencies. If one function needs only one or two columns hundreds of times per second, while another application function needs all the other columns but only once or twice a day, then this approach is warranted and will yield a substantial performance improvement.
Basically, as you suggested, you "split" the table into two tables that share the same key, with a one-to-one FK/PK->PK relationship. One table holds only the few columns that are accessed frequently; the other holds the rest of the columns, which are accessed less frequently. You can then index each table separately, according to its own actual access pattern.
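The split can be sketched as follows. This is a minimal illustration, not a production schema: it assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL/InnoDB, and the table name 'items_status' and the columns 'title' and 'body' are hypothetical. The wide 'items' table keeps the rarely used columns, the narrow 'items_status' table keeps only the frequently queried and updated 'status', and both use 'item_id' as their primary key with a one-to-one foreign key between them.

```python
import sqlite3

# SQLite (Python's built-in sqlite3) stands in for MySQL/InnoDB in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
-- Wide, rarely accessed table: every column except status (hypothetical names).
CREATE TABLE items (
    item_id INTEGER PRIMARY KEY,
    title   TEXT,
    body    TEXT
);

-- Narrow, frequently accessed table: same key, one row per item.
CREATE TABLE items_status (
    item_id INTEGER PRIMARY KEY REFERENCES items (item_id),
    status  TEXT NOT NULL
);
CREATE INDEX idx_items_status_status ON items_status (status);
""")


def add_item(item_id, status, title, body):
    """Insert one logical item into both halves in a single transaction."""
    with conn:  # commits on success, rolls back if either insert fails
        conn.execute("INSERT INTO items (item_id, title, body) VALUES (?, ?, ?)",
                     (item_id, title, body))
        conn.execute("INSERT INTO items_status (item_id, status) VALUES (?, ?)",
                     (item_id, status))


def set_status(item_id, status):
    """The frequent status update touches only the narrow table."""
    with conn:
        conn.execute("UPDATE items_status SET status = ? WHERE item_id = ?",
                     (status, item_id))


add_item(1, "new", "First", "long text")
add_item(2, "new", "Second", "more long text")
set_status(2, "done")
print(conn.execute("SELECT item_id, status FROM items_status ORDER BY item_id").fetchall())
# [(1, 'new'), (2, 'done')]
```

Keeping 'status' in its own narrow table means status lookups and updates read and write small rows; whether that actually outperforms a single table with a suitable index depends on the real workload.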
It would make more sense to create an index on your status and item_id columns if those are the only columns you need to fetch.
You can then run a query that uses this index:
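A minimal sketch of that advice, assuming SQLite (via Python's sqlite3 module) as a stand-in for MySQL; the index name 'idx_items_status_item_id' and the sample rows are hypothetical. Because the index contains every column the query reads, the engine can answer it from the index alone (a covering index) without reading the wide rows. SQLite's EXPLAIN QUERY PLAN reports this as a COVERING INDEX; MySQL's EXPLAIN shows "Using index" in the Extra column. In InnoDB specifically, every secondary index entry already stores the primary key, so an index on status alone would also cover item_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical wide table; SQLite stands in for MySQL/InnoDB.
conn.execute("""
    CREATE TABLE items (
        item_id INTEGER PRIMARY KEY,
        status  TEXT NOT NULL,
        title   TEXT,
        body    TEXT
    )
""")
# Composite index holding exactly the columns the query needs.
conn.execute("CREATE INDEX idx_items_status_item_id ON items (status, item_id)")

conn.executemany(
    "INSERT INTO items (item_id, status, title, body) VALUES (?, ?, ?, ?)",
    [(1, "new", "a", "x"), (2, "done", "b", "y"), (3, "new", "c", "z")],
)

query = "SELECT item_id FROM items WHERE status = ? ORDER BY item_id"

# Ask the planner how it will run the query; the last column is the description.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("new",))]
print(plan)

ids = [item_id for (item_id,) in conn.execute(query, ("new",))]
print(ids)  # [1, 3]
```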
Keep in mind that if you don't have many distinct statuses, your query may end up returning a lot of rows and could be slow. It would be better if you can also constrain the query by a more selective column, such as a datetime.
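To illustrate narrowing by a more selective column, here is a sketch assuming a hypothetical 'updated_at' column stored as ISO-8601 text, again using SQLite through Python's sqlite3 module (SQLite has no native DATETIME type; in MySQL this would typically be a DATETIME or TIMESTAMP column). The composite index puts the equality column 'status' first and the range column 'updated_at' second, so the engine can seek to one status and then scan only the requested time window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: 'updated_at' is an ISO-8601 string, which sorts chronologically.
conn.execute("""
    CREATE TABLE items (
        item_id    INTEGER PRIMARY KEY,
        status     TEXT NOT NULL,
        updated_at TEXT NOT NULL
    )
""")
# Equality column first, range column second.
conn.execute("CREATE INDEX idx_items_status_updated ON items (status, updated_at)")

conn.executemany(
    "INSERT INTO items (item_id, status, updated_at) VALUES (?, ?, ?)",
    [
        (1, "new", "2024-01-01 09:00:00"),
        (2, "new", "2024-03-15 12:00:00"),
        (3, "done", "2024-03-16 08:30:00"),
        (4, "new", "2024-03-20 17:45:00"),
    ],
)


def ids_by_status_since(status, since):
    """Return item_ids with the given status updated at or after 'since'."""
    rows = conn.execute(
        "SELECT item_id FROM items"
        " WHERE status = ? AND updated_at >= ?"
        " ORDER BY updated_at",
        (status, since),
    )
    return [item_id for (item_id,) in rows]


print(ids_by_status_since("new", "2024-03-01 00:00:00"))  # [2, 4]
```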
Answering part 2 first, you'd do an inner join of your two tables:
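A sketch of that join, assuming the split proposed in the question: a narrow table (here called 'items_status', a hypothetical name) holding item_id and status, next to the original 'items' table, whose 'title' and 'body' columns are likewise hypothetical. SQLite via Python's sqlite3 module stands in for MySQL. The status filter runs against the narrow table and its index, and the join on the shared primary key brings in the remaining columns for the matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical wide table holding everything except status.
CREATE TABLE items (
    item_id INTEGER PRIMARY KEY,
    title   TEXT,
    body    TEXT
);
-- Hypothetical narrow table proposed in the question.
CREATE TABLE items_status (
    item_id INTEGER PRIMARY KEY REFERENCES items (item_id),
    status  TEXT NOT NULL
);
CREATE INDEX idx_items_status_status ON items_status (status);

INSERT INTO items VALUES (1, 'First', 'x'), (2, 'Second', 'y'), (3, 'Third', 'z');
INSERT INTO items_status VALUES (1, 'new'), (2, 'done'), (3, 'new');
""")

# Filter on the narrow table, then join back on the shared primary key.
rows = conn.execute("""
    SELECT i.item_id, i.title, s.status
    FROM items_status AS s
    INNER JOIN items AS i ON i.item_id = s.item_id
    WHERE s.status = ?
    ORDER BY i.item_id
""", ("new",)).fetchall()

print(rows)  # [(1, 'First', 'new'), (3, 'Third', 'new')]
```

If only the item_ids are needed, querying 'items_status' on its own is enough and no join is required.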
To answer part 1, though, I don't think doing this would gain you any performance advantage.