Table schema design when using InnoDB

Posted on 2024-09-26 16:23:39

I have run into a problem while designing the table schema for our system.

Here is the situation:

  1. Our system has a lot of items (more than 20 million). Each item has a unique id, but each item can have many records. For example, the item with id 1 has about 5000 records, and each record has more than 20 attributes. A record needs to be identified by its item id and the state of one or more of its attributes for use in SELECT, UPDATE, or DELETE statements.

  2. I want to use InnoDB.

The problem is that an InnoDB table always has a clustered index. Given the situation described above, it seems hard to find a natural clustered index, so I can only use an auto_increment int as the key.

The current design is as follows:

create table record (
  item_key int(10) unsigned NOT NULL AUTO_INCREMENT,
  item_id int(10) unsigned NOT NULL,
  attribute_1 char(32) NOT NULL,
  attribute_2 int(10) unsigned NOT NULL,
  -- ... (remaining attribute columns omitted) ...
  attribute_20 int(10) unsigned NOT NULL,
  PRIMARY KEY (`item_key`),
  KEY `iattribute_1` (`item_id`,`attribute_1`),
  KEY `iattribute_2` (`item_id`,`attribute_2`)
) ENGINE=InnoDB AUTO_INCREMENT=22 DEFAULT CHARSET=latin1;

The SQL statement:

select * from record
  where item_id=1 and attribute_1='a1' and attribute_2 between 10 and 1000;

The UPDATE and DELETE statements are similar.

I don't think this is a good design, but I can't think of anything better. All suggestions are welcome.


Sorry if I didn't make the question clear.

  1. What I want to access (select, update, delete, insert) is the records, not the items.
    The items have their own attributes, but the attributes I mentioned in the description above belong to the records.

  2. Every item can have many records; for example, item 1 has about 5000 records.

  3. Every record has 42 attributes, some of which can be NULL. Every record has a unique id, and this id is unique across different items, but it is a string, not a number.

  4. I want to access the records in this way:

    A. I will only get (or update or delete) the records that belong to one specific item at one time, or in one query.

    B. I will get or update the values of all attributes, or of some specific attributes, in the query.

    C. The attributes in the query condition may not be the same as the attributes I want to retrieve.

So there could be SQL statements like:

select attribute_1, attribute_N from record_table_1 where item_id=1 and attribute_K='some value' and attribute_M between 10 and 100

The reasons why I think the original design is not good are:

  1. I can't choose an attribute or the record id as the primary key, because it would be of no use: in every query I have to specify the item id and some attributes as the query condition (like "where item_id=1 and attribute_1='value1' and attribute_2 between 2 and 3"), so I can only use an auto_increment int as the primary key. As a result, each query has to traverse two B-trees (the secondary index, then the clustered index), and the scan of the secondary index does not seem to be effective.

  2. Compound keys also seem useless, because the query condition can vary across many attributes.

  3. With the original design, it seems that I have to add a lot of indexes to satisfy the different queries; otherwise I have to deal with full table scans. But too many indexes are obviously bad for update, delete, and insert operations.
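
One way to check the concern in point 1 is to ask MySQL for its plan. The statements below are a minimal sketch that assumes the `record` table and indexes from the current design; the index name `idx_item_attr1_attr2` is hypothetical and does not appear in the original schema.

```sql
-- Show which index the optimizer picks for the example query.
-- In the output, `key` names the chosen index and `rows` is the
-- optimizer's estimate of how many rows it expects to examine.
EXPLAIN
SELECT *
FROM record
WHERE item_id = 1
  AND attribute_1 = 'a1'
  AND attribute_2 BETWEEN 10 AND 1000;

-- Hypothetical three-column index: the two equality conditions and the
-- range condition can then all narrow the scan within a single index.
ALTER TABLE record
  ADD KEY idx_item_attr1_attr2 (item_id, attribute_1, attribute_2);
```

Because the query selects every column, each matching entry found in a secondary index still requires a lookup in the clustered index by `item_key`, so narrowing the secondary-index range reduces the number of those lookups rather than eliminating them.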

Comments (2)

温柔一刀 2024-10-03 16:23:39

If you want a clustered index and don't want to use the MyISAM engine, it sounds like you should use two tables: one for the unique properties of the items, and the other for each instance of the item (with the specified attributes).
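
A minimal sketch of that two-table split might look as follows. The table names, the `item_name` column, and the length of `record_id` are assumptions made for illustration; the question only says that the record id is a string.

```sql
-- Hypothetical parent table: one row per item, holding item-level properties.
CREATE TABLE item (
  item_id   INT UNSIGNED NOT NULL,
  item_name VARCHAR(64)  NOT NULL,  -- placeholder for the item's own attributes
  PRIMARY KEY (item_id)
) ENGINE=InnoDB;

-- Hypothetical child table: one row per record of an item.
CREATE TABLE item_record (
  item_id     INT UNSIGNED NOT NULL,
  record_id   CHAR(32)     NOT NULL,  -- string record id; length is assumed
  attribute_1 CHAR(32)     NOT NULL,
  attribute_2 INT UNSIGNED NOT NULL,
  -- remaining attribute columns omitted
  PRIMARY KEY (item_id, record_id),   -- clusters each item's records together
  CONSTRAINT fk_item_record_item
    FOREIGN KEY (item_id) REFERENCES item (item_id)
) ENGINE=InnoDB;
```

With `item_id` as the leading column of the primary key, the rows of one item are stored next to each other in the clustered index, which matches the access pattern of querying one item at a time.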

迷离° 2024-10-03 16:23:39

You're right, the schema is wrong. Having attributes 1..20 as columns in the table is not the way to do this; you need a separate table to store this information. That table would hold the item_key from this record together with its own key and a value, so it could have indexes that allow much better searching.

Something like the following:

[Image: simple database diagram]

Looking at the diagram, it is obvious that something is wrong, because the record table is too empty. It doesn't look right to me, so maybe I'm missing something in the original question.
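
Since the diagram itself is not available here, the following is only a guess at the kind of key/value layout being described. The table `record_attribute` and its column names and sizes are hypothetical; `record.item_key` and `record.item_id` come from the question.

```sql
-- Hypothetical key/value table: one row per (record, attribute) pair.
CREATE TABLE record_attribute (
  item_key   INT UNSIGNED NOT NULL,  -- refers to record.item_key
  attr_name  VARCHAR(32)  NOT NULL,  -- e.g. 'attribute_1'
  attr_value VARCHAR(64)  NOT NULL,  -- every value is stored as a string
  PRIMARY KEY (item_key, attr_name),
  KEY idx_attr_name_value (attr_name, attr_value)
) ENGINE=InnoDB;

-- Records of item 1 whose attribute_1 equals 'a1'.
SELECT r.item_key
FROM record AS r
JOIN record_attribute AS a ON a.item_key = r.item_key
WHERE r.item_id = 1
  AND a.attr_name = 'attribute_1'
  AND a.attr_value = 'a1';
```

The trade-off is that each additional attribute in the condition needs another join against `record_attribute`, and numeric range conditions such as BETWEEN need a cast or a separate numeric value column because the values are stored as strings.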

Compound Keys

I think you may be looking for a compound key rather than a clustered index, which is a different thing. You can achieve this with:

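create table record (
  item_id int(10) unsigned NOT NULL,
  attribute_1 char(32) NOT NULL,
  attribute_2 int(10) unsigned NOT NULL,
  -- ... (remaining attribute columns omitted) ...
  attribute_20 int(10) unsigned NOT NULL,
  -- a primary key must be unique, so this only works if
  -- (item_id, attribute_1, attribute_2) never repeats
  PRIMARY KEY (`item_id`,`attribute_1`,`attribute_2`),
  -- iattribute_1 is a leftmost prefix of the primary key, so it is redundant
  KEY `iattribute_1` (`item_id`,`attribute_1`),
  KEY `iattribute_2` (`item_id`,`attribute_2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;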
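
If the attribute combination cannot be guaranteed unique, a variant worth considering keeps the auto-increment column from the question and puts `item_id` in front of it in the primary key. This is a sketch under that assumption rather than something proposed in the thread; the index name `idx_item_key` is hypothetical.

```sql
CREATE TABLE record (
  item_key    INT UNSIGNED NOT NULL AUTO_INCREMENT,
  item_id     INT UNSIGNED NOT NULL,
  attribute_1 CHAR(32)     NOT NULL,
  attribute_2 INT UNSIGNED NOT NULL,
  -- remaining attribute columns omitted
  -- item_id leads the clustered index, so one item's records are stored together;
  -- item_key keeps every row unique.
  PRIMARY KEY (item_id, item_key),
  -- InnoDB requires an AUTO_INCREMENT column to be the first column of some index.
  KEY idx_item_key (item_key)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
```

A query restricted to `item_id = 1` then reads one contiguous range of the clustered index, and the remaining attribute conditions are evaluated against only that item's rows without a separate secondary-index lookup.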