MySQL: 7-column PK vs. a single-column MD5 unique constraint
I have a very large table, currently around 70M rows and growing by thousands daily. The schema is tipping over every day now, so I'm moving to a partitioned table and redesigning the DDL.

The table is basically a collection of NOT NULL integers (some MEDIUMINT, some INT, some TINYINT) which need a unique constraint over a set of 7 columns (there are more columns in the table). This is very expensive to compute per insert and further increases the index file size, and since I never retrieve by it, I would prefer to drop it and instead MD5 (or maybe simply concatenate) the values... I don't know yet.

The problem is that the only column type that can hold such a large unique number is a VARCHAR, and I'm questioning whether that PK would actually be better. Also, since I will have a PRIMARY KEY 'part_key' (site_id, id), I will have to take the unique constraint into account when designing the partitioning.

To summarize: I'm sure this is not a new problem, but I wasn't able to find any benchmarks/documents comparing the two. Does anyone have experience with this problem? The question is really whether the PK should be the whole 8 fields (keep in mind this table will probably have more than 100M rows) when I never retrieve by the PK, or just a hashed value of the unique fields.

P.S.: Retrieval is mainly done by two of the 7 columns.
Disk size is not an issue.
Thanks.
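Since the MD5 route is what's being weighed, here is a minimal sketch of what it could look like (the table name big_table and columns c1..c7 are invented for illustration; the real column list isn't given). MD5() returns 32 hex characters, so storing UNHEX(MD5(...)) in a BINARY(16) column sidesteps the wide VARCHAR the question worries about:

CREATE TABLE big_table (
  id       INT UNSIGNED       NOT NULL AUTO_INCREMENT,
  site_id  MEDIUMINT UNSIGNED NOT NULL,
  c1       MEDIUMINT UNSIGNED NOT NULL,
  c2       MEDIUMINT UNSIGNED NOT NULL,
  c3       INT UNSIGNED       NOT NULL,
  c4       INT UNSIGNED       NOT NULL,
  c5       SMALLINT UNSIGNED  NOT NULL,
  c6       TINYINT UNSIGNED   NOT NULL,
  c7       TINYINT UNSIGNED   NOT NULL,
  row_hash BINARY(16)         NOT NULL,  -- UNHEX(MD5(...)) of c1..c7
  PRIMARY KEY (site_id, id),
  UNIQUE KEY uq_row_hash (row_hash),
  KEY k_lookup (c1, c2)                  -- the two columns retrieval uses
) ENGINE=InnoDB;

-- The hash is computed at insert time; CONCAT_WS keeps a separator
-- between values so (1,23) and (12,3) hash differently:
INSERT INTO big_table (site_id, c1, c2, c3, c4, c5, c6, c7, row_hash)
VALUES (1, 10, 20, 30, 40, 50, 60, 70,
        UNHEX(MD5(CONCAT_WS(',', 10, 20, 30, 40, 50, 60, 70))));

Note that once the table is partitioned, MySQL requires every unique key to include all columns of the partitioning expression, which is the constraint the question alludes to.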
Answers (2)
Until MySQL gets partition pruning, I suggest (gulp) denormalizing your tables to fake partitioning: do something like take the modulo 32 of your first value and make 32 tables.

Update: apparently MySQL 5.1.6 and later do support pruning (http://dev.mysql.com/doc/refman/5.1/en/partitioning-pruning.html), so my stronger advice is to upgrade and then let MySQL handle the partitioning for you, possibly using a hash of one of your 7 columns.
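On 5.1+ that advice amounts to something like the following (names reuse the hypothetical big_table above). One wrinkle: MySQL requires every unique key on a partitioned table, the primary key included, to contain all columns in the partitioning expression, so the hashed column has to appear in each unique key:

ALTER TABLE big_table
  PARTITION BY HASH (site_id)  -- or one of the 7 columns, per the advice
  PARTITIONS 32;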
If you can find a good hash that matches your record lookups, then applying your unique constraint on each partition shouldn't be that big of a deal. Smaller partition sizes will make your unique constraint less expensive. (If I'm wrong, I'm sure someone here will school me.)

I'm stuck on MySQL 5.0, so I'm facing manually partitioning a few tables of over 40M rows. I have a document ID that I can hash in my application: floor(docID/10)%100. This can give me 100 partitions, which should keep my index size down significantly. I ran a query on the table and counted the rows in each hash bucket; luckily, I found a very even distribution on my first try. Your own formula will be different, and I have no idea what your distribution will look like. Are you concerned that your unique constraint will not hold up in the face of partitioning?
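The bucket-count check described above boils down to a query along these lines (documents and docID are placeholder names):

SELECT FLOOR(docID / 10) % 100 AS bucket,
       COUNT(*)                AS row_count
FROM documents
GROUP BY bucket
ORDER BY bucket;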
If you can take advantage of MySQL partitioning, it will be more powerful and have less of an impact on your application.