What is the most efficient MySQL data type for long bit strings?
I need to store lengthy bit strings, which could be as long as 32768 bits, in a MySQL table. This data will not need to be indexed or full-text searched at any time. If I have read correctly, this size should be well within both my max_allowed_packet setting and the row-size limit of about 65 KB.
Ideally I would like to store the strings (and INSERT them) in 0b format, but this is not a requirement...anything that will give me essentially 1:1 data/size on disk would be great.
BLOBs do not seem to do the job well enough, as a string made up of only ones and zeroes ('010101010101') is treated no differently from normal text and costs me L + 2 bytes, where L is the number of characters. BIT() would be perfect, but it is limited to a maximum length of 64 bits.
Though much of the data (90%+) would be sufficiently represented within an unsigned Bigint, the remaining 10% of rows entice me to find a more elegant solution than splitting them up logically (i.e., searching a secondary table if not found in the first, secondary table using BLOBs for remaining 10% rows, etc.).
An added bonus would be any type that permits bitwise operations, but if not, this is just as easily done outside the MySQL server.
What is the most efficient data type for this purpose?
Comments (2)
I would say it mainly depends on your access pattern. If you can afford to read/write the whole bitstring at each access, then a varbinary(4096) will work fine and be quite compact (only 2 bytes of overhead for the whole field). In this model, one bit on the application side is really represented by one bit in the data storage, and it is up to the client application to interpret it as a bitstring (performing bitwise operations, etc.).
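As a rough illustration of the client-side work this implies, here is a minimal Python sketch that packs a '0'/'1' string into bytes suitable for a varbinary column and unpacks it again. The function names (`pack_bits`, `unpack_bits`, `and_bytes`) are hypothetical, and the sketch assumes the bit length is known or stored separately, because padding zeros are added in the high bits of the first byte.

```python
def pack_bits(bits: str) -> bytes:
    """Pack a string of '0'/'1' characters into bytes (8 bits per byte).

    The value is right-aligned: any padding zeros go into the high bits of
    the first byte, so the original bit length must be tracked separately
    (an assumption of this sketch).
    """
    if set(bits) - {"0", "1"}:
        raise ValueError("bit string may contain only '0' and '1'")
    if not bits:
        return b""
    n_bytes = (len(bits) + 7) // 8
    return int(bits, 2).to_bytes(n_bytes, "big")


def unpack_bits(data: bytes, n_bits: int) -> str:
    """Inverse of pack_bits: rebuild the '0'/'1' string of length n_bits."""
    if n_bits == 0:
        return ""
    return format(int.from_bytes(data, "big"), "0{}b".format(n_bits))


def and_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise AND of two equal-length byte strings, done on the client."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    return bytes(x & y for x, y in zip(a, b))
```

With this packing, a 32768-bit string becomes exactly 4096 bytes, which is what makes varbinary(4096) a fit for the largest values in the question.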
If you want to optimize a bit more, you can imagine a table with two nullable columns: bit1, a bigint, and bit2, a varbinary(4096).
Only one of the two fields is not null for a given record. If bit1 is not null, then it can store 64 bits. For larger bitstrings, bit1 is null, and bit2 is used instead. The client application has to be smart enough to handle all bitwise operations (paying special attention to signed/unsigned issues with bit1).
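The routing between the two columns, and the signed/unsigned caveat, could look roughly like the following Python sketch. The helper names are hypothetical. It assumes bit strings of up to 64 bits go to bit1 as an integer and longer ones go to bit2 as right-aligned bytes, and that leading zeros in short strings are either insignificant or recoverable from a length kept elsewhere. The two conversion helpers only matter if bit1 is declared as a signed BIGINT or the driver hands back signed values.

```python
import struct


def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed two's-complement one."""
    return struct.unpack(">q", struct.pack(">Q", value))[0]


def to_unsigned64(value: int) -> int:
    """Reinterpret a signed 64-bit value as an unsigned one."""
    return struct.unpack(">Q", struct.pack(">q", value))[0]


def split_columns(bits: str):
    """Return (bit1, bit2) for a '0'/'1' string; exactly one is not None."""
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("expected a non-empty string of '0' and '1'")
    if len(bits) <= 64:
        # Short values fit in the 64-bit integer column.
        return int(bits, 2), None
    # Longer values go to the binary column, 8 bits per byte.
    n_bytes = (len(bits) + 7) // 8
    return None, int(bits, 2).to_bytes(n_bytes, "big")
```

For example, a string of 64 ones yields the unsigned value 2^64 - 1, which a signed BIGINT column would only be able to hold as -1. That is the kind of mismatch the client has to account for.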
I guess the BLOB type is what you need. It can represent binary strings of up to 2^16 - 1 bytes and has an overhead of 2 bytes per record (if L is the length in bytes of the value, L+2 bytes is its size on disk).
Then, if you really want to optimize, use two tables, one with BLOB and the other with TINYBLOB (strings of up to 2^8 - 1 bytes, 1 byte overhead), then UNION them together in a VIEW or during SELECT.
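To make the size arithmetic concrete, here is a small Python sketch that picks the smaller of the two types for a given bit length and reports the L+1 or L+2 figure. The function name is hypothetical, and the numbers cover only the value and its length prefix. The actual footprint on disk also depends on the storage engine and row format.

```python
def blob_storage_bytes(n_bits: int):
    """Return (type_name, bytes) for storing n_bits packed 8 bits per byte.

    Uses the L+1 rule for TINYBLOB (L < 2**8) and the L+2 rule for BLOB
    (L < 2**16), where L is the packed length in bytes.
    """
    if n_bits < 0:
        raise ValueError("n_bits must be non-negative")
    n_bytes = (n_bits + 7) // 8
    if n_bytes < 2**8:
        return ("TINYBLOB", n_bytes + 1)
    if n_bytes < 2**16:
        return ("BLOB", n_bytes + 2)
    raise ValueError("value too large for BLOB")
```

For the 32768-bit maximum from the question this gives a BLOB of 4096 + 2 bytes, while a 64-bit value fits in a TINYBLOB of 8 + 1 bytes.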
If you want to optimize even more, use a third table with BIGINT (this will allow storing binary strings up to 58 bits, since the remaining 6 will be needed to store the length of the binary string).
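One possible way to lay out such a length-tagged integer is sketched below in Python. The layout (length in the low 6 bits, payload in the upper 58) and the function names are assumptions made for illustration, since the answer does not specify them. The resulting values can use all 64 bits, so the column would need to be BIGINT UNSIGNED or be converted on the client.

```python
LENGTH_BITS = 6                       # enough to store lengths 0..63
MAX_PAYLOAD_BITS = 64 - LENGTH_BITS   # 58 bits left for the bit string


def encode_tagged(bits: str) -> int:
    """Pack a '0'/'1' string of up to 58 bits plus its length into 64 bits."""
    if set(bits) - {"0", "1"}:
        raise ValueError("bit string may contain only '0' and '1'")
    if len(bits) > MAX_PAYLOAD_BITS:
        raise ValueError("at most 58 bits fit next to the 6-bit length")
    payload = int(bits, 2) if bits else 0
    return (payload << LENGTH_BITS) | len(bits)


def decode_tagged(value: int) -> str:
    """Recover the original bit string, including its leading zeros."""
    length = value & ((1 << LENGTH_BITS) - 1)
    payload = value >> LENGTH_BITS
    if length == 0:
        return ""
    return format(payload, "0{}b".format(length))
```

Storing the length is what lets '0101' and '101' remain distinguishable after a round trip, which a bare integer cannot do.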