How to choose optimized datatypes for columns [InnoDB specific]?

Posted 2024-09-10 09:50:23

I'm learning about the usage of datatypes for databases.

For example:

  • Which is better for email? varchar[100], char[100], or tinyint (joking)
  • Which is better for username? Should I use int, bigint, or varchar?
    Please explain. Some of my friends say that int, bigint, or another numeric datatype is better (Facebook does it). For example, u=123400023 refers to user 123400023, rather than user=thenameoftheuser, since numbers take less time to fetch.
  • Which is better for phone numbers? For posts (as in blogs or announcements)? Or for dates (I use datetime for that)? Maybe someone has done research on this and would like to share.
  • Product price (I use decimal(11,2), don't know about you guys)?
  • Or anything else that you have in mind, like, "I use serial datatype for blablabla".

Why do I mention innodb specifically?

Unless you are using the InnoDB table
types (see Chapter 11, "Advanced
MySQL," for more information), CHAR
columns are faster to access than
VARCHAR.

InnoDB apparently behaves differently here, in a way I don't understand.
I read that from here.

Comments (3)

苦笑流年记忆 2024-09-17 09:50:23

Brief Summary:

(just my opinions)

  1. for email address - VARCHAR(255)
  2. for username - VARCHAR(100) or VARCHAR(255)
  3. for id_username - use INT (unless you plan on over 2 billion users in your system)
  4. phone numbers - INT or VARCHAR or maybe CHAR (depends on if you want to store formatting)
  5. posts - TEXT
  6. dates - DATE or DATETIME (definitely include times for things like posts or emails)
  7. money - DECIMAL(11,2)
  8. misc - see below
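
As a rough illustration of the list above, the choices could look like this in table definitions. The table names, the column names other than id_username, and the phone length are made up for the example; treat it as a sketch of the suggestions, not a recommended final schema.

```sql
-- Hypothetical tables; all names are illustrative only.
CREATE TABLE users (
    id_username INT          NOT NULL AUTO_INCREMENT,  -- signed INT tops out at 2,147,483,647
    username    VARCHAR(100) NOT NULL,
    email       VARCHAR(255) NOT NULL,
    phone       VARCHAR(32)  NULL,      -- a string type keeps leading zeros and formatting
    created_at  DATETIME     NOT NULL,
    PRIMARY KEY (id_username)
) ENGINE=InnoDB;

CREATE TABLE posts (
    post_id     INT      NOT NULL AUTO_INCREMENT,
    id_username INT      NOT NULL,      -- refers to users.id_username
    body        TEXT     NOT NULL,
    posted_at   DATETIME NOT NULL,      -- date and time, as suggested for posts
    PRIMARY KEY (post_id)
) ENGINE=InnoDB;

CREATE TABLE products (
    product_id  INT           NOT NULL AUTO_INCREMENT,
    price       DECIMAL(11,2) NOT NULL, -- exact decimal arithmetic for money
    PRIMARY KEY (product_id)
) ENGINE=InnoDB;
```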

As far as using InnoDB because VARCHAR is supposed to be faster, I wouldn't worry about that, or speed in general. Use InnoDB because you need to do transactions and/or you want to use foreign key constraints (FK) for data integrity. Also, InnoDB uses row level locking whereas MyISAM only uses table level locking. Therefore, InnoDB can handle higher levels of concurrency better than MyISAM. Use MyISAM to use full-text indexes and for somewhat less overhead.

More importantly for speed than the engine type: put indexes on the columns that you need to search on quickly. Always put indexes on your ID/PK columns, such as the id_username that I mentioned.
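
A minimal sketch of that advice, assuming a hypothetical members table: the primary key gets its index automatically, and a secondary index is added for the column that is searched on.

```sql
-- Hypothetical table; names are illustrative only.
CREATE TABLE members (
    id_username INT          NOT NULL AUTO_INCREMENT,
    username    VARCHAR(100) NOT NULL,
    PRIMARY KEY (id_username)           -- the PK is always indexed
) ENGINE=InnoDB;

-- Secondary index for fast lookups by username.
CREATE INDEX idx_members_username ON members (username);

-- EXPLAIN shows whether the optimizer picks the index for this lookup.
EXPLAIN SELECT id_username FROM members WHERE username = 'alice';
```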

More details:

Here's a bunch of questions about MySQL datatypes and database design (warning, more than you asked for):

And a couple questions on when to use the InnoDB engine:

I just use tinyint for almost everything (seriously).

Edit - How to store "posts:"

Below are some links with more details, but here's the short version. For storing "posts," you need room for a long text string. CHAR max length is 255, so that's not an option, and of course CHAR would waste unused characters versus VARCHAR, which is variable length CHAR.

Prior to MySQL 5.0.3, VARCHAR max length was 255, so you'd be left with TEXT. However, in newer versions of MySQL, you can use VARCHAR or TEXT. The choice comes down to preference, but there are a couple of differences. The max length of both VARCHAR and TEXT is now 65,535, but you can set your own max on VARCHAR. Let's say you think your posts will only need to be 2000 characters at most: you can set VARCHAR(2000). If you ever run into the limit, you can ALTER your table later and bump it to VARCHAR(3000). On the other hand, TEXT actually stores its data in a BLOB (1). I've heard that there may be performance differences between VARCHAR and TEXT, but I haven't seen any proof, so you may want to look into that more. Either way, you can always change that minor detail in the future.
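
The "start small, widen later" approach might look like the following sketch. The table name posts_demo is hypothetical. Note that the 65,535 limit is counted in bytes and shared by the whole row, so with a multi-byte character set the largest usable VARCHAR length is smaller.

```sql
-- Hypothetical table; start with the limit you expect to need.
CREATE TABLE posts_demo (
    post_id INT           NOT NULL AUTO_INCREMENT,
    body    VARCHAR(2000) NOT NULL,
    PRIMARY KEY (post_id)
) ENGINE=InnoDB;

-- Later, if posts start hitting the limit, widen the column.
ALTER TABLE posts_demo MODIFY body VARCHAR(3000) NOT NULL;
```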

More importantly, searching this "post" column using a full-text index instead of LIKE would be much faster (2). However, you have to use the MyISAM engine to use a full-text index, because InnoDB doesn't support it (this holds for MySQL versions before 5.6; InnoDB gained FULLTEXT support in 5.6). In a MySQL database, you can have a heterogeneous mix of engines, one per table, so you would just need to make your "posts" table use MyISAM. However, if you absolutely need "posts" to use InnoDB (for transactions), then set up a trigger to update a MyISAM copy of your "posts" table and use the MyISAM copy for all your full-text searches.
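
Here is a sketch of the trigger-maintained copy described above. The names (posts_tx, posts_search, trg_posts_tx_ai) are assumptions for illustration, and only the INSERT trigger is shown; a real setup would also need UPDATE and DELETE triggers to keep the copy in sync.

```sql
-- Transactional source table (hypothetical names throughout).
CREATE TABLE posts_tx (
    post_id INT  NOT NULL AUTO_INCREMENT,
    body    TEXT NOT NULL,
    PRIMARY KEY (post_id)
) ENGINE=InnoDB;

-- MyISAM copy that carries the full-text index.
CREATE TABLE posts_search (
    post_id INT  NOT NULL,
    body    TEXT NOT NULL,
    PRIMARY KEY (post_id),
    FULLTEXT KEY ft_posts_body (body)
) ENGINE=MyISAM;

-- Copy every new post into the search table.
CREATE TRIGGER trg_posts_tx_ai AFTER INSERT ON posts_tx
FOR EACH ROW
    INSERT INTO posts_search (post_id, body) VALUES (NEW.post_id, NEW.body);

-- A LIKE pattern with a leading wildcard has to scan every row.
SELECT post_id FROM posts_search WHERE body LIKE '%innodb%';

-- The same search through the full-text index.
SELECT post_id FROM posts_search WHERE MATCH(body) AGAINST('innodb');
```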

See bottom for some useful quotes.

(3) "Values in VARCHAR columns are
variable-length strings. The length
can be specified as a value from 0 to
255 before MySQL 5.0.3, and 0 to
65,535 in 5.0.3 and later versions.

Before MySQL 5.0.3, if you need a data
type for which trailing spaces are not
removed, consider using a BLOB or TEXT
type.

When CHAR values are stored, they are
right-padded with spaces to the
specified length. When CHAR values are
retrieved, trailing spaces are
removed.

Before MySQL 5.0.3, trailing spaces
are removed from values when they are
stored into a VARCHAR column; this
means that the spaces also are absent
from retrieved values."

Lastly, here's a great post about the pros and cons of VARCHAR versus TEXT. It also speaks to the performance issue:

奈何桥上唱咆哮 2024-09-17 09:50:23

There are multiple angles to approach your question.

From a design POV it is always best to choose the datatype which best expresses the quantity you want to model. That is, get the data domain and data size right so that illegal data cannot be stored in the database in the first place. But that is not where MySQL is strong, and especially not with the default sql_mode (http://dev.mysql.com/doc/refman/5.1/en/server-sql-mode.html). If it works for you, try the TRADITIONAL sql_mode, which is a shorthand for many desirable flags.
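
To see what the TRADITIONAL mode changes, a small experiment along these lines can help. The table mode_demo is hypothetical. In a non-strict mode the two bad inserts are accepted with the values silently truncated or clipped and a warning; under TRADITIONAL they are rejected with an error.

```sql
-- Enable the stricter mode for the current session only.
SET SESSION sql_mode = 'TRADITIONAL';

-- Hypothetical table with deliberately narrow columns.
CREATE TABLE mode_demo (
    code CHAR(3)          NOT NULL,
    qty  TINYINT UNSIGNED NOT NULL      -- valid range 0..255
) ENGINE=InnoDB;

INSERT INTO mode_demo (code, qty) VALUES ('ABC', 1);    -- fits, accepted
INSERT INTO mode_demo (code, qty) VALUES ('ABCD', 1);   -- value too long for CHAR(3)
INSERT INTO mode_demo (code, qty) VALUES ('ABC', 300);  -- out of range for TINYINT UNSIGNED

-- Show the individual flags that TRADITIONAL expands to.
SELECT @@SESSION.sql_mode;
```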

From a performance POV, the question is entirely different. For example, regarding the storage of email bodies, you might want to read http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/ and then think about that.

Removing redundancies and having short keys can be a big win. For example, in a project that I have seen, a log table has been storing http User-Agent information. By simply replacing each user agent string in the log table with a numeric id of a user agent string in a lookup table, data set size was considerably (more than 60%) reduced. By parsing the user agent further and then storing a bunch of ids (operating system, browser type, version index) data set size was reduced to 1% of the original size.
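
The first step of that change, replacing the repeated string with a short numeric key, might be sketched as follows. The table and column names are assumptions, not the schema of the project mentioned above.

```sql
-- Lookup table: each distinct User-Agent string is stored once.
CREATE TABLE user_agents (
    ua_id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_agent VARCHAR(512) NOT NULL,
    PRIMARY KEY (ua_id),
    KEY idx_user_agent (user_agent(100))   -- prefix index to find existing strings
) ENGINE=InnoDB;

-- Log table: a 4-byte id instead of the full string on every row.
CREATE TABLE access_log (
    log_id    BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    logged_at DATETIME        NOT NULL,
    ua_id     INT UNSIGNED    NOT NULL,    -- refers to user_agents.ua_id
    PRIMARY KEY (log_id),
    KEY idx_access_log_ua (ua_id)
) ENGINE=InnoDB;

-- Reading the original string back is a simple join.
SELECT l.logged_at, ua.user_agent
FROM access_log AS l
JOIN user_agents AS ua ON ua.ua_id = l.ua_id;
```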

Finally, there are a number of rules that can help you spot errors in schema design.

For example, anything that has id in the name and is not an unsigned integer type is probably a bug (especially in the context of innodb).

For example, anything that has price or cost in the name and is not unsigned is a potential source of fraud (fraudster creates article with negative price, and buys that).

For example, anything that works on monetary data and is not using the DECIMAL data type of the appropriate size is probably doing math wrong (DECIMAL is doing BCD, decimal paper math with correct precision and rounding, DOUBLE and FLOAT do not).
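
Applied to a table, the three rules could look like the sketch below (the articles table is hypothetical). Newer MySQL releases deprecate UNSIGNED on DECIMAL columns, so a CHECK constraint may be the better way to forbid negative prices there. The final query contrasts exact and approximate arithmetic: literals without an exponent are treated as exact DECIMAL values, while literals written with an exponent are approximate DOUBLE values and subject to binary rounding.

```sql
-- Hypothetical table following the three rules.
CREATE TABLE articles (
    article_id INT UNSIGNED           NOT NULL AUTO_INCREMENT,  -- ids: unsigned integer
    price      DECIMAL(11,2) UNSIGNED NOT NULL,                 -- prices: DECIMAL, never negative
    PRIMARY KEY (article_id)
) ENGINE=InnoDB;

-- Exact (DECIMAL) versus approximate (DOUBLE) comparison of the same sum.
SELECT 0.1 + 0.2 = 0.3       AS decimal_equal,
       0.1E0 + 0.2E0 = 0.3E0 AS double_equal;
```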

司马昭之心 2024-09-17 09:50:23

SQLyog has a "Calculate optimal datatype" feature, which helps find the optimal datatype based on the records already inserted in a table.
It uses the query

SELECT * FROM `table_name` PROCEDURE ANALYSE(1, 10);

to find the optimal datatype. (PROCEDURE ANALYSE was deprecated in MySQL 5.7.18 and removed in MySQL 8.0.)
