Why specify a length for character varying types

Posted on 2024-12-02 22:48:12

Referring to the Postgres Documentation on Character Types, I am unclear on the point of specifying a length for character varying (varchar) types.

Assumptions:

  • The length of the string doesn't matter to the application.
  • You don't care if someone stores a value of the maximum size in the database.
  • You have unlimited hard disk space.

It does mention:

The storage requirement for a short string (up to 126 bytes) is 1 byte
plus the actual string, which includes the space padding in the case
of character. Longer strings have 4 bytes of overhead instead of 1.
Long strings are compressed by the system automatically, so the
physical requirement on disk might be less. Very long values are also
stored in background tables so that they do not interfere with rapid
access to shorter column values. In any case, the longest possible
character string that can be stored is about 1 GB. (The maximum value
that will be allowed for n in the data type declaration is less than
that. It wouldn't be useful to change this because with multibyte
character encodings the number of characters and bytes can be quite
different.

This talks about the size of the string, not the size of the field (i.e. it sounds like it will always compress a large string in a large varchar field, but not a small string in a large varchar field?).
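To make the quoted rule concrete, here is a minimal Python sketch of the overhead it describes: 1 byte of overhead for values up to 126 bytes, 4 bytes otherwise. The function name `estimated_storage_bytes` is hypothetical, UTF-8 is an assumed encoding, and the model deliberately ignores the compression and out-of-line storage the passage also mentions, so treat it as a rough illustration rather than how Postgres actually lays out rows.

```python
SHORT_LIMIT_BYTES = 126  # threshold taken from the quoted documentation passage


def estimated_storage_bytes(value: str, encoding: str = "utf-8") -> int:
    """Rough estimate: header overhead plus the encoded bytes of the value.

    Assumption: compression and background-table storage are ignored.
    Note that no declared length (the n in varchar(n)) appears anywhere
    in this calculation; only the stored value matters.
    """
    payload = len(value.encode(encoding))
    header = 1 if payload <= SHORT_LIMIT_BYTES else 4
    return header + payload


if __name__ == "__main__":
    samples = {
        "short ASCII": "abc",
        "126 ASCII characters": "a" * 126,
        "127 ASCII characters": "a" * 127,
        "100 two-byte characters": "\u00e9" * 100,
    }
    for label, text in samples.items():
        print(f"{label}: {len(text)} chars, "
              f"about {estimated_storage_bytes(text)} bytes")
```

Under this simplified model, the estimate is the same whether the column was declared as varchar(500) or varchar(5000000), because the rule in the quoted passage is stated in terms of the string actually stored. The last sample also shows why character counts and byte counts can differ with a multibyte encoding.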

I ask this question because it would be much easier (and lazier) to specify a much larger size so you never have to worry about a string being too large. For example, if I specify varchar(50) for a place name, I will run into place names that have more characters (e.g. Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch), but if I specify varchar(100) or varchar(500), I'm less likely to hit that problem.
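As a small illustration of the place-name example, the following Python sketch checks a value against a few candidate limits. The helper `fits_varchar` is hypothetical and only models a character-count limit; it is not how Postgres itself enforces varchar(n).

```python
PLACE_NAME = "Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch"


def fits_varchar(value: str, n: int) -> bool:
    """Return True if value has at most n characters.

    Assumption: the limit is counted in characters, not bytes, and no
    other rules (such as special handling of trailing spaces) apply.
    """
    return len(value) <= n


if __name__ == "__main__":
    for limit in (50, 100, 500):
        verdict = "fits" if fits_varchar(PLACE_NAME, limit) else "too long"
        print(f"varchar({limit}): {verdict}")
```

The place name is rejected by the 50-character limit but passes the larger ones, which is the trade-off being weighed here: a tight limit acts as validation, while a generous one avoids surprises with legitimate long values.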

So would there be a performance difference between varchar(500) and (arbitrarily) varchar(5000000) or text if your largest string was, say, 400 characters long?

Also, out of interest, if anyone has the answer to this and knows the answer for other databases, please add that too.

I have googled, but not found a sufficiently technical explanation.

Comments (4)

浮生面具三千个 2024-12-09 22:48:12

My understanding is that having constraints is useful for data integrity, therefore I use column sizes to both validate the data items at the lower layer, and to better describe the data model.

Some links on the matter:

昇り龍 2024-12-09 22:48:12

My understanding is that this is a legacy of older databases with storage that wasn't as flexible as that of Postgres. Some would use fixed-length structures to make it easy to find particular records and, since SQL is a somewhat standardized language, that legacy is still seen even when it doesn't provide any practical benefit.

Thus, your "make it big" approach should be an entirely reasonable one with Postgres, but it may not transfer well to other less flexible RDBMS systems.

半世晨晓 2024-12-09 22:48:12

The documentation explains this:

If character varying is used without length specifier, the type accepts strings of any size. The latter is a PostgreSQL extension.

The SQL standard requires a length specification for all its types. This is probably mainly for legacy reasons. Among PostgreSQL users, the preference tends to be to omit the length specification, but if you want to write portable code, you have to include it (and pick an arbitrary size, in many cases).

花想c 2024-12-09 22:48:12

A few more thoughts:

  1. The Postgres doc says that 'very long values are also stored in background tables'. Thus, defining all strings as unbounded likely pushes them into background tables -- for sure a performance hit.

  2. Declaring everything as very long interferes with the DB's efforts to predict a query execution plan, because it has less knowledge of the data.

  3. Building a b-tree to contain an index would also be thrown off because it would not be able to guess a reasonable packing strategy. For example if gender was TEXT, how would you know it's all only M or F?
