What size should you choose for (n)varchar columns?

Posted 2024-07-30 19:09:32

Comments (10)

笑,眼淚并存 2024-08-06 19:09:32

One important distinction is between specifying an arbitrarily large limit [e.g. VARCHAR(2000)], and using a datatype that does not require a limit [e.g. VARCHAR(MAX) or TEXT].

PostgreSQL bases all its length-limited VARCHARs on its unlimited TEXT type, and decides dynamically, per value, how to store it, including storing it out-of-page. The length specifier in this case really is just a constraint, and its use is actually discouraged. (ref)

Other DBMSs require the user to choose whether they need "unlimited", out-of-page storage, usually with an associated cost in convenience and/or performance.

If there is an advantage in using VARCHAR(<n>) over VARCHAR(MAX) or TEXT, it follows that you must select a value for <n> when designing your tables. Assuming there is some maximum width of a table row, or index entry, the following constraints must apply:

  1. <n> must be less than or equal to <max width>
  2. if <n> = <max width>, the table/index can have only 1 column
  3. in general, the table/index can only have <x> columns where (on average) <n> = <max width> / <x>

It is therefore not the case that the value of <n> acts only as a constraint, and the choice of <n> must be part of the design. (Even if there is no hard limit in your DBMS, there may well be performance reasons to keep the width within a certain limit.)

You could use the above rules to assign a maximum value of <n>, based on the expected architecture of your table (taking into account the impact of future changes). However, it makes more sense to define the minimum value of <n>, based on the expected data in each column. Most likely, you will expand to the nearest "round number" - e.g. you will always use either VARCHAR(10), VARCHAR(50), VARCHAR(200), or VARCHAR(1000), whichever is the best fit.
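As a rough sketch of that sizing process: the helper below rounds each column's longest expected value up to the nearest "round number" and then checks the total against a row-width limit. The function names, the example columns, and the `MAX_ROW_WIDTH` value are all assumptions made for illustration, not limits taken from any particular DBMS.

```python
# Hypothetical sizing helper. The bucket list mirrors the "round numbers"
# mentioned above; MAX_ROW_WIDTH is an assumed limit, not a real DBMS value.
ROUND_SIZES = (10, 50, 200, 1000)
MAX_ROW_WIDTH = 8000  # assumption: bytes available per row, 1 byte per char


def round_up_size(longest_expected: int) -> int:
    """Return the smallest round size that fits the longest expected value."""
    for size in ROUND_SIZES:
        if longest_expected <= size:
            return size
    raise ValueError(f"{longest_expected} exceeds the largest VARCHAR(n) bucket")


def plan_columns(expected: dict) -> dict:
    """Pick a size per column, then verify the total against the row limit."""
    sizes = {name: round_up_size(length) for name, length in expected.items()}
    total = sum(sizes.values())
    if total > MAX_ROW_WIDTH:
        raise ValueError(f"declared widths total {total}, over {MAX_ROW_WIDTH}")
    return sizes


plan = plan_columns({"state": 3, "city": 40, "address_line_1": 120})
print(plan)  # {'state': 10, 'city': 50, 'address_line_1': 200}
```

The minimum comes from the data (the longest expected value), while the maximum comes from the table design (the row-width check), which matches the two directions described above.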

裂开嘴轻声笑有多痛 2024-08-06 19:09:32

I can only speak for Oracle. A VARCHAR2(50) and a VARCHAR2(255) take up exactly the same amount of space and perform identically, if you enter the value 'SMITH'.

However, the reason it is generally not a good idea to go around declaring all your textual columns as VARCHAR2(4000) is that column length is, effectively, another constraint. And constraints are the database implementation of business rules, so they are definitely something that should be defined on the database side of things.

For example, you define a CHECK constraint on a column so that the only values it can accept are 'Y' and 'N'. That saves your application from having to deal with 'y' and 'n' or even '1' and '0'. The check constraint ensures your data conforms to expected standards. Your application code can then make valid assumptions about the nature of the data it has to deal with.
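A minimal sketch of that 'Y'/'N' check, using Python's built-in `sqlite3` module as a stand-in for the RDBMS. The table name `account` and column name `active` are invented for the example, and other databases may spell the DDL slightly differently.

```python
import sqlite3

# Assumption: SQLite stands in for the RDBMS; table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " active TEXT NOT NULL CHECK (active IN ('Y', 'N'))"
    ")"
)


def try_insert(value: str) -> bool:
    """Return True if the database accepted the value, False if the CHECK rejected it."""
    try:
        conn.execute("INSERT INTO account (active) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        return False


results = {value: try_insert(value) for value in ("Y", "N", "y", "1")}
print(results)  # {'Y': True, 'N': True, 'y': False, '1': False}
```

Because the rule lives in the table definition, every client that writes to the table gets the same behavior without reimplementing the check.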

Column length definition is in the same boat. You declare something to be a VARCHAR2(10) because you don't want it accepting an entry of 'ABC123ZYX456' (for whatever reason!).

In Australia, I define STATE columns to be a varchar2(3) because I don't want people typing in 'New South Wales' or 'South Australia'. The column definition pretty much forces them to be entered as 'NSW' and 'SA'. In that sense, a VARCHAR2(3) is almost as much a check constraint as actually specifying a CHECK IN ('NSW','SA','VIC' etc) constraint.
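The "almost as much a check constraint" point can be illustrated with a small comparison. This is only a sketch: SQLite does not enforce the n in VARCHAR(n), so the length limit is emulated here with `CHECK (length(state) <= 3)`, and the table names `by_length` and `by_list` are hypothetical.

```python
import sqlite3

# Assumption: SQLite ignores the n in VARCHAR(n), so the length limit is
# emulated with CHECK (length(...) <= 3). Table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE by_length (state TEXT CHECK (length(state) <= 3))")
conn.execute(
    "CREATE TABLE by_list (state TEXT CHECK (state IN "
    "('NSW', 'VIC', 'QLD', 'SA', 'WA', 'TAS', 'NT', 'ACT')))"
)


def accepted(table: str, value: str) -> bool:
    """Return True if the given table accepts the value."""
    try:
        conn.execute(f"INSERT INTO {table} (state) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        return False


# Columns printed: value, accepted by length limit, accepted by explicit list.
for value in ("NSW", "New South Wales", "XYZ"):
    print(value, accepted("by_length", value), accepted("by_list", value))
```

Both tables reject 'New South Wales', but only the explicit list rejects a three-letter value such as 'XYZ', which is why the length limit is "almost" a check constraint rather than a full replacement for one.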

In short, proper column lengths are a way of encoding business rules. They're another form of constraint. They bring all the advantages of constraints (and suffer from many of the same drawbacks). And they ensure, to a small extent, a degree of 'data cleanliness' that "proper" constraints help with, too.

I don't buy the argument, either, that it's best to stick these sorts of things in the client app because it's easier to change there. You have 20,000 people using an app, that's 20,000 updates. You have one database, that's one update. The 'easier to change the client app' argument, if true, would potentially mean the database just gets treated as a giant bit bucket with all the clever logic being handled in client code. It's a big discussion to have, but since all RDBMSes let you define constraints and so on in the database itself, it's pretty clear that there's at least a worthwhile case to be made that such fundamental logic belongs in the backend.

倾城°AllureLove 2024-08-06 19:09:32

I have heard the query optimizer does take varchar length into consideration, though I can't find a reference.

Defining a varchar length helps communicate intent. The more constraints defined, the more reliable the data.

鲜血染红嫁衣 2024-08-06 19:09:32

So why do people try to make their columns as small as possible? I don't believe in making them as small as possible, but sizing them appropriately. Some reasons for making (n)varchars smaller rather than larger:

1) With a larger field, all clients that use the database must be able to handle the full size. For example, take a system that holds a United States address with 255 characters per field: (Similar to the TDWTF that you reference, I believe.)

  • First Name
  • Last Name
  • Address Line 1
  • Address Line 2
  • City
  • State
  • ZIP Code

Now your data entry screens will need to allow and show 255 characters per field. Not hard, but unlikely to look nice with larger fields. When printing invoices, you will need line-breaking logic to handle the large fields. Depending on the tool, not that hard.

But I would not want the problem of formatting the address for an envelope that could have 255 characters in each of those fields, or even in just one of them. Are you going to truncate if the field is too long to fit? Great, someone has an Address Line 1 of "House Number Street Number ... blah blah blah ... Apartment number 111." And you'll lop off the important apartment number. Are you going to wrap? How much? What if you just can't fit it in the little box of space on the envelope? Raise an exception and have someone hand-letter it?

2) While 10 characters of data held in a varchar(50) versus a varchar(255) does not impact size or speed, allowing 255 characters allows more space to be taken. And if all fields are that large, you could hit size limits in SQL Server 2000. (I haven't read up on 2005 & 2008 to see if they can handle rows greater than one page.) And with Oracle, the larger sizes allow row chaining to happen if someone actually uses all the available characters.

3) Indexes have stricter size limits than leaf pages. You may preclude indexes, especially composite indexes, if you create your varchars too big.
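Point 3 can be sketched as simple arithmetic: add up the declared widths of the columns in a composite key and compare the worst case against the index key limit. The `MAX_KEY_BYTES` value below is a placeholder assumption; check your own DBMS documentation for the real figure. The column names are invented.

```python
# Assumption: MAX_KEY_BYTES is a placeholder for whatever key-size limit your
# DBMS documents; 900 is used here only as an example value.
MAX_KEY_BYTES = 900


def key_width(columns: dict, bytes_per_char: int = 1) -> int:
    """Worst-case key size: every column filled to its declared length."""
    return sum(columns.values()) * bytes_per_char


def can_index(columns: dict, bytes_per_char: int = 1) -> bool:
    """Return True if the worst-case key fits under the assumed limit."""
    return key_width(columns, bytes_per_char) <= MAX_KEY_BYTES


oversized = {"last_name": 255, "first_name": 255, "city": 255, "state": 255}
right_sized = {"last_name": 50, "first_name": 50, "city": 50, "state": 3}

print(key_width(oversized), can_index(oversized))      # 1020 False
print(key_width(right_sized), can_index(right_sized))  # 153 True
print(can_index(right_sized, bytes_per_char=2))        # nvarchar-style: True
```

With every column declared at 255, a four-column composite key already exceeds the assumed limit, even if the stored values are short.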


On the other hand, I have a long line 1 for my address, and have been frustrated by web sites that don't allow the full thing to be typed.
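The truncate-or-wrap dilemma from point 1 can be sketched with Python's standard `textwrap` module. The 35-character line width, the two-line box, and the address itself are all invented for illustration.

```python
import textwrap

# Assumptions: a 35-character line and a 2-line box are illustrative limits,
# and the address is an invented example.
LINE_WIDTH = 35
MAX_LINES = 2

address = (
    "1234 North Example Boulevard, Building C, Rear Entrance, Apartment 111"
)

# Option 1: truncate to one line; the trailing apartment number is lost.
truncated = address[:LINE_WIDTH]

# Option 2: wrap on word boundaries, then check whether it still fits the box.
wrapped = textwrap.wrap(address, width=LINE_WIDTH)
fits = len(wrapped) <= MAX_LINES

print(truncated)
print(wrapped)
print("fits in box:", fits)
```

Truncation silently drops the apartment number, and wrapping needs three lines for a two-line box, so neither option rescues an over-long value once it has been stored.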

尽揽少女心 2024-08-06 19:09:32

The simple answer, in my opinion, is that you cannot use a varchar(max) column as an index key; if you require any indexing on it, you are basically forced to use full-text indexing. In any case, 'right-sizing' columns makes a lot of sense whenever you [may] want to apply any indexing. Updating variable-length columns can also be a costly maneuver, as these updates are not done in place and can/will cause some amount of fragmentation.

All of this is with regard to MS SQL Server.

狂之美人 2024-08-06 19:09:32

If you are going to print labels, you usually want the string to be no longer than 35 characters. This is why you want some control over the size of the varchar that you use to accept the lines that are going to be printed on labels.

木有鱼丸 2024-08-06 19:09:32

I'll answer your question with a question: if there is no difference to the DBMS between a varchar(50) and a varchar(255), why would the DBMS let you make a distinction? Why wouldn't a DBMS simply say "use varchar for up to xxx characters, and text/clob/etc. for anything over that"? Sure, perhaps Microsoft/Oracle/IBM might keep the length definition for historical reasons, but what about a DBMS like MySQL, which has multiple storage backends? Why does every one of them implement definable character column lengths?

对岸观火 2024-08-06 19:09:32

If you allow the data length to be over 255 and someone links to the data through MS Access, the data cannot be used to join tables (it comes in as a memo field). If the data is exported to Excel, it will be limited to 255 characters per field. Compatibility with other programs should be considered when creating data sets.
Data quality control is all about controlling the data entering your environment. What do you need to store that is over 255 characters? There are times when data needs to be over 255 characters, but they should be few and far between, and such fields should serve as supporting, supplemental information for a field that can be used for analysis.

单调的奢华 2024-08-06 19:09:32

Size does matter, and it CAN affect performance! In MSSQL the execution planner makes assumptions, and those assumptions can hurt. When planning a sort on a varchar column, the planner predicts the average row size as 50% of the declared length, plus a little for overhead. So if you declare varchar(200), the planner estimates a row size of around 110 bytes, and if the rows are actually filled to 80%, the execution plan will be forced to use tempdb to sort the rows instead of sorting in memory. I have made a proof and a full example available here: https://kisunu.no/index.php?pid=20130
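The arithmetic behind that claim can be sketched as follows. The 50% rule and the roughly 10 bytes of overhead are taken from the figures in this answer (varchar(200) giving about 110 bytes), not from vendor documentation, and the row count and helper names are invented for the example.

```python
# Sketch of the estimate described above. The 50% rule and the ~10 bytes of
# overhead come from this answer (varchar(200) -> ~110 bytes), not from
# vendor documentation. The row count is invented.
OVERHEAD_BYTES = 10


def estimated_row_bytes(declared_length: int) -> int:
    """Planner's guess: half the declared length plus a fixed overhead."""
    return declared_length // 2 + OVERHEAD_BYTES


def actual_row_bytes(declared_length: int, fill_percent: int) -> int:
    """Real size when values fill the given percentage of the declared length."""
    return declared_length * fill_percent // 100 + OVERHEAD_BYTES


rows = 1_000_000
granted = rows * estimated_row_bytes(200)   # memory sized from the estimate
needed = rows * actual_row_bytes(200, 80)   # memory the sort really needs

print(granted, needed, needed > granted)  # 110000000 170000000 True
```

Under these assumptions the sort needs more memory than the estimate allows for, which is the situation the answer describes as spilling to tempdb. The same arithmetic in the other direction suggests that a heavily over-declared column makes the estimate larger than necessary.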

南渊 2024-08-06 19:09:32

I didn't see this mentioned elsewhere, so I will add: there can be a performance impact when overallocating VARCHAR size.

Remember that the "variable" aspect of VARCHAR is related to storage on disk. But when the DB needs to read the data into memory it has to allocate memory based on the max possible size of a value in the column. This is because the DB doesn't know how big the data is until it is read off disk.

This effect gets amplified by the volume of data. If you have a small database/workload, you probably won't even notice. However, if you have billions of rows, across hundreds of tables with dozens of max-width VARCHAR columns and a lot of queries, you'll probably notice a strain on the DB's memory resources.
