How are varchar values stored in a SQL Server database?
My fellow programmer has a strange requirement from his team leader; he insisted on creating varchar columns with a length of 16*2^n.
What is the point of such restriction?
I suppose that short strings (fewer than 128 characters, for example) are stored directly in the table's record, and from this point of view the restriction would help to align fields within the record, while larger strings are stored in the database "heap" and only a reference to the string is saved in the table record.
Is that so?
Does this requirement have a reasonable background?
BTW, the DBMS is SQL Server 2008.
Completely pointless restriction as far as I can see. Assuming the standard FixedVar format (as opposed to the formats used with row/page compression or sparse columns) and assuming you are talking about varchar(1-8000) columns:
All varchar data is stored at the end of the row in a variable-length section (or in off-row pages if it can't fit in the row). The amount of space it consumes in that section (and whether or not it ends up off row) depends entirely on the length of the actual data, not on the column declaration.
SQL Server will use the length declared in the column declaration when allocating memory (e.g. for sort operations). The assumption it makes in that instance is that varchar columns will be filled to 50% of their declared size on average, so this might be a better thing to look at when choosing a size.
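To make the contrast concrete, here is a rough Python sketch (not SQL Server code) of the two sizes discussed above. stored_row_bytes approximates a FixedVar record under stated assumptions (single-byte code page, no NULLs, no compression, no row-versioning tag); estimated_sort_row_bytes applies the 50% rule for the memory estimate. Both function names are hypothetical, and the byte counts are a simplified model rather than the engine's exact calculation.

```python
import math

# Simplified FixedVar record layout (assumption: no NULLs, no compression,
# no row-versioning tag, single-byte code page so 1 character == 1 byte).
ROW_HEADER_BYTES = 4    # status bits A and B (1 byte each) + 2-byte fixed-length size
COLUMN_COUNT_BYTES = 2  # total number of columns
VAR_COUNT_BYTES = 2     # number of variable-length columns
VAR_OFFSET_BYTES = 2    # one end-offset entry per variable-length column


def stored_row_bytes(fixed_column_bytes, varchar_values, declared_lengths):
    """Rough in-row size of one record (hypothetical helper)."""
    if len(varchar_values) != len(declared_lengths):
        raise ValueError("need one declared length per varchar value")
    for value, declared in zip(varchar_values, declared_lengths):
        # The declared length only constrains the data...
        if len(value) > declared:
            raise ValueError(f"{value!r} does not fit in varchar({declared})")
    n_columns = len(fixed_column_bytes) + len(varchar_values)
    size = (ROW_HEADER_BYTES + sum(fixed_column_bytes)
            + COLUMN_COUNT_BYTES + math.ceil(n_columns / 8))  # null bitmap
    if varchar_values:
        # ...while storage follows the actual lengths of the values.
        size += (VAR_COUNT_BYTES + VAR_OFFSET_BYTES * len(varchar_values)
                 + sum(len(v) for v in varchar_values))
    return size


def estimated_sort_row_bytes(fixed_column_bytes, declared_lengths):
    """Data width assumed when sizing memory (hypothetical helper):
    each varchar is taken to be half full of its *declared* length."""
    return sum(fixed_column_bytes) + sum(d / 2 for d in declared_lengths)


email = "alice@example.com"  # 17 characters
for declared in (50, 64, 8000):
    print(declared,
          stored_row_bytes([4], [email], [declared]),   # int column + varchar
          estimated_sort_row_bytes([4], [declared]))
```

With the same 17-character value, the stored row comes out at 32 bytes whether the column is declared varchar(50), varchar(64) or varchar(8000); only the memory estimate (29, 36 and 4004 data bytes respectively) follows the declaration.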
I have heard of this practice before, but after researching this question a bit I don't think there is a practical reason for having varchar values in multiples of 16. I think this requirement probably comes from trying to optimize the space used on each page. In SQL Server, pages are set at 8 KB per page. Rows are stored in pages, so perhaps the thinking is that you could conserve space on the pages if the size of each row divided evenly into 8 KB (a more detailed description of how SQL Server stores data can be found here). However, since the amount of space used by a varchar field is determined by its actual content, I don't see how using lengths in multiples of 16 or any other scheme could help you optimize the amount of space used by each row on the page. The length of the varchar fields should just be set to whatever the business requirements dictate.
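The page arithmetic behind this argument can be sketched in Python under simplifying assumptions: an 8,192-byte page with a 96-byte header, a 2-byte slot-array entry per row, the 8,060-byte in-row limit, pages filled completely in insertion order, and no row-overflow handling. pages_needed is a hypothetical helper; its only input is the actual size of each row, so a declared varchar length (a multiple of 16 or otherwise) has nowhere to enter the calculation.

```python
PAGE_BYTES = 8192
PAGE_HEADER_BYTES = 96
USABLE_BYTES = PAGE_BYTES - PAGE_HEADER_BYTES  # 8096 bytes for rows + slot array
SLOT_BYTES = 2                                 # slot-array entry per row
MAX_IN_ROW_BYTES = 8060


def pages_needed(row_sizes):
    """Greedily pack rows, in order, into pages (assumes 100% fill)."""
    pages, free = 0, 0
    for size in row_sizes:
        if size > MAX_IN_ROW_BYTES:
            raise ValueError("row would need row-overflow pages; not modelled here")
        needed = size + SLOT_BYTES
        if needed > free:  # current page is full, start a new one
            pages += 1
            free = USABLE_BYTES
        free -= needed
    return pages


# 1,000 rows whose actual size is 32 bytes: 8096 // 34 = 238 fit per page,
# so 5 pages are needed. The declared varchar length is never an input.
print(pages_needed([32] * 1000))
```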
Additionally, this question covers similar ground and the conclusion also seems to be the same:
Database column sizes for character based data
You should always store data in a column whose size matches the data being stored. It is part of how the database can maintain integrity. For instance, suppose you are storing email addresses. If your column size is the size of the maximum allowable email address, then you will not be able to store bad data that is larger than that. That is a good thing. Some people want to make everything nvarchar(max) or varchar(max). However, this only leads to indexing problems.
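A minimal Python sketch of both points, assuming single-byte characters (so characters equal bytes) and the 900-byte index key limit of SQL Server 2008. VarcharColumn and index_key_problems are hypothetical names, None stands in for varchar(max), and 254 characters is used as an assumed practical limit for an email address.

```python
from dataclasses import dataclass
from typing import List, Optional

INDEX_KEY_LIMIT_BYTES = 900  # maximum index key size in SQL Server 2008


@dataclass(frozen=True)
class VarcharColumn:
    name: str
    declared_length: Optional[int]  # None stands in for varchar(max)

    def check(self, value: str) -> str:
        """Reject an oversized value instead of storing or truncating it."""
        if self.declared_length is not None and len(value) > self.declared_length:
            raise ValueError(
                f"{self.name}: {len(value)} characters exceeds "
                f"varchar({self.declared_length})")
        return value


def index_key_problems(columns: List[VarcharColumn]) -> List[str]:
    """List reasons these columns make a poor or invalid index key."""
    problems = []
    if any(c.declared_length is None for c in columns):
        problems.append("varchar(max) cannot be an index key column")
    declared = sum(c.declared_length for c in columns
                   if c.declared_length is not None)
    if declared > INDEX_KEY_LIMIT_BYTES:
        problems.append(f"declared key width {declared} bytes exceeds "
                        f"{INDEX_KEY_LIMIT_BYTES}")
    return problems


email = VarcharColumn("email", 254)  # assumed practical maximum email length
print(email.check("alice@example.com"))
print(index_key_problems([email]))                         # []
print(index_key_problems([VarcharColumn("email", None)]))  # ['varchar(max) cannot be an index key column']
```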
Personally, I would have gone back to the person who made this requirement and asked for a reason. Then I would have presented my reasons as to why it might not be a good idea. I would never just blindly implement something like this. In pushing back on a requirement like this, I would first do some research into how SQL Server organizes data on disk, so I could show the impact the requirement is likely to have on performance. I might even be surprised to find out the requirement made sense, but I doubt it at this point.