Is it possible to partition in multiple ways at once in SQL Server?



I'm considering various ways to partition my data in SQL Server. One approach I'm looking at is to partition a particular huge table into 8 partitions, then within each of those partitions to partition on a different partition column. Is this even possible in SQL Server, or am I limited to defining one partition column + function + scheme per table?

I'm interested in the more general answer, but this strategy is one I'm considering for a Distributed Partitioned View (DPV), where I'd partition the data under the first scheme using the DPV to distribute the huge amount of data over 8 machines, and then on each machine partition that portion of the full table on another partition key, in order to be able to drop (for example) sub-partitions as required.

Comments (2)

阿楠 2024-08-20 09:54:48


You are incorrect that the partitioning key cannot be computed. Use a computed, persisted column for the key:

ALTER TABLE MYTABLE ADD PartitionID AS ISNULL(Column1 * Column2, 0) PERSISTED;

I do this all the time; it's very simple.
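To make that concrete, here is a minimal sketch of a persisted computed column used as a partition key. All names are hypothetical, and the % 8 bucketing is an assumption added here to match the 8-partition layout from the question, not part of the original answer:

-- Hypothetical names throughout; the % 8 bucketing is an added assumption
-- that lands each row in one of 8 buckets (values 0..7).
CREATE PARTITION FUNCTION pf_Mod8 (int)
    AS RANGE RIGHT FOR VALUES (1, 2, 3, 4, 5, 6, 7);

CREATE PARTITION SCHEME ps_Mod8
    AS PARTITION pf_Mod8 ALL TO ([PRIMARY]);

CREATE TABLE MYTABLE
(
    Column1     int,
    Column2     int,
    -- Deterministic and PERSISTED, so it is allowed as the partition key.
    PartitionID AS ISNULL(ABS((Column1 * Column2) % 8), 0) PERSISTED
) ON ps_Mod8 (PartitionID);

The % 8 is applied before ABS so the intermediate value stays in -7..7 and ABS maps it into 0..7; this also sidesteps the overflow ABS raises on the minimum int value.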

2024-08-20 09:54:48


A DPV across a set of partitioned tables is your only clean option to achieve this: something like a DPV across tblSales2007, tblSales2008, tblSales2009, where each of the respective sales tables is partitioned again, but each could be partitioned by a different key. There are some very good benefits to doing this in terms of operational resilience (one partitioned table going offline does not take the whole DPV down; it can still satisfy queries for the other timelines).
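Here is a minimal sketch of that layout, assuming a SaleYear column and an illustrative schema; in the fully distributed form, each SELECT would reference a member table on a different server via a four-part linked-server name:

-- Illustrative member table; tblSales2008 / tblSales2009 follow the same
-- pattern with their own year. Each member table can additionally be
-- created ON its own partition scheme, each with a different key.
CREATE TABLE tblSales2007
(
    SaleID   bigint NOT NULL,
    SaleYear int    NOT NULL CHECK (SaleYear = 2007),
    Amount   money  NOT NULL,
    CONSTRAINT PK_tblSales2007 PRIMARY KEY (SaleYear, SaleID)
);

-- The CHECK constraints let the optimizer route a query that filters on
-- SaleYear to a single member table.
CREATE VIEW vSales
AS
SELECT SaleID, SaleYear, Amount FROM tblSales2007
UNION ALL
SELECT SaleID, SaleYear, Amount FROM tblSales2008
UNION ALL
SELECT SaleID, SaleYear, Amount FROM tblSales2009;

Having the partitioning column in both the primary key and the CHECK constraint of every member table is also what qualifies the view as an updatable partitioned view.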

The hack option is to create an arbitrary hash of the 2 columns, store it per record, and partition by it. You would have to generate this hash for every query / insert, etc., since the partition key cannot be computed; it must be a stored value. It's a hack, and I suspect it would lose more performance than you would gain.
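As a sketch of that stored-hash idea (names hypothetical; note the first answer above shows a persisted computed column can avoid the application-side bookkeeping this paragraph describes):

-- Hypothetical: bucket rows by a hash of two columns. CHECKSUM is the
-- quick built-in; HASHBYTES gives a better-distributed hash if needed.
ALTER TABLE MYTABLE
    ADD HashBucket AS ABS(CHECKSUM(Column1, Column2) % 8) PERSISTED;

-- Every point query must repeat the same expression so the optimizer
-- can eliminate partitions:
DECLARE @c1 int = 5, @c2 int = 7;

SELECT Column1, Column2
FROM MYTABLE
WHERE Column1 = @c1
  AND Column2 = @c2
  AND HashBucket = ABS(CHECKSUM(@c1, @c2) % 8);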

You do have to think about specific management issues / DR at these data quantities, though. If the data volumes are very large and you access them in a primarily read-only fashion, then you should look into SQL 'Madison', which will scale enormously in both row count and overall data size. But it really only suits the 99.9%-read type of data warehouse; it is not suitable for OLTP.

I have production data sets sitting in the 'billions' bracket, and they reside on partitioned table systems and provide very good performance - although much of this is down to the hardware underlying the system, not the database itself. Scaling up to this level is not an issue, and I know of others who have gone well beyond those quantities as well.

The maximum number of partitions per table remains 1,000. From what I remember of a conversation about this, that figure was set by the testing performed - not a limit imposed by a technical constraint.
