Partitioning a table across multiple physical nodes

Posted on 2024-08-23 03:30:46

So I'm currently working on a project that involves collecting and storing some huge datasets (huge compared to what I'm used to working with). The data essentially consists of meta information and the actual values (where the values are trended over time).

The meta information itself is relatively large, but nothing huge; I would say it's going to grow to around 10-50 million rows over the next couple of years. This seems manageable to me, and a single beefy SQL Server should be enough to provide quick access to this data if it is decently indexed (and the data is very easy to index, with very defined boundaries)...

However, the trending data is a completely different story. Within a year, we are VERY easily going to be pulling in 40-50 million rows every day, and that could realistically double yearly for the next 3 or 4 years.
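To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the upper figure of 50 million rows per day, a flat 365-day year, and a clean doubling of the daily rate each year; all three are simplifications, and the function name is just something made up for illustration.

```python
def yearly_row_totals(daily_rows: int, years: int, growth: float = 2.0) -> list[int]:
    """Rows ingested in each year, with the daily rate multiplied by `growth` every year."""
    totals = []
    rate = daily_rows
    for _ in range(years):
        # Assumption: a constant daily rate within each 365-day year.
        totals.append(int(rate * 365))
        rate *= growth
    return totals


if __name__ == "__main__":
    totals = yearly_row_totals(50_000_000, 4)
    for year, rows in enumerate(totals, start=1):
        print(f"year {year}: {rows:,} rows")
    print(f"cumulative: {sum(totals):,} rows")
```

Under those assumptions the first year alone is already around 18 billion rows, which is why a single table on a single server looks doubtful to me.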

This trending data also has very defined boundaries that would split it into MUCH more manageable chunks. I'm hoping I can set up some sort of partitioning mechanism that would spread this data across multiple physical database nodes. The data is essentially all contained in a single table. I looked into SQL Server table partitioning, but couldn't find a way to spread the data over multiple servers.

My question is whether there is some "relatively simple" way of implementing table partitioning over multiple physical nodes. I've also spent some time looking at SQL Server PDW, but it's difficult to find information online, and I don't want to pursue that until I've established that there is no simple way of implementing this sort of solution using features built into SQL Server.

Any advice would be greatly appreciated...

Comments (1)

策马西风 2024-08-30 03:30:46

I'm no expert on this but I believe what you may be looking for is database 'sharding'. There's an interesting analysis of the problems and benefits of sharding here.

Ultimately, implementing a 'sharded' design is likely to be very costly, but if your data is going to be unmanageable in a single database then this could be a good solution.

There is also a small amount of information on the Wikipedia page which includes a list of software which supports shards (e.g. the Hibernate ORM)
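For what it's worth, the basic idea behind sharding can be sketched in a few lines. This is only an illustration under assumptions, not a feature of SQL Server or Hibernate: the node names, the `series_id` partition key, and the hash-then-modulo routing are all hypothetical choices made for the example.

```python
import hashlib
from collections import defaultdict

# Hypothetical shard names; in practice each would map to a server connection.
SHARDS = ["node-a", "node-b", "node-c", "node-d"]


def shard_for(series_id: str, shards=SHARDS) -> str:
    """Pick a shard deterministically from the partition key."""
    # A stable hash is used so the same key always routes to the same shard,
    # regardless of process or machine.
    digest = hashlib.sha256(series_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def group_by_shard(rows, shards=SHARDS):
    """Group (series_id, timestamp, value) rows by the shard they belong to."""
    batches = defaultdict(list)
    for row in rows:
        batches[shard_for(row[0], shards)].append(row)
    return dict(batches)


if __name__ == "__main__":
    sample = [
        ("sensor-1", "2024-01-01T00:00:00", 1.5),
        ("sensor-2", "2024-01-01T00:00:00", 2.5),
        ("sensor-1", "2024-01-01T00:01:00", 1.7),
    ]
    for shard, batch in group_by_shard(sample).items():
        print(shard, len(batch))
```

Note that with a plain modulo like this, changing the number of shards remaps most keys, which is part of why a sharded design gets expensive. If the data has natural boundaries, routing on those (for example by date range or by source) may be a better fit than hashing.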
