How do I set up a new SQL Server database to allow for possible future replication?
I'm building a system which has the potential to require support for 500+ concurrent users, each making dozens of queries (selects, inserts AND updates) each minute. Based on these requirements and tables with many millions of rows I suspect that there will be the need to use database replication in the future to reduce some of the query load.
Having not used replication in the past, I am wondering if there is anything I need to consider in the schema design?
For instance, I was once told that it is necessary to use GUIDs for primary keys to enable replication. Is this true?
What special considerations or best practices for database design are there for a database that will be replicated?
Due to time constraints on the project I don't want to waste any time by implementing replication when it may not be needed. (I have enough definite problems to overcome at the moment without worrying about having to solve possible ones.) However, I don't want to have to make potentially avoidable schema changes when/if replication is required in the future.
Any other advice on this subject, including good places to learn about implementing replication, would also be appreciated.
Comments (3)
While every row must have a rowguid column, you are not required to use a GUID for your primary key. In reality, you aren't even required to have a primary key (though you will be stoned to death for failing to create one). Even if you define your primary key as a GUID, not making it the rowguid column will result in Replication Services creating an additional column for you. You definitely can do this, and it's not a bad idea, but it is by no means necessary nor particularly advantageous. Here are some tips:
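As a minimal sketch of that point, assuming merge replication (where the rowguid requirement applies) and using hypothetical table and column names, you can keep an ordinary integer primary key and add a separate uniqueidentifier column marked ROWGUIDCOL, so that replication reuses it instead of adding its own:

```sql
-- Hypothetical table: ordinary INT identity primary key, plus a separate
-- ROWGUIDCOL column that merge replication can reuse instead of adding one.
CREATE TABLE dbo.Orders
(
    OrderId    INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Orders PRIMARY KEY,
    CustomerId INT NOT NULL,
    OrderDate  DATETIME NOT NULL
        CONSTRAINT DF_Orders_OrderDate DEFAULT GETDATE(),
    rowguid    UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL
        CONSTRAINT DF_Orders_rowguid DEFAULT NEWSEQUENTIALID()
        CONSTRAINT UQ_Orders_rowguid UNIQUE
);
```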
You may want to use GUIDs for primary keys - in a replicated system rows must be unique throughout your entire topology, and GUID PKs are one way of achieving this.
Here's a short article about use of GUIDs in SQL Server
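If you do go the GUID-primary-key route, one point worth noting (a common concern, not raised in the answer above) is index fragmentation from random NEWID() values. A hedged sketch, with a hypothetical table name, that makes the GUID key the ROWGUIDCOL as well and defaults it to NEWSEQUENTIALID() so new keys arrive roughly in order:

```sql
-- Hypothetical table using a GUID primary key that doubles as the ROWGUIDCOL,
-- so merge replication does not add another column. NEWSEQUENTIALID() keeps
-- new keys roughly ordered, reducing page splits compared with random NEWID().
CREATE TABLE dbo.Customers
(
    CustomerId UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL
        CONSTRAINT DF_Customers_CustomerId DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_Customers PRIMARY KEY,
    Name       NVARCHAR(200) NOT NULL
);
```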
I'd say your real question is not how to handle replication, but how to handle scale out, or at least scale out for queryability. And while there are various answers to this conundrum, one answer will stand out: not using replication.
The problem with replication, especially with merge replication, is that writes get multiplied in replication. Say you have a system which handles a load of 100 queries (90 reads and 10 writes) per second. You want to scale out and you choose replication. Now you have 2 systems, each handling 50 queries: 45 reads and 5 writes each. Now those writes have to be replicated, so the actual number of writes is not 5+5 but 5+5 (the original writes) plus another 5+5 (the replica writes), so you have 90 reads and 20 writes. So while the load on each system was reduced, the ratio of writes to reads has increased. This not only changes the IO patterns, but most importantly it changes the concurrency pattern of the load. Add a third system and you'll have 90 reads and 30 writes, and so on. Soon you'll have more writes than reads, and the replication update latency combined with the concurrency issues and merge conflicts will derail your project. The gist of it is that 'soon' is much sooner than you expect. It is soon enough to justify looking into scale-up instead, since you're talking about a scale-out of 6-8 peers at best anyway, and a 6-8x capacity increase through scale-up will be faster, much simpler, and possibly even cheaper to start with.
And keep in mind that all these are just purely theoretical numbers. In practice the replication infrastructure is not free; it adds its own load on the system. Writes need to be tracked, changes have to be read, a distributor has to exist to store changes until they are distributed to subscribers, and then the changes have to be written and mediated for possible conflicts. That's why I've seen very few deployments that could claim success with a replication-based scale-out strategy.
One alternative is to scale out only reads, and here replication does work, usually transactional replication, but so does log shipping or mirroring with a database snapshot.
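For the read-only scale-out option, pointing reports at a database snapshot (for example, on top of a mirror, which cannot be queried directly) is plain T-SQL. A minimal sketch with hypothetical database and file names:

```sql
-- Hypothetical example: create a static, read-only snapshot so reporting
-- queries can run against it instead of the principal/mirror itself.
-- Each logical data file of the source database needs its own sparse file.
CREATE DATABASE Sales_Snapshot_Morning
ON
(
    NAME = Sales_Data,                        -- logical name of the source data file
    FILENAME = 'D:\Snapshots\Sales_Data.ss'   -- sparse file backing the snapshot
)
AS SNAPSHOT OF Sales;
```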
The real alternative is partitioning (i.e. sharding). Requests are routed in the application to the proper partition and land on the server containing the appropriate data. Changes on one partition that need to be reflected on another partition are shipped via asynchronous (usually messaging-based) means. Data can only be joined within a partition. For a more detailed discussion of what I'm talking about, read how MySpace does it. Needless to say, such a strategy has a major impact on the application design and cannot simply be glued on after v1.
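As a rough illustration of the routing idea (table, column, and server names are all hypothetical), the data-access layer can consult a small shard-map table to decide which server a given user's rows live on before opening a connection:

```sql
-- Hypothetical shard map: the application looks up which server/database
-- owns a given range of user IDs and routes the request there.
CREATE TABLE dbo.ShardMap
(
    ShardId      INT     NOT NULL CONSTRAINT PK_ShardMap PRIMARY KEY,
    UserIdFrom   INT     NOT NULL,  -- inclusive lower bound of the user ID range
    UserIdTo     INT     NOT NULL,  -- inclusive upper bound of the user ID range
    ServerName   SYSNAME NOT NULL,  -- e.g. 'SQLNODE03'
    DatabaseName SYSNAME NOT NULL
);

-- Lookup the application might run to pick a connection for @UserId:
-- SELECT ServerName, DatabaseName
-- FROM dbo.ShardMap
-- WHERE @UserId BETWEEN UserIdFrom AND UserIdTo;
```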