How do I set up a new SQL Server database to allow for possible future replication?
I'm building a system which has the potential to require support for 500+ concurrent users, each making dozens of queries (selects, inserts AND updates) each minute. Based on these requirements and tables with many millions of rows I suspect that there will be the need to use database replication in the future to reduce some of the query load.
Having not used replication in the past, I am wondering if there is anything I need to consider in the schema design?
For instance, I was once told that it is necessary to use GUIDs for primary keys to enable replication. Is this true?
What special considerations or best practices for database design are there for a database that will be replicated?
Due to time constraints on the project I don't want to waste any time by implementing replication when it may not be needed. (I have enough definite problems to overcome at the moment without worrying about having to solve possible ones.) However, I don't want to have to make potentially avoidable schema changes when/if replication is required in the future.
Any other advice on this subject, including good places to learn about implementing replication, would also be appreciated.
Comments (3)
While every row must have a rowguid column, you are not required to use a GUID for your primary key. In reality, you aren't even required to have a primary key (though you will be stoned to death for failing to create one). Even if you define your primary key as a GUID, not making it the rowguid column will result in Replication Services creating an additional column for you. You definitely can do this, and it's not a bad idea, but it is by no means necessary nor particularly advantageous. Here are some tips:
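As a minimal sketch of that point, assuming merge replication (where the rowguid requirement applies) and using hypothetical table and column names, you can keep an ordinary integer primary key and add a separate uniqueidentifier column marked ROWGUIDCOL, so that replication reuses it instead of adding its own:

```sql
-- Hypothetical table: ordinary INT identity primary key, plus a separate
-- ROWGUIDCOL column that merge replication can reuse instead of adding one.
CREATE TABLE dbo.Orders
(
    OrderId    INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Orders PRIMARY KEY,
    CustomerId INT NOT NULL,
    OrderDate  DATETIME NOT NULL
        CONSTRAINT DF_Orders_OrderDate DEFAULT GETDATE(),
    rowguid    UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL
        CONSTRAINT DF_Orders_rowguid DEFAULT NEWSEQUENTIALID()
        CONSTRAINT UQ_Orders_rowguid UNIQUE
);
```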
You may want to use GUIDs for primary keys - in a replicated system rows must be unique throughout your entire topology, and GUID PKs are one way of achieving this.
Here's a short article about use of GUIDs in SQL Server
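If you do go the GUID-primary-key route, one point worth noting (a common concern, not raised in the answer above) is index fragmentation from random NEWID() values. A hedged sketch, with a hypothetical table name, that makes the GUID key the ROWGUIDCOL as well and defaults it to NEWSEQUENTIALID() so new keys arrive roughly in order:

```sql
-- Hypothetical table using a GUID primary key that doubles as the ROWGUIDCOL,
-- so merge replication does not add another column. NEWSEQUENTIALID() keeps
-- new keys roughly ordered, reducing page splits compared with random NEWID().
CREATE TABLE dbo.Customers
(
    CustomerId UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL
        CONSTRAINT DF_Customers_CustomerId DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_Customers PRIMARY KEY,
    Name       NVARCHAR(200) NOT NULL
);
```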
I'd say your real question is not how to handle replication, but how to handle scale out, or at least scale out for queryability. And while there are various answers to this conundrum, one answer will stand out: not using replication.
The problem with replication, especially with merge replication, is that writes get multiplied in replication. Say you have a system which handles a load of 100 queries (90 reads and 10 writes) per second. You want to scale out and you choose replication. Now you have 2 systems, each handling 50 queries: 45 reads and 5 writes each. Now those writes have to be replicated, so the actual number of writes is not 5+5 but 5+5 (the original writes) plus another 5+5 (the replica writes), so you have 90 reads and 20 writes. So while the load on each system was reduced, the ratio of writes to reads has increased. This not only changes the IO patterns, but most importantly it changes the concurrency pattern of the load. Add a third system and you'll have 90 reads and 30 writes, and so on. Soon you'll have more writes than reads, and the replication update latency combined with the concurrency issues and merge conflicts will derail your project. The gist of it is that 'soon' is much sooner than you expect. It is soon enough to justify looking into scale-up instead, since you're talking about a scale-out of 6-8 peers at best anyway, and a 6-8x capacity increase through scale-up will be faster, much simpler, and possibly even cheaper to start with.
And keep in mind that all these are just purely theoretical numbers. In practice the replication infrastructure is not free; it adds its own load on the system. Writes need to be tracked, changes have to be read, a distributor has to exist to store changes until they are distributed to subscribers, and then the changes have to be written and mediated for possible conflicts. That's why I've seen very few deployments that could claim success with a replication-based scale-out strategy.
One alternative is to scale out only reads, and here replication does work, usually transactional replication, but so does log shipping or mirroring with a database snapshot.
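For the read-only scale-out option, pointing reports at a database snapshot (for example, on top of a mirror, which cannot be queried directly) is plain T-SQL. A minimal sketch with hypothetical database and file names:

```sql
-- Hypothetical example: create a static, read-only snapshot so reporting
-- queries can run against it instead of the principal/mirror itself.
-- Each logical data file of the source database needs its own sparse file.
CREATE DATABASE Sales_Snapshot_Morning
ON
(
    NAME = Sales_Data,                        -- logical name of the source data file
    FILENAME = 'D:\Snapshots\Sales_Data.ss'   -- sparse file backing the snapshot
)
AS SNAPSHOT OF Sales;
```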
The real alternative is partitioning (i.e. sharding). Requests are routed in the application to the proper partition and land on the server containing the appropriate data. Changes on one partition that need to be reflected on another partition are shipped via asynchronous (usually messaging-based) means. Data can only be joined within a partition. For a more detailed discussion of what I'm talking about, read how MySpace does it. Needless to say, such a strategy has a major impact on the application design and cannot simply be glued on after v1.
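As a rough illustration of the routing idea (table, column, and server names are all hypothetical), the data-access layer can consult a small shard-map table to decide which server a given user's rows live on before opening a connection:

```sql
-- Hypothetical shard map: the application looks up which server/database
-- owns a given range of user IDs and routes the request there.
CREATE TABLE dbo.ShardMap
(
    ShardId      INT     NOT NULL CONSTRAINT PK_ShardMap PRIMARY KEY,
    UserIdFrom   INT     NOT NULL,  -- inclusive lower bound of the user ID range
    UserIdTo     INT     NOT NULL,  -- inclusive upper bound of the user ID range
    ServerName   SYSNAME NOT NULL,  -- e.g. 'SQLNODE03'
    DatabaseName SYSNAME NOT NULL
);

-- Lookup the application might run to pick a connection for @UserId:
-- SELECT ServerName, DatabaseName
-- FROM dbo.ShardMap
-- WHERE @UserId BETWEEN UserIdFrom AND UserIdTo;
```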