Is the lack of horizontal write scalability a flaw of RDBMSs, or does it affect all DBMSs?
When you hit a ceiling on reads from a database, you have two choices: scale vertically by putting more hardware in the server, or scale horizontally by adding a second server to offload some of the reads.
Offloading reads to a second server means that every write hits both servers, while each read hits only one.
The problem comes when you hit a ceiling on writes. Since every write has to happen on all servers, all of them become overloaded with write requests and become unusable. Adding more servers doesn't help, since it only adds more servers that will be overloaded. So you have to scale vertically.
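To put rough numbers on the argument above, here is a minimal sketch of the per-server load when every server holds a full copy of the data. The function name and the request rates are made up for illustration; the model simply assumes reads are spread evenly across replicas and every write is applied on every replica.

```python
def per_server_load(reads_per_sec, writes_per_sec, n_servers):
    """Requests per second each server must handle under full replication.

    Assumption: reads are balanced evenly across all servers, while every
    write has to be applied on every server.
    """
    return reads_per_sec / n_servers + writes_per_sec


# Hypothetical workload: 8000 reads/s and 2000 writes/s.
for n in (1, 2, 4, 8):
    print(n, per_server_load(reads_per_sec=8000, writes_per_sec=2000, n_servers=n))
```

In this model the read share shrinks as servers are added, but the load per server never drops below the write rate (2000 requests/s here), which is the ceiling the question describes.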
Is this specific to RDBMSs, or does it happen with all DBMSs?
I know you can do things on the software side and split the database in two, e.g. all entries starting with 0-m in one database and n-z in another, but IMHO that is more of a workaround than a solution to the problem.
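For reference, the software-side split described above could look roughly like the sketch below. `RangeShardRouter` and the database names are hypothetical, and routing on the first character of the key is just one way to implement the 0-m / n-z split.

```python
import bisect


class RangeShardRouter:
    """Route a key to a database based on the range its first character falls in."""

    def __init__(self, boundaries, shards):
        # boundaries[i] is the first character that belongs to shards[i + 1],
        # so there must be exactly one more shard than boundaries.
        if len(shards) != len(boundaries) + 1:
            raise ValueError("need exactly len(boundaries) + 1 shards")
        self.boundaries = boundaries
        self.shards = shards

    def shard_for(self, key):
        if not key:
            raise ValueError("key must not be empty")
        first = key[0].lower()
        # bisect_right counts how many boundaries are <= first,
        # which is the index of the shard that owns this key.
        return self.shards[bisect.bisect_right(self.boundaries, first)]


# Hypothetical two-way split: 0-m in one database, n-z in the other.
router = RangeShardRouter(boundaries=["n"], shards=["db_0_to_m", "db_n_to_z"])
print(router.shard_for("alice"))   # db_0_to_m
print(router.shard_for("nadia"))   # db_n_to_z
```

Each write now lands on only one database, so write capacity grows with the number of shards, at the cost of making queries that span both ranges harder.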
Comments (3)
I can't see that this would be specific to the relational model. All databases that have to handle both reads and writes (and that's most of them) will have a similar problem.
For what it's worth, most databases are read far more often than they are written, so the write ceiling is hit less frequently than you might think. In addition, load-balanced setups like the one you describe tend to do an immediate write to the primary and queue the writes to all secondaries (at least in my experience).
In that case, you as a user aren't actually waiting for multiple writes; you just wait for the first. The DBMS itself manages the synchronisation between instances. This of course means that secondary databases might not be totally up to date, but the lag can be controlled. Technically, this breaks the ACID properties of the system as a whole, since a read from a secondary may return stale data, but this can be architected around.
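The write path described above can be sketched as follows. This is a toy in-memory model, not how any particular DBMS implements replication; the `Primary` class and its method names are invented for illustration.

```python
from collections import deque


class Primary:
    """Toy primary that applies writes locally and queues them for secondaries."""

    def __init__(self, n_secondaries):
        self.data = {}
        self.secondaries = [dict() for _ in range(n_secondaries)]
        self.queues = [deque() for _ in range(n_secondaries)]

    def write(self, key, value):
        # The caller only waits for this local write.
        self.data[key] = value
        # Each secondary gets the change queued, to be applied later.
        for queue in self.queues:
            queue.append((key, value))

    def replicate(self):
        # In a real system this runs in the background; here it is called by hand.
        for store, queue in zip(self.secondaries, self.queues):
            while queue:
                key, value = queue.popleft()
                store[key] = value

    def read_from_secondary(self, index, key):
        return self.secondaries[index].get(key)


primary = Primary(n_secondaries=2)
primary.write("user:1", "Ada")
print(primary.read_from_secondary(0, "user:1"))  # None: secondary is still behind
primary.replicate()
print(primary.read_from_secondary(0, "user:1"))  # Ada
```

The window between `write` and `replicate` is the replication lag mentioned above: a read from a secondary during that window sees stale data.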
I think this is the case with any DBMS, although some handle it better than others. As you mention, partitioning the database in software seems to be the most common solution.
In many applications, though, partitioning the database like that makes sense anyway once you are at a scale where it becomes necessary. For example, if you had a social networking app, it would probably make sense to partition your database by country or other geographical region. This would let you place your servers geographically close to the regions they serve. It would also help mitigate problems with a cross-database "social graph", since people's friends tend to live nearby.
You're unlikely to hit a write ceiling just because every write has to happen on all servers, because in most RDBMS installations:
1) Reads are overwhelmingly more frequent than writes
2) Modern RDBMSs have multi-version concurrency control (MVCC), which reduces blocking between readers and writers
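As a rough illustration of point 2, the sketch below shows the core idea behind multi-version concurrency control: writers append new versions instead of overwriting, so a reader holding an older snapshot keeps seeing consistent data without blocking. `MVCCStore` is a hypothetical toy class; real engines add transactions, conflict detection, and cleanup of old versions.

```python
class MVCCStore:
    """Toy multi-version store: each write adds a new timestamped version."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # logical commit timestamp

    def write(self, key, value):
        # Writers never overwrite in place; they append a newer version.
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot(self):
        # A reader remembers the timestamp at which it started.
        return self.clock

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


store = MVCCStore()
store.write("balance", 100)
snap = store.snapshot()             # reader starts here
store.write("balance", 80)          # a later write does not disturb the reader
print(store.read("balance", snap))              # 100
print(store.read("balance", store.snapshot()))  # 80
```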