Scalable database system, please critique
I'm looking to build a scalable database solution for the back end of my website. I've been reading about database design lately, and I seem to have developed an idea on my own that might work. I think this is a novel way of maintaining n databases with synchronized data, but I could be wrong. So I'm asking SO to evaluate the idea and tell me if it's crazy or not. (or if it already exists and is implemented)
In this scheme there is a group of server nodes. One node runs a query load balancer (let's call it A) and the rest each run a typical DBMS; let's call those nodes N collectively.
Each node in N is disconnected from the others, i.e., a node in N doesn't need to communicate with any of the other nodes. Each node in N has a connection to A only.
The process works like this:
- All database queries are passed through A. (Let's assume for now A has infinite throughput and processing ability)
- A inspects each query (Q) and determines whether it is an operation that will read from a database or one that will write to a database. (In SQL, a read would be a SELECT and a write would be an UPDATE, or likewise an INSERT or DELETE.)
- If Q is a read operation, forward it to one of the nodes in N.
- If Q is a write operation, forward it to all of the nodes in N.
Assuming it's implemented properly, this results in all of the nodes in N having synchronized database content. Queries that are only reading data need to be sent to one node.
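The routing rule above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a production design: the `Router` class and `is_write` helper are hypothetical names, in-memory SQLite connections stand in for the nodes in N, reads are distributed round-robin, and a statement is classified purely by its first keyword (which would misclassify things like `SELECT ... FOR UPDATE` or stored procedure calls).

```python
import itertools
import sqlite3

# Assumption: a statement is a write if it starts with one of these keywords.
WRITE_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "DROP", "ALTER"}


def is_write(query: str) -> bool:
    """Classify a statement by its first keyword (a naive heuristic)."""
    words = query.split(None, 1)
    return bool(words) and words[0].upper() in WRITE_KEYWORDS


class Router:
    """Hypothetical stand-in for node A."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        # Round-robin iterator used to pick one node per read.
        self._next_reader = itertools.cycle(self.nodes)

    def execute(self, query, params=()):
        if is_write(query):
            # Writes are forwarded to every node in N.
            for node in self.nodes:
                node.execute(query, params)
                node.commit()
            return []
        # Reads are forwarded to exactly one node in N.
        node = next(self._next_reader)
        return node.execute(query, params).fetchall()


# Three in-memory SQLite databases play the role of the nodes in N.
nodes = [sqlite3.connect(":memory:") for _ in range(3)]
router = Router(nodes)
router.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
router.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
rows = router.execute("SELECT name FROM users")
```

Note that this sketch applies writes one node at a time and assumes every write succeeds, which is exactly the "implemented properly" assumption in the question.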
This idea seems to work especially well for me because in my system there are very few write operations, less than 1%.
So, a few questions about this idea:
- Does a scheme like this make sense from a theoretical point of view?
- If this does make sense, is there an already implemented solution either commercial or free?
Comments (3)
Not a direct answer to your question, but SQL Server 2008 already supports something equivalent to what you are describing. It's called Peer-to-Peer Transactional Replication. I'm sure the other RDBMSs do as well. I think MySQL calls it master-master replication.
The typical setup for many reads and few writes is to have one read/write master database and n replicated slave databases that are read-only. Replication is handled by the RDBMS. Read-only queries can be load balanced across all n read-only nodes, and if your read/write master goes down temporarily, at least your app will still be able to service read operations. You don't need a central "A" proxy to decide whether a query is a read or a write. The client issuing the query should be smart enough to know whether it's reading or writing. That way you won't be bottlenecked on your "A" server.
Your proposed setup has a distinct flaw: if you are simultaneously writing to n nodes, what happens if one or more of those writes fail?
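To make the failure case concrete, here is a small, hypothetical simulation of a broadcast write where one node is unreachable. The `Node` class, its `online` flag, and `broadcast_write` are invented for illustration; in-memory SQLite databases stand in for the real nodes.

```python
import sqlite3


class Node:
    """Hypothetical wrapper around one database in N; `online` simulates reachability."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")

    def write(self, k, v):
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        self.db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (k, v))
        self.db.commit()

    def read(self, k):
        row = self.db.execute("SELECT v FROM kv WHERE k = ?", (k,)).fetchone()
        return row[0] if row else None


def broadcast_write(nodes, k, v):
    """Send the write to every node and return the names of nodes that failed."""
    failed = []
    for node in nodes:
        try:
            node.write(k, v)
        except ConnectionError:
            failed.append(node.name)
    return failed


nodes = [Node("n1"), Node("n2"), Node("n3")]
broadcast_write(nodes, "price", "10")    # all three nodes accept this write

nodes[1].online = False                  # n2 goes down
failed = broadcast_write(nodes, "price", "12")
nodes[1].online = True                   # n2 comes back, but never saw the update

values = {n.name: n.read("price") for n in nodes}
# n1 and n3 now hold "12" while n2 still holds "10"
```

After the second write the nodes have silently diverged, so a read routed to `n2` returns the old value. A real system needs some way to handle this, for example rolling the write back everywhere, or replaying missed writes to the node before it serves reads again, which is the work that built-in replication does for you.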
Your scheme only works with infinitely available nodes. How are you going to deal with node downtime? If a node is down for any reason and misses an update, it will serve stale data the next time it is asked.