What are some faster Paxos-related algorithms for consensus in distributed systems?
I've read Lamport's paper on Paxos. I've also heard that it isn't used much in practice, for reasons of performance. What algorithms are commonly used for consensus in distributed systems?
With Multi-Paxos, when the leader is galloping it can respond to a client write as soon as it hears that a majority of nodes have written the value to disk. This is as good and as efficient as you can get while maintaining the consistency guarantees that Paxos makes.

Typically, though, people use something Paxos-like such as ZooKeeper as an external service (a dedicated cluster) to keep critical information consistent (who has locked what, who is leader, who is in the cluster, what the configuration of the cluster is), then run a less strict algorithm with weaker consistency guarantees that relies on application specifics (e.g. vector clocks and merged siblings). The short ebook Distributed Systems for Fun and Profit gives a good overview of the alternatives.

Note that lots of databases compete on speed by using risky defaults which risk consistency and can lose data under network partitions. Aphyr's Jepsen blog series shows whether well-known open-source systems lose data. One cannot cheat the CAP theorem; if you configure systems for safety, then they end up doing about the same messaging and the same disk writes as Paxos. So really you cannot say "Paxos is slow"; you have to say "the part of a system which needs consistency under network partitions requires a minimum number of messages and disk flushes per operation, and that is slow".
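The steady-state ("galloping") behavior described above can be sketched roughly as follows. This is an illustrative toy, not a real Paxos implementation; the `Node` class and its `accept_and_fsync` method are invented for the example, and an in-memory list stands in for the durable disk write.

```python
# Toy sketch: a Multi-Paxos leader in steady state acknowledges a
# client write once a majority of nodes report a durable write.

def majority(n: int) -> int:
    """Smallest number of acks that forms a majority of n nodes."""
    return n // 2 + 1

class Node:
    """Illustrative acceptor; `up=False` simulates a failed node."""
    def __init__(self, up=True):
        self.up = up
        self.log = []

    def accept_and_fsync(self, value) -> bool:
        if self.up:
            self.log.append(value)  # stands in for a durable disk write
            return True
        return False

def leader_write(value, nodes):
    """Send the value to every node; answer the client as soon as a
    majority has acknowledged writing it to disk."""
    acks = 0
    for node in nodes:
        if node.accept_and_fsync(value):
            acks += 1
            if acks >= majority(len(nodes)):
                return "committed"  # safe to respond to the client now
    return "failed"                 # could not reach a majority
```

With 5 nodes, the leader can commit even with 2 nodes down, since 3 acknowledgements form a majority; with only 1 of 3 nodes up, the write fails.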
There are two general blockchain consensus systems: those that reach unambiguous finality through a defined set of validators, and those that rely on a high probability of finality.

The first generation of blockchain consensus algorithms (Proof of Work, Proof of Stake, and BitShares' Delegated Proof of Stake) only offer a high probability of finality that grows with time. In theory someone could pay enough money to mine an alternative "longer" Bitcoin blockchain that goes all the way back to genesis.

More recent consensus algorithms, whether HashGraph, Casper, Tendermint, or DPOS BFT, all adopt long-established principles of Paxos and related consensus algorithms. Under these models it is possible to reach unambiguous finality under all network conditions so long as more than ⅔ of participants are honest.

Objective and unambiguous 100% finality is a critical property for all blockchains that wish to support inter-blockchain communication. Absent 100% finality, a reversion on one chain could have irreconcilable ripple effects across all interconnected chains.

The abstract protocol for these more recent systems involves validators exchanging pre-commitment and commitment messages: a block is final once more than ⅔ of the validators have committed to it. Everyone can see which participants are bad, and evidence of bad behavior is available to all.

It is the technical differences in the protocols that give rise to real-world impact on user experience. This includes things such as latency until finality, degrees of finality, bandwidth, and proof generation / validation overhead.
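The ⅔ finality rule above can be made concrete with a small sketch (illustrative only, not any specific chain's implementation; the function names are invented). With 3f + 1 validators, a block is final once commitments from strictly more than ⅔ of them, i.e. at least 2f + 1, have been collected.

```python
# Illustrative BFT finality check: a block is final once commitments
# from strictly more than 2/3 of the validator set are collected.

def bft_quorum(num_validators: int) -> int:
    """Smallest commitment count strictly greater than 2/3 of validators."""
    return (2 * num_validators) // 3 + 1

def is_final(commitments: set, validators: set) -> bool:
    """True once a BFT quorum of validators has committed to the block."""
    return len(commitments & validators) >= bft_quorum(len(validators))
```

For example, with 4 validators (tolerating f = 1 Byzantine fault), 3 commitments make a block final, while 2 do not.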
Look for more details on delegated proof of stake by EOS here.
Raft is a more understandable and faster alternative to Paxos. One of the most popular distributed systems that uses Raft is etcd, the distributed store used in Kubernetes.
It is equivalent to Paxos in fault tolerance.
Not sure if this is helpful (since this is not from actual production experience), but in our "distributed systems" course we studied, along with Paxos, the Chandra-Toueg and Mostefaoui-Raynal algorithms (our professor was especially fond of the latter).
Check out the Raft algorithm for a consensus algorithm that is optimized for ease of understanding and clarity of implementation. Oh... it is pretty fast as well.
https://ramcloud.stanford.edu/wiki/display/logcabin/LogCabin
https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf
If performance is an issue, consider whether you need all of the strong consistency guarantees Paxos gives you. See e.g. http://queue.acm.org/detail.cfm?id=1466448 and http://incubator.apache.org/cassandra/. Searching on Paxos optimised gets me hits, but I suspect that relaxing some of the requirements will buy you more than tuning the protocol.
The Paxos system I run (which supports really, really big web sites) is halfway in between Basic-Paxos and Multi-Paxos.

I plan on moving it to a full Multi-Paxos implementation. Paxos isn't that great as a high-throughput data storage system, but it excels at supporting those systems by providing leader election. For example, say you have a replicated data store where you want a single master for performance reasons. Your data store nodes will use the Paxos system to choose the master.

Like Google Chubby, my system is run as a service and can also store data as a configuration container. (I use "configuration" loosely; I hear Google uses Chubby for DNS.) This data doesn't change as often as user input, so it doesn't need high-throughput write SLAs. Reading, on the other hand, is extremely quick because the data is fully replicated and you can read from any node.
Update
Since writing this, I have upgraded my Paxos system. I am now using a chain consensus protocol as the primary consensus system. The chain system still utilizes Basic-Paxos for reconfiguration, including notifying chain nodes when the chain membership changes.
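The leader-election pattern described in this answer can be sketched as follows. The `CoordinationService` API here is invented for illustration; in a real deployment it would be a Paxos-replicated service such as Chubby or ZooKeeper, where the compare-and-set runs through consensus so all replicas agree on exactly one lock holder.

```python
# Hypothetical sketch: data-store nodes use an external Paxos-backed
# service to elect a single master, instead of running all traffic
# through consensus themselves.

class CoordinationService:
    """Stand-in for a Paxos-replicated lock/config service (e.g. Chubby)."""
    def __init__(self):
        self._master = None

    def try_acquire_master_lock(self, node_id) -> bool:
        # In a real service this compare-and-set is decided by Paxos,
        # so every replica agrees on exactly one holder.
        if self._master is None:
            self._master = node_id
            return True
        return self._master == node_id

    def current_master(self):
        return self._master

class DataStoreNode:
    """A replica of the data store; only the lock holder acts as master."""
    def __init__(self, node_id, coord):
        self.node_id = node_id
        self.coord = coord

    def campaign(self) -> bool:
        """Become master iff the coordination service grants the lock."""
        return self.coord.try_acquire_master_lock(self.node_id)
```

The first node to campaign wins the lock and becomes master; later candidates see the lock is taken and remain replicas.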
Paxos is optimal among consensus protocols in terms of performance, at least in the number of network delays (which is often the dominating factor). It is clearly not possible to reliably achieve consensus while tolerating up to f failures without a round trip to at least f other nodes between a client request and the corresponding confirmation, and Paxos achieves this lower bound. This gives a hard bound on the latency of each request to a consensus-based protocol, regardless of implementation. In particular, Raft, Zab, Viewstamped Replication, and all other variants of consensus protocols have the same performance constraint.
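The arithmetic behind this bound (assuming crash faults and majority quorums, the standard Paxos setting) is worth writing out: tolerating f failures requires 2f + 1 replicas, and each request needs one round trip to a majority of them.

```python
# Worked numbers for the latency/fault-tolerance bound described above,
# assuming crash faults and majority quorums.

def nodes_needed(f: int) -> int:
    """Crash-fault-tolerant consensus needs 2f + 1 replicas
    to tolerate f failures."""
    return 2 * f + 1

def quorum(n: int) -> int:
    """A majority quorum of n replicas: any two quorums intersect,
    so a decision survives f failures."""
    return n // 2 + 1
```

So a 5-node cluster tolerates 2 failures, and the leader must hear from a quorum of 3 (itself plus 2 others) on every request; no protocol can do with fewer messages while keeping the same guarantees.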
One thing that can be improved from standard Paxos (also Raft, Zab, ...) is that there is a distinguished leader which ends up doing more than its fair share of the work and may therefore end up being a bit of a bottleneck. There is a protocol known as Egalitarian Paxos which spreads the load out across multiple leaders, although it's mindbendingly complicated IMO, is only applicable to certain domains, and still must obey the lower bound on the number of round-trips within each request. See the paper "There Is More Consensus in Egalitarian Parliaments" by Moraru et al for more details.
When you hear that Paxos is rarely used due to its poor performance, it is frequently meant that consensus itself is rarely used due to poor performance, and this is a fair criticism: it is possible to achieve much higher performance if you can avoid the need for consensus-based coordination between nodes as much as possible, because this allows for horizontal scalability.
Snarkily, it's also possible to achieve better performance by claiming to be using a proper consensus protocol but actually doing something that fails in some cases. Aphyr's blog is littered with examples of these failures not being as rare as you might like, where database implementations have either introduced bugs into good consensus-like protocols by way of "optimisation", or else developed custom consensus-like protocols that fail to be fully correct in some subtle fashion. This stuff is hard.
You should check out the Apache ZooKeeper project. It is used in production by Yahoo! and Facebook, among others.

http://hadoop.apache.org/zookeeper/

If you are looking for academic papers describing it, it is described in a paper at USENIX ATC '10. The consensus protocol (a variant of Paxos) is described in a paper at DSN '11.
Google documented how they did fast Paxos for their Megastore in the following paper: Link.