Data synchronization in distributed systems
We have a REST-based application built on the Restlet framework that supports CRUD operations. It uses a local file to store the data.
Now the requirement is to deploy this application on multiple VMs, and any update operation on one VM needs to be propagated to the application instances running on the other VMs.
Our idea was to send multiple POST messages (one to each of the other application instances) whenever an update operation happens on a given VM.
The assumption here is that each application instance has a list of the URLs of all the other instances.
Is there a better way to solve this?
Comments (2)
Consistency is a deep topic, and a hard thing to get right. The trouble comes when two nearly-simultaneous changes occur to the same data: conflicting updates can arrive in one order on one server, and in another order on another. This is a problem, since the two servers no longer agree on what the data is, and it isn't clear who is "right".
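To make the ordering problem concrete, here is a minimal sketch in Python. It assumes each server simply applies updates in arrival order and lets the last write win; the `apply_updates` helper, the record key, and the values are all hypothetical and only illustrate the idea.

```python
def apply_updates(updates):
    """Apply (key, value) updates in arrival order; the last write to a key wins."""
    store = {}
    for key, value in updates:
        store[key] = value
    return store


# Two clients update the same record at nearly the same time (made-up data).
update_a = ("user:1", "alice@example.com")
update_b = ("user:1", "alice@example.org")

# Network delays deliver the two updates in a different order to each server.
server_1 = apply_updates([update_a, update_b])
server_2 = apply_updates([update_b, update_a])

print(server_1)              # {'user:1': 'alice@example.org'}
print(server_2)              # {'user:1': 'alice@example.com'}
print(server_1 == server_2)  # False: the servers have diverged
```

Both servers received exactly the same updates, yet they end up with different data, and nothing in the messages themselves says which result is correct.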
The short story: pick your favorite RDBMS (MySQL, for example, is popular) and have your app servers connect to it, in what is called the three-tier model. Be sure to perform complex updates in transactions, which will provide an acceptable consistency model.
The long story: the three-tier model serves well for small-to-medium-scale web sites and services. Eventually you will find that the single database becomes the bottleneck. For services whose read traffic is substantially larger than their write traffic, a common optimization is a single-master, many-slave database replication arrangement, where all writes go to the single master (required for consistency with non-distributed transactions), while the more common reads can go to any of the read slaves.
For services with evenly mixed read/write traffic, you may be better served by dropping some of the conveniences (and accompanying restrictions) that formal SQL provides and instead using one of the various "NoSQL" data stores that have recently emerged. Their relative merits and fitness for various problems are a deep topic in themselves.
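As a rough illustration of a multi-step update wrapped in a transaction, the sketch below uses Python's built-in `sqlite3` module as a stand-in for whichever RDBMS you choose. That substitution is an assumption made only to keep the example self-contained; the `accounts` table and the `transfer` helper are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a shared RDBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('a', 100), ('b', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit together or not at all."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )


transfer(conn, "a", "b", 30)  # succeeds: a=70, b=30

try:
    # The debit would make the balance negative, so the CHECK constraint
    # fails and the credit already applied to 'b' is rolled back as well.
    transfer(conn, "a", "b", 500)
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```

The failed transfer leaves no trace: the half-applied credit is undone, which is the guarantee that hand-rolled propagation between file-backed instances would have to reimplement.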
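The read/write split can be sketched as a small router. This is only an illustration under stated assumptions: `FakeDatabase` and `ReplicatedStore` are hypothetical in-memory stand-ins, and replication is simulated by copying each write to the replicas immediately, whereas real database replication is normally asynchronous and handled by the database itself.

```python
import itertools


class FakeDatabase:
    """In-memory stand-in for one database server (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class ReplicatedStore:
    """Send every write to the single master; spread reads across the replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = replicas
        self._next_replica = itertools.cycle(replicas)

    def write(self, key, value):
        # All writes go through the one master, which gives them a single order.
        self.master.rows[key] = value
        # Simulated replication: copy the change to every replica right away.
        for replica in self.replicas:
            replica.rows[key] = value

    def read(self, key):
        # Round-robin over the read replicas to share the read load.
        replica = next(self._next_replica)
        return replica.name, replica.rows.get(key)


store = ReplicatedStore(
    FakeDatabase("master"),
    [FakeDatabase("replica-1"), FakeDatabase("replica-2")],
)
store.write("user:1", "alice@example.com")

first = store.read("user:1")
second = store.read("user:1")
third = store.read("user:1")
print(first, second, third)
```

Because every write passes through one server, conflicting updates are ordered in exactly one place, while read capacity grows by adding replicas.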
I can see 7 major options for now. You should find out more details and decide whether the facilities and trade-offs are appropriate for your purpose.
in a synchronized manner on it and write out the successful operations to a DB in a lazy manner if required.