Data synchronization in distributed systems

Posted on 2024-12-03 21:19:41

We have a REST-based application built on the Restlet framework that supports CRUD operations. It uses a local file to store its data.

The requirement now is to deploy this application on multiple VMs, with any update operation on one VM propagated to the application instances running on the other VMs.

Our idea was to send multiple POST messages (one to every other application instance) whenever an update operation happens on a given VM.
The assumption here is that each instance has a list of the URLs of all the other instances.
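
A minimal sketch of this fan-out idea, written in Python for brevity rather than against the Restlet API. The function names, the JSON payload shape, and the injectable `send` parameter are all assumptions made for illustration; the sketch only shows the shape of the approach, including the fact that some peers may be unreachable when the update is sent.

```python
import json
import urllib.request
from typing import Callable, Iterable, List


def post_json(url: str, payload: dict, timeout: float = 2.0) -> None:
    """Send one update to one peer as an HTTP POST with a JSON body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout):
        pass  # The response body is ignored in this sketch.


def broadcast_update(
    peers: Iterable[str],
    payload: dict,
    send: Callable[[str, dict], None] = post_json,
) -> List[str]:
    """POST the payload to every peer and return the URLs that failed."""
    failed = []
    for url in peers:
        try:
            send(url, payload)
        except OSError:
            # urllib's URLError and socket timeouts are OSError subclasses.
            failed.append(url)
    return failed
```

A caller would pass its peer list and the changed record, then decide what to do with the returned list of failed URLs (retry, queue, or alert). That decision is the hard part, because a peer that misses an update stays out of sync until something repairs it.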

Is there a better way to solve this?

Comments (2)

清泪尽 2024-12-10 21:19:41

Consistency is a deep topic, and a hard thing to get right. The trouble comes when two nearly-simultaneous changes occur to the same data: conflicting updates can arrive in one order on one server, and in another order on another. This is a problem, since the two servers no longer agree on what the data is, and it isn't clear who is "right".
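
To make the ordering problem concrete, here is a small sketch, assuming each server simply applies updates in the order they arrive (the key name and values are made up for illustration):

```python
def apply_in_arrival_order(updates):
    """Apply (key, value) updates one by one; the last one to arrive wins."""
    store = {}
    for key, value in updates:
        store[key] = value
    return store


# Two nearly simultaneous, conflicting updates to the same key.
update_a = ("price", 10)
update_b = ("price", 12)

# The same two updates reach the two servers in opposite orders.
server_1 = apply_in_arrival_order([update_a, update_b])
server_2 = apply_in_arrival_order([update_b, update_a])

# server_1 == {"price": 12}, server_2 == {"price": 10}
```

Both servers received exactly the same updates, yet they end up with different data, and nothing in the updates themselves says which result should win.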

The short story: pick your favorite RDBMS (MySQL, for example, is popular) and have your app servers connect to it, in what is called the three-tier model. Be sure to perform complex updates in transactions, which will provide an acceptable consistency model.
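
As a rough illustration of a multi-step update wrapped in a transaction, the sketch below uses Python's built-in `sqlite3` module as a stand-in for whichever RDBMS you choose. The `records` table, its columns, and the `move_quantity` function are hypothetical.

```python
import sqlite3


def move_quantity(conn, src, dst, amount):
    """Move `amount` from one record to another as a single transaction."""
    with conn:  # Commits on success, rolls back if an exception escapes.
        conn.execute(
            "UPDATE records SET qty = qty - ? WHERE id = ?", (amount, src)
        )
        (remaining,) = conn.execute(
            "SELECT qty FROM records WHERE id = ?", (src,)
        ).fetchone()
        if remaining < 0:
            raise ValueError("not enough quantity in source record")
        conn.execute(
            "UPDATE records SET qty = qty + ? WHERE id = ?", (amount, dst)
        )


def quantities(conn):
    """Return the current state of the table as a dict."""
    return dict(conn.execute("SELECT id, qty FROM records"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id TEXT PRIMARY KEY, qty INTEGER)")
conn.executemany("INSERT INTO records VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()
```

If the check in the middle fails, the first `UPDATE` is rolled back as well, so other readers never see a half-applied change.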

The long story: the three-tier model serves well for small-to-medium-scale web sites and services. Eventually you will find that the single database becomes the bottleneck. For services whose read traffic is substantially larger than their write traffic, a common optimization is a single-master, many-slave database replication arrangement, where all writes go to the single master (required for consistency with non-distributed transactions), while the more common reads can go to any of the read slaves.
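
The routing rule in that arrangement can be sketched as follows. The class name, the operation names, and the host names in the usage note are assumptions for illustration; `primary` stands for the single master and `replicas` for the read slaves.

```python
import itertools
from typing import Sequence


class ReadWriteRouter:
    """Send writes to the single primary and spread reads over replicas."""

    WRITE_OPERATIONS = ("create", "update", "delete")

    def __init__(self, primary: str, replicas: Sequence[str]) -> None:
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._read_cycle = itertools.cycle(list(replicas) or [primary])

    def route(self, operation: str) -> str:
        """Return the database host that should serve this operation."""
        if operation in self.WRITE_OPERATIONS:
            return self.primary
        if operation == "read":
            return next(self._read_cycle)  # Simple round-robin.
        raise ValueError(f"unknown operation: {operation}")
```

For example, `ReadWriteRouter("db-master", ["db-r1", "db-r2"])` routes every `"update"` to `db-master` and alternates `"read"` calls between `db-r1` and `db-r2`. Replication to the read replicas is usually asynchronous, so a read issued right after a write may briefly return stale data.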

For services with evenly mixed read/write traffic, you may be better served by dropping some of the conveniences (and accompanying restrictions) that formal SQL provides and instead using one of the various "NoSQL" data stores that have recently emerged. Their relative merits and fitness for various problems are a deep topic in themselves.

墨落成白 2024-12-10 21:19:41

I can see 7 major options for now. You should find out more details and decide whether the facilities and trade-offs are appropriate for your purpose:

  1. Perform the CRUD operations on a common RDBMS. This is the simplest and most consistent option.
  2. Perform the CRUD operations on a common RDBMS that runs as a fast in-memory RDBMS, e.g. TimesTen from Oracle.
  3. Perform the CRUD operations on a distributed cache, or on your own home-cooked distributed hash table, that can guarantee synchronization, e.g. Hazelcast, Ehcache and others.
  4. Use a fast common state server like Redis or memcached, perform your updates on it in a synchronized manner, and write the successful operations out to a DB lazily if required.
  5. Distribute your REST servers such that the CRUD operations on a single entity are only performed by a single master. Once this is done, the details of the changes can be communicated to everyone else using a reliable message bus, or a distributed database (e.g. Postgres) that runs underneath and syncs all of your updates fairly fast.
  6. Target eventual consistency and use a distributed data store like Cassandra, which lets you choose the consistency level you require.
  7. Use a distributed consensus algorithm like Paxos or Raft, or (recommended) an existing implementation such as ZooKeeper or etcd, and take ownership of the item you want to change from each REST server before you perform the CRUD operation. This might be a bit slow, though, and it is much the same as what Cassandra might give you.
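
As one illustration of option 5, the sketch below picks a single owning server for each entity by hashing its ID, so that every instance independently agrees on who performs the writes for that entity. The function name, the use of SHA-256, and the server names in the usage note are assumptions; this is only one simple way to assign owners, not the only one.

```python
import hashlib
from typing import Sequence


def owner_for(entity_id: str, servers: Sequence[str]) -> str:
    """Return the one server responsible for writes to this entity."""
    if not servers:
        raise ValueError("at least one server is required")
    # Sort so that every instance gets the same answer regardless of the
    # order in which its own server list happens to be configured.
    ordered = sorted(servers)
    # hashlib is stable across processes, unlike Python's built-in hash().
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(ordered)
    return ordered[index]
```

A REST server that receives a write for, say, `"order-17"` would call `owner_for("order-17", ["vm-a", "vm-b", "vm-c"])` and forward the request if it is not the owner. With this plain modulo scheme, adding or removing a server reassigns most entities; consistent hashing is the usual way to reduce that churn.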