Distributed and replicated data store for small amounts of data under Windows
We're looking for a good solution to a caching problem. We'd like to distribute a relatively small amount of data (perhaps tens of GB) across a cluster of web servers such that:
- The data is replicated to all nodes
- The data is persistent
- The data can be accessed locally
Our motivation for a caching solution is that we currently have a single point of failure: a SQL Server database. Unfortunately, we're unable to set up a fail-over cluster for this database. We already use Memcached heavily, but we want to avoid the situation where a Memcached node goes down, we suddenly get a large number of cache misses, and a flood of requests hits one endpoint.
We'd prefer instead to have a local persistent cache on each web server node so that the resulting load is distributed. A retrieval would go through the following steps:
- Check for data in Memcached. If it's not there...
- Check for data in local persistent storage. If it's not there...
- Retrieve data from the database.
When data changes, the cache key is invalidated at both caching layers.
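The lookup order and invalidation described above can be sketched roughly as follows. This is only an illustration under assumptions: `TieredCache`, `load_from_db`, and the two mapping objects are hypothetical names, and plain dict-like objects stand in for the real Memcached client and the local persistent store.

```python
class TieredCache:
    """Read-through cache: Memcached first, then local store, then database.

    `memcached` and `local_store` are hypothetical dict-like stand-ins;
    `load_from_db` is a hypothetical callable that fetches a value by key.
    """

    def __init__(self, memcached, local_store, load_from_db):
        self.memcached = memcached
        self.local_store = local_store
        self.load_from_db = load_from_db

    def get(self, key):
        # 1. Check the shared Memcached tier.
        value = self.memcached.get(key)
        if value is not None:
            return value

        # 2. Check the local persistent tier on this web server.
        value = self.local_store.get(key)
        if value is None:
            # 3. Fall back to the database and repopulate the local tier.
            value = self.load_from_db(key)
            self.local_store[key] = value

        # Repopulate Memcached so the next request hits the first tier.
        self.memcached[key] = value
        return value

    def invalidate(self, key):
        # When data changes, drop the key from both caching layers.
        self.memcached.pop(key, None)
        self.local_store.pop(key, None)
```

Note that in this sketch `invalidate` only touches the local store of the node it runs on; in a real cluster the invalidation would somehow have to reach every node's local cache, which is the part the candidate solutions below would need to handle.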
We've been looking at a bunch of potential solutions, but none of them seem to match exactly what we need:
CouchDB
This is pretty close; the data model we'd like to cache is very document-oriented. However, its replication model isn't exactly what we're looking for. It seems to me as though replication is an action you have to perform rather than a permanent relationship among nodes. You can set up continuous replication, but this doesn't persist between restarts.
Cassandra
This solution seems to be mostly geared toward those with large storage requirements. We have a large number of users, but a small amount of data. Cassandra looks able to support n fail-over nodes, but 100% replication among nodes doesn't seem to be what it's intended for; instead, it seems geared toward distributing data across nodes.
SAN
One attractive idea is that we can store a bunch of files on a SAN or similar type of appliance. I haven't worked with these before, but it seems like this would still be a single point of failure; if the SAN goes down, we'd suddenly be going to the database for all cache misses.
DFS Replication
A simple Google search revealed this. It seems to do what we want; it synchronizes files across all nodes in a replication cluster. But the marketing text makes it look like it's more of a system for ensuring documents are copied to different office locations. Also, it has limits, like a file count maximum, that wouldn't work well for us.
Have any of you had similar requirements to ours and found a good solution that meets your needs?
Comments (2)
We've been using Riak successfully in production for several months now for a problem that's somewhat similar to what you describe. We also evaluated CouchDB and Cassandra beforehand.
The advantage of Riak for this sort of problem, IMO, is that distribution and data replication are at the core of the system. You define how many replicas of the data you want across the cluster and it takes care of the rest (it's a bit more complicated than that of course, but that's the essence). We've gone through adding nodes, removing nodes, and having nodes crash, and it's proven surprisingly resilient.
It's a lot like Couch in other respects: document-oriented, REST interface, Erlang.
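To make the "define how many replicas you want" idea concrete, here is a toy sketch of replica placement on a hash ring. It is not Riak's API or its actual placement algorithm; `Ring`, `replicas_for`, and `n_replicas` are hypothetical names used only for illustration.

```python
import hashlib
from bisect import bisect_right


def _hash(value: str) -> int:
    # Map a string to a large integer position on the ring.
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy hash ring that picks `n_replicas` distinct nodes for each key."""

    def __init__(self, nodes, n_replicas):
        self.n_replicas = n_replicas
        # Each node sits at the ring position given by the hash of its name.
        self._points = sorted((_hash(node), node) for node in nodes)

    def replicas_for(self, key):
        hashes = [h for h, _ in self._points]
        # First node clockwise from the key's position (wrapping around).
        start = bisect_right(hashes, _hash(key)) % len(self._points)
        # Never ask for more replicas than there are nodes.
        count = min(self.n_replicas, len(self._points))
        return [
            self._points[(start + i) % len(self._points)][1]
            for i in range(count)
        ]
```

In this sketch, setting `n_replicas` equal to the number of nodes puts every key on every node, which corresponds to the "replicated to all nodes" requirement in the question.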
You could take a look at Hazelcast.
It does not persist the data, but it provides a fail-over system. Each node can have a number of other nodes back up its data in case that node fails.