Distributed and replicated data store for small amounts of data under Windows

Posted on 2024-11-07 06:41:50

We're looking for a good solution to a caching problem. We'd like to distribute a relatively small amount of data (perhaps tens of GB) among a cluster of web servers such that:

The data is replicated to all nodes
The data is persistent
The data can be accessed locally

Our motivation for a caching solution is that we currently have a single point of failure: a SQL Server database. Unfortunately, we're unable to set up a failover cluster for this database. We already use Memcached extensively, but we want to avoid the situation where a Memcached node goes down, we suddenly get a large number of cache misses, and all of the resulting requests hit a single endpoint.

Instead, we'd prefer to have a local persistent cache on each web server node so that the resulting load is distributed. A retrieval would go through the following steps:

Check for data in Memcached. If it's not there...
Check for data in local persistent storage. If it's not there...
Retrieve data from the database.

When data changes, the cache key is invalidated at both caching layers.
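
To make the lookup order concrete, here is a minimal Python sketch of what we have in mind. It is only an illustration: plain dicts stand in for the Memcached client and the local persistent store, and the names `TieredCache` and `load_from_db` are hypothetical. Writing a value back into the faster layers after a miss is an assumption; the steps above only describe the read order and the invalidation.

```python
from typing import Callable, Optional


class TieredCache:
    """Read-through lookup: shared cache, then local store, then database."""

    def __init__(self, shared: dict, local: dict,
                 load_from_db: Callable[[str], Optional[str]]):
        self.shared = shared              # stand-in for a Memcached client
        self.local = local                # stand-in for the local persistent store
        self.load_from_db = load_from_db  # hypothetical database loader

    def get(self, key: str) -> Optional[str]:
        # 1. Check the shared cache (Memcached).
        value = self.shared.get(key)
        if value is not None:
            return value
        # 2. Check the local persistent store.
        value = self.local.get(key)
        if value is None:
            # 3. Fall back to the database.
            value = self.load_from_db(key)
            if value is None:
                return None
            self.local[key] = value       # assumption: repopulate on a miss
        self.shared[key] = value          # assumption: repopulate on a miss
        return value

    def invalidate(self, key: str) -> None:
        # When data changes, drop the key from both caching layers.
        self.shared.pop(key, None)
        self.local.pop(key, None)
```

With this layout, losing the shared cache only pushes reads down to each node's local store, so the database is hit only for keys that are missing from both layers.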

We've been looking at a bunch of potential solutions, but none of them seem to match exactly what we need:

CouchDB

This is pretty close; the data model we'd like to cache is very document-oriented. However, its replication model isn't exactly what we're looking for. It seems to me as though replication is an action you have to perform rather than a permanent relationship among nodes. You can set up continuous replication, but this doesn't persist between restarts.
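
One workaround we have considered is re-issuing the continuous replication requests whenever a node starts. The sketch below builds those requests against CouchDB's `POST /_replicate` endpoint with `"continuous": true`. The helper name `replication_requests`, the host names, and the database name are hypothetical, and we have not verified that this approach holds up in production.

```python
import json
import urllib.request
from typing import List


def replication_requests(couch_url: str, db_name: str,
                         peers: List[str]) -> List[urllib.request.Request]:
    """Build one continuous-replication request per peer node."""
    requests = []
    for peer in peers:
        body = {
            "source": db_name,                          # local database
            "target": peer.rstrip("/") + "/" + db_name,  # same database on the peer
            "continuous": True,
        }
        requests.append(urllib.request.Request(
            couch_url.rstrip("/") + "/_replicate",
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        ))
    return requests


if __name__ == "__main__":
    # Hypothetical node addresses; 5984 is CouchDB's default port.
    for req in replication_requests("http://localhost:5984", "cache",
                                    ["http://web2:5984", "http://web3:5984"]):
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Each request could then be sent with `urllib.request.urlopen` from a startup script on every node, but that still makes replication something we have to trigger ourselves rather than a permanent relationship among the nodes.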

Cassandra

This solution seems to be geared mostly toward those with large storage requirements. We have a large number of users, but only a small amount of data. Cassandra looks able to support any number of failover nodes, but 100% replication among nodes doesn't seem to be what it's intended for; instead, it seems geared toward distributing data across nodes.

SAN

One attractive idea is to store a bunch of files on a SAN or a similar type of appliance. I haven't worked with these before, but it seems like this would still be a single point of failure; if the SAN goes down, every cache miss would suddenly go to the database.

DFS Replication

A simple Google search turned this up. It seems to do what we want: it synchronizes files across all nodes in a replication group. But the marketing text makes it look more like a system for ensuring documents are copied to different office locations. It also has limits, such as a maximum file count, that wouldn't work well for us.

Have any of you had similar requirements to ours and found a good solution that meets your needs?
