Is there a clustered database that allows grouping data along two dimensions?

Posted on 2024-11-18 04:40:40

Say I have entities to store. For now it is good enough to consider them blobs. I want the entities to be stored on a cluster. The key/ID of each entity is an (x,y) integer coordinate, so they are basically located in a two-dimensional grid. Updating any entity requires locking its 4 neighbors. Since I want redundancy anyway, I thought the best approach would be to use that redundancy to ensure the neighbors are always available locally. Here is what the distribution could look like:

   1  2  3  4  5  6
1 [F][F][E][E][G][G]
2 [F][F][E][E][G][G]
3 [D][D][A][A][B][B]
4 [D][D][A][A][B][B]
5 [H][H][C][C][I][I]
6 [H][H][C][C][I][I]

If A, B, C, D, E, F, G, H, I are servers, then A owns the (3,3) entity, and it needs to know (2,3) and (3,2), which belong to other servers. Arranged in blocks of 4, this always leaves two sides belonging to other servers. Using triple redundancy, I want to force a local copy of all neighbors. This would in effect give me linear scalability.
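
To make the layout concrete, here is a minimal sketch of the ownership rule implied by the grid above. It assumes 1-based coordinates with x as the column and y as the row, and 2x2 blocks per server. The names `block_of`, `owner`, `neighbors`, and `remote_neighbors` are illustrative only and do not come from any particular database; grid edges are not handled.

```python
# Server layout copied from the grid above (rows are y = 1..6, columns are x = 1..6).
GRID = [
    "FFEEGG",
    "FFEEGG",
    "DDAABB",
    "DDAABB",
    "HHCCII",
    "HHCCII",
]

BLOCK = 2  # assumption: each server owns a 2x2 block of entities


def block_of(x, y):
    """Map a 1-based (x, y) to the coordinates of the 2x2 block containing it."""
    return ((x - 1) // BLOCK, (y - 1) // BLOCK)


def owner(x, y):
    """Look up the owning server in the grid above."""
    return GRID[y - 1][x - 1]


def neighbors(x, y):
    """The 4 neighbors that must be locked when (x, y) is updated."""
    return [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]


def remote_neighbors(x, y):
    """Neighbors of (x, y) that fall in a different block, i.e. on another server."""
    home = block_of(x, y)
    return [n for n in neighbors(x, y) if block_of(*n) != home]
```

For (3,3) this yields the two neighbors (2,3) and (3,2), owned by D and E respectively, which are the copies A would need to hold locally.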

Is there a database which allows me to define the sharding/replication key such that I can specify such a distribution, or is there a way of combining x and y into a single value that could be used to achieve this?

What I'm after is low latency and redundancy, not saving drive space. My entities have a "locality of reference" property: transactions only ever access the neighbors. But using the same key for an entity and its neighbors would result in every entity having the same key.

Comments (1)

幸福%小乖 2024-11-25 04:40:40

I understand that duplicating BLOB data can be expensive storage-wise, so we want to limit redundancy. The benefit of sharding the data in an unnormalized way is that it can greatly speed up searching and overall performance.

This is just a thought, but perhaps you could approach this in a normalized way instead by creating three tables:

Coordinates - Column (ID), Column (Xcor), Column (Ycor)

Data - Column (ID), Column (Checksum?)

CoordinateData - Column (CoordinateID), Column (DataID)

CoordinateData would serve as the mapping table. This normally isn't ideal for indexing or searching; however, if you stored a checksum string, you could use some other medium for storing and locating the raw data.
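
A rough sketch of what those three tables might look like, using Python's built-in `sqlite3` module purely for illustration. The column types, the uniqueness constraints, the choice of SHA-256 for the checksum, and the helpers `store` and `checksums_at` are all my assumptions; the raw blob itself is assumed to live in some other store keyed by the checksum.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Coordinates (
    ID   INTEGER PRIMARY KEY,
    Xcor INTEGER NOT NULL,
    Ycor INTEGER NOT NULL,
    UNIQUE (Xcor, Ycor)
);
CREATE TABLE Data (
    ID       INTEGER PRIMARY KEY,
    Checksum TEXT NOT NULL UNIQUE   -- key into the external blob store
);
CREATE TABLE CoordinateData (
    CoordinateID INTEGER NOT NULL REFERENCES Coordinates(ID),
    DataID       INTEGER NOT NULL REFERENCES Data(ID),
    PRIMARY KEY (CoordinateID, DataID)
);
""")


def store(x, y, blob):
    """Record that the blob (identified by its checksum) belongs at (x, y)."""
    checksum = hashlib.sha256(blob).hexdigest()
    conn.execute(
        "INSERT OR IGNORE INTO Coordinates (Xcor, Ycor) VALUES (?, ?)", (x, y))
    conn.execute(
        "INSERT OR IGNORE INTO Data (Checksum) VALUES (?)", (checksum,))
    conn.execute(
        """INSERT OR IGNORE INTO CoordinateData (CoordinateID, DataID)
           SELECT c.ID, d.ID FROM Coordinates c, Data d
           WHERE c.Xcor = ? AND c.Ycor = ? AND d.Checksum = ?""",
        (x, y, checksum))
    return checksum


def checksums_at(x, y):
    """Return the checksums of all data mapped to (x, y)."""
    rows = conn.execute(
        """SELECT d.Checksum FROM Coordinates c
           JOIN CoordinateData cd ON cd.CoordinateID = c.ID
           JOIN Data d ON d.ID = cd.DataID
           WHERE c.Xcor = ? AND c.Ycor = ?""",
        (x, y)).fetchall()
    return [r[0] for r in rows]
```

With this shape, the same blob stored at two coordinates produces one row in Data and two rows in CoordinateData, so the mapping is duplicated but the payload is not.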

Like I said, just an idea.
