External storage for complex collections accessible by key-value lookup

Posted on 2024-11-15 16:51:14


Problem

I need a key-value store that can store values of the following form:

DS<DS<E>>

where the data structure DS can be either a List, SortedSet or an Array

and E can be either a String or byte-array.

It is very expensive to generate this data and so once I put it into the store, I will only perform read queries on it. Essentially it is a complex object cache with no eviction.

Example Application

A (possibly bad, but sufficient to clarify) example of an application is storing tokenized sentences from a document, where you need to be able to quickly access the qth word of the pth sentence given a document ID. In this case, I would be storing it as a K-V pair as follows:

K - docID
V - List<List<String>>
String word = map.get(docID).get(p).get(q);
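To make the access pattern concrete, here is a minimal in-memory sketch of the value shape described above. The naive tokenizer (sentences split on `.`, `!` or `?`, words split on whitespace), the class name `TokenizedDocs` and the key `doc-1` are hypothetical and only for illustration. Note that an in-process `HashMap` like this is exactly the app-integrated approach the question wants to move into an external store.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TokenizedDocs {
    // Hypothetical naive tokenizer: sentences end with '.', '!' or '?'; words are split on whitespace.
    static List<List<String>> tokenize(String text) {
        List<List<String>> sentences = new ArrayList<>();
        for (String sentence : text.split("[.!?]")) {
            String trimmed = sentence.trim();
            if (trimmed.isEmpty()) continue;
            // Unmodifiable copies, since the data is read-only once stored.
            sentences.add(List.copyOf(Arrays.asList(trimmed.split("\\s+"))));
        }
        return List.copyOf(sentences);
    }

    public static void main(String[] args) {
        // K = docID, V = List<List<String>>
        Map<String, List<List<String>>> map = new HashMap<>();
        map.put("doc-1", tokenize("The cat sat. It was warm! Then it slept?"));

        int p = 1, q = 2; // zero-based sentence and word indices
        String word = map.get("doc-1").get(p).get(q);
        System.out.println(word); // prints "warm"
    }
}
```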

I prefer to avoid app-integrated Map solutions (such as EhCache within Java).

I have worked with Redis but it doesn't appear to support the second layer of data-structure complexity. Any other K-V solutions that can help my use case?

Update:

I know that I could serialize/deserialize my object but I was wondering if there is any other solution.
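The serialize/deserialize route mentioned in the update can be sketched with the JDK's `DataOutputStream` and `DataInputStream`: the nested list is written as a length-prefixed byte array that any key-value store (for example, a plain Redis string value) can hold under the docID key. The class name `NestedListCodec` and the wire format are assumptions made for this sketch, not a standard format, and `writeUTF` limits each string to 65,535 encoded bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class NestedListCodec {
    // Hypothetical format: outer size, then for each inner list its size followed by its strings.
    static byte[] encode(List<List<String>> value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value.size());
            for (List<String> inner : value) {
                out.writeInt(inner.size());
                for (String s : inner) {
                    out.writeUTF(s); // modified UTF-8, at most 65,535 encoded bytes per string
                }
            }
        }
        return bytes.toByteArray();
    }

    // Reads back exactly what encode() wrote, in the same order.
    static List<List<String>> decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int outerSize = in.readInt();
            List<List<String>> result = new ArrayList<>(outerSize);
            for (int i = 0; i < outerSize; i++) {
                int innerSize = in.readInt();
                List<String> inner = new ArrayList<>(innerSize);
                for (int j = 0; j < innerSize; j++) {
                    inner.add(in.readUTF());
                }
                result.add(inner);
            }
            return result;
        }
    }

    public static void main(String[] args) throws IOException {
        List<List<String>> doc = List.of(List.of("The", "cat", "sat"), List.of("It", "was", "warm"));
        byte[] blob = encode(doc); // the value that would be stored under the docID key
        System.out.println(decode(blob).get(1).get(2)); // prints "warm"
    }
}
```

The trade-off of this approach is that reading a single word means fetching and decoding the whole document value.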


七分※倦醒 2024-11-22 16:51:14

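In terms of platform, you have two options. A full document database will support arbitrarily complex objects, but has no built-in commands for working with specific data structures. Something like Redis does have code optimized for specific data structures, but cannot support every possible data structure.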

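You can actually get pretty close with Redis by using ids instead of nested data structures. DS1<DS2<E>> becomes DS1<int> and DS2<E>, where the int from DS1 plus a prefix gives you the key that holds the DS2.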

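With this structure you can reach any E in just two operations. In some cases you can cut that to a single operation if you already know the DS2 id for a given query.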
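The id-based flattening described in this answer can be sketched in plain Java, with a `HashMap` of lists standing in for Redis lists so the example runs without a server. The key prefixes `doc:` and `sent:`, the global id counter and the class name `FlattenedStore` are hypothetical choices; against a real Redis instance the two helper methods would roughly correspond to the `RPUSH` and `LINDEX` commands (only non-negative indices are handled here).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FlattenedStore {
    // In-memory stand-in for Redis: every key maps to a list of strings.
    private final Map<String, List<String>> lists = new HashMap<>();
    private int nextId = 0; // hypothetical global id counter for inner lists

    // Roughly RPUSH key value: append to the list at key, creating it if needed.
    private void rpush(String key, String value) {
        lists.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Roughly LINDEX key index for non-negative indices: null if key or index is missing.
    private String lindex(String key, int index) {
        List<String> list = lists.get(key);
        return (list == null || index < 0 || index >= list.size()) ? null : list.get(index);
    }

    // DS1<DS2<E>> becomes DS1<int> under "doc:<docID>" plus one DS2<E> under "sent:<id>" per id.
    void put(String docId, List<List<String>> sentences) {
        for (List<String> sentence : sentences) {
            int id = nextId++;
            rpush("doc:" + docId, Integer.toString(id));
            for (String word : sentence) {
                rpush("sent:" + id, word);
            }
        }
    }

    // Two lookups: the outer list yields the inner list's id, and the prefix turns it into a key.
    String word(String docId, int p, int q) {
        String id = lindex("doc:" + docId, p);
        return id == null ? null : lindex("sent:" + id, q);
    }

    // One lookup when the caller already knows the inner list's id.
    String wordById(int sentenceId, int q) {
        return lindex("sent:" + sentenceId, q);
    }

    public static void main(String[] args) {
        FlattenedStore store = new FlattenedStore();
        store.put("doc-1", List.of(List.of("The", "cat", "sat"), List.of("It", "was", "warm")));
        System.out.println(store.word("doc-1", 1, 2)); // prints "warm"
        System.out.println(store.wordById(1, 2));      // prints "warm"
    }
}
```

If the inner-list key is derived from values the caller already has (for example `sent:<docID>:<p>` instead of a global counter), every lookup becomes the single-operation case.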


不及他 2024-11-22 16:51:14

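I hesitate to "recommend" it, but one of the only storage engines I know of that handles this kind of multidimensional data efficiently is InterSystems Caché. I had to use it at my last job, mostly working with code written in its MUMPS-based language. I wouldn't recommend the native approach unless you hate yourself or your developers. That said, they do have decent Java adapters, and Java seems to be what you're using. I've seen it handle billions of records, stored efficiently in nested binary-tree tables. There is no practical limit to the depth (number of dimensions) you can use. However, it is very much a proprietary solution. There is an open-source alternative called GT.M, but I don't know how well it works with languages other than M or C.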
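The multidimensional storage this answer describes is M's "global" model: a sparse tree addressed by any number of subscripts, such as `^doc("doc-1", 1, 2)`. The following in-memory Java sketch only imitates that data model. It is not the Caché or GT.M API, the class and method names are hypothetical, and it accepts only `Integer` and `String` subscripts, ordering integers before strings in a way meant to resemble M collation.

```java
import java.util.Comparator;
import java.util.TreeMap;

public class GlobalSketch {
    // Assumption: subscripts are Integer or String; integers sort numerically and before strings.
    private static final Comparator<Object> SUBSCRIPT_ORDER = (a, b) -> {
        boolean aNum = a instanceof Integer, bNum = b instanceof Integer;
        if (aNum && bNum) return Integer.compare((Integer) a, (Integer) b);
        if (aNum != bNum) return aNum ? -1 : 1;
        return ((String) a).compareTo((String) b);
    };

    // A node can hold a value and/or children, so depth is limited only by memory.
    private static final class Node {
        String value;
        final TreeMap<Object, Node> children = new TreeMap<>(SUBSCRIPT_ORDER);
    }

    private final Node root = new Node();

    // Imitates SET ^doc(sub1, sub2, ...) = value, creating intermediate nodes as needed.
    void set(String value, Object... subscripts) {
        Node node = root;
        for (Object s : subscripts) {
            node = node.children.computeIfAbsent(s, k -> new Node());
        }
        node.value = value;
    }

    // Imitates reading ^doc(sub1, sub2, ...); returns null if no value is stored there.
    String get(Object... subscripts) {
        Node node = root;
        for (Object s : subscripts) {
            node = node.children.get(s);
            if (node == null) return null;
        }
        return node.value;
    }

    public static void main(String[] args) {
        GlobalSketch doc = new GlobalSketch();
        String[][] sentences = {{"The", "cat", "sat"}, {"It", "was", "warm"}};
        for (int p = 0; p < sentences.length; p++) {
            for (int q = 0; q < sentences[p].length; q++) {
                doc.set(sentences[p][q], "doc-1", p, q); // like ^doc("doc-1", p, q)
            }
        }
        System.out.println(doc.get("doc-1", 1, 2)); // prints "warm"
    }
}
```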