Problem
I need a key-value store that can store values of the following form:
DS<DS<E>>
where the data structure DS can be either a List, SortedSet or an Array, and E can be either a String or a byte array.
It is very expensive to generate this data and so once I put it into the store, I will only perform read queries on it. Essentially it is a complex object cache with no eviction.
Example Application
A (possibly bad, but sufficient to clarify) example of an application is storing tokenized sentences from a document, where you need to be able to quickly access the qth word of the pth sentence given a documentID. In this case, I would be storing it as a K-V pair as follows:
K - docID
V - List<List<String>>
String word = map.get(docID).get(p).get(q);
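To make the value shape concrete, here is a minimal sketch of the lookup above. It assumes an ordinary in-process HashMap stands in for the store (only to show the List<List<String>> value, not as a suggested solution) and treats p and q as zero-based; the class name NestedValueExample and the ID doc-1 are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-process stand-in for the store: key = docID,
// value = the nested DS<DS<E>> shape (a list of sentences, each a list of words).
class NestedValueExample {
    // Returns the q-th word of the p-th sentence (both zero-based).
    static String wordAt(Map<String, List<List<String>>> store, String docID, int p, int q) {
        return store.get(docID).get(p).get(q);
    }

    public static void main(String[] args) {
        Map<String, List<List<String>>> store = new HashMap<>();
        store.put("doc-1", List.of(
                List.of("The", "cat", "sat"),
                List.of("It", "was", "happy")));
        System.out.println(wordAt(store, "doc-1", 1, 2)); // prints "happy"
    }
}
```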
I prefer to avoid app-integrated Map solutions (such as EhCache within Java).
I have worked with Redis but it doesn't appear to support the second layer of data-structure complexity. Any other K-V solutions that can help my use case?
Update:
I know that I could serialize/deserialize my object but I was wondering if there is any other solution.
Answers (3)
In terms of platform choice you have two options: a full document database will support arbitrarily complex objects but won't have built-in commands for working with specific data structures, while something like Redis, which does have optimised code for specific data structures, can't support all possible data structures.
You can actually get pretty close with Redis by using ids instead of the nested data structure.
DS1<DS2<E>>
becomes DS1<int> and DS2<E>, with the int from DS1 and a prefix giving you the key holding DS2. With this structure you can access any E with only two operations. In some cases you will be able to get that down to a single operation by knowing what the id of DS2 will be for a given query.
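The flattening described above can be sketched without a Redis client by letting a plain map of String key to List<String> stand in for Redis lists. This is a minimal illustration under that assumption: the "doc:" and "sent:" prefixes, the counter-based ids and the class name FlattenedStore are hypothetical choices. Against a real Redis server the writes would roughly correspond to RPUSH on each key and each read to LINDEX.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simulates DS1<DS2<E>> -> DS1<int> + DS2<E> using a map as a stand-in for
// Redis lists. Key prefixes "doc:" and "sent:" are hypothetical conventions.
class FlattenedStore {
    private final Map<String, List<String>> lists = new HashMap<>();
    private int nextId = 0; // source of ids for the inner lists

    // Each inner list gets its own key; the outer list stores only their ids.
    void put(String docID, List<List<String>> sentences) {
        List<String> ids = new ArrayList<>();
        for (List<String> sentence : sentences) {
            int id = nextId++;
            lists.put("sent:" + id, new ArrayList<>(sentence));
            ids.add(Integer.toString(id));
        }
        lists.put("doc:" + docID, ids);
    }

    // Two lookups: outer list -> id of the inner list, then inner list -> element.
    String get(String docID, int p, int q) {
        String innerId = lists.get("doc:" + docID).get(p);
        return lists.get("sent:" + innerId).get(q);
    }

    public static void main(String[] args) {
        FlattenedStore store = new FlattenedStore();
        store.put("doc-1", List.of(List.of("The", "cat", "sat"), List.of("It", "was", "happy")));
        System.out.println(store.get("doc-1", 0, 1)); // prints "cat"
    }
}
```

If the inner keys were instead named deterministically (for example sent:doc-1:1 for sentence 1 of doc-1, again a hypothetical scheme), a query could build the inner key directly and skip the first lookup, which is the single-operation case mentioned above.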
I hesitate to "recommend" it, but one of the few storage engines I know of that handles multi-dimensional data of this sort efficiently is InterSystems Caché. I had to use it at my last job, mostly coding against it in its built-in MUMPS-based language. I would not recommend the native approach unless you hate yourself or your developers. However, they do have decent Java adapters, and Java appears to be what you're using. I've seen it handle billions of records, efficiently stored in nested binary tree tables. There is no practical limit to the depth (number of dimensions) you can use. However, this is very much a proprietary solution. There is an open-source alternative called GT.M, but I don't know how compatible it is with languages other than M or C.
Any key-value store supports complex values; you just need to serialize/deserialize the data.
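As a minimal sketch of that approach, the nested List<List<String>> can be written as one length-prefixed byte[] with the JDK's DataOutputStream and read back with DataInputStream, so any store that accepts binary values can hold it. The class name NestedCodec and the exact layout (an int count before each list) are illustrative assumptions, not a standard format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical length-prefixed encoding of List<List<String>> into a single byte[].
class NestedCodec {
    static byte[] encode(List<List<String>> value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value.size());          // number of inner lists
            for (List<String> inner : value) {
                out.writeInt(inner.size());      // number of elements in this inner list
                for (String e : inner) {
                    out.writeUTF(e);             // modified UTF-8, at most 65535 encoded bytes per string
                }
            }
        }
        return bytes.toByteArray();
    }

    static List<List<String>> decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int outerSize = in.readInt();
            List<List<String>> value = new ArrayList<>(outerSize);
            for (int i = 0; i < outerSize; i++) {
                int innerSize = in.readInt();
                List<String> inner = new ArrayList<>(innerSize);
                for (int j = 0; j < innerSize; j++) {
                    inner.add(in.readUTF());
                }
                value.add(inner);
            }
            return value;
        }
    }

    public static void main(String[] args) throws IOException {
        List<List<String>> doc = List.of(List.of("The", "cat"), List.of("It", "was", "happy"));
        System.out.println(decode(encode(doc)).get(1).get(2)); // prints "happy"
    }
}
```

The trade-off is that reading a single word means fetching and decoding the whole value for that docID.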
If you want fast retrieval only for specific parts of the data, you could use a more complex Key. In your example this would be:
K - tuple(docID, p, q)
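A minimal sketch of the tuple key, assuming a string key of the form docID:p:q (a hypothetical convention that only works if docIDs never contain ':') and a HashMap standing in for the key-value store. Each word becomes its own entry, so a single get answers the query without loading the rest of the document.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One entry per word, keyed by the tuple (docID, p, q) encoded as "docID:p:q".
class CompositeKeyStore {
    private final Map<String, String> kv = new HashMap<>();

    static String key(String docID, int p, int q) {
        return docID + ":" + p + ":" + q;
    }

    void put(String docID, List<List<String>> sentences) {
        for (int p = 0; p < sentences.size(); p++) {
            List<String> words = sentences.get(p);
            for (int q = 0; q < words.size(); q++) {
                kv.put(key(docID, p, q), words.get(q));
            }
        }
    }

    // Single lookup; returns null if that position does not exist.
    String get(String docID, int p, int q) {
        return kv.get(key(docID, p, q));
    }

    public static void main(String[] args) {
        CompositeKeyStore store = new CompositeKeyStore();
        store.put("doc-1", List.of(List.of("The", "cat", "sat"), List.of("It", "was", "happy")));
        System.out.println(key("doc-1", 1, 0));        // prints "doc-1:1:0"
        System.out.println(store.get("doc-1", 1, 0));  // prints "It"
    }
}
```

The cost is many more keys, and reading a whole sentence back takes one lookup per word.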