混合“类索引” btree 结构 - PostgreSQL 可以做到这一点吗?
我是 PostgreSQL 新手。我对需要构建的混合数据库有一个非常不寻常的要求。从我见过的模块来看,在我看来,以下情况是可能的。
我需要能够将键 - [值] 添加到索引中,而无需实际将数据添加到表中。简而言之,我需要一个键[值]存储,最好是 btree(查找速度)。索引结构是理想的。也许另一个结构可以做到这一点。
非常具体地说,我希望存储类似的内容:
KEY [IDs]
Blue 10, 20, 23, 47
Green 5, 12, 40
我不想要存储这些数据并为其建立索引的开销。可以这么说,我只需要“索引但不存储”的数据。
同样重要的是能够查询这些结构并获取数据 (ID),并且能够对 ID 执行 INTERSECTS 等操作,以及对键执行 IN、BETWEEN、= 等操作。
正如您可能猜到的那样,最终目标是最终的 ID 列表,然后将其发送给客户端并随意查找。
编辑
我不想记录每个值的键。使用上面的示例,我不想存储 {Blue, 10}、{Blue, 20} 等。我想存储 {Blue, [10, 20, 23, 47]}。
如果我将其存储为传统表,我无法找到解决此重复问题的方法。
再看看 Blue, [10, 20, 23, 47]},从技术上讲,这只不过是一个 btree,其中 ID (10, 20, 23, 47) 被标记为值,父键“Blue”被标记为密钥。
由于这种数据类型不匹配在单个树中可能会很混乱,所以我相信理想的解决方案是“btree 中的 [btrees]”,其中“btree”是关键,而 [btrees] 是 a 的每组值的 btree钥匙。
I am new to PostgreSQL. I have a very unusual requirement for a hybrid database I need to build. From modules I've seen, it seems to me that the following is possible.
I need to be able to add key - [values] into an index without actually adding data to a table. Simply put, I need a key-[values] store, ideally as a btree (lookup speed). An index structure is ideal. Perhaps another structure will do this.
To be very specific, I wish to store something like:
KEY [IDs]
Blue 10, 20, 23, 47
Green 5, 12, 40
I don't want the overhead of storing this data AND indexing it. I just need the data "indexed but not stored" so to speak.
Equally important is the ability to query these structures and get the data (IDs) out, and be able to perform INTERSECTS etc. on the IDs, and IN, BETWEEN, =, etc. on the keys.
As you can probably guess, the end goal is a final list of IDs, which would then be sent to the client, and looked up at will.
EDIT
What I don't want is to record the key for every value. Using the example above, I don't want to store {Blue, 10}, {Blue, 20} etc. I want to store {Blue, [10, 20, 23, 47]}.
If I store this as a traditional table, I cannot see a way around this duplicate problem.
Looking again at Blue, [10, 20, 23, 47]}, this is technically nothing more than a single btree, where the IDs (10, 20, 23, 47) are marked as values, and the parent key "Blue" is marked as a key.
Since this data type mismatch could be messy in a single tree, I believe the ideal solution is "[btrees] in a btree", where "btree" is the key, and [btrees] is a btree for each group of values of a key.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
如果您确实坚持这样做,您可以将值存储为数组,并且 intarray 模块提供了操作符来操作它们。也就是说:
用这个你可以写:
理想情况下,你需要一个聚合函数来将 int[] 转换为集合并计算交集等,但它们似乎没有提供。
实际上,这只是更典型结构的稍微更紧凑的存储:
事实上,您可以简单地将视图定义为上面的查询。
一种更规范化的方法是有两个表:一个用于描述键,一个用于将它们与值关联:
现在:
因此,任何选择键值的查询都只需要针对键集(key_dimension)运行,然后是一个最小的合成key (key_id) 用于将它们转换为实际的数据值集(来自 key_value)。
If you really insist on doing it this way you can store the values as an array, and the intarray module provides operators to manipulate those. That is:
with this you can write:
Ideally you need an aggregate function to convert an int[] to a set and calculate intersections etc., but they don't seem to be provided.
Really, this is just a slightly more compact storage of the more typical structure:
In fact, you can simply define a view to be the above query.
A more normalised approach would be to have two tables: one to describe keys, one to associate them with values:
and now:
So any queries to select key values need run only against the set of keys (key_dimension), and then a minimal synthetic key (key_id) is used to convert these to actual sets of data values (from key_value).