混合“类索引” btree 结构 - PostgreSQL 可以做到这一点吗？

发布于 2024-10-21 20:13:53 字数 831 浏览 11 评论 0原文

我是 PostgreSQL 新手。我对需要构建的混合数据库有一个非常不寻常的要求。从我见过的模块来看，在我看来，以下情况是可能的。

我需要能够将键 - [值] 添加到索引中，而无需实际将数据添加到表中。简而言之，我需要一个键[值]存储，最好是 btree（查找速度）。索引结构是理想的。也许另一个结构可以做到这一点。

非常具体地说，我希望存储类似的内容：

KEY     [IDs]
Blue    10, 20, 23, 47
Green   5, 12, 40

我不想要存储这些数据并为其建立索引的开销。可以这么说，我只需要“索引但不存储”的数据。

同样重要的是能够查询这些结构并获取数据 (ID)，并且能够对 ID 执行 INTERSECTS 等操作，以及对键执行 IN、BETWEEN、= 等操作。

正如您可能猜到的那样，最终目标是最终的 ID 列表，然后将其发送给客户端并随意查找。

编辑

我不想记录每个值的键。使用上面的示例，我不想存储 {Blue, 10}、{Blue, 20} 等。我想存储 {Blue, [10, 20, 23, 47]}。

如果我将其存储为传统表，我无法找到解决此重复问题的方法。

再看看 Blue, [10, 20, 23, 47]}，从技术上讲，这只不过是一个 btree，其中 ID (10, 20, 23, 47) 被标记为值，父键“Blue”被标记为密钥。

由于这种数据类型不匹配在单个树中可能会很混乱，所以我相信理想的解决方案是“btree 中的 [btrees]”，其中“btree”是关键，而 [btrees] 是 a 的每组值的 btree钥匙。

原文

I am new to PostgreSQL. I have a very unusual requirement for a hybrid database I need to build. From modules I've seen, it seems to me that the following is possible.

I need to be able to add key - [values] into an index without actually adding data to a table. Simply put, I need a key-[values] store, ideally as a btree (lookup speed). An index structure is ideal. Perhaps another structure will do this.

To be very specific, I wish to store something like:

KEY     [IDs]
Blue    10, 20, 23, 47
Green   5, 12, 40

I don't want the overhead of storing this data AND indexing it. I just need the data "indexed but not stored" so to speak.

Equally important is the ability to query these structures and get the data (IDs) out, and be able to perform INTERSECTS etc. on the IDs, and IN, BETWEEN, =, etc. on the keys.

As you can probably guess, the end goal is a final list of IDs, which would then be sent to the client, and looked up at will.

EDIT

What I don't want is to record the key for every value. Using the example above, I don't want to store {Blue, 10}, {Blue, 20} etc. I want to store {Blue, [10, 20, 23, 47]}.

If I store this as a traditional table, I cannot see a way around this duplicate problem.

Looking again at Blue, [10, 20, 23, 47]}, this is technically nothing more than a single btree, where the IDs (10, 20, 23, 47) are marked as values, and the parent key "Blue" is marked as a key.

Since this data type mismatch could be messy in a single tree, I believe the ideal solution is "[btrees] in a btree", where "btree" is the key, and [btrees] is a btree for each group of values of a key.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

一身软味 2024-10-28 20:13:53

如果您确实坚持这样做，您可以将值存储为数组，并且 intarray 模块提供了操作符来操作它们。也就是说：

create table data(key text primary key, values int[] not null);
insert into data
  values('Blue', '{10,20,23,47}'),('Green','{5,12,40}'),('Red', '{5,10,28}');

用这个你可以写：

select unnest(values) from data where key = 'Blue'
  intersect
  select unnest(values) from data where key = 'Red';

理想情况下，你需要一个聚合函数来将 int[] 转换为集合并计算交集等，但它们似乎没有提供。

实际上，这只是更典型结构的稍微更紧凑的存储：

select key, unnest(values) as value from data;
  key  | value
-------+-------
 Blue  |    10
 Blue  |    20
 Blue  |    23
[...]

事实上，您可以简单地将视图定义为上面的查询。

一种更规范化的方法是有两个表：一个用于描述键，一个用于将它们与值关联：

create table key_dimension(key_id serial primary key, key text not null unique);
insert into key_dimension(key) values('Blue'),('Green'),('Red');
create table key_value(key_id int not null references key_dimension(key_id), value int not null, primary key(key_id, value));
insert into key_value(key_id, value)
  select key_id, unnest(values) from key_dimension join data using (key);

现在：

select value from key_value
  where key_id = (select key_id from key_dimension where key = 'Red')
intersect
select value from key_value
  where key_id = (select key_id from key_dimension where key = 'Blue')

因此，任何选择键值的查询都只需要针对键集（key_dimension）运行，然后是一个最小的合成key (key_id) 用于将它们转换为实际的数据值集（来自 key_value）。

If you really insist on doing it this way you can store the values as an array, and the intarray module provides operators to manipulate those. That is:

create table data(key text primary key, values int[] not null);
insert into data
  values('Blue', '{10,20,23,47}'),('Green','{5,12,40}'),('Red', '{5,10,28}');

with this you can write:

select unnest(values) from data where key = 'Blue'
  intersect
  select unnest(values) from data where key = 'Red';

Ideally you need an aggregate function to convert an int[] to a set and calculate intersections etc., but they don't seem to be provided.

Really, this is just a slightly more compact storage of the more typical structure:

select key, unnest(values) as value from data;
  key  | value
-------+-------
 Blue  |    10
 Blue  |    20
 Blue  |    23
[...]

In fact, you can simply define a view to be the above query.

A more normalised approach would be to have two tables: one to describe keys, one to associate them with values:

create table key_dimension(key_id serial primary key, key text not null unique);
insert into key_dimension(key) values('Blue'),('Green'),('Red');
create table key_value(key_id int not null references key_dimension(key_id), value int not null, primary key(key_id, value));
insert into key_value(key_id, value)
  select key_id, unnest(values) from key_dimension join data using (key);

and now:

select value from key_value
  where key_id = (select key_id from key_dimension where key = 'Red')
intersect
select value from key_value
  where key_id = (select key_id from key_dimension where key = 'Blue')

So any queries to select key values need run only against the set of keys (key_dimension), and then a minimal synthetic key (key_id) is used to convert these to actual sets of data values (from key_value).

回复收藏 0 原文

~没有更多了~