PostgreSQL 中字符串列类型的索引数组

发布于 2024-10-17 01:20:59 字数 352 浏览 1 评论 0原文

是否可以在文本数组类型的列上创建索引。尝试使用GIN索引,但查询似乎没有使用这些索引。

-- Example:
CREATE TABLE users (
   name VARCHAR(100),
   groups TEXT[],
);

-- Query: 
SELECT name FROM users WHERE ANY(groups) = 'Engineering';

另外,在groups列上有效执行GROUP BY的最佳方法是什么,以便它可以提供groups和计数。

Is it possible to create an index on a column with type of text array. Tried using GIN indexes, but queries do not seem to be using those indexes.

-- Example:
CREATE TABLE users (
   name VARCHAR(100),
   groups TEXT[],
);

-- Query: 
SELECT name FROM users WHERE ANY(groups) = 'Engineering';

Also what is the best way to perform GROUP BY on groups column efficiently so that it can give groups and count.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

水溶 2024-10-24 01:20:59

可以使用 gin 索引:

CREATE TABLE users (
 name VARCHAR(100),
 groups text[]
);

CREATE INDEX idx_users ON users USING GIN(groups);

-- disable sequential scan in this test:
SET enable_seqscan TO off;

EXPLAIN ANALYZE
SELECT name FROM users WHERE  groups @> (ARRAY['Engineering']);

结果:

"Bitmap Heap Scan on users  (cost=4.26..8.27 rows=1 width=218) (actual time=0.021..0.021 rows=0 loops=1)"
"  Recheck Cond: (groups @> '{Engineering}'::text[])"
"  ->  Bitmap Index Scan on idx_users  (cost=0.00..4.26 rows=1 width=0) (actual time=0.016..0.016 rows=0 loops=1)"
"        Index Cond: (groups @> '{Engineering}'::text[])"
"Total runtime: 0.074 ms"

在数组上使用聚合函数,这将是另一个问题。函数 unnest() 可能会有所帮助。

为什么不规范化你的数据?这将解决所有问题,包括许多您尚未遇到的问题。

A gin index can be used:

CREATE TABLE users (
 name VARCHAR(100),
 groups text[]
);

CREATE INDEX idx_users ON users USING GIN(groups);

-- disable sequential scan in this test:
SET enable_seqscan TO off;

EXPLAIN ANALYZE
SELECT name FROM users WHERE  groups @> (ARRAY['Engineering']);

Result:

"Bitmap Heap Scan on users  (cost=4.26..8.27 rows=1 width=218) (actual time=0.021..0.021 rows=0 loops=1)"
"  Recheck Cond: (groups @> '{Engineering}'::text[])"
"  ->  Bitmap Index Scan on idx_users  (cost=0.00..4.26 rows=1 width=0) (actual time=0.016..0.016 rows=0 loops=1)"
"        Index Cond: (groups @> '{Engineering}'::text[])"
"Total runtime: 0.074 ms"

Using aggregate functions on an array, that will be another problem. The function unnest() might help.

Why don't you normalize your data? That will fix all problems, including many problems you didn't encouter yet.

私野 2024-10-24 01:20:59

我认为处理这个问题的最佳方法是标准化你的模型。由于我没有尝试,以下内容可能会包含错误,但想法应该很清楚:

CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR(100) UNIQUE);
CREATE TABLE groups (id INTEGER PRIMARY KEY, name VARCHAR(100) UNIQUE);
CREATE TABLE user_group (
    user INTEGER NOT NULL REFERENCES users,
    group INTEGER NOT NULL REFERENCES groups);
CREATE UNIQUE INDEX user_group_unique ON user_group (user, group);

SELECT users.name
    FROM user_group
    INNER JOIN users ON user_group.user = users.id
    INNER JOIN groups ON user_group.group = groups.id
    WHERE groups.name = 'Engineering';

生成的执行计划应该已经相当高效了;您仍然可以通过索引 ON user_group(group) 进行优化,这允许使用 index_scan 而不是顺序扫描来查找特定组的成员。

I think the best way to handle this would be to normalize your model. The following will probably contain errors as I didn't try it, but the idea should be clear:

CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR(100) UNIQUE);
CREATE TABLE groups (id INTEGER PRIMARY KEY, name VARCHAR(100) UNIQUE);
CREATE TABLE user_group (
    user INTEGER NOT NULL REFERENCES users,
    group INTEGER NOT NULL REFERENCES groups);
CREATE UNIQUE INDEX user_group_unique ON user_group (user, group);

SELECT users.name
    FROM user_group
    INNER JOIN users ON user_group.user = users.id
    INNER JOIN groups ON user_group.group = groups.id
    WHERE groups.name = 'Engineering';

The resulting execution plan should be fairly efficient already; you can optimize still by indexing ON user_group(group), which allows an index_scan rather than a sequential_scan to find the members of a particular group.

晚雾 2024-10-24 01:20:59

从 PostgreSQL 9.6 开始,现在有 array_to_tsvector() 函数。

https://pgpedia.info/a/array_to_tsvector.html

所以也许非常有效的是创建tsvectors 并对其使用 gin 索引。

There is array_to_tsvector() function as of PostgreSQL 9.6 now.

https://pgpedia.info/a/array_to_tsvector.html

So maybe quite efficient is to create tsvectors and use gin index on them.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文