
Best approach to optimizing a database table

Posted on 2024-08-23 10:54:33

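I have a database table that has grown too large (hundreds of millions of rows) and needs optimizing, but before I partition it I thought I'd ask for advice.

Here is the usage pattern:

1. The table has about 10 columns, each roughly 20 bytes long.
2. INSERTs run at a rate of hundreds per second.
3. SELECTs filtered on column "a" (where a='xxxx') run a few times per hour.
4. DELETEs based on a DATE column (removing rows dated more than 1 year ago) run, usually once a day.

The key requirements are to speed up the INSERT and SELECT statements, and to keep one year of historical data without locking the whole table while deleting.

I guess I need two indexes, one on column "a" and one on the date field. Or is it possible to optimize for both?

Is there a necessary trade-off between SELECT speed and DELETE speed?

Is partitioning the only solution? What is a good strategy for partitioning this kind of table?

I'm using a PostgreSQL 8.4 database.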


小ぇ时光︴ 2024-08-30 10:54:33

Have you looked into PostgreSQL partitioning instead of keeping this as a single physical table? It has been supported since version 8.1.

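Partitioning spares you from having to choose between fast inserts and fast deletes. You can always partition the table by year/month and then drop the partitions you no longer need. Dropping a partition is very fast, and so is inserting into a small partition.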
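The year/month approach can be sketched as a small Python model. It assumes one child table per calendar month and a hypothetical naming scheme `tablename_yYYYYmMM`; neither detail comes from the answer. It lists the monthly partitions whose every row is older than the one-year cutoff, so each could be removed with DROP TABLE instead of a bulk DELETE.

```python
from datetime import date

def partition_name(year: int, month: int) -> str:
    # Hypothetical naming scheme: one child table per calendar month.
    return f"tablename_y{year:04d}m{month:02d}"

def expired_partitions(existing, today: date, keep_months: int = 12):
    """Return the monthly partitions that hold only rows older than the
    retention window and can therefore be dropped whole."""
    # Number months as year*12 + (month - 1) so arithmetic crosses years.
    current = today.year * 12 + (today.month - 1)
    # The month keep_months back still holds rows newer than the cutoff
    # (e.g. on 2024-08-23, August 2023 holds rows after 2023-08-23).
    oldest_kept = current - keep_months
    return [partition_name(y, m) for (y, m) in sorted(existing)
            if y * 12 + (m - 1) < oldest_kept]

existing = [(2023, 6), (2023, 7), (2023, 8), (2024, 8)]
print(expired_partitions(existing, date(2024, 8, 23)))
# ['tablename_y2023m06', 'tablename_y2023m07']
```

With whole-month partitions, up to about one extra month of data survives past the one-year mark; if exact retention matters, the oldest remaining partition can still be trimmed with a small DELETE.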

From the manual:

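Partitioning refers to splitting what is logically one large table into smaller physical pieces. Partitioning can provide several benefits:

- Query performance can be improved dramatically for certain kinds of queries.
- Update performance can be improved too, since each piece of the table has indexes smaller than an index on the entire data set would be. When an index no longer fits easily in memory, both read and write operations on the index take progressively more disk accesses.
- Bulk deletes can be accomplished by simply removing one of the partitions, if that requirement is planned into the partitioning design. DROP TABLE is far faster than a bulk DELETE, to say nothing of the ensuing VACUUM overhead.
- Seldom-used data can be migrated to cheaper and slower storage media.

The benefits will normally be worthwhile only when a table would otherwise be very large. The exact point at which a table will benefit from partitioning depends on the application, although a rule of thumb is that the size of the table should exceed the physical memory of the database server.

Currently, PostgreSQL supports partitioning via table inheritance. Each partition must be created as a child table of a single parent table. The parent table itself is normally empty; it exists just to represent the entire data set. You should be familiar with inheritance (see Section 5.8) before attempting to implement partitioning.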
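To apply the manual's rule of thumb to the table in the question, a rough lower-bound size estimate helps. The sketch assumes 300 million rows (a stand-in for "hundreds of millions"), 28 bytes of per-row overhead (PostgreSQL's 24-byte aligned tuple header plus a 4-byte line pointer) and a hypothetical 32 GiB server. It ignores page headers, alignment padding, free space and indexes, so the real table is larger.

```python
ROWS = 300_000_000          # assumed row count ("hundreds of millions")
COLUMNS = 10                # from the question
BYTES_PER_COLUMN = 20       # from the question (approximate)
ROW_OVERHEAD = 28           # assumed: 24-byte tuple header + 4-byte line pointer

def estimated_heap_bytes(rows: int, columns: int, bytes_per_column: int,
                         overhead: int = ROW_OVERHEAD) -> int:
    """Lower-bound estimate of the table's heap size in bytes."""
    return rows * (columns * bytes_per_column + overhead)

def exceeds_ram(table_bytes: int, ram_bytes: int) -> bool:
    """The manual's rule of thumb: partitioning pays off once the table
    is larger than the server's physical memory."""
    return table_bytes > ram_bytes

size = estimated_heap_bytes(ROWS, COLUMNS, BYTES_PER_COLUMN)
print(f"~{size / 2**30:.1f} GiB")        # ~63.7 GiB
print(exceeds_ram(size, 32 * 2**30))     # True for a hypothetical 32 GiB server
```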


枕梦 2024-08-30 10:54:33


Partitioning is your answer, as others stated, but:

I'd partition on some hash(a). If a is an integer then a%256 would be good. If it is text then something like substring(md5(a) for 2).

It will speed up inserts and selects.
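The two bucket functions suggested above can be modeled in Python to see which bucket a key lands in. This is a sketch, assuming non-negative integer keys and text hashed as UTF-8 bytes; the function names are hypothetical.

```python
import hashlib

def int_bucket(a: int) -> int:
    """Bucket for an integer key, mirroring a % 256.
    Assumes a >= 0: PostgreSQL's % keeps the sign of the dividend, while
    Python's % with a positive divisor always returns 0..255."""
    return a % 256

def text_bucket(a: str) -> str:
    """Bucket for a text key, mirroring substring(md5(a) for 2): the first
    two hex digits of the MD5 digest, i.e. 256 possible values."""
    return hashlib.md5(a.encode("utf-8")).hexdigest()[:2]

for key in ("a", "bar"):
    print(key, text_bucket(key))   # a 0c, bar 37
print(int_bucket(1000))            # 232
```

Both variants spread keys over 256 buckets, so a query on a='xxxx' only needs to touch the one bucket its value hashes to.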

For deletes, I'd run them more often but in smaller batches, also split by partition. I'd run them every hour (at XX:30), like this:

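-- Run hourly (at XX:30). Each run deletes expired rows from only one of
-- 256 hash buckets, so every bucket is visited roughly every 256 hours.
-- Assumes hash(a) returns an integer in 0..255 (e.g. a % 256 for an integer a);
-- the text variant, substring(md5(a) for 2), would need converting to an integer.
delete from table_name
where date<(current_date - interval '1 year')
and
  hash(a)
  =
  (extract(doy from current_timestamp) * 24
    + extract(hour from current_timestamp))::int % 256;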
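The hourly rotation can be checked with a small Python model of the bucket expression. It assumes, as the SQL does, that buckets are numbered 0..255; Python's `timetuple().tm_yday` is 1-based like `extract(doy ...)`. The model shows which bucket a given run targets and that 256 consecutive hourly runs within one year touch every bucket exactly once.

```python
from datetime import datetime, timedelta

def delete_bucket(ts: datetime, buckets: int = 256) -> int:
    """Bucket targeted by the hourly DELETE at time ts, mirroring
    (extract(doy from ts) * 24 + extract(hour from ts))::int % 256."""
    doy = ts.timetuple().tm_yday   # 1-based day of year, like extract(doy)
    return (doy * 24 + ts.hour) % buckets

print(delete_bucket(datetime(2024, 8, 23, 10, 30)))   # 42

# 256 consecutive hourly runs that stay inside one calendar year.
start = datetime(2024, 2, 1)
runs = [delete_bucket(start + timedelta(hours=h)) for h in range(256)]
print(len(set(runs)))                                  # 256
```

Each bucket is therefore purged about every 256 hours (roughly 10.7 days), so expired rows may survive that long past the one-year mark. At the turn of the year the day-of-year counter resets, so the rotation briefly skips or repeats some buckets.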

EDIT: I've just tested this:

create function hash(a text) returns text as $$ select substring(md5($1) for 1) $$ language sql immutable strict;
CREATE TABLE tablename (id text, mdate date);
CREATE TABLE tablename_partition_0 ( CHECK ( hash(id) = '0' ) ) INHERITS (tablename);
CREATE TABLE tablename_partition_1 ( CHECK ( hash(id) = '1' ) ) INHERITS (tablename);
CREATE TABLE tablename_partition_2 ( CHECK ( hash(id) = '2' ) ) INHERITS (tablename);
CREATE TABLE tablename_partition_3 ( CHECK ( hash(id) = '3' ) ) INHERITS (tablename);
CREATE TABLE tablename_partition_4 ( CHECK ( hash(id) = '4' ) ) INHERITS (tablename);
CREATE TABLE tablename_partition_5 ( CHECK ( hash(id) = '5' ) ) INHERITS (tablename);
CREATE TABLE tablename_partition_6 ( CHECK ( hash(id) = '6' ) ) INHERITS (tablename);
CREATE TABLE tablename_partition_7 ( CHECK ( hash(id) = '7' ) ) INHERITS (tablename); 
CREATE TABLE tablename_partition_8 ( CHECK ( hash(id) = '8' ) ) INHERITS (tablename);
CREATE TABLE tablename_partition_9 ( CHECK ( hash(id) = '9' ) ) INHERITS (tablename);
CREATE TABLE tablename_partition_a ( CHECK ( hash(id) = 'a' ) ) INHERITS (tablename);
CREATE TABLE tablename_partition_b ( CHECK ( hash(id) = 'b' ) ) INHERITS (tablename);
CREATE TABLE tablename_partition_c ( CHECK ( hash(id) = 'c' ) ) INHERITS (tablename);
CREATE TABLE tablename_partition_d ( CHECK ( hash(id) = 'd' ) ) INHERITS (tablename);
CREATE TABLE tablename_partition_e ( CHECK ( hash(id) = 'e' ) ) INHERITS (tablename);
CREATE TABLE tablename_partition_f ( CHECK ( hash(id) = 'f' ) ) INHERITS (tablename);
analyze;
explain select * from tablename where id='bar' and hash(id)=hash('bar');

                                         QUERY PLAN                                          
---------------------------------------------------------------------------------------------
 Result  (cost=0.00..69.20 rows=2 width=36)
   ->  Append  (cost=0.00..69.20 rows=2 width=36)
         ->  Seq Scan on tablename  (cost=0.00..34.60 rows=1 width=36)
               Filter: ((id = 'bar'::text) AND ("substring"(md5(id), 1, 1) = '3'::text))
         ->  Seq Scan on tablename_partition_3 tablename  (cost=0.00..34.60 rows=1 width=36)
               Filter: ((id = 'bar'::text) AND ("substring"(md5(id), 1, 1) = '3'::text))
(6 rows)

You'd need to add hash(id)=hash('searched_value') to your queries; otherwise Postgres cannot rule out any partition and will search all tables.
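Why the extra predicate matters can be modeled in a few lines of Python. This is a simplification, not the planner's actual algorithm: PostgreSQL's constraint exclusion (governed by the `constraint_exclusion` setting, whose 8.4 default is `partition`) skips a child only when it can prove the child's CHECK constraint contradicts the WHERE clause. That works here because the query compares hash(id) to a constant, and hash('bar') can be folded at plan time since hash() is declared immutable.

```python
import hashlib
from typing import List, Optional

def hash16(value: str) -> str:
    """Model of the answer's hash(): first hex digit of md5(value)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()[0]

# The 16 child tables from the example, keyed by their CHECK value.
CHILDREN = {f"tablename_partition_{d}": d for d in "0123456789abcdef"}

def tables_scanned(query_hash: Optional[str]) -> List[str]:
    """Simplified model of constraint exclusion. The parent has no CHECK
    constraint, so it is always scanned; a child is skipped only when the
    query pins hash(id) to a constant that contradicts the child's
    CHECK ( hash(id) = d )."""
    scanned = ["tablename"]
    for name, digit in CHILDREN.items():
        if query_hash is None or query_hash == digit:
            scanned.append(name)
    return scanned

# With "and hash(id)=hash('bar')": parent plus one child, as in the plan.
print(tables_scanned(hash16("bar")))   # ['tablename', 'tablename_partition_3']
# Without the hash predicate: parent plus all 16 children.
print(len(tables_scanned(None)))       # 17
```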

EDIT: You can also use the rule system to route inserts automatically to the correct tables:

create rule tablename_rule_0 as
  on insert to tablename where hash(NEW.id)='0'
  do instead insert into tablename_partition_0 values (NEW.*);
create rule tablename_rule_1 as
  on insert to tablename where hash(NEW.id)='1'
  do instead insert into tablename_partition_1 values (NEW.*);
-- and so on
insert into tablename (id) values ('a');
select * from tablename_partition_0;
 id | mdate 
----+-------
 a  | 
(1 row)
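Rather than typing out all 16 CREATE TABLE statements and 16 rules, they could be generated. The Python sketch below simply repeats the pattern shown in the answer for every hex digit; the function name `partition_ddl` is hypothetical. As an alternative worth weighing, the PostgreSQL manual also describes trigger-based routing and notes that a rule has more overhead than a trigger but pays it once per query rather than once per row, so with hundreds of single-row INSERTs per second a trigger may be the better fit.

```python
HEX_DIGITS = "0123456789abcdef"

def partition_ddl(parent: str = "tablename", key: str = "id") -> list:
    """Generate one CREATE TABLE ... INHERITS and one CREATE RULE per hex
    digit, following the pattern used for partitions 0 and 1 above."""
    statements = []
    for d in HEX_DIGITS:
        child = f"{parent}_partition_{d}"
        # Child table whose CHECK constraint enables constraint exclusion.
        statements.append(
            f"CREATE TABLE {child} ( CHECK ( hash({key}) = '{d}' ) ) "
            f"INHERITS ({parent});"
        )
        # Rule redirecting inserts on the parent into this child.
        statements.append(
            f"CREATE RULE {parent}_rule_{d} AS ON INSERT TO {parent} "
            f"WHERE hash(NEW.{key}) = '{d}' "
            f"DO INSTEAD INSERT INTO {child} VALUES (NEW.*);"
        )
    return statements

if __name__ == "__main__":
    print("\n".join(partition_ddl()))
```

The printed script can be reviewed and then fed to psql after the hash() function and parent table exist.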
