MySQL 与 PostgreSQL 性能与复杂查询匹配模式的比较

发布于 2024-11-02 03:58:14 字数 1606 浏览 1 评论 0原文

我有一个复杂的数据库,大约有 30 个表。一个表有超过 500,000 行,另一个表有超过 15,000 行,我在一个单独的数据库中使用这两个表,直到今天我决定只在一个数据库中实现。

在今天之前,500,000 行的表位于 MySQL 数据库中,而 15,000 行的表位于 PostgreSQL 中。在一个大量使用的页面中,这是 PHP 基准测试的结果:

getSimilarAvaiable - 0.0287 s
getUnavaiable - 0.27 s
ProcessDataOfUnavaiable - 1.4701 s
Process - 1.8622 s
TotalPageTime - 3.631 s

在我将所有内容迁移到 PostgreSQL 并使用相同的 SQL 代码而不进行任何更改后,同一页面的结果是这样的:

getSimilarAvaiable - 2.7465 s
getUnavaiableCars - 9.0763 s
ProcesseDataOfUnavaiableCars - 1.4167 s
ProcessCars - 1.7207 s
TotalPageTime - 14.9602 s

我把MySQL 中的一切都是一样的,相同的索引,一切,但我不明白为什么会有如此巨大的差异。我应该做什么来优化这个?

编辑:现在更好地解释了。

500.00 表由以下结构组成:

id - bigint (primary key)
plate- varchar(10) Unique key
manufacturer - varchar(30)
vin - varchar(30)

主要查询是这样的:

SELECT plate, vin, 1 as n, substr(plate,1,2) as l 
FROM imtt_vin WHERE substr(plate,1,1) >= 'A' and substr(plate,1,1) <= 'Z' AND
(manufacturer ILIKE '%".self::$Manufacturer."%') AND vin LIKE ?
UNION
SELECT plate, vin, 3 as n, substr(plate,4,2) as l 
FROM imtt_vin WHERE substr(plate,4,1) >= 'A' and substr(plate,4,1) <= 'Z' AND
(manufacturer ILIKE '%".self::$Manufacturer."%') AND vin LIKE ?
UNION
SELECT plate, vin, 2 as n, substr(plate,7,2) as l 
FROM imtt_vin WHERE substr(plate,7,1) >= 'A' and substr(plate,7,1) <= 'Z' AND 
(manufacturer ILIKE '%".self::$Manufacturer."%') AND vin LIKE ?
ORDER BY n, l, plate;

EDIT2:使用复杂的单个查询进行测试,我将其从 15 秒减少到 8/9 秒。即便如此,这对我来说还是太多了。

I have a complex database, with around 30 tables. One table has more than 500,000 rows and another more than 15,000 and I use both in a separate database until today I decided to implement in only one database.

Before today, the table with 500,000 rows was in a MySQL database and the 15,000 row table was in PostgreSQL. In one page of heavy use, this was the result in a PHP benchmark:

getSimilarAvaiable - 0.0287 s
getUnavaiable - 0.27 s
ProcessDataOfUnavaiable - 1.4701 s
Process - 1.8622 s
TotalPageTime - 3.631 s

After I migrate everything to PostgreSQL, and use the same SQL code without any changes the result of the same page was this:

getSimilarAvaiable - 2.7465 s
getUnavaiableCars - 9.0763 s
ProcesseDataOfUnavaiableCars - 1.4167 s
ProcessCars - 1.7207 s
TotalPageTime - 14.9602 s

I put everything the same in MySQL, same index, everything, but I can't understand why there is this huge difference. What I should do to optimize this?

EDIT: Now better explained.

The 500.00 table is composed with the follow structure:

id - bigint (primary key)
plate- varchar(10) Unique key
manufacturer - varchar(30)
vin - varchar(30)

The major query is something like this:

SELECT plate, vin, 1 as n, substr(plate,1,2) as l 
FROM imtt_vin WHERE substr(plate,1,1) >= 'A' and substr(plate,1,1) <= 'Z' AND
(manufacturer ILIKE '%".self::$Manufacturer."%') AND vin LIKE ?
UNION
SELECT plate, vin, 3 as n, substr(plate,4,2) as l 
FROM imtt_vin WHERE substr(plate,4,1) >= 'A' and substr(plate,4,1) <= 'Z' AND
(manufacturer ILIKE '%".self::$Manufacturer."%') AND vin LIKE ?
UNION
SELECT plate, vin, 2 as n, substr(plate,7,2) as l 
FROM imtt_vin WHERE substr(plate,7,1) >= 'A' and substr(plate,7,1) <= 'Z' AND 
(manufacturer ILIKE '%".self::$Manufacturer."%') AND vin LIKE ?
ORDER BY n, l, plate;

EDIT2: Tested with a complex single query and I reduced it from 15 to 8/9 seconds. Even so it is too much for me.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

清音悠歌 2024-11-09 03:58:14

您需要发布 EXPLAIN yourquery (for mysql) 和 EXPLAIN ANALYZE yourquery (for postgres) ;如果没有这一点,就不可能说出任何相关的话。

另外 SELECT pg_relation_size('imtt_vin')

例如“?”的值是多少?在这个查询中?

SELECT plate, vin, 1 as n, substr(plate,1,2) as l 
FROM imtt_vin WHERE substr(plate,1,1) >= 'A' and substr(plate,1,1) <= 'Z' AND
(manufacturer ILIKE '%".self::$Manufacturer."%') AND vin LIKE ?

我不知道你工作的地方的牌照,但这部分:

WHERE substr(plate,1,1) >= 'A' and substr(plate,1,1) <= 'Z'

可能会选择数据库中的所有行,因此它的唯一目的是燃烧 CPU 周期。您至少可以像这样重写它(以及所有其他),以避免调用 substr() :

WHERE substr(plate,1,1) BETWEEN 'A' AND 'Z'

当然,当条件没有用时删除它。

然后我们有:

manufacturer ILIKE '%".self::$Manufacturer."%'

糟糕的数据库设计:世界上有 500,000 个汽车制造商吗?可能不会。您应该将制造商放在另一个表中并使用外键。这会将这种不可转位的情况转变为可转位的情况。

其余的,请发布解释/解释分析。

You need to post EXPLAIN yourquery (for mysql) and EXPLAIN ANALYZE yourquery (for postgres) ; without that it's impossible to say anything relevant.

Also SELECT pg_relation_size('imtt_vin')

For instance what is the value of "?" in this query ?

SELECT plate, vin, 1 as n, substr(plate,1,2) as l 
FROM imtt_vin WHERE substr(plate,1,1) >= 'A' and substr(plate,1,1) <= 'Z' AND
(manufacturer ILIKE '%".self::$Manufacturer."%') AND vin LIKE ?

I don't know about license plates where you work but this part :

WHERE substr(plate,1,1) >= 'A' and substr(plate,1,1) <= 'Z'

probably selects all rows in the database, thus its only purpose is to burn CPU cycles. You could at least rewrite it (and all the others) like this to avoid a call to substr() :

WHERE substr(plate,1,1) BETWEEN 'A' AND 'Z'

And of course remove the condition when it is not useful.

Then we have :

manufacturer ILIKE '%".self::$Manufacturer."%'

Bad database design : are there 500.000 car manufacturers in the world ? Probably not. You should put the manufacturers in another table and use a foreign key. That would turn this unindexable condition into an indexable one.

For the rest, post EXPLAIN / EXPLAIN ANALYZE.

一杆小烟枪 2024-11-09 03:58:14

如果您在 MySQL 中使用 MyISAM,理论上可以解释性能差异(因为关于您的数据库设计和执行的查询没有太多公开)。关于两个 RDBMS 之间的交叉性能,我建议您查看此比较页面 (锚定到 MyISAM 部分)。

If you were using MyISAM in MySQL the performance difference could theoretically (because not much has been exposed regarding your database design and queries performed) be explained. Regarding cross performance between the two RDBMS I'd recommend you take a look at this comparison page (Anchored to the MyISAM section).

初心 2024-11-09 03:58:14

MySQL 默认使用更多内存。我认为 def install 分配给它使用超过 256MB。不确定确切的数字。 PostgreSQL 默认设置为使用 32MB 之类的东西。尝试在配置文件中将每个内存增加到 1GB,然后运行基准测试并回复我们。

MySQL uses more memory by default. I think it is assigned to use more than 256MB by def install. Not sure on the exact number. PostgreSQL by default is set to use something like 32MB. Try to bump each one up to 1GB of ram in config file then run benchmarks and get back to us.

初见 2024-11-09 03:58:14

在我看来,您可能还没有更新 Postgres 数据库的统计信息。如果统计数据不正确,数据库的性能将不会很好。

Seems to me that you likely have not updated the statistics on the Postgres database. With improper statistics, the database will not perform very well.

一身骄傲 2024-11-09 03:58:14

查询

(
SELECT 1 AS n, left(plate, 2) AS l, plate, vin
FROM   imtt_vin
WHERE  left(plate, 1) BETWEEN 'A' AND 'Z'
AND    manufacturer ILIKE '%".self::$Manufacturer."%'
AND    vin LIKE ?   -- You probably mean: vin = ?
ORDER  BY l, plate
)

UNION ALL
(
SELECT 3 AS n, substr(plate, 4, 2) AS l, plate, vin
FROM   imtt_vin
WHERE  substr(plate, 4, 1) BETWEEN 'A' AND 'Z'
AND    manufacturer ILIKE '%".self::$Manufacturer."%'
AND    vin LIKE ?
ORDER  BY l, plate
)

UNION  ALL ...
  • 使用 UNION ALL UNION 将用于折叠重复项,这显然不是这里的情况,并且会更昂贵。
  • 由于您的主要 ORDER BY 项是 n,因此对查询的各个部分进行排序可能会更有效。为此需要额外的一组括号。
  • left (plate, 2)substr(plate, 1, 2) 快一点。仅适用于前导子字符串(您的第一个 SELECT)。

索引

默认B 树索引仅适用于左锚定LIKE 表达式。但三元组 GiST 或 GIN 索引可用于非左锚定模式。您需要附加模块 pg_trgm 。使用 CREATE EXTENSION 对每个数据库安装一次在 PostgreSQL 9.1 或更高版本中。请参阅旧版本的手册。

CREATE EXTENSION pg_trgm;

我没有太多信息可以继续,基本的部分GIN索引 应该可以奇迹

CREATE INDEX imtt_vin_partial_gist_idx ON imtt_vin
USING  gin (manufacturer gin_trgm_ops)
WHERE  left(plate, 1) BETWEEN 'A' AND 'Z';

CREATE INDEX imtt_vin_partial_gist_idx ON imtt_vin
USING  gin (manufacturer gin_trgm_ops)
WHERE  substr(plate, 4, 1) BETWEEN 'A' AND 'Z';

-- more ...
  • 我没有在索引中包含 vin,因为您可能需要其中的相等运算符 =
  • 部分索引上的谓词必须在查询中重复(或多或少),以便查询规划器了解索引是适用的。
  • 三元组索引适用于不区分大小写的匹配。
  • 使用EXPLAIN ANALYZE测试索引是否实际被使用。如果是的话,查询时间应该是毫秒,而不是秒。
  • 该速度是以索引维护的写入操作的(小)成本为代价的。并且索引通常是磁盘上表大小的几倍。
  • MySQL 无法做到这些。

Query

(
SELECT 1 AS n, left(plate, 2) AS l, plate, vin
FROM   imtt_vin
WHERE  left(plate, 1) BETWEEN 'A' AND 'Z'
AND    manufacturer ILIKE '%".self::$Manufacturer."%'
AND    vin LIKE ?   -- You probably mean: vin = ?
ORDER  BY l, plate
)

UNION ALL
(
SELECT 3 AS n, substr(plate, 4, 2) AS l, plate, vin
FROM   imtt_vin
WHERE  substr(plate, 4, 1) BETWEEN 'A' AND 'Z'
AND    manufacturer ILIKE '%".self::$Manufacturer."%'
AND    vin LIKE ?
ORDER  BY l, plate
)

UNION  ALL ...
  • Use UNION ALL. UNION would be used to fold duplicates, which is obviously not the case here, and would be more expensive.
  • Since your leading ORDER BY item is n, it's probably more efficient to order individual legs of the query. The extra set of parentheses is needed for that.
  • left (plate, 2) is a bit faster than substr(plate, 1, 2). Works only for leading substrings (your first SELECT).

Index

A default B-tree index only works for left-anchored LIKE expressions. But a trigram GiST or GIN index can be used for non-left-anchored patterns. You need the additional module pg_trgm. Install once per database with CREATE EXTENSION in PostgreSQL 9.1 or later. Consult the manual for older versions.

CREATE EXTENSION pg_trgm;

I don't have much information to go on, basic partial GIN indexes should work wonders:

CREATE INDEX imtt_vin_partial_gist_idx ON imtt_vin
USING  gin (manufacturer gin_trgm_ops)
WHERE  left(plate, 1) BETWEEN 'A' AND 'Z';

CREATE INDEX imtt_vin_partial_gist_idx ON imtt_vin
USING  gin (manufacturer gin_trgm_ops)
WHERE  substr(plate, 4, 1) BETWEEN 'A' AND 'Z';

-- more ...
  • I didn't include vin in the index, since you probably want the equality operator = there.
  • Predicates on a partial index have to be repeated (more or less) in queries so the query planner understands the index is applicable.
  • A trigram index works for case insensitive matches.
  • Test with EXPLAIN ANALYZE whether the index is actually used. If it is, query time should be a matter of milliseconds, not seconds.
  • The speed comes at a (small) cost for write operations for index maintenance. And The index is typically several times the size of the table on disk.
  • You can't do any of this with MySQL.
述情 2024-11-09 03:58:14

您仍然没有提供足够的信息 - 您有哪些索引、慢速查询的 EXPLAIN ANALYZE 输出等。

关于优化示例查询的一些想法:

1:UTF-8 字符串函数通常不是很快。如果您想加快字符串函数的速度,请对此列使用 bytea 类型而不是 varchar (或者将整个数据库编码更改为 SQL_ASCII,但这是不可取的)

2:根据您的查询,数据库可能必须遍历表中的所有行并为每行计算这些字符串函数。

我不知道它们有多少匹配项,因此索引可能没有用,但功能索引可能会帮助您:

 CREATE INDEX imtt_vin_plate_1 ON imtt_vin (substr(plate,1,1));
 CREATE INDEX imtt_vin_plate_4 ON imtt_vin (substr(plate,4,1));
 CREATE INDEX imtt_vin_plate_7 ON imtt_vin (substr(plate,7,1));

3:如果您可以容忍重复输出,请使用 UNION ALL 而不是 UNION 在您的查询中——这将为您节省一些处理较大结果集的时间。

4:尽可能避免LIKE/ILIKE

You still haven't provided enough information -- what indexes do you have, EXPLAIN ANALYZE output for slow queries, etc.

Some thoughts on optimizing your example query:

1: UTF-8 string functions are generally not very fast. If you want to speed up string functions, use the bytea type instead of varchar for this column (or change your whole database encoding to SQL_ASCII, but this is unadvisable)

2: Given your queries, the database probably has to go through all rows in the table and compute these string functions for each.

I don't know how many matches they have, so the index might not be useful, but functional indexes might help you out:

 CREATE INDEX imtt_vin_plate_1 ON imtt_vin (substr(plate,1,1));
 CREATE INDEX imtt_vin_plate_4 ON imtt_vin (substr(plate,4,1));
 CREATE INDEX imtt_vin_plate_7 ON imtt_vin (substr(plate,7,1));

3: If you can tolerate duplicate outputs, use UNION ALL instead of UNION in your queries -- this will save you some processing with larger result sets.

4: Avoid LIKE/ILIKE whenever you can.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文