MySQL 与 PostgreSQL 性能与复杂查询匹配模式的比较
我有一个复杂的数据库,大约有 30 个表。一个表有超过 500,000 行,另一个表有超过 15,000 行,我在一个单独的数据库中使用这两个表,直到今天我决定只在一个数据库中实现。
在今天之前,500,000 行的表位于 MySQL 数据库中,而 15,000 行的表位于 PostgreSQL 中。在一个大量使用的页面中,这是 PHP 基准测试的结果:
getSimilarAvaiable - 0.0287 s
getUnavaiable - 0.27 s
ProcessDataOfUnavaiable - 1.4701 s
Process - 1.8622 s
TotalPageTime - 3.631 s
在我将所有内容迁移到 PostgreSQL 并使用相同的 SQL 代码而不进行任何更改后,同一页面的结果是这样的:
getSimilarAvaiable - 2.7465 s
getUnavaiableCars - 9.0763 s
ProcesseDataOfUnavaiableCars - 1.4167 s
ProcessCars - 1.7207 s
TotalPageTime - 14.9602 s
我把MySQL 中的一切都是一样的,相同的索引,一切,但我不明白为什么会有如此巨大的差异。我应该做什么来优化这个?
编辑:现在更好地解释了。
500.00 表由以下结构组成:
id - bigint (primary key)
plate- varchar(10) Unique key
manufacturer - varchar(30)
vin - varchar(30)
主要查询是这样的:
SELECT plate, vin, 1 as n, substr(plate,1,2) as l
FROM imtt_vin WHERE substr(plate,1,1) >= 'A' and substr(plate,1,1) <= 'Z' AND
(manufacturer ILIKE '%".self::$Manufacturer."%') AND vin LIKE ?
UNION
SELECT plate, vin, 3 as n, substr(plate,4,2) as l
FROM imtt_vin WHERE substr(plate,4,1) >= 'A' and substr(plate,4,1) <= 'Z' AND
(manufacturer ILIKE '%".self::$Manufacturer."%') AND vin LIKE ?
UNION
SELECT plate, vin, 2 as n, substr(plate,7,2) as l
FROM imtt_vin WHERE substr(plate,7,1) >= 'A' and substr(plate,7,1) <= 'Z' AND
(manufacturer ILIKE '%".self::$Manufacturer."%') AND vin LIKE ?
ORDER BY n, l, plate;
EDIT2:使用复杂的单个查询进行测试,我将其从 15 秒减少到 8/9 秒。即便如此,这对我来说还是太多了。
I have a complex database, with around 30 tables. One table has more than 500,000 rows and another more than 15,000 and I use both in a separate database until today I decided to implement in only one database.
Before today, the table with 500,000 rows was in a MySQL database and the 15,000 row table was in PostgreSQL. In one page of heavy use, this was the result in a PHP benchmark:
getSimilarAvaiable - 0.0287 s
getUnavaiable - 0.27 s
ProcessDataOfUnavaiable - 1.4701 s
Process - 1.8622 s
TotalPageTime - 3.631 s
After I migrate everything to PostgreSQL, and use the same SQL code without any changes the result of the same page was this:
getSimilarAvaiable - 2.7465 s
getUnavaiableCars - 9.0763 s
ProcesseDataOfUnavaiableCars - 1.4167 s
ProcessCars - 1.7207 s
TotalPageTime - 14.9602 s
I put everything the same in MySQL, same index, everything, but I can't understand why there is this huge difference. What I should do to optimize this?
EDIT: Now better explained.
The 500.00 table is composed with the follow structure:
id - bigint (primary key)
plate- varchar(10) Unique key
manufacturer - varchar(30)
vin - varchar(30)
The major query is something like this:
SELECT plate, vin, 1 as n, substr(plate,1,2) as l
FROM imtt_vin WHERE substr(plate,1,1) >= 'A' and substr(plate,1,1) <= 'Z' AND
(manufacturer ILIKE '%".self::$Manufacturer."%') AND vin LIKE ?
UNION
SELECT plate, vin, 3 as n, substr(plate,4,2) as l
FROM imtt_vin WHERE substr(plate,4,1) >= 'A' and substr(plate,4,1) <= 'Z' AND
(manufacturer ILIKE '%".self::$Manufacturer."%') AND vin LIKE ?
UNION
SELECT plate, vin, 2 as n, substr(plate,7,2) as l
FROM imtt_vin WHERE substr(plate,7,1) >= 'A' and substr(plate,7,1) <= 'Z' AND
(manufacturer ILIKE '%".self::$Manufacturer."%') AND vin LIKE ?
ORDER BY n, l, plate;
EDIT2: Tested with a complex single query and I reduced it from 15 to 8/9 seconds. Even so it is too much for me.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
您需要发布 EXPLAIN yourquery (for mysql) 和 EXPLAIN ANALYZE yourquery (for postgres) ;如果没有这一点,就不可能说出任何相关的话。
另外 SELECT pg_relation_size('imtt_vin')
例如“?”的值是多少?在这个查询中?
我不知道你工作的地方的牌照,但这部分:
可能会选择数据库中的所有行,因此它的唯一目的是燃烧 CPU 周期。您至少可以像这样重写它(以及所有其他),以避免调用 substr() :
当然,当条件没有用时删除它。
然后我们有:
糟糕的数据库设计:世界上有 500,000 个汽车制造商吗?可能不会。您应该将制造商放在另一个表中并使用外键。这会将这种不可转位的情况转变为可转位的情况。
其余的,请发布解释/解释分析。
You need to post EXPLAIN yourquery (for mysql) and EXPLAIN ANALYZE yourquery (for postgres) ; without that it's impossible to say anything relevant.
Also SELECT pg_relation_size('imtt_vin')
For instance what is the value of "?" in this query ?
I don't know about license plates where you work but this part :
probably selects all rows in the database, thus its only purpose is to burn CPU cycles. You could at least rewrite it (and all the others) like this to avoid a call to substr() :
And of course remove the condition when it is not useful.
Then we have :
Bad database design : are there 500.000 car manufacturers in the world ? Probably not. You should put the manufacturers in another table and use a foreign key. That would turn this unindexable condition into an indexable one.
For the rest, post EXPLAIN / EXPLAIN ANALYZE.
如果您在 MySQL 中使用 MyISAM,理论上可以解释性能差异(因为关于您的数据库设计和执行的查询没有太多公开)。关于两个 RDBMS 之间的交叉性能,我建议您查看此比较页面 (锚定到 MyISAM 部分)。
If you were using MyISAM in MySQL the performance difference could theoretically (because not much has been exposed regarding your database design and queries performed) be explained. Regarding cross performance between the two RDBMS I'd recommend you take a look at this comparison page (Anchored to the MyISAM section).
MySQL 默认使用更多内存。我认为 def install 分配给它使用超过 256MB。不确定确切的数字。 PostgreSQL 默认设置为使用 32MB 之类的东西。尝试在配置文件中将每个内存增加到 1GB,然后运行基准测试并回复我们。
MySQL uses more memory by default. I think it is assigned to use more than 256MB by def install. Not sure on the exact number. PostgreSQL by default is set to use something like 32MB. Try to bump each one up to 1GB of ram in config file then run benchmarks and get back to us.
在我看来,您可能还没有更新 Postgres 数据库的统计信息。如果统计数据不正确,数据库的性能将不会很好。
Seems to me that you likely have not updated the statistics on the Postgres database. With improper statistics, the database will not perform very well.
查询
UNION ALL
。UNION
将用于折叠重复项,这显然不是这里的情况,并且会更昂贵。n
,因此对查询的各个部分进行排序可能会更有效。为此需要额外的一组括号。left (plate, 2)
比substr(plate, 1, 2)
快一点。仅适用于前导子字符串(您的第一个SELECT
)。索引
默认B 树索引仅适用于左锚定
LIKE
表达式。但三元组 GiST 或 GIN 索引可用于非左锚定模式。您需要附加模块pg_trgm
。使用CREATE EXTENSION
对每个数据库安装一次在 PostgreSQL 9.1 或更高版本中。请参阅旧版本的手册。我没有太多信息可以继续,基本的部分GIN索引 应该可以奇迹:
vin
,因为您可能需要其中的相等运算符=
。EXPLAIN ANALYZE
测试索引是否实际被使用。如果是的话,查询时间应该是毫秒,而不是秒。Query
UNION ALL
.UNION
would be used to fold duplicates, which is obviously not the case here, and would be more expensive.n
, it's probably more efficient to order individual legs of the query. The extra set of parentheses is needed for that.left (plate, 2)
is a bit faster thansubstr(plate, 1, 2)
. Works only for leading substrings (your firstSELECT
).Index
A default B-tree index only works for left-anchored
LIKE
expressions. But a trigram GiST or GIN index can be used for non-left-anchored patterns. You need the additional modulepg_trgm
. Install once per database withCREATE EXTENSION
in PostgreSQL 9.1 or later. Consult the manual for older versions.I don't have much information to go on, basic partial GIN indexes should work wonders:
vin
in the index, since you probably want the equality operator=
there.EXPLAIN ANALYZE
whether the index is actually used. If it is, query time should be a matter of milliseconds, not seconds.您仍然没有提供足够的信息 - 您有哪些索引、慢速查询的 EXPLAIN ANALYZE 输出等。
关于优化示例查询的一些想法:
1:UTF-8 字符串函数通常不是很快。如果您想加快字符串函数的速度,请对此列使用
bytea
类型而不是 varchar (或者将整个数据库编码更改为SQL_ASCII
,但这是不可取的)2:根据您的查询,数据库可能必须遍历表中的所有行并为每行计算这些字符串函数。
我不知道它们有多少匹配项,因此索引可能没有用,但功能索引可能会帮助您:
3:如果您可以容忍重复输出,请使用
UNION ALL
而不是UNION
在您的查询中——这将为您节省一些处理较大结果集的时间。4:尽可能避免
LIKE
/ILIKE
。You still haven't provided enough information -- what indexes do you have, EXPLAIN ANALYZE output for slow queries, etc.
Some thoughts on optimizing your example query:
1: UTF-8 string functions are generally not very fast. If you want to speed up string functions, use the
bytea
type instead of varchar for this column (or change your whole database encoding toSQL_ASCII
, but this is unadvisable)2: Given your queries, the database probably has to go through all rows in the table and compute these string functions for each.
I don't know how many matches they have, so the index might not be useful, but functional indexes might help you out:
3: If you can tolerate duplicate outputs, use
UNION ALL
instead ofUNION
in your queries -- this will save you some processing with larger result sets.4: Avoid
LIKE
/ILIKE
whenever you can.