Python 的数据库查询速度较慢,但​​ Perl 则不然

发布于 2024-12-19 14:43:21 字数 2689 浏览 0 评论 0原文

我的网上商店使用 python (Django)。

当我测试高负载(数据库访问)时得到了有趣的结果:

python 10 process = 200sec / 100% CPU utilisation
perl 10 process  = 65sec / 35% CPU utilisation

Centos 6、python 2.6、mysql 5.5、标准库、其他服务器上的 mysql-server。 表product_cars 有 70 000 000 条记录。

为什么 python 程序这么慢?

Python 程序:

#!/usr/bin/python
import MySQLdb
import re
from MySQLdb import cursors
import shutil
import datetime
import random

db0 = MySQLdb.connect(user="X", passwd="X", db="parts")
cursor0 = db0.cursor()
cursor0.execute('SET NAMES utf8')

now = datetime.datetime.now()
for x in xrange(1, 100000):
    id = random.randint(10, 50000)
    cursor0.execute("SELECT * FROM product_cars WHERE car_id=%s LIMIT 500", [id])
    cursor0.fetchone()

Perl 程序:

#!/usr/bin/perl
use DBI;
my $INSTANCE=$ARGV[0];
my $user = "x";
my $pw = "x";
my $db = DBI->connect( "dbi:mysql:parts", "x", "x");
my $sql= "SELECT * FROM product_cars WHERE car_id=? LIMIT 500";
foreach $_ ( 1 .. 100000 )
{
 $random = int(rand(50000));
 $cursor = $db->prepare($sql);
 $cursor->execute($random) || die $cursor->errstr;
 @Data= $cursor->fetchrow_array();
}

$cursor->finish;
$db->disconnect;

update1

有趣的事情:

总是选择 id=1 的行:< /strong>

清楚 MYSQL 使用缓存和查询会非常快,但同样很慢并且 CPU 利用率为 100%。但相同的 Perl 或 ruby​​ 代码运行速度很快。

如果替换 python 代码中的字符串:

# remove "SET NAMES utf8" string - this has no impact
# python-mysql use "%s", but not "?" as parameter marker
id = 1
for x in xrange(1, 100000):
    id = 1
    cursor0.execute("SELECT * FROM product_cars WHERE car_id=%s LIMIT 500", [id])
    cursor0.fetchone()

perl 中的相同代码:

foreach $_ ( 1 .. 20000 )
{
 $cursor = $db->prepare( "SELECT * FROM product_cars WHERE car_id=? LIMIT 500";);
 $cursor->execute(1);
#    while (my @Data= $cursor->fetchrow_array())
 if ($_ % 1000 == 0) { print "$_\n" };.
 @Data= $cursor->fetchrow_array();
# print "$_\n";
}

ruby​​ 中的代码:

pk=2
20000.times do |i|
    if i % 1000 == 0
        print i, "\n"
    end
    res = my.query("SELECT * FROM product_cars WHERE car_id='#{pk}' LIMIT 500")
    res.fetch_row
end

更新 2

Exec SQL "SELECT * FROM product WHERE id=1" (string without params) 100000 times
Python: ~15 sec 100% CPU 100%
Perl:   ~9 sec CPU 70-90%
Ruby:   ~6 sec CPU 60-80%

其他计算机上的 MySQL 服务器。


更新 3

尝试使用 oursql 和 pymysql - 结果更差。

I use python (Django) for my web-shop.

When I tested high loading (db access) got interesting results:

python 10 process = 200sec / 100% CPU utilisation
perl 10 process  = 65sec / 35% CPU utilisation

Centos 6, python 2.6, mysql 5.5, standard libraries, mysql-server on other server.
Table product_cars have 70 000 000 records.

Why python-program so slow?

Python program:

#!/usr/bin/python
import MySQLdb
import re
from MySQLdb import cursors
import shutil
import datetime
import random

db0 = MySQLdb.connect(user="X", passwd="X", db="parts")
cursor0 = db0.cursor()
cursor0.execute('SET NAMES utf8')

now = datetime.datetime.now()
for x in xrange(1, 100000):
    id = random.randint(10, 50000)
    cursor0.execute("SELECT * FROM product_cars WHERE car_id=%s LIMIT 500", [id])
    cursor0.fetchone()

Perl program:

#!/usr/bin/perl
use DBI;
my $INSTANCE=$ARGV[0];
my $user = "x";
my $pw = "x";
my $db = DBI->connect( "dbi:mysql:parts", "x", "x");
my $sql= "SELECT * FROM product_cars WHERE car_id=? LIMIT 500";
foreach $_ ( 1 .. 100000 )
{
 $random = int(rand(50000));
 $cursor = $db->prepare($sql);
 $cursor->execute($random) || die $cursor->errstr;
 @Data= $cursor->fetchrow_array();
}

$cursor->finish;
$db->disconnect;

update1

Interesting thing:

select always row with id=1:

Сlear that MYSQL use cache and query will be very fast, but again slow and 100% CPU utilisation. But same perl or ruby code work quick.

if replace string in python code:

# remove "SET NAMES utf8" string - this has no impact
# python-mysql use "%s", but not "?" as parameter marker
id = 1
for x in xrange(1, 100000):
    id = 1
    cursor0.execute("SELECT * FROM product_cars WHERE car_id=%s LIMIT 500", [id])
    cursor0.fetchone()

Same code in perl:

foreach $_ ( 1 .. 20000 )
{
 $cursor = $db->prepare( "SELECT * FROM product_cars WHERE car_id=? LIMIT 500";);
 $cursor->execute(1);
#    while (my @Data= $cursor->fetchrow_array())
 if ($_ % 1000 == 0) { print "$_\n" };.
 @Data= $cursor->fetchrow_array();
# print "$_\n";
}

Code in ruby:

pk=2
20000.times do |i|
    if i % 1000 == 0
        print i, "\n"
    end
    res = my.query("SELECT * FROM product_cars WHERE car_id='#{pk}' LIMIT 500")
    res.fetch_row
end

update 2

Exec SQL "SELECT * FROM product WHERE id=1" (string without params) 100000 times
Python: ~15 sec 100% CPU 100%
Perl:   ~9 sec CPU 70-90%
Ruby:   ~6 sec CPU 60-80%

MySQL-server on other machine.


update 3

Tried use oursql and pymysql - worse results.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

ら栖息 2024-12-26 14:43:21

正如人们所指出的,两者之间准备和执行语句的方式不同,并且不是推荐的做法。两者都应该利用准备好的语句,并且都应该在循环之外进行准备。

但是,Python MySQL 驱动程序似乎没有利用 服务器端完全准备好了声明。这可能是性能不佳的原因。

MySQL 4.1 中添加了服务器端预准备语句,但某些驱动程序适应起来非常缓慢。 MySQLdb 用户指南 没有提及准备好的语句,并认为“MySQL 中没有游标,也没有参数替换” “自 MySQL 4.1 以来,情况并非如此。它还说“MySQLdb 的 Connection 和 Cursor 对象是用 Python 编写的”,而不是利用 MySQL API。

您可能需要查看 oursql 驱动程序。看起来它是为了利用“新”MySQL API 并让数据库自我优化而编写的。

DBD::mysql(Perl MySQL 驱动程序)可以利用准备好的语句,但它根据文档,默认情况下不会。您必须通过将 mysql_server_prepare=1 添加到您的 dsn 来打开它。这应该会使 Perl 示例运行得更快。或者文档是撒谎的并且默认情况下它们是打开的。

顺便说一句,虽然没有考虑到 2 分钟的差异,但会偏离基准的一件事是生成随机数。他们的成本很高。

Python 代码

#!/usr/bin/python
import random

for x in xrange(1, 100000):
    id = random.randint(0, 50000)

Perl 代码

#!/usr/bin/perl
foreach $_ ( 1 .. 100000 )
{
 $random = int(rand(50000));
}

Python 时间

real    0m0.194s
user    0m0.184s
sys     0m0.008s

Perl 时间

real    0m0.019s
user    0m0.015s
sys     0m0.003s

为了防止这成为更敏感的基准测试中的问题,请改为增加计数器。

As people have pointed out, the way you're preparing and executing statements between the two is not the same and is not the recommended practice. Both should be taking advantage of prepared statements, and both should be preparing outside the loop.

However, it looks like that Python MySQL driver does not take advantage of server side prepared statements at all. This probably accounts for the poor performance.

Server side prepared statements were added in MySQL 4.1, but some drivers have been very slow to adapt. The MySQLdb users guide makes no mention of prepared statements and thinks "there are no cursors in MySQL, and no parameter substitution" which hasn't been true since MySQL 4.1. It also says "MySQLdb's Connection and Cursor objects are written in Python" rather than taking advantage of the MySQL API.

You may want to look at the oursql driver. It looks like it was written to take advantage of the "new" MySQL API and let the database optimize itself.

DBD::mysql (the Perl MySQL driver) can take advantage of prepared statements, but it does not by default according to the documentation. You have to turn it on by adding mysql_server_prepare=1 to your dsn. That should make the Perl example run even faster. Or the documentation is lying and they're on by default.

As an aside, one thing that will throw off benchmarks, though not account for anything like 2 minutes difference, is generating random numbers. They have significant cost.

Python code

#!/usr/bin/python
import random

for x in xrange(1, 100000):
    id = random.randint(0, 50000)

Perl code

#!/usr/bin/perl
foreach $_ ( 1 .. 100000 )
{
 $random = int(rand(50000));
}

Python time

real    0m0.194s
user    0m0.184s
sys     0m0.008s

Perl time

real    0m0.019s
user    0m0.015s
sys     0m0.003s

To keep this from becoming an issue in more sensitive benchmarks, increment a counter instead.

单身狗的梦 2024-12-26 14:43:21

理论上,如果您在循环之前执行 $cursor = $db->prepare($sql); 并简单地重复执行相同的准备好的查询,您的 Perl 代码应该会显着加速。我怀疑 DBI 或 MySQL 只是缓存并忽略了您重复的相同查询准备工作。

另一方面,您的 Python 代码要求每次重新编译不同的查询,因为您没有使用准备好的查询。如果您在循环之前正确准备两个查询,我希望速度差异会消失。顺便说一下,使用准备好的查询也有安全上的好处。

In theory, your Perl code should speed up significantly if you execute $cursor = $db->prepare($sql); before the loop and simply reexecute the same prepared query repeatedly. I suspect either DBI or MySQL has simply cached and ignored your repeated identical query preparations.

Your Python code, on the other hand, demands that different queries be recompiled each time because you aren't using a prepared query. I'd expect the speed difference to evaporate if you prepare both queries properly before their loop. There are security benefits for to using prepared queries as well, by the way.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文