Speeding up database inserts from the ORM


I have a Django view which creates 500-5000 new database INSERTS in a loop. Problem is, it is really slow! I'm getting about 100 inserts per minute on Postgres 8.3. We used to use MySQL on lesser hardware (smaller EC2 instance) and never had these types of speed issues.

Details:
Postgres 8.3 on Ubuntu Server 9.04.
Server is a "large" Amazon EC2 with database on EBS (ext3) - 11GB/20GB.

Here is some of my postgresql.conf -- let me know if you need more

shared_buffers = 4000MB
effective_cache_size = 7128MB

My python:

for k in kw:
    k = k.lower()
    p = ProfileKeyword(profile=self)
    logging.debug(k)
    p.keyword, created = Keyword.objects.get_or_create(keyword=k, defaults={'keyword': k})
    if not created and ProfileKeyword.objects.filter(profile=self, keyword=p.keyword).count():
        # checking created is just a small optimization to save some database hits on new keywords
        pass  # duplicate entry
    else:
        p.save()

Some output from top:

top - 16:56:22 up 21 days, 20:55,  4 users,  load average: 0.99, 1.01, 0.94
Tasks:  68 total,   1 running,  67 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.8%us,  0.2%sy,  0.0%ni, 90.5%id,  0.7%wa,  0.0%hi,  0.0%si,  2.8%st
Mem:  15736360k total, 12527788k used,  3208572k free,   332188k buffers
Swap:        0k total,        0k used,        0k free, 11322048k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
14767 postgres  25   0 4164m 117m 114m S   22  0.8   2:52.00 postgres
    1 root      20   0  4024  700  592 S    0  0.0   0:01.09 init
    2 root      RT   0     0    0    0 S    0  0.0   0:11.76 migration/0
    3 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/0
    4 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/0
    5 root      10  -5     0    0    0 S    0  0.0   0:00.08 events/0
    6 root      11  -5     0    0    0 S    0  0.0   0:00.00 khelper
    7 root      10  -5     0    0    0 S    0  0.0   0:00.00 kthread
    9 root      10  -5     0    0    0 S    0  0.0   0:00.00 xenwatch
   10 root      10  -5     0    0    0 S    0  0.0   0:00.00 xenbus
   18 root      RT  -5     0    0    0 S    0  0.0   0:11.84 migration/1
   19 root      34  19     0    0    0 S    0  0.0   0:00.01 ksoftirqd/1

Let me know if any other details would be helpful.

Comments (2)

挖个坑埋了你 2024-08-12 02:40:25


One common reason bulk operations like this are slow is that each insert happens in its own transaction. If you can get all of them to happen in a single transaction, it could go much faster.
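
In Django that can be a one-line change around the loop. A minimal sketch, assuming Django 1.6+ where transaction.atomic is available (on the Django of this question's era, transaction.commit_on_success served a similar role); the add_keywords wrapper name is hypothetical:

from django.db import transaction

# Hypothetical wrapper around the loop from the question. Everything
# runs inside one transaction, so Postgres performs a single commit
# (one fsync) instead of one per INSERT.
def add_keywords(self, kw):
    with transaction.atomic():
        for k in kw:
            k = k.lower()
            keyword, created = Keyword.objects.get_or_create(keyword=k)
            if created or not ProfileKeyword.objects.filter(
                    profile=self, keyword=keyword).exists():
                ProfileKeyword(profile=self, keyword=keyword).save()

The per-row query pattern is unchanged; only the commit-per-insert overhead goes away, which is usually the dominant cost on a disk-backed volume like EBS.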

楠木可依 2024-08-12 02:40:25


Firstly, ORM operations are always going to be slower than pure SQL. I once wrote an update to a large database in ORM code and set it running, but quit it after several hours when it had completed only a tiny fraction. After rewriting it in SQL the whole thing ran in less than a minute.

Secondly, bear in mind that your code here is doing up to four separate database operations for every row in your data set - the get in get_or_create, possibly also the create, the count on the filter, and finally the save. That's a lot of database access.

Bearing in mind that a maximum of 5000 objects is not huge, you should be able to read the whole dataset into memory at the start. Then you can do a single filter to get all the existing Keyword objects in one go, saving a huge number of queries in the Keyword get_or_create and also avoiding the need to instantiate duplicate ProfileKeywords in the first place.
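
A sketch of that batched approach, assuming Django 1.4+ for bulk_create and the model fields visible in the question (Keyword.keyword, ProfileKeyword.profile and ProfileKeyword.keyword); the add_keywords_bulk name is hypothetical. It runs a fixed handful of queries no matter how many keywords there are:

def add_keywords_bulk(self, kw):
    keywords = set(k.lower() for k in kw)

    # One query: every Keyword that already exists, keyed by its text.
    existing = dict((obj.keyword, obj) for obj in
                    Keyword.objects.filter(keyword__in=keywords))

    # One INSERT: create all the missing Keywords at once.
    Keyword.objects.bulk_create(
        [Keyword(keyword=k) for k in keywords if k not in existing])

    # bulk_create does not set primary keys on every backend/version,
    # so re-fetch the full set; still only one more query.
    existing = dict((obj.keyword, obj) for obj in
                    Keyword.objects.filter(keyword__in=keywords))

    # One query: which of these keywords this profile already links to.
    linked = set(ProfileKeyword.objects.filter(
        profile=self, keyword__keyword__in=keywords
    ).values_list('keyword__keyword', flat=True))

    # One INSERT: link all the genuinely new keywords in one go.
    ProfileKeyword.objects.bulk_create(
        [ProfileKeyword(profile=self, keyword=existing[k])
         for k in keywords if k not in linked])

Instead of up to four queries per row, this is five queries total, and it composes naturally with the single-transaction suggestion above.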
