MySQL LOAD DATA INFILE slows down after initial inserts using raw SQL in Django

Posted 2024-10-24 18:59:15


I'm using the following custom handler to do bulk inserts with raw SQL in Django, with a MySQLdb backend and InnoDB tables:

def handle_ttam_file_for(f, subject_pi):
    import datetime
    write_start = datetime.datetime.now()

    print "write to disk start: ", write_start
    destination = open('temp.ttam', 'wb+')
    for chunk in f.chunks():
        destination.write(chunk)
    destination.close()
    print "write to disk end", (datetime.datetime.now() - write_start)

    subject = Subject.objects.get(id=subject_pi)

    def my_custom_sql():
        from django.db import connection, transaction
        cursor = connection.cursor()

        statement = "DELETE FROM ttam_genotypeentry WHERE subject_id=%i;" % subject.pk
        del_start = datetime.datetime.now()
        print "delete start: ", del_start
        cursor.execute(statement)
        print "delete end", (datetime.datetime.now() - del_start)

        statement = "LOAD DATA LOCAL INFILE 'temp.ttam' INTO TABLE ttam_genotypeentry IGNORE 15 LINES (snp_id, @dummy1, @dummy2, genotype) SET subject_id=%i;" % subject.pk

        ins_start = datetime.datetime.now()
        print "insert start: ", ins_start
        cursor.execute(statement)
        print "insert end", (datetime.datetime.now() - ins_start)
        transaction.commit_unless_managed()

    my_custom_sql()
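As an aside, the DELETE statement above builds SQL with % interpolation; DB-API cursors can bind parameters instead, which avoids quoting problems. A minimal sketch, with sqlite3 standing in for MySQLdb (MySQLdb uses %s placeholders where sqlite3 uses ?):

```python
import sqlite3

# Parameter binding instead of string interpolation (sqlite3 stands in
# for MySQLdb here; MySQLdb would use %s placeholders instead of ?).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE ttam_genotypeentry "
            "(snp_id TEXT, genotype TEXT, subject_id INTEGER)")
cur.execute("INSERT INTO ttam_genotypeentry VALUES (?, ?, ?)",
            ("rs123", "AA", 1))
# The DELETE from the question, with the id passed as a bound parameter:
cur.execute("DELETE FROM ttam_genotypeentry WHERE subject_id=?", (1,))
print(cur.rowcount)  # number of rows deleted
```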

The uploaded file has 500k rows and is ~ 15M in size.

The load times seem to get progressively longer as files are added.

Insert times:
1st:    30m
2nd:    50m
3rd:    1h20m
4th:    1h30m
5th:    1h35m

I was wondering if it is normal for load times to get longer as files of constant size (same # of rows) are added, and if there is any way to improve the performance of these bulk inserts.

Answer by 只是我以为 (2024-10-31 18:59:15):


I found that the main issue with bulk inserting into my innodb table was a mysql innodb setting I had overlooked.

The setting for innodb_buffer_pool_size defaults to 8M for my version of mysql, and it was causing a huge slowdown as my table size grew.

innodb-performance-optimization-basics

choosing-innodb_buffer_pool_size

The recommended size according to the articles is 70 to 80 percent of the memory if using a dedicated mysql server. After increasing the buffer pool size, my inserts went from an hour+ to less than 10 minutes with no other changes.
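For reference, the variable is set in my.cnf, and on MySQL versions of that era changing it required a mysqld restart; a sketch of the fragment, where the 6G figure is only an illustration of the 70-80% rule on a hypothetical 8G dedicated server, not a value from the answer:

```ini
[mysqld]
# ~70-80% of RAM on a dedicated MySQL server (the default here was 8M)
innodb_buffer_pool_size = 6G
```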

Another change I was able to make was getting rid of the LOCAL argument in the LOAD DATA statement (thanks @f00). My problem before was that I kept getting "file not found" or "cannot get stat" errors when trying to have mysql access the file django uploaded.
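For illustration, the non-LOCAL variant of the question's statement might be built like this; the /var/opt/mysql-load path comes from the AppArmor steps below, and the helper function name is my own, not from the answer:

```python
def load_data_statement(path, subject_id):
    # Non-LOCAL LOAD DATA: mysqld itself reads the file, so the path must
    # be an absolute, server-side path that mysqld is allowed to read
    # (see the AppArmor steps below).
    return (
        "LOAD DATA INFILE '%s' INTO TABLE ttam_genotypeentry "
        "IGNORE 15 LINES (snp_id, @dummy1, @dummy2, genotype) "
        "SET subject_id=%i;" % (path, subject_id)
    )

stmt = load_data_statement("/var/opt/mysql-load/import.csv", 42)
print(stmt)
```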

It turns out this is related to using Ubuntu and this bug.

  1. Pick a directory from which mysqld should be allowed to load files.
    Perhaps somewhere writable only by
    your DBA account and readable only by
    members of group mysql?

  2. sudo aa-complain /usr/sbin/mysqld

  3. Try to load a file from your designated loading directory:
    'load data infile '/var/opt/mysql-load/import.csv' into table ...'

  4. sudo aa-logprof
    aa-logprof will identify the access violation triggered by the
    'load data infile ...' query, and interactively walk you through
    allowing access in the future. You probably want to choose Glob
    from the menu, so that you end up with read access to
    '/var/opt/mysql-load/*'. Once you have selected the right (glob)
    pattern, choose Allow from the menu to finish up. (N.B. Do not
    enable the repository when prompted to do so the first time you
    run aa-logprof, unless you really understand the whole apparmor
    process.)

  5. sudo aa-enforce /usr/sbin/mysqld

  6. Try to load your file again. It should work this time.
