MySQL LOAD DATA INFILE 帮助

发布于 2024-07-30 11:08:16 字数 1885 浏览 6 评论 0原文

我想加载一个如下所示的 CSV 文件:

Acct. No.,1-15 Days,16-30 Days,31-60 Days,61-90 Days,91-120 Days,Beyond 120 Days
2314134101,898.89,8372.16,5584.23,7744.41,9846.54,2896.25
2414134128,5457.61,7488.26,9594.02,6234.78,273.7,2356.13
2513918869,2059.59,7578.59,9395.51,7159.15,5827.48,3041.62
1687950783,4846.85,8364.22,9892.55,7213.45,8815.33,7603.4
2764856043,5250.11,9946.49,8042.03,6058.64,9194.78,8296.2
2865446086,596.22,7670.04,8564.08,3263.85,9662.46,7027.22
,4725.99,1336.24,9356.03,1572.81,4942.11,6088.94
,8248.47,956.81,8713.06,2589.14,5316.68,1543.67
,538.22,1473.91,3292.09,6843.89,2687.07,9808.05
,9885.85,2730.72,6876,8024.47,1196.87,1655.29

但如果您注意到,某些字段不完整。 我认为 MySQL 会跳过第一列缺失的行。 当我运行命令时:

LOAD DATA LOCAL INFILE 'test-long.csv' REPLACE INTO TABLE accounts
    FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n'
    IGNORE 1 LINES
    (cf_535, cf_580, cf_568, cf_569, cf_571, cf_572);

MySQL 输出为:

Query OK, 41898 rows affected, 20948 warnings (0.78 sec)
Records: 20949  Deleted: 20949  Skipped: 0  Warnings: 20948

行数仅为 20,949,但 MySQL 报告有 41,898 行受到影响。 为什么这样? 此外,表中没有任何真正的变化。 我也看不出生成的警告是什么。 我想使用 LOAD DATA INFILE 因为 python 需要半秒来更新每一行,对于包含 20,000 多条记录的文件来说,这意味着需要 2.77 小时。

更新:修改了代码以将自动提交设置为“False”并添加了 db.commit() 语句:

# Tell MySQLdb to turn off auto-commit
db.autocommit(False) 

# Set count to 1
count = 1
while count < len(contents):
    if contents[count][0] != '':
        cursor.execute("""
            UPDATE accounts SET cf_580 = %s, cf_568 = %s, cf_569 = %s, cf_571 = %s, cf_572 = %s
            WHERE cf_535 = %s""" % (contents[count][1], contents[count][2], contents[count][3], contents[count][4], contents[count][5], contents[count][0]))
    count += 1

try:
    db.commit()
except:
    db.rollback()

I want to load a CSV file that looks like this:

Acct. No.,1-15 Days,16-30 Days,31-60 Days,61-90 Days,91-120 Days,Beyond 120 Days
2314134101,898.89,8372.16,5584.23,7744.41,9846.54,2896.25
2414134128,5457.61,7488.26,9594.02,6234.78,273.7,2356.13
2513918869,2059.59,7578.59,9395.51,7159.15,5827.48,3041.62
1687950783,4846.85,8364.22,9892.55,7213.45,8815.33,7603.4
2764856043,5250.11,9946.49,8042.03,6058.64,9194.78,8296.2
2865446086,596.22,7670.04,8564.08,3263.85,9662.46,7027.22
,4725.99,1336.24,9356.03,1572.81,4942.11,6088.94
,8248.47,956.81,8713.06,2589.14,5316.68,1543.67
,538.22,1473.91,3292.09,6843.89,2687.07,9808.05
,9885.85,2730.72,6876,8024.47,1196.87,1655.29

But if you notice, some of the fields are incomplete. I'm thinking MySQL will just skip the row where the first column is missing. When I run the command:

LOAD DATA LOCAL INFILE 'test-long.csv' REPLACE INTO TABLE accounts
    FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n'
    IGNORE 1 LINES
    (cf_535, cf_580, cf_568, cf_569, cf_571, cf_572);

And the MySQL output is:

Query OK, 41898 rows affected, 20948 warnings (0.78 sec)
Records: 20949  Deleted: 20949  Skipped: 0  Warnings: 20948

The number of lines is only 20,949 but MySQL reports it as 41,898 rows affected. Why so? Also, nothing really changed in the table. I also couldn't see what the warnings generated is all about. I wanted to use the LOAD DATA INFILE because it takes python half a second to update each row which translates to 2.77 hours for a file with 20,000+ records.

UPDATE: Modified the code to set auto-commit to 'False' and added a db.commit() statement:

# Tell MySQLdb to turn off auto-commit
db.autocommit(False) 

# Set count to 1
count = 1
while count < len(contents):
    if contents[count][0] != '':
        cursor.execute("""
            UPDATE accounts SET cf_580 = %s, cf_568 = %s, cf_569 = %s, cf_571 = %s, cf_572 = %s
            WHERE cf_535 = %s""" % (contents[count][1], contents[count][2], contents[count][3], contents[count][4], contents[count][5], contents[count][0]))
    count += 1

try:
    db.commit()
except:
    db.rollback()

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

真心难拥有 2024-08-06 11:08:16

你这里基本上有3个问题。 相反的顺序

  1. 你是否在单独的语句中进行 Python 插入? 您可能希望用开始事务/提交来包围它们。 20,000 次提交很容易需要几个小时。
  2. 您的导入语句定义了 6 个字段,但 CSV 有 7 个字段。 这可以解释双行计数:每行输入都会在数据库中产生 2 行,第二行的字段 2-6 为空。
  3. 将插入不完整的行,其中缺失的列为空值或默认值。 对于那些格式错误的行,这可能不是您想要的。

如果您的 python 程序即使在单个事务中也无法足够快地执行,那么您至少应该让 python 程序在导入之前编辑/清理数据文件。 如果帐户。 编号是主键,这似乎是合理的,插入空白行将导致整个导入失败,或者如果打开自动编号,则会导致导入虚假数据。

You have basically 3 issues here. In reverse order

  1. Are you doing your Python inserts in individual statements? You probably want to surround them all with a begin transaction/commit. 20,000 commits could easily take hours.
  2. Your import statement defines 6 fields, but the CSV has 7 fields. That would explain the double row count: every line of input results in 2 rows in the database, the 2nd one with fields 2-6 null.
  3. Incomplete rows will be inserted with null or default values for the missing columns. This may not be what you want with those malformed rows.

If your python program can't perform fast enough even with a single transaction, you should at least have the python program edit/clean the data file before importing. If Acct. No. is the primary key, as seems reasonable, inserting rows with blank will either cause the whole import to fail, or if auto number is on, cause bogus data to be imported.

丑丑阿 2024-08-06 11:08:16

如果在 LOAD DATA 中使用 REPLACE 关键字,则“Deleted:”后面的数字显示实际替换的行数

If you use REPLACE keyword in LOAD DATA, then number after "Deleted: " shows how many rows were actually replaced

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文