MySQL csv 导入增量减慢(替代方案?)

发布于 2024-11-10 03:05:21 字数 1498 浏览 6 评论 0原文

我有下表:

CREATE  TABLE `Product_Category_Rank` (
  `CategoryId` SMALLINT(5) UNSIGNED NOT NULL ,
  `ProductId` VARCHAR(32) NOT NULL ,
  `RankedOn` DATE NOT NULL ,
  `PopularityRank` SMALLINT(5) NOT NULL ,
  PRIMARY KEY (`CategoryId`, `ProductId`, `RankedOn`) ,
  INDEX `fk_Product_Category_Rank_Product` (`ProductId` ASC) ,
  INDEX `fk_Product_Category_Rank_Category` (`CategoryId` ASC) ,
  CONSTRAINT `fk_Product_Category_Rank_Category`
    FOREIGN KEY (`CategoryId` )
    REFERENCES `Category` (`CategoryId` )
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `fk_Product_Category_Rank_Product`
    FOREIGN KEY (`ProductId` )
    REFERENCES `Product` (`ProductId` )
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8
COLLATE = utf8_general_ci

我有一个 csv 文件(我每天导入该文件),其中包含不到 30,000 条记录。

我已经尝试过:

LOAD DATA LOCAL INFILE 'temp/product_category_rank.csv'
INTO TABLE `Product_Category_Rank`
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
LINES TERMINATED BY '\n';

我也尝试过将文件加载到临时表中并执行以下操作:

INSERT
  INTO `Product_Category_Rank` ( ... )
SELECT ...
  FROM `tmp_product_category_rank`

两种方法效果相同(结果相同),但是我确信第二种方法的资源成本稍高。我遇到的问题是,加载/插入选择功能的导入时间每天都会增加约 3 秒。因此,第 30 天,我需要 90 秒的时间来直接导入到表中。

事实:导入永远不会有重复的主键(当然,除非我尝试导入相同的数据两次,这不是问题)

那么有没有办法加快这个过程(也许关闭导入之前进行键检查?(如何?))

编辑:还要注意,不需要外键检查,因为类别和产品表已经更新,并且外键约束不是导入的问题。

I have the following table:

CREATE  TABLE `Product_Category_Rank` (
  `CategoryId` SMALLINT(5) UNSIGNED NOT NULL ,
  `ProductId` VARCHAR(32) NOT NULL ,
  `RankedOn` DATE NOT NULL ,
  `PopularityRank` SMALLINT(5) NOT NULL ,
  PRIMARY KEY (`CategoryId`, `ProductId`, `RankedOn`) ,
  INDEX `fk_Product_Category_Rank_Product` (`ProductId` ASC) ,
  INDEX `fk_Product_Category_Rank_Category` (`CategoryId` ASC) ,
  CONSTRAINT `fk_Product_Category_Rank_Category`
    FOREIGN KEY (`CategoryId` )
    REFERENCES `Category` (`CategoryId` )
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `fk_Product_Category_Rank_Product`
    FOREIGN KEY (`ProductId` )
    REFERENCES `Product` (`ProductId` )
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8
COLLATE = utf8_general_ci

I have a csv file (which I am importing daily) that contains just under 30,000 records.

I have tried:

LOAD DATA LOCAL INFILE 'temp/product_category_rank.csv'
INTO TABLE `Product_Category_Rank`
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
LINES TERMINATED BY '\n';

As well I have tried loading the file into a temporary table and doing:

INSERT
  INTO `Product_Category_Rank` ( ... )
SELECT ...
  FROM `tmp_product_category_rank`

Both methods work equally (same results), however the second I am sure has a slightly higher resource cost. The problem I am running into is that every day the import time on the load / insert-select functionality is increasing by about 3 seconds / day. So day 30 I am looking at 90 seconds to do a straight import into the table.

Facts: the import will never have duplicate primary keys (unless of course I try to import the same data twice, which is not an issue)

So is there a way to speed up this process (maybe turning off key checking before the import? (how?))

edit: Also to note that the foreign key checks are not required either as the Category and Product tables have already been updated and the foreign key constraints are not an issue with the import.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

時窥 2024-11-17 03:05:21

解决方案:(显然与查询无关)

MySQL服务器变量:

innodb_buffer_pool_size = 8MB(改为2GB)
innodb_log_file_size = 5MB(更改为256MB)
innodb_log_buffer_size = 1MB(更改为4MB)
innodb_flush_log_at_trx_commit = 1(更改为 2)

这些新设置基于以下位置的一篇文章:
http://www.mysqlperformanceblog.com/2007/11/ 01/innodb-performance-optimization-basics/

现在,当我运行导入时,导入速度每天都不会降低。我已经导入了30天,每次导入的速度都和上次一样。所以这实际上是一个服务器优化问题。

Solution: (Apparently had nothing to do with the query)

MySQL server variables:

innodb_buffer_pool_size = 8MB (changed to 2GB)
innodb_log_file_size = 5MB (changed to 256MB)
innodb_log_buffer_size = 1MB (changed to 4MB)
innodb_flush_log_at_trx_commit = 1 (changed to 2)

These new settings are based on an article found at :
http://www.mysqlperformanceblog.com/2007/11/01/innodb-performance-optimization-basics/

Now as I run the import, there is no degradation to the speed of the imports on a day to day basis. I have imported 30 days and each import is the same speed as the last. So it was actually a server optimization issue.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文