mysql 查询选择键并插入
我有两个表:Articles
存储有关文章的信息,PageLinks
存储页面之间的超链接。架构如下。
CREATE TABLE `Articles` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`slug` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`label` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `slug_UNIQUE` (`slug`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
CREATE TABLE `PageLinks` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`from_id` int(11) NOT NULL,
`to_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index4` (`to_id`,`from_id`),
KEY `fk_PageLinks_1` (`from_id`),
KEY `fk_PageLinks_2` (`to_id`),
CONSTRAINT `fk_PageLinks_1` FOREIGN KEY (`from_id`) REFERENCES `Articles` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `fk_PageLinks_2` FOREIGN KEY (`to_id`) REFERENCES `Articles` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
我有几百万个 slug 对,它们指示相应页面之间的超链接。我正在尝试从这些 slug 对加载 PageLinks 表。
目前,我有一个 python 程序,它为每个 slug 发出 select id
查询,以将 slug 对转换为文章 id 对。然后将 id 对写入文件并使用 load data infile
加载。此外,如果 Articles 表中不存在 slug,则程序会插入一个没有标签的虚拟行,然后使用该行的 id。
我正在尝试优化程序以更快地加载条目(我有大约 18GB 的 slug 对要加载)。我相信,如果可以批量执行 slug->id 解析和页面链接插入(从而避免每个 SELECT 开销),则可以实现一定的速度提升。在 mysql 中执行此操作的最佳方法是什么?
I have two tables: Articles
that stores information about Articles, and PageLinks
that stores hyperlinks between pages. The schema is as below.
CREATE TABLE `Articles` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`slug` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`label` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `slug_UNIQUE` (`slug`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
CREATE TABLE `PageLinks` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`from_id` int(11) NOT NULL,
`to_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index4` (`to_id`,`from_id`),
KEY `fk_PageLinks_1` (`from_id`),
KEY `fk_PageLinks_2` (`to_id`),
CONSTRAINT `fk_PageLinks_1` FOREIGN KEY (`from_id`) REFERENCES `Articles` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `fk_PageLinks_2` FOREIGN KEY (`to_id`) REFERENCES `Articles` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
I have several million slug pairs that indicate hyperlinks between corresponding pages. I am trying to load the PageLinks table from these slug pairs.
Currently, i have a python program that issues select id
queries for each slug to convert the slug pairs to Article id pairs. The id pairs are then written to a file and loaded using load data infile
. Additionally if a slug does not exist in the Articles table, the program inserts a dummy row without label and then uses id of that row.
I am trying to optimize the program to load entries faster (I have around 18GB of slug pairs to load). I believe some speed up can be achieved if it is possible to do slug->id resolution and page link insertion together in bulk (thus avoiding the per SELECT overhead). What is the best possible way to do this in mysql?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
为每个 slug 单独创建一个 SELECT 确实效率很低。
您应该将 slug 对加载到表中:
,将其与您的对一起加载,然后发出以下语句:
Making a separate
SELECT
for each slug is inefficient indeed.You should load your slug pairs into a table:
, load it with your pairs, then issue the following statements: