mysql 查询选择键并插入

发布于 2024-10-25 20:05:02 字数 1351 浏览 6 评论 0原文

我有两个表：Articles 存储有关文章的信息，PageLinks 存储页面之间的超链接。架构如下。

CREATE TABLE `Articles` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `slug` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `label` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `slug_UNIQUE` (`slug`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8

CREATE TABLE `PageLinks` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `from_id` int(11) NOT NULL,
  `to_id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `index4` (`to_id`,`from_id`),
  KEY `fk_PageLinks_1` (`from_id`),
  KEY `fk_PageLinks_2` (`to_id`),
  CONSTRAINT `fk_PageLinks_1` FOREIGN KEY (`from_id`) REFERENCES `Articles` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT `fk_PageLinks_2` FOREIGN KEY (`to_id`) REFERENCES `Articles` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8

我有几百万个 slug 对，它们指示相应页面之间的超链接。我正在尝试从这些 slug 对加载 PageLinks 表。

目前，我有一个 python 程序，它为每个 slug 发出 select id 查询，以将 slug 对转换为文章 id 对。然后将 id 对写入文件并使用 load data infile 加载。此外，如果 Articles 表中不存在 slug，则程序会插入一个没有标签的虚拟行，然后使用该行的 id。

我正在尝试优化程序以更快地加载条目（我有大约 18GB 的 slug 对要加载）。我相信，如果可以批量执行 slug->id 解析和页面链接插入（从而避免每个 SELECT 开销），则可以实现一定的速度提升。在 mysql 中执行此操作的最佳方法是什么？

原文

I have two tables: Articles that stores information about Articles, and PageLinks that stores hyperlinks between pages. The schema is as below.

CREATE TABLE `Articles` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `slug` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `label` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `slug_UNIQUE` (`slug`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8

CREATE TABLE `PageLinks` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `from_id` int(11) NOT NULL,
  `to_id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `index4` (`to_id`,`from_id`),
  KEY `fk_PageLinks_1` (`from_id`),
  KEY `fk_PageLinks_2` (`to_id`),
  CONSTRAINT `fk_PageLinks_1` FOREIGN KEY (`from_id`) REFERENCES `Articles` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT `fk_PageLinks_2` FOREIGN KEY (`to_id`) REFERENCES `Articles` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8

I have several million slug pairs that indicate hyperlinks between corresponding pages. I am trying to load the PageLinks table from these slug pairs.

Currently, i have a python program that issues select id queries for each slug to convert the slug pairs to Article id pairs. The id pairs are then written to a file and loaded using load data infile. Additionally if a slug does not exist in the Articles table, the program inserts a dummy row without label and then uses id of that row.

I am trying to optimize the program to load entries faster (I have around 18GB of slug pairs to load). I believe some speed up can be achieved if it is possible to do slug->id resolution and page link insertion together in bulk (thus avoiding the per SELECT overhead). What is the best possible way to do this in mysql?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

何以笙箫默 2024-11-01 20:05:02

为每个 slug 单独创建一个 SELECT 确实效率很低。

您应该将 slug 对加载到表中：

CREATE TABLE pairs
        (
        slug1 VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
        slug2 VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
        );

，将其与您的对一起加载，然后发出以下语句：

INSERT IGNORE
INTO    Articles (slug)
SELECT  slug1
FROM    pairs;

INSERT IGNORE
INTO    Articles (slug)
SELECT  slug2
FROM    pairs;

INSERT
INTO    pairs (from_id, to_id)
SELECT  a1.id, a2.id
FROM    pairs
JOIN    articles a1
ON      a1.slug = slug1
JOIN    articles a2
ON      a2.slug = slug2;

Making a separate SELECT for each slug is inefficient indeed.

You should load your slug pairs into a table:

CREATE TABLE pairs
        (
        slug1 VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
        slug2 VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
        );

, load it with your pairs, then issue the following statements:

INSERT IGNORE
INTO    Articles (slug)
SELECT  slug1
FROM    pairs;

INSERT IGNORE
INTO    Articles (slug)
SELECT  slug2
FROM    pairs;

INSERT
INTO    pairs (from_id, to_id)
SELECT  a1.id, a2.id
FROM    pairs
JOIN    articles a1
ON      a1.slug = slug1
JOIN    articles a2
ON      a2.slug = slug2;

回复收藏 0 原文

~没有更多了~