Update a many-to-many counter cache for multiple rows and all their parents in one query

Posted 2024-10-04 01:08:39


Consider a blog application which has tables for posts, categories and a lookup table linking a post with one or more categories. Categories are hierarchical. Posts can be assigned to any category, not just leaf nodes.

The categories table has a post_count field which caches the number of posts assigned to the specific category. It also has parent_id, lft and rght columns for MPTT (modified preorder tree traversal, i.e. the nested set model).
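As background on the lft/rght columns: in the nested set (MPTT) layout, a depth-first walk hands out a left number on entering a node and a right number on leaving it, so a node's subtree is exactly the set of rows whose lft lies between its own lft and rght. The minimal Python sketch below derives these numbers from parent_id. The helper names number_tree and subtree are hypothetical, and visiting siblings in ascending id order is an assumption that happens to reproduce the lft/rght values in the question's test data.

```python
from collections import defaultdict


def number_tree(rows):
    """Assign (lft, rght) to every node of a forest given (id, parent_id) pairs.

    Siblings are visited in ascending id order (an assumption of this sketch).
    """
    children = defaultdict(list)
    for node_id, parent_id in sorted(rows, key=lambda r: r[0]):
        children[parent_id].append(node_id)

    numbers = {}
    counter = 1

    def visit(node_id):
        nonlocal counter
        lft = counter          # number handed out on the way down
        counter += 1
        for child in children[node_id]:
            visit(child)
        numbers[node_id] = (lft, counter)  # number handed out on the way back up
        counter += 1

    for root in children[None]:
        visit(root)
    return numbers


def subtree(numbers, node_id):
    """Ids in node_id's subtree (itself included): their lft lies within [lft, rght]."""
    lft, rght = numbers[node_id]
    return sorted(i for i, (node_lft, _) in numbers.items() if lft <= node_lft <= rght)


# (id, parent_id) pairs from the question's test data.
PARENTS = [(1, None), (4, 1), (5, 1), (6, 1), (2, None), (7, 2), (8, 2), (9, 2),
           (3, None), (10, 3), (11, 3), (12, 3)]
numbers = number_tree(PARENTS)
print(numbers[1], numbers[2], numbers[3])  # (1, 8) (9, 16) (17, 24)
print(subtree(numbers, 1))                  # [1, 4, 5, 6]
```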

But it also has an under_post_count field which caches the number of distinct posts assigned to it or any of its child categories. This is useful so you can display a hierarchical list of categories with the number of posts assigned to each category, or to one of its children, next to it.

My application has got to the point where, after a post is created with categories, has its categories edited, or is deleted while it had categories, I have a list of the IDs of the old and new categories whose post_count field needs updating. What I was hoping to do next is, in a single query, update the under_post_count field of all those categories and all their parents with the number of distinct posts assigned to each category or any of its children.

Here's the SQL required to create the tables and some test data for categories:

CREATE TABLE `categories` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `parent_id` int(11) DEFAULT NULL,
  `lft` int(11) DEFAULT NULL,
  `rght` int(11) DEFAULT NULL,
  `name` varchar(255) NOT NULL,
  `post_count` int(11) NOT NULL DEFAULT '0',
  `under_post_count` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`)
) ENGINE=MyISAM;

CREATE TABLE `categories_posts` (
  `category_id` int(11) NOT NULL,
  `post_id` int(11) NOT NULL,
  PRIMARY KEY (`category_id`,`post_id`)
) ENGINE=MyISAM;

INSERT INTO `categories` (`id`, `parent_id`, `lft`, `rght`, `name`) VALUES
(1, NULL, 1, 8, 'Cat 1'),
(4, 1, 2, 3, 'Cat 1.1'),
(5, 1, 4, 5, 'Cat 1.2'),
(6, 1, 6, 7, 'Cat 1.3'),
(2, NULL, 9, 16, 'Cat 2'),
(7, 2, 10, 11, 'Cat 2.1'),
(8, 2, 12, 13, 'Cat 2.2'),
(9, 2, 14, 15, 'Cat 2.3'),
(3, NULL, 17, 24, 'Cat 3'),
(10, 3, 18, 19, 'Cat 3.1'),
(11, 3, 20, 21, 'Cat 3.2'),
(12, 3, 22, 23, 'Cat 3.3');

Run this a few times to create some test data for the categories_posts table:

INSERT IGNORE INTO `categories_posts` (`category_id`, `post_id`) 
SELECT `id`, CEILING(10 * RAND()) FROM `categories` ORDER BY RAND() LIMIT 6

Can anyone figure this out? Your help would be much appreciated.


新一帅帅 2024-10-11 01:08:39


Well, there are a few ways to skin a cat here (assuming MySQL 5.1 and triggers):

  • You can update everything from the application layer.

  • You can trigger updates to post_count from categories_posts, and trigger (cascading) updates to under_post_count from categories.

  • Finally, you can trigger all updates from categories_posts.
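The second option (triggers on categories_posts keeping post_count current) might look roughly like the sketch below. It is a hedged illustration that uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs on its own. SQLite's trigger syntax differs slightly from MySQL's (MySQL requires FOR EACH ROW), the trigger names are hypothetical, and the (category_id, post_id) links are made-up fixed data rather than the question's random inserts. Only INSERT and DELETE are covered; a link moved by UPDATE would need a third trigger.

```python
import sqlite3

# The question's category tree: (id, parent_id, lft, rght, name).
CATEGORIES = [
    (1, None, 1, 8, "Cat 1"), (4, 1, 2, 3, "Cat 1.1"), (5, 1, 4, 5, "Cat 1.2"),
    (6, 1, 6, 7, "Cat 1.3"), (2, None, 9, 16, "Cat 2"), (7, 2, 10, 11, "Cat 2.1"),
    (8, 2, 12, 13, "Cat 2.2"), (9, 2, 14, 15, "Cat 2.3"), (3, None, 17, 24, "Cat 3"),
    (10, 3, 18, 19, "Cat 3.1"), (11, 3, 20, 21, "Cat 3.2"), (12, 3, 22, 23, "Cat 3.3"),
]
# Hypothetical fixed (category_id, post_id) links; post 100 is in Cat 1 and Cat 1.1.
LINKS = ((1, 100), (4, 100), (4, 101), (5, 102), (7, 103))


def make_db(links=LINKS):
    """In-memory SQLite copy of the question's schema (MyISAM/int(11) details dropped)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE categories (
            id INTEGER PRIMARY KEY, parent_id INTEGER, lft INTEGER, rght INTEGER,
            name TEXT NOT NULL,
            post_count INTEGER NOT NULL DEFAULT 0,
            under_post_count INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE categories_posts (
            category_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
            PRIMARY KEY (category_id, post_id));
    """)
    db.executemany(
        "INSERT INTO categories (id, parent_id, lft, rght, name) VALUES (?, ?, ?, ?, ?)",
        CATEGORIES)
    db.executemany("INSERT INTO categories_posts VALUES (?, ?)", links)
    return db


db = make_db(links=())  # no links yet: the triggers below do all the counting
# Row-level triggers that keep post_count in step with categories_posts.
db.executescript("""
    CREATE TRIGGER categories_posts_after_insert AFTER INSERT ON categories_posts
    BEGIN
        UPDATE categories SET post_count = post_count + 1 WHERE id = NEW.category_id;
    END;
    CREATE TRIGGER categories_posts_after_delete AFTER DELETE ON categories_posts
    BEGIN
        UPDATE categories SET post_count = post_count - 1 WHERE id = OLD.category_id;
    END;
""")
db.executemany("INSERT INTO categories_posts VALUES (?, ?)", LINKS)
# Unassign post 101 from Cat 1.1; the delete trigger decrements its counter.
db.execute("DELETE FROM categories_posts WHERE category_id = ? AND post_id = ?", (4, 101))
counts = dict(db.execute("SELECT id, post_count FROM categories"))
print(counts[4], counts[1])  # 1 1
```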

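Also, depending on the actual number of categories, you might not need to denormalize under_post_count at all, as it should be fairly easy and inexpensive to fetch it with the query below. Note that summing post_count counts a post assigned to several categories of the same subtree more than once, so the result is not a count of distinct posts.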

SELECT c.id, SUM(cc.post_count) 
FROM categories c 
LEFT JOIN categories cc ON c.lft <= cc.lft AND c.rght >= cc.rght 
GROUP BY c.id;
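To see how summing post_count can overcount, the sketch below runs that SUM query on SQLite (standing in for MySQL) with hypothetical fixed links in which post 100 is assigned to both Cat 1 and its child Cat 1.1. Cat 1's subtree then sums to 4 even though only three distinct posts (100, 101 and 102) sit under it. The helper subtree_sums is a hypothetical name.

```python
import sqlite3

# The question's category tree: (id, parent_id, lft, rght, name).
CATEGORIES = [
    (1, None, 1, 8, "Cat 1"), (4, 1, 2, 3, "Cat 1.1"), (5, 1, 4, 5, "Cat 1.2"),
    (6, 1, 6, 7, "Cat 1.3"), (2, None, 9, 16, "Cat 2"), (7, 2, 10, 11, "Cat 2.1"),
    (8, 2, 12, 13, "Cat 2.2"), (9, 2, 14, 15, "Cat 2.3"), (3, None, 17, 24, "Cat 3"),
    (10, 3, 18, 19, "Cat 3.1"), (11, 3, 20, 21, "Cat 3.2"), (12, 3, 22, 23, "Cat 3.3"),
]
# Hypothetical fixed (category_id, post_id) links; post 100 is in Cat 1 and Cat 1.1.
LINKS = ((1, 100), (4, 100), (4, 101), (5, 102), (7, 103))


def make_db(links=LINKS):
    """In-memory SQLite copy of the question's schema (MyISAM/int(11) details dropped)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE categories (
            id INTEGER PRIMARY KEY, parent_id INTEGER, lft INTEGER, rght INTEGER,
            name TEXT NOT NULL,
            post_count INTEGER NOT NULL DEFAULT 0,
            under_post_count INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE categories_posts (
            category_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
            PRIMARY KEY (category_id, post_id));
    """)
    db.executemany(
        "INSERT INTO categories (id, parent_id, lft, rght, name) VALUES (?, ?, ?, ?, ?)",
        CATEGORIES)
    db.executemany("INSERT INTO categories_posts VALUES (?, ?)", links)
    return db


def subtree_sums(db):
    """Fill post_count, then SUM it over each node's subtree (the node itself included)."""
    db.execute("""
        UPDATE categories SET post_count = (
            SELECT COUNT(*) FROM categories_posts cp WHERE cp.category_id = categories.id)""")
    return dict(db.execute("""
        SELECT c.id, SUM(cc.post_count)
        FROM categories c
        LEFT JOIN categories cc ON c.lft <= cc.lft AND c.rght >= cc.rght
        GROUP BY c.id"""))


sums = subtree_sums(make_db())
print(sums[1])  # 4, although only posts 100, 101 and 102 are under Cat 1
```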

Fetching the actual counts for exact matches only is

SELECT c.id, COUNT(cp.category_id) 
FROM categories c 
LEFT JOIN categories_posts cp ON c.id = cp.category_id 
GROUP BY c.id;
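One detail in the exact-match count: with a LEFT JOIN, a category that has no posts still produces one row (with NULLs on the categories_posts side), so COUNT(*) reports 1 for it, while COUNT(cp.category_id) skips the NULL and reports 0. The sketch below shows the difference on SQLite (standing in for MySQL) with hypothetical fixed links; the query constant names are illustrative only.

```python
import sqlite3

# The question's category tree: (id, parent_id, lft, rght, name).
CATEGORIES = [
    (1, None, 1, 8, "Cat 1"), (4, 1, 2, 3, "Cat 1.1"), (5, 1, 4, 5, "Cat 1.2"),
    (6, 1, 6, 7, "Cat 1.3"), (2, None, 9, 16, "Cat 2"), (7, 2, 10, 11, "Cat 2.1"),
    (8, 2, 12, 13, "Cat 2.2"), (9, 2, 14, 15, "Cat 2.3"), (3, None, 17, 24, "Cat 3"),
    (10, 3, 18, 19, "Cat 3.1"), (11, 3, 20, 21, "Cat 3.2"), (12, 3, 22, 23, "Cat 3.3"),
]
# Hypothetical fixed (category_id, post_id) links; post 100 is in Cat 1 and Cat 1.1.
LINKS = ((1, 100), (4, 100), (4, 101), (5, 102), (7, 103))


def make_db(links=LINKS):
    """In-memory SQLite copy of the question's schema (MyISAM/int(11) details dropped)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE categories (
            id INTEGER PRIMARY KEY, parent_id INTEGER, lft INTEGER, rght INTEGER,
            name TEXT NOT NULL,
            post_count INTEGER NOT NULL DEFAULT 0,
            under_post_count INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE categories_posts (
            category_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
            PRIMARY KEY (category_id, post_id));
    """)
    db.executemany(
        "INSERT INTO categories (id, parent_id, lft, rght, name) VALUES (?, ?, ?, ?, ?)",
        CATEGORIES)
    db.executemany("INSERT INTO categories_posts VALUES (?, ?)", links)
    return db


# COUNT(*) counts the all-NULL row that the LEFT JOIN yields for an empty category.
COUNT_STAR = """
    SELECT c.id, COUNT(*)
    FROM categories c
    LEFT JOIN categories_posts cp ON c.id = cp.category_id
    GROUP BY c.id"""
# COUNT(column) ignores NULLs, so empty categories come out as 0.
COUNT_COLUMN = """
    SELECT c.id, COUNT(cp.category_id)
    FROM categories c
    LEFT JOIN categories_posts cp ON c.id = cp.category_id
    GROUP BY c.id"""

db = make_db()
naive = dict(db.execute(COUNT_STAR))
fixed = dict(db.execute(COUNT_COLUMN))
print(naive[6], fixed[6])  # 1 0  (Cat 1.3 has no posts)
```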

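Combining the two gives the counts including the hierarchy (COUNT(DISTINCT ...) so that a post assigned to several categories of one subtree is counted only once):

SELECT c.id, COUNT(DISTINCT cp.post_id) 
FROM categories c 
LEFT JOIN categories cc ON c.lft <= cc.lft AND c.rght >= cc.rght 
LEFT JOIN categories_posts cp ON cc.id = cp.category_id
GROUP BY c.id;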
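Here is the combined subtree count with COUNT(DISTINCT cp.post_id) run on SQLite (standing in for MySQL) against hypothetical fixed links. Because post 100 is in both Cat 1 and Cat 1.1, a plain COUNT would report 4 for Cat 1; the DISTINCT version reports the 3 distinct posts, which is what under_post_count is meant to hold. Categories with no posts in their subtree come out as 0, since COUNT(DISTINCT ...) ignores the NULLs from the LEFT JOIN.

```python
import sqlite3

# The question's category tree: (id, parent_id, lft, rght, name).
CATEGORIES = [
    (1, None, 1, 8, "Cat 1"), (4, 1, 2, 3, "Cat 1.1"), (5, 1, 4, 5, "Cat 1.2"),
    (6, 1, 6, 7, "Cat 1.3"), (2, None, 9, 16, "Cat 2"), (7, 2, 10, 11, "Cat 2.1"),
    (8, 2, 12, 13, "Cat 2.2"), (9, 2, 14, 15, "Cat 2.3"), (3, None, 17, 24, "Cat 3"),
    (10, 3, 18, 19, "Cat 3.1"), (11, 3, 20, 21, "Cat 3.2"), (12, 3, 22, 23, "Cat 3.3"),
]
# Hypothetical fixed (category_id, post_id) links; post 100 is in Cat 1 and Cat 1.1.
LINKS = ((1, 100), (4, 100), (4, 101), (5, 102), (7, 103))


def make_db(links=LINKS):
    """In-memory SQLite copy of the question's schema (MyISAM/int(11) details dropped)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE categories (
            id INTEGER PRIMARY KEY, parent_id INTEGER, lft INTEGER, rght INTEGER,
            name TEXT NOT NULL,
            post_count INTEGER NOT NULL DEFAULT 0,
            under_post_count INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE categories_posts (
            category_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
            PRIMARY KEY (category_id, post_id));
    """)
    db.executemany(
        "INSERT INTO categories (id, parent_id, lft, rght, name) VALUES (?, ?, ?, ?, ?)",
        CATEGORIES)
    db.executemany("INSERT INTO categories_posts VALUES (?, ?)", links)
    return db


# Every descendant cc of c (c itself included) joined to its posts; count each post once.
SUBTREE_DISTINCT = """
    SELECT c.id, COUNT(DISTINCT cp.post_id)
    FROM categories c
    LEFT JOIN categories cc ON c.lft <= cc.lft AND c.rght >= cc.rght
    LEFT JOIN categories_posts cp ON cc.id = cp.category_id
    GROUP BY c.id"""

under = dict(make_db().execute(SUBTREE_DISTINCT))
print(under[1], under[4], under[3])  # 3 2 0
```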

EDIT

Constructing UPDATE statements from the above should not be too hard:

UPDATE categories 
SET post_count = (SELECT COUNT(*) 
                  FROM categories_posts cp 
                  WHERE cp.category_id = categories.id)

should work for post_count.
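The post_count UPDATE relies on a correlated subquery: for each categories row being updated, the subquery counts the categories_posts rows whose category_id equals that row's id. The sketch below runs it on SQLite (standing in for MySQL) with hypothetical fixed links so the resulting counters can be checked.

```python
import sqlite3

# The question's category tree: (id, parent_id, lft, rght, name).
CATEGORIES = [
    (1, None, 1, 8, "Cat 1"), (4, 1, 2, 3, "Cat 1.1"), (5, 1, 4, 5, "Cat 1.2"),
    (6, 1, 6, 7, "Cat 1.3"), (2, None, 9, 16, "Cat 2"), (7, 2, 10, 11, "Cat 2.1"),
    (8, 2, 12, 13, "Cat 2.2"), (9, 2, 14, 15, "Cat 2.3"), (3, None, 17, 24, "Cat 3"),
    (10, 3, 18, 19, "Cat 3.1"), (11, 3, 20, 21, "Cat 3.2"), (12, 3, 22, 23, "Cat 3.3"),
]
# Hypothetical fixed (category_id, post_id) links; post 100 is in Cat 1 and Cat 1.1.
LINKS = ((1, 100), (4, 100), (4, 101), (5, 102), (7, 103))


def make_db(links=LINKS):
    """In-memory SQLite copy of the question's schema (MyISAM/int(11) details dropped)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE categories (
            id INTEGER PRIMARY KEY, parent_id INTEGER, lft INTEGER, rght INTEGER,
            name TEXT NOT NULL,
            post_count INTEGER NOT NULL DEFAULT 0,
            under_post_count INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE categories_posts (
            category_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
            PRIMARY KEY (category_id, post_id));
    """)
    db.executemany(
        "INSERT INTO categories (id, parent_id, lft, rght, name) VALUES (?, ?, ?, ?, ?)",
        CATEGORIES)
    db.executemany("INSERT INTO categories_posts VALUES (?, ?)", links)
    return db


db = make_db()
# "categories.id" inside the subquery refers to the row currently being updated.
db.execute("""
    UPDATE categories
    SET post_count = (SELECT COUNT(*)
                      FROM categories_posts cp
                      WHERE cp.category_id = categories.id)""")
post_count = dict(db.execute("SELECT id, post_count FROM categories"))
print(post_count[4], post_count[6])  # 2 0
```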

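The situation for under_post_count is different, since MySQL does not allow a subquery to select from the table being updated (error 1093, "You can't specify target table 'categories' for update in FROM clause"). Therefore you have to resort to a monstrosity like this, which joins the target table to a derived table instead (COALESCE covers categories with no posts, which get no row in x):

UPDATE categories LEFT JOIN 
       (SELECT c.id, COUNT(DISTINCT cp.post_id) AS result 
        FROM categories c 
        LEFT JOIN categories cc ON c.lft <= cc.lft AND c.rght >= cc.rght 
        INNER JOIN categories_posts cp ON cc.id = cp.category_id
        GROUP BY c.id) AS x ON categories.id = x.id
SET under_post_count = COALESCE(x.result, 0)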
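For comparison, SQLite has no such restriction, so the under_post_count update can be written directly as a correlated subquery that reads from categories under an alias. This is only a sketch on SQLite with hypothetical fixed links; it is not a form MySQL will accept, which is why the MySQL version above goes through a derived table.

```python
import sqlite3

# The question's category tree: (id, parent_id, lft, rght, name).
CATEGORIES = [
    (1, None, 1, 8, "Cat 1"), (4, 1, 2, 3, "Cat 1.1"), (5, 1, 4, 5, "Cat 1.2"),
    (6, 1, 6, 7, "Cat 1.3"), (2, None, 9, 16, "Cat 2"), (7, 2, 10, 11, "Cat 2.1"),
    (8, 2, 12, 13, "Cat 2.2"), (9, 2, 14, 15, "Cat 2.3"), (3, None, 17, 24, "Cat 3"),
    (10, 3, 18, 19, "Cat 3.1"), (11, 3, 20, 21, "Cat 3.2"), (12, 3, 22, 23, "Cat 3.3"),
]
# Hypothetical fixed (category_id, post_id) links; post 100 is in Cat 1 and Cat 1.1.
LINKS = ((1, 100), (4, 100), (4, 101), (5, 102), (7, 103))


def make_db(links=LINKS):
    """In-memory SQLite copy of the question's schema (MyISAM/int(11) details dropped)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE categories (
            id INTEGER PRIMARY KEY, parent_id INTEGER, lft INTEGER, rght INTEGER,
            name TEXT NOT NULL,
            post_count INTEGER NOT NULL DEFAULT 0,
            under_post_count INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE categories_posts (
            category_id INTEGER NOT NULL, post_id INTEGER NOT NULL,
            PRIMARY KEY (category_id, post_id));
    """)
    db.executemany(
        "INSERT INTO categories (id, parent_id, lft, rght, name) VALUES (?, ?, ?, ?, ?)",
        CATEGORIES)
    db.executemany("INSERT INTO categories_posts VALUES (?, ?)", links)
    return db


db = make_db()
# d ranges over the updated row's subtree; COUNT over no rows yields 0, not NULL.
db.execute("""
    UPDATE categories
    SET under_post_count = (
        SELECT COUNT(DISTINCT cp.post_id)
        FROM categories d
        JOIN categories_posts cp ON cp.category_id = d.id
        WHERE categories.lft <= d.lft AND categories.rght >= d.rght)""")
under = dict(db.execute("SELECT id, under_post_count FROM categories"))
print(under[1], under[2])  # 3 1
```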

EDIT2
Note: whenever categories are joined to categories_posts, the join has to be on cc.id = cp.category_id, not on cp.post_id. Written that way, and counting only published posts (this assumes a posts table with a status column), the last query becomes

UPDATE categories LEFT JOIN 
       (SELECT c.id, COUNT(DISTINCT cp.post_id) AS result 
        FROM categories c 
        LEFT JOIN categories cc ON c.lft <= cc.lft AND c.rght >= cc.rght 
        INNER JOIN categories_posts cp ON cc.id = cp.category_id
        INNER JOIN posts p ON cp.post_id = p.id
        WHERE p.status = 'published'
        GROUP BY c.id) AS x ON categories.id = x.id
SET under_post_count = COALESCE(x.result, 0),
    post_count = (SELECT COUNT(*) 
                  FROM categories_posts cp 
                  WHERE cp.category_id = categories.id)

EDIT3
Just a few notes:

  • The query above fixes under_post_count and post_count regardless of the current state of the data.
  • There are cheaper queries, provided your data access layer is properly abstracted and secured and you can guarantee atomicity: they would only do post_count = post_count +/- 1 on the appropriate records (for posts in the relevant status), and similarly for under_post_count, where a post already assigned to another category in the same subtree must not be counted twice.
  • In case you cannot reliably emulate triggers from the application level, it might still be cheaper to check whether you need to run the above queries at all (MySQL is quite good in this respect, but this matters if you want to be DB-agnostic), or to adopt a strategy where you normally just increment/decrement the counters and only periodically recalculate the numbers.