How to get the current text for a given page id

Posted on 2025-01-12 08:42:29

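I have a bot that analyzes the current text of certain pages directly from the database. The page IDs are known. In the past, the bot used where revision.rev_id = page.page_latest && text.old_id = revision.rev_text_id. After a MediaWiki update, the bot no longer works.

The column revision.rev_text_id no longer exists. According to the documentation, text.old_id is now referenced by the table content. My problem now is finding the path from page_id to the table content.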

说好的呢 2025-01-19 08:42:29

After posting the question, I continued my investigation, read the documentation again, and found the solution (table slots):

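    -- $page_id is substituted by the bot before the query is sent
    SELECT p.page_title, t.old_id, t.old_text
    FROM   `page` p,
           `slots` s,
           `content` c,
           `text` t
    WHERE  p.page_id                     = $page_id
      AND  s.slot_origin                 = p.page_latest
      AND  c.content_id                  = s.slot_content_id
      -- content_address has the form 'tt:<old_id>' for rows stored in text
      AND  substr(c.content_address,1,3) = 'tt:'
      AND  t.old_id                      = substr(c.content_address,4)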

But it is much slower than the old bot (tested on the same server): 7 minutes instead of 1.55 seconds for 11274 pages. Maybe I will add some indexes.

Edit

After adding a key with alter table slots add index (slot_origin), the process takes 1.162 seconds (a bit faster than the old bot).
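
A possible alternative that may avoid the extra index, offered as a sketch and not as a measured result: as far as the MediaWiki schema documentation describes it, slots is keyed on (slot_revision_id, slot_role_id), while slot_origin names the revision where the content first appeared. For revisions that reuse unchanged content, slot_origin can differ from page_latest, so joining on slot_revision_id and the main role in slot_roles looks like the more direct path. The example below runs that join against a tiny in-memory SQLite stand-in. The simplified table definitions, the sample rows, and the helpers demo_connection and current_text are all assumptions made for illustration; a real wiki runs on MySQL/MariaDB with more columns, and the text table would be quoted with backticks there.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for the MediaWiki tables.
# Only the columns needed for the join are modelled here.
SCHEMA = """
CREATE TABLE page       (page_id INTEGER PRIMARY KEY, page_title TEXT,
                         page_latest INTEGER);
CREATE TABLE slot_roles (role_id INTEGER PRIMARY KEY, role_name TEXT);
CREATE TABLE slots      (slot_revision_id INTEGER, slot_role_id INTEGER,
                         slot_content_id INTEGER, slot_origin INTEGER,
                         PRIMARY KEY (slot_revision_id, slot_role_id));
CREATE TABLE content    (content_id INTEGER PRIMARY KEY, content_address TEXT);
CREATE TABLE "text"     (old_id INTEGER PRIMARY KEY, old_text TEXT);
"""

# Join through slot_revision_id (leading column of the primary key)
# instead of slot_origin, and restrict the result to the main slot.
QUERY = """
SELECT p.page_title, t.old_id, t.old_text
FROM   page p
JOIN   slots s      ON s.slot_revision_id = p.page_latest
JOIN   slot_roles r ON r.role_id = s.slot_role_id AND r.role_name = 'main'
JOIN   content c    ON c.content_id = s.slot_content_id
JOIN   "text" t     ON t.old_id = CAST(substr(c.content_address, 4) AS INTEGER)
WHERE  p.page_id = ?
  AND  substr(c.content_address, 1, 3) = 'tt:'
"""


def current_text(conn, page_id):
    """Return (page_title, old_id, old_text) for the page, or None."""
    return conn.execute(QUERY, (page_id,)).fetchone()


def demo_connection():
    """Build an in-memory database with one page and one revision."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO page VALUES (1, 'Main_Page', 20)")
    conn.execute("INSERT INTO slot_roles VALUES (1, 'main')")
    # Revision 20 reuses content that originated in revision 10,
    # so slot_origin (10) differs from page_latest (20).
    conn.execute("INSERT INTO slots VALUES (20, 1, 5, 10)")
    conn.execute("INSERT INTO content VALUES (5, 'tt:7')")
    conn.execute("INSERT INTO \"text\" VALUES (7, 'Hello wiki')")
    return conn


if __name__ == "__main__":
    print(current_text(demo_connection(), 1))
```

In the sample data the latest revision did not change the content, which is the case where a join on slot_origin = page_latest would find no row. Whether this variant is actually faster on a given wiki is something to check with EXPLAIN on the real database.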
