Deleting MediaWiki pages older than a certain date

Posted 2024-12-26 12:00:07

I have a rather large MediaWiki database, and I'd like to remove all the pages that haven't been edited since a certain date.

The wiki in question consists of a cut of Wikipedia that was imported when we first created the wiki, and a load of pages we have created ourselves since. We've recently decided that we no longer want the Wikipedia pages, and would therefore like to remove them from the database.

The best method we could think of to do this was to remove all pages that haven't been edited since the original import - the trouble is, we're not sure how to do this.

Anyone got any ideas?

遗忘曾经 2025-01-02 12:00:07

You can get a list of pages last edited before (or after) a given date by running an SQL query like this:

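SELECT page_id, page_namespace, page_title
FROM /*prefix*/page
WHERE page_latest IN (
  SELECT rev_id FROM /*prefix*/revision
  WHERE rev_timestamp < '20110101000000'
)

This lists the ID, namespace number and title of every page whose latest revision is older than the start of 2011, i.e. every page that has not been edited since then. (The timestamp format is 'YYYYMMDDHHMMSS'.) The query checks the timestamp of the revision that page_latest points to rather than page_touched, because page_touched is also updated whenever a page's cached rendering is invalidated (for example when a template it uses changes), so it can be much newer than the last real edit. If you configured a table name prefix when you installed MediaWiki, you need to replace /*prefix*/ above with it.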
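To see why the last-edit date is better taken from the revision table than from page_touched, here is a minimal sketch using Python's built-in sqlite3 module instead of MySQL. The two tables are hypothetical, heavily trimmed-down stand-ins for MediaWiki's page and revision tables, and the sample rows are made up to mirror the situation in the question; the helper names (mw_timestamp, stale_pages, build_demo_db) are illustrative only.

```python
import sqlite3
from datetime import datetime

# Hypothetical, heavily trimmed-down stand-ins for MediaWiki's page and
# revision tables: only the columns used below are included.
SCHEMA = """
CREATE TABLE page (
    page_id        INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL,
    page_title     TEXT    NOT NULL,
    page_latest    INTEGER NOT NULL,  -- rev_id of the newest revision
    page_touched   TEXT    NOT NULL   -- also bumped on cache invalidation
);
CREATE TABLE revision (
    rev_id        INTEGER PRIMARY KEY,
    rev_page      INTEGER NOT NULL,
    rev_timestamp TEXT    NOT NULL    -- 'YYYYMMDDHHMMSS'
);
"""

# Pages whose newest revision is older than the cutoff (table prefix omitted).
STALE_PAGES = """
SELECT page_id, page_namespace, page_title
FROM page
WHERE page_latest IN (
    SELECT rev_id FROM revision WHERE rev_timestamp < ?
)
ORDER BY page_id
"""


def mw_timestamp(dt: datetime) -> str:
    """Format a datetime as a 14-digit 'YYYYMMDDHHMMSS' timestamp."""
    return dt.strftime("%Y%m%d%H%M%S")


def stale_pages(conn: sqlite3.Connection, cutoff: datetime) -> list:
    # Fixed-width digit strings sort chronologically, so '<' works on text.
    return conn.execute(STALE_PAGES, (mw_timestamp(cutoff),)).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Made-up sample data: two imported articles and one of our own."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO revision VALUES (?, ?, ?)",
        [
            (1, 1, "20100301120000"),  # imported, never edited since
            (2, 2, "20100301120000"),  # imported ...
            (3, 2, "20110615090000"),  # ... and edited by us later
            (4, 3, "20110702080000"),  # a page we created ourselves
        ],
    )
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?, ?, ?)",
        [
            # Page 1 was re-rendered in 2012 (say, a template changed), so
            # its page_touched is recent even though nobody edited it.
            (1, 0, "Albert_Einstein", 1, "20120101000000"),
            (2, 0, "Physics", 3, "20110615090000"),
            (3, 0, "Our_Project", 4, "20110702080000"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(stale_pages(conn, datetime(2011, 1, 1)))
    # Filtering on page_touched instead misses page 1 entirely:
    print(conn.execute(
        "SELECT page_id FROM page WHERE page_touched < '20110101000000'"
    ).fetchall())
```

Run as a script, this prints [(1, 0, 'Albert_Einstein')] for the revision-based query and [] for the page_touched one: the never-edited imported article is only found when the last edit is read from the revision table.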

At this point, there are several things you could do:

  • As Joshua C. Lerner suggests, you could export all the pages you want to keep (either with Special:Export or with maintenance/dumpBackup.php) and re-import them into a new database (for example with maintenance/importDump.php).

  • There's also a maintenance script named maintenance/deleteBatch.php, which reads a list of page titles (one per line) and deletes those pages as if an admin had deleted them in the usual way.

  • Finally, if you're really sure you won't want the pages back, you could just replace the first line of the SQL query above with DELETE. I'd strongly suggest making a backup of your database before you do this. This will leave some orphaned revisions in the database, but you can get rid of them with the imaginatively named maintenance script maintenance/deleteOrphanedRevisions.php.

(A minor issue with the first two methods above is that both the export and the batch delete tools expect pages to be listed by full title, with namespace names, while the SQL query returns namespace numbers. Converting one to the other with search and replace isn't hard, but it does add an extra step to the process. Of course, if all the pages you want to delete are in the main namespace, this is not an issue: just add AND page_namespace = 0 to the query and drop the ID and namespace from the output.)
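The namespace-number-to-name step mentioned above can be scripted. Below is a minimal Python sketch, assuming the query output was saved as tab-separated text (for example with the mysql client's --batch option, which prints a header row followed by tab-separated rows). It only knows MediaWiki's built-in namespaces 0 to 15 by their canonical names; custom namespaces would have to be added to the table by hand. The function names and the convert_file helper are hypothetical.

```python
import sys

# Canonical names of MediaWiki's built-in namespaces. Custom namespaces
# (usually numbered 100 and up) are not listed and must be added by hand.
CANONICAL_NAMESPACES = {
    0: "",  # main namespace: titles carry no prefix
    1: "Talk",
    2: "User",
    3: "User_talk",
    4: "Project",  # generic alias for the wiki's project namespace
    5: "Project_talk",
    6: "File",
    7: "File_talk",
    8: "MediaWiki",
    9: "MediaWiki_talk",
    10: "Template",
    11: "Template_talk",
    12: "Help",
    13: "Help_talk",
    14: "Category",
    15: "Category_talk",
}


def full_title(ns: int, title: str) -> str:
    """Combine a namespace number and a page_title into 'Namespace:Title'."""
    prefix = CANONICAL_NAMESPACES[ns]  # KeyError flags an unmapped namespace
    return f"{prefix}:{title}" if prefix else title


def titles_from_query_output(lines) -> list:
    """Convert tab-separated 'page_id, page_namespace, page_title' rows
    (e.g. saved from `mysql --batch`) into one full title per entry."""
    titles = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("page_id"):  # skip blanks and header
            continue
        _page_id, ns, title = line.split("\t")
        titles.append(full_title(int(ns), title))
    return titles


def convert_file(query_output_path: str, list_path: str) -> None:
    """Write a one-title-per-line list file from saved query output."""
    with open(query_output_path, encoding="utf-8") as src:
        titles = titles_from_query_output(src)
    with open(list_path, "w", encoding="utf-8") as dst:
        dst.writelines(t + "\n" for t in titles)


if __name__ == "__main__" and len(sys.argv) == 3:
    convert_file(sys.argv[1], sys.argv[2])
```

Titles keep the underscores stored in page_title; MediaWiki normally treats underscores and spaces in titles as equivalent, so the resulting list file should be usable with Special:Export or deleteBatch.php, but check maintenance/deleteBatch.php --help for the exact options your MediaWiki version accepts.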

离鸿 2025-01-02 12:00:07

It would probably be simpler to generate a list of your own articles created since the initial setup (with the Wikipedia cut), export those articles, then re-import them into a newly-initialized MediaWiki database.
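One way to generate that list is to select pages whose earliest revision is not older than the date of the initial setup. Here is a minimal sketch of such a query, again run against hypothetical, trimmed-down SQLite stand-ins for the page and revision tables with made-up data. It assumes the imported Wikipedia pages' oldest revisions are dated before the cutoff you choose; the helper names are illustrative only.

```python
import sqlite3

# Hypothetical minimal stand-ins for MediaWiki's page and revision tables.
SCHEMA = """
CREATE TABLE page (
    page_id        INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL,
    page_title     TEXT    NOT NULL
);
CREATE TABLE revision (
    rev_id        INTEGER PRIMARY KEY,
    rev_page      INTEGER NOT NULL,
    rev_timestamp TEXT    NOT NULL  -- 'YYYYMMDDHHMMSS'
);
"""

# A page counts as "created since setup" if its earliest revision is not
# older than the cutoff.
CREATED_SINCE = """
SELECT page_namespace, page_title
FROM page
WHERE page_id IN (
    SELECT rev_page FROM revision
    GROUP BY rev_page
    HAVING MIN(rev_timestamp) >= ?
)
ORDER BY page_id
"""


def pages_created_since(conn: sqlite3.Connection, cutoff: str) -> list:
    """cutoff is a 'YYYYMMDDHHMMSS' string, e.g. just after the import."""
    return conn.execute(CREATED_SINCE, (cutoff,)).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Made-up sample data mirroring the scenario in the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [(1, 0, "Albert_Einstein"), (2, 0, "Physics"), (3, 0, "Our_Project")],
    )
    conn.executemany(
        "INSERT INTO revision VALUES (?, ?, ?)",
        [
            (1, 1, "20100301120000"),  # imported, untouched since
            (2, 2, "20100301120000"),  # imported ...
            (3, 2, "20110615090000"),  # ... then edited locally
            (4, 3, "20110702080000"),  # created locally
        ],
    )
    return conn


if __name__ == "__main__":
    print(pages_created_since(build_demo_db(), "20110101000000"))
```

Note how this differs from a "not edited since" filter: "Physics", an imported page that was edited locally afterwards, is not listed here, so the export-and-reimport route would drop it, whereas deleting only pages not edited since the import would keep it. It is worth deciding which behaviour you want before choosing a method.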
