Using wget to download dokuwiki pages in plain xhtml format only

Posted on 2024-11-09 12:07:54


I'm currently modifying the offline-dokuwiki[1] shell script to fetch the latest documentation for an application, so that it can be embedded automatically in instances of that application. This works quite well, except that in its current form it grabs three versions of each page:

1. The full page including header and footer
2. Just the content without header and footer
3. The raw wiki syntax

I'm actually only interested in 2. This is linked to from the main pages by an HTML <link> tag in the <head>, like so:

<link rel="alternate" type="text/html" title="Plain HTML" 
href="/dokuwiki/doku.php?do=export_xhtml&id=documentation:index" />

and is the same URL as the main wiki page, except that it contains 'do=export_xhtml' in the query string. Is there a way to instruct wget to download only these versions, or to automatically add '&do=export_xhtml' to the end of any link it follows? If so, that would be a great help.

[1] http://www.dokuwiki.org/tips:offline-dokuwiki.sh (author: samlt)
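
For illustration, the "add 'do=export_xhtml' to each link" idea could be sketched as a small URL-rewriting step that runs before wget, rather than as a wget option. This is only a sketch under assumptions: the function names to_export_url and rewrite_url_list, and the files urls.txt and export-urls.txt, are hypothetical and not part of offline-dokuwiki.sh. It also assumes that a list of page URLs has already been collected by some other means.

```shell
#!/bin/sh
# Hypothetical helper (not part of offline-dokuwiki.sh):
# print the export_xhtml variant of a DokuWiki page URL.
to_export_url() {
    case $1 in
        # Already an export URL: leave it unchanged.
        *do=export_xhtml*) printf '%s\n' "$1" ;;
        # URL already has a query string: append with '&'.
        *\?*)              printf '%s\n' "$1&do=export_xhtml" ;;
        # No query string yet: start one with '?'.
        *)                 printf '%s\n' "$1?do=export_xhtml" ;;
    esac
}

# Hypothetical helper: rewrite a list of page URLs read from stdin,
# one URL per line, skipping empty lines.
rewrite_url_list() {
    while IFS= read -r line; do
        if [ -n "$line" ]; then
            to_export_url "$line"
        fi
    done
}

# Possible usage (urls.txt and export-urls.txt are assumed file names):
#   rewrite_url_list < urls.txt > export-urls.txt
#   wget --input-file=export-urls.txt --directory-prefix=docs
```

Feeding the rewritten list to wget through --input-file means only the content-only pages are requested, at the cost of having to build the URL list in a separate step.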
