MediaWiki API action=parse fails with "The page parameter cannot be used together with the text and title parameters"

Posted on 2024-10-06 12:52:32

I'm very new to the MediaWiki API, but I recently came across a Chinese website built on MediaWiki, and I would like to parse its pages into a workable format for eventual processing with XPath. After reading for a bit, I found that action=parse is what I am looking for. For instance, the following query loads without difficulty on Wikipedia:

api.php?action=parse&page=Main_Page&format=xml

It presents the text, followed by language links, followed by links. I am particularly interested in the links section, as I will be using this data to crawl through this MediaWiki-based site and build a hierarchy of pages.
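As a rough sketch of how the links section could be pulled out of such a response, the snippet below runs an XPath-style query with Python's standard library. The sample XML is hand-written for illustration, and the element names (`parse`, `links`, `pl`) are assumptions about the response layout that should be checked against what the wiki actually returns.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, NOT real API output. The element names
# (parse, links, pl) are assumptions about the action=parse XML layout.
SAMPLE = """<api>
  <parse title="Main Page">
    <text>...</text>
    <langlinks><ll lang="de">Hauptseite</ll></langlinks>
    <links>
      <pl ns="0" exists="">Portal:Contents</pl>
      <pl ns="4" exists="">Wikipedia:About</pl>
    </links>
  </parse>
</api>"""


def extract_links(xml_text):
    """Return the text of every <pl> element found under <links>."""
    root = ET.fromstring(xml_text)  # root is the <api> element
    return [pl.text for pl in root.findall("./parse/links/pl")]


print(extract_links(SAMPLE))
```

ElementTree only supports a subset of XPath; a full XPath engine would be needed for more elaborate queries.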

Attempting to replicate these results, I appended the same query to the URL of the site I am interested in:

http://www.youbianku.com/api.php?action=parse&page=%E5%8C%97%E4%BA%AC&format=xml

%E5%8C%97%E4%BA%AC is the percent-encoded form of the Chinese characters for Beijing (北京), by the way. Anyhow, I get the following result:

<api>
<error code="params" info="The page parameter cannot be used together with the text and title parameters"/>
</api>
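For reference, the percent-encoding of the page name and the error response above can both be checked with a few lines of Python. This is only a sketch: the `api_error` helper is a hypothetical name, and it assumes errors always arrive as an `error` element directly under `api`, as in the response shown here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, unquote

# Percent-encode the page name as UTF-8 bytes, then decode it again.
encoded = quote("北京")
print(encoded)           # %E5%8C%97%E4%BA%AC
print(unquote(encoded))  # 北京

# The error response quoted in the question.
ERROR_XML = """<api>
<error code="params" info="The page parameter cannot be used together with the text and title parameters"/>
</api>"""


def api_error(xml_text):
    """Return (code, info) if the response holds an <error> element, else None."""
    err = ET.fromstring(xml_text).find("error")
    if err is None:
        return None
    return err.get("code"), err.get("info")


print(api_error(ERROR_XML))
```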

All I have done is replicate the query from Wikipedia and replace the name of the page, so it is unclear to me why this throws an error. Other API queries for this page run without problems, as the following shows:

api.php?action=query&format=xml&titles=%E5%8C%97%E4%BA%AC&rvprop=content&prop=revisions

I read recently that this may be due to .htaccess rewrite rules adding a title parameter by default. Is there a way to bypass these, given that I am only a client of this website?

Comments (1)

拧巴小姐 2024-10-13 12:52:32

As you suggest, this issue is probably caused by a broken URL rewriting rule.

You can work around this problem by using the text parameter and transcluding the page you want, like this:

/api.php?action=parse&text={{:Page_title}}

(The leading : is there to prevent Template: being prepended to the page title by default.)
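A minimal sketch of building that workaround URL in Python follows. The function name `parse_by_transclusion_url` is hypothetical, and `format=xml` is added on the assumption that the XML output from the question is still wanted; the braces and colon need percent-encoding, which `urlencode` handles.

```python
from urllib.parse import urlencode


def parse_by_transclusion_url(api_base, page_title):
    """Build an action=parse URL that transcludes page_title via the text parameter."""
    params = {
        "action": "parse",
        # The leading ":" keeps MediaWiki from looking in the Template: namespace.
        "text": "{{:" + page_title + "}}",
        "format": "xml",  # assumption: same output format as in the question
    }
    return api_base + "?" + urlencode(params)


print(parse_by_transclusion_url("http://www.youbianku.com/api.php", "Page_title"))
# http://www.youbianku.com/api.php?action=parse&text=%7B%7B%3APage_title%7D%7D&format=xml
```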

Trying this with the example page in your question returns a PHP error for some (probably unrelated) reason, but it works fine with other pages on that wiki.

A disadvantage of this trick is that it bypasses the parser cache, making it slower and more resource-consuming than simply using page. Also, any variables used on the page that depend on the page title are likely to yield unexpected results, and any variables depending on page or revision metadata will probably fail entirely. Fortunately, such variables are not used very often in practice.


Another, perhaps even better solution may be to simply use

/index.php?action=render&title=Page_title

which will return the parsed HTML source of the page without any surrounding skin. This method is not as versatile as the API, but it suffers from none of the problems described above.
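Since the goal in the question is to collect links for crawling, here is a hedged sketch of how the rendered HTML could be processed with Python's standard library. The `render_url` and `LinkCollector` names are hypothetical, and the sample HTML is hand-written, not output from the wiki.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


def render_url(index_base, page_title):
    """Build an index.php?action=render URL for page_title."""
    return index_base + "?" + urlencode({"action": "render", "title": page_title})


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


# Hand-written sample standing in for the rendered page body.
SAMPLE_HTML = (
    '<p><a href="http://example.org/wiki/Foo" title="Foo">Foo</a> and '
    '<a href="http://example.org/wiki/Bar">Bar</a></p>'
)

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
print(render_url("http://www.youbianku.com/index.php", "Page_title"))
print(collector.hrefs)
```

Unlike the links section of the API response, the collected hrefs will include external links and will need filtering before they can be used to build a page hierarchy.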
