I'm doing a research project for the summer and I have to get some data from Wikipedia, store it, and then do some analysis on it. I'm using the Wikipedia API to gather the data, and I've got that down pretty well.
My question is about the links-alllinks
option in the API doc here.
After reading the description, both there and in the API itself (it's a bit down the page and I can't link directly to the section), I think I understand what it's supposed to return. However, when I ran a query, it gave me back something I didn't expect.
Here's the query I ran:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=google&rvprop=ids|timestamp|user|comment|content&rvlimit=1&list=alllinks&alunique&allimit=40&format=xml
Which in essence says: Get the last revision of the Google page, include the id, timestamp, user, comment and content of each revision, and return it in XML format.
The alllinks list (I thought) should give me back a list of Wikipedia pages which point to the Google page (in this case, the first 40 unique ones).
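For reference, the same query URL can be assembled programmatically rather than hand-built; a minimal sketch in Python (the parameter names come straight from the URL above, but the `build_query` helper itself is just an illustration):

```python
from urllib.parse import urlencode

API = "http://en.wikipedia.org/w/api.php"

def build_query(params):
    """Assemble a MediaWiki API URL from a dict of parameters."""
    return API + "?" + urlencode(params)

url = build_query({
    "action": "query",
    "prop": "revisions",
    "titles": "google",
    "rvprop": "ids|timestamp|user|comment|content",
    "rvlimit": "1",
    "list": "alllinks",
    "alunique": "",   # flag parameter: present but valueless
    "allimit": "40",
    "format": "xml",
})
print(url)
```

Note that `urlencode` percent-encodes the `|` separators as `%7C`, which the API accepts just as well as the literal character.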
I'm not sure what the policy is on swearing, but this is exactly the result I got back:
<?xml version="1.0"?>
<api>
<query><normalized>
<n from="google" to="Google" />
</normalized>
<pages>
<page pageid="1092923" ns="0" title="Google">
<revisions>
<rev revid="366826294" parentid="366673948" user="Citation bot" timestamp="2010-06-08T17:18:31Z" comment="Citations: [161]Tweaked: url. [[User:Mono|Mono]]" xml:space="preserve">
<!-- The page content; I've replaced this because it's not of interest -->
</rev>
</revisions>
</page>
</pages>
<alllinks>
<!-- offensive content removed -->
</alllinks>
</query>
<query-continue>
<revisions rvstartid="366673948" />
<alllinks alfrom="!2009" />
</query-continue>
</api>
The <alllinks>
part is just a load of random gobbledygook and offensive comments. Not nearly what I thought I'd get. I've done a fair bit of searching, but I can't seem to find a direct answer to my question.
- What should the
list=alllinks
option return?
- Why am I getting this crap in there?
You don't want a list; a list is something that iterates over all pages. list=alllinks simply "enumerates all links that point to a given namespace", regardless of which page they come from, which is why you got back random-looking entries from all over the wiki.
You want a property associated with the Google page, so you need prop=links instead of the alllinks crap.
So your query becomes:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions|links&titles=google&rvprop=ids|timestamp|user|comment|content&rvlimit=1&format=xml
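With prop=links, the response gains a <links> element under <page>, with one <pl> child per link on the page. A quick sketch of pulling those out with Python's standard XML parser; the sample document here is made up to mirror that shape, not an actual API capture:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the shape of a prop=links response:
# each <pl> element under <links> is one link on the Google page.
sample = """<?xml version="1.0"?>
<api>
  <query>
    <pages>
      <page pageid="1092923" ns="0" title="Google">
        <links>
          <pl ns="0" title="Search engine" />
          <pl ns="0" title="PageRank" />
        </links>
      </page>
    </pages>
  </query>
</api>"""

root = ET.fromstring(sample)
titles = [pl.get("title") for pl in root.iter("pl")]
print(titles)  # ['Search engine', 'PageRank']
```

One caveat: prop=links returns the links *on* the Google page, not the pages that link *to* it; for the latter you'd look at list=backlinks instead.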