Parsing Wikipedia raises KeyError: 'query' (Python)

Posted on 2025-01-15 15:17:29

I'm trying to get the Wikipedia backlinks of more than 200 pages. To do this, I:

  • look up each page's URL on Italian Wikipedia and, if that fails, on English Wikipedia
  • put the URLs in a list
  • iterate over this list to collect the languages each page is available in on Wikipedia (with bs4)
  • append these languages to a list
  • iterate over both the languages and the URLs to get page titles and backlinks, and store them in a dictionary with the language as key and the number of backlinks available in that language as value

But I get the error KeyError: 'query', and I don't know why.

import urllib.request

import wikipediaapi
from bs4 import BeautifulSoup

# df holds the sample input shown further down (assumed to be a pandas DataFrame)
opere = df.label  # works
listaurl = []
for x in opere:
    try:
        # Look the title up on Italian Wikipedia first
        wiki_wiki = wikipediaapi.Wikipedia('it')
        p = wiki_wiki.page(x).fullurl
    except Exception:
        # Fall back to English Wikipedia
        wiki_wiki = wikipediaapi.Wikipedia('en')
        p = wiki_wiki.page(x).fullurl
    listaurl.append(p)
    print(p)


# Collect the language code of every interlanguage link on each page
lista = []
for url in listaurl:
    soup = BeautifulSoup(urllib.request.urlopen(url), 'html.parser')
    links = [(el.get('lang'), el.get('href')) for el in soup.select('li.interlanguage-link > a')]

    for language, link in links:
        lista.append(language)
testo = soup.title.text.replace(" ", "")
lista2 = []
regex = r"(?<=/wiki/).*$"
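dik = {}
for lang in lista:
    wikis = wikipediaapi.Wikipedia(lang)
    for apage in listaurl:
        # Reuse the title fragment of the it/en URL on every language edition
        wikipage = apage.split('/wiki/')[1]
        page_py = wikis.page(wikipage)
        print(page_py)
        titles = page_py.title
        print(titles)
        back = page_py.backlinks  # the KeyError is raised here
        dik[lang] = len(back)  # number of backlinks for this language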
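
The last loop takes the title from the it/en URL with split('/wiki/'), which leaves underscores and any percent-encoding in place, and it never uses the href collected for each interlanguage link. As a minimal sketch, a Wikipedia URL could be split into a language code and a decoded title like this. The helper name parse_wiki_url is hypothetical, the sketch assumes URLs of the form https://<lang>.wikipedia.org/wiki/<title>, and whether this is related to the KeyError is not established here.

```python
from urllib.parse import urlsplit, unquote


def parse_wiki_url(url):
    """Split a Wikipedia article URL into (language code, page title).

    Hypothetical helper: assumes the host starts with the language
    subdomain and the path starts with /wiki/.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    lang = host.split(".")[0]  # "it.wikipedia.org" -> "it"

    prefix = "/wiki/"
    if not parts.path.startswith(prefix):
        raise ValueError(f"not a /wiki/ article URL: {url}")

    # Decode percent-escapes, then turn URL underscores back into spaces
    title = unquote(parts.path[len(prefix):]).replace("_", " ")
    return lang, title


print(parse_wiki_url("https://it.wikipedia.org/wiki/Il_nome_della_rosa"))
# ('it', 'Il nome della rosa')
```

Applied to each href in links, this would give one (language, title) pair per language edition instead of reusing the Italian title everywhere.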

Example input to reproduce (the df):

item,label,authorlabel,authorlabel2,numWikipediaLanguages
http://www.wikidata.org/entity/Q172850,Il nome della rosa,,Umberto Eco,53
http://www.wikidata.org/entity/Q437791,Il pendolo di Foucault,,Umberto Eco,30
http://www.wikidata.org/entity/Q791487,Baudolino,,Umberto Eco,26

Error traceback:

Traceback (most recent call last):
  File "C:....myfile.py", line 43, in <module>
    back = page_py.backlinks
  File "C:\....\wikipediaapi\__init__.py", line 1112, in backlinks
    self._fetch('backlinks')
  File "C:....\wikipediaapi\__init__.py", line 1148, in _fetch
    getattr(self.wiki, call)(self)
  File "C:....wikipediaapi\__init__.py", line 468, in backlinks
    self._common_attributes(raw['query'], page)
KeyError: 'query'
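
The traceback ends at raw['query'], so the JSON decoded from the API response had no top-level 'query' key for at least one language/title pair. A minimal sketch for seeing what came back instead, assuming raw is the decoded JSON of a MediaWiki action=query&list=backlinks request and that failed requests carry an 'error' object with 'code' and 'info' fields. The function name describe_response is hypothetical and is not part of wikipediaapi.

```python
def describe_response(raw):
    """Summarise a decoded MediaWiki API response (hypothetical helper)."""
    if "query" in raw:
        # Normal case: the backlinks of this batch are listed under query
        count = len(raw["query"].get("backlinks", []))
        return f"ok: {count} backlinks in this batch"
    if "error" in raw:
        # Assumed error shape: {"error": {"code": ..., "info": ...}}
        err = raw["error"]
        return f"api error {err.get('code')}: {err.get('info')}"
    # Neither key present: report what the response does contain
    return f"no 'query' key; top-level keys: {sorted(raw)}"


# Made-up responses, for illustration only
print(describe_response({"query": {"backlinks": [{"title": "A"}, {"title": "B"}]}}))
print(describe_response({"error": {"code": "invalidtitle", "info": "Bad title"}}))
```

Logging this summary together with lang and wikipage for each request would show which combination triggers the failure.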
