Wikipedia API cannot find a specific page (URL with an apostrophe)
I'm trying to retrieve pageview data for a page, but the request fails for this one page while other pages work. I get the error:
File "<unknown>", line 1
article =='L'amica_geniale_ (serie_di_romanzi )'
^
SyntaxError: invalid syntax
But there is no whitespace in the title. The page is: https://it.wikipedia.org/wiki/L%27amica_geniale_(serie_di_romanzi)
The code is:
import datetime

import pandas as pd
import requests

start_date = "2005/01/01"
headers = {
    'User-Agent': 'Mozilla/5.0'
}

def wikimedia_request(page_name, start_date, end_date=None):
    sdate = ''.join(start_date.split("/"))
    # edate was undefined in the posted snippet; assumed here: end_date if given, else today
    edate = ''.join(end_date.split("/")) if end_date else datetime.date.today().strftime("%Y%m%d")
    r = requests.get(
        "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia.org/all-access/all-agents/{}/daily/{}/{}".format(page_name, sdate, edate),
        headers=headers
    )
    r.raise_for_status()  # raises exception when not a 2xx response
    result = r.json()
    df = pd.DataFrame(result['items'])
    df['timestamp'] = [i[:-2] for i in df.timestamp]  # drop the trailing hour digits
    df['timestamp'] = pd.to_datetime(df['timestamp'])
    df.set_index('timestamp', inplace=True)
    return df[['article', 'views']]

df = wikimedia_request("Random", start_date)
names = ["L'amica geniale"]
dfs = pd.concat([wikimedia_request(x, start_date) for x in names])
The code works for every page except this one. I suspect it has something to do with the apostrophe.
Comments (1)
Pay attention to which URL you are using: the page is on 'it.wikipedia.org', but the code requests 'en.wikipedia.org'. It works just fine with the correct URL. You could do something like this to account for it:
Output:
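A minimal sketch of that idea, not the answerer's exact code: it assumes a hypothetical helper, `build_pageviews_url`, that takes the project as a parameter (defaulting here to `it.wikipedia.org`) instead of hard-coding `en.wikipedia.org`. As an extra precaution it percent-encodes the title with `urllib.parse.quote`, so the apostrophe and parentheses are safe in the URL path. It only builds the URL, so it can be checked without a network call.

```python
from urllib.parse import quote

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def build_pageviews_url(page_name, start_date, end_date, project="it.wikipedia.org"):
    """Build a per-article pageviews URL (hypothetical helper, not from the post)."""
    # Wikipedia titles use underscores instead of spaces.
    title = page_name.replace(" ", "_")
    # Percent-encode everything outside letters, digits and "_.-~",
    # so "'" becomes %27 and "(" / ")" become %28 / %29.
    article = quote(title, safe="")
    # Dates arrive as "YYYY/MM/DD"; the endpoint expects "YYYYMMDD".
    sdate = start_date.replace("/", "")
    edate = end_date.replace("/", "")
    return f"{BASE}/{project}/all-access/all-agents/{article}/daily/{sdate}/{edate}"


url = build_pageviews_url("L'amica geniale (serie di romanzi)", "2020/01/01", "2020/01/31")
print(url)
```

Passing the resulting URL to `requests.get(url, headers=headers)` in place of the hard-coded string should be enough to adapt the function from the question. Note that the full title, including "(serie di romanzi)", is needed to match the page linked there.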