Error 403 when scraping Hansard behind Cloudflare

Posted on 2025-01-24 12:54:27


I am trying to extract a graph from this link. I need to write a loop that extracts the data behind graphs like this one for a set of specific criteria. Using Developer Tools >> Network, I found the URL of the data underlying this graph. The data seems to be stored in XML format.

I have tried different approaches, but I keep getting a 403 error. It makes no difference whether I try to extract just the plot or make a GET request for the whole web page. I think the problem is that Cloudflare kicks in. Any idea how I might get around this? Any help is very much appreciated.

import urllib.request

# Data URL found under Developer Tools >> Network
url = 'https://hansard.parliament.uk/timeline/query?searchTerm=immigration&startDate=27%2F04%2F2017&endDate=27%2F04%2F2022&house=0&contributionType=&isDebatesSearch=False&memberId='

# Browser-like User-Agent header
headers = {'user-agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36'}

req = urllib.request.Request(url, headers=headers)
# This call is where the 403 error is raised
webpage = urllib.request.urlopen(req).read()
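
For the loop over criteria, here is a rough sketch of how I imagine building the query URLs and checking whether a 403 really comes from Cloudflare. The names `build_url`, `fetch`, `fetch_all` and the list `SEARCH_TERMS` are my own placeholders, the parameter names are copied from the URL above, and I am assuming that a Cloudflare block would show up as a `Server: cloudflare` or `CF-RAY` response header. It does not get around the block; it only makes the failure easier to inspect.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://hansard.parliament.uk/timeline/query"
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36"
)
# Hypothetical criteria; replace with the real set of search terms.
SEARCH_TERMS = ["immigration", "asylum seekers"]


def build_url(search_term, start_date, end_date, house=0):
    """Build the timeline query URL; parameter names mirror the URL in the question."""
    params = {
        "searchTerm": search_term,
        "startDate": start_date,  # dd/mm/yyyy, percent-encoded by urlencode
        "endDate": end_date,
        "house": house,
        "contributionType": "",
        "isDebatesSearch": "False",
        "memberId": "",
    }
    return BASE_URL + "?" + urlencode(params)


def fetch(url):
    """Return the response body, or None after printing details of an HTTP error."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        # Assumption: a Cloudflare block sets these headers on the error response.
        server = err.headers.get("Server", "")
        cf_ray = err.headers.get("CF-RAY")
        print(f"HTTP {err.code} for {url} (Server={server!r}, CF-RAY={cf_ray!r})")
        return None


def fetch_all(terms, start_date, end_date):
    """Fetch the data for every search term; failed requests map to None."""
    return {term: fetch(build_url(term, start_date, end_date)) for term in terms}
```

Calling `fetch_all(SEARCH_TERMS, "27/04/2017", "27/04/2022")` would then run the loop. With `"immigration"` as the term, `build_url` reproduces the exact URL I pasted above.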
