requests.get(url) hangs after 5 iterations

Posted on 2025-02-12 00:04:27

I am trying to run a web-scraping script on Indeed using BeautifulSoup, looping through the result pages. However, after 2-6 iterations, requests.get(url) hangs and the loop never reaches the next page. I have read that this might have something to do with the server blocking me, but I would expect a block to affect the original requests as well, and I have also read online that Indeed allows web scraping. I have also heard that I should set a header, but I am unsure how to do that. I am running the latest version of Safari on macOS 12.4.

Comments (1)

明月夜 2025-02-19 00:04:27

A solution I came up with, though it does not answer the question specifically, is to use a try/except statement and set a timeout value on the request. Once the timeout is reached, requests.get raises requests.exceptions.Timeout; the except branch sets a boolean value, and the loop then continues and tries the same page again. The code is below.

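import requests

i = 0  # assumed starting page index
while i < 10:
    # get_url is the asker's own helper (not shown here); it builds the search URL for page i
    url = get_url('software intern', '', i)
    print("Parsing Page Number: " + str(i + 1))

    timed_out = False

    try:
        # Raise Timeout instead of hanging if the server takes longer than 10 seconds
        response = requests.get(url, timeout=10)
    except requests.exceptions.Timeout:
        timed_out = True
    if timed_out:
        # i is not incremented, so the same page is requested again
        print("Trying to connect to webpage again")
        continue

    i += 1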
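
One caveat with retrying on timeout is that the loop never ends if the server stops responding altogether. A minimal sketch of a bounded variant is shown here. The name fetch_with_retries, its parameters, and the backoff values are hypothetical and chosen for illustration; they are not part of requests. The helper takes any callable that fetches a URL, so it makes no assumption about how the request itself is built.

```python
import time


def fetch_with_retries(fetch, url, max_attempts=5, backoff_seconds=2.0,
                       retry_on=(TimeoutError,), sleep=time.sleep):
    """Call fetch(url) until it succeeds or max_attempts is reached.

    fetch is any callable that takes a URL. With requests it could be
    lambda u: requests.get(u, timeout=10), in which case retry_on
    should be (requests.exceptions.Timeout,).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: re-raise instead of looping forever
                raise
            print(f"Attempt {attempt} timed out, retrying")
            # Wait a little longer after each failure (2 s, 4 s, 6 s, ...)
            sleep(backoff_seconds * attempt)
```

In the loop above, the requests.get call could then be replaced by something like fetch_with_retries(lambda u: requests.get(u, timeout=10), url, retry_on=(requests.exceptions.Timeout,)), letting the final exception end the run if a page still cannot be fetched.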

I am leaving the question as unanswered for now, however, as I still don't know the root cause of this issue and this solution is just a workaround.
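
On the header part of the question: with requests, headers are passed as a plain dictionary, either per call through the headers argument or once on a Session. The sketch below shows the Session form. The User-Agent string is only an example of what a Safari browser on a Mac might send, and make_session and BROWSER_HEADERS are hypothetical names. Whether sending such a header has any effect on the hang is not established here.

```python
import requests

# Example values only: a Safari-style User-Agent and a language preference.
# Both are assumptions for illustration, not values required by any site.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/605.1.15 (KHTML, like Gecko) "
        "Version/15.5 Safari/605.1.15"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}


def make_session():
    """Return a Session that sends BROWSER_HEADERS with every request."""
    session = requests.Session()
    session.headers.update(BROWSER_HEADERS)
    return session


# Usage (performs a network request, so it is left commented out):
# session = make_session()
# response = session.get(url, timeout=10)
```

A Session also reuses the underlying connection between requests, which is convenient when fetching several pages from the same host in a loop.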
