Is it good to use Heroku as a proxy?

Posted 2025-01-24 17:25:15


Heroku provides dynamic IP addresses which change when the dyno is restarted. So, instead of paying for proxies, we can take advantage of this Heroku feature.

I want to do large-scale automated web scraping without being blocked.

There already exists a Heroku app that does exactly the above, but I cannot remember the name (something like nameOfApp.herokuapp.com). To use it, we request something like https://nameOfApp.herokuapp.com/destination.com. I came across it in a Stack Overflow question.

I suspect the domain is anywhere.herokuapp.com, but I cannot use it.
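As a rough illustration of the URL scheme such proxy apps use (nameOfApp here is the placeholder from the question, not a real deployment), the destination is simply embedded in the proxy URL's path. A minimal sketch of the client-side rewrite:

```python
# Sketch of the proxy URL pattern described above; the proxy base
# is a hypothetical placeholder, not a real app.
from urllib.parse import urlsplit

PROXY_BASE = "https://nameOfApp.herokuapp.com"

def via_proxy(destination: str) -> str:
    """Rewrite a destination URL so the request goes through the proxy,
    e.g. https://example.com/page -> PROXY_BASE/example.com/page."""
    parts = urlsplit(destination)
    return f"{PROXY_BASE}/{parts.netloc}{parts.path}"

print(via_proxy("https://example.com/page"))
# -> https://nameOfApp.herokuapp.com/example.com/page
```

The proxy app on the Heroku side would then reconstruct `https://example.com/page` from its request path and forward the request.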


Comments (1)

你的笑 2025-01-31 17:25:15


Heroku provides dynamic IP addresses which change when the dyno is restarted. So, instead of paying for proxies, we can take advantage of this Heroku feature.

I want to do large-scale automated web scraping without being blocked.

Heroku is unlikely to help in this regard.

For many use cases, there's never a valid reason for requests to come from cloud services. As a result, it is common for sites that don't want to get scraped to block entire IP ranges, e.g. ones for Amazon Web Services (the underlying cloud infrastructure that Heroku is built upon).
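To illustrate why range-based blocking is so easy for site operators, here is a stdlib-only sketch. The CIDR blocks below are illustrative samples, not the current AWS list (AWS publishes its actual ranges in a machine-readable ip-ranges.json file):

```python
# Sketch: checking whether a client IP falls inside known cloud
# provider ranges. The CIDR blocks are illustrative samples only.
import ipaddress

CLOUD_RANGES = [ipaddress.ip_network(c) for c in ("3.0.0.0/15", "52.0.0.0/11")]

def is_cloud_ip(addr: str) -> bool:
    """Return True if addr falls inside any listed cloud range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in CLOUD_RANGES)

print(is_cloud_ip("52.23.45.67"))   # inside 52.0.0.0/11 -> True
print(is_cloud_ip("203.0.113.9"))   # documentation range -> False
```

A site can run exactly this check against every incoming request, which is why rotating among Heroku dyno IPs does not help: every new IP still lands inside the same published ranges.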

Many users find that their scrapers work fine locally, even for extended periods of time, and immediately fail when deployed to Heroku.

Could it help? Sure. But this will be highly dependent on the site you're trying to scrape, the mitigations in place at the time, and many other factors.


Side note that I'm sure you'll just ignore: please respect sites' terms of service. If they don't want you scraping them, don't scrape them. If they implement technical barriers, that's a pretty good sign that they don't want to be scraped.
