Is it a good idea to use Heroku as a proxy?
Heroku provides dynamic IP addresses that change when the dyno is restarted. So, instead of paying for proxies, we can take advantage of this Heroku feature.
I want to do large-scale automated web scraping without being blocked.
There already exists a Heroku app that does exactly the above-mentioned process, but I cannot remember its name (something like nameOfApp.herokuapp.com). To use it, we have to request something like https://nameOfApp.herokuapp.com/destination.com. I came across it in a Stack Overflow question.
I suspect the domain is anywhere.herokuapp.com, but I cannot use it.
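For reference, the kind of pass-through app described above is easy to sketch. The following is only my reconstruction (Flask plus requests; the app name my-proxy and the route scheme are placeholders, not the actual app I am thinking of): it forwards whatever host appears in the path and returns the upstream response.

    # Minimal pass-through proxy sketch, assuming Flask and requests are installed.
    # The route scheme mirrors the https://nameOfApp.herokuapp.com/destination.com
    # pattern described above; all names here are placeholders, not the real app.
    import os

    import requests
    from flask import Flask, Response, request

    app = Flask(__name__)


    @app.route("/<path:target>")
    def proxy(target):
        # e.g. https://my-proxy.herokuapp.com/example.com -> https://example.com
        upstream = requests.get(
            f"https://{target}",
            params=request.args,  # pass query parameters through unchanged
            headers={"User-Agent": request.headers.get("User-Agent", "")},
            timeout=15,
        )
        return Response(
            upstream.content,
            status=upstream.status_code,
            content_type=upstream.headers.get("Content-Type", "text/plain"),
        )


    if __name__ == "__main__":
        # Heroku injects the listening port via the PORT environment variable.
        app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 5000)))

Since the outbound IP only changes when the dyno restarts, rotating it would mean running something like heroku ps:restart --app my-proxy between scraping batches (my-proxy is again a placeholder).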
Comments (1)
Heroku is unlikely to help in this regard.
For many use cases, there's never a valid reason for requests to come from cloud services. As a result, it is common for sites that don't want to get scraped to block entire IP ranges, e.g. ones for Amazon Web Services (the underlying cloud infrastructure that Heroku is built upon).
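To make that concrete: AWS publishes its address ranges at a well-known URL, so a site can reject anything originating from them with a simple lookup. A rough sketch of that check (the sample address below is just a documentation placeholder):

    # Rough illustration of range-based blocking: reject requests whose source
    # IP falls inside AWS's published IPv4 prefixes.
    import ipaddress

    import requests

    AWS_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"


    def is_aws_ip(ip: str) -> bool:
        """Return True if `ip` falls inside any published AWS IPv4 prefix."""
        prefixes = requests.get(AWS_RANGES_URL, timeout=15).json()["prefixes"]
        addr = ipaddress.ip_address(ip)
        return any(addr in ipaddress.ip_network(p["ip_prefix"]) for p in prefixes)


    if __name__ == "__main__":
        print(is_aws_ip("203.0.113.7"))  # TEST-NET-3 placeholder, not an AWS address

Heroku dynos draw their outbound addresses from those same AWS ranges, so restarting a dyno just moves you to a different address inside the same blocked ranges.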
Many users find that their scrapers work fine locally, even for extended periods of time, and immediately fail when deployed to Heroku.
Could it help? Sure. But this will be highly dependent on the site you're trying to scrape, the mitigations in place at the time, and many other factors.
Side note that I'm sure you'll just ignore: please respect sites' terms of service. If they don't want you scraping them, don't scrape them. If they implement technical barriers, that's a pretty good sign that they don't want to be scraped.