Dynamic web scraping - ChromeDriver security
I am trying to scrape a dynamic page. A simple urllib request gets me results from the first page only, instead of returning the whole set.
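from urllib import request

from bs4 import BeautifulSoup

URL = "https://www.olx.pl/d/nieruchomosci/mieszkania/warszawa/"

# urlopen returns only the HTML the server sends; listings that JavaScript loads afterwards are not in it.
# The context manager closes the connection automatically.
with request.urlopen(URL) as response:
    get_page = response.read()

# Parse the raw bytes so individual elements can be selected later.
soup = BeautifulSoup(get_page, "html.parser")
print(soup.prettify())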
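One browser-free option is to request each results page in turn with urllib. The sketch below assumes the listing is paginated through a `page` query parameter; that name is hypothetical and has to be confirmed from the URLs the site actually produces when you click through pages. `build_page_url` is an illustrative helper, not part of any library.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_page_url(base_url: str, page: int, param: str = "page") -> str:
    """Return base_url with the (hypothetical) pagination parameter set to `page`.

    Existing query parameters are preserved; an existing `param` is overwritten.
    """
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))
    query[param] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    base = "https://www.olx.pl/d/nieruchomosci/mieszkania/warszawa/"
    for page in range(1, 4):
        # Each URL could then be passed to request.urlopen as in the snippet above.
        print(build_page_url(base, page))
```

This only helps if each page's listings are present in the server-sent HTML; if they are filled in by JavaScript, the fetched pages will be just as incomplete as the first one.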
I was about to implement Selenium and ChromeDriver for the dynamic web scraping, but then I read about the security of this solution: ChromeDriver should never be run on an account with privileges, and the best option would be a virtual machine.
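If Selenium and ChromeDriver are used anyway, a small precaution in the spirit of that advice is to refuse to start when the script runs as root. This is a sketch assuming a POSIX system, where `os.geteuid()` exists and root has UID 0; `check_unprivileged` is a hypothetical helper, and such a check does not replace the isolation a VM provides.

```python
import os


def check_unprivileged(uid: int) -> None:
    """Raise PermissionError if `uid` is 0 (root on POSIX systems)."""
    if uid == 0:
        raise PermissionError(
            "Refusing to start browser automation as root; "
            "use an unprivileged account or a VM instead."
        )


if __name__ == "__main__":
    # os.geteuid() is POSIX-only; on platforms without it, this guard is skipped.
    if hasattr(os, "geteuid"):
        check_unprivileged(os.geteuid())
    print("OK to start ChromeDriver under this account")
```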
Reading forum posts, I see that almost all solutions to dynamic web scraping involve ChromeDriver, Selenium, or Scrapy. It makes me wonder whether all those users set up a firewall or a VM just to download data.
Is there any other, safer solution you would recommend for dynamic web scraping?
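One browser-free alternative worth checking first: many dynamic pages fill in their listings from a JSON response that shows up in the browser developer tools' Network tab, and that URL can often be requested directly with urllib, so no browser runs at all. The sketch below assumes such an endpoint exists; the URL you pass to `fetch_json` and the `data`/`title` field names are hypothetical placeholders, not a documented OLX API.

```python
import json
from typing import Any
from urllib import request


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """Download `url` and decode its body as JSON (no JavaScript is executed)."""
    req = request.Request(url, headers={"Accept": "application/json"})
    with request.urlopen(req, timeout=timeout) as response:
        return json.load(response)


def extract_titles(payload: dict) -> list[str]:
    """Pull titles from a payload shaped like {"data": [{"title": ...}, ...]}.

    The "data" and "title" keys are hypothetical; adapt them to the real response.
    """
    return [item["title"] for item in payload.get("data", []) if "title" in item]


if __name__ == "__main__":
    # Hypothetical payload standing in for a real endpoint's response.
    sample = {"data": [{"title": "2-room flat"}, {"title": "Studio"}, {"id": 3}]}
    print(extract_titles(sample))  # ['2-room flat', 'Studio']
```

Because only plain HTTP requests are made, nothing beyond the Python process itself needs to be sandboxed.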