Dynamic web scraping - chromedriver security

Posted on 2025-01-28 15:59:33

I am trying to scrape a dynamic web page. A simple urllib request returns results from the first page only, instead of the whole set.

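from urllib import request
from bs4 import BeautifulSoup

URL = "https://www.olx.pl/d/nieruchomosci/mieszkania/warszawa/"

# Fetch the raw HTML; the context manager closes the connection automatically.
with request.urlopen(URL) as response:
    page = response.read()

# Only the first page of results shows up here.
print(page)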
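Before bringing in a browser, it may be worth checking whether the remaining results simply live on further pages of the same listing. The sketch below builds candidate page URLs under the assumption that the site paginates through a `page` query parameter. That parameter name, along with the helpers `page_url` and `page_urls`, is hypothetical and not confirmed for this site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

BASE_URL = "https://www.olx.pl/d/nieruchomosci/mieszkania/warszawa/"


def page_url(base_url: str, page: int) -> str:
    """Return base_url with a `page` query parameter set (hypothetical scheme)."""
    parts = urlsplit(base_url)
    # Keep any query parameters already present, then set/overwrite `page`.
    query = dict(parse_qsl(parts.query))
    query["page"] = str(page)  # assumption: the site paginates via ?page=N
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )


def page_urls(base_url: str, last_page: int) -> list[str]:
    """Build URLs for pages 1..last_page inclusive."""
    return [page_url(base_url, n) for n in range(1, last_page + 1)]


if __name__ == "__main__":
    for url in page_urls(BASE_URL, 3):
        print(url)
```

Each generated URL could then be fetched with the same `request.urlopen` call as above. If the extra results are loaded by JavaScript rather than by separate pages, this approach will not help and a browser-driven tool is still needed.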

I was about to use Selenium and chromedriver for the dynamic scraping, but then I read about the security implications of this approach: chromedriver should never be run from an account with elevated privileges, and the best option would be a virtual machine.

From the forum posts I have read, almost all solutions for dynamic web scraping involve chromedriver, Selenium, or Scrapy. It makes me wonder whether all of those users set up a firewall or a VM just to download data.

Is there any other, safer solution you would recommend for dynamic web scraping?
