Not getting correct results when using a proxy with Selenium
I have this function that works properly, but without a proxy. The HTML it extracts from the website contains the content I need:
def extract_listing_html(url):
    driver_path = "C:/Users/parkj/Downloads/chromedriver_win32/chromedriver.exe"
    driver = webdriver.Chrome(service=Service(driver_path))
    driver.get(url)
    time.sleep(5)
    html = driver.page_source
    soup = BeautifulSoup(html, "html.parser")
    return soup
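I call it like this (the URL here is just a placeholder, not the real site):

    soup = extract_listing_html("https://www.example.com/some-listing")  # placeholder URL
    print(soup.title)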
I want to use a proxy and this is what I have so far, but I am not getting the same results as when I am not using a proxy:
def extract_listing_html(url):
    PROXY = "164.155.145.1:80"
    driver_path = "C:/Users/parkj/Downloads/chromedriver_win32/chromedriver.exe"
    chrome_options = Options()
    chrome_options.add_argument('--proxy-server=%s' "http://" + PROXY)
    driver = webdriver.Chrome(service=Service(driver_path), options=chrome_options)
    driver.get(url)
    time.sleep(5)
    html = driver.page_source
    soup = BeautifulSoup(html, "html.parser")
    return soup
I played around with it and found that passing options=chrome_options into webdriver.Chrome() is what causes it to return different HTML, but I'm not sure why. The two pages look quite different, and I can't tell what is causing it.
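For what it's worth, because of Python's implicit string-literal concatenation, '--proxy-server=%s' "http://" + PROXY evaluates to '--proxy-server=%shttp://164.155.145.1:80', so the flag keeps a literal %s in it. What I was trying to write is something like:

    chrome_options.add_argument('--proxy-server=http://' + PROXY)
    # or, actually applying the %s formatting:
    chrome_options.add_argument('--proxy-server=%s' % ('http://' + PROXY))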
Imports:
import time
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
1 Answer
Using a user-agent instead has given me results.
Use
pip install pyyaml ua-parser user-agents fake-useragent
to install fake_useragent
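A minimal sketch of plugging that into the function from the question (same imports and chromedriver path as above; UserAgent().random returns a random real-world user-agent string):

    from fake_useragent import UserAgent

    def extract_listing_html(url):
        driver_path = "C:/Users/parkj/Downloads/chromedriver_win32/chromedriver.exe"
        chrome_options = Options()
        # Spoof a random real-world user-agent instead of routing through a proxy
        chrome_options.add_argument('--user-agent=' + UserAgent().random)
        driver = webdriver.Chrome(service=Service(driver_path), options=chrome_options)
        driver.get(url)
        time.sleep(5)
        html = driver.page_source
        soup = BeautifulSoup(html, "html.parser")
        return soup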