Not getting the correct results when using Selenium with a proxy

Posted on 2025-02-01 22:03:55


I have this function that works properly, but only without a proxy.
The HTML it extracts from the website contains the content I need:

def extract_listing_html(url):
    driver_path = "C:/Users/parkj/Downloads/chromedriver_win32/chromedriver.exe"
    driver = webdriver.Chrome(service=Service(driver_path))
    driver.get(url)
    time.sleep(5)  # crude fixed wait for the page to finish rendering
    html = driver.page_source
    soup = BeautifulSoup(html, "html.parser")
    driver.quit()  # close the browser so Chrome processes don't pile up
    return soup

I want to use a proxy, and this is what I have so far, but I am not getting the same results as when I am not using a proxy:

def extract_listing_html(url):
    PROXY = "164.155.145.1:80"
    driver_path = "C:/Users/parkj/Downloads/chromedriver_win32/chromedriver.exe"
    chrome_options = Options()
    # the flag must be a single well-formed string: --proxy-server=http://host:port
    chrome_options.add_argument(f'--proxy-server=http://{PROXY}')
    driver = webdriver.Chrome(service=Service(driver_path), options=chrome_options)
    driver.get(url)
    time.sleep(5)  # crude fixed wait for the page to finish rendering
    html = driver.page_source
    soup = BeautifulSoup(html, "html.parser")
    driver.quit()
    return soup

I played around with it and found that passing options=chrome_options to webdriver.Chrome() is what causes it to return different HTML, but I'm not sure why.
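
One way to narrow this down is to confirm whether the proxy is actually in effect. A minimal sketch (editor's addition, reusing driver_path and the imports listed below): https://httpbin.org/ip echoes the IP a request arrives from, so if the flag is applied, the printed address should be the proxy's (164.155.145.1), not yours.

def check_exit_ip(proxy):
    opts = Options()
    opts.add_argument(f'--proxy-server=http://{proxy}')
    driver = webdriver.Chrome(service=Service(driver_path), options=opts)
    try:
        driver.get("https://httpbin.org/ip")
        # the body is a small JSON blob such as {"origin": "164.155.145.1"}
        print(driver.find_element(By.TAG_NAME, "body").text)
    finally:
        driver.quit()

check_exit_ip("164.155.145.1:80")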

[Screenshot: HTML without proxy]

[Screenshot: HTML with proxy]

They look quite different, and I am not sure what is causing it.
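
A minimal sketch for comparing the two results (editor's addition; soup_without_proxy and soup_with_proxy are hypothetical names for the return values of the two versions above). Block and captcha pages usually betray themselves in the <title> and in how little visible text they carry:

def page_fingerprint(soup):
    # summarize a parsed page: its <title> and total visible text length
    title = soup.title.get_text(strip=True) if soup.title else "(no title)"
    return f"title={title!r}, text_length={len(soup.get_text())}"

print(page_fingerprint(soup_without_proxy))
print(page_fingerprint(soup_with_proxy))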

Imports:

import time 
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
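
Since WebDriverWait, By, and expected_conditions are already imported, an explicit wait can replace the fixed time.sleep(5) in either function. A minimal sketch, assuming a hypothetical "div.listing" selector marks a fully loaded page:

wait = WebDriverWait(driver, 10)  # poll up to 10 s instead of sleeping a fixed 5 s
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.listing")))
html = driver.page_source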


Comments (1)

枫以 2025-02-08 22:03:55


Using a user-agent instead has given me results.

Use pip install pyyaml ua-parser user-agents fake-useragent to install fake_useragent

from fake_useragent import UserAgent

def extract_listing_html(url):
    driver_path = "C:/Users/parkj/Downloads/chromedriver_win32/chromedriver.exe"  # same ChromeDriver path as in the question
    opts = Options()
    ua = UserAgent()
    userAgent = ua.random  # pick a random real-world browser user agent
    print(userAgent)
    opts.add_argument(f'user-agent={userAgent}')
    driver = webdriver.Chrome(service=Service(driver_path), options=opts)
    driver.get(url)
    time.sleep(5)  # crude fixed wait for the page to finish rendering
    html = driver.page_source
    soup = BeautifulSoup(html, "html.parser")
    driver.quit()  # close the browser so Chrome processes don't pile up
    return soup
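
A quick usage sketch (editor's addition; the URL is a placeholder, not from the original post):

url = "https://www.example.com/listing"  # hypothetical listing URL
soup = extract_listing_html(url)
print(soup.title.get_text(strip=True) if soup.title else "(no title)")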