PowerShell Invoke-WebRequest works, but Python Requests does not

Posted on 2025-01-17 01:09:23

This is about a weird situation where PowerShell's Invoke-WebRequest works as intended and Python's Requests does not.

I am trying to scrape an e-commerce site using Python. Part of the scraping is to test whether an item can be added to the cart. Using the Chrome Developer Tools (F12), I was able to extract the following PowerShell scripts.

Step 1 - Request a customer session

$session = New-Object Microsoft.PowerShell.Commands.WebRequestSession
$session.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
$secPasswd=ConvertTo-SecureString "password" -AsPlainText -Force
$myCreds=New-Object System.Management.Automation.PSCredential -ArgumentList "username",$secPasswd
Invoke-WebRequest -UseBasicParsing -Uri "https://bck.hermes.com/customer-session?locale=de_de" `
-Proxy 'http://proxyaddress' `
-ProxyCredential $mycreds `
-WebSession $session `
-Headers @{
"sec-ch-ua"="`" Not A;Brand`";v=`"99`", `"Chromium`";v=`"99`", `"Google Chrome`";v=`"99`""
  "Accept"="application/json, text/plain, */*"
  "Cache-Control"="no-cache"
  "DNT"="1"
  "sec-ch-ua-mobile"="?0"
  "sec-ch-ua-platform"="`"Windows`""
  "Origin"="https://www.hermes.com"
  "Sec-Fetch-Site"="same-site"
  "Sec-Fetch-Mode"="cors"
  "Sec-Fetch-Dest"="empty"
  "Referer"="https://www.hermes.com/"
  "Accept-Encoding"="gzip, deflate, br"
  "Accept-Language"="en-US,en;q=0.9,ja;q=0.8,zh-CN;q=0.7,zh-TW;q=0.6,zh;q=0.5"
} | Select-Object -Expand RawContent

The response gives me an "ECOM_SESS" cookie along with a bunch of others.

I then pass the ECOM_SESS cookie to the next step.

Step 2 - Add to cart

$session = New-Object Microsoft.PowerShell.Commands.WebRequestSession
$session.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
$session.Cookies.Add((New-Object System.Net.Cookie("ECOM_SESS", "XXXXXXXXXXXXXXXX", "/", ".hermes.com")))
$secPasswd=ConvertTo-SecureString "password" -AsPlainText -Force
$myCreds=New-Object System.Management.Automation.PSCredential -ArgumentList "username",$secPasswd
Invoke-WebRequest -UseBasicParsing -Uri "https://bck.hermes.com/add-to-cart" `
-Proxy 'http://proxyaddress' `
-ProxyCredential $mycreds `
-Method "POST" `
-WebSession $session `
-Headers @{
"sec-ch-ua"="`" Not A;Brand`";v=`"99`", `"Chromium`";v=`"99`", `"Google Chrome`";v=`"99`""
  "Accept"="application/json, text/plain, */*"
  "DNT"="1"
  "sec-ch-ua-mobile"="?0"
  "sec-ch-ua-platform"="`"Windows`""
  "Origin"="https://www.hermes.com"
  "Sec-Fetch-Site"="same-site"
  "Sec-Fetch-Mode"="cors"
  "Sec-Fetch-Dest"="empty"
  "Referer"="https://www.hermes.com/"
  "Accept-Encoding"="gzip, deflate, br"
  "Accept-Language"="en-US,en;q=0.9,ja;q=0.8,zh-CN;q=0.7,zh-TW;q=0.6,zh;q=0.5"
} `
-ContentType "application/json" `
-Body "{`"locale`":`"de_de`",`"items`":[{`"category`":`"direct`",`"sku`":`"H079082CCAC`"}]}"

With the PowerShell scripts above, the process works perfectly and I get the expected response from each of the two steps. Note that this is with a rotating-IP proxy, which refreshes the IP on each request to prevent bot detection.

However, when I tried to integrate this into my Python code, I got a captcha challenge at Step 2, irrespective of the proxy server used.

Here is the relevant Python code:

from __future__ import print_function
import bs4
import requests
from requests.cookies import RequestsCookieJar
import jsons

def main():
    url1= "https://bck.hermes.com/customer-session?locale=de_de"
    url2 = "https://bck.hermes.com/add-to-cart"
    proxies1 = {
        "http": "xxxxxxxxxxxxxxxxxx"
    }
    headers1 = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36',         
            'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"',
            'Accept': 'application/json, text/plain, */*',
            'Cache-Control': 'no-cache',
            'DNT': '1',
            'sec-ch-ua-mobile': '?0',
            'sec-ch-ua-platform': '"Windows"',
            'Origin': 'https://www.hermes.com',
            'Sec-Fetch-Site': 'same-site',
            'Sec-Fetch-Mode': 'cors',
            'Sec-Fetch-Dest': 'document',
            'Referer': 'https://www.hermes.com/',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'en-US,en;q=0.9,ja;q=0.8,zh-CN;q=0.7,zh-TW;q=0.6,zh;q=0.5'
    }
    headers2 = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36',
            'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"',
            'Accept': 'application/json, text/plain, */*',
            'DNT': '1',
            'sec-ch-ua-mobile': '?0',
            'sec-ch-ua-platform': '"Windows"',
            'Origin': 'https://www.hermes.com',
            'Sec-Fetch-Site': 'same-site',
            'Sec-Fetch-Mode': 'cors',
            'Sec-Fetch-Dest': 'empty',
            'Referer': 'https://www.hermes.com/',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'en-US,en;q=0.9,ja;q=0.8,zh-CN;q=0.7,zh-TW;q=0.6,zh;q=0.5'
    }
    
    body2 = {"locale":"de_de","items":[{"category":"direct","sku":"H079082CCAC"}]}


    #Step 1

    f = requests.get(url1, headers=headers1,proxies=proxies1)
    print(f"1Response Body: {f.text}\n")
    ECOM_SESS = f.cookies['ECOM_SESS']
    cookieJar = RequestsCookieJar()
    cookieJar.set('ECOM_SESS', ECOM_SESS, domain='.hermes.com', path='/')

    #Step 2
    g = requests.post(url2, headers=headers2,cookies=cookieJar,proxies=proxies1,json=body2)
    print(f"2Response Body: {g.text}\n")

   

if __name__ == '__main__':
    main()

When I run the Python code, Step 1 returns the intended response along with the cookies needed for Step 2. However, Step 2 always results in a captcha response.

I am curious about the difference between PowerShell's Invoke-WebRequest and Python's Requests, as there has to be something fundamentally different for the former to avoid the captcha completely and the latter to always get hit with it.

Would appreciate any thoughts and insights from you guys! Thanks!

Comments (1)

才能让你更想念 2025-01-24 01:09:23

I'm not sure specifically what it is about requests that's triggering the bot protection on the site, but based on this you might have luck using:

requests.request("POST", url2, headers=headers2, cookies=cookieJar, proxies=proxies1, json=body2)
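One thing that may be worth ruling out, offered as a guess rather than a diagnosis: Requests chooses a proxy by the scheme of the target URL, so a proxies mapping that only has an "http" key is not consulted for https:// URLs (unless a proxy is picked up from environment variables). PowerShell's -Proxy parameter, by contrast, takes a single proxy URI. The sketch below uses only the standard library; build_proxies and covers are hypothetical helper names, covers is a simplified stand-in for the lookup Requests performs, and "proxyaddress", "username" and "password" are the placeholders from the question.

```python
from urllib.parse import quote, urlsplit


def build_proxies(host, username, password):
    """Return a Requests-style proxies mapping that covers both URL schemes."""
    # Percent-encode the credentials so characters like '@' or ':' survive.
    auth = f"{quote(username, safe='')}:{quote(password, safe='')}"
    proxy_url = f"http://{auth}@{host}"
    # The key is the scheme of the *target* URL, not of the proxy itself.
    return {"http": proxy_url, "https": proxy_url}


def covers(url, proxies):
    """Simplified check: is there an entry for this URL's scheme (or "all")?"""
    scheme = urlsplit(url).scheme
    return scheme in proxies or "all" in proxies


# Placeholders taken from the question, not real values.
proxies1 = build_proxies("proxyaddress", "username", "password")

# The mapping from the question has no entry for https:// targets.
print(covers("https://bck.hermes.com/add-to-cart", {"http": "xxxxxxxxxxxxxxxxxx"}))
# The mapping built here does.
print(covers("https://bck.hermes.com/add-to-cart", proxies1))
```

The resulting proxies1 mapping can be passed as proxies=proxies1 in the requests.request call above.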

Alternatively you could try urllib3 instead of Requests.
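In the same spirit of trying a different HTTP client, here is a rough sketch using the standard library's urllib.request. Note that this is not urllib3 (a separate third-party package); it is swapped in only because it needs no installation. The function names, the trimmed header set, and the proxy URL with embedded credentials are assumptions for illustration, and there is no guarantee that this client avoids the captcha either. The block only builds the requests; calling main() is what actually sends them.

```python
import json
import urllib.request
from http.cookiejar import CookieJar

# Placeholder proxy from the question, with credentials embedded in the URL.
PROXY = "http://username:password@proxyaddress"

# Trimmed header set (an assumption), plus the User-Agent from the question.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36",
    "Origin": "https://www.hermes.com",
    "Referer": "https://www.hermes.com/",
    "Sec-Fetch-Site": "same-site",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Dest": "empty",
}


def build_opener():
    """Create an opener that uses the proxy for both schemes and keeps cookies."""
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": PROXY, "https": PROXY}),
        urllib.request.HTTPCookieProcessor(jar),
    )
    return opener, jar


def session_request():
    """Step 1: GET the customer session (no body, so the method is GET)."""
    return urllib.request.Request(
        "https://bck.hermes.com/customer-session?locale=de_de", headers=HEADERS
    )


def add_to_cart_request(sku):
    """Step 2: POST the same compact JSON body the PowerShell script sends."""
    body = {"locale": "de_de", "items": [{"category": "direct", "sku": sku}]}
    return urllib.request.Request(
        "https://bck.hermes.com/add-to-cart",
        data=json.dumps(body, separators=(",", ":")).encode("utf-8"),
        headers={**HEADERS, "Content-Type": "application/json"},
        method="POST",
    )


def main():
    opener, jar = build_opener()
    # The cookie jar carries ECOM_SESS from step 1 into step 2 automatically.
    # opener.open raises urllib.error.HTTPError on 4xx/5xx responses.
    with opener.open(session_request(), timeout=30) as resp:
        print(resp.status, [cookie.name for cookie in jar])
    with opener.open(add_to_cart_request("H079082CCAC"), timeout=30) as resp:
        print(resp.status, resp.read()[:200])
```

Sharing one opener and cookie jar across both steps mirrors what -SessionVariable and -WebSession do in PowerShell.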

Here's your PowerShell script simplified too, just as an exercise.

$secPasswd=ConvertTo-SecureString "password" -AsPlainText -Force
$myCreds=New-Object System.Management.Automation.PSCredential -ArgumentList "username",$secPasswd
$headers = @{
"sec-ch-ua"='" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"'
  "DNT"="1"
  "sec-ch-ua-mobile"="?0"
  "sec-ch-ua-platform"="`"Windows`""
  "Origin"="https://www.hermes.com"
  "Sec-Fetch-Site"="same-site"
  "Sec-Fetch-Mode"="cors"
  "Sec-Fetch-Dest"="empty"
  "Referer"="https://www.hermes.com/"
}
Invoke-WebRequest -UseBasicParsing -Uri "https://bck.hermes.com/customer-session?locale=de_de" `
-Proxy 'http://proxyaddress' `
-ProxyCredential $mycreds `
-SessionVariable session `
-Headers $headers
Invoke-WebRequest -UseBasicParsing -Uri "https://bck.hermes.com/add-to-cart" `
-Proxy 'http://proxyaddress' `
-ProxyCredential $mycreds `
-Method POST `
-WebSession $session `
-Headers $headers `
-ContentType "application/json" `
-Body "{`"locale`":`"de_de`",`"items`":[{`"category`":`"direct`",`"sku`":`"H079082CCAC`"}]}"
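Since the PowerShell and Python requests are meant to be identical, it may also help to diff the two header sets mechanically instead of by eye. The sketch below is a hypothetical helper (diff_headers is not part of any library); the two dictionaries are transcribed from Step 1 of the question, with the PowerShell session's UserAgent included as a User-Agent header. On these values it reports a single difference, Sec-Fetch-Dest. Step 1 already succeeds in Python, so this particular mismatch may well be unrelated to the captcha.

```python
def normalize(headers):
    """Lower-case header names so the comparison is case-insensitive."""
    return {name.lower(): value for name, value in headers.items()}


def diff_headers(expected, actual):
    """Return (missing, extra, changed) for actual relative to expected."""
    exp, act = normalize(expected), normalize(actual)
    missing = sorted(exp.keys() - act.keys())
    extra = sorted(act.keys() - exp.keys())
    changed = {
        name: (exp[name], act[name])
        for name in sorted(exp.keys() & act.keys())
        if exp[name] != act[name]
    }
    return missing, extra, changed


UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36")
SEC_CH_UA = '" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"'
LANG = "en-US,en;q=0.9,ja;q=0.8,zh-CN;q=0.7,zh-TW;q=0.6,zh;q=0.5"

# Step 1 headers as sent by the PowerShell script in the question.
powershell_step1 = {
    "User-Agent": UA,
    "sec-ch-ua": SEC_CH_UA,
    "Accept": "application/json, text/plain, */*",
    "Cache-Control": "no-cache",
    "DNT": "1",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Windows"',
    "Origin": "https://www.hermes.com",
    "Sec-Fetch-Site": "same-site",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Dest": "empty",
    "Referer": "https://www.hermes.com/",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": LANG,
}

# headers1 from the Python code in the question.
python_headers1 = {
    "User-Agent": UA,
    "sec-ch-ua": SEC_CH_UA,
    "Accept": "application/json, text/plain, */*",
    "Cache-Control": "no-cache",
    "DNT": "1",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Windows"',
    "Origin": "https://www.hermes.com",
    "Sec-Fetch-Site": "same-site",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Dest": "document",
    "Referer": "https://www.hermes.com/",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": LANG,
}

missing, extra, changed = diff_headers(powershell_step1, python_headers1)
print("missing:", missing)
print("extra:", extra)
print("changed:", changed)
```

The same helper can be pointed at the Step 2 headers, or at the trimmed $headers table above, to see which headers a simplified request leaves out.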