Unable to get the correct HTML response from the "Myntra" website using requests.get()

Posted on 2025-02-09 15:02:48

I have run into a problem while scraping the Myntra website. I am trying to scrape prices and availability. With requests.get(), I get the real page content when I run the code on my local machine, but not when I run it in Google Colab. The response status is <200> in both cases, yet in Colab the body is a "Site Maintenance" HTML page, while locally everything works fine. I can't work out what is going on, and I would be grateful for any help.
Myntra product link

My Code:

import requests

s = requests.Session()
url = "https://www.myntra.com/jeans/levis/levis-512-men-black-slim-tapered-fit-mid-rise-clean-look-light-fade-stretchable-jeans/16612780/buy?utm_campaign=_3_&utm_medium=affiliate&utm_source=grabon"
page = s.get(url)  # no custom headers are set on this request
page.content

Output:

b'<!doctype html> <html> <head>     <title>Site Maintenance</title>     <style type="text/css">body { text-align: center; padding: 150px; }h1 { font-size: 40px; }body { font: 16px Helvetica, sans-serif; color: #333; }#error { display: block; text-align: left; width: 650px; margin: 0 auto; }</style> </head> <body>     <div id="error">     <h1>Oops! Something went wrong</h1>     <div>         <hr>         <p>Please contact your administrator</p>     </div>     </div> </body> </html>'

Locally, the same code returns the correct content.
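
Since the status code is 200 in both environments, the status alone cannot tell the two responses apart; the body has to be inspected. A minimal sketch of such a check follows. The helper name looks_like_maintenance_page is hypothetical, and matching on the <title> text is an assumption based only on the output shown above; the site could change that page at any time.

```python
import re

# Assumption: the blocked response is recognised by its <title>, copied from
# the output shown in the question. This is not a documented marker.
MAINTENANCE_TITLE = re.compile(
    rb"<title>\s*Site Maintenance\s*</title>", re.IGNORECASE
)


def looks_like_maintenance_page(body: bytes) -> bool:
    """Return True if the body matches the maintenance page from the question."""
    return MAINTENANCE_TITLE.search(body) is not None


# Shortened copy of the page from the question, plus a made-up normal page.
blocked = (
    b"<!doctype html> <html> <head>     <title>Site Maintenance</title> "
    b"</head> <body></body> </html>"
)
normal = b"<html><head><title>Some product page</title></head></html>"

print(looks_like_maintenance_page(blocked))
print(looks_like_maintenance_page(normal))
```

With the code from the question, the check would be called as looks_like_maintenance_page(page.content) right after the request.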

Comments (1)

九八野马 2025-02-16 15:02:48

This website uses Akamai Bot Manager for anti-bot protection.
How to get past Akamai Bot Manager?
In this case, simply adding a User-Agent header is enough. The User-Agent header makes the request look like it comes from a real browser.
The site also needs JavaScript to render all of its content, but the HTML contains a script tag holding a single JSON object with all the information you need.

Here is some working code:

import json

import requests
from bs4 import BeautifulSoup

# A browser-like User-Agent header, sent with the request.
headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36',
}
src = requests.get(
    'https://www.myntra.com/jeans/levis/levis-512-men-black-slim-tapered-fit-mid-rise-clean-look-light-fade-stretchable-jeans/16612780/buy?utm_campaign=_3_&utm_medium=affiliate&utm_source=grabon',
    headers=headers)
soup = BeautifulSoup(src.content, 'html.parser')

# The product data is embedded as JSON inside one of the <script> tags.
for script in soup.find_all('script'):
    if "price" in script.text:
        # Store the result under a name other than `json`, so the module is not shadowed.
        data = json.loads(script.text.replace("\t", "").replace("\n", ""))
        # f-strings also work if a value such as the price is a number rather than a string.
        print(f"Name: {data['name']}")
        print(f"Sku: {data['sku']}")
        print(f"Description: {data['description']}")
        print(f"Stock: {data['offers']['availability']}")
        print(f"Price: {data['offers']['price']}")
        break

Output:

Name: Levis 512 Men Black Slim Tapered Fit Mid-Rise Clean Look Light Fade Stretchable Jeans
Sku: 16612780
Description: Levis 512 Men Black Slim Tapered Fit Mid-Rise Clean Look Light Fade Stretchable Jeans
Stock: InStock
Price: 3329
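
The approach in this answer (look through the script tags for the one that holds the product JSON) can also be written more defensively, so that a script that merely mentions "price" in ordinary JavaScript does not crash the parser. The sketch below is an illustration under stated assumptions, not the answerer's method: it swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages, the names ScriptCollector and extract_product are hypothetical, and the sample HTML is made up for the example rather than taken from Myntra.

```python
import json
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


def extract_product(html: str):
    """Return the first script body that parses as a JSON object with an
    "offers" key, or None if no script qualifies."""
    collector = ScriptCollector()
    collector.feed(html)
    for text in collector.scripts:
        if "price" not in text:
            continue
        try:
            # Same tab/newline stripping as in the answer's code.
            data = json.loads(text.replace("\t", "").replace("\n", ""))
        except json.JSONDecodeError:
            continue  # e.g. ordinary JavaScript that mentions "price"
        if isinstance(data, dict) and "offers" in data:
            return data
    return None


# Made-up sample page: one plain script and one script holding JSON.
sample_html = """
<html><head>
<script>var price = 1;</script>
<script>
{"name": "Example Jeans", "sku": "0000",
 "offers": {"availability": "InStock", "price": "100"}}
</script>
</head></html>
"""

product = extract_product(sample_html)
print(product["name"], product["offers"]["price"])
```

Returning None instead of raising lets the caller decide what to do when the page has no usable JSON, for example when the maintenance page came back instead of the product page.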