Scraping information from a static webpage using the requests module

Posted on 2025-02-07 10:55:41

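I'm trying to fetch a product's title and its description from a webpage using the requests module. The title and description appear to be static, since both are present in the page source. However, my attempt below fails to grab them: the script currently throws an AttributeError.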

import requests
from bs4 import BeautifulSoup

link = 'https://www.nordstrom.com/s/anine-bing-womens-plaid-shirt/6638030'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36',
}

with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(link)
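    soup = BeautifulSoup(res.text, "lxml")
    # select_one() returns None when the selector matches nothing,
    # so calling .text on that None is what raises the AttributeError
    product_title = soup.select_one("h1[itemProp='name']").text
    product_desc = soup.select_one("#product-page-selling-statement").text
    print(product_title, product_desc)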

How can I scrape the title and description from the page above using the requests module?
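The AttributeError comes from calling .text on the result of select_one(), which returns None when the selector matches nothing (for example, when the server sends back a different page than the one the browser shows). A minimal sketch of a guard, using hypothetical helper names select_text and extract_product, the same selectors as the script above (attribute name written in lowercase, since HTML parsers normalize it), and Python's built-in html.parser backend instead of lxml:

```python
from bs4 import BeautifulSoup


def select_text(soup, selector):
    """Return the stripped text of the first match for selector, or None.

    Checking for None first avoids the AttributeError that
    soup.select_one(selector).text raises when nothing matches.
    """
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


def extract_product(html):
    """Hypothetical helper: pull (title, description) out of a product page."""
    soup = BeautifulSoup(html, "html.parser")
    title = select_text(soup, "h1[itemprop='name']")
    desc = select_text(soup, "#product-page-selling-statement")
    return title, desc
```

If extract_product(res.text) returns (None, None), the HTML that requests received simply does not contain those elements, which points at the response itself rather than at the selectors.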



好多鱼好多余 2025-02-14 10:55:41

The page is dynamic. Go after the data from the API source instead:

import requests
import pandas as pd

api = 'https://www.nordstrom.com/api/ng-looks/styleId/6638030?customerId=f36cf526cfe94a72bfb710e5e155f9ba&limit=7'
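res = requests.get(api, timeout=10)
res.raise_for_status()  # surface HTTP errors before trying to decode JSON
jsonData = res.json()

# jsonData['products'] is a dict of product records; flatten each into a row
df = pd.json_normalize(list(jsonData['products'].values()))

print(df.iloc[0])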

Output:

id                                                       6638030-400
name                                  ANINE BING Women's Plaid Shirt
styleId                                                      6638030
styleNumber                                                         
colorCode                                                        400
colorName                                                       BLUE
brandLabelName                                            ANINE BING
hasFlatShot                                                     True
imageUrl           https://n.nordstrommedia.com/id/sr3/6d000f40-8...
price                                                        $149.00
pathAlias          anine-bing-womens-plaid-shirt/6638030?origin=c...
originalPrice                                                $149.00
productTypeLvl1                                                   12
productTypeLvl2                                                  216
isUmap                                                         False
Name: 0, dtype: object
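If only a few fields are needed, pandas is not strictly required. A minimal stdlib sketch, assuming the payload shape implied by the code above: jsonData['products'] is a dict whose values are product records with keys such as id, name and price (as in the output above). The function name summarize_products and its default field list are hypothetical.

```python
def summarize_products(payload, fields=("id", "name", "price")):
    """Pick a few fields from every product record in the API payload.

    Assumes payload["products"] maps some key to a product dict; missing
    fields come back as None instead of raising KeyError.
    """
    products = payload.get("products", {})
    return [{field: product.get(field) for field in fields}
            for product in products.values()]
```

Usage, under the same assumptions: summarize_products(requests.get(api, timeout=10).json()) returns one small dict per product, which is easy to print or write to CSV.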



雪落纷纷 2025-02-14 10:55:41

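When testing requests like this, you should output the response to see what you are actually getting back. It is better to use something like Postman (I think VS Code now has similar functionality) to set up the URL, headers, method, and parameters; it also shows the full response, including headers. Once you have everything working, just convert it into Python code. Postman even has an "export to code" feature for common languages.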
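A minimal plain-Python sketch of "outputting the response": describe_response is a hypothetical helper that works on any object with status_code, headers and text attributes, such as a requests.Response. The optional marker is a string you expect to find in the body; using the description's id from the question is just an illustrative choice.

```python
def describe_response(resp, marker=None):
    """Summarize what the server actually sent back.

    resp: any object with .status_code, .headers and .text
          (for example a requests.Response).
    marker: optional string expected somewhere in the body.
    """
    lines = [
        f"status: {resp.status_code}",
        f"content-type: {resp.headers.get('Content-Type', '?')}",
        f"body length: {len(resp.text)}",
    ]
    if marker is not None:
        # A False here means the element you want is not in this response at all
        lines.append(f"contains {marker!r}: {marker in resp.text}")
    # First 200 characters, newlines flattened, to eyeball the page
    lines.append("body preview: " + resp.text[:200].replace("\n", " "))
    return "\n".join(lines)
```

For example, print(describe_response(s.get(link), marker="product-page-selling-statement")) inside the question's session block shows whether the server returned the product page at all.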

Anyway...

I tried your request in Postman and got this response: