Scraping multiple pages with JSON

Posted on 2025-02-13 03:15:03

I am trying to scrape multiple pages of JSON results, but the script fails with an error:

    import requests
    import json
    import pandas as pd
    headers = {
        'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8,pt;q=0.7',
        'Connection': 'keep-alive',
        'Origin': 'https://www.nationalhardwareshow.com',
        'Referer': 'https://www.nationalhardwareshow.com/',
        'Sec-Fetch-Dest': 'empty',
        'Sec-Fetch-Mode': 'cors',
        'Sec-Fetch-Site': 'cross-site',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
        'accept': 'application/json',
        'content-type': 'application/x-www-form-urlencoded',
        'sec-ch-ua': '".Not/A)Brand";v="99", "Google Chrome";v="103", "Chromium";v="103"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Windows"',
    }
    
    params = {
        'x-algolia-agent': 'Algolia for vanilla JavaScript 3.27.1',
        'x-algolia-application-id': 'XD0U5M6Y4R',
        'x-algolia-api-key': 'd5cd7d4ec26134ff4a34d736a7f9ad47',
    }
    for i in range(0,4):
        data = '{"params":"query=&page={i}&facetFilters=&optionalFilters=%5B%5D"}'
    
        resp = requests.post('https://xd0u5m6y4r-dsn.algolia.net/1/indexes/event-edition-eve-e6b1ae25-5b9f-457b-83b3-335667332366_en-us/query', params=params, headers=headers, data=data).json()
    
        req_json=resp
        df = pd.DataFrame(req_json['hits'])
        f = pd.DataFrame(df[['name','representedBrands','description']])
        print(f)

The error:

    Traceback (most recent call last):
      File "e:\ScriptScraping\Extract data from json\uk.py", line 31, in <module>
        df = pd.DataFrame(req_json['hits'])
    KeyError: 'hits'

Comments (2)

难忘№最初的完美 2025-02-20 03:15:03

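Try concatenating the variable i into the data string. In a plain string literal, {i} is not replaced by the loop value; the text {i} itself is sent as the page number.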

import requests
import json
import pandas as pd
headers = {
    'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8,pt;q=0.7',
    'Connection': 'keep-alive',
    'Origin': 'https://www.nationalhardwareshow.com',
    'Referer': 'https://www.nationalhardwareshow.com/',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'cross-site',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    'accept': 'application/json',
    'content-type': 'application/x-www-form-urlencoded',
    'sec-ch-ua': '".Not/A)Brand";v="99", "Google Chrome";v="103", "Chromium";v="103"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"'
    }
    
params = {
    'x-algolia-agent': 'Algolia for vanilla JavaScript 3.27.1',
    'x-algolia-application-id': 'XD0U5M6Y4R',
    'x-algolia-api-key': 'd5cd7d4ec26134ff4a34d736a7f9ad47'
    }
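# Collect one DataFrame per page, then combine them after the loop
frames = []
for i in range(0, 4):
    # Concatenate the page number into the body; '{i}' in a plain string would be sent literally
    data = '{"params":"query=&page=' + str(i) + '&facetFilters=&optionalFilters=%5B%5D"}'

    resp = requests.post('https://xd0u5m6y4r-dsn.algolia.net/1/indexes/event-edition-eve-e6b1ae25-5b9f-457b-83b3-335667332366_en-us/query', params=params, headers=headers, data=data).json()

    df = pd.DataFrame(resp['hits'])
    # Keep only the columns of interest for this page
    frames.append(df[['name', 'representedBrands', 'description']])

# ignore_index=True gives the combined frame one continuous row index
result = pd.concat(frames, ignore_index=True)
print(result)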
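
As an alternative to gluing the string together by hand, the body can be built with the standard library. This is only a sketch: build_payload is a hypothetical helper name, not something from the original post, and it assumes the endpoint accepts exactly the same four fields as the string above.

```python
import json
from urllib.parse import urlencode


def build_payload(page: int) -> str:
    """Return the JSON request body for one results page (hypothetical helper)."""
    # urlencode percent-encodes the values, so "[]" becomes "%5B%5D"
    query = urlencode({
        "query": "",
        "page": page,
        "facetFilters": "",
        "optionalFilters": "[]",
    })
    # json.dumps handles the quoting, so no braces or quotes are typed by hand
    return json.dumps({"params": query})


for page in range(4):
    print(build_payload(page))
```

The returned string could then be passed as data=build_payload(i) inside the loop, in place of the concatenated string.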

初熏 2025-02-20 03:15:03

The request returns status code 400 (Bad Request) because the body is wrongly formatted: {i} inside a plain string is sent literally instead of being replaced by the page number, so the response has no 'hits' key. Change:

    data = '{"params":"query=&page={i}&facetFilters=&optionalFilters=%5B%5D"}'

to

    data = '{"params":"query=&page='+str(i)+'&facetFilters=&optionalFilters=%5B%5D"}'

and it will work. Hope this helps.
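
To make the difference concrete, here is a small sketch that runs without any network access. It compares the plain string with an f-string version (where the literal JSON braces have to be doubled), and adds a guard so a failed request is reported with its status code rather than as a KeyError. extract_hits is a hypothetical helper, and the error body passed to it at the end is a made-up placeholder, not the API's actual response.

```python
def extract_hits(status_code: int, body: dict) -> list:
    """Return body['hits'], or raise a readable error for a failed request."""
    if status_code != 200 or "hits" not in body:
        raise RuntimeError(f"request failed with status {status_code}: {body}")
    return body["hits"]


i = 2

# Plain string: Python does no substitution, so '{i}' is sent as-is
plain = '{"params":"query=&page={i}&facetFilters=&optionalFilters=%5B%5D"}'

# f-string: {i} is substituted; the JSON's own braces are escaped as {{ and }}
formatted = f'{{"params":"query=&page={i}&facetFilters=&optionalFilters=%5B%5D"}}'

print(plain)
print(formatted)

# Placeholder error body (assumption) to show what the guard reports
try:
    extract_hits(400, {"message": "placeholder error text"})
except RuntimeError as exc:
    print(exc)
```

With requests, the guard would be called as extract_hits(resp.status_code, resp.json()) on the response object, before building the DataFrame.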
