Scraping contact numbers and details from a website

Posted on 2025-02-05 07:27:34

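I want to scrape the contact numbers of the courier services listed on the website below, together with each service's other details (name, address and rating). I am not able to get these details for all of the courier services. From my analysis, the data is inside a script tag. Please suggest a fix.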

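import requests
import pandas as pd
import json
import csv
from lxml import html
import re

# Browser-like request headers. 'authority' is an HTTP/2 pseudo-header, not a
# real request header, so it is dropped. 'br' is left out of accept-encoding:
# requests can only decode Brotli responses when the brotli package is
# installed, otherwise prodresp.text comes back as unreadable bytes.
headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-encoding': 'gzip, deflate',
    'accept-language': 'en-US,en;q=0.9',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36',
}

produrl = 'https://www.justdial.com/Mumbai/Courier-Services-in-Mumbai-Bazar-Nalasopara-East/nct-10142628'
prodresp = requests.get(produrl, headers=headers, timeout=30)
prodresp.raise_for_status()  # fail loudly on 403/404 instead of parsing an error page
prodResphtml = html.fromstring(prodresp.text)
# script[9] depends on the exact position of the tag inside <head>;
# it breaks as soon as the site adds or removes a script.
partjson = prodResphtml.xpath('/html/head/script[9]/text()')
print(partjson)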
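A less brittle way to read data embedded in script tags is to select scripts by content instead of by position: collect every script body and keep only those that parse as JSON. This is a minimal sketch under assumptions: it presumes the listing data, if it is embedded in the HTML at all, is a pure JSON payload (for example a type="application/ld+json" block). Scripts that assign an object to a JavaScript variable are skipped and would need the object sliced out first. The names ScriptCollector and json_from_scripts are illustrative. It uses the standard library's html.parser to stay dependency-free; with lxml the equivalent selection is the XPath //script/text().

```python
import json
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the raw text of every <script> element on a page."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == 'script':
            self._in_script = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == 'script' and self._in_script:
            self.scripts.append(''.join(self._buf))
            self._in_script = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so buffer them.
        if self._in_script:
            self._buf.append(data)


def json_from_scripts(page_html):
    """Return the parsed JSON objects/arrays found in <script> bodies."""
    parser = ScriptCollector()
    parser.feed(page_html)
    parser.close()
    found = []
    for text in parser.scripts:
        text = text.strip()
        if not text:
            continue  # e.g. <script src="..."></script>
        try:
            obj = json.loads(text)
        except json.JSONDecodeError:
            continue  # ordinary JavaScript, not a pure JSON payload
        if isinstance(obj, (dict, list)):
            found.append(obj)
    return found
```

With the question's code, json_from_scripts(prodresp.text) returns whatever JSON blocks the page actually contains; an empty list suggests the listings are not in the initial HTML.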




Comments (1)

顾忌 2025-02-12 07:27:34

That data comes from an AJAX API call to this URL:

https://www.justdial.com/api/india_api_write/20march2020/searchziva.php?city=Mumbai&area=Mumbai-Bazar-Nalasopara-East&lat=&long=&darea_flg=0&case=spcall&stype=category_list&search=Courier-Services&national_catid=10142628&nextdocid=&attribute_values=&basedon=&sortby=&nearme=0&max=100&pg_no=1
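A minimal sketch of calling that endpoint directly, using only the standard library. The query parameters are copied from the URL above, and only pg_no changes from page to page. Assumptions, stated loudly: the endpoint is undocumented and may change or start requiring cookies; it is assumed to return JSON; and the key that holds the listings is not known here, so extract_items is a caller-supplied function and 'results' in the usage comment is a hypothetical placeholder to replace after inspecting one real response. The helper names build_url, fetch_json and collect_pages are illustrative.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as observed in the answer above; undocumented, may change.
API_URL = 'https://www.justdial.com/api/india_api_write/20march2020/searchziva.php'

# No accept-encoding header: urllib does not decompress responses itself.
HEADERS = {
    'accept': 'application/json, text/plain, */*',
    'user-agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                   '(KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36'),
}


def build_url(city, area, search, national_catid, page=1, per_page=100):
    """Rebuild the AJAX URL from the answer, varying the page number."""
    params = {
        'city': city, 'area': area, 'lat': '', 'long': '', 'darea_flg': '0',
        'case': 'spcall', 'stype': 'category_list', 'search': search,
        'national_catid': str(national_catid), 'nextdocid': '',
        'attribute_values': '', 'basedon': '', 'sortby': '', 'nearme': '0',
        'max': str(per_page), 'pg_no': str(page),
    }
    return API_URL + '?' + urllib.parse.urlencode(params)


def fetch_json(url, timeout=30):
    """GET a URL with browser-like headers and decode the JSON body."""
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode('utf-8'))


def collect_pages(fetch_page, extract_items, max_pages=10):
    """Call fetch_page(1), fetch_page(2), ... until a page yields no items."""
    items = []
    for page in range(1, max_pages + 1):
        batch = extract_items(fetch_page(page))
        if not batch:
            break  # an empty page is taken to mean "no more results"
        items.extend(batch)
    return items


# Usage (performs network requests; 'results' is a hypothetical key):
# rows = collect_pages(
#     lambda p: fetch_json(build_url('Mumbai', 'Mumbai-Bazar-Nalasopara-East',
#                                    'Courier-Services', 10142628, page=p)),
#     lambda data: data.get('results', []))
```

Passing fetch_page and extract_items in as functions keeps the paging loop testable offline and makes it easy to adapt once the real response shape (where the name, address, rating and phone fields live) is known; the collected rows can then go into pandas.DataFrame or csv as in the question.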