
How do I scrape Google Maps in Python without using Selenium or any API?

Posted on 2025-02-11 17:15:22


不即不离 2025-02-18 17:15:22

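Using just requests and bs4 is hard but possible. I'm not entirely sure what information you are trying to parse, but this should help you: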

import json
import re

import requests
from bs4 import BeautifulSoup  # the 'lxml' parser used below needs the lxml package installed

# a desktop browser User-Agent, so the request looks like a regular browser visit
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

# works with different countries, languages
params = {
    "q": "mcdonalds",
    "gl": "jp",  # country
    "hl": "ja",  # language (Japanese)
}

response = requests.get("https://www.google.com/search", headers=headers, params=params, timeout=30)
soup = BeautifulSoup(response.text, 'lxml')

local_results = []

# each '.VkpGBb' element is one place in the local results block;
# these class names are generated by Google and may change without notice
for result in soup.select('.VkpGBb'):
    title = result.select_one('.dbg0pd span').text

    # select_one() returns None when nothing matches, so subscripting it raises TypeError
    try:
        website = result.select_one('.yYlJEf.L48Cpd')['href']
    except (TypeError, KeyError):
        website = None

    try:
        directions = f"https://www.google.com{result.select_one('.yYlJEf.VByer')['data-url']}"
    except (TypeError, KeyError):
        directions = None

    address_not_fixed = result.select_one('.lqhpac div').text
    # removes phone number from "address_not_fixed" variable
    # https://regex101.com/r/cwLdY8/1
    address = re.sub(r' · ?.*', '', address_not_fixed)
    phone = ''.join(re.findall(r' · ?(.*)', address_not_fixed))

    # attribute access on None raises AttributeError when '.dXnVAb' is missing
    try:
        hours = str(result.select_one('.dXnVAb').previous_element)
    except AttributeError:
        hours = None

    try:
        options = result.select_one('.dXnVAb').text.split('·')
    except AttributeError:
        options = None

    local_results.append({
        'title': title,
        'phone': phone,
        'address': address,
        'hours': hours,
        'options': options,
        'website': website,
        'directions': directions,
    })

print(json.dumps(local_results, indent=2, ensure_ascii=False))
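
As a side note, the address/phone split is the one step of the loop that can be tried without sending any request. Here is a minimal sketch that wraps the same two regular expressions in a helper; the name `split_address_phone` and the sample strings are my own assumptions for illustration, not part of the scraper:

```python
import re


def split_address_phone(text):
    """Split 'address · phone' text into (address, phone).

    Hypothetical helper: it reuses the two patterns from the loop above.
    The phone part is an empty string when there is no ' · ' separator.
    """
    # everything from the first ' · ' onward is dropped to get the address
    address = re.sub(r' · ?.*', '', text)
    # whatever follows the ' · ' separator is treated as the phone number
    phone = ''.join(re.findall(r' · ?(.*)', text))
    return address, phone


# made-up sample strings shaped like the '.lqhpac div' text
print(split_address_phone("Coffeyville, KS · (620) 251-3330"))
print(split_address_phone("Coffeyville, KS"))
```

Keeping the split in a small function like this makes it easy to adjust the pattern if Google changes the separator, without re-running the whole scraper.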

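Here is the output that you will get back, hopefully this helps!

# English results ("hl": "en", "gl": "us"), one entry from the printed list:
  {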
    "title": "McDonald's",
    "phone": "(620) 251-3330",
    "address": "Coffeyville, KS",
    "hours": " ⋅ Opens 5AM",
    "options": [
      "Curbside pickup",
      "Delivery"
    ],
    "website": "https://www.mcdonalds.com/us/en-us/location/KS/COFFEYVILLE/302-W-11TH/4581.html?cid=RF:YXT:GMB::Clicks",
    "directions": "https://www.google.com/maps/dir//McDonald's,+302+W+11th+St,+Coffeyville,+KS+67337/data=!4m6!4m5!1m1!4e2!1m2!1m1!1s0x87b784f6803e4c81:0xf5af9c9c89f19918?sa=X&hl=en&gl=us"
  }
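
The "hours" value above still carries the leading " ⋅ " separator, and the option strings can carry stray spaces after the split. If that gets in the way, a small post-processing step along these lines might help. This is only a sketch: `clean_record` is a hypothetical helper name, and it assumes the dictionary keys shown in the output.

```python
def clean_record(record):
    """Return a copy of one result dict with separators and padding stripped.

    Hypothetical helper; assumes the 'hours' and 'options' keys from the output above.
    """
    cleaned = dict(record)

    hours = cleaned.get("hours")
    if hours is not None:
        # strip spaces plus the dot separators U+22C5 and U+00B7 from both ends
        cleaned["hours"] = str(hours).strip(" \u22c5\u00b7") or None

    options = cleaned.get("options")
    if options is not None:
        # trim each option and drop entries that end up empty
        cleaned["options"] = [o.strip() for o in options if o.strip()]

    return cleaned


# made-up sample shaped like one scraped entry
sample = {"title": "McDonald's", "hours": " \u22c5 Opens 5AM", "options": ["Curbside pickup ", " Delivery"]}
print(clean_record(sample))
```

You would call it on each entry before printing, for example `[clean_record(r) for r in local_results]`.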



About the author: 維他命╮
