How can I scrape Google Maps in Python without using Selenium or any API?

Posted on 2025-02-11 17:15:22


Comments (1)

不即不离 2025-02-18 17:15:22

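Doing this with only requests and bs4 is hard but possible. I'm not entirely sure what information you are trying to parse, but this should help: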

import json
import re

import requests
from bs4 import BeautifulSoup  # the 'lxml' parser used below must be installed (pip install lxml)

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

# "gl" sets the country and "hl" the interface language, so other locales work too
params = {
    "q": "mcdonalds",
    "gl": "jp",  # country: Japan
    "hl": "ja",  # language: Japanese
}

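response = requests.get("https://www.google.com/search", headers=headers, params=params, timeout=30)
response.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
soup = BeautifulSoup(response.text, 'lxml')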

local_results = []

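for result in soup.select('.VkpGBb'):
    # select_one() returns None when nothing matches, so check before reading from a tag
    title_tag = result.select_one('.dbg0pd span')
    title = title_tag.text if title_tag is not None else None

    website_tag = result.select_one('.yYlJEf.L48Cpd')
    website = website_tag.get('href') if website_tag is not None else None

    directions_tag = result.select_one('.yYlJEf.VByer')
    if directions_tag is not None and directions_tag.has_attr('data-url'):
        directions = f"https://www.google.com{directions_tag['data-url']}"
    else:
        directions = None

    address_tag = result.select_one('.lqhpac div')
    address_not_fixed = address_tag.text if address_tag is not None else ''
    # the text holds "address · phone": keep the part before " · " as the address
    # and the part after it as the phone number
    # https://regex101.com/r/cwLdY8/1
    address = re.sub(r' · ?.*', '', address_not_fixed)
    phone = ''.join(re.findall(r' · ?(.*)', address_not_fixed))

    # '.dXnVAb' holds the service options; the node right before it holds the hours
    options_tag = result.select_one('.dXnVAb')
    hours = str(options_tag.previous_element) if options_tag is not None else None
    options = options_tag.text.split('·') if options_tag is not None else None

    local_results.append({
        'title': title,
        'phone': phone,
        'address': address,
        'hours': hours,
        'options': options,
        'website': website,
        'directions': directions,
    })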

print(json.dumps(local_results, indent=2, ensure_ascii=False))

Here is the output you will get back; hopefully this helps!

# English results, i.e. from a run with "gl": "us" and "hl": "en":
   {
    "title": "McDonald's",
    "phone": "(620) 251-3330",
    "address": "Coffeyville, KS",
    "hours": " ⋅ Opens 5AM",
    "options": [
      "Curbside pickup",
      "Delivery"
    ],
    "website": "https://www.mcdonalds.com/us/en-us/location/KS/COFFEYVILLE/302-W-11TH/4581.html?cid=RF:YXT:GMB::Clicks",
    "directions": "https://www.google.com/maps/dir//McDonald's,+302+W+11th+St,+Coffeyville,+KS+67337/data=!4m6!4m5!1m1!4e2!1m2!1m1!1s0x87b784f6803e4c81:0xf5af9c9c89f19918?sa=X&hl=en&gl=us"
  }
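
The address and phone number come out of a single text node, and the two regular expressions in the script separate them. The sketch below isolates that step so it can be tried without sending a request. The helper name `split_address_phone` and the sample string are assumptions made for illustration: the string is pieced together from the sample output above, not captured from a live page.

```python
import re


def split_address_phone(raw):
    """Split text shaped like 'address · phone' into (address, phone).

    Hypothetical helper: it reuses the two patterns from the script above.
    """
    # Drop everything from the " ·" separator onwards to keep only the address.
    address = re.sub(r' · ?.*', '', raw)
    # Capture whatever follows the separator; the result is empty if there is none.
    phone = ''.join(re.findall(r' · ?(.*)', raw))
    return address, phone


# Assumed raw text, rebuilt from the sample output rather than scraped.
sample = "Coffeyville, KS · (620) 251-3330"
address, phone = split_address_phone(sample)
print(address)
print(phone)
```

When the text contains no " · " separator, the address is returned unchanged and the phone is an empty string, which matches how the script behaves for listings without a phone number.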
