Retrieving data from HTML at a specific child path using Python

Posted on 2025-01-25 11:45:47

I'm trying to get the city's email address from http://www.comuni-italiani.it/110/index.html

Using XPath Finder, I found the specific child path, which is /html/body/span[3]/table[2]/tbody/tr[1]/td[2]/table/tbody/tr[11]/td/b/a. Now I'm trying to retrieve the email from this page, but I know very little about the BeautifulSoup library (I'm just getting started). After reading several guides I managed to write the following code, but I'm not succeeding in specifying the child path correctly:

from bs4 import BeautifulSoup
import requests
  
# sample web page
sample_web_page = 'http://www.comuni-italiani.it/110/index.html'
  
# call get method to request that page
page = requests.get(sample_web_page)
  
# with the help of beautifulSoup and html parser create soup
soup = BeautifulSoup(page.content, "html.parser")
child_soup = soup.find('span')
  
for i in child_soup.children:
    print("child :  ", i)

What am I doing wrong?

一个人的夜不怕黑 2025-02-01 11:45:47

Please find my attempt to solve your problem below. It starts the same way as your code, then uses a CSS selector to find the first link inside a <b> tag whose href starts with "mail" (the mailto: link) and prints the address after the colon.

from bs4 import BeautifulSoup
import requests
  
sample_web_page = 'http://www.comuni-italiani.it/110/index.html'
page = requests.get(sample_web_page)
soup = BeautifulSoup(page.content, "html.parser")
email = soup.select_one('b > a[href^="mail"]')['href']
print(email.split(':')[1])
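
Two likely reasons the original attempt did not reach the link: soup.find('span') returns only the first <span> on the page, while the XPath points at span[3]; and browser tools usually show <tbody> elements that the browser inserted itself, so they may be missing from the HTML that requests downloads. The sketch below shows the selector approach on a small, made-up HTML string rather than the real page, so SAMPLE_HTML and the extract_email helper are assumptions for illustration only. It also returns None when no link is found and uses urlsplit so that a trailing ?subject=... part would not end up in the address.

```python
from urllib.parse import unquote, urlsplit

from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the page's markup (not copied from the site).
SAMPLE_HTML = """
<html><body>
  <span>first span, no link</span>
  <span>
    <table>
      <tr><td><b><a href="http://example.org/">Website</a></b></td></tr>
      <tr><td><b><a href="mailto:info@example.org?subject=Hello">Email</a></b></td></tr>
    </table>
  </span>
</body></html>
"""


def extract_email(html):
    """Return the address of the first mailto: link inside a <b>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Match by what the link is (a mailto: href), not by its position in the tree.
    link = soup.select_one('b > a[href^="mailto:"]')
    if link is None:
        return None
    # urlsplit separates the scheme, the address (path) and any ?query part.
    return unquote(urlsplit(link["href"]).path)


print(extract_email(SAMPLE_HTML))
```

Selecting by attribute instead of by position should keep working if rows are added or reordered, although it still assumes the address is published as a mailto: link inside a <b> tag.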
