Retrieving data from HTML at a specific child path using Python

Posted 2025-01-25 11:45:47 | 751 characters | 2 views | 0 comments

I'm trying to get the city's email address from http://www.comuni-italiani.it/110/index.html

Using XPath Finder, I found the path to the specific element: /html/body/span[3]/table[2]/tbody/tr[1]/td[2]/table/tbody/tr[11]/td/b/a. Now I'm trying to retrieve the email from this page, but I know very little about the BeautifulSoup library (I'm just getting started). After reading several guides I managed to write the following code, but I haven't managed to specify the child path correctly:

from bs4 import BeautifulSoup
import requests
  
# sample web page
sample_web_page = 'http://www.comuni-italiani.it/110/index.html'
  
# call get method to request that page
page = requests.get(sample_web_page)
  
# with the help of beautifulSoup and html parser create soup
soup = BeautifulSoup(page.content, "html.parser")
child_soup = soup.find('span')
  
for i in child_soup.children:
    print("child :  ", i)

What am I doing wrong?

Comments (1)

一个人的夜不怕黑 2025-02-01 11:45:47

Here is my attempt at solving your problem. It starts the same way as your code, then uses a CSS selector to find the email link (an a element directly inside a b element whose href starts with "mail") and prints the address.

from bs4 import BeautifulSoup
import requests
  
sample_web_page = 'http://www.comuni-italiani.it/110/index.html'
page = requests.get(sample_web_page)
soup = BeautifulSoup(page.content, "html.parser")
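# first <a> that is a direct child of <b> and whose href starts with "mail"
email = soup.select_one('b > a[href^="mail"]')['href']
# drop the "mailto:" scheme and print only the address
print(email.split(':')[1])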
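
As for what goes wrong in the question's code: soup.find('span') returns only the first span on the page, while the XPath points at the third one. Browsers also commonly insert tbody elements into tables when building the DOM, so a path copied from a browser tool may not match the raw HTML that requests downloads. The sketch below is a slightly more defensive variant of the selector approach. It runs on a small inline sample (SAMPLE_HTML is a hypothetical stand-in, not the real page markup) and uses a helper named extract_email, which is an illustrative name rather than part of BeautifulSoup.

```python
from typing import Optional

from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<html><body>
<span>first span</span>
<span>second span</span>
<span>
  <table><tr><td>
    <table>
      <tr><td><b><a href="http://example.com/">Website</a></b></td></tr>
      <tr><td><b><a href="mailto:info@example.com">Email</a></b></td></tr>
    </table>
  </td></tr></table>
</span>
</body></html>
"""


def extract_email(html: str) -> Optional[str]:
    """Return the address of the first mailto: link inside a <b>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # The selector does not depend on span/table/tbody positions.
    link = soup.select_one('b > a[href^="mailto:"]')
    if link is None:
        # No matching link: avoid the TypeError that None['href'] would raise.
        return None
    # Drop the "mailto:" scheme, then any "?subject=..." query part.
    address = link["href"].split(":", 1)[1]
    return address.split("?", 1)[0]


email = extract_email(SAMPLE_HTML)
print(email)
```

With the real page, the same helper could be called as extract_email(page.content) after the requests.get call. Whether it finds a match depends on the live markup still containing such a link.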