Scraping scraped URLs (nested)
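In the first part of the scrape, I get the park names along with their details, including a link (URL) to each park's page. I want to get the phone numbers from the scraped URLs (links) and show everything together.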
from bs4 import BeautifulSoup
import requests
import re


def get_parknames():
    html_text = requests.get('http://www.jump-parks.com/en/trampoline-parks/usa/').text
    soup = BeautifulSoup(html_text, 'lxml')
    parks = soup.find_all('div', class_='grid__item')
    for park in parks:
        park_name = park.find('h3', class_='card__title').text
        state = park.find('span', class_='address__country_long')
        country = park.find('span', {'itemprop': 'addressCountry'}).text
        link = park.find('a', attrs={'href': re.compile("^https://")})
        # Request the URL string (not the <a> tag) and parse the response body.
        html_text2 = requests.get(link['href']).text
        soup2 = BeautifulSoup(html_text2, 'lxml')
        # Search the park page (soup2), not the listing page (soup).
        phones = soup2.find_all('div', class_='single-meta')
        phone_number = None
        for phone in phones:
            phone_number = phone.find('a', attrs={'href': re.compile("")})
        print(f'''
        Park Name: {park_name}
        State: {state}
        Country: {country}
        Link: {link['href']}
        Phone: {phone_number}
        ''')


if __name__ == '__main__':
    get_parknames()
Comments (1)
The data you see is loaded by JavaScript from a different URL. To get all pages, you can use the following example:
Prints:
EDIT: To get phone numbers:
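The phone-number step can be sketched roughly as follows. This is only an illustration, not the answer's original code: the helper names `extract_phone` and `get_phone` are hypothetical, and it assumes each park page exposes its number as an `<a href="tel:...">` link, which may not match the real markup.

```python
import re

import requests
from bs4 import BeautifulSoup


def extract_phone(html):
    """Return the first phone number found in a park page, or None.

    Assumption: the number is exposed as an <a href="tel:..."> link.
    """
    soup = BeautifulSoup(html, "html.parser")
    tel_link = soup.find("a", href=re.compile(r"^tel:"))
    if tel_link is None:
        return None
    # Drop the "tel:" scheme and surrounding whitespace.
    return tel_link["href"][len("tel:"):].strip()


def get_phone(park_url, session=None):
    """Download one park page and extract its phone number (hypothetical helper)."""
    http = session or requests
    response = http.get(park_url, timeout=10)
    response.raise_for_status()
    return extract_phone(response.text)
```

Inside the loop over parks, `get_phone(link['href'])` would then supply the value to print next to the park name. Keeping the parsing in `extract_phone` separate from the download makes it possible to check the selector against saved HTML without making any requests.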