Getting information from child tags
I'm trying to retrieve information from a website by web scraping. The information I need is in child tags, but I can't get it:
<div class="ergov3-txtannonce">
<div class="ergov3-h3"><span>
House
3
pièces,
74 m²
</span>
<cite>
New York (11111)
</cite>
</div>
</div>
<div class="ergov3-txtannonce">
<div class="ergov3-h3"><span>
Appartement
3
pièces,
64 m²
</span>
<cite>
Los Angeles (22222)
</cite>
</div>
</div>
<div class="ergov3-txtannonce">
<div class="ergov3-h3"><span>
House
4
pièces,
81 m²
</span>
<cite>
Chicago (33333)
</cite>
</div>
</div>
I'm trying to get the ad and the city. I tried:
#BeautifulSoup
from bs4 import BeautifulSoup
import requests
#to get: House 3 pièces, 74 m²
ad = [ad.get_text() for ad in soup.find_all("span", class_='ergov3-txtannonce')]
#to get cities
cities = [city.get_text() for city in soup.find_all("cite", class_='ergov3-txtannonce')]
My output:
[]
[]
Expected output:
["House 3 pièces, 74 m²", "Appartement 3 pièces, 64 m²", "House 4 pièces, 81 m²"]
["New York (11111)", "Los Angeles (22222)", "Chicago (33333)"]
Answer:
Assuming your soup contains the provided HTML, select the elements that hold your information and iterate over the ResultSet to scrape it. Avoid multiple lists; instead, try to scrape all the information in one go and save it in a more structured way.
Note: If the elements are not present in your soup, the website's content may be provided dynamically by JavaScript. That would be a good reason to ask a new question with exactly this focus.
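A minimal sketch of that approach, assuming BeautifulSoup 4 with the built-in html.parser and using the question's HTML (with every outer div closed) as a stand-in for the real page; the names `html`, `clean`, and `listings` are illustrative, not from the original answer. The class ergov3-txtannonce sits on the outer div, not on the span or cite tags, which is why `find_all("span", class_='ergov3-txtannonce')` and `find_all("cite", class_='ergov3-txtannonce')` return empty lists. The sketch therefore selects the container divs and reads the child tags from each one:

```python
from bs4 import BeautifulSoup

# Stand-in for the real page: the HTML from the question, with every
# outer <div> closed. With requests you might use requests.get(url).text.
html = """
<div class="ergov3-txtannonce">
<div class="ergov3-h3"><span>
House
3
pièces,
74 m²
</span>
<cite>
New York (11111)
</cite>
</div>
</div>
<div class="ergov3-txtannonce">
<div class="ergov3-h3"><span>
Appartement
3
pièces,
64 m²
</span>
<cite>
Los Angeles (22222)
</cite>
</div>
</div>
<div class="ergov3-txtannonce">
<div class="ergov3-h3"><span>
House
4
pièces,
81 m²
</span>
<cite>
Chicago (33333)
</cite>
</div>
</div>
"""


def clean(tag):
    """Return the tag's text with every run of whitespace (including newlines)
    collapsed to a single space, or None if the tag is missing."""
    if tag is None:
        return None
    return " ".join(tag.get_text().split())


soup = BeautifulSoup(html, "html.parser")

# One dict per listing instead of two parallel lists.
listings = [
    {
        "ad": clean(item.select_one("span")),
        "city": clean(item.select_one("cite")),
    }
    # The class is on the container <div>, not on <span>/<cite>.
    for item in soup.select("div.ergov3-txtannonce")
]

# Flat lists, if still needed, can be derived from the structured data.
ads = [listing["ad"] for listing in listings]
cities = [listing["city"] for listing in listings]

print(listings)
print(ads)
print(cities)
```

Keeping each ad next to its city in one dict means the two values cannot drift out of step if a listing lacks one of the tags (`clean` then returns None for that field). The derived `ads` and `cities` lists match the two lists given as the expected output in the question.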