How do I stop find_next_sibling() once it reaches a certain tag?
I am scraping athletic.net, a website that stores track and field times. So far I can print the event titles and times, but under each event my output lists every remaining time from that season instead of only the times for that event. I am using a for loop with an arbitrary number of iterations. Instead, I would like to keep calling find_next_sibling() until the sibling is an h5 tag, because each event's title is an h5 tag. In short: how can I stop the loop when find_next_sibling() returns an h5 tag? I think this should be a simple while loop, but I have struggled to implement it.
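```python
from bs4 import BeautifulSoup

# soup (the parsed results page) and text_file (an open output file) are set up earlier
for text in soup.find_all('h5'):
    if "Season" in str(text):
        # Season heading: written with two blank lines before it
        text_file.write(('\n' + '\n' + str(text.contents[0])) + '\n')
    else:
        # Event heading, e.g. "800 Meters"
        text_file.write(str(text.contents[0]) + '\n')
        block = ""
        # Fixed number of steps: this walks past the next <h5> into later events
        for i in range(0, 100):
            try:
                text = text.find_next_sibling()
                block = block + str(text) + '\n'
            except AttributeError:
                # Siblings ran out: text is None, so the call above fails
                print("miss")
        soupBlock = BeautifulSoup(block, 'html.parser')
        for t in soupBlock.select('tr td:nth-of-type(2) [href^="/result"]'):
            text_file.write(str(t.contents[0]) + '\n')
```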
Output:
2021 Outdoor Season
800 Meters
2:14.81
2:12.32
4:43.62
4:44.21
4:42.11
10:26.85
10:09.89
10:21.49
1600 Meters
4:43.62
4:44.21
4:42.11
10:26.85
10:09.89
10:21.49
3200 Meters
10:26.85
10:09.89
10:21.49
Desired output:
2021 Outdoor Season
800 Meters
2:14.81
2:12.32
1600 Meters
4:43.62
4:44.21
4:42.11
3200 Meters
10:26.85
10:09.89
10:21.49
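The fixed-count loop overruns because find_next_sibling() has no notion of event boundaries: starting from an event's h5 it returns every later sibling in turn, including the next event's h5 and its table, and only returns None once the parent runs out of children (the following call on None then raises AttributeError, which is where "miss" comes from). A minimal reproduction on hypothetical markup; the layout (each h5 followed by a sibling table) is an assumption inferred from the selector in the question, not athletic.net's real HTML:

```python
from bs4 import BeautifulSoup

# Hypothetical markup (an assumption): each event is an <h5> followed by
# a sibling <table> of results.
HTML = (
    "<div>"
    "<h5>800 Meters</h5>"
    "<table><tr><td>1</td><td><a href='/result/1'>2:14.81</a></td></tr></table>"
    "<h5>1600 Meters</h5>"
    "<table><tr><td>1</td><td><a href='/result/2'>4:43.62</a></td></tr></table>"
    "</div>"
)

soup = BeautifulSoup(HTML, "html.parser")
first = soup.find("h5")

# Walk the siblings with no stopping condition, as the fixed-count loop does.
collected = []
node = first.find_next_sibling()
while node is not None:
    collected.append(node.name)
    node = node.find_next_sibling()

print(collected)  # ['table', 'h5', 'table']
```

The walk from "800 Meters" reaches the "1600 Meters" heading and its table, which is exactly why later times appear under earlier events.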
This was a very simple problem; I was overthinking it. I just had to check for an h5 tag while stepping through the siblings and stop there.
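A minimal sketch of that check as the while loop the question asks for: keep calling find_next_sibling() until it returns None or a tag whose name is "h5". The markup below is hypothetical (an assumption that each event's results table is a sibling following its h5), the selector is the one from the question, and the function name times_by_event is made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for an athletic.net results page (an assumption:
# each event title is an <h5> whose results table is a following sibling).
HTML = (
    "<div>"
    "<h5>2021 Outdoor Season</h5>"
    "<h5>800 Meters</h5>"
    "<table>"
    "<tr><td>1</td><td><a href='/result/1'>2:14.81</a></td></tr>"
    "<tr><td>2</td><td><a href='/result/2'>2:12.32</a></td></tr>"
    "</table>"
    "<h5>1600 Meters</h5>"
    "<table>"
    "<tr><td>1</td><td><a href='/result/3'>4:43.62</a></td></tr>"
    "</table>"
    "</div>"
)

# Selector taken from the question.
SELECTOR = 'tr td:nth-of-type(2) [href^="/result"]'


def times_by_event(soup):
    """Return heading and time lines in document order, one event at a time."""
    lines = []
    for heading in soup.find_all("h5"):
        lines.append(heading.get_text(strip=True))
        sibling = heading.find_next_sibling()
        # Stop when the siblings run out (None) or the next event's <h5> starts.
        while sibling is not None and sibling.name != "h5":
            for link in sibling.select(SELECTOR):
                lines.append(link.get_text(strip=True))
            sibling = sibling.find_next_sibling()
    return lines


soup = BeautifulSoup(HTML, "html.parser")
for line in times_by_event(soup):
    print(line)
```

Because a season heading is immediately followed by another h5, the while loop exits at once for it, so no special case is needed to keep times out from under "2021 Outdoor Season". Writing each line to text_file instead of printing it is a one-line change.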
Try to give more context in your questions so that everybody can reproduce the problem.
You could iterate the tree this way:
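One way to iterate the tree (a sketch, not necessarily the approach this answer had in mind): Tag.next_siblings is a generator over every following sibling, text nodes included, so a plain for loop with break replaces both the fixed range(0, 100) and the try/except. The markup is hypothetical (an assumption about the page layout), and the function name event_times is made up for illustration:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup (an assumption about the page layout); the line breaks
# become whitespace text nodes between the tags, as on most real pages.
HTML = """
<div>
  <h5>2021 Outdoor Season</h5>
  <h5>800 Meters</h5>
  <table>
    <tr><td>1</td><td><a href="/result/1">2:14.81</a></td></tr>
    <tr><td>2</td><td><a href="/result/2">2:12.32</a></td></tr>
  </table>
  <h5>1600 Meters</h5>
  <table>
    <tr><td>1</td><td><a href="/result/3">4:43.62</a></td></tr>
  </table>
</div>
"""


def event_times(soup, selector='tr td:nth-of-type(2) [href^="/result"]'):
    """Return a list of (heading, times) pairs, one per <h5>."""
    events = []
    for heading in soup.find_all("h5"):
        times = []
        for sibling in heading.next_siblings:
            if not isinstance(sibling, Tag):
                continue  # skip whitespace and other text nodes
            if sibling.name == "h5":
                break  # the next event starts here
            times.extend(a.get_text(strip=True) for a in sibling.select(selector))
        events.append((heading.get_text(strip=True), times))
    return events


for title, times in event_times(BeautifulSoup(HTML, "html.parser")):
    print(title, times)
```

Returning a list of pairs rather than a dict keeps repeated titles (the same event in two seasons) from overwriting each other.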