How to scrape nested div classes
Hi, I have a local HTML file containing messages from a chat:
<div class="body">
    <div class="pull_right date details" title="01.01.2022 01:01:01">
        01:01
    </div>
    <div class="from_name">
        XYZ
    </div>
    <div class="reply_to details">
        In reply to <a href="#go_to_message23" onclick="return GoToMessage(747)">this message</a>
    </div>
    <div class="text">
        Eat some chocolate
    </div>
</div>
Now I want to create a DataFrame showing certain information for each message. For example, I extracted the name of the user who wrote each message with:
# doc.select('div[id]')[2].select_one('.from_name').text.strip()
messages = doc.select('div[id]')
for message in messages:
    print('---')
    try:
        print([message.select_one('.from_name').text.strip()])
    except AttributeError:
        # select_one() returns None when nothing matches, so .text fails
        print("Couldn't find a name")
But I can't figure out how to extract the date the message was sent. Can somebody help? Thanks
Comments (1)
Simply select your element and read its title attribute to extract the value.
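The attribute lookup described here can be sketched as follows. This is only a minimal illustration, not the answer's original code. It assumes BeautifulSoup (bs4) is the parser behind doc, and that each message sits inside an outer div carrying an id, as the question's div[id] selector implies. That wrapper, its id value, and the helper name message_date are all hypothetical, since the question's snippet starts at the body div.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question. The outer <div id="..."> wrapper is
# an assumption: the question's snippet only starts at <div class="body">.
html = """
<div class="message" id="message747">
  <div class="body">
    <div class="pull_right date details" title="01.01.2022 01:01:01">01:01</div>
    <div class="from_name">XYZ</div>
    <div class="text">Eat some chocolate</div>
  </div>
</div>
"""

doc = BeautifulSoup(html, "html.parser")


def message_date(message):
    """Return the title attribute of the message's .date div, or None."""
    date_div = message.select_one(".date")
    # Tag.get() returns None when the attribute is missing, so no try/except
    # is needed for a div without a title.
    return date_div.get("title") if date_div is not None else None


dates = [message_date(message) for message in doc.select("div[id]")]
print(dates)
```

Checking for None explicitly avoids the try/except from the question: select_one() returns None when a message has no matching div, and the helper passes that through.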
You could then simply turn your results into a DataFrame.
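One possible way to build that DataFrame is to collect one dictionary per message and hand the list to pandas. This is a sketch under assumptions rather than the answer's original code. The outer div wrappers with ids, the second sample message, the column names, and the helper text_or_none are made up for illustration. The date format is assumed to be day.month.year, which the single sample value 01.01.2022 cannot confirm.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Two sample messages. The wrappers with ids and the second message are
# assumptions added for illustration; only the first body mirrors the question.
html = """
<div class="message" id="message1">
  <div class="body">
    <div class="pull_right date details" title="01.01.2022 01:01:01">01:01</div>
    <div class="from_name">XYZ</div>
    <div class="text">Eat some chocolate</div>
  </div>
</div>
<div class="message" id="message2">
  <div class="body">
    <div class="pull_right date details" title="01.01.2022 01:02:30">01:02</div>
    <div class="text">And drink some water</div>
  </div>
</div>
"""

doc = BeautifulSoup(html, "html.parser")


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


rows = []
for message in doc.select("div[id]"):
    date_div = message.select_one(".date")
    rows.append({
        "date": date_div.get("title") if date_div is not None else None,
        "name": text_or_none(message, ".from_name"),
        "text": text_or_none(message, ".text"),
    })

df = pd.DataFrame(rows)
# Assumed format: day.month.year hour:minute:second
df["date"] = pd.to_datetime(df["date"], format="%d.%m.%Y %H:%M:%S")
print(df)
```

A message without a from_name div, like the second one here, ends up with a missing value in the name column instead of raising an error.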