BeautifulSoup parsing: more details
I already asked a question, but it seems my explanation was not clear.
So I am asking again with more detail.
<h2 class="sectionTitle">
CORPORATE HEADQUARTERS </h2>
277 Park Avenue<br />
New York, New York 10172
<br /><br />United States<br /><br />
I would like to extract only "New York, New York", without the postal code 10172.
And this is another question:
<h2 class="sectionTitle">
BACKGROUND</h2>
He graduated Blabala
</span>
I would like to extract only "He graduated Blabala".
I have been working on this for a few days and it is driving me crazy.
Please help me. Thank you in advance for your kind help.
Comments (2)
You still need more detail to write a good regex.
For example, if you want to extract the second line under "CORPORATE HEADQUARTERS" without the postal code, and the postal code is always present, it can be written like this:
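The following is only a minimal sketch of such a regex, not a definitive pattern. It assumes the markup looks exactly like the snippet in the question, that the city line is the second line after the heading, and that it always ends in a space followed by a 5-digit postal code. The names `html`, `pattern` and `city_state` are made up for illustration.

```python
import re

# Sample markup copied from the question
html = """<h2 class="sectionTitle">
CORPORATE HEADQUARTERS </h2>
277 Park Avenue<br />
New York, New York 10172
<br /><br />United States<br /><br />"""

pattern = re.compile(
    r"CORPORATE HEADQUARTERS\s*</h2>"  # the section heading
    r"[^<]*<br\s*/?>\s*"               # first address line and its <br />
    r"([^<\n]+?)\s+\d{5}\s*<br"        # second line, minus the trailing 5-digit code (assumed)
)

match = pattern.search(html)
city_state = match.group(1) if match else None
print(city_state)
```

If the postal code can be missing or have a different shape (for example ZIP+4), the `\d{5}` part would need to change, which is why more detail about the real pages matters.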
You should use a combination of `tag.contents` with `.split('\n')` to split on lines and `.rsplit(' ', 1)` to split off only the rightmost space-separated string.
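As a rough sketch of how those three pieces could fit together: the question does not show the element that wraps the heading and the address, so the `<div>` below is an assumption, as are the variable names. It also assumes the city line is always the second text line and always ends with a postal code.

```python
from bs4 import BeautifulSoup, NavigableString

# Markup from the question, wrapped in a hypothetical <div> container
html = """<div>
<h2 class="sectionTitle">
CORPORATE HEADQUARTERS </h2>
277 Park Avenue<br />
New York, New York 10172
<br /><br />United States<br /><br />
</div>"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find("h2", class_="sectionTitle").parent

# Keep only the bare text nodes among the container's children,
# skipping the <h2> and <br /> tags
text = "".join(
    str(node) for node in container.contents
    if isinstance(node, NavigableString)
)

# Split on lines and drop the empty ones
lines = [line.strip() for line in text.split("\n") if line.strip()]

# Cut off the rightmost space-separated token (the postal code)
city_state = lines[1].rsplit(" ", 1)[0]
print(city_state)
```

The same `lines` list could serve the second question: with a container holding the BACKGROUND heading, the text after the heading would be the first entry, with no `rsplit` needed.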