What is the regular expression to find the string between > and <?

Posted on 12-28 16:09

I have an HTML file containing:

 ...<b>Breakfast</b><hr>...

I want to extract Breakfast, the text between > and <.

I tried

...for test_string in line:
        if re.match(r'(>.*<$)',test_string):...

That didn't match >Breakfast< either.

Thank you.

孤蝉 2025-01-04 16:09:47

In general, regular expressions can't parse HTML. You could use an HTML parser instead:

from bs4 import BeautifulSoup  # pip install beautifulsoup4

html = """...<b>Breakfast</b><hr>..."""

soup = BeautifulSoup(html, "html.parser")
print(soup.find_all(string=True))  # get all text
# -> ['...', 'Breakfast', '...']
print([b.text for b in soup.find_all('b')])  # get all text for <b> tags
# -> ['Breakfast']
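
If installing a third-party package is not an option, the same parser-based idea can be sketched with the standard library's html.parser module. This is only a minimal illustration: the class name BoldTextCollector and its attributes are made up for this example, and it assumes the text of interest always sits inside <b> tags, as in the question's sample.

```python
from html.parser import HTMLParser


class BoldTextCollector(HTMLParser):
    """Hypothetical helper: collects text that appears inside <b>...</b>."""

    def __init__(self):
        super().__init__()
        self.bold_depth = 0   # how many <b> tags are currently open
        self.chunks = []      # text fragments seen while inside <b>

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.bold_depth += 1

    def handle_endtag(self, tag):
        if tag == "b" and self.bold_depth:
            self.bold_depth -= 1

    def handle_data(self, data):
        # Only keep text while at least one <b> is open.
        if self.bold_depth:
            self.chunks.append(data)


collector = BoldTextCollector()
collector.feed("...<b>Breakfast</b><hr>...")  # sample input from the question
collector.close()  # flush any buffered trailing text
print(collector.chunks)
```

Unlike a regex, the parser tracks tag boundaries itself, so attributes or a stray > inside an attribute value should not confuse the extraction.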


誰ツ都不明白 2025-01-04 16:09:47

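The $ means "end of input" and doesn't belong in this regex. In addition, re.match only matches at the beginning of the string, so it cannot find a > that appears later in the line.

Instead, use re.search: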

import re

# Find the first run of non-'<' characters between '>' and '<'.
m = re.search(r'>([^<]*)<', test_string)
if m:
    print(m.group(1))

This searches for >, then all the following characters that are not <, and then <. The characters between > and < are captured as a group, which you can retrieve with m.group(1).
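
To see the difference in practice, here is a small sketch run against the sample string from the question. The helper name text_between_tags is hypothetical, and filtering out empty matches (such as the nothing between </b> and <hr>) is an assumption about what the asker wants.

```python
import re

# Sample input from the question.
html = "...<b>Breakfast</b><hr>..."

# The original attempt fails twice over: re.match only tries position 0,
# and the trailing $ demands that '<' be the last character of the string.
original_attempt = re.match(r'(>.*<$)', html)


def text_between_tags(s):
    """Hypothetical helper: every non-empty run of text between '>' and '<'."""
    # findall returns the captured group for each match; empty captures
    # come from adjacent tags like '</b><hr>' and are dropped here.
    return [t for t in re.findall(r'>([^<]*)<', s) if t]


print(original_attempt)         # None
print(text_between_tags(html))  # ['Breakfast']
```

This works for simple fragments like the one shown, but it will still misfire on input such as comments, scripts, or attribute values containing > or <, which is where a real HTML parser is the safer choice.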
