How can I find all tags in a string using regular expressions in C#.NET?

Posted on 2024-10-20 01:19:59

I want to find all HTML tags in an input string and remove them or replace them with some text.
Suppose I have the following string.

INPUT=>

<img align="right" src="http://www.groupon.com/images/site_images/0623/2541/Ten-Restaurant-Group_IL-Giardino-Ristorante2.jpg" /><p>Although Italians originally invented pasta as a fastener to keep Sicily from floating away, <a href="http://www.tenrestaurantgroup.com/">Il Giardino Ristorante</a> in Newport Beach.</p>

OUTPUT=>

string strSrc="http://www.groupon.com/images/site_images/0623/2541/Ten-Restaurant-Group_IL-Giardino-Ristorante2.jpg";

<p>Although Italians originally invented pasta as a fastener to keep Sicily from floating away, http://www.tenrestaurantgroup.com in Newport Beach.</p>

From the string above:
if an <IMG> tag is found, I want to get the SRC of that tag;
if an <A> tag is found, I want to get the HREF of that tag;
all other tags should be left as they are.

How can I achieve this using Regex in C#.NET?

Comments (3)

琉璃繁缕 2024-10-27 01:19:59

You really, really shouldn't use regex for this. In fact, parsing HTML cannot be done perfectly with regex. Have you considered using an XML parser or HTML DOM library?

情栀口红 2024-10-27 01:19:59

You can use HtmlAgilityPack to parse the HTML (valid or invalid) and extract what you need.
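
A minimal sketch of what that could look like, assuming the HtmlAgilityPack NuGet package is referenced. The class name `HapTagRewriter`, the method `Rewrite`, and the `imageSources` list are hypothetical names chosen for illustration, not part of the library or of this answer. It collects each `src`, drops the `<img>` element, and swaps each `<a>` element for its `href` text.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // external dependency: NuGet package "HtmlAgilityPack"

// Hypothetical helper for illustration only.
public static class HapTagRewriter
{
    // Removes every <img> (recording its src) and replaces every <a> with its href.
    public static string Rewrite(string html, List<string> imageSources)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection images = doc.DocumentNode.SelectNodes("//img[@src]");
        if (images != null)
        {
            // Copy to a list so the document can be modified while iterating.
            foreach (HtmlNode img in new List<HtmlNode>(images))
            {
                imageSources.Add(img.GetAttributeValue("src", string.Empty));
                img.Remove();
            }
        }

        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null)
        {
            foreach (HtmlNode link in new List<HtmlNode>(links))
            {
                string href = link.GetAttributeValue("href", string.Empty);
                // Swap the whole <a>...</a> element for a plain text node.
                link.ParentNode.ReplaceChild(doc.CreateTextNode(href), link);
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling `HapTagRewriter.Rewrite(input, sources)` with a fresh `List<string>` should leave the `<p>` element in place and fill `sources` with the image URLs. Because a parser handles attribute order, quoting, and nesting, this tends to be more robust than pattern matching on real-world markup.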

满地尘埃落定 2024-10-27 01:19:59

I agree with Justin: regex really isn't the best way to do this, and the HTML Agility Pack is well worth a look if this is something you will need to do a lot.

That said, the expression below stores the attributes in capture groups, from which you should be able to pull them into your text while ignoring the rest of the element:

</?([^ >]+)( [^=]+?="(.+?)")*>

Hope this helps.
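
If regex is used anyway, one possible sketch in C# is shown here. It does not use the general expression above. Instead it assumes two narrower, tag-specific patterns and relies on attributes being double-quoted and `<a>` elements not being nested. `TagRewriter`, `Rewrite`, and `imageSources` are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; fragile on irregular HTML.
public static class TagRewriter
{
    // Matches an <img ...> tag and captures its double-quoted src value in group 1.
    private static readonly Regex ImgTag = new Regex(
        "<img\\b[^>]*?\\bsrc\\s*=\\s*\"([^\"]*)\"[^>]*>",
        RegexOptions.IgnoreCase);

    // Matches <a ... href="...">...</a> and captures the href value in group 1.
    private static readonly Regex AnchorTag = new Regex(
        "<a\\b[^>]*?\\bhref\\s*=\\s*\"([^\"]*)\"[^>]*>.*?</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Removes each <img> (recording its src) and replaces each <a> element with its href.
    public static string Rewrite(string html, List<string> imageSources)
    {
        string withoutImages = ImgTag.Replace(html, match =>
        {
            imageSources.Add(match.Groups[1].Value);
            return string.Empty;
        });

        // All other tags are left untouched because neither pattern matches them.
        return AnchorTag.Replace(withoutImages, match => match.Groups[1].Value);
    }
}
```

A call such as `TagRewriter.Rewrite(input, sources)` returns the rewritten markup and fills `sources` with the `src` values. Note that the `href` is inserted exactly as written in the tag, including any trailing slash, and that single-quoted or unquoted attributes would be missed by these patterns.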
