How do I find all tags in a string using Regex in C#.NET?
I want to find all HTML tags in an input string and remove them or replace them with some text.
Suppose I have the following string:
INPUT=>
<img align="right" src="http://www.groupon.com/images/site_images/0623/2541/Ten-Restaurant-Group_IL-Giardino-Ristorante2.jpg" /><p>Although Italians originally invented pasta as a fastener to keep Sicily from floating away, <a href="http://www.tenrestaurantgroup.com/">Il Giardino Ristorante</a> in Newport Beach.</p>
OUTPUT=>
string strSrc="http://www.groupon.com/images/site_images/0623/2541/Ten-Restaurant-Group_IL-Giardino-Ristorante2.jpg";
<p>Although Italians originally invented pasta as a fastener to keep Sicily from floating away, http://www.tenrestaurantgroup.com in Newport Beach.</p>
From the above string:
if an <IMG> tag is found, I want to get the SRC of that tag;
if an <A> tag is found, I want to get the HREF from that tag;
all other tags should stay as they are.
How can I achieve this using Regex in C#.NET?
Comments (3)
You really, really shouldn't use regex for this. In fact, parsing HTML cannot be done perfectly with regex. Have you considered using an XML parser or HTML DOM library?
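To illustrate the XML-parser route, here is a minimal sketch using System.Xml.Linq from the standard library. It assumes the fragment is well-formed XML (closed tags, quoted attributes, lowercase tag names, no HTML-only entities such as &nbsp;), which happens to be true of the sample input but not of HTML in general. The class name XmlTagRewriter and the wrapper element name "root" are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlTagRewriter
{
    // Collects the src of every <img> into imageSources, removes the <img> elements,
    // replaces each <a> element with its href text, and leaves other tags untouched.
    public static string Rewrite(string fragment, List<string> imageSources)
    {
        // The fragment has several top-level nodes, so wrap it in a
        // hypothetical <root> element to make it a single XML tree.
        XElement root = XElement.Parse(
            "<root>" + fragment + "</root>",
            LoadOptions.PreserveWhitespace);

        // ToList() snapshots the matches so the tree can be modified while looping.
        foreach (XElement img in root.Descendants("img").ToList())
        {
            imageSources.Add((string)img.Attribute("src") ?? string.Empty);
            img.Remove();
        }

        foreach (XElement a in root.Descendants("a").ToList())
        {
            // Fall back to the link text when there is no href attribute.
            string href = (string)a.Attribute("href") ?? a.Value;
            a.ReplaceWith(new XText(href));
        }

        // Serialize the children of the wrapper, not the wrapper itself.
        return string.Concat(
            root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

The href is emitted exactly as written in the attribute, so a trailing slash is kept. For markup that is not well-formed XML, XElement.Parse throws, and a tolerant HTML parser is the better fit.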
You can use HtmlAgilityPack to parse HTML (valid or invalid) and get what you want.
I agree with Justin: Regex really isn't the best way to do this, and the HTML Agility Pack is well worth a look if this is something you will need to do a lot.
That said, the expression below captures each tag's attributes in a group, from which you should be able to pull them into your text while ignoring the rest of the element:
</?([^ >]+)( [^=]+?="(.+?)")*>
Hope this helps.
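
For the specific case in the question, a regex-only approach might look like the sketch below. Note that it does not use the generic expression above; it swaps in two narrower, tag-specific patterns, which is an assumption made for this example, as are the names TagRewriter and Rewrite. It only handles double-quoted attribute values and non-nested <a> elements, so it is not a general HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagRewriter
{
    // Matches an <img ...> tag and captures the double-quoted src value in group 1.
    private static readonly Regex ImgTag = new Regex(
        @"<img\b[^>]*?\bsrc\s*=\s*""([^""]*)""[^>]*>",
        RegexOptions.IgnoreCase);

    // Matches a whole <a ...>...</a> element and captures the href value in group 1.
    private static readonly Regex AnchorElement = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*""([^""]*)""[^>]*>.*?</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Collects every image src into imageSources, removes the <img> tags,
    // replaces each <a> element with its href, and leaves other tags as they are.
    public static string Rewrite(string html, List<string> imageSources)
    {
        string withoutImages = ImgTag.Replace(html, m =>
        {
            imageSources.Add(m.Groups[1].Value);
            return string.Empty;
        });

        return AnchorElement.Replace(withoutImages, m => m.Groups[1].Value);
    }
}
```

Calling TagRewriter.Rewrite(input, list) on the string from the question should leave the <p> element in place with the link replaced by its href (including the trailing slash as written in the attribute), and put the image URL into the list.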