Getting an image's SRC with a regular expression in C#

Posted on 2024-10-03 15:36:25

I'm looking for a regular expression to isolate the src value of an img. (I know this is not the best way to do it, but it is what I have to do in this case.)

I have a string that contains simple HTML code, some text and an image. I need to get the value of the src attribute from that string. So far I have only managed to isolate the whole tag:

string matchString = Regex.Match(original_text, @"(<img([^>]+)>)").Value;


划一舟意中人 2024-10-10 15:36:25

string matchString = Regex.Match(original_text, "<img.+?src=[\"'](.+?)[\"'].*?>", RegexOptions.IgnoreCase).Groups[1].Value;
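
A minimal sketch of how the pattern above might be applied when the string holds more than one image. The helper name `ExtractImageSources` and the sample HTML are assumptions made for illustration, not part of the original answer. `Regex.Matches` returns every match instead of only the first one:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ImgSrcExample
{
    // Same pattern as in the answer above; capture group 1 holds the src value.
    private const string Pattern = "<img.+?src=[\"'](.+?)[\"'].*?>";

    // Hypothetical helper: returns the src value of every <img> tag found in html.
    public static List<string> ExtractImageSources(string html)
    {
        var sources = new List<string>();
        foreach (Match m in Regex.Matches(html, Pattern, RegexOptions.IgnoreCase))
        {
            sources.Add(m.Groups[1].Value);
        }
        return sources;
    }

    public static void Main()
    {
        // Made-up input with two images, one of them in upper case.
        string html = "<p>Hi</p><img src=\"a.png\" alt=\"A\"><IMG class='x' SRC='b.jpg'>";
        foreach (string src in ExtractImageSources(html))
        {
            Console.WriteLine(src); // prints a.png, then b.jpg
        }
    }
}
```

Note that `.` does not match a newline by default, so a tag that spans several lines would need `RegexOptions.Singleline`.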


帅冕 2024-10-10 15:36:25

I know you say you have to use regex, but if possible I would really give this open source project a chance:
HtmlAgilityPack

It is really easy to use. I just discovered it and it helped me out a lot, since I was doing some heavier HTML parsing. It basically lets you use XPath expressions to get your elements.

Their example page is a little outdated, but the API is really easy to understand, and if you are a little bit familiar with XPath you will get your head around it in no time.

The code for your query would look something like this (uncompiled code):

 // Requires: using System.Collections.Generic; using HtmlAgilityPack;
 List<string> imgSrcs = new List<string>();
 HtmlDocument doc = new HtmlDocument();
 doc.LoadHtml(htmlText); // or doc.Load(htmlFileStream)
 // Select only <img> elements that have a src attribute.
 // SelectNodes returns null when nothing matches.
 var nodes = doc.DocumentNode.SelectNodes("//img[@src]");
 if (nodes != null)
 {
     foreach (var img in nodes)
     {
         HtmlAttribute att = img.Attributes["src"];
         imgSrcs.Add(att.Value);
     }
 }


你对谁都笑 2024-10-10 15:36:25

I tried what Francisco Noriega suggested, but it looks like the HtmlAgilityPack API has changed. Here is how I solved it:

        // Requires: using System.Collections.Generic; using System.Net; using HtmlAgilityPack;
        List<string> images = new List<string>();
        WebClient client = new WebClient();
        string site = "http://www.mysite.com";
        var htmlText = client.DownloadString(site);

        var htmlDoc = new HtmlDocument()
        {
            OptionFixNestedTags = true,
            OptionAutoCloseOnEnd = true
        };

        htmlDoc.LoadHtml(htmlText);

        // SelectNodes returns null when the page has no matching <img> element.
        var nodes = htmlDoc.DocumentNode.SelectNodes("//img[@src]");
        if (nodes != null)
        {
            foreach (HtmlNode img in nodes)
            {
                HtmlAttribute att = img.Attributes["src"];
                images.Add(att.Value);
            }
        }


宛菡 2024-10-10 15:36:25

This should capture every img tag and just its src value, no matter where the attribute is located (before or after class, etc.), and it supports both HTML and XHTML:

<img.+?src="(.+?)".*?/?>


北城半夏 2024-10-10 15:36:25

The regex you want should be along the lines of:

(<img.*?src="([^"]*)".*?>)

Hope this helps.


悲念泪 2024-10-10 15:36:25

You can also use a lookbehind to do it without needing to pull out a group:

(?<=<img.*?src=")[^"]*

Remember to escape the quotes if needed.
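
As a hedged illustration of the quote escaping: in a C# verbatim string (`@"..."`) each double quote in the pattern is written as `""`, while in a regular string it is written as `\"`. The sketch below assumes a hypothetical helper named `TryGetFirstImageSource` and a made-up input string. It relies on .NET's support for variable-length lookbehind, which many other regex engines do not offer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindExample
{
    // Verbatim string: "" stands for one literal double quote in the pattern.
    private const string Pattern = @"(?<=<img.*?src="")[^""]*";

    // Hypothetical helper: returns true and sets src when an <img src="..."> is found.
    public static bool TryGetFirstImageSource(string html, out string src)
    {
        Match m = Regex.Match(html, Pattern, RegexOptions.IgnoreCase);
        // The whole match is the src value, so no capture group is needed.
        src = m.Success ? m.Value : string.Empty;
        return m.Success;
    }

    public static void Main()
    {
        string src;
        if (TryGetFirstImageSource("text <img class=\"c\" src=\"pic.png\"> more", out src))
        {
            Console.WriteLine(src); // prints pic.png
        }
    }
}
```

This pattern only handles double-quoted attribute values; single-quoted or unquoted values would need a different character class.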


请持续率性 2024-10-10 15:36:25

This is what I use to get the tags out of strings:

</? *img[^>]*>
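
If "getting the tags out" is read as removing them, a minimal sketch with `Regex.Replace` might look like the following. The helper name `StripImageTags` and the sample string are assumptions for illustration only. Passing the same pattern to `Regex.Matches` would return the tags themselves instead.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StripImgExample
{
    // Pattern from the answer above: matches a whole <img ...> tag,
    // including the closing form </img>.
    private const string Pattern = @"</? *img[^>]*>";

    // Hypothetical helper: returns html with every img tag removed.
    public static string StripImageTags(string html)
    {
        return Regex.Replace(html, Pattern, string.Empty, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // The surrounding spaces are kept, so two spaces remain between the words.
        Console.WriteLine(StripImageTags("before <img src=\"a.png\" /> after"));
    }
}
```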


温折酒 2024-10-10 15:36:25

Here is the one I use:

<img.*?src\s*?=\s*?(?:(['"])(?<src>(?:(?!\1).)*)\1|(?<src>[^\s>]+))[^>]*?>

The good part is that it matches any of the following:

<img src='test.jpg'>
<img src=test.jpg>
<img src="test.jpg">

It can also handle less obvious cases, such as whitespace around the equals sign and extra attributes, e.g.:

<img src = "test.jpg" width="300">
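
A short sketch of how the named group `src` from the pattern above could be read in C#. The helper name `GetSrc` is hypothetical, and the inputs are the four sample tags listed above. .NET allows the same group name to appear in both alternatives, and numbers the unnamed group `(['"])` before the named ones, which is why `\1` refers to the opening quote.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupExample
{
    // Verbatim string: "" stands for one literal double quote in the pattern.
    private const string Pattern =
        @"<img.*?src\s*?=\s*?(?:(['""])(?<src>(?:(?!\1).)*)\1|(?<src>[^\s>]+))[^>]*?>";

    // Hypothetical helper: returns the src value, or an empty string when nothing matches.
    public static string GetSrc(string html)
    {
        Match m = Regex.Match(html, Pattern);
        return m.Success ? m.Groups["src"].Value : string.Empty;
    }

    public static void Main()
    {
        string[] samples =
        {
            "<img src='test.jpg'>",
            "<img src=test.jpg>",
            "<img src=\"test.jpg\">",
            "<img src = \"test.jpg\" width=\"300\">"
        };
        foreach (string tag in samples)
        {
            Console.WriteLine(GetSrc(tag)); // prints test.jpg for each sample
        }
    }
}
```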
