Get the SRC of an image with a regular expression in C#

Posted on 2024-10-03 15:36:25

I'm looking for a regular expression to isolate the src value of an img.
(I know this is not the best way to do it, but it is what I have to do in this case.)

I have a string that contains simple HTML code, some text and an image. I need to get the value of the src attribute from that string. So far I have only managed to isolate the whole tag:

string matchString = Regex.Match(original_text, @"(<img([^>]+)>)").Value;

Comments (8)

划一舟意中人 2024-10-10 15:36:25
string matchString = Regex.Match(original_text, "<img.+?src=[\"'](.+?)[\"'].*?>", RegexOptions.IgnoreCase).Groups[1].Value;
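The one-liner above returns only the first image. As a minimal sketch, assuming the string may hold several images, the same pattern can be run with `Regex.Matches` to collect every src value. The class name `ImgSrcRegex`, the method name `ExtractAll` and the sample HTML are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ImgSrcRegex
{
    // Same pattern as in the answer above; group 1 holds the quoted src value.
    private static readonly Regex ImgSrc = new Regex(
        "<img.+?src=[\"'](.+?)[\"'].*?>",
        RegexOptions.IgnoreCase);

    // Returns the src value of every <img> tag the pattern matches.
    public static List<string> ExtractAll(string html)
    {
        var result = new List<string>();
        foreach (Match m in ImgSrc.Matches(html))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }

    public static void Main()
    {
        // Hypothetical sample input: two images, different quoting and casing.
        string sample =
            "<p>Hi</p><img src=\"a.png\" alt=\"A\"> text <IMG class='x' SRC='b.jpg'>";

        foreach (string src in ExtractAll(sample))
        {
            Console.WriteLine(src); // prints a.png, then b.jpg
        }
    }
}
```

Note that `.` does not match a newline by default, so a tag that is split across lines would be missed unless `RegexOptions.Singleline` is added.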
帅冕 2024-10-10 15:36:25

I know you say you have to use regex, but if possible I would really give this open source project a chance:
HtmlAgilityPack

It is really easy to use. I just discovered it and it helped me out a lot, since I was doing some heavier HTML parsing. It basically lets you use XPath to get your elements.

Their example page is a little outdated, but the API is really easy to understand, and if you are a little bit familiar with XPath you will get your head around it in no time.

The code for your query would look something like this (uncompiled code):

 List<string> imgSrcs = new List<string>();
 HtmlDocument doc = new HtmlDocument();
 doc.LoadHtml(htmlText); // or doc.Load(htmlFileStream)
 // SelectNodes returns null when no node matches the XPath
 var nodes = doc.DocumentNode.SelectNodes("//img[@src]");
 if (nodes != null)
 {
     foreach (var img in nodes)
     {
         imgSrcs.Add(img.Attributes["src"].Value);
     }
 }
你对谁都笑 2024-10-10 15:36:25

I tried what Francisco Noriega suggested, but it looks like the HtmlAgilityPack API has changed. Here is how I solved it:

        List<string> images = new List<string>();
        WebClient client = new WebClient();
        string site = "http://www.mysite.com";
        var htmlText = client.DownloadString(site);

        var htmlDoc = new HtmlDocument()
        {
            OptionFixNestedTags = true,
            OptionAutoCloseOnEnd = true
        };

        htmlDoc.LoadHtml(htmlText);

        // SelectNodes returns null when nothing matches, so guard before iterating.
        // The [@src] predicate skips img tags that have no src attribute.
        var imgNodes = htmlDoc.DocumentNode.SelectNodes("//img[@src]");
        if (imgNodes != null)
        {
            foreach (HtmlNode img in imgNodes)
            {
                images.Add(img.Attributes["src"].Value);
            }
        }
宛菡 2024-10-10 15:36:25

This should capture all img tags and just the src part, no matter where it is located (before or after class, etc.), and it supports html/xhtml :D

<img.+?src="(.+?)".*?/?>
北城半夏 2024-10-10 15:36:25

The regex you want should be along the lines of:

(<img.*?src="([^"]*)".*?>)

Hope this helps.

悲念泪 2024-10-10 15:36:25

You can also use a lookbehind to do it without needing to pull out a group:

(?<=<img.*?src=")[^"]*

Remember to escape the quotes if needed (for example, as "" inside a C# verbatim string).
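As a rough sketch of how the lookbehind pattern might be used from C#: because the lookbehind consumes nothing, `Match.Value` is the src value itself and no group lookup is needed. The class name `ImgSrcLookbehind`, the method name `FirstSrc` and the sample HTML are illustrative assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImgSrcLookbehind
{
    // In a verbatim string (@"..."), a double quote is written as "".
    private const string Pattern = @"(?<=<img.*?src="")[^""]*";

    // Returns the first src value, or an empty string when nothing matches.
    public static string FirstSrc(string html)
    {
        Match m = Regex.Match(html, Pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Value : string.Empty;
    }

    public static void Main()
    {
        // Hypothetical sample input.
        string sample = "<p>text</p><img class=\"c\" src=\"pic.png\">";
        Console.WriteLine(FirstSrc(sample)); // prints pic.png
    }
}
```

This relies on .NET allowing variable-length lookbehind (`.*?` inside `(?<=...)`), which many other regex engines reject. Since `.*?` can also run past a `>`, the pattern is best kept to simple input like the string described in the question.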

请持续率性 2024-10-10 15:36:25

This is what I use to get the tags out of strings:

</? *img[^>]*>
温折酒 2024-10-10 15:36:25

Here is the one I use:

<img.*?src\s*?=\s*?(?:(['"])(?<src>(?:(?!\1).)*)\1|(?<src>[^\s>]+))[^>]*?>

The good part is that it matches any of the below:

<img src='test.jpg'>
<img src=test.jpg>
<img src="test.jpg">

It can also match some unexpected scenarios, such as spaces around the equals sign and extra attributes, e.g.:

<img src = "test.jpg" width="300">
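To show how the named group could be read back in C#, here is a small sketch that runs the pattern over the four sample tags above. .NET lets both alternatives share the group name `src`, and `\1` refers to the unnamed quote group because unnamed groups are numbered before named ones. The class name `ImgSrcNamedGroup` and the method name `GetSrc` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImgSrcNamedGroup
{
    // Verbatim string: the double quote inside the character class is written as "".
    private const string Pattern =
        @"<img.*?src\s*?=\s*?(?:(['""])(?<src>(?:(?!\1).)*)\1|(?<src>[^\s>]+))[^>]*?>";

    // Returns the src value of the first <img> tag, or an empty string if none matches.
    public static string GetSrc(string html)
    {
        Match m = Regex.Match(html, Pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["src"].Value : string.Empty;
    }

    public static void Main()
    {
        string[] samples =
        {
            "<img src='test.jpg'>",
            "<img src=test.jpg>",
            "<img src=\"test.jpg\">",
            "<img src = \"test.jpg\" width=\"300\">"
        };

        foreach (string tag in samples)
        {
            Console.WriteLine(GetSrc(tag)); // prints test.jpg for each sample
        }
    }
}
```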