Get image URLs from HTML stored as a string

Posted on 2025-02-07 22:22:07

I would like to get all image links from the text and save them to a list. Then I have to add a prefix to each src and replace it in the HTML code. The problem itself is not that difficult, but I wonder how I could do it more efficiently. If the HTML is long, my program takes a long time to run. My current code:

string html ="<h1>This is heading 1</h1><p>This is another paragraph.</p><a href='https://www.w3schools.com'>This is a link</a><img src='/media/w3schools.jpg' alt='W3Schools.com' width='104' height='142'><h1>This is heading 1</h1><img src='/media/w3schools2.jpg'><p>This is another paragraph.</p><p>This is another paragraph.</p><p>This is another paragraph.</p><h1>This is heading 1</h1><h1>This is heading 1</h1><p>This is another paragraph.</p><a href='https://www.w3schools.com'>This is a link</a><img src='/media/w3schools.jpg' alt='W3Schools.com' width='104' height='142'>";
            
string tempHtml = html;
List<string> imagesUrl = new List<string>();

while (true)
{
  if (!tempHtml.Contains("src"))
  {
    break;
  }

  //get first index of src
  var index = tempHtml.IndexOf("src");

  //get index where image src was started
  var startIndex = index + 5;

  //remove text before image src
  tempHtml = tempHtml.Substring(startIndex, tempHtml.Length-startIndex);

  //find end of image src
  var nextIndex = tempHtml.IndexOf("'");

  //add image src to list
  imagesUrl.Add(tempHtml.Substring(0, nextIndex));

  //get string from end of image src
  tempHtml = tempHtml.Substring(nextIndex,tempHtml.Length-nextIndex);
}

foreach (var item in imagesUrl)
{
   html = html.Replace(item, $"ns.{item}");
}

Is it possible to make this code more efficient?
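
One possible direction, offered as a rough sketch rather than a definitive answer: the loop above allocates a new substring of the remaining HTML on every iteration and then runs a full Replace over the whole document for each URL found, so the cost grows with both the document length and the number of images. A single regular-expression pass can collect the src values and insert the prefix at the same time. The class name ImageSrcRewriter and the method name PrefixImageSources are hypothetical, chosen only for illustration, and the pattern assumes quoted src attributes on img tags.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question.
public static class ImageSrcRewriter
{
    // Assumption: src values are quoted with ' or " and live inside an <img ...> tag.
    // "head" = everything from <img up to and including the opening quote,
    // "q"    = the quote character, "url" = the attribute value.
    private static readonly Regex ImgSrc = new Regex(
        @"(?<head><img\b[^>]*?\bsrc\s*=\s*(?<q>['""]))(?<url>.*?)\k<q>",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the rewritten HTML and reports every src value found, in document order.
    public static string PrefixImageSources(string html, string prefix, out List<string> urls)
    {
        // A lambda cannot capture an out parameter, so collect into a local first.
        var found = new List<string>();

        string result = ImgSrc.Replace(html, m =>
        {
            string url = m.Groups["url"].Value;
            found.Add(url);
            // Rebuild the match with the prefix placed directly before the URL.
            return m.Groups["head"].Value + prefix + url + m.Groups["q"].Value;
        });

        urls = found;
        return result;
    }
}
```

Calling ImageSrcRewriter.PrefixImageSources(html, "ns.", out var imagesUrl) fills imagesUrl and returns the modified HTML in one pass. Because each match is rewritten where it occurs, a URL that appears twice is prefixed once per occurrence instead of being replaced again for every duplicate entry in the list. Unquoted attribute values or attributes such as data-src are not handled correctly by this pattern; for arbitrary real-world HTML, a dedicated parser such as Html Agility Pack is likely to be more robust than a regular expression.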
