Get image URLs from HTML stored as a string

Posted on 2025-02-07 22:22:07

I would like to get all image links from the text and save them to a list. Then I have to add a prefix to each src and replace it in the HTML code. The problem itself is not that difficult, but I wonder how I could do it more efficiently. If the HTML is long, my program takes a long time to run. My current code:

string html ="<h1>This is heading 1</h1><p>This is another paragraph.</p><a href='https://www.w3schools.com'>This is a link</a><img src='/media/w3schools.jpg' alt='W3Schools.com' width='104' height='142'><h1>This is heading 1</h1><img src='/media/w3schools2.jpg'><p>This is another paragraph.</p><p>This is another paragraph.</p><p>This is another paragraph.</p><h1>This is heading 1</h1><h1>This is heading 1</h1><p>This is another paragraph.</p><a href='https://www.w3schools.com'>This is a link</a><img src='/media/w3schools.jpg' alt='W3Schools.com' width='104' height='142'>";
            
string tempHtml = html;
List<string> imagesUrl = new List<string>();

while (true)
{
  if (!tempHtml.Contains("src"))
  {
    break;
  }

  //get first index of src
  var index = tempHtml.IndexOf("src");

  //get index where image src was started
  var startIndex = index + 5;

  //remove text before image src
  tempHtml = tempHtml.Substring(startIndex, tempHtml.Length-startIndex);

  //find end of image src
  var nextIndex = tempHtml.IndexOf("'");

  //add image src to list
  imagesUrl.Add(tempHtml.Substring(0, nextIndex));

  //get string from end of image src
  tempHtml = tempHtml.Substring(nextIndex,tempHtml.Length-nextIndex);
}

foreach (var item in imagesUrl)
{
   html = html.Replace(item, $"ns.{item}");
}

Is it possible to make the code more efficient?
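
One possible direction, offered as a sketch rather than a definitive answer: the loop in the question copies the rest of the string with `Substring` on every iteration, and then `html.Replace` scans the whole document once per collected URL. Because `/media/w3schools.jpg` ends up in the list twice, `Replace` also runs twice for it, so that URL would get the prefix twice. A single pass that collects and rewrites at the same time avoids both problems. The class name `ImageSrcRewriter`, the method `AddPrefix`, and the regular expression are illustrative assumptions, not part of the original code. The pattern assumes each `src` value is quoted and that no `>` appears in an attribute before `src`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (name and API are assumptions made for this sketch).
public static class ImageSrcRewriter
{
    // head: "<img ... src=" up to the opening quote
    // q:    the quote character (single or double)
    // url:  the attribute value, up to the matching closing quote
    private static readonly Regex ImgSrc = new Regex(
        @"(?<head><img\b[^>]*?\ssrc\s*=\s*)(?<q>['""])(?<url>.*?)\k<q>",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Scans the HTML once: every img src value is appended to 'urls'
    // and written back with 'prefix' in front of it.
    public static string AddPrefix(string html, string prefix, List<string> urls)
    {
        return ImgSrc.Replace(html, match =>
        {
            string url = match.Groups["url"].Value;
            string quote = match.Groups["q"].Value;
            urls.Add(url);
            return match.Groups["head"].Value + quote + prefix + url + quote;
        });
    }
}
```

Calling `ImageSrcRewriter.AddPrefix(html, "ns.", imagesUrl)` on the sample string fills `imagesUrl` with the three `src` values in document order and returns HTML in which each of them starts with `ns.` exactly once; the `href` attributes are left alone. A regular expression is not a full HTML parser, so for markup you do not control, a real parser such as HtmlAgilityPack or AngleSharp is likely the safer choice.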
