Get image URLs from HTML stored as a string
I would like to get all image links from the text and save them to a list. Then I have to add a prefix to each src and replace it in the HTML code. The problem itself is not that difficult, but I wonder how I could do it more efficiently. If the HTML is long, my program takes a long time to run. My current code:
string html = "<h1>This is heading 1</h1><p>This is another paragraph.</p><a href='https://www.w3schools.com'>This is a link</a><img src='/media/w3schools.jpg' alt='W3Schools.com' width='104' height='142'><h1>This is heading 1</h1><img src='/media/w3schools2.jpg'><p>This is another paragraph.</p><p>This is another paragraph.</p><p>This is another paragraph.</p><h1>This is heading 1</h1><h1>This is heading 1</h1><p>This is another paragraph.</p><a href='https://www.w3schools.com'>This is a link</a><img src='/media/w3schools.jpg' alt='W3Schools.com' width='104' height='142'>";
string tempHtml = html;
List<string> imagesUrl = new List<string>();
while (true)
{
    if (!tempHtml.Contains("src"))
    {
        break;
    }
    // find the first occurrence of src
    var index = tempHtml.IndexOf("src");
    // index where the src value starts (skips the 5 characters of src=')
    var startIndex = index + 5;
    // remove the text before the src value
    tempHtml = tempHtml.Substring(startIndex, tempHtml.Length - startIndex);
    // find the end of the src value (the closing quote)
    var nextIndex = tempHtml.IndexOf("'");
    // add the src value to the list
    imagesUrl.Add(tempHtml.Substring(0, nextIndex));
    // keep only the string after the end of the src value
    tempHtml = tempHtml.Substring(nextIndex, tempHtml.Length - nextIndex);
}
foreach (var item in imagesUrl)
{
    // prefix every occurrence of this URL in the original HTML
    html = html.Replace(item, $"ns.{item}");
}
Is it possible to make this code more efficient?
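One possible direction, offered as a sketch rather than a definitive answer: the loop above allocates a new substring for every image, and every `Replace` call scans and copies the whole HTML again, so the cost grows with both the length of the HTML and the number of images. A single pass with `Regex.Replace` and a match evaluator can collect the URLs and insert the prefix at the same time. The names `ImageSrcRewriter`, `AddPrefix`, and `foundUrls` are made up for this illustration, and the pattern assumes the `src` value is quoted with `'` or `"` inside an `<img>` tag. A regular expression is not a full HTML parser, so unusual markup may need a real parser instead.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
public static class ImageSrcRewriter
{
    // head = "<img ... src=" plus the opening quote
    // q    = the quote character (' or ")
    // url  = the attribute value up to the matching closing quote
    private static readonly Regex ImgSrc = new Regex(
        @"(?<head><img\b[^>]*?\ssrc\s*=\s*(?<q>['""]))(?<url>.*?)\k<q>",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Scans the HTML once, records each image URL in foundUrls,
    // and returns the HTML with the prefix placed in front of each URL.
    public static string AddPrefix(string html, string prefix, List<string> foundUrls)
    {
        return ImgSrc.Replace(html, m =>
        {
            string url = m.Groups["url"].Value;
            foundUrls.Add(url);
            // Rebuild the matched text with the prefix inserted before the URL.
            return m.Groups["head"].Value + prefix + url + m.Groups["q"].Value;
        });
    }
}
```

It could be called as `var urls = new List<string>(); html = ImageSrcRewriter.AddPrefix(html, "ns.", urls);`. Because each match is rewritten exactly once, a URL that appears in several `<img>` tags gets a single prefix per tag, whereas the loop with `html.Replace` prefixes it once for every time it appears in the list. Unlike the original search for `src`, this pattern only touches `<img>` tags.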