HTML Agility Pack question (trying to parse a string from the page source)
I am attempting to use the HTML Agility Pack to parse certain bits of information from various pages. I am a bit worried that it might be overkill for what I need; if that is the case, feel free to let me know. Anyway, I am attempting to parse a page from Motley Fool to get a company's name based on its ticker. I will be parsing several pages to get stock information in a similar way.
The HTML that I want to parse looks like:
<h1 class="subHead">
Microsoft Corp <span>(NASDAQ:MSFT)</span>
</h1>
Also, the page I want to parse is: http://caps.fool.com/Ticker/MSFT.aspx
So, I guess my question is: how do I simply get "Microsoft Corp" out of the HTML, and should I even be using the Agility Pack for things like this?
Edit: Current code
public String getStockName(String ticker)
{
    String text = "";
    HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
    HtmlAgilityPack.HtmlDocument doc = web.Load("http://caps.fool.com/Ticker/" + ticker + ".aspx");
    var node = doc.DocumentNode.SelectSingleNode("/h1[@class='subHead']");
    text = node.FirstChild.InnerText.Trim();
    return text;
}
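In the code above, `/h1[@class='subHead']` is an absolute XPath location path: starting from the document root, it only matches an `<h1>` that is a direct child of the root, whereas on a real page the heading sits inside `<html>` and `<body>`. `SelectSingleNode` then returns null, and `node.FirstChild` throws a `NullReferenceException`. The sketch below illustrates this offline with `HtmlDocument.LoadHtml` on a simplified stand-in page (the `<html><body>` wrapper is an assumption, not the real Motley Fool markup):

```csharp
using System;
using HtmlAgilityPack;

// Simplified stand-in for the real page (assumption): the <h1> is nested
// inside <html><body>, as it would be in any full HTML document.
const string Page =
    "<html><body><h1 class=\"subHead\">\nMicrosoft Corp <span>(NASDAQ:MSFT)</span>\n</h1></body></html>";

var doc = new HtmlDocument();
doc.LoadHtml(Page);

// "/h1" is absolute: it only matches an <h1> directly under the document root,
// so on this page it matches nothing and SelectSingleNode returns null.
HtmlNode absolute = doc.DocumentNode.SelectSingleNode("/h1[@class='subHead']");

// "//h1" searches all descendants, so it finds the nested heading.
HtmlNode anywhere = doc.DocumentNode.SelectSingleNode("//h1[@class='subHead']");

Console.WriteLine(absolute == null);                     // True
Console.WriteLine(anywhere.FirstChild.InnerText.Trim()); // Microsoft Corp
```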
Answers (2)
This would give you a list of all stock names; for your sample HTML, that is just Microsoft:
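A minimal sketch of that idea (not necessarily the answer's original code), assuming each company name is the first text node inside an `<h1 class="subHead">`, as in the question's sample. `GetStockNames` is a hypothetical helper, and the second heading in the sample input is made up to show more than one result:

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper: collects the name from every <h1 class="subHead">.
List<string> GetStockNames(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    var names = new List<string>();
    // SelectNodes returns null (not an empty collection) when nothing matches.
    HtmlNodeCollection headings = doc.DocumentNode.SelectNodes("//h1[@class='subHead']");
    if (headings == null)
        return names;

    foreach (HtmlNode h1 in headings)
    {
        // The first child is the text node before the <span>, e.g. "Microsoft Corp ".
        if (h1.FirstChild != null)
            names.Add(h1.FirstChild.InnerText.Trim());
    }
    return names;
}

// Sample input: the first heading is from the question, the second is invented.
const string Sample =
    "<h1 class=\"subHead\">Microsoft Corp <span>(NASDAQ:MSFT)</span></h1>" +
    "<h1 class=\"subHead\">Example Corp <span>(NASDAQ:EXMP)</span></h1>";

foreach (string name in GetStockNames(Sample))
    Console.WriteLine(name);
```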
Edit based on the updated question - this should work for you:
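A sketch of a corrected version of the question's method (not necessarily the answerer's exact code), assuming the page still uses `<h1 class="subHead">`. The key change is `//h1` instead of `/h1`, plus null checks so a missing heading yields null instead of an exception. `ParseStockName` and `GetStockName` are hypothetical names, and parsing is split from downloading so it can be tried without network access:

```csharp
using System;
using HtmlAgilityPack;

// Parses an already-loaded document; returns null when no <h1 class="subHead">
// (or no text inside it) is found.
string ParseStockName(HtmlDocument document)
{
    // "//" searches the whole tree; the question's "/h1" only matches a root-level <h1>.
    HtmlNode heading = document.DocumentNode.SelectSingleNode("//h1[@class='subHead']");
    // FirstChild is the text node before the <span>, i.e. "Microsoft Corp ".
    return heading?.FirstChild?.InnerText.Trim();
}

// Downloads the page using the URL pattern from the question, then parses it.
string GetStockName(string ticker)
{
    var web = new HtmlWeb();
    HtmlDocument loaded = web.Load("http://caps.fool.com/Ticker/" + ticker + ".aspx");
    return ParseStockName(loaded);
}

// Offline usage with the markup from the question, wrapped in html/body.
var page = new HtmlDocument();
page.LoadHtml("<html><body><h1 class=\"subHead\">\nMicrosoft Corp <span>(NASDAQ:MSFT)</span>\n</h1></body></html>");
Console.WriteLine(ParseStockName(page)); // Microsoft Corp
```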
Use an XPath expression to select the element, then pick up its text.
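For example (a sketch; `NameFromTextNode` is a hypothetical helper), XPath can select the heading's first text node directly with `text()[1]`, which leaves out the `<span>` holding the ticker, and `HtmlEntity.DeEntitize` then decodes any HTML entities before trimming:

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper: returns the heading's own text (without the <span>), or null.
string NameFromTextNode(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    // text()[1] selects the first text-node child of the <h1>, which comes before the <span>.
    HtmlNode textNode = doc.DocumentNode.SelectSingleNode("//h1[@class='subHead']/text()[1]");
    if (textNode == null)
        return null;

    // Decode entities such as &amp; in case a company name contains one, then trim whitespace.
    return HtmlEntity.DeEntitize(textNode.InnerText).Trim();
}

Console.WriteLine(NameFromTextNode(
    "<div><h1 class=\"subHead\">\nMicrosoft Corp <span>(NASDAQ:MSFT)</span>\n</h1></div>"));
```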