Getting a link from an HTML file
I use HtmlCleaner to parse HTML files. Here is an example of the HTML:

    .......<div class="name"><a href="http://example.com">Name</a></div>;......

I get the word "Name" using this construction in my code:

    HtmlCleaner cleaner = new HtmlCleaner();
    CleanerProperties props = cleaner.getProperties();
    props.setAllowHtmlInsideAttributes(true);
    props.setAllowMultiWordAttributes(true);
    props.setRecognizeUnicodeChars(true);
    props.setOmitComments(true);
    rootNode = cleaner.clean(htmlPage);
    TagNode linkElements[] = rootNode.getElementsByName("div", true);
    for (int i = 0; linkElements != null && i < linkElements.length; i++)
    {
        String classType = linkElements[i].getAttributeByName("name");
        if (classType != null)
        {
            if (classType.equals(class) && classType.equals(CSSClassname)) { linkList.add(linkElements[i]); }
        }
        System.out.println("TagNode" + linkElements[i].getText());
        linkList.add(linkElements[i]);
    }

and then I add all of these names to a ListView using

    TagNode = linkElements[i].getText().toString();

But I don't understand how to get the link in my example. I want to get the link http://example.com, but I don't know how.
Please help me. I read the tutorial and tried the function it describes, but it didn't work.
P.S. Sorry for my bad English.
Comments (1)
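I don't use HtmlCleaner, but according to the Javadoc you can do it this way: look up the <a> element inside each matching div (for example with findElementByName("a", true)) and read its href with getAttributeByName("href").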
P.S. The code you posted clearly does not compile.
P.P.S. Why not use a library that builds an ordinary DOM tree from HTML? That way you could work with the parsed document through a well-known API.
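
To illustrate the P.P.S., here is a minimal sketch of reading the href through the standard org.w3c.dom API. It assumes the input is well-formed XHTML, so the JDK's own XML parser can stand in for an HTML-to-DOM library; real-world pages would usually need an actual HTML parser in front of it. The class name LinkExtractor and the method extractLinks are hypothetical names chosen for this example, and the class match is a plain string comparison rather than a full CSS class-token match.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LinkExtractor {

    // Hypothetical helper: returns the href of every <a> nested inside a
    // <div> whose class attribute is exactly cssClass.
    // Assumption: xhtml is well-formed XML (the JDK parser rejects tag soup).
    public static List<String> extractLinks(String xhtml, String cssClass) {
        Document doc;
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }

        List<String> links = new ArrayList<>();
        NodeList divs = doc.getElementsByTagName("div");
        for (int i = 0; i < divs.getLength(); i++) {
            Element div = (Element) divs.item(i);
            if (!cssClass.equals(div.getAttribute("class"))) {
                continue; // not the div we are looking for
            }
            // Search the div's descendants for anchors and collect their href values.
            NodeList anchors = div.getElementsByTagName("a");
            for (int j = 0; j < anchors.getLength(); j++) {
                Element a = (Element) anchors.item(j);
                if (a.hasAttribute("href")) {
                    links.add(a.getAttribute("href"));
                }
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<html><body><div class=\"name\">"
                + "<a href=\"http://example.com\">Name</a></div></body></html>";
        System.out.println(extractLinks(page, "name")); // prints [http://example.com]
    }
}
```

The link text ("Name") is available from the same element via getTextContent(), so the name and the URL can be collected in one pass.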