Getting links from an HTML file

Posted on 2024-12-05 03:27:33

I use HtmlCleaner to parse HTML files. Here is an example of an HTML file:

.......<div class="name"><a href="http://example.com">Name</a></div>;......

I get the word Name using this construction in my code:

HtmlCleaner cleaner = new HtmlCleaner();
CleanerProperties props = cleaner.getProperties();
props.setAllowHtmlInsideAttributes(true);
props.setAllowMultiWordAttributes(true);
props.setRecognizeUnicodeChars(true);
props.setOmitComments(true);
rootNode = cleaner.clean(htmlPage);
TagNode[] linkElements = rootNode.getElementsByName("div", true);
for (int i = 0; linkElements != null && i < linkElements.length; i++)
{
    // Keep only the <div> elements whose class attribute matches CSSClassname
    String classType = linkElements[i].getAttributeByName("class");
    if (classType != null && classType.equals(CSSClassname))
    {
        System.out.println("TagNode" + linkElements[i].getText());
        linkList.add(linkElements[i]);
    }
}

and then I add all of these names to a ListView using linkElements[i].getText().toString().

But I don't understand how to get the link in my example. I want to get the link http://example.com, but I don't know how.

Please help me. I read the tutorial and tried the function it describes, but I could not get it to work.

P.S. Sorry for my bad English

少女七分熟 2024-12-12 03:27:33

I don't use HtmlCleaner, but according to the Javadoc you can do it this way:

List<String> links = new ArrayList<String>();
for (TagNode aTag : linkElements[i].getElementListByName("a", false))
{
    // Collect the href of each <a> directly inside the current <div> (false = not recursive)
    String link = aTag.getAttributeByName("href");
    if (link != null && link.length() > 0) links.add(link);
}

P.S.: The code you posted clearly does not compile.
P.P.S.: Why don't you use a library that builds an ordinary DOM tree from the HTML? That way you can work with the parsed document through a well-known API.
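
To illustrate the P.P.S., here is a minimal sketch of the same lookup done through the standard W3C DOM API that ships with the JDK. It is not HtmlCleaner code. The class name LinkExtractor and the method extractLinks are made up for this example, and it assumes the page is well-formed XML. Real-world HTML usually is not, so a lenient HTML parser would have to produce the DOM tree first.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of HtmlCleaner.
public class LinkExtractor {

    // Returns the href of every <a> found inside a <div> whose class equals cssClass.
    // Assumption: xhtml is well-formed XML; malformed input raises IllegalArgumentException.
    public static List<String> extractLinks(String xhtml, String cssClass) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }

        List<String> links = new ArrayList<>();
        NodeList divs = doc.getElementsByTagName("div");
        for (int i = 0; i < divs.getLength(); i++) {
            Element div = (Element) divs.item(i);
            // getAttribute returns an empty string when the attribute is absent
            if (!cssClass.equals(div.getAttribute("class"))) {
                continue;
            }
            // Unlike getElementListByName("a", false), this also matches nested descendants
            NodeList anchors = div.getElementsByTagName("a");
            for (int j = 0; j < anchors.getLength(); j++) {
                String href = ((Element) anchors.item(j)).getAttribute("href");
                if (!href.isEmpty()) {
                    links.add(href);
                }
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<div class=\"name\"><a href=\"http://example.com\">Name</a></div>"
                + "<div class=\"other\"><a href=\"http://example.org\">Other</a></div>"
                + "</body></html>";
        System.out.println(extractLinks(page, "name")); // prints [http://example.com]
    }
}
```

The loop has the same shape as the HtmlCleaner version above: find the div elements, filter by class, then read href from each a element inside them.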
