HtmlUnit getByXPath does not return image tags

Posted on 2025-01-03 17:59:31

I am trying to find all image tags on a specific page. An example page is www.chapitre.com

I am using the following code to search for all images on the page:

HtmlPage page = HTMLParser.parseHtml(webResponse, webClient.openWindow(null,"testwindow"));
List<?> imageList = page.getByXPath("//img");
ListIterator li = imageList.listIterator();

while (li.hasNext() ) {
    HtmlImage image = (HtmlImage)li.next();
    URL url = new URL(image.getSrcAttribute());

    //For now, only load 1X1 pixels
    if (image.getHeightAttribute().equals("1") && image.getWidthAttribute().equals("1")) {
        System.out.println("This is an image: " + url + " from page " + webRequest.getUrl());
    }

}

This doesn't return all the image tags in the page. For example, an image tag with the attributes src="http://ace-lb.advertising.com/site=703223/mnum=1516/bins=1/rich=0/logs=0/betr=A2099=[+]LP2" width="1" height="1" should be captured, but it's not. Am I doing something wrong here?

Any help is really appreciated.

Cheers!


我的影子我的梦 2025-01-10 17:59:31

That's because

URL url = new URL(image.getSrcAttribute());

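is throwing an exception :) new URL(String) throws a MalformedURLException when the src value is empty or relative (no protocol), and since your loop does not appear to catch it, the first such image most likely ends the iteration before the later images are reached.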
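To make the failure concrete, here is a minimal sketch using only the JDK (no HtmlUnit), so it illustrates the URL behaviour rather than the page itself. The class name SrcUrlDemo, the helper names and the "/img/pixel.gif" value are made up for illustration; the absolute URL is the one from the question. It shows that a relative or empty src cannot be parsed on its own, while resolving it against the page URL works.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical demo class; not part of HtmlUnit or of the original code.
class SrcUrlDemo {

    /** Returns true if the src value parses as a URL without any base. */
    static boolean isAbsoluteUrl(String src) {
        try {
            new URL(src);
            return true;
        } catch (MalformedURLException e) {
            // Thrown for "", "/img/pixel.gif" and other values with no protocol.
            return false;
        }
    }

    /** Resolves src against the page URL; returns null if it still cannot be parsed. */
    static String resolve(String pageUrl, String src) {
        try {
            return new URL(new URL(pageUrl), src).toString();
        } catch (MalformedURLException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String pageUrl = "http://www.chapitre.com/";
        String[] srcValues = {
            "/img/pixel.gif", // assumed example of a relative src
            "",               // assumed example of an empty src
            "http://ace-lb.advertising.com/site=703223/mnum=1516/bins=1/rich=0/logs=0/betr=A2099=[+]LP2"
        };
        for (String src : srcValues) {
            System.out.println("src='" + src + "' absolute=" + isAbsoluteUrl(src)
                    + " resolved=" + resolve(pageUrl, src));
        }
    }
}
```

If the page contains even one src of the first two kinds before the tracking pixel, an unguarded new URL(src) would stop the loop early, which would match the symptom in the question.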

Try this code:

public Main() throws Exception {
    WebClient webClient = new WebClient();
    // JavaScript is not needed to read the static <img> tags.
    // On newer HtmlUnit versions this is webClient.getOptions().setJavaScriptEnabled(false).
    webClient.setJavaScriptEnabled(false);
    HtmlPage page = webClient.getPage("http://www.chapitre.com");
    List<HtmlImage> imageList = (List<HtmlImage>) page.getByXPath("//img");
    for (HtmlImage image : imageList) {
        try {
            // Throws for an empty or relative src; the catch keeps the loop going.
            new URL(image.getSrcAttribute());
            if (image.getHeightAttribute().equals("1") && image.getWidthAttribute().equals("1")) {
                System.out.println(image.getSrcAttribute());
            }
        } catch (Exception e) {
            System.out.println("You didn't see this coming :)");
        }
    }
}

You can even select those 1x1 pixel images directly with XPath.
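As a rough sketch of what such an expression could look like, the snippet below filters on the width and height attributes with //img[@width='1' and @height='1']. To stay runnable without HtmlUnit it uses the JDK's javax.xml.xpath on a small, well-formed XHTML string; the class name PixelXPathDemo, the helper findPixelSources and the sample markup are assumptions made for illustration. With HtmlUnit, the same expression string would presumably be passed to page.getByXPath(...) instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; it only illustrates the XPath predicate.
class PixelXPathDemo {

    // Matches <img> elements whose width and height attributes are exactly "1".
    static final String ONE_BY_ONE = "//img[@width='1' and @height='1']";

    /** Returns the src values of all 1x1 images in a well-formed XHTML string. */
    static List<String> findPixelSources(String xhtml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    ONE_BY_ONE,
                    new InputSource(new StringReader(xhtml)),
                    XPathConstants.NODESET);
            List<String> sources = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                sources.add(((Element) nodes.item(i)).getAttribute("src"));
            }
            return sources;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample markup: one regular image and one tracking pixel.
        String xhtml = "<html><body>"
                + "<img src='/logo.png' width='120' height='40'/>"
                + "<img src='http://example.com/pixel.gif' width='1' height='1'/>"
                + "</body></html>";
        System.out.println(findPixelSources(xhtml));
    }
}
```

Note that the predicate compares attribute text, so an image declared with width="1px" or sized only through CSS would not match.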

Hope this helps.
