Unable to return the text inside href (jSoup)

Posted 2024-12-29 08:17:53

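Here is the code snippet I am using to access "Test" in the HTML snippet below. How can I access the URL https://www.google.com from within the HTML?

Elements e = doc.getElementsByAttribute("href");
Iterator<Element> href = e.iterator();
while (href.hasNext()) {
    Element link = href.next();
    // text() returns the visible link text ("Test"), not the href value
    String text = link.text();
}
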
   <a href="javascript:linkToExternalSite('https://www.google.com','','61x38pxls','','','','','')">Test</a>

我也只是我 2025-01-05 08:17:53

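I'm not a Jsoup expert, but Jsoup is an HTML parser, so it cannot parse the JavaScript inside the href value for you.

So your approach should be to extract

"javascript:linkToExternalSite('https://www.google.com','','61x38pxls','','','','','')"

using Jsoup, and then use a regular expression to get the URL out of it.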
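
A minimal sketch of the regex step, assuming the URL is always the first single-quoted argument that starts with http:// or https://. The class name HrefUrlExtractor and the method firstQuotedUrl are made up for illustration; in the question, the input string would come from link.attr("href").

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup.
final class HrefUrlExtractor {
    // Assumption: the URL is a single-quoted argument beginning with http:// or https://.
    private static final Pattern QUOTED_URL = Pattern.compile("'(https?://[^']*)'");

    // Returns the first quoted URL found in the href value, or empty if there is none.
    static Optional<String> firstQuotedUrl(String href) {
        Matcher m = QUOTED_URL.matcher(href);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // In the question this value would be obtained via link.attr("href").
        String href = "javascript:linkToExternalSite('https://www.google.com','','61x38pxls','','','','','')";
        System.out.println(firstQuotedUrl(href).orElse("(no URL found)"));
    }
}
```

Returning an Optional keeps the caller from hitting an exception when an href contains no quoted URL, for example a plain javascript:void(0) link.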


↙厌世 2025-01-05 08:17:53

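href is an attribute, and you can access it with the attr method of a Jsoup Element. That gives you the entire content of the attribute, so you will still need some pattern matching to retrieve the URL.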


羁〃客ぐ 2025-01-05 08:17:53

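    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    String html = "<a href=\"javascript:linkToExternalSite('https://www.google.com','','61x38pxls','','','','','')\">Test</a>";
    Document doc = Jsoup.parse(html);
    // Select the first anchor that has an href attribute.
    Element e = doc.select("a[href]").first();
    String href = e.attr("href");
    // Splitting on single quotes puts the first quoted argument at index 1.
    String[] arg = href.split("'");
    String url = arg[1];
    // Output: https://www.google.com
    System.out.println(url);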
