How to get HTML in Java

Posted 2024-07-04 15:25:16

What is the simplest way to fetch a website's HTML content into a String without using any external libraries?

挖鼻大婶 2024-07-11 15:25:16

I'm currently using this:

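import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.util.Scanner;

String content = null;
try {
  URLConnection connection = new URL("http://www.google.com").openConnection();
  // "\\A" (start of input) as the delimiter makes next() return the whole stream as one token;
  // the page is assumed to be UTF-8, and try-with-resources closes the scanner and the stream
  try (Scanner scanner = new Scanner(connection.getInputStream(), "UTF-8")) {
    scanner.useDelimiter("\\A");
    content = scanner.hasNext() ? scanner.next() : "";
  }
} catch (IOException ex) {
  ex.printStackTrace();
}
System.out.println(content);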

But I'm not sure if there's a better way.
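On Java 9 or later, the Scanner delimiter trick can be replaced by InputStream.readAllBytes(), which reads the entire response and leaves the decoding step explicit. This is a minimal sketch, assuming the page is UTF-8 encoded; the class name PageFetcher is hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; requires Java 9+ for InputStream.readAllBytes().
class PageFetcher {

    // Reads the whole resource behind the URL into a String.
    // Assumes the content is UTF-8 encoded.
    static String fetch(String url) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            byte[] bytes = in.readAllBytes();
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }
}
```

Calling PageFetcher.fetch("http://www.google.com") would return the page as a String. readAllBytes() holds the whole body in memory, which is fine for ordinary web pages but not for very large downloads.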

友谊不毕业 2024-07-11 15:25:16

It's not a library but a tool named curl, which is installed on most servers; on Ubuntu you can easily install it with:

sudo apt install curl

Then fetch any HTML page and save it to a local file, for example:

curl https://www.facebook.com/ > fb.html

You will get the home page HTML. You can also open the saved file in your browser.
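To do the same thing as the curl command above from inside a Java program, the standard library can copy a URL's stream straight into a file. A minimal sketch; the class and method names are hypothetical, and like the shell redirect it writes the bytes exactly as received:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: roughly `curl <url> > <target>` without an external tool.
class PageSaver {

    // Streams the resource at `url` into `target`, overwriting any existing file.
    // Returns the number of bytes written.
    static long saveToFile(String url, Path target) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

For example, PageSaver.saveToFile("https://www.facebook.com/", Paths.get("fb.html")) corresponds to the curl command; Files.copy streams the data, so the page is never held in memory as one String.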

转角预定愛 2024-07-11 15:25:16

This has worked well for me:

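import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

URL url = new URL(theURL);
StringBuilder buffer = new StringBuilder();
// Decode as UTF-8 (assumed) rather than casting raw bytes to char; try-with-resources closes the stream
try (Reader reader = new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)) {
    int ch;
    while ((ch = reader.read()) != -1) {
        buffer.append((char) ch);
    }
}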

Not sure as to whether the other solution(s) provided are any more efficient or not.
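Casting each raw byte to char, as a byte-by-byte loop does, effectively decodes the page as ISO-8859-1, so any non-ASCII character in a UTF-8 page comes out garbled. A sketch of a charset-aware variant, assuming the server names the encoding in the Content-Type header (for example "text/html; charset=ISO-8859-1") and falling back to UTF-8 otherwise; it ignores any meta charset declared inside the HTML itself, and the class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper; decodes the body with the charset the server declares.
class CharsetAwareFetcher {

    // Extracts "charset=..." from a Content-Type value; UTF-8 if absent or unknown.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // illegal or unsupported name: use the fallback below
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    static String fetch(String url) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        Charset cs = charsetOf(conn.getContentType());
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        // InputStreamReader turns bytes into chars; reading 8K chars per call avoids a call per byte
        try (Reader reader = new InputStreamReader(conn.getInputStream(), cs)) {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

On the efficiency question, reading into a char array removes the per-byte method call, though the network is usually the bottleneck either way.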

随风而去 2024-07-11 15:25:16

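I just left this on your other post, although your approach above may work too. I don't think either one is any easier than the other. Just put import org.apache.commons.httpclient.HttpClient at the top of your code to access the Apache package (Commons HttpClient is an external library, so its JAR must be on the classpath).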

Edit: forgot the link ;)
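Since Java 11 the JDK ships its own HTTP client in java.net.http, so a plain GET no longer needs the Apache dependency. The sketch below assumes Java 11+; the class name and the 10-second connect timeout are arbitrary, hypothetical choices. BodyHandlers.ofString() decodes the body with the charset from the Content-Type header and falls back to UTF-8:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper built on the JDK's own HTTP client (Java 11+).
class JdkHttpFetcher {

    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL) // follow redirects, except HTTPS -> HTTP
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // Sends a GET request and returns the body; throws if the status is not 2xx.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() / 100 != 2) {
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        }
        return response.body();
    }
}
```

This client does not follow redirects unless a policy is set (the default is Redirect.NEVER), which is why the builder sets Redirect.NORMAL.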

风情万种。 2024-07-11 15:25:16

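import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

try {
    URL u = new URL("https://www.Samsung.com/in/");
    URLConnection urlconnect = u.openConnection();
    // Read decoded lines (UTF-8 assumed) instead of casting single bytes to char
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(urlconnect.getInputStream(), StandardCharsets.UTF_8))) {
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
    }
}
catch (IOException e) {
    System.out.println(e);
}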
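URLConnection timeouts default to 0, meaning wait indefinitely, so a stalled server can block the snippet above forever. A sketch that sets both timeouts and checks the status code explicitly through HttpURLConnection, assuming an http or https URL and UTF-8 content; the class name and the timeoutMillis parameter are hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

// Hypothetical helper; assumes an http(s) URL and UTF-8 content.
class TimedFetcher {

    static String fetch(String url, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(timeoutMillis); // max time to establish the connection
        conn.setReadTimeout(timeoutMillis);    // max time to wait for data on each read
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " for " + url);
            }
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                // Joining lines normalizes line endings to "\n" and drops a trailing newline
                return reader.lines().collect(Collectors.joining("\n"));
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Checking getResponseCode() first gives a clearer error than the exception getInputStream() throws for an error status (for example FileNotFoundException on a 404).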

无声无音无过去 2024-07-11 15:25:16

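Whilst not vanilla Java, I'll offer up a simpler solution. Use Groovy ;-) Groovy adds a getText() method to java.net.URL, so the whole page is a single property access: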

String siteContent = new URL("http://www.google.com").text

About the author: 岁月流歌