Reading web page content

Posted on 2024-11-10 16:08:22

Hi,
I want to read the content of a web page that contains German characters using Java. Unfortunately, the German characters appear as strange characters.
Any help, please?
Here is my code:

String link = "some german link";

URL url = new URL(link);
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
    System.out.println(inputLine);
}

余罪 2024-11-17 16:08:23

You need to specify the character set for your InputStreamReader, for example:

new InputStreamReader(url.openStream(), "UTF-8")
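
Put together, the reading loop from the question with an explicit charset might look like the sketch below. It assumes the page really is UTF-8; the class name `CharsetReadSketch` and the helper `readAll` are made up for illustration, and a byte array stands in for `url.openStream()` so the example runs without a network connection.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetReadSketch {

    // Hypothetical helper: decodes the stream with the given charset and
    // returns its lines joined by '\n'.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (sb.length() > 0) {
                    sb.append('\n');
                }
                sb.append(line);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for url.openStream(): the UTF-8 bytes of a German phrase.
        // Unicode escapes keep the source file itself encoding-independent.
        byte[] page = "Gr\u00fc\u00dfe aus K\u00f6ln".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(page), StandardCharsets.UTF_8));
    }
}
```

With a real page, `new ByteArrayInputStream(page)` would be replaced by `url.openStream()`. Whether the console then shows the umlauts correctly also depends on the console's own encoding.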

若水微香 2024-11-17 16:08:23

You have to set the correct encoding. You can find the encoding in the HTTP header:

Content-Type: text/html; charset=ISO-8859-1

The (X)HTML document may declare its own encoding as well (for example in a <meta> tag); see HTML character encodings.
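
As a rough illustration of using that header, the sketch below pulls the charset out of a Content-Type value and falls back to a default when none is given. The class `ContentTypeCharset` and the method `charsetOf` are hypothetical names, and the parsing is deliberately simple; it does not look inside the document for a declared encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentTypeCharset {

    // Hypothetical helper: returns the charset named in a Content-Type value,
    // or the fallback when the parameter is missing or names an unknown charset.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for the "charset=" parameter.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(charsetOf("text/html", StandardCharsets.UTF_8));
    }
}
```

With `java.net.URLConnection`, the header value is available from `url.openConnection().getContentType()`, and the resulting `Charset` can be passed to the `InputStreamReader` constructor.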

I can imagine that you have to consider many additional issues to parse a web page without errors. There are, however, several HTTP client libraries available for Java, e.g. org.apache.httpcomponents. The code will look like this:

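import java.io.IOException;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

DefaultHttpClient httpclient = new DefaultHttpClient();
HttpGet httpGet = new HttpGet("http://www.spiegel.de");

try
{
  HttpResponse response = httpclient.execute(httpGet);
  HttpEntity entity = response.getEntity();
  if (entity != null)
  {
    // EntityUtils.toString decodes the body with the charset from the
    // response's Content-Type header when one is present.
    System.out.println(EntityUtils.toString(entity));
  }
}
catch (ClientProtocolException e) {e.printStackTrace();}
catch (IOException e) {e.printStackTrace();}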

This is the Maven artifact:

<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpclient</artifactId>
  <version>4.1.1</version>
  <type>jar</type>
  <scope>compile</scope>
</dependency>

花开雨落又逢春i 2024-11-17 16:08:23

Try setting a Charset.

new BufferedReader(new InputStreamReader(url.openStream(), Charset.forName("UTF-8") ));

小女人ら 2024-11-17 16:08:23

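First, verify that the font you are using supports the particular German characters you are trying to display. Many fonts do not include every character, and it is a real pain to hunt for other causes when the problem is simply a missing glyph.

If that is not the problem, then the character set used for your input or your output is wrong. The character set determines how the bytes that represent text map to characters. Java strings are UTF-16 internally, and the InputStreamReader in the posted code is created without a charset, so it decodes with the platform default. Check the input stream first.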
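
The "strange characters" from a wrong input charset can be reproduced in a few lines. This is only an illustrative sketch: the class name `MojibakeDemo` and its two methods are invented for the example, and it assumes the page bytes are UTF-8 while the reader decodes them as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encodes the text as UTF-8, then decodes the bytes with the wrong
    // charset (ISO-8859-1), which turns each byte into its own character.
    static String decodeWrong(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Encodes and decodes with the same charset, so the text survives.
    static String decodeRight(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e4"; // the letter a-umlaut, written as an escape
        // The two UTF-8 bytes 0xC3 0xA4 become two separate characters.
        System.out.println(decodeWrong(original).length());          // prints 2
        System.out.println(decodeRight(original).equals(original));  // prints true
    }
}
```

If the decoded string is correct in memory but still prints wrongly, the console encoding or the font is the next thing to check.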

About the author

┾廆蒐ゝ
