使用Java获取以下页面的源代码

发布于 2024-11-03 13:38:53 字数 1688 浏览 0 评论 0原文

我正在尝试获取以下页面的源代码： http://www.amazon.com/gp/offer-listing/082470732X/ref=dp_olp_0?ie=UTF8&redirect=true&condition=all （请注意，如果您单击该链接，亚马逊会将您带到另一个页面。要访问我有兴趣阅读的页面，请复制该链接并将其粘贴到浏览器中的空选项卡中。谢谢！）

通常使用 java。 NET API，我几乎可以毫无问题地获取大多数 URL 的源代码，但是对于上面的链接我什么也得不到。事实证明，连接生成的输入流是由 gzip 编码的，所以我尝试了以下操作：

URL url = new URL(urlString);
HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
InputStream is = urlConnection.getInputStream();
HttpURLConnection.setFollowRedirects(true);
urlConnection.setRequestProperty("Accept-Encoding", "gzip, deflate");
String encoding = urlConnection.getContentEncoding();
if (encoding != null && encoding.equalsIgnoreCase("gzip")) {
     is = new GZIPInputStream(is);
} else if (encoding != null && encoding.equalsIgnoreCase("deflate")) {
     is = new InflaterInputStream((is), new Inflater(true));
}

但是这次我确定性地得到以下错误：

java.io.EOFException
at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:249)
at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:239)
at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:142)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:67)
at domain.logic.ItemScraper.loadURL(ItemScraper.java:405)
at domain.logic.ItemScraper.main(ItemScraper.java:510)

有人能看到我的错误吗？还有其他方法可以阅读此特定页面吗？有人能解释一下为什么我的浏览器（firefox）可以读取它，但我无法使用 Java 读取源代码吗？

预先感谢，最诚挚的问候，

原文

I am trying to get the source code for the following page: http://www.amazon.com/gp/offer-listing/082470732X/ref=dp_olp_0?ie=UTF8&redirect=true&condition=all
(Please note that Amazon takes you to another page if you click on the link. To get to the page that I am interested in reading please copy the link and paste it to an empty tab in your browser. Thanks!)

Normally using java.net API, I can get the source code for most of the URLs with almost no problem, however for the above link I get nothing. It turned out that the input stream generated by the connection is encoded by gzip, so I tried the following:

URL url = new URL(urlString);
HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
InputStream is = urlConnection.getInputStream();
HttpURLConnection.setFollowRedirects(true);
urlConnection.setRequestProperty("Accept-Encoding", "gzip, deflate");
String encoding = urlConnection.getContentEncoding();
if (encoding != null && encoding.equalsIgnoreCase("gzip")) {
     is = new GZIPInputStream(is);
} else if (encoding != null && encoding.equalsIgnoreCase("deflate")) {
     is = new InflaterInputStream((is), new Inflater(true));
}

However this time I get the following error deterministically:

java.io.EOFException
at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:249)
at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:239)
at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:142)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:67)
at domain.logic.ItemScraper.loadURL(ItemScraper.java:405)
at domain.logic.ItemScraper.main(ItemScraper.java:510)

Can anybody see my mistake? Is there another way to read this particular page? Can somebody explain me why my browser (firefox) can read it, however I cannot read the source using Java?

Thanks in advance, best regards,

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

幸福％小乖 2024-11-10 13:38:53

而不是

is = new GZIPInputStream(is);

尝试

is = new GZIPInputStream(urlConnection.getInputStream());

至于EOFException，如果您添加

urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24");

它就会消失。

Instead of

is = new GZIPInputStream(is);

try

is = new GZIPInputStream(urlConnection.getInputStream());

As for the EOFException, if you add

urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24");

it would go away.

回复收藏 0 原文

水晶透心 2024-11-10 13:38:53

您可以使用标准 BufferedReader 读取给定 URL 的 Web 服务器的响应。

URLIn = new BufferedReader(new InputStreamReader(new URL(URLOrFilename).openStream()));

然后使用 ...

while ((incomingLine = URLIn.readLine()) != null) {
 ...
}

... 来获取响应。

You can use a standard BufferedReader to read the response of a webserver of a given URL.

URLIn = new BufferedReader(new InputStreamReader(new URL(URLOrFilename).openStream()));

Then use ...

while ((incomingLine = URLIn.readLine()) != null) {
 ...
}

... to get the response.

回复收藏 0 原文

~没有更多了~

关于作者

メ斷腸人バ

暂无简介

0 文章

0 评论

23 人气

关注发私信

lorenzathorton8

文章 0 评论 0

关注

Zero

文章 0 评论 0

关注

萧瑟寒风

文章 0 评论 0

关注

mylayout

文章 0 评论 0

关注

tkewei

文章 0 评论 0

关注

17818769742

文章 0 评论 0

友情链接

文江博客

使用Java获取以下页面的源代码

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

lorenzathorton8

Zero

萧瑟寒风

mylayout

tkewei

17818769742

友情链接

使用Java获取以下页面的源代码

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（2）

关于作者

相关话题

热门标签

推荐作者

lorenzathorton8

Zero

萧瑟寒风

mylayout

tkewei

17818769742

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。