Getting the source code of the following page with Java
I am trying to get the source code for the following page: http://www.amazon.com/gp/offer-listing/082470732X/ref=dp_olp_0?ie=UTF8&redirect=true&condition=all
(Please note that Amazon takes you to another page if you click the link. To reach the page I am interested in reading, please copy the link and paste it into an empty tab in your browser. Thanks!)
Normally, using the java.net API, I can get the source code of most URLs without any problem, but for the link above I get nothing. It turned out that the input stream returned by the connection is gzip-encoded, so I tried the following:
URL url = new URL(urlString);
HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
InputStream is = urlConnection.getInputStream();
HttpURLConnection.setFollowRedirects(true);
urlConnection.setRequestProperty("Accept-Encoding", "gzip, deflate");
String encoding = urlConnection.getContentEncoding();
if (encoding != null && encoding.equalsIgnoreCase("gzip")) {
    is = new GZIPInputStream(is);
} else if (encoding != null && encoding.equalsIgnoreCase("deflate")) {
    is = new InflaterInputStream(is, new Inflater(true));
}
This time, however, I consistently get the following error:
java.io.EOFException
	at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:249)
	at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:239)
	at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:142)
	at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
	at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:67)
	at domain.logic.ItemScraper.loadURL(ItemScraper.java:405)
	at domain.logic.ItemScraper.main(ItemScraper.java:510)
Can anybody see my mistake? Is there another way to read this particular page? And can somebody explain to me why my browser (Firefox) can read it, while I cannot read the source using Java?
Thanks in advance, best regards,
Instead of
…
try
…
As for the EOFException, if you add … it would go away.
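The answer's own snippets are not preserved here, so the following is only a hedged sketch of a common fix, not necessarily the one the answer gave. It rests on two documented behaviours: `URLConnection.setRequestProperty` throws `IllegalStateException` once the connection is open, and `getInputStream()` is what opens it, so every request setting has to come first. The static `HttpURLConnection.setFollowRedirects` only affects connections created afterwards; `setInstanceFollowRedirects` configures the current one. The `EOFException` in the trace is thrown while `GZIPInputStream` reads its header, which is what happens when the body is empty. The sketch therefore peeks one byte before wrapping. The class and method names (`PageFetcher`, `fetch`, `decode`) are hypothetical, and UTF-8 is assumed as the page charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

class PageFetcher {

    // Hypothetical helper: configure everything first, because getInputStream()
    // is the call that actually opens the connection.
    static String fetch(String urlString) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(urlString).toURL().openConnection();
        // Per-connection setting; the static setFollowRedirects() only applies
        // to connections created after it is called.
        conn.setInstanceFollowRedirects(true);
        conn.setRequestProperty("Accept-Encoding", "gzip, deflate");
        try (InputStream raw = conn.getInputStream()) {
            return decode(raw, conn.getContentEncoding());
        } finally {
            conn.disconnect();
        }
    }

    // Unwraps the body according to the Content-Encoding header (may be null).
    // Assumes UTF-8; a real client would read the charset from Content-Type.
    static String decode(InputStream raw, String encoding) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 1);
        int first = in.read();
        if (first == -1) {
            // Empty body: GZIPInputStream would throw EOFException reading the header.
            return "";
        }
        in.unread(first);
        InputStream body = in;
        if ("gzip".equalsIgnoreCase(encoding)) {
            body = new GZIPInputStream(in);
        } else if ("deflate".equalsIgnoreCase(encoding)) {
            // Raw deflate, as in the question; zlib-wrapped data needs new Inflater().
            body = new InflaterInputStream(in, new Inflater(true));
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream b = body) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = b.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Keeping `decode` separate from `fetch` means the decompression logic can be checked against in-memory bytes without any network access, e.g. `PageFetcher.decode(new ByteArrayInputStream(new byte[0]), "gzip")` returns an empty string instead of throwing.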
You can use a standard BufferedReader to read the web server's response for a given URL.
Then use
…
to get the response.
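The calls this answer refers to are not preserved, so here is a minimal sketch of the BufferedReader approach under stated assumptions: UTF-8 text and hypothetical names (`ResponseReader`, `readAll`, `fetch`). `BufferedReader` only turns bytes into lines of text; it does not undo `Content-Encoding`, so a gzip body still has to be unwrapped before it reaches the reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

class ResponseReader {

    // Reads an already-decoded text stream line by line; lines are re-joined with '\n'.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Hypothetical usage: unwrap gzip first, then hand the stream to readAll.
    // This sketch does not guard against an empty gzip body.
    static String fetch(String urlString) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(urlString).toURL().openConnection();
        InputStream in = conn.getInputStream();
        if ("gzip".equalsIgnoreCase(conn.getContentEncoding())) {
            in = new GZIPInputStream(in);
        }
        return readAll(in);
    }
}
```

Note that `readLine()` strips line terminators, so `\r\n` in the original page comes back as `\n`; if the exact bytes matter, copying the stream into a byte buffer is the safer choice.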