Java HttpURLConnection cutting off HTML
Hey, I'm trying to get the HTML from a Twitter profile page, but HttpURLConnection is only returning a small snippet of the HTML. My code:
for(int i = 0; i < urls.size(); i++)
{
    URL url = new URL(urls.get(i));
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setRequestProperty("User-Agent","Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6");
    System.out.println(connection.getResponseCode());
    String line;
    StringBuilder builder = new StringBuilder();
    BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
    while((line = reader.readLine()) != null)
    {
        builder.append(line);
    }
    String html = builder.toString();
}
I always get 200 as the response code for each call. However, the entire HTML document is returned only about 1/3 of the time; the rest of the time I get only the first few hundred lines. The amount returned when the HTML is cut off is not always the same.
Any ideas? Thanks for any help!
Additional info: After viewing the headers, it seems I'm getting duplicate Content-Length headers. The first is the full length; the other is much shorter (and probably representative of the length I'm getting some of the time). How can I handle duplicate headers?
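To make the duplicate-header observation concrete, here is a minimal diagnostic sketch. It only lists every Content-Length value the connection reports; it does not fix the truncation. It relies on HttpURLConnection.getHeaderFields(), which returns a map from header name to a list of values. The class name HeaderInspector and the helper valuesOf are hypothetical names chosen for illustration, and the URL is assumed to be passed as the first command-line argument.

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class HeaderInspector {

    // Collects every value reported for a header, matching the name case-insensitively.
    static List<String> valuesOf(Map<String, List<String>> headers, String name) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
            // The status line is stored under a null key, so guard against it.
            if (entry.getKey() != null && entry.getKey().equalsIgnoreCase(name)) {
                values.addAll(entry.getValue());
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the page to inspect is given as the first argument.
        URL url = new URL(args[0]);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        try {
            List<String> lengths = valuesOf(connection.getHeaderFields(), "Content-Length");
            System.out.println("Content-Length values: " + lengths);
            if (lengths.size() > 1) {
                System.out.println("More than one Content-Length header was reported.");
            }
        } finally {
            connection.disconnect();
        }
    }
}
```

If more than one value is printed, that would be consistent with the duplicate headers described above. Which of the values the client actually honours is not something this sketch determines.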
Comments (2)
This worked fine for me. I added a newline after
builder.append(line);
to make it more readable in the console, but other than that it returned all the HTML for this page.

Check out my HTTP class, based on this API. Feel free to change some stuff.
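The newline suggestion from the first comment can be sketched as a small helper that reads a stream line by line and restores the line breaks that readLine() strips. This is only an illustration: the class name BodyReader and the method readAll are hypothetical, and UTF-8 is assumed as the page encoding, which the original post does not state. It makes the output readable but is not claimed to cure the truncation.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class BodyReader {

    // Reads the whole stream as text, appending '\n' after each line
    // because BufferedReader.readLine() drops the line terminator.
    static String readAll(InputStream in) throws IOException {
        StringBuilder builder = new StringBuilder();
        // Assumption: the body is UTF-8; try-with-resources closes the reader.
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                builder.append(line).append('\n');
            }
        }
        return builder.toString();
    }

    public static void main(String[] args) throws IOException {
        // Offline sample input standing in for connection.getInputStream().
        InputStream sample = new ByteArrayInputStream(
            "<html>\n<body></body>\n</html>".getBytes(StandardCharsets.UTF_8));
        System.out.print(readAll(sample));
    }
}
```

With a live connection, the same helper would be called as BodyReader.readAll(connection.getInputStream()).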