How to load a large web page into a String

Posted 2024-12-03 09:21:24

I'm new to Java and Android, but not to programming or HTTP. This HTTP GET method, mostly copied from other examples that use the Apache HTTP classes, only retrieves the first few K of a large web page. I checked that the page has no lines longer than 8192 bytes (is that even a possible cause?), but for pages of around 40K I get back maybe 6K, maybe 20K. The number of bytes read does not seem to have a simple relationship with the total page size, with the page size modulo 8192, or with the page content.

Any ideas folks?

Thanks!

public static String myHttpGet(String url) throws Exception {
    BufferedReader in = null;
    try {
        HttpClient client = getHttpClient();
        HttpGet request = new HttpGet();
        request.setURI(new URI(url));
        HttpResponse response = client.execute(request);
        // No charset is passed to InputStreamReader, so the platform default decodes the body
        in = new BufferedReader(new InputStreamReader(response.getEntity().getContent()));

        StringBuilder sbuffer = new StringBuilder();
        String line;

        // Read line by line until the end of the stream
        while ((line = in.readLine()) != null) {
            sbuffer.append(line).append("\n");
        }
        return sbuffer.toString();
    } finally {
        // Close the reader whether or not the request and read succeeded
        if (in != null) {
            try {
                in.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
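On the 8192 question: that figure matches the default buffer size of BufferedReader in common JDK implementations, but readLine() is not limited by it and returns longer lines whole. The following is a minimal sketch to confirm that locally, using only java.io; the class name LongLineCheck and the method longestLine are hypothetical names chosen for illustration and are not part of the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class LongLineCheck {
    // Reads every line from the reader and returns the length of the longest one.
    static int longestLine(BufferedReader in) throws IOException {
        int longest = 0;
        String line;
        while ((line = in.readLine()) != null) {
            longest = Math.max(longest, line.length());
        }
        return longest;
    }

    public static void main(String[] args) throws IOException {
        // Build one 20000-character line, well past 8192 characters.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append('x');
        }
        String text = "short\n" + sb + "\nlast\n";

        BufferedReader in = new BufferedReader(new StringReader(text));
        System.out.println(longestLine(in)); // prints 20000
    }
}
```

If this prints the full length, long lines are unlikely to explain a truncated result, and the cause is more likely on the network or server side.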

Comments (2)

凉城 2024-12-10 09:21:24

There is no need to write your own HttpEntity-to-String code; try EntityUtils instead:

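// Uses the charset declared in the response's Content-Type header
// (falls back to ISO-8859-1 if the server declares none)
HttpEntity entity = response.getEntity();
String result = EntityUtils.toString(entity);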
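For comparison, a stream can also be drained into a String by hand without depending on line breaks. The following is a sketch of that general technique using only the standard library; it is not EntityUtils' actual implementation. The class name StreamToString and the method readFully are hypothetical, and the caller is assumed to know the charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class StreamToString {
    // Reads the stream to its end in fixed-size chunks, independent of line breaks.
    static String readFully(InputStream in, Charset charset) throws IOException {
        Reader reader = new InputStreamReader(in, charset);
        StringBuilder out = new StringBuilder();
        char[] chunk = new char[4096];
        int n;
        // read() returns -1 only at end of stream; a short read just means "keep going"
        while ((n = reader.read(chunk)) != -1) {
            out.append(chunk, 0, n);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a 40K page body; a real caller would pass the entity's content stream.
        byte[] page = new byte[40000];
        Arrays.fill(page, (byte) 'a');

        String s = readFully(new ByteArrayInputStream(page), StandardCharsets.UTF_8);
        System.out.println(s.length()); // prints 40000
    }
}
```

Unlike the readLine() loop, this keeps the original line terminators and does not append a trailing newline.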

蒗幽 2024-12-10 09:21:24

It looks as if the problem is specific to pages from a certain website whose name starts with Goo... I'm not having this problem with large pages from other sites, so the code is probably OK.
