HTTPS page gets truncated on Android

Posted on 2024-10-03 20:42:21

I am fetching a web page on Android using HTTPS (ignoring the certificate as it is both self-signed and outdated, as seen here - don't ask, it's not my server :)).

I've defined my own client class:

public class MyHttpClient extends DefaultHttpClient {


    public MyHttpClient() {
        super();
        final HttpParams params = getParams();
        HttpConnectionParams.setConnectionTimeout(params,
                REGISTRATION_TIMEOUT);
        HttpConnectionParams.setSoTimeout(params, REGISTRATION_TIMEOUT);
        ConnManagerParams.setTimeout(params, REGISTRATION_TIMEOUT);
    }

    @Override
    protected ClientConnectionManager createClientConnectionManager() {
        SchemeRegistry registry = new SchemeRegistry();
        registry.register(new Scheme("http", PlainSocketFactory
                .getSocketFactory(), 80));
        registry.register(new Scheme("https", new UnsecureSSLSocketFactory(), 443));
        return new SingleClientConnManager(getParams(), registry);
    }
}

where the UnsecureSSLSocketFactory mentioned is based on the suggestion given on the aforementioned topic.

I'm then using this class to fetch a page:

public class HTTPHelper {

    private final static String TAG = "HTTPHelper";
    private final static String CHARSET = "ISO-8859-1";

    public static final String USER_AGENT = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 (.NET CLR 3.5.30729)";
    public static final String ACCEPT_CHARSET = "ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    public static final String ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";


    /**
     * Sends an HTTP request
     * @param url
     * @param post
     * @return
     */
    public String sendRequest(String url, String post) throws ConnectionException {

        MyHttpClient httpclient = new MyHttpClient();

        HttpGet httpget = new HttpGet(url);
        httpget.addHeader("User-Agent", USER_AGENT);
        httpget.addHeader("Accept", ACCEPT);
        httpget.addHeader("Accept-Charset", ACCEPT_CHARSET);

        HttpResponse response;
        try {
            response = httpclient.execute(httpget);
        } catch (Exception e) {
            throw new ConnectionException(e.getMessage());
        }

        HttpEntity entity = response.getEntity();
        String pageSource; // declared here; the original snippet used it without a declaration

        try {
            pageSource = convertStreamToString(entity.getContent());
        } catch (Exception e) {
            throw new ConnectionException(e.getMessage());
        }
        finally {
            if (entity != null) {
                try {
                    entity.consumeContent();
                } catch (IOException e) {
                    throw new ConnectionException(e.getMessage());
                }
            }
        }

        httpclient.getConnectionManager().shutdown();
        return pageSource;

    }

    /**
     * Converts a stream to a string
     * @param is
     * @return
     */
    private static String convertStreamToString(InputStream is) 
    {
        try {
            BufferedReader reader = new BufferedReader(new InputStreamReader(is, CHARSET));
            StringBuilder stringBuilder = new StringBuilder();
            String line = null;
            try {
                while ((line = reader.readLine()) != null) {
                    stringBuilder.append(line + "\n");
                }
            } catch (IOException e) {
                Log.d(TAG, "Exception in convertStreamToString", e);
            } finally {
                try {
                    is.close();
                } catch (IOException e) {}
            }
            return stringBuilder.toString();
        } catch (Exception e) {
            throw new Error("Unsupported charset");
        }
    }

}

The page I get is truncated after about a hundred lines. It is cut off at a precise point, where a '_' (underscore) character is followed by an 'r' character. It's not the first underscore in the page.

I thought it might be an encoding issue, so I tried both UTF-8 and ISO-8859-1, but the page is still truncated. If I open the page with Firefox, it reports the encoding as ISO-8859-1.

In case you are wondering, the webpage is https://ricarichiamoci.dsu.pisa.it/
and it gets truncated at line 169,

function ChangeOffset(NewOffset) {
  document.mainForm.last

where it should instead be

function ChangeOffset(NewOffset) {
  document.mainForm.last_record.value = NewOffset;

Does anyone have an idea why the page is truncated?

Comments (1)

冰火雁神 2024-10-10 20:42:21

I figured out that the downloaded page is not truncated; the function I was using to print it (Log.d) truncates the string.

So the method that downloads the page source works fine, but Log.d() is probably not meant to print that much text.
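
One way to confirm this, and still see the whole page in logcat, is to log the string in pieces. The following is only a sketch: `LogChunker` and `chunk` are hypothetical names that do not appear in the question, and the chunk size is an assumption chosen to stay well below the per-entry limit of logcat, which is around 4 KB on typical devices as far as I know.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original code.
final class LogChunker {

    private LogChunker() {}

    /**
     * Splits text into consecutive pieces of at most maxLen characters,
     * so that each piece can be passed to a separate Log.d() call.
     */
    static List<String> chunk(String text, int maxLen) {
        if (maxLen <= 0) {
            throw new IllegalArgumentException("maxLen must be positive");
        }
        List<String> parts = new ArrayList<String>();
        for (int start = 0; start < text.length(); start += maxLen) {
            int end = Math.min(text.length(), start + maxLen);
            parts.add(text.substring(start, end));
        }
        return parts;
    }
}
```

On Android this could be used as `for (String part : LogChunker.chunk(pageSource, 3000)) Log.d(TAG, part);`, where 3000 is an assumed, conservative size. Logging `pageSource.length()` is an even quicker check that the download itself is complete.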
