How to connect via HTTPS using Jsoup?

Posted on 2024-12-09 12:20:17

It's working fine over HTTP, but when I try and use an HTTPS source it throws the following exception:

10-12 13:22:11.169: WARN/System.err(332): javax.net.ssl.SSLHandshakeException: java.security.cert.CertPathValidatorException: Trust anchor for certification path not found.
10-12 13:22:11.179: WARN/System.err(332):     at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:477)
10-12 13:22:11.179: WARN/System.err(332):     at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:328)
10-12 13:22:11.179: WARN/System.err(332):     at org.apache.harmony.luni.internal.net.www.protocol.http.HttpConnection.setupSecureSocket(HttpConnection.java:185)
10-12 13:22:11.179: WARN/System.err(332):     at org.apache.harmony.luni.internal.net.www.protocol.https.HttpsURLConnectionImpl$HttpsEngine.makeSslConnection(HttpsURLConnectionImpl.java:433)
10-12 13:22:11.189: WARN/System.err(332):     at org.apache.harmony.luni.internal.net.www.protocol.https.HttpsURLConnectionImpl$HttpsEngine.makeConnection(HttpsURLConnectionImpl.java:378)
10-12 13:22:11.189: WARN/System.err(332):     at org.apache.harmony.luni.internal.net.www.protocol.http.HttpURLConnectionImpl.connect(HttpURLConnectionImpl.java:205)
10-12 13:22:11.189: WARN/System.err(332):     at org.apache.harmony.luni.internal.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:152)
10-12 13:22:11.189: WARN/System.err(332):     at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:377)
10-12 13:22:11.189: WARN/System.err(332):     at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:364)
10-12 13:22:11.189: WARN/System.err(332):     at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:143)

Here's the relevant code:

try {
    doc = Jsoup.connect("https url here").get();
} catch (IOException e) {
    Log.e("sys", "couldn't get the HTML");
    e.printStackTrace();
}


Comments (10)

软糖 2024-12-16 12:20:18

If you want to do it the right way, and/or you need to deal with only one site, then you basically need to grab the SSL certificate of the website in question and import it into your Java key store. This results in a JKS file which you in turn set as the SSL trust store before using Jsoup (or java.net.URLConnection).

You can grab the certificate from your web browser's store. Let's assume that you're using Firefox.

  1. Go to the website in question using Firefox, which is in your case https://web2.uconn.edu/driver/old/timepoints.php?stopid=10
  2. On the left of the address bar you'll see "uconn.edu" in blue (this indicates a valid SSL certificate).
  3. Click on it for details and then click on the More information button.
  4. In the security dialogue which appears, click the View Certificate button.
  5. In the certificate panel which appears, go to the Details tab.
  6. Click the deepest item of the certificate hierarchy, which is in this case "web2.uconn.edu" and finally click the Export button.

You now have a web2.uconn.edu.crt file.

Next, open the command prompt and import it into the Java key store using the keytool command (it's part of the JRE):

keytool -import -v -file /path/to/web2.uconn.edu.crt -keystore /path/to/web2.uconn.edu.jks -storepass drowssap

The -file option must point to the location of the .crt file which you just downloaded. The -keystore option must point to the location of the generated .jks file (which you in turn want to set as the SSL trust store). The -storepass option is required; you can enter whatever password you want as long as it's at least 6 characters.

You now have a web2.uconn.edu.jks file. You can finally set it as the SSL trust store before connecting, as follows:

System.setProperty("javax.net.ssl.trustStore", "/path/to/web2.uconn.edu.jks");
Document document = Jsoup.connect("https://web2.uconn.edu/driver/old/timepoints.php?stopid=10").get();
// ...

As a completely different alternative, particularly when you need to deal with multiple sites (i.e. you're creating a world wide web crawler), then you can also instruct Jsoup (basically, java.net.URLConnection) to blindly trust all SSL certificates. See also section "Dealing with untrusted or misconfigured HTTPS sites" at the very bottom of this answer: Using java.net.URLConnection to fire and handle HTTP requests
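If changing the JVM-wide javax.net.ssl.trustStore property is undesirable, the same key store file could in principle be loaded for a single connection instead. The following is only a minimal sketch under assumptions: the class name TrustStoreSockets and the method fromKeyStoreFile are hypothetical helpers (not part of Jsoup or the JDK), and it assumes the JVM's default key store type can read the file that keytool produced.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper for illustration; not part of Jsoup or the original answer.
final class TrustStoreSockets {
    private TrustStoreSockets() {}

    /** Builds a socket factory that trusts only the certificates stored in the given key store file. */
    static SSLSocketFactory fromKeyStoreFile(Path keyStoreFile, char[] password) {
        try (InputStream in = Files.newInputStream(keyStoreFile)) {
            // Assumption: the default key store type can read the file created by keytool.
            KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
            trustStore.load(in, password);

            // Derive trust managers from the certificates in that key store.
            TrustManagerFactory tmf =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init(trustStore);

            // No client key managers; default source of randomness.
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, tmf.getTrustManagers(), null);
            return context.getSocketFactory();
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException("Could not load trust store " + keyStoreFile, e);
        }
    }
}
```

With Jsoup, the returned factory could then be passed to one connection through .sslSocketFactory(...), the call another answer in this thread uses, so that certificate validation stays enabled for everything else in the JVM.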

稚气少女 2024-12-16 12:20:18

In my case, all I needed to do was add .validateTLSCertificates(false) to my connection:

Document doc = Jsoup.connect(httpsURLAsString)
        .timeout(60000)
        .validateTLSCertificates(false)
        .get();

I also had to increase the read timeout, but I think that is unrelated.

初懵 2024-12-16 12:20:18

I stumbled over the answers here and in the linked question in my search and want to add two pieces of information, as the accepted answer doesn't fit my quite similar scenario, but there is an additional solution that fits even in that case (cert and hostname don't match for test systems).

  1. There is a GitHub pull request to add such functionality, so perhaps the problem will be solved soon: https://github.com/jhy/jsoup/pull/343
    Edit: the GitHub request was resolved, and the method to disable certificate validation is validateTLSCertificates(boolean validate).
  2. Based on http://www.nakov.com/blog/2009/07/16/disable-certificate-validation-in-java-ssl-connections/ I found a solution which seems to work (at least in my scenario where jsoup 1.7.3 is called as part of a maven task). I wrapped it in a method disableSSLCertCheck() that I call before the very first Jsoup.connect().

Before you use this method, be sure you understand what it does: skipping SSL certificate checks is a really stupid thing to do. Always use correct SSL certificates, signed by a commonly accepted CA, for your servers. If you can't afford a commonly accepted CA, still use correct SSL certificates and follow @BalusC's accepted answer above. If you can't configure correct SSL certificates (which should never be the case in production environments), the following method could work:

    private void disableSSLCertCheck() throws NoSuchAlgorithmException, KeyManagementException {
        // Create a trust manager that does not validate certificate chains
        TrustManager[] trustAllCerts = new TrustManager[] {
            new X509TrustManager() {
                public X509Certificate[] getAcceptedIssuers() {
                    return null;
                }
                public void checkClientTrusted(X509Certificate[] certs, String authType) {
                }
                public void checkServerTrusted(X509Certificate[] certs, String authType) {
                }
            }
        };

        // Install the all-trusting trust manager
        SSLContext sc = SSLContext.getInstance("SSL");
        sc.init(null, trustAllCerts, new java.security.SecureRandom());
        HttpsURLConnection.setDefaultSSLSocketFactory(sc.getSocketFactory());

        // Create all-trusting host name verifier
        HostnameVerifier allHostsValid = new HostnameVerifier() {
            public boolean verify(String hostname, SSLSession session) {
                return true;
            }
        };

        // Install the all-trusting host verifier
        HttpsURLConnection.setDefaultHostnameVerifier(allHostsValid);
    }

雨夜星沙 2024-12-16 12:20:18

To suppress certificate warnings for a specific Jsoup connection, you can use the following approach:

Kotlin


val document = Jsoup.connect("url")
        .sslSocketFactory(socketFactory())
        .get()


private fun socketFactory(): SSLSocketFactory {
    val trustAllCerts = arrayOf<TrustManager>(object : X509TrustManager {
        @Throws(CertificateException::class)
        override fun checkClientTrusted(chain: Array<X509Certificate>, authType: String) {
        }

        @Throws(CertificateException::class)
        override fun checkServerTrusted(chain: Array<X509Certificate>, authType: String) {
        }

        override fun getAcceptedIssuers(): Array<X509Certificate> {
            return arrayOf()
        }
    })

    try {
        val sslContext = SSLContext.getInstance("TLS")
        sslContext.init(null, trustAllCerts, java.security.SecureRandom())
        return sslContext.socketFactory
    } catch (e: Exception) {
        when (e) {
            // Same two exceptions the Java version below catches
            is NoSuchAlgorithmException, is KeyManagementException -> {
                throw RuntimeException("Failed to create a SSL socket factory", e)
            }
            else -> throw e
        }
    }
}

Java



 Document document = Jsoup.connect("url")
        .sslSocketFactory(socketFactory())
        .get();


  private SSLSocketFactory socketFactory() {
    TrustManager[] trustAllCerts = new TrustManager[]{new X509TrustManager() {
      public java.security.cert.X509Certificate[] getAcceptedIssuers() {
        return null;
      }

      public void checkClientTrusted(X509Certificate[] certs, String authType) {
      }

      public void checkServerTrusted(X509Certificate[] certs, String authType) {
      }
    }};

    try {
      SSLContext sslContext = SSLContext.getInstance("TLS");
      sslContext.init(null, trustAllCerts, new java.security.SecureRandom());
      return sslContext.getSocketFactory();
    } catch (NoSuchAlgorithmException | KeyManagementException e) {
      throw new RuntimeException("Failed to create a SSL socket factory", e);
    }
  }

NB: As mentioned before, ignoring certificates is not a good idea.

风透绣罗衣 2024-12-16 12:20:18

I've had the same problem but took the lazy route - tell your app to ignore the cert and carry on anyway.

I got the code from here: How do I use a local HTTPS URL in java?

You'll have to import these classes for it to work:

import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSession;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

Just run that method somewhere before you try to make the connection and voila, it just trusts the cert no matter what. Of course this isn't any help if you actually want to make sure the cert is real, but good for monitoring your own internal websites etc.

扮仙女 2024-12-16 12:20:18

After testing the solutions here, I found that, strangely, the sslSocketFactory setting in Jsoup did nothing for me and never worked. So there is no need to get and set an SSLSocketFactory.

Actually, the second half of Mori's solution works. You just need the following before using Jsoup:

// Create all-trusting host name verifier
HostnameVerifier allHostsValid = new HostnameVerifier() {
    public boolean verify(String hostname, SSLSession session) {
        return true;
    }
};

// Install the all-trusting host verifier
HttpsURLConnection.setDefaultHostnameVerifier(allHostsValid);

This is tested with Jsoup 1.13.1.
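A verifier that returns true for every host accepts any certificate/host name mismatch. As a narrower sketch, assuming the mismatch only concerns a handful of known test hosts, the verifier could accept just those names. The class name AllowListHostnameVerifier and the host names in the usage note are hypothetical. HttpsURLConnection consults the installed verifier only when its own host name check fails, so returning false here keeps rejecting unexpected mismatches.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLSession;

// Hypothetical example class; accepts a host name mismatch only for explicitly listed hosts.
final class AllowListHostnameVerifier implements HostnameVerifier {
    private final Set<String> allowedHosts = new HashSet<>();

    AllowListHostnameVerifier(String... hosts) {
        for (String host : hosts) {
            // Host names are case-insensitive, so store them lower-cased.
            allowedHosts.add(host.toLowerCase(Locale.ROOT));
        }
    }

    @Override
    public boolean verify(String hostname, SSLSession session) {
        // Called only after the default host name check has failed.
        return hostname != null && allowedHosts.contains(hostname.toLowerCase(Locale.ROOT));
    }
}
```

It would be installed the same way as above, for example HttpsURLConnection.setDefaultHostnameVerifier(new AllowListHostnameVerifier("dev.internal.example")), where the host name is a placeholder.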

幻梦 2024-12-16 12:20:18

I'm no expert in this field but I ran into a similar exception when trying to connect to a website over HTTPS using java.net APIs. The browser does a lot of work for you regarding SSL certificates when you visit a site using HTTPS. However, when you are manually connecting to sites (using HTTP requests manually), all that work still needs to be done. Now I don't know what all this work is exactly, but it has to do with downloading certificates and putting them where Java can find them. Here's a link that will hopefully point you in the right direction.

http://confluence.atlassian.com/display/JIRA/Connecting+to+SSL+services
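To see what Java can currently "find", one option is to list the certificate authorities in the JVM's default trust store. This is only a diagnostic sketch; the class name DefaultTrustAnchors is hypothetical. A "Trust anchor for certification path not found" error suggests the server's chain does not lead to any of these entries.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

// Hypothetical diagnostic helper; uses only standard JDK classes.
final class DefaultTrustAnchors {
    private DefaultTrustAnchors() {}

    /** Returns the CA certificates the default trust manager accepts as issuers. */
    static X509Certificate[] list() {
        try {
            TrustManagerFactory tmf =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            // A null key store tells the factory to use the platform's default trust store.
            tmf.init((KeyStore) null);
            for (TrustManager tm : tmf.getTrustManagers()) {
                if (tm instanceof X509TrustManager) {
                    return ((X509TrustManager) tm).getAcceptedIssuers();
                }
            }
            return new X509Certificate[0];
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not read the default trust store", e);
        }
    }
}
```

Printing cert.getSubjectX500Principal() for each entry of DefaultTrustAnchors.list() shows which authorities the runtime trusts before any custom trust store is added.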

黎歌 2024-12-16 12:20:18

I was facing the same issue with Jsoup: I was not able to connect and get the document for https URLs. When I changed my JDK version from 1.7 to 1.8, the issue was resolved.

It may help you :)
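One plausible explanation, offered as an assumption rather than a diagnosis of this particular case, is that a Java 7 client does not enable TLS 1.1/1.2 by default while Java 8 does, so servers that require newer protocol versions reject the handshake. A small check of what the running JVM enables by default (the class name EnabledTlsProtocols is hypothetical):

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;

// Hypothetical diagnostic helper; reports the protocol versions enabled by default for new connections.
final class EnabledTlsProtocols {
    private EnabledTlsProtocols() {}

    static String[] defaults() {
        try {
            return SSLContext.getDefault().getDefaultSSLParameters().getProtocols();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }
}
```

Running System.out.println(String.join(", ", EnabledTlsProtocols.defaults())) on both JDKs would show whether the set of enabled protocols differs between them.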

巨坚强 2024-12-16 12:20:18

I've had that problem only in the dev environment. The fix was simply to add a few flags to the VM to ignore SSL:

-Ddeployment.security.TLSv1.1=false
-Ddeployment.security.TLSv1.2=false

呢古 2024-12-16 12:20:18

Try the following (just put it before Jsoup.connect("https://example.com")):

    Authenticator.setDefault(new Authenticator() {
        @Override
        protected PasswordAuthentication getPasswordAuthentication() {
            return new PasswordAuthentication(username, password.toCharArray());
        }
    });