How to connect via HTTPS using Jsoup?
It works fine over HTTP, but when I try an HTTPS source it throws the following exception:
10-12 13:22:11.169: WARN/System.err(332): javax.net.ssl.SSLHandshakeException: java.security.cert.CertPathValidatorException: Trust anchor for certification path not found.
10-12 13:22:11.179: WARN/System.err(332): at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:477)
10-12 13:22:11.179: WARN/System.err(332): at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:328)
10-12 13:22:11.179: WARN/System.err(332): at org.apache.harmony.luni.internal.net.www.protocol.http.HttpConnection.setupSecureSocket(HttpConnection.java:185)
10-12 13:22:11.179: WARN/System.err(332): at org.apache.harmony.luni.internal.net.www.protocol.https.HttpsURLConnectionImpl$HttpsEngine.makeSslConnection(HttpsURLConnectionImpl.java:433)
10-12 13:22:11.189: WARN/System.err(332): at org.apache.harmony.luni.internal.net.www.protocol.https.HttpsURLConnectionImpl$HttpsEngine.makeConnection(HttpsURLConnectionImpl.java:378)
10-12 13:22:11.189: WARN/System.err(332): at org.apache.harmony.luni.internal.net.www.protocol.http.HttpURLConnectionImpl.connect(HttpURLConnectionImpl.java:205)
10-12 13:22:11.189: WARN/System.err(332): at org.apache.harmony.luni.internal.net.www.protocol.https.HttpsURLConnectionImpl.connect(HttpsURLConnectionImpl.java:152)
10-12 13:22:11.189: WARN/System.err(332): at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:377)
10-12 13:22:11.189: WARN/System.err(332): at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:364)
10-12 13:22:11.189: WARN/System.err(332): at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:143)
Here's the relevant code:
try {
    doc = Jsoup.connect("https url here").get();
} catch (IOException e) {
    Log.e("sys", "couldn't get the html");
    e.printStackTrace();
}

If you want to do it the right way, and/or you only need to deal with one site, then you basically need to grab the SSL certificate of the website in question and import it into your Java key store. This results in a JKS file which you in turn set as the SSL trust store before using Jsoup (or java.net.URLConnection).

You can grab the certificate from your web browser's certificate store. Let's assume that you're using Firefox: export the site's certificate from there, and you have a web2.uconn.edu.crt file.

Next, open the command prompt and import it into the Java key store using the keytool command (it's part of the JRE), with these options:

-file must point to the location of the .crt file which you just downloaded.
-keystore must point to the location of the generated .jks file (which you in turn want to set as the SSL trust store).
-storepass is required; you can enter whatever password you want as long as it's at least 6 characters.

Now you have a web2.uconn.edu.jks file. You can finally set it as the SSL trust store before connecting.

As a completely different alternative, particularly when you need to deal with multiple sites (i.e. you're creating a world wide web crawler), you can also instruct Jsoup (basically, java.net.URLConnection) to blindly trust all SSL certificates. See also the section "Dealing with untrusted or misconfigured HTTPS sites" at the very bottom of this answer: Using java.net.URLConnection to fire and handle HTTP requests
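
For the trust store route described in the answer above, the final step could look roughly like the sketch below. It relies only on the standard JSSE system properties `javax.net.ssl.trustStore` and `javax.net.ssl.trustStorePassword`. The class name, the method name, the path and the password are placeholders chosen for illustration, not values from the original answer. Note also that the question's stack trace comes from Android, where the JKS key store type is generally not available, so this sketch assumes a desktop or server JVM.

```java
class TrustStoreSetup {

    // Points the JVM-wide default trust store at a JKS file.
    // Call this before the first HTTPS connection is opened, because the
    // default SSL context reads these properties only once.
    static void useTrustStore(String jksPath, String storePassword) {
        System.setProperty("javax.net.ssl.trustStore", jksPath);
        System.setProperty("javax.net.ssl.trustStorePassword", storePassword);
    }

    public static void main(String[] args) {
        // Hypothetical path and password; substitute the values given to keytool.
        useTrustStore("/path/to/web2.uconn.edu.jks", "changeit");
        // From here on, a call such as Jsoup.connect("https://...").get() would
        // validate the server against the certificates in that file.
        System.out.println(System.getProperty("javax.net.ssl.trustStore"));
    }
}
```

Because the properties are global, every HTTPS connection in the process then trusts only what is in that file, which is usually what you want when you talk to a single site.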

In my case, all I needed to do was add .validateTLSCertificates(false) to my connection.
I also had to increase the read timeout, but I think that is unrelated.

I stumbled over the answers here and in the linked question during my search and want to add two pieces of information, as the accepted answer doesn't fit my quite similar scenario, but there is an additional solution that fits even in that case (certificate and host name don't match on test systems).
Edit: the GitHub request was resolved, and the method to disable certificate validation is: validateTLSCertificates(boolean validate)
In my case I use a method, disableSSLCertCheck(), that I call before the very first Jsoup.connect().
Before you use this method, you should be really sure that you understand what you are doing: not checking SSL certificates is a really stupid thing. Always use correct SSL certificates for your servers, signed by a commonly accepted CA. If you can't afford a commonly accepted CA, use correct SSL certificates nevertheless, together with @BalusC's accepted answer above. If you can't configure correct SSL certificates (which should never be the case in production environments), a method that disables the check could work.
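
The body of disableSSLCertCheck() did not survive in this copy of the answer. A minimal sketch of what such a method commonly looks like is shown here, using only standard javax.net.ssl classes. It is a reconstruction under that assumption, not the answerer's exact code, and the class name is made up for illustration. It only affects jsoup versions that go through HttpsURLConnection.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

class InsecureSslDefaults {

    // WARNING: disables certificate and host name checks for every
    // HttpsURLConnection in the JVM. Intended for test systems only.
    static void disableSSLCertCheck() throws GeneralSecurityException {
        // Trust manager that accepts any certificate chain without checking it.
        TrustManager[] trustAllCerts = { new X509TrustManager() {
            @Override public void checkClientTrusted(X509Certificate[] chain, String authType) { }
            @Override public void checkServerTrusted(X509Certificate[] chain, String authType) { }
            @Override public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
        } };

        // Install it as the default socket factory for HTTPS connections.
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, trustAllCerts, new SecureRandom());
        HttpsURLConnection.setDefaultSSLSocketFactory(context.getSocketFactory());

        // Accept any host name, which covers the cert/host name mismatch case.
        HostnameVerifier allHostsValid = (hostname, session) -> true;
        HttpsURLConnection.setDefaultHostnameVerifier(allHostsValid);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        disableSSLCertCheck();
        // Any later Jsoup.connect(...) call made through HttpsURLConnection
        // would now skip certificate and host name validation.
    }
}
```

The host name verifier matters as much as the trust manager here: the scenario in this answer is a certificate whose host name doesn't match, and that check is separate from chain validation.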

To suppress certificate warnings for one specific Jsoup connection, you can configure that connection alone rather than the whole JVM, in either Kotlin or Java.
NB: as mentioned before, ignoring certificates is not a good idea.
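
The Kotlin and Java snippets are missing from this copy. One way to scope the workaround to a single request is to build a dedicated SSLSocketFactory and hand it to jsoup's Connection.sslSocketFactory(...), which exists in the jsoup versions that provide it (newer releases, I believe, no longer offer validateTLSCertificates). The sketch below builds such a factory with JDK classes only. The class and method names are hypothetical, and the jsoup call appears only in a comment because it needs jsoup on the classpath.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

class PerConnectionTrust {

    // Builds a socket factory whose trust manager accepts any certificate chain.
    // INSECURE: meant for test systems only. JVM-wide defaults are left untouched.
    static SSLSocketFactory trustAllSocketFactory() throws GeneralSecurityException {
        TrustManager[] trustAll = { new X509TrustManager() {
            @Override public void checkClientTrusted(X509Certificate[] chain, String authType) { }
            @Override public void checkServerTrusted(X509Certificate[] chain, String authType) { }
            @Override public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
        } };
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, trustAll, new SecureRandom());
        return context.getSocketFactory();
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SSLSocketFactory factory = trustAllSocketFactory();
        // With jsoup on the classpath, the factory would be attached to one request:
        // Document doc = Jsoup.connect("https://example.com").sslSocketFactory(factory).get();
        System.out.println(factory != null);
    }
}
```

A per-connection socket factory changes only certificate chain validation. Host name verification is a separate check, so a certificate whose host name doesn't match may still be rejected with this approach.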

I've had the same problem but took the lazy route: tell your app to ignore the cert and carry on anyway.
I got the code from here: How do I use a local HTTPS URL in java?
You'll have to import the classes that the method uses for it to work.
Just run that method somewhere before you try to make the connection and voila, it trusts the cert no matter what. Of course this isn't any help if you actually want to make sure the cert is real, but it's good for monitoring your own internal websites etc.

After testing the solutions here, I found it strange that the sslSocketFactory setting in Jsoup is completely useless and never works. So there is no need to get and set an SSLSocketFactory.
Actually, the second half of Mori's solution works; only that part needs to run before using Jsoup.
This was tested with Jsoup 1.13.1.

I'm no expert in this field, but I ran into a similar exception when trying to connect to a website over HTTPS using the java.net APIs. The browser does a lot of work for you regarding SSL certificates when you visit a site using HTTPS. However, when you connect to sites manually (issuing the HTTP requests yourself), all that work still needs to be done. I don't know exactly what all this work is, but it has to do with downloading certificates and putting them where Java can find them. Here's a link that will hopefully point you in the right direction.
http://confluence.atlassian.com/display/JIRA/Connecting+to+SSL+services

I was facing the same issue with Jsoup: I was not able to connect and get the document for https URLs, but when I changed my JDK version from 1.7 to 1.8, the issue was resolved.
It may help you :)

I've had that problem only in the dev environment. The solution was simply to add a few flags to the VM so that it ignores SSL checks.

Try the following (just put it before Jsoup.connect("https://example.com")):