How to resolve "Error: unable to verify the first certificate" when scrapping a web page using cheerio?
I am trying to learn web scrapping using cheerio. But when I try to scrap the content of one of the sites, I get the following error:
Error: unable to verify the first certificate
at TLSSocket.onConnectSecure (_tls_wrap.js:1515:34)
at TLSSocket.emit (events.js:400:28)
at TLSSocket._finishInit (_tls_wrap.js:937:8)
at TLSWrap.ssl.onhandshakedone (_tls_wrap.js:709:12) {
code: 'UNABLE_TO_VERIFY_LEAF_SIGNATURE',
I am unable to understand this, as I did not get the error for the other sites which I tried to scrap.
Here is my code:
const express = require('express');
const axios = require('axios');
const cheerio = require('cheerio');

const app = express();
const url = "--------------Link Of the site--------------";

// Fetch the page, then collect the text and href of every matching link.
axios.get(url)
  .then(response => {
    const html = response.data;
    const $ = cheerio.load(html);
    const articles = [];
    // The document is already loaded into $, so no context argument is needed.
    $('.text-left a').each(function () {
      const title = $(this).text();
      const href = $(this).attr('href');
      articles.push({ title, url: href });
    });
    console.log(articles);
  })
  .catch(err => {
    console.log(err);
  });

app.listen(8080, () => console.log('Server running'));
Please guide me on how to resolve this error.
Comments (1)
The site on which you get the error is providing an invalid certificate (either the cert itself is wrongly created or damaged, or it doesn't match its claimed issuer); the other sites are providing valid certificates so they don't get this error and instead work correctly.
If the site is publicly accessible (on 443), try using https://www.ssllabs.com/ssltest to get a pretty thorough analysis with little effort. (It will actually check and display a lot of things about the site's security, not only the certificate, but you can ignore the other parts.)
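Otherwise, or if that doesn't help enough, and assuming you have or can get OpenSSL, use it to connect to the site and display the certificates the server sends in the handshake. Then split apart those certificates (the blocks from
-----BEGIN CERTIFICATE-----
to
-----END CERTIFICATE-----
) into separate files. For at least the first two, display their contents and determine if they appear to chain correctly (first.issuer = second.subject, and first.AKI = second.SKI if present, where AKI and SKI are the Authority Key Identifier and Subject Key Identifier extensions), and also test whether the first verifies against the second.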
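The splitting and chain check can also be sketched in Node.js, the language of the question's code, instead of with OpenSSL itself. This is only a rough illustration: it assumes Node 15.6 or later (for crypto.X509Certificate) and that the handshake output has already been saved as text. The function names splitPemCertificates and describeLink are made up for this example and are not part of the original answer.

```javascript
const { X509Certificate } = require('crypto');

// Pull every PEM certificate block out of a larger text dump,
// in the order the blocks appear (the leaf certificate normally comes first).
function splitPemCertificates(text) {
  const pattern = /-----BEGIN CERTIFICATE-----[\s\S]*?-----END CERTIFICATE-----/g;
  return text.match(pattern) || [];
}

// Report whether the certificate in `firstPem` appears to chain to `secondPem`.
// Both arguments are PEM strings such as those returned by splitPemCertificates.
function describeLink(firstPem, secondPem) {
  const first = new X509Certificate(firstPem);
  const second = new X509Certificate(secondPem);
  return {
    // Plain comparison of the printed distinguished names.
    namesMatch: first.issuer === second.subject,
    // Node's built-in check that `first` was issued by `second`.
    issuedBy: first.checkIssued(second),
    // Signature check of `first` against the public key of `second`.
    signatureValid: first.verify(second.publicKey),
  };
}
```

With the handshake output saved to a file (say a hypothetical handshake.txt), reading it with fs.readFileSync, passing the text to splitPemCertificates, and then calling describeLink on the first two blocks gives a quick yes/no view of the same issuer/subject comparison described above. If the server sent only one certificate, there is no second block to compare against, which is itself a hint that an intermediate certificate is missing.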
PS: you mean scrape/scraping (in context, obtaining information from) not scrap/scrapping (discarding as defective or unusable).