How to resolve "Error: unable to verify the first certificate" while scrapping a website using cheerio?

Posted 2025-01-10 03:39:38

I am trying to learn web scrapping using cheerio. But when I try to scrap the content of one of the sites, I get the following error:

Error: unable to verify the first certificate
    at TLSSocket.onConnectSecure (_tls_wrap.js:1515:34)
    at TLSSocket.emit (events.js:400:28)
    at TLSSocket._finishInit (_tls_wrap.js:937:8)
    at TLSWrap.ssl.onhandshakedone (_tls_wrap.js:709:12) {
  code: 'UNABLE_TO_VERIFY_LEAF_SIGNATURE',

I don't understand why, because the other sites I tried to scrap did not give this error.

Here is my code:

const express = require('express');
const axios = require('axios');
const cheerio = require('cheerio');

const app = express();
const url = "--------------Link Of the site--------------";

// Fetch the page, then collect the text and href of every matching link.
axios.get(url)
    .then(response => {
        const html = response.data;
        const $ = cheerio.load(html);
        const articles = [];

        // The document is already loaded into $, so no context argument is needed.
        $('.text-left a').each(function () {
            const title = $(this).text();
            const href = $(this).attr('href');

            articles.push({ title, url: href });
        });

        console.log(articles);
    })
    .catch(err => {
        console.log(err);
    });

app.listen(8080, () => console.log('Server running'));


Please guide me on how to resolve this error.

Comments (1)

白馒头 2025-01-17 03:39:38

The site on which you get the error is providing an invalid certificate (either the cert itself is wrongly created or damaged, or it doesn't match its claimed issuer); the other sites are providing valid certificates so they don't get this error and instead work correctly.

If the site is publicly accessible (on port 443), try https://www.ssllabs.com/ssltest to get a fairly thorough analysis with little effort. (It checks and displays many things about the site's security, not only the certificate, but you can ignore the other parts.)

Otherwise, or if that doesn't help enough, assuming you have (or can get) OpenSSL, do

openssl s_client -connect host:port -servername host -showcerts
# on OpenSSL 1.1.1 and later you can omit -servername host

then split apart the certificates in the handshake (the blocks from -----BEGIN CERTIFICATE----- to -----END CERTIFICATE-----) into separate files and for at least the first two do

openssl x509 -in fileN -text

and determine if they appear to chain correctly (first.issuer = second.subject and first.AKI = second.SKI if present) and also test

openssl verify -CAfile file2 -partial_chain file1
# assuming OpenSSL 1.0.2 or later; if yours is older, post the details, as this will be much harder
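
If you would rather stay in Node, the splitting and chain-checking steps above can be roughly sketched with the built-in crypto module. This is only an illustration, not a replacement for the openssl commands: the function names (splitPemCertificates, checkLink, checkChain) are made up for this example, the input is assumed to be the saved text output of s_client -showcerts, and crypto.X509Certificate needs Node 15.6 or later.

```javascript
const { X509Certificate } = require('crypto');

// Pull every PEM certificate block out of a larger text,
// e.g. the saved output of `openssl s_client ... -showcerts`.
function splitPemCertificates(text) {
  const pemBlock = /-----BEGIN CERTIFICATE-----[\s\S]*?-----END CERTIFICATE-----/g;
  return text.match(pemBlock) || [];
}

// Compare one certificate with the certificate that supposedly issued it.
// Both arguments are PEM strings, e.g. the contents of file1 and file2.
function checkLink(childPem, issuerPem) {
  const child = new X509Certificate(childPem);
  const issuer = new X509Certificate(issuerPem);
  return {
    // first.issuer = second.subject
    namesMatch: child.issuer === issuer.subject,
    // true only if the child's signature verifies with the issuer's public key
    signatureValid: child.verify(issuer.publicKey),
    // Node's built-in check of whether `child` was issued by `issuer`
    issuedBy: child.checkIssued(issuer),
  };
}

// Check each adjacent pair in the order the server sent the certificates.
function checkChain(text) {
  const pems = splitPemCertificates(text);
  const results = [];
  for (let i = 0; i + 1 < pems.length; i++) {
    results.push(checkLink(pems[i], pems[i + 1]));
  }
  return results;
}
```

An empty result from checkChain means the server sent fewer than two certificates, which by itself suggests the chain may be incomplete; any false value in a result marks the pair that does not link up.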

PS: you mean scrape/scraping (in context, obtaining information from) not scrap/scrapping (discarding as defective or unusable).
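
If the checks above show that the server's own certificate is fine and it merely fails to send an intermediate certificate, one common client-side workaround (an assumption about your case, not something the diagnosis above establishes) is to hand Node that intermediate yourself. A minimal sketch, where agentWithExtraCa and the file name intermediate.pem are hypothetical:

```javascript
const https = require('https');
const tls = require('tls');

// Build an HTTPS agent that trusts Node's bundled root CAs plus one extra certificate.
// `intermediatePem` is assumed to be the missing intermediate certificate in PEM form.
function agentWithExtraCa(intermediatePem) {
  return new https.Agent({
    // Setting `ca` replaces the default list, so the bundled roots are added back explicitly.
    ca: [...tls.rootCertificates, intermediatePem],
  });
}

// Hypothetical usage with the code from the question:
// const fs = require('fs');
// const httpsAgent = agentWithExtraCa(fs.readFileSync('intermediate.pem', 'utf8'));
// axios.get(url, { httpsAgent }).then(/* same handler as before */);
```

This keeps certificate verification switched on, unlike setting rejectUnauthorized to false, which would accept any certificate at all.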
