Puppeteer - an Error is sometimes thrown

Posted 2025-02-10 00:36:18

I'm trying to do some web scraping with Puppeteer. My script works, but sometimes, for no reason I can identify, I get this error:

file:///C:/Users/aemba/OneDrive/Bureau/nodejs/octopart-scraping/node_modules/puppeteer/lib/esm/puppeteer/common/assert.js:23
        throw new Error(message);
              ^
Error
    at assert (file:///C:/Users/aemba/OneDrive/Bureau/nodejs/octopart-scraping/node_modules/puppeteer/lib/esm/puppeteer/common/assert.js:23:15)
    at FrameManager._FrameManager_onFrameAttached (file:///C:/Users/aemba/OneDrive/Bureau/nodejs/octopart-scraping/node_modules/puppeteer/lib/esm/puppeteer/common/FrameManager.js:318:5)
    at file:///C:/Users/aemba/OneDrive/Bureau/nodejs/octopart-scraping/node_modules/puppeteer/lib/esm/puppeteer/common/FrameManager.js:89:103
    at file:///C:/Users/aemba/OneDrive/Bureau/nodejs/octopart-scraping/node_modules/puppeteer/lib/esm/vendor/mitt/src/index.js:49:68
    at Array.map (<anonymous>)
    at Object.emit (file:///C:/Users/aemba/OneDrive/Bureau/nodejs/octopart-scraping/node_modules/puppeteer/lib/esm/vendor/mitt/src/index.js:49:43)
    at CDPSession.emit (file:///C:/Users/aemba/OneDrive/Bureau/nodejs/octopart-scraping/node_modules/puppeteer/lib/esm/puppeteer/common/EventEmitter.js:66:22)  
    at CDPSession._onMessage (file:///C:/Users/aemba/OneDrive/Bureau/nodejs/octopart-scraping/node_modules/puppeteer/lib/esm/puppeteer/common/Connection.js:273:18)
    at Connection._Connection_onMessage (file:///C:/Users/aemba/OneDrive/Bureau/nodejs/octopart-scraping/node_modules/puppeteer/lib/esm/puppeteer/common/Connection.js:160:21)
    at WebSocket.<anonymous> (file:///C:/Users/aemba/OneDrive/Bureau/nodejs/octopart-scraping/node_modules/puppeteer/lib/esm/puppeteer/node/NodeWebSocketTransport.js:37:32)

This error can appear at the start or on any URL in my for loop. I think the crash happens while the browser is opening, but I'm not sure.

import puppeteer from 'puppeteer';

let urls = [
            "https://octopart.com/search?q=SI7020-A20-GM1&currency=USD&specs=0",
            "https://octopart.com/search?q=RN41N-I%2FRM&currency=USD&specs=0",
            "https://octopart.com/search?autosugg_idx=1&currency=USD&oq=adxl1004&q=adxl1004bcpz&specs=1",
            "https://octopart.com/search?q=SI7021-A20-GM1&currency=USD&specs=0"
          ];

(async () => {
  for (const url of urls) {
    // A fresh browser is launched for each URL and always closed in `finally`.
    const browser = await puppeteer.launch({ headless: false });
    try {
      const page = await browser.newPage();
      await page.goto(url, { waitUntil: 'networkidle0' });
      await page.waitForTimeout(2000);
      // page.$$ returns an array of all matching buttons; click each one.
      const buttons = await page.$$('button[class="jsx-3623225293"]');
      for (const btn of buttons) {
        await btn.click();
      }
      await page.waitForTimeout(2000);
      await page.waitForSelector('tbody');
      await page.waitForTimeout(2000);
      // Runs in the page context: build one record per table row.
      const data = await page.evaluate(() => {
        const d = new Date();
        const date = d.getFullYear() + '-' + (d.getMonth() + 1) + '-' + d.getDate();

        const rows = Array.from(document.querySelectorAll("tbody > tr"));
        return rows.map(component => ({
          distributor: component.querySelector("td:nth-child(2)").innerText,
          link: component.querySelector("td:nth-child(2) > div > a").href,
          stock: component.querySelector("td:nth-child(4)").innerText,
          price: (component.querySelector("td:nth-child(8)").innerText) ? component.querySelector("td:nth-child(8)").innerText : "missing",
          date: date,
          autorized: (component.querySelector("td:nth-child(1) > a") && component.querySelector("td:nth-child(1) > a").title) ? component.querySelector("td:nth-child(1) > a").title : "missing"
        }));
      });
      console.log("data", data);
    } catch (error) {
      console.log(error);
    } finally {
      await browser.close();
    }
  }
})();

I tried adding some page.waitForTimeout(2000) calls and error handling, without success. I'm pretty new to JavaScript and web scraping; if anyone has an idea about this error, that would be great.

Comments (1)

赴月观长安 2025-02-17 00:36:18

Thanks to Pranav Choudhary for confirming my impression: the error occurs when the Chrome browser opens.
I started writing my script with this option:

let browser = await puppeteer.launch({headless: true});

But it didn't work, and I didn't understand why. So I switched to:

let browser = await puppeteer.launch({headless: false});

After some research, I found that websites can block connections from headless scrapers, but you can bypass this blocking by setting a user agent after browser.newPage():

await page.setUserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36");

With this configuration, my script works fine. Thank you all for the advice and good practices, which I have noted.
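
As a minimal sketch of the approach described here, the helper below sets the user agent on a new page before navigating. The function name openPageWithUserAgent, the constant DEFAULT_USER_AGENT, and the parameters are hypothetical, introduced only for illustration. The sketch assumes that `browser` exposes Puppeteer's newPage() and that the returned page exposes setUserAgent() and goto(). Whether a given site then accepts a headless browser is not guaranteed.

```javascript
// Hypothetical helper (not from the original post): opens a page, sets the
// user agent before any navigation, then loads the URL.
// `browser` is expected to be a Puppeteer Browser, or any object offering
// the same newPage() / setUserAgent() / goto() methods.
const DEFAULT_USER_AGENT =
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36';

async function openPageWithUserAgent(browser, url, userAgent = DEFAULT_USER_AGENT) {
  const page = await browser.newPage();
  // Set the user agent first so the very first request already carries it.
  await page.setUserAgent(userAgent);
  await page.goto(url, { waitUntil: 'networkidle0' });
  return page;
}
```

With Puppeteer, this could be called from inside the loop as `const page = await openPageWithUserAgent(browser, url);`, where `browser` comes from puppeteer.launch(). Taking the browser as a parameter keeps the helper independent of how the browser was launched (headless or not).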
