How do I add the URL to my parsed results alongside the other scraped data?

Posted on 2025-02-03 09:06:14

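I want to scrape many websites at once, so I would like the URL to be written to the result alongside the other scraped data. But I don't know how to do that.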

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.amazon.com/');

  // Give the page time to populate localStorage.
  await page.waitForTimeout(10000);

  // Copy every localStorage entry into a plain object.
  const localStorageData = await page.evaluate(() => {
    let json = {};
    for (let i = 0; i < localStorage.length; i++) {
      const key = localStorage.key(i);
      json[key] = localStorage.getItem(key);
    }
    return json;
  });

  // Note: `data` is empty here, so this loop never runs.
  const data = {};
  for (let entry of Object.entries(data)) {
    data[entry.key] = entry.value;
  }
  console.log(localStorageData);

  await browser.close();
})();

Comments (1)

嗫嚅 2025-02-10 09:06:14

You can simply add the URL you use to your JSON:

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const url = "https://www.amazon.com/";
  await page.goto(url);

  // Fixed delay so the page can populate localStorage.
  // (page.waitForTimeout() was removed in newer Puppeteer versions.)
  await new Promise((resolve) => setTimeout(resolve, 10000));

  // Copy every localStorage entry into a plain object.
  const localStorageData = await page.evaluate(() => {
    const json = {};
    for (let i = 0; i < localStorage.length; i++) {
      const key = localStorage.key(i);
      json[key] = localStorage.getItem(key);
    }
    return json;
  });

  // Attach the URL on the Node side, where `url` is in scope.
  localStorageData.url = url;
  console.log(localStorageData);

  await browser.close();
})();
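
Since the question mentions scraping many sites, the same idea might be extended to a list of URLs. The following is only a minimal sketch, not a definitive implementation: the names scrapeAll, makeLocalStorageScraper, and main, the waitMs parameter, and the second URL are all assumptions made for illustration and do not come from the original post. It nests the scraped values under a separate key so that a localStorage entry that happens to be called "url" cannot overwrite the page URL.

```javascript
// Sketch only: scrapeAll, makeLocalStorageScraper, main, and waitMs are
// hypothetical names chosen for this example.

// Visit each URL in turn and return one record per URL.
// `scrapeOne(url)` is any async function that resolves to the scraped data.
async function scrapeAll(urls, scrapeOne) {
  const results = [];
  for (const url of urls) {
    try {
      results.push({ url, data: await scrapeOne(url) });
    } catch (err) {
      // Keep the URL on failure too, so it is clear which site broke.
      results.push({ url, error: String(err.message || err) });
    }
  }
  return results;
}

// Build a scraper that opens a fresh page per URL and reads its localStorage.
function makeLocalStorageScraper(browser, waitMs = 10000) {
  return async (url) => {
    const page = await browser.newPage();
    try {
      await page.goto(url);
      // Fixed delay so the page can populate localStorage.
      await new Promise((resolve) => setTimeout(resolve, waitMs));
      return await page.evaluate(() => {
        const json = {};
        for (let i = 0; i < localStorage.length; i++) {
          const key = localStorage.key(i);
          json[key] = localStorage.getItem(key);
        }
        return json;
      });
    } finally {
      await page.close();
    }
  };
}

async function main() {
  // Loaded here so the helpers above can be used without Puppeteer installed.
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch();
  try {
    // Assumed example list; replace with the sites you want to scrape.
    const urls = ["https://www.amazon.com/", "https://example.com/"];
    const results = await scrapeAll(urls, makeLocalStorageScraper(browser));
    console.log(JSON.stringify(results, null, 2));
  } finally {
    await browser.close();
  }
}

// Uncomment to run against real sites (requires `npm install puppeteer`):
// main().catch(console.error);
```

This version visits the sites one after another, which keeps memory use predictable. If the pages really need to load at the same time, the loop in scrapeAll could probably be swapped for Promise.all over the URL list, at the cost of opening many tabs at once.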



