How do I scrape Google Images with Unirest and Cheerio?

Posted on 2025-02-06 12:49:09

I am trying to scrape Google Images with Unirest and Cheerio, but I am stuck because the parsing does not work correctly. This is my current code:

const unirest = require("unirest");
const cheerio = require("cheerio");

const getData = async () => {
  const url =
    "https://www.google.com/search?q=india&oq=india&tbm=isch&asearch=ichunk&async=_id:rg_s,_pms:s,_fmt:pc&sourceid=chrome&ie=UTF-8";

  // Request the results page with a desktop browser User-Agent, through a proxy.
  const response = await unirest
    .get(url)
    .headers({
      "User-Agent":
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.54 Safari/537.36",
    })
    .proxy("proxy"); // "proxy" is a placeholder, as in the original post

  const $ = cheerio.load(response.body);
  console.log(response.body); // HTML is returned successfully

  const title = [];
  const link = [];
  $(".vbC6V").each((i, el) => {
    title[i] = $(el).find(".iKjWAf .mVDMnf").text(); // not parsing
    link[i] = $(el).find(".rg_l .rg_ic").attr("src"); // not parsing
  });
  console.log(title); // returns an empty array
  console.log(link); // returns an empty array
};

getData();

鲜肉鲜肉永远不皱 2025-02-13 12:49:10

Yes, I found that the parent class to use for parsing is rg_bx, not vbC6V. So the updated code is:

$(".rg_bx").each((i, el) => {
  title[i] = $(el).find(".iKjWAf .mVDMnf").text();
  link[i] = $(el).find(".rg_l .rg_ic").attr("src");
});
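
If it helps anyone else debugging an empty result like this: a quick way to see which container class the response actually contains is to count the class names in the returned HTML before writing selectors. Below is a minimal, dependency-free sketch of that idea. The helper name countClassNames and the sample markup are my own assumptions for illustration (the markup is not real Google output), and the regex scan is only a rough check, not a real HTML parser.

```javascript
// Count how often each CSS class name appears in an HTML string.
// Rough regex scan for debugging only; it is not a real HTML parser.
function countClassNames(html) {
  const counts = new Map();
  const attrPattern = /class\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  for (const match of html.matchAll(attrPattern)) {
    // One of the two capture groups holds the attribute value.
    const value = match[1] ?? match[2] ?? "";
    for (const name of value.split(/\s+/).filter(Boolean)) {
      counts.set(name, (counts.get(name) || 0) + 1);
    }
  }
  return counts;
}

// Hypothetical sample markup, for illustration only.
const sampleHtml = `
  <div class="rg_bx rg_di">
    <a class="rg_l"><img class="rg_ic" src="a.jpg"></a>
    <div class="iKjWAf"><span class="mVDMnf">First</span></div>
  </div>
  <div class="rg_bx rg_di">
    <a class="rg_l"><img class="rg_ic" src="b.jpg"></a>
    <div class="iKjWAf"><span class="mVDMnf">Second</span></div>
  </div>`;

const counts = countClassNames(sampleHtml);
console.log(counts.get("rg_bx")); // 2
console.log(counts.has("vbC6V")); // false
```

In the scraper itself you would pass response.body instead of sampleHtml. A class that is missing from the map cannot match in Cheerio either, which is what happened with vbC6V here.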


About the author

ぺ禁宫浮华殁
