Can't scrape text with Cheerio
I'm trying to scrape this page with Cheerio: https://en.dict.naver.com/#/search?query=%EC%B6%94%EC%9B%8C%EC%9A%94&range=all
But I can't get anything. I tried to get the 'Word-Idiom' text, but the selector returns nothing.
Here's my code:
app.get("/conjugation", (req, res) => {
  axios(
    "https://en.dict.naver.com/#/search?query=%EC%B6%94%EC%9B%8C%EC%9A%94&range=all"
  )
    .then((response) => {
      const htmlData = response.data;
      const $ = cheerio.load(htmlData);
      const element = $(
        "#searchPage_entry > h3 > span.title_text.myScrollNavQuick.my_searchPage"
      );
      console.log(element.text());
    })
    .catch((err) => console.log(err));
});
Comments (2)
The server at that URL doesn't return any body DOM structure in the HTML response. The body DOM is rendered by linked JavaScript after the response is received. Cheerio doesn't execute the JavaScript in the HTML response, so it won't be possible to scrape that page using Cheerio. Instead, you'll need to use another method that can execute the in-page JavaScript (e.g. Puppeteer).
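A minimal sketch of the Puppeteer route, assuming the `puppeteer` package is installed and that the selector from the question still matches the rendered page (neither has been verified against the live site). The helper name `scrapeRenderedText` and the 15-second timeout are illustrative choices, not part of the original code.

```javascript
// URL and selector copied from the question.
const SEARCH_URL =
  "https://en.dict.naver.com/#/search?query=%EC%B6%94%EC%9B%8C%EC%9A%94&range=all";
const SELECTOR =
  "#searchPage_entry > h3 > span.title_text.myScrollNavQuick.my_searchPage";

// Hypothetical helper: opens the page in headless Chrome, waits until the
// page's own JavaScript has rendered the element, then returns its text.
async function scrapeRenderedText(url, selector, timeoutMs = 15000) {
  // Loaded lazily so the file can be required even where puppeteer is absent.
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "domcontentloaded" });
    // Rejects with a timeout error if the element never appears.
    await page.waitForSelector(selector, { timeout: timeoutMs });
    return await page.$eval(selector, (el) => el.textContent.trim());
  } finally {
    // Always release the browser process, even on failure.
    await browser.close();
  }
}
```

Inside the Express handler this could replace the axios call, for example `scrapeRenderedText(SEARCH_URL, SELECTOR).then((text) => res.send(text))`. Launching a browser per request is slow, so a real service would probably reuse one browser instance.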
This is a common issue in web scraping: the page loads its content dynamically. That's why the initial GET response from that website contains little more than script tags. Print htmlData and you'll see what I mean. There are no rendered HTML elements in your response, so you'll have to use something like Selenium to wait until the elements you need have been rendered.
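Rather than reading the printed htmlData by eye, a rough check like the one below can flag a response that is only a JavaScript shell. This is a heuristic sketch using regular expressions, not a real HTML parser; the function names and the two sample strings are made up for illustration and are not the actual Naver response.

```javascript
// Returns the text a reader would see in the body, ignoring scripts,
// styles, and markup. Regex-based, so treat it as a rough heuristic only.
function visibleBodyText(html) {
  const bodyMatch = html.match(/<body[^>]*>([\s\S]*?)<\/body>/i);
  const body = bodyMatch ? bodyMatch[1] : html;
  return body
    .replace(/<(script|style|noscript)\b[\s\S]*?<\/\1>/gi, " ") // drop non-visible blocks
    .replace(/<[^>]+>/g, " ") // drop remaining tags
    .replace(/\s+/g, " ")
    .trim();
}

// True when the body has no visible text, which suggests the content is
// rendered client-side and Cheerio alone will find nothing to select.
function looksClientRendered(html) {
  return visibleBodyText(html).length === 0;
}

// Made-up samples: an empty app shell versus a server-rendered fragment.
const shellPage =
  '<html><head><title>Dict</title></head><body><div id="root"></div>' +
  '<script src="/app.js"></script></body></html>';
const staticPage =
  '<html><body><h3><span class="title_text">Word-Idiom</span></h3></body></html>';

console.log(looksClientRendered(shellPage)); // true
console.log(looksClientRendered(staticPage)); // false
```

In the question's handler, calling `looksClientRendered(response.data)` before `cheerio.load` would be one way to confirm that a headless browser is needed.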