I need to loop over an async function
So I'm building a web scraper, and I need (want) to loop over my axios request.
Most of the code is here; the rest is data and functions that return pieces.
When I run this, it doesn't wait for anything, it just flies through the loop. Where am I going wrong?
I don't know if you need this info, but the page checker seems to think you do, so...
I'm trying to scrape Google for LinkedIn pages that contain a plainly visible email address. The code at this point is unfinished, but the gist is this: I've got some proxies. If I get a captcha page, I change proxies. If I get search results, I parse them for the data I want. If there is a next link, I go to the next page. If everything is right, I should be zipping through Google in no time.
while( (task == "new") || (task == "next") || (task=="proxy"))
{
    switch(task) {
        case "new":
            url = getNextUrl(1);
            if(url == "end") {
                console.log("Search completed!");
                break;
            }
            break;
        case "next":
            url = getNextUrl(pageNumber)
            break;
        case "proxy":
            break;
    }
    proxy = getNextProxy();
    (async () => {
        let proxy = getNextProxy().toString().split(":");
        console.log(proxy);
        console.log(url);
        await axios.get(url,{
            headers: {
                'user-agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.66 Safari/537.36"
            },
            proxy: false,
            httpsAgent: new HttpsProxyAgent.HttpsProxyAgent(`http://${proxy[0]}:${proxy[1]}`)
        },
        {timeout:5000})
        .then((response) =>{
            const $ = cheerio.load(response.data);
            let captcha = $("#captcha-form");
            if(captcha.length > 0)
            {
                task = "proxy";
                return;
            }
            let searchitems = $(".jtfYYd");
            for(let i=0; i < searchitems.length; i++)
            {
                let element = searchitems[i];
                let c = $(element).attr("class");
                let link = $(element).find(".yuRUbf").find("a").attr('href');
                let title = $(element).find("h3").text();
                let details = $(element).find(".MUxGbd.wuQ4Ob.WZ8Tjf").find("span");
                let stub = $(element).find(".VwiC3b.yXK7lf.MUxGbd.yDYNvb.lyLwlc.lEBKkf").find("span").text();
                console.log();
                console.log("Title: " + title);
                console.log("Link: " + link);
                console.log("Details: " + details);
                console.log("Stub: " + stub);
            }
            let botstuff = $("#botstuff");
            console.log("botstuff: " + botstuff.text());
            if(botstuff.text().indexOf("Next") > 1)
            {
                task = "next";
            }
            else
            {
                if(searchitems.length == 0) {
                    task = "proxy";
                }
                else
                {
                    task="new"
                }
            }
            //botstuff: Page Navigation123Next
        })
        .catch((error) => {
            task = "proxy";
            //console.log(error);
        });
    })();
}
Comments (1)
The while loop executes synchronously. The async IIFE synchronously returns a promise for results obtained (typically asynchronously, at a later time) within the function body. Hence each pass through
while (conditional-expression) {
synchronously evaluates the IIFE, whose value is a promise object. The promise object is not used in the code. Looping continues without waiting for any changes to be made to the variables being tested, most likely resulting in an infinite loop.
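To see that ordering in isolation, here is a minimal sketch in which a timer stands in for the network request. The names log, fakeRequest, pending and done are made up for illustration and are not part of the question's code.

```javascript
// Records the order in which things happen.
const log = [];

// Hypothetical stand-in for axios.get: resolves after a short delay.
const fakeRequest = () =>
  new Promise((resolve) => setTimeout(() => resolve("response"), 10));

// The async IIFE runs up to its first `await`, then hands back a promise.
const pending = (async () => {
  log.push("IIFE started");
  await fakeRequest();
  log.push("IIFE finished"); // runs later, once the timer has fired
})();

// This line runs before the simulated request has completed.
log.push("after IIFE call");

// Only awaiting `pending` (or chaining .then) observes the finished state.
const done = pending.then(() => log.slice());
```

At the point where "after IIFE call" is pushed, the request is still in flight, which is why a surrounding while loop would test an unchanged task variable on every pass.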
The solution would involve removing the async IIFE from around the await axios.get(...) statement and placing the while loop in an asynchronous function to allow the use of the await operator. This would also allow coding the then/catch promise chain using await operators if you wish.
For those unfamiliar with asynchronous JavaScript, it's worth noting that JS does not have a sleep function and you can't obtain asynchronous results inline from synchronous code. See "How do I return the response from an asynchronous call?" and "async/await implicitly returns promise?" for more information.
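A rough sketch of that restructuring follows. It is not the finished scraper: pages, proxies, fetchPage and scrape are hypothetical stand-ins so the control flow can run without a network. In the real code, fetchPage would wrap the await axios.get(url, {...}) call and the captcha and "Next" checks would use cheerio. The sketch also assumes a single search query, so the loop simply ends when there is no next page.

```javascript
// Hypothetical fixture standing in for two pages of search results.
const pages = {
  "q?page=1": { captcha: false, items: ["a", "b"], hasNext: true },
  "q?page=2": { captcha: false, items: ["c"], hasNext: false },
};

// Hypothetical proxy list; "p1" is treated as blocked by a captcha.
const proxies = ["p1", "p2"];

// Stand-in for axios.get: asynchronous, with a little simulated latency.
async function fetchPage(url, proxy) {
  await new Promise((resolve) => setTimeout(resolve, 5));
  if (proxy === "p1") return { captcha: true, items: [], hasNext: false };
  return pages[url];
}

// The while loop lives inside an async function, so `await` pauses each
// iteration until the request has settled and `task` has been updated.
async function scrape() {
  const results = [];
  let task = "new";
  let pageNumber = 1;
  let proxyIndex = 0;

  while (task === "new" || task === "next" || task === "proxy") {
    if (task === "next") pageNumber += 1;
    if (task === "proxy") proxyIndex += 1;
    if (proxyIndex >= proxies.length) break; // ran out of proxies

    const url = `q?page=${pageNumber}`;
    try {
      const page = await fetchPage(url, proxies[proxyIndex]);
      if (page.captcha) {
        task = "proxy"; // captcha page: retry the same URL with the next proxy
        continue;
      }
      results.push(...page.items);
      task = page.hasNext ? "next" : "done"; // "done" ends the loop
    } catch (error) {
      task = "proxy"; // request failed: try another proxy
    }
  }
  return results;
}
```

Calling scrape() still returns a promise, so the caller either uses await scrape() inside another async function or chains scrape().then(...). With the fixture above, the promise resolves to the items from both pages after one proxy change.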