Scraping URLs and descriptions from HTML code
I'm quite new to coding and I'm struggling to return multiple URLs and descriptions from a block of HTML code. I know I have to loop over it somehow, but I don't know how. I hope someone can help me. What I have is:
function pageFunction(context) {
const $ = context.jQuery;
const venueName = $('div.collist a').first().text() + ' /';
const venueURL = $('div.collist a[href]').attr('href');
return {
venueName,
venueURL
};
}
The HTML code I'm trying to scrape is:
</div>
<div class="collist">
<h3>Properties</h3>
<ul class="list-unstyled">
<li>
<a href="search?query=500" rel="nofollow" title="Venue1"><span>London, UK</span></a> (<span>810</span>)
</li><li>
<a href="search?query=600" rel="nofollow" title="Venue2"><span>Pretoria, South Africa</span></a> (<span>820</span>)
</li><li>
<a href="search?query=700" rel="nofollow" title="Venue3"><span>New York, USA</span></a> (<span>830</span>)
</li><li>
<a href="search?query=800" rel="nofollow" title="Venue4"><span>Paris, France</span></a> (<span>840</span>)
</li><li>
<a href="search?query=900" rel="nofollow" title="Venue5"><span>Denver, USA</span></a> (<span>850</span>)
</li><li>
<a href="search?query=1000" rel="nofollow" title="Venue6"><span>Deli, India</span></a> (<span>860</span>)
</li><li>
<a href="search?query=1100" rel="nofollow" title="Venue7"><span>Lisbon, Protugal</span></a> (<span>870</span>)
</li><li>
<a href="search?query=1200" rel="nofollow" title="Venue8"><span>Madrid, Spain</span></a> (<span>880</span>)
</li><li>
<a href="search?query=1300" rel="nofollow" title="Venue9"><span>Berlin, Germany</span></a> (<span>890</span>)
</li><li>
<a href="search?query=1400" rel="nofollow" title="Venue10"><span>Stockholm, Sweden</span></a> (<span>900</span>)
</li>
</ul>
My current results are
[{
"venueName": "London, UK /",
"venueURL": "search?query=500"
}]
But what I would like to see is:
[{
"venueName": "London, UK /","venueURL": "search?query=500"
},{
"venueName": "Pretoria, South Africa /","venueURL": "search?query=600"
},{
"venueName": "New York, USA /","venueURL": "search?query=700"
},{
"venueName": "Paris, France /","venueURL": "search?query=800"
},{
"venueName": "Denver, USA /","venueURL": "search?query=900"
}]
I have tried to put in .each()
function pageFunction(context) {
const $ = context.jQuery;
const venueURL = $('div.collist a[href]').each(function(index, value){console.log(this.href);})
return {
venueURL
};
}
When I run it in the browser console it seems to work, but as soon as I run it in the app I get the following error:
ERROR PuppeteerCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue
Error: Evaluation failed: RangeError: Maximum call stack size exceeded
Comments (1)
When you try to return a non-stringifiable object from the page function, you get that error. In other words, the result of each() cannot be saved to the dataset, because it is a jQuery object and JSON.stringify fails on it. You need to return an array of plain objects, so that each of your results becomes a separate dataset item.
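A minimal sketch of that approach, assuming `context.jQuery` is available as in the question and that the scraper stores each element of a returned array as its own dataset item, as the answer describes. `toArray()` turns the matched links into plain DOM elements, and the callback builds one plain object per link, so nothing jQuery-specific is returned:

```javascript
function pageFunction(context) {
    const $ = context.jQuery;

    // Collect every link inside div.collist as a plain array of DOM elements.
    return $('div.collist a[href]')
        .toArray()
        // Build one plain, JSON-serializable object per link.
        .map((el) => ({
            venueName: $(el).text().trim() + ' /',
            venueURL: $(el).attr('href'),
        }));
}
```

For well-formed markup like the list in the question, this should yield one object per link, starting with `{ venueName: 'London, UK /', venueURL: 'search?query=500' }`. Note that `attr('href')` returns the attribute as written, whereas `this.href` in the earlier attempt resolves to an absolute URL when run in a browser.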