Scraping URLs and descriptions from HTML code

Posted on 2025-02-07 12:09:34

I'm quite new to coding and I'm struggling to return multiple URLs and descriptions from a block of HTML code. I know I have to loop over it somehow, but I don't know how. I hope someone can help me. What I have is:

function pageFunction(context) {
 
    const $ = context.jQuery;
    const venueName = $('div.collist a').first().text() + ' /';
    const venueURL = $('div.collist a[href]').attr('href');


    return {
        venueName,
        venueURL
    };
}

The HTML code I'm trying to scrape is:

    </div>
    <div class="collist">
    <h3>Properties</h3>x
    <ul class="list-unstyled">
    <li>
    <a href="search?query=500" rel="nofollow" title="Venue1"><span>London, UK</span></a> (<span>810</span>)
    </li><li>
    <a href="search?query=600" rel="nofollow" title="Venue2"><span>Pretoria, South Africa</span></a> (<span>820</span>)
    </li><li>
    <a href="search?query=700" rel="nofollow" title="Venue3"><span>New York, USA</span></a> (<span>830</span>)
    </li><li>
    <a href="search?query=800" rel="nofollow" title="Venue4"><span>Paris, France</span></a> (<span>840</span>)
    </li><li>
    <a href="search?query=900" rel="nofollow" title="Venue5"><span>Denver, USA</span></a> (<span>850</span>)
    </li><li>
    <a href="search?query=1000" rel="nofollow" title="Venue6"><span>Delhi, India</span></a> (<span>860</span>)
    </li><li>
    <a href="search?query=1100" rel="nofollow" title="Venue7"><span>Lisbon, Portugal</span></a> (<span>870</span>)
    </li><li>
    <a href="search?query=1200" rel="nofollow" title="Venue8"><span>Madrid, Spain</span></a> (<span>880</span>)
    </li><li>
    <a href="search?query=1300" rel="nofollow" title="Venue9"><span>Berlin, Germany</span></a> (<span>890</span>)
    </li><li>
    <a href="search?query=1400" rel="nofollow" title="Venue10"><span>Stockholm, Sweden</span></a> (<span>900</span>)
    </li>
    </ul>

My current results are

    [{
      "venueName": "London, UK /",
      "venueURL": "search?query=500"
    }]

But what I would like to see is:

    [{
      "venueName": "London, UK /",
      "venueURL": "search?query=500"
    },{
      "venueName": "Pretoria, South Africa /",
      "venueURL": "search?query=600"
    },{
      "venueName": "New York, USA /",
      "venueURL": "search?query=700"
    },{
      "venueName": "Paris, France /",
      "venueURL": "search?query=800"
    },{
      "venueName": "Denver, USA /",
      "venueURL": "search?query=900"
    }]

I have tried using .each():

function pageFunction(context) {
    
    const $ = context.jQuery;
     const venueURL = $('div.collist a[href]').each(function(index, value){console.log(this.href);})
    
    
    return {
        venueURL
    };
}

When I run it in the browser console it seems to work, but as soon as I run it in the app I get the following error:

ERROR PuppeteerCrawler: handleRequestFunction failed, reclaiming
failed request back to the list or queue

Error: Evaluation failed: RangeError: Maximum call stack size exceeded

Comments (1)

陌上青苔 2025-02-14 12:09:34

You get that error when you return a non-stringifiable object from the page function. each() returns the jQuery collection itself rather than the values you logged, so venueURL is a jQuery object, JSON.stringify fails on it, and it cannot be stored in the dataset. Instead, return an array of plain objects so that each of your results becomes a separate dataset item, like this:

function pageFunction(context) {
 
    const $ = context.jQuery;
    const results = [];
    
    // treat the parent as the unique container of data
    $('.collist li').each((_, li) => {
      const $li = $(li);
      // find the data individually per found LI element 
      const $link = $li.find('a').first();
      // using text() on an element that has children can include a lot of whitespace, so narrow down to `span`
      const venueName = $link.find('span').first().text() + ' /';
      const venueURL = $link.attr('href');

      results.push({
        venueName,
        venueURL
      });
    });

    return results;
}
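
If you are unsure whether a value is safe to return, a quick check along these lines may help. This is only a minimal sketch: isJsonSerializable is a hypothetical helper, not part of jQuery or the scraper, and the circular object is just a stand-in for a value that cannot be stringified, not an exact model of a jQuery object.

```javascript
// Hypothetical helper (an assumption for illustration, not a library API):
// reports whether JSON.stringify completes without throwing for a value.
function isJsonSerializable(value) {
    try {
        JSON.stringify(value);
        return true;
    } catch (err) {
        // JSON.stringify throws a TypeError on circular references
        return false;
    }
}

// Plain objects like the ones pushed into `results` serialize fine.
const plainResults = [
    { venueName: 'London, UK /', venueURL: 'search?query=500' },
    { venueName: 'Pretoria, South Africa /', venueURL: 'search?query=600' },
];

// Stand-in for a non-stringifiable value: an object that refers to itself.
const circular = { name: 'wrapper' };
circular.self = circular;

console.log(isJsonSerializable(plainResults)); // true
console.log(isJsonSerializable(circular));     // false
```

Running a check like this on the value you are about to return can show early whether it is made of plain data only.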
