Limit results from a loop inside a function to 10 results - Python

Posted on 2025-01-27 10:30:09

I have the code below, which follows random Wikipedia links and prints the title of each article. I am trying to limit it to 10 results rather than letting it run indefinitely, but I am finding it difficult to do. Can anybody help please?

import requests
from bs4 import BeautifulSoup
import random

def scrape_wiki_article(article_url):
    response = requests.get(url=article_url)
    soup = BeautifulSoup(response.content, 'html.parser')
    
    title = soup.find(id="firstHeading")
    print(title.text)
    
    #Get all the links
    allLinks = soup.find(id="bodyContent").find_all("a")
    random.shuffle(allLinks)
    linkToScrape = 0
    
    for link in allLinks:
        #We are only interested in other wiki articles so look for /wiki/ prefix
        if "/wiki/" not in link.get('href', ''): # also guards against anchors with no href attribute
            continue
        
        #Use this link to scrape
        linkToScrape = link
        break
    
    scrape_wiki_article("https://en.wikipedia.org" + linkToScrape['href'])
 
scrape_wiki_article("https://en.wikipedia.org/wiki/Web_scraping")

Comments (1)

梦毁影碎の 2025-02-03 10:30:09

You can start by filtering the allLinks list to only include links with the /wiki/ prefix. Once you have done that, you can truncate the list with something like

allLinks = allLinks[:10]

This way you would search at most 10 wiki links on each page.
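
A minimal sketch of how this filtering and truncation could be wired into the original function (an illustration of the idea, not the answerer's exact code). The depth parameter is an assumption added here to stop the recursion after 10 printed titles, since truncating the candidate list on its own does not bound how many pages get visited:

import requests
from bs4 import BeautifulSoup
import random

def scrape_wiki_article(article_url, depth=0):
    # Added counter (an assumption, not part of the answer above):
    # stop after 10 article titles have been printed
    if depth >= 10:
        return

    response = requests.get(url=article_url)
    soup = BeautifulSoup(response.content, 'html.parser')

    title = soup.find(id="firstHeading")
    print(title.text)

    # Keep only anchors whose href contains the /wiki/ prefix
    allLinks = soup.find(id="bodyContent").find_all("a")
    wikiLinks = [link for link in allLinks if "/wiki/" in link.get('href', '')]
    random.shuffle(wikiLinks)

    # Truncate to at most 10 candidate links, as suggested above
    wikiLinks = wikiLinks[:10]
    if not wikiLinks:
        return  # dead end: no article links on this page

    scrape_wiki_article("https://en.wikipedia.org" + wikiLinks[0]['href'], depth + 1)

scrape_wiki_article("https://en.wikipedia.org/wiki/Web_scraping")

Because the shuffled, pre-filtered list is indexed directly, the loop-and-break from the original code is no longer needed.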
