Limit results from a loop inside a function to 10 results - Python

Posted on 2025-01-27 10:30:09

I have the code below, which follows random Wikipedia links and prints the title of each article. I am trying to limit it to 10 results rather than letting it run indefinitely, but I am finding it difficult to do. Can anybody help please?

import requests
from bs4 import BeautifulSoup
import random

def scrape_wiki_article(article_url):
    response = requests.get(url=article_url)
    soup = BeautifulSoup(response.content, 'html.parser')
    
    title = soup.find(id="firstHeading")
    print(title.text)
    
    #Get all the links
    allLinks = soup.find(id="bodyContent").find_all("a")
    random.shuffle(allLinks)
    linkToScrape = 0
    
    for link in allLinks:
        #We are only interested in other wiki articles so look for /wiki/ prefix
        if "/wiki/" not in link.get('href', ''): # also guards against anchors with no href attribute
            continue
        
        #Use this link to scrape
        linkToScrape = link
        break
    
    scrape_wiki_article("https://en.wikipedia.org" + linkToScrape['href'])
 
scrape_wiki_article("https://en.wikipedia.org/wiki/Web_scraping")

Comments (1)

梦毁影碎の 2025-02-03 10:30:09

You can start by filtering the allLinks list to only include links with the /wiki/ prefix. Once you have done that, you can truncate the list with something like

allLinks = allLinks[:10]

This way you would search at most 10 wiki links on each page.
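
A minimal sketch of how this filtering and truncation could be wired into the original function (an illustration of the idea, not the answerer's exact code). The depth parameter is an assumption added here to stop the recursion after 10 printed titles, since truncating the candidate list on its own does not bound how many pages get visited:

import requests
from bs4 import BeautifulSoup
import random

def scrape_wiki_article(article_url, depth=0):
    # Added counter (an assumption, not part of the answer above):
    # stop after 10 article titles have been printed
    if depth >= 10:
        return

    response = requests.get(url=article_url)
    soup = BeautifulSoup(response.content, 'html.parser')

    title = soup.find(id="firstHeading")
    print(title.text)

    # Keep only anchors whose href contains the /wiki/ prefix
    allLinks = soup.find(id="bodyContent").find_all("a")
    wikiLinks = [link for link in allLinks if "/wiki/" in link.get('href', '')]
    random.shuffle(wikiLinks)

    # Truncate to at most 10 candidate links, as suggested above
    wikiLinks = wikiLinks[:10]
    if not wikiLinks:
        return  # dead end: no article links on this page

    scrape_wiki_article("https://en.wikipedia.org" + wikiLinks[0]['href'], depth + 1)

scrape_wiki_article("https://en.wikipedia.org/wiki/Web_scraping")

Because the shuffled, pre-filtered list is indexed directly, the loop-and-break from the original code is no longer needed.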
