Need Wikipedia web scraper to keep asking for user input

Posted 2025-01-29 07:15:15

I need the code below to ask for user input again after it runs and shows the results. I guess a while loop would be best, but I'm not sure how to set one up since I'm using the BeautifulSoup and requests libraries.

Any help would be greatly appreciated.

import requests
from bs4 import BeautifulSoup

user_input = input("Enter article:")

# Fetch the article page and parse it.
response = requests.get("https://en.wikipedia.org/wiki/" + user_input)
soup = BeautifulSoup(response.text, "html.parser")

titles = []  # link titles collected from the article body
count = 0

# Link titles containing any of these substrings are skipped.
IGNORE = ["Wikipedia:", "Category:", "Template:", "Template talk:", "User:",
          "User talk:", "Module:", "Help:", "File:", "Portal:", "#", "About this", ".ogg", "disambiguation", "Edit section"]

# Walk the links in the article body (outside the infobox) and print the first matches.
for tag in soup.select('div.mw-parser-output a:not(.infobox a)'):
    if count <= 10:
        title = tag.get("title", "")
        if not any(x in title for x in IGNORE) and title != "":
            count = count + 1
            print(title)
            titles.append(title)
    else:
        break
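Since the question already guesses that a `while` loop is the right tool, here is a minimal sketch of one. It is not the asker's code: the helper names `scrape_titles` and `run_prompt_loop`, the blank-line/`q` exit, the 10-second timeout, and the URL quoting are all assumptions. The `scrape`, `read`, and `show` parameters default to the real scraper, `input`, and `print`, but can be swapped for fakes to try the loop offline; that is also why `requests` and `bs4` are imported inside `scrape_titles`.

```python
from typing import Callable, Iterable, List
from urllib.parse import quote

# Same substrings the question's code filters out of link titles.
IGNORE = ["Wikipedia:", "Category:", "Template:", "Template talk:", "User:",
          "User talk:", "Module:", "Help:", "File:", "Portal:", "#", "About this",
          ".ogg", "disambiguation", "Edit section"]


def scrape_titles(article: str, limit: int = 10) -> List[str]:
    """Return up to `limit` link titles from the body of a Wikipedia article (hypothetical helper)."""
    # Third-party imports live here so run_prompt_loop can be tried without them installed.
    import requests
    from bs4 import BeautifulSoup

    # Assumption: underscores for spaces, and quote() escapes characters such as "?".
    url = "https://en.wikipedia.org/wiki/" + quote(article.replace(" ", "_"))
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # e.g. 404 for a page that doesn't exist
    except requests.RequestException as exc:
        print(f"Could not fetch {article!r}: {exc}")
        return []

    soup = BeautifulSoup(response.text, "html.parser")
    titles: List[str] = []
    for tag in soup.select("div.mw-parser-output a:not(.infobox a)"):
        if len(titles) >= limit:  # stop at exactly `limit` titles
            break
        title = tag.get("title", "")
        if title and not any(x in title for x in IGNORE):
            titles.append(title)
    return titles


def run_prompt_loop(scrape: Callable[[str], Iterable[str]] = scrape_titles,
                    read: Callable[[str], str] = input,
                    show: Callable[[str], None] = print) -> int:
    """Ask for articles until a blank line, 'q', or end of input; return how many were looked up."""
    queries = 0
    while True:
        try:
            article = read("Enter article (blank or q to quit): ").strip()
        except EOFError:  # Ctrl-D or end of piped input
            break
        if article.lower() in ("", "q"):
            break
        for title in scrape(article):
            show(title)
        queries += 1
    return queries


if __name__ == "__main__":
    run_prompt_loop()
```

Running the script prompts repeatedly and prints up to ten titles per article until you enter a blank line or `q`. One difference from the question's loop: `count <= 10` lets an 11th title through, whereas this sketch stops at exactly `limit`.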

夜司空 2025-02-05 07:15:15

Use functions with return statements.

Example

import requests
from bs4 import BeautifulSoup

# Link titles containing any of these substrings are skipped.
IGNORE = ["Wikipedia:", "Category:", "Template:", "Template talk:", "User:",
          "User talk:", "Module:", "Help:", "File:", "Portal:", "#", "About this", ".ogg", "disambiguation",
          "Edit section"]


def get_user_input():
    user_input = input("Enter article:")
    if user_input.strip():
        return get_response(user_input)
    else:
        # Empty input: just ask again.
        return get_user_input()


def get_response(user_input):
    response = requests.get("https://en.wikipedia.org/wiki/" + user_input)
    soup = BeautifulSoup(response.text, "html.parser")

    title_list = []
    count = 0

    for tag in soup.select('div.mw-parser-output a:not(.infobox a)'):
        if count <= 10:
            title = tag.get("title", "")
            if not any(x in title for x in IGNORE) and title != "":
                count = count + 1
                print(title)
                title_list.append(title)
        else:
            break

    print(title_list)
    # Ask for the next article even if fewer than 11 titles were found.
    return get_user_input()


if __name__ == '__main__':
    get_user_input()
