Need my Wikipedia web scraper to keep asking for user input

Posted on 2025-01-29 07:15:15

I need the code below to ask for user input again after it has run and shown the results. I guess a while loop would be best, but I'm not sure how to structure it, since the code uses the BeautifulSoup and requests libraries.

Any help would be greatly appreciated.

import requests
from bs4 import BeautifulSoup

user_input = input("Enter article:")

response = requests.get("https://en.wikipedia.org/wiki/" + user_input)
soup = BeautifulSoup(response.text, "html.parser")

titles = []  # avoid naming this "list", which shadows the built-in
count = 0

IGNORE = ["Wikipedia:", "Category:", "Template:", "Template talk:", "User:",
          "User talk:", "Module:", "Help:", "File:", "Portal:", "#", "About this", ".ogg", "disambiguation", "Edit section"]

# Collect link titles from the article body, skipping links inside the infobox
for tag in soup.select('div.mw-parser-output a:not(.infobox  a)'):
    if count <= 10:
        title = tag.get("title", "")
        if not any(x in title for x in IGNORE) and title != "":
            count = count + 1
            print(title)
            titles.append(title)
    else:
        break
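Two parts of this script can lean on the standard library. First, concatenating the raw input into the URL breaks on some titles: a `#` (as in `C#`) is treated as a URL fragment and is not sent to the server, so the request fetches `/wiki/C` instead. Second, the `count <= 10` check lets 11 titles through, not 10; if "the first N acceptable titles" is the goal, `itertools.islice` can replace the counter and the `break`. The sketch below assumes English Wikipedia and Wikipedia's convention of underscores for spaces; `article_url`, `is_wanted`, and `first_titles` are hypothetical helper names, not part of requests or BeautifulSoup.

```python
from itertools import islice
from typing import Iterable, List
from urllib.parse import quote

# Substrings that mark links to skip (the IGNORE list from the question)
IGNORE = ["Wikipedia:", "Category:", "Template:", "Template talk:", "User:",
          "User talk:", "Module:", "Help:", "File:", "Portal:", "#", "About this",
          ".ogg", "disambiguation", "Edit section"]


def article_url(title: str) -> str:
    """Build an English Wikipedia URL, percent-encoding characters such as '#'.

    Spaces become underscores; '/' and ':' are kept literally because they
    appear in titles such as "AC/DC" or "Help:Contents" (assumed convention).
    """
    path = quote(title.strip().replace(" ", "_"), safe="/:")
    return "https://en.wikipedia.org/wiki/" + path


def is_wanted(title: str) -> bool:
    """True for a non-empty title that contains none of the IGNORE substrings."""
    return title != "" and not any(x in title for x in IGNORE)


def first_titles(titles: Iterable[str], limit: int = 10) -> List[str]:
    """Return the first `limit` wanted titles, consuming no more input than needed."""
    return list(islice(filter(is_wanted, titles), limit))
```

In the script, the request would become `requests.get(article_url(user_input))`, and the whole `for` loop could become `first_titles(tag.get("title", "") for tag in soup.select('div.mw-parser-output a:not(.infobox  a)'))`; passing `limit=11` keeps the original count.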

Comments (1)

夜司空 2025-02-05 07:15:15

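Use functions with return statements: put the prompt and the scraping into separate functions and have each one return a call to the other, so the program asks for another article after printing the results.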

Example

import requests
from bs4 import BeautifulSoup

# Substrings that mark links to skip
IGNORE = ["Wikipedia:", "Category:", "Template:", "Template talk:", "User:",
          "User talk:", "Module:", "Help:", "File:", "Portal:", "#", "About this", ".ogg", "disambiguation",
          "Edit section"]


def get_user_input():
    user_input = input("Enter article:").strip()
    if user_input:
        return get_response(user_input)
    else:
        # Empty input: ask again
        return get_user_input()


def get_response(user_input):
    response = requests.get("https://en.wikipedia.org/wiki/" + user_input)
    soup = BeautifulSoup(response.text, "html.parser")

    title_list = []
    count = 0

    for tag in soup.select('div.mw-parser-output a:not(.infobox  a)'):
        if count <= 10:
            title = tag.get("title", "")
            if not any(x in title for x in IGNORE) and title != "":
                count = count + 1
                print(title)
                title_list.append(title)
        else:
            break

    print(title_list)
    # Prompt again after the loop, so this also happens when the page
    # has fewer matching links than the limit
    return get_user_input()


if __name__ == '__main__':
    get_user_input()
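Because `get_user_input()` and `get_response()` call each other, no call returns while the program is running: every lookup (and every blank entry) adds frames to the stack, so a long enough session ends in a `RecursionError`. The `while` loop suggested in the question has no such limit. A minimal sketch of that loop, assuming that a blank line, `q`, or end of input should end the session; `ask_until_quit` is a hypothetical name, `handle` stands for whatever fetches and prints one article (for example `get_response` without its call back to `get_user_input`), and `read` defaults to `input` so other input sources can drive the loop.

```python
from typing import Callable


def ask_until_quit(handle: Callable[[str], None],
                   read: Callable[[str], str] = input) -> int:
    """Keep prompting for article names and pass each one to `handle`.

    Stops on a blank line, "q", or end of input (Ctrl-D, or Ctrl-Z then Enter).
    Returns the number of articles handled.
    """
    handled = 0
    while True:
        try:
            article = read("Enter article (blank or q to quit): ").strip()
        except EOFError:
            # input() raises EOFError once stdin is exhausted
            break
        if article == "" or article.lower() == "q":
            break
        handle(article)
        handled += 1
    return handled
```

With the answer's code, this would be called as `ask_until_quit(get_response)` after removing the `return get_user_input()` lines from `get_response`; each iteration finishes before the next prompt, so the stack does not grow.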