Looping through multiple pages of a website

Posted on 2025-02-12 03:40:07


Trying to loop through multiple pages of one website. I can scrape one page, but I can't figure out how to replace what's in the brackets in the URL with the items from the list.

import requests
from bs4 import BeautifulSoup

URL = "https://www.samplebooks.com/&s={}&1000"
BList = ["28", "9", "10", "14", "6", "13", "30", "29", "1", "24", "27"]
Statement = []
html_text = requests.get(Statement).text
Soup = BeautifulSoup(html_text, "lxml")
Books = Soup.find_all("tr")
for output in BList:
    Statement.append(URL.format(output))
    for things in Books:
        print(things.text)


Comments (2)

嘿看小鸭子会跑 2025-02-19 03:40:07


It looks like you're creating the Statement array correctly, but you aren't using it after filling it up.

This might be what you're looking for:

for output in BList:
    # add the URL to the array
    Statement.append(URL.format(output))

    # Statement[-1] is the last element in the array (most recent element)
    html_text = requests.get(Statement[-1]).text
    Soup = BeautifulSoup(html_text, "lxml")
    Books = Soup.find_all("tr")
    for things in Books:
        print(things.text)
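For reference, `URL.format(output)` simply substitutes each list item into the `{}` placeholder, so the full `Statement` list can also be built up front. A minimal sketch using the question's sample URL and a shortened `BList`:

```python
URL = "https://www.samplebooks.com/&s={}&1000"
BList = ["28", "9", "10"]

# Build every page URL up front; each {} is replaced by one list item
Statement = [URL.format(b) for b in BList]
print(Statement[0])  # https://www.samplebooks.com/&s=28&1000
```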
独夜无伴 2025-02-19 03:40:07


Without any additional information it is hard to give a concrete answer, so it would be great if you improved your question.

There are two things I would like to highlight:

  1. URL parameters (also known as the query string) are introduced by a question mark ?, not by the separator symbol &.

  2. You do not need a list called Statement; simply iterate over BList and request each page directly, like:
     requests.get(URL.format(output)).text

Example

The code below only works if you replace the example URL part https://www.samplebooks.com/ with the correct URL, and if there are <tr> elements in the HTML response.

import requests
from bs4 import BeautifulSoup

URL = "https://www.samplebooks.com/?s={}&1000"
BList = ["28", "9", "10", "14", "6", "13", "30", "29", "1", "24", "27"]

for b in BList:
    html_text = requests.get(URL.format(b)).text
    soup = BeautifulSoup(html_text, "lxml")  # specify a parser explicitly
    books = soup.find_all("tr")
    for things in books:
        print(things.text)
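Building on point 1 above, requests can also assemble and encode the query string for you via the `params` argument; preparing the request without sending it shows the URL it would produce. This sketch uses the question's placeholder host:

```python
import requests

# Let requests build and encode the "?s=..." query string for us
req = requests.Request("GET", "https://www.samplebooks.com/", params={"s": "28"})
prepared = req.prepare()  # constructs the request without sending it
print(prepared.url)  # https://www.samplebooks.com/?s=28
```

With this approach you never hand-edit the URL string, and special characters in parameter values are percent-encoded automatically.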