Looping through multiple pages of a website

Posted on 2025-02-12 03:40:07


Trying to loop through multiple pages of one website. I can scrape one page, but I can't figure out how to replace what's in the brackets in the URL with the items from the list.

import requests
from bs4 import BeautifulSoup

URL = "https://www.samplebooks.com/&s={}&1000"
BList = ["28", "9", "10", "14", "6", "13", "30", "29", "1", "24", "27"]
Statement = []
html_text = requests.get(Statement).text
Soup = BeautifulSoup(html_text, "lxml")
Books = Soup.find_all("tr")
for output in BList:
    Statement.append(URL.format(output))
    for things in Books:
        print(things.text)


Comments (2)

嘿看小鸭子会跑 2025-02-19 03:40:07


It looks like you're creating the Statement array correctly, but you aren't using it after filling it up.

This might be what you're looking for:

for output in BList:
    # add the URL to the array
    Statement.append(URL.format(output))

    # Statement[-1] is the last element in the array (most recent element)
    html_text = requests.get(Statement[-1]).text
    Soup = BeautifulSoup(html_text, "lxml")
    Books = Soup.find_all("tr")
    for things in Books:
        print(things.text)
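For reference, `URL.format(output)` simply substitutes each list item into the `{}` placeholder, so the full `Statement` list can also be built up front. A minimal sketch using the question's sample URL and a shortened `BList`:

```python
URL = "https://www.samplebooks.com/&s={}&1000"
BList = ["28", "9", "10"]

# Build every page URL up front; each {} is replaced by one list item
Statement = [URL.format(b) for b in BList]
print(Statement[0])  # https://www.samplebooks.com/&s=28&1000
```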
独夜无伴 2025-02-19 03:40:07


Without any additional information it is hard to give a concrete answer, so it would be great if you improved your question.

There are two things I would like to highlight:

  1. URL parameters (also known as the query string) are introduced by a question mark ?, not by the separator symbol &.

  2. You do not need a list called Statement; simply iterate over BList and request each page directly, like:
     requests.get(URL.format(output)).text

Example

The code below only works if you replace the example URL part https://www.samplebooks.com/ with the correct URL, and if there are <tr> elements in the HTML response.

import requests
from bs4 import BeautifulSoup

URL = "https://www.samplebooks.com/?s={}&1000"
BList = ["28", "9", "10", "14", "6", "13", "30", "29", "1", "24", "27"]

for b in BList:
    html_text = requests.get(URL.format(b)).text
    soup = BeautifulSoup(html_text, "lxml")  # specify a parser explicitly
    books = soup.find_all("tr")
    for things in books:
        print(things.text)
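Building on point 1 above, requests can also assemble and encode the query string for you via the `params` argument; preparing the request without sending it shows the URL it would produce. This sketch uses the question's placeholder host:

```python
import requests

# Let requests build and encode the "?s=..." query string for us
req = requests.Request("GET", "https://www.samplebooks.com/", params={"s": "28"})
prepared = req.prepare()  # constructs the request without sending it
print(prepared.url)  # https://www.samplebooks.com/?s=28
```

With this approach you never hand-edit the URL string, and special characters in parameter values are percent-encoded automatically.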