Does continue work in Beautiful Soup?

Posted 2025-02-12 12:56:49

import requests
from bs4 import BeautifulSoup
from csv import writer
import openpyxl

wb = openpyxl.load_workbook('Book3.xlsx')
ws = wb.active

with open('mtbs.csv', 'w', encoding='utf8', newline='') as f_output:
    csv_output = writer(f_output)
    header = ['Code','Product Description']
    csv_output.writerow(header)
    
    for row in ws.iter_rows(min_row=1, min_col=1, max_col=1, values_only=True):
        url = f"https://www.radwell.com/en-US/Buy/MITSUBISHI/MITSUBISHI/{row[0]}"
        print(url)
        req_page = requests.get(url)
        soup = BeautifulSoup(req_page.content, 'html.parser')
        div_techspec = soup.find('div', class_="minitabSection")

        if 'minitabSection' not in url:
            continue  # does not work

        code = div_techspec.find_all('li')
        description1 = div_techspec.find_all('li')
        description2 = div_techspec.find_all('li')
        description3 = div_techspec.find_all('li')


        info = [code[0].text, description1[1].text, description2[2].text, description3[3].text]
        csv_output.writerow(info)



I am currently trying to collect data from a certain website. I have an Excel sheet containing hundreds of product codes. However, some of the products do not exist on the website I am scraping, and the loop stops running.

I am currently having issues with this part: if 'minitabSection' not in url: continue

URLs that do not exist should be skipped so the rest of the code keeps running. How do I achieve this?
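As an aside, a missing product can also be caught before parsing. A minimal sketch, assuming the site returns a non-200 status (e.g. 404) for codes it does not stock; some sites instead return 200 with an error page, in which case this check alone is not enough:

req_page = requests.get(url)
if req_page.status_code != 200:
    # No page for this product code; skip it and move on
    continue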

Comments (1)

浪菊怪哟 2025-02-19 12:56:49


There is most likely no string 'minitabSection' in your URL, so that test can never trigger. Instead, check the result of find() for div_techspec:

...
div_techspec = soup.find('div', class_="minitabSection")

if div_techspec is None:
    continue 
...

Or, the other way around:

...
div_techspec = soup.find('div', class_="minitabSection")

if div_techspec:
    code = div_techspec.find_all('li')

    info = [code[0].text, code[1].text, code[2].text, code[3].text]
    csv_output.writerow(info)

else:
    continue 
...
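One further hardening step, as a sketch: even when the minitabSection div exists, it might hold fewer than four li items, and indexing code[3] would then raise an IndexError. Guarding the length avoids that (the variable name items is just for illustration):

...
div_techspec = soup.find('div', class_="minitabSection")
if div_techspec is None:
    continue

items = div_techspec.find_all('li')
if len(items) < 4:
    # Spec list exists but is incomplete; skip this product
    continue

info = [item.text for item in items[:4]]
csv_output.writerow(info)
...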