Debugging a Python for loop with breakpoints
I'm writing a Python script for a scraping project, using bs4.
After a few pages are scraped, the script raises an IndexError. To work around that, I modified the code and put an incremental variable in a for loop, specifically in the higher-level category that contains the pages. That cut the runtime by at least half.
Now, even with this bypass, I still have to wait a long time on every run.
Which brings me to breakpoints and debugging. But here is the main problem: how do I debug a for loop from the beginning without losing the incremental variable?
Note that I'm using PyCharm/Spyder.
I have read a lot on this topic without finding any solution,
watched a lot of YouTube video tutorials,
and put and removed breakpoints in the gutter.
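One direction I'm considering is to checkpoint the loop state to disk, so that a fresh debug run can restore the incremental variable instead of redoing all the slow requests. A minimal sketch of that idea using pickle (the checkpoint filename and the helper names are hypothetical, not part of my script):

```
import os
import pickle

CHECKPOINT = "scrape_state.pkl"  # hypothetical checkpoint file

def save_state(state):
    # Persist the incremental variables after each iteration.
    with open(CHECKPOINT, "wb") as f:
        pickle.dump(state, f)

def load_state(default):
    # Restore whatever state a previous (debug) run left behind.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return default

# On start-up, pick up where the last run stopped.
state = load_state({"count": 51, "sub_links": []})
```

With something like this, the loop could be restarted under the debugger without losing the count variable. Here is the current script: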
```
from bs4 import BeautifulSoup
import requests
from fake_headers import Headers

# Result buckets for the fields scraped per product.
branked_c0, link_ranked_c0, q_a_c0, condition_soldnumber_c0, price_tag_c0, link_image_c0, link_product_c0, number_sales_seller_c0, name_product_c0 = [], [], [], [], [], [], [], [], []

original_link = "https://mercadolibre.cl/categorias#menu=categories"
req = requests.get(original_link, headers=Headers().generate())
soup = BeautifulSoup(req.text, 'html.parser')
main = soup.select(".categories__container")[12]  # block of categories
sub_links = []
end_ = "_Desde_{}"

# Collect every subcategory link from the categories page.
for sbc in main:
    title_main = sbc.nextSibling
    print(type(title_main))
    print(len(title_main.contents))
    for i in range(len(title_main.contents)):
        print(i)
        subaru = title_main.contents[i]
        print(subaru)
        for xo in subaru("a"):
            nombre_subcat = xo.string
            link_subcat = xo.get("href")
            print(link_subcat)
            sub_links.append(link_subcat)
print(sub_links)

# Visit each subcategory and page through its listings.
for link in sub_links:
    req = requests.get(link.format(0), headers=Headers().generate())
    print(str(link) + str(nombre_subcat) + " = link without page number")
    soup = BeautifulSoup(req.text, 'html.parser')
    try:
        page_count = soup.select(".andes-pagination__page-count")[0]
        total_count = (page_count.text.split(" "))[-1]
    except IndexError:
        page_count = 0
        total_count = 0  # also set total_count so range(int(total_count)) below cannot raise NameError
    print(str(page_count) + " = subcategory page number")
    # print(str(total_count) + " = total number of pages in the category")
    count = 51
    for page_no in range(int(total_count)):
        req = requests.get(link.format(count), headers=Headers().generate())
        soup = BeautifulSoup(req.text, 'html.parser')
        sbc = soup.select(".ui-search-item__group.ui-search-item__group--title")
```
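Since most of the waiting is network time, caching the HTTP responses would also let the loops re-run almost instantly between debug sessions. A sketch using the third-party requests-cache package (assuming it is installed; the cache name is arbitrary):

```
import requests
import requests_cache

# Transparently cache every GET in a local SQLite file; repeated
# debug runs then replay responses from disk instead of re-fetching.
requests_cache.install_cache("mercadolibre_debug")

resp = requests.get("https://mercadolibre.cl/categorias#menu=categories")
print(resp.from_cache)  # False on the first run, True afterwards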
```

Separately, both PyCharm and Spyder support conditional breakpoints (in PyCharm, right-click a breakpoint in the gutter and enter a condition such as page_no == 3), so the debugger stops only once the loop reaches the state of interest.