AttributeError: 'NoneType' object has no attribute 'extract'
I'm trying to exclude a div and a nav from a page. The first run seems to work fine, but then it throws an error.
From this page: https://www.velkesvatonovice.cz/windex.php/rubrika/elektronicka-uredni-deska/
"Exclude" code from: https://discuss.dizzycoding.com/exclude-unwanted-tag-on-beautifulsoup-python/
I'm trying to get the text of an article (for example, the 5th article), but not the attachments () or the nav.
Console log:
PS C:\Users\thoma\Desktop\py\velkesvatonovice.cz\scripts> python main.py
Traceback (most recent call last):
  File "main.py", line 53, in <module>
    unwantedAttachments.extract()
AttributeError: 'NoneType' object has no attribute 'extract'
Problematic part of the code:
#Text full
unwantedAttachments = artcontent.find('div', class_="attachments")
unwantedAttachments.extract()
unwantedNav = artcontent.find('nav')
unwantedNav.extract()
print(artcontent)
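Why this fails, for context: BeautifulSoup's find() returns None when no matching tag exists in that article, so calling .extract() on the result raises the AttributeError. A minimal standalone reproduction, using made-up HTML rather than the actual site:

from bs4 import BeautifulSoup

# Hypothetical article markup with no attachments div (illustration only)
html = '<div class="entry-inner"><p>Article text, no attachments here.</p></div>'
artcontent = BeautifulSoup(html, "html.parser").find("div", class_="entry-inner")

unwantedAttachments = artcontent.find('div', class_="attachments")
print(unwantedAttachments)  # None -- find() returns None when nothing matches
# unwantedAttachments.extract() would raise:
# AttributeError: 'NoneType' object has no attribute 'extract'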
Full code:
from bs4 import BeautifulSoup
import requests
import re
from csv import writer

pageno = 1
url = "https://www.velkesvatonovice.cz/windex.php/rubrika/elektronicka-uredni-deska/page/" + str(pageno) + "/"
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
lists = soup.find_all("article")

#65
def normalize(str):
    return(re.sub(r'\xa0', ' ', str))

with open("listings.csv", "w", encoding="utf8") as f:
    thewriter = writer(f)
    header = ["Name", "Text", "Text full", "Attachments", "Category", "Category full", "Date", "URL", "Page"]
    thewriter.writerow(header)
    for list in lists:
        categorieslist = list.find_all("a", rel="category tag")
        #Name
        article = list.find("a", rel="bookmark").text.strip()
        #Text
        text = list.find("div", class_="entry excerpt entry-summary").text
        #Category
        category = categorieslist[len(categorieslist) - 1]
        #Category full
        categories = ""
        for cat in categorieslist:
            categories += (cat.text + "/")
        #Date
        date = list.find("time").text
        #URL
        urlarticle = list.find("a", rel="bookmark")["href"]
        pageart = requests.get(urlarticle)
        soupart = BeautifulSoup(pageart.content, "html.parser")
        artcontent = soupart.find("div", class_="entry-inner")
        #Text full
        unwantedAttachments = artcontent.find('div', class_="attachments")
        unwantedAttachments.extract()
        unwantedNav = artcontent.find('nav')
        unwantedNav.extract()
        print(artcontent)
        #Attachments
        #Page
        item = [normalize(article), normalize(text), "ss", "Attachment", category.text, categories, date, urlarticle]
        thewriter.writerow(item)
Comments (1)
A simple "if" check fixes the whole problem. Thanks @Ahmad for pointing it out.
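For future readers, a minimal sketch of what that "if" guard could look like in the problematic section. The variable names come from the question; the guard itself is an illustration, not the exact code @Ahmad suggested:

#Text full
unwantedAttachments = artcontent.find('div', class_="attachments")
if unwantedAttachments is not None:
    # Only remove the attachments block if this article actually has one
    unwantedAttachments.extract()
unwantedNav = artcontent.find('nav')
if unwantedNav is not None:
    # Likewise, some articles may have no nav element at all
    unwantedNav.extract()
print(artcontent)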