Creating a finder function with BeautifulSoup
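I have a function for web scraping, but when I put the arguments for find() in a variable (a list), BeautifulSoup doesn't resolve them: when I run it, it returns None. If I type the arguments in by hand, it works.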
# library for making HTTP requests
import requests
# import BeautifulSoup to search the web page
from bs4 import BeautifulSoup

# function that searches the page, so the code is reusable
def finder(url):
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"}
    page = requests.get(url[0], headers=headers)
    soup = BeautifulSoup(page.content, "html.parser")
    result = soup.find(url[1])
    price = soup.find(url[2])
    price = price.replace("€","")
    price = price.replace(".","")
    price = int(price)
    print("__________ House Finded _________")
    print(result.text)
    print(price)
    print("________________________________")

habitaclia = ["https://www.habitaclia.com/comprar-casa_pareada-en_venta_en_llanca_el_colomer_la_bateria_la_coma-llansa-i16454000001818.htm?hab=3&ordenar=precio_mas_bajo&st=3,6,8,10,12,15&f=parking&geo=p&from=list&lo=55", "h1", "'span', {'itemprop':'price'"]
finder(habitaclia)
Comments (1)
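The main issue is that your string "'span', {'itemprop':'price'" is passed as a single argument, in this case as the tag name. BeautifulSoup does not parse the string the way you expect: it cannot know that the string is meant to represent two arguments separated by a comma. The string is also missing its closing }.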
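The difference can be reproduced without any network access. The following is a minimal sketch, assuming a small stand-in HTML snippet (the markup of the real page is not shown in the question); the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup, not the real habitaclia page.
html = '<h1>Sample house</h1><span itemprop="price">250.000 €</span>'
soup = BeautifulSoup(html, "html.parser")

# One string: the whole thing is taken as a tag name, and no tag has that name.
as_one_string = soup.find("'span', {'itemprop':'price'")

# Two arguments: a tag name plus an attribute filter.
as_two_args = soup.find("span", {"itemprop": "price"})

print(as_one_string)     # None
print(as_two_args.text)  # 250.000 €
```

Because the first call returns None, the later price.replace(...) line in the question fails before any output is printed.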
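If you really want to work that way, change your strategy and use more structured information, e.g. a dict that keeps the tag name and the attributes as separate entries, and pass those entries to find() in your script.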
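One way the dict-based approach might look is sketched below. The answer's original example was not preserved, so the key names ("url", "title", "price", "name", "attrs"), the helpers parse_price and extract, and the sample HTML are all assumptions made for illustration. Parsing is kept separate from the HTTP request so the selectors can be tried on a local string first.

```python
import requests
from bs4 import BeautifulSoup


def parse_price(text):
    """Turn a string such as '250.000 €' into the integer 250000."""
    return int(text.replace("€", "").replace(".", "").strip())


def extract(html, site):
    """Return (title, price) using the selectors stored in `site`, or None."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find(site["title"]["name"], site["title"]["attrs"])
    price = soup.find(site["price"]["name"], site["price"]["attrs"])
    if title is None or price is None:
        return None  # a selector did not match anything
    # find() returns a Tag, so take its text before doing string operations
    return title.get_text(strip=True), parse_price(price.get_text())


def finder(site):
    """Download the page named in `site` and extract title and price."""
    headers = {"User-Agent": "Mozilla/5.0"}
    page = requests.get(site["url"], headers=headers)
    return extract(page.content, site)


# Hypothetical layout: tag name and attributes are stored as separate entries.
habitaclia = {
    "url": "https://www.habitaclia.com/comprar-casa_pareada-en_venta_en_llanca_el_colomer_la_bateria_la_coma-llansa-i16454000001818.htm?hab=3&ordenar=precio_mas_bajo&st=3,6,8,10,12,15&f=parking&geo=p&from=list&lo=55",
    "title": {"name": "h1", "attrs": {}},
    "price": {"name": "span", "attrs": {"itemprop": "price"}},
}

# Offline check with stand-in markup (not the real page).
sample_html = '<h1> Sample house </h1><span itemprop="price">250.000 €</span>'
print(extract(sample_html, habitaclia))  # ('Sample house', 250000)
```

Calling finder(habitaclia) would run the same extraction against the live page; whether the selectors still match depends on the site's current markup.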