Creating a finder function with BeautifulSoup

Posted on 2025-02-10 23:18:28 | Words: 1212 | Views: 3 | Comments: 0

I have a function for web scraping, but when I put the arguments for find in a variable (a list), BeautifulSoup doesn't resolve them. When I run the code it returns None, but if I type the arguments in by hand it works.

# library for making HTTP requests
import requests
# import BeautifulSoup to search within the web page
from bs4 import BeautifulSoup

# function that searches the page, written so the code is reusable
def finder(url):
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"}
    page = requests.get(url[0], headers=headers)
    soup = BeautifulSoup(page.content, "html.parser")
    result = soup.find(url[1])
    price = soup.find(url[2])
    price = price.replace("€","")
    price = price.replace(".","")
    price = int(price)
    print("__________ House Finded _________")
    print(result.text)
    print(price)
    print("________________________________")
    
    
habitaclia =[ "https://www.habitaclia.com/comprar-casa_pareada-en_venta_en_llanca_el_colomer_la_bateria_la_coma-llansa-i16454000001818.htm?hab=3&ordenar=precio_mas_bajo&st=3,6,8,10,12,15&f=parking&geo=p&from=list&lo=55", "h1", "'span', {'itemprop':'price'"]
finder(habitaclia)  

Comments (1)

岁月流歌 2025-02-17 23:18:28

The main issue is that your string "'span', {'itemprop':'price'" is passed as a single argument, which BeautifulSoup treats as the tag name.

BeautifulSoup does not parse the string the way you expect: it has no way of knowing that the string is meant to represent two arguments separated by a comma. The string is also missing a closing }.
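To make the difference concrete, here is a minimal sketch that runs against a small inline HTML snippet instead of the live page. The snippet and the variable names are assumptions made for illustration only; the real page will look different.

```python
from bs4 import BeautifulSoup

# Tiny stand-in for the real page (an assumption for illustration).
html = '<h1>Casa pareada</h1><span itemprop="price">248.000 €</span>'
soup = BeautifulSoup(html, "html.parser")

# One string: the whole text is taken as a tag name, so nothing matches.
wrong = soup.find("'span', {'itemprop':'price'")

# Two separate arguments: a tag name and an attribute filter.
right = soup.find("span", {"itemprop": "price"})

print(wrong)       # None
print(right.text)  # 248.000 €
```

The first call looks for a tag literally named 'span', {'itemprop':'price' and finds none, which is the None you are seeing.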

If you really want to work that way, change your strategy and use more structured information, e.g. a dict:

habitaclia ={'url':'https://www.habitaclia.com/comprar-casa_pareada-en_venta_en_llanca_el_colomer_la_bateria_la_coma-llansa-i16454000001818.htm?hab=3&ordenar=precio_mas_bajo&st=3,6,8,10,12,15&f=parking&geo=p&from=list&lo=55', 
             'tag_one':'h1', 
             'tag_two': 'span',
             'filter_two': {'itemprop':'price'}
            }

Then use it in your script like this:

price = soup.find(config.get('tag_two'),config.get('filter_two')).text
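The same idea should also work with a list like the one in the question, as long as the tag name and the attribute filter are stored as separate items rather than inside one string. A minimal sketch, assuming a hypothetical layout where each selector is a (tag, attrs) pair that gets unpacked with *; the inline HTML and the names selectors, title_tag and price_tag are made up for illustration:

```python
from bs4 import BeautifulSoup

# Tiny stand-in for the real page (an assumption for illustration).
html = '<h1>Casa pareada</h1><span itemprop="price">248.000 €</span>'
soup = BeautifulSoup(html, "html.parser")

# Hypothetical layout: each selector is a (tag, attrs) pair, not one string.
selectors = [("h1", {}), ("span", {"itemprop": "price"})]

# The * operator unpacks each pair into two positional arguments.
title_tag = soup.find(*selectors[0])  # same as soup.find("h1", {})
price_tag = soup.find(*selectors[1])  # same as soup.find("span", {"itemprop": "price"})

print(title_tag.text)
print(price_tag.text)
```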
Example:
import requests
from bs4 import BeautifulSoup

def finder(config):
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"}
    page = requests.get(config.get('url'), headers=headers)
    soup = BeautifulSoup(page.content, "html.parser")
    result = soup.find(config.get('tag_one'))
    price = soup.find(config.get('tag_two'),config.get('filter_two')).text
    price = price.replace("€","")
    price = price.replace(".","")
    price = int(price)
    print("__________ House Finded _________")
    print(result.text)
    print(price)
    print("________________________________")


habitaclia ={'url':'https://www.habitaclia.com/comprar-casa_pareada-en_venta_en_llanca_el_colomer_la_bateria_la_coma-llansa-i16454000001818.htm?hab=3&ordenar=precio_mas_bajo&st=3,6,8,10,12,15&f=parking&geo=p&from=list&lo=55', 
             'tag_one':'h1', 
             'tag_two': 'span',
             'filter_two': {'itemprop':'price'}
            }

finder(habitaclia)
Result:
__________ House Finded _________
Casa pareada  en venta en El Colomer - La Bateria - La Coma Llançà
248000
________________________________
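
Note that the example above fails with an AttributeError if either find call returns None, for instance when the page layout changes. A slightly more defensive variant could look like the sketch below. It is only an illustration under some assumptions: it takes the HTML as a string so it can be tried without a network request, the helper names parse_price and extract are invented here, and prices are assumed to be whole euros with no decimal part.

```python
import re
from bs4 import BeautifulSoup

def parse_price(text):
    """Turn a string such as '248.000 €' into the integer 248000."""
    digits = re.sub(r"\D", "", text)  # drop everything that is not a digit
    if not digits:
        raise ValueError(f"no digits found in {text!r}")
    return int(digits)

def extract(html, config):
    """Return (title, price) from html; raise ValueError if a tag is missing."""
    soup = BeautifulSoup(html, "html.parser")
    title_tag = soup.find(config["tag_one"])
    price_tag = soup.find(config["tag_two"], config["filter_two"])
    if title_tag is None or price_tag is None:
        raise ValueError("expected element not found; check the selectors")
    return title_tag.get_text(strip=True), parse_price(price_tag.get_text())

# Inline sample standing in for the downloaded page (an assumption).
sample_html = '<h1> Casa pareada </h1><span itemprop="price">248.000 €</span>'
config = {"tag_one": "h1", "tag_two": "span", "filter_two": {"itemprop": "price"}}

title, price = extract(sample_html, config)
print(title, price)
```

In the real script you would pass page.content from requests.get to extract instead of sample_html.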