Creating a finder function with BeautifulSoup

Posted 2025-02-10 23:18:28 · 1212 characters · 3 views · 0 comments


I have a function for web scraping, but when I put the find arguments in a variable (a list), BeautifulSoup does not resolve them. When I run the script, find returns None, but if I write the arguments by hand it works.

# library for making HTTP requests
import requests
# import BeautifulSoup to search the page
from bs4 import BeautifulSoup

# function to search the page, written so the code is reusable
def finder(url):
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"}
    page = requests.get(url[0], headers=headers)
    soup = BeautifulSoup(page.content, "html.parser")
    result = soup.find(url[1])
    price = soup.find(url[2])
    price = price.replace("€","")
    price = price.replace(".","")
    price = int(price)
    print("__________ House Finded _________")
    print(result.text)
    print(price)
    print("________________________________")
    
    
habitaclia =[ "https://www.habitaclia.com/comprar-casa_pareada-en_venta_en_llanca_el_colomer_la_bateria_la_coma-llansa-i16454000001818.htm?hab=3&ordenar=precio_mas_bajo&st=3,6,8,10,12,15&f=parking&geo=p&from=list&lo=55", "h1", "'span', {'itemprop':'price'"]
finder(habitaclia)


Answer by 岁月流歌, 2025-02-17 23:18:28

The main issue is that your string "'span', {'itemprop':'price'" is passed as a single argument, which BeautifulSoup treats as the tag name.

BeautifulSoup does not parse the string the way you expect: it cannot know that the string is meant to represent two arguments separated by a comma. The string is also missing a closing }.
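
To see the difference without requesting the site, here is a minimal sketch on a made-up HTML snippet (the markup is an assumption, not the real habitaclia page). It compares passing the whole string as one argument with passing the tag name and the attribute filter separately:

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the listing page (an assumption, not the real markup).
html = '<h1>Casa pareada</h1><span itemprop="price">248.000 €</span>'
soup = BeautifulSoup(html, "html.parser")

# The whole string is taken as a tag name, and no tag has that name.
as_string = soup.find("'span', {'itemprop':'price'")

# Two separate arguments: a tag name and an attribute filter.
as_arguments = soup.find("span", {"itemprop": "price"})

print(as_string)          # None
print(as_arguments.text)  # 248.000 €
```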

If you really want to work that way, change your strategy and use more structured information, e.g. a dict:

habitaclia ={'url':'https://www.habitaclia.com/comprar-casa_pareada-en_venta_en_llanca_el_colomer_la_bateria_la_coma-llansa-i16454000001818.htm?hab=3&ordenar=precio_mas_bajo&st=3,6,8,10,12,15&f=parking&geo=p&from=list&lo=55', 
             'tag_one':'h1', 
             'tag_two': 'span',
             'filter_two': {'itemprop':'price'}
            }

and use it in your script like this:

price = soup.find(config.get('tag_two'),config.get('filter_two')).text
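
If you would rather keep a list, one possible variation (my assumption, not something the question requires) is to store each selector as a (tag, attrs) tuple and unpack it with * when calling find. The HTML and the URL below are placeholders, and nothing is fetched:

```python
from bs4 import BeautifulSoup

# Placeholder HTML so the sketch runs offline (not the real page markup).
html = '<h1>Casa pareada</h1><span itemprop="price">248.000 €</span>'
soup = BeautifulSoup(html, "html.parser")

# Hypothetical list layout: the URL first, then one (tag, attrs) tuple per element.
site = [
    "https://example.com/listing",  # placeholder URL, not requested here
    ("h1", {}),
    ("span", {"itemprop": "price"}),
]

# The * operator spreads each tuple into the two positional arguments of find.
title = soup.find(*site[1])
price = soup.find(*site[2])

print(title.text)
print(price.text)
```

The dict version reads more clearly once there are several fields, which is why I would still prefer it.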

Example

import requests
from bs4 import BeautifulSoup

def finder(config):
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"}
    page = requests.get(config.get('url'), headers=headers)
    soup = BeautifulSoup(page.content, "html.parser")
    result = soup.find(config.get('tag_one'))
    price = soup.find(config.get('tag_two'),config.get('filter_two')).text
    price = price.replace("€","")
    price = price.replace(".","")
    price = int(price)
    print("__________ House Finded _________")
    print(result.text)
    print(price)
    print("________________________________")


habitaclia ={'url':'https://www.habitaclia.com/comprar-casa_pareada-en_venta_en_llanca_el_colomer_la_bateria_la_coma-llansa-i16454000001818.htm?hab=3&ordenar=precio_mas_bajo&st=3,6,8,10,12,15&f=parking&geo=p&from=list&lo=55', 
             'tag_one':'h1', 
             'tag_two': 'span',
             'filter_two': {'itemprop':'price'}
            }

finder(habitaclia)

Result

__________ House Finded _________
Casa pareada  en venta en El Colomer - La Bateria - La Coma Llançà
248000
________________________________
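
One caveat with the price cleanup: find returns None when the element is missing, and the chained replace calls only handle the exact "248.000 €" format. A small hypothetical helper, parse_price (my own name, not part of BeautifulSoup), could strip everything that is not a digit. This sketch assumes prices are whole euros with no decimal part:

```python
import re


def parse_price(text):
    """Return the integer amount in a string such as '248.000 €', or None.

    Assumes whole-euro prices: every non-digit character is dropped,
    so a decimal part would be merged into the number.
    """
    digits = re.sub(r"\D", "", text or "")
    return int(digits) if digits else None


print(parse_price("248.000 €"))  # 248000
print(parse_price(None))         # None
```

Inside finder you could then check the result of soup.find for None before reading .text, and pass the text to this helper.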


About the author

甜警司
