First time scraping a weather forecast website

Posted on 2025-02-06 19:52:37

I am learning web scraping as my first mini-project, currently working with Python. I want to extract weather data and use Python to show the weather where I live. I found the data I need by inspecting the tags, but my code keeps returning every number in the weather forecast table instead of the specific one I need. I tried selecting it by its index, but that still did not work. This is the code I have written so far:

import requests
from bs4 import BeautifulSoup as bs

url = "http://kktcmeteor.org/tahminler/merkezler?m=ISKELE"
r = requests.get(url)

cast = bs(r.content, "lxml")

wthr = cast.find_all("div", {"class": "col-md-9"})
print(wthr)

Any help would be greatly appreciated. The data I want is the temperature.

Also, can somebody explain the differences between using lxml and html.parser? I have seen both used widely and am curious how you would decide to use one over the other.

Comments (3)

以为你会在 2025-02-13 19:52:37

The legality of scraping should be considered before you start. You can read more about it here: https://www.tutorialspoint.com/python_web_scraping/legality_of_python_web_scraping.htm
This site does not appear to have a robots.txt file, so crawling it is not explicitly restricted.
Here is a very simplified way to get the table data published at the URL from your question. It uses html.parser to extract the data:

import requests
from bs4 import BeautifulSoup

def get_soup(my_url):
    """Download the page and return it parsed with html.parser."""
    response = requests.get(my_url, timeout=10)
    response.raise_for_status()  # stop early on HTTP errors (4xx/5xx)
    return BeautifulSoup(response.text, 'html.parser')

url = "http://kktcmeteor.org/tahminler/merkezler?m=ISKELE"

#   get the whole html document
soup = get_soup(url)

#   get something from that soup
#   here the header and data cells of the first table are extracted
table = soup.find("table")
table_header = table.find_all("th")
table_data = table.find_all("td")

#   both results are lists of tags
#   pair each header cell with the data cell at the same position
for header, data in zip(table_header, table_data):
    print(header.text + ' --> ' + data.text)

""" R e s u l t :
Tarih / Saat -->

Hava --> COK BULUTLU
Sıcaklık --> 27.5°C
İşba Sıcaklığı --> 17.9°C
Basınç --> 1003.5 hPa
Görüş --> 10 km
Rüzgar --> Batıdan (270) 5 kt.
12.06.2022 13:00 --> Genel Tablo Genel Harita
"""
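
Since the question asks only for the temperature, the header/cell pairing above can be turned into a lookup. The following is a minimal sketch, not a tested scraper for the live site: `SAMPLE_HTML` is a made-up stand-in for the page's table (the real markup may differ), and `table_to_dict` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the forecast table; the real page's markup may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Hava</th><th>Sıcaklık</th><th>Basınç</th></tr>
  <tr><td>COK BULUTLU</td><td>27.5°C</td><td>1003.5 hPa</td></tr>
</table>
"""

def table_to_dict(html):
    """Pair each <th> with the <td> at the same position in the first table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    cells = [td.get_text(strip=True) for td in table.find_all("td")]
    return dict(zip(headers, cells))

forecast = table_to_dict(SAMPLE_HTML)
# "Sıcaklık" is the temperature header that appears in the result above.
temperature = forecast.get("Sıcaklık")
print(temperature)
```

Looking the value up by its header text rather than by a fixed index should keep working if the site reorders its columns, as long as the header label stays the same.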

This is just one way to do it, and it only retrieves the part shown in the transparent table on the site. Once more, respect the instructions in a site's robots.txt file whenever one exists. Regards...
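
For the robots.txt check, Python's standard library includes `urllib.robotparser`. The sketch below parses an in-memory rule set instead of fetching one; `SAMPLE_ROBOTS`, the `example.com` URLs, and the `is_allowed` helper are illustrative assumptions and do not describe the actual site.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Made-up rules for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

forecast_ok = is_allowed(SAMPLE_ROBOTS, "http://example.com/tahminler/merkezler")
private_ok = is_allowed(SAMPLE_ROBOTS, "http://example.com/private/data")
print(forecast_ok, private_ok)
```

Against a real site you would typically call `set_url()` and `read()` on the parser to download the file before calling `can_fetch()`.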

心意如水 2025-02-13 19:52:37

I think the element you are looking for is <div class="temp">24.2 °c</div>.
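
If that is the right element, selecting it by class avoids pulling in the whole forecast table. This is a rough sketch that assumes the markup looks exactly like the snippet above; `SAMPLE_HTML` and `read_temperature` are illustrative names, and the live page may nest or name things differently.

```python
import re
from bs4 import BeautifulSoup

# Assumed markup, modelled on the element quoted above.
SAMPLE_HTML = '<div class="col-md-9"><div class="temp">24.2 °c</div></div>'

def read_temperature(html):
    """Return the first div.temp value as a float, or None if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.find("div", class_="temp")
    if node is None:
        return None
    # Keep only the numeric part, dropping the unit suffix.
    match = re.search(r"-?\d+(?:\.\d+)?", node.get_text())
    return float(match.group()) if match else None

temp_c = read_temperature(SAMPLE_HTML)
print(temp_c)
```

`find` returns only the first match, which is why this yields one value where `find_all` on the outer `col-md-9` container returns everything inside it.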

If your primary focus is just the temperature data, you could look into weather APIs; several public ones are available.

忆悲凉 2025-02-13 19:52:37

Did you check whether the web service has an API you can use? Many weather apps offer APIs that are free as long as you stay under a certain request limit. If there is one, you could request only the data you need, so there would be no need to parse or reformat anything.
