解析 Wunderground 中的 HTML 数据

发布于 2025-01-04 03:31:52 字数 1954 浏览 0 评论 0原文

所有,

我正在尝试从 Wunderground 下载天气数据历史记录。我遇到的问题是我需要完整的 METAR 信息。

这是我要下载的示例: 带有完整 METAR 的 CSV

由于我想下载全年的每小时数据,因此我需要编写它们的脚本。但无论我尝试什么(使用 wget 的 bash 或 python),我仍然无法通过脚本获得包含完整 METAR 的页面。

这是我的脚本的示例:

import urllib2
from BeautifulSoup import BeautifulSoup
url = "http://www.wunderground.com/history/airport/KBUF/2011/1/1/DailyHistory.html?theprefset=SHOWMETAR&theprefvalue=1&format=1"
page = urllib2.urlopen(url)
dailyData = page.read()                            
print dailyData

我所拥有的是这样的:

12:54 AM,52.0,45.0,77,29.93,10.0,SSW,15.0,-,N/A,,Scattered Clouds,200,2011-01-01 05:54:00<br />
1:54 AM,53.1,45.0,74,29.95,10.0,SSW,12.7,-,N/A,,Mostly Cloudy,200,2011-01-01 06:54:00<br />
2:54 AM,50.0,44.1,80,29.95,10.0,SSW,8.1,-,N/A,,Mostly Cloudy,200,2011-01-01 07:54:00<br />
3:54 AM,51.1,44.1,77,29.93,10.0,SSE,5.8,-,N/A,,Scattered Clouds,150,2011-01-01 08:54:00<br />

通过网络浏览器,这就是我得到的 - 请注意以 METAR 开头的新专栏:

12:54 AM,52.0,45.0,77,29.93,10.0,SSW,15.0,-,N/A,,Scattered Clouds,METAR KBUF 010554Z COR 20013KT 10SM FEW045 SCT140 11/07 A2992 RMK AO2 SLP134 60004 T01110072 10111 20078 58016,200,2011-01-01 05:54:00
1:54 AM,53.1,45.0,74,29.95,10.0,SSW,12.7,-,N/A,,Mostly Cloudy,METAR KBUF 010654Z 20011KT 10SM BKN055 BKN130 12/07 A2994 RMK AO2 SLP141 T01170072,200,2011-01-01 06:54:00
2:54 AM,50.0,44.1,80,29.95,10.0,SSW,8.1,-,N/A,,Mostly Cloudy,METAR KBUF 010754Z 20007KT 10SM BKN050 BKN130 10/07 A2994 RMK AO2 SLP140 T01000067,200,2011-01-01 07:54:00
3:54 AM,51.1,44.1,77,29.93,10.0,SSE,5.8,-,N/A,,Scattered Clouds,METAR KBUF 010854Z 15005KT 10SM SCT050 SCT130 11/07 A2992 RMK AO2 SLP134 T01060067 58000,150,2011-01-01 08:54:00

对此的任何解决方案将不胜感激。谢谢!

All,

I am trying to download the weather data history from Wunderground. The problem that I have is that I need the full METAR information.

Here is the example that I want to download: CSV with full METAR.

Since I want to download the hourly data for the whole year, I need to script them. But no matter what I tried (bash with wget, or python), I still cannot have the page with full METAR via the script.

Here is the example of my script:

import urllib2
from BeautifulSoup import BeautifulSoup
url = "http://www.wunderground.com/history/airport/KBUF/2011/1/1/DailyHistory.html?theprefset=SHOWMETAR&theprefvalue=1&format=1"
page = urllib2.urlopen(url)
dailyData = page.read()                            
print dailyData

What I have is something like:

12:54 AM,52.0,45.0,77,29.93,10.0,SSW,15.0,-,N/A,,Scattered Clouds,200,2011-01-01 05:54:00<br />
1:54 AM,53.1,45.0,74,29.95,10.0,SSW,12.7,-,N/A,,Mostly Cloudy,200,2011-01-01 06:54:00<br />
2:54 AM,50.0,44.1,80,29.95,10.0,SSW,8.1,-,N/A,,Mostly Cloudy,200,2011-01-01 07:54:00<br />
3:54 AM,51.1,44.1,77,29.93,10.0,SSE,5.8,-,N/A,,Scattered Clouds,150,2011-01-01 08:54:00<br />

Through a web browswer, this is what I get - note a new column that starts with METAR:

12:54 AM,52.0,45.0,77,29.93,10.0,SSW,15.0,-,N/A,,Scattered Clouds,METAR KBUF 010554Z COR 20013KT 10SM FEW045 SCT140 11/07 A2992 RMK AO2 SLP134 60004 T01110072 10111 20078 58016,200,2011-01-01 05:54:00
1:54 AM,53.1,45.0,74,29.95,10.0,SSW,12.7,-,N/A,,Mostly Cloudy,METAR KBUF 010654Z 20011KT 10SM BKN055 BKN130 12/07 A2994 RMK AO2 SLP141 T01170072,200,2011-01-01 06:54:00
2:54 AM,50.0,44.1,80,29.95,10.0,SSW,8.1,-,N/A,,Mostly Cloudy,METAR KBUF 010754Z 20007KT 10SM BKN050 BKN130 10/07 A2994 RMK AO2 SLP140 T01000067,200,2011-01-01 07:54:00
3:54 AM,51.1,44.1,77,29.93,10.0,SSE,5.8,-,N/A,,Scattered Clouds,METAR KBUF 010854Z 15005KT 10SM SCT050 SCT130 11/07 A2992 RMK AO2 SLP134 T01060067 58000,150,2011-01-01 08:54:00

Any solution to this would be appreciated. Thanks!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

魔法少女 2025-01-11 03:31:52

浏览wunderunderground,我发现了“显示完整的 METARS” 链接。单击此处后,将浏览器指向 您发布的链接“逗号分隔文件”链接 显示 METAR 数据。好像设置了一些cookie。例如,page.info() 显示“Prefs”包含“SHOWMETAR:1”

Set-Cookie: Prefs=FAVS:1|WXSN:1|PWSOBS:1|WPHO:1|PHOT:1|RADC:0|RADALL:0|HIST0:NULL|GIFT:1|SHOWMETAR:1|PHOTOTHUMBS:50|HISTICAO:KBUF*NULL|; path=/; expires=Fri, 01-Jan-2020 00:00:00 GMT; domain=.wunderground.com

import urllib2
import cookielib

cookieJar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))

setmetar = 'http://www.wunderground.com/cgi-bin/findweather/getForecast?setpref=SHOWMETAR&value=1'
request = urllib2.Request(setmetar)
response = opener.open(request)

url = "http://www.wunderground.com/history/airport/KBUF/2011/1/1/DailyHistory.html?theprefset=SHOWMETAR&theprefvalue=1&format=1"
request = urllib2.Request(url)
page = opener.open(request)
# print(page.info())
dailyData = page.read()                            
print dailyData

TimeEST,TemperatureF,Dew PointF,Humidity,Sea Level PressureIn,VisibilityMPH,Wind Direction,Wind SpeedMPH,Gust SpeedMPH,PrecipitationIn,Events,Conditions,FullMetar,WindDirDegrees,DateUTC<br />
12:54 AM,52.0,45.0,77,29.93,10.0,SSW,15.0,-,N/A,,Scattered Clouds,METAR KBUF 010554Z COR 20013KT 10SM FEW045 SCT140 11/07 A2992 RMK AO2 SLP134 60004 T01110072 10111 20078 58016,200,2011-01-01 05:54:00<br />
1:54 AM,53.1,45.0,74,29.95,10.0,SSW,12.7,-,N/A,,Mostly Cloudy,METAR KBUF 010654Z 20011KT 10SM BKN055 BKN130 12/07 A2994 RMK AO2 SLP141 T01170072,200,2011-01-01 06:54:00<br />

Browsing the wunderunderground, I found the "Show full METARS" link. After clicking there, pointing the browser at the link you posted or the "Comma Delimited File" link shows METAR data. It seems to set some cookies. For example, page.info() shows that "Prefs" includes "SHOWMETAR:1":

Set-Cookie: Prefs=FAVS:1|WXSN:1|PWSOBS:1|WPHO:1|PHOT:1|RADC:0|RADALL:0|HIST0:NULL|GIFT:1|SHOWMETAR:1|PHOTOTHUMBS:50|HISTICAO:KBUF*NULL|; path=/; expires=Fri, 01-Jan-2020 00:00:00 GMT; domain=.wunderground.com

import urllib2
import cookielib

cookieJar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))

setmetar = 'http://www.wunderground.com/cgi-bin/findweather/getForecast?setpref=SHOWMETAR&value=1'
request = urllib2.Request(setmetar)
response = opener.open(request)

url = "http://www.wunderground.com/history/airport/KBUF/2011/1/1/DailyHistory.html?theprefset=SHOWMETAR&theprefvalue=1&format=1"
request = urllib2.Request(url)
page = opener.open(request)
# print(page.info())
dailyData = page.read()                            
print dailyData

yields

TimeEST,TemperatureF,Dew PointF,Humidity,Sea Level PressureIn,VisibilityMPH,Wind Direction,Wind SpeedMPH,Gust SpeedMPH,PrecipitationIn,Events,Conditions,FullMetar,WindDirDegrees,DateUTC<br />
12:54 AM,52.0,45.0,77,29.93,10.0,SSW,15.0,-,N/A,,Scattered Clouds,METAR KBUF 010554Z COR 20013KT 10SM FEW045 SCT140 11/07 A2992 RMK AO2 SLP134 60004 T01110072 10111 20078 58016,200,2011-01-01 05:54:00<br />
1:54 AM,53.1,45.0,74,29.95,10.0,SSW,12.7,-,N/A,,Mostly Cloudy,METAR KBUF 010654Z 20011KT 10SM BKN055 BKN130 12/07 A2994 RMK AO2 SLP141 T01170072,200,2011-01-01 06:54:00<br />
朕就是辣么酷 2025-01-11 03:31:52

当我通过浏览器点击该 URL 时,我看到的数据与您包含的第一个示例相同。浏览 Wunderground 网站,似乎有一种方法可以注册开发人员/API 帐户 - 如果您已经这样做了,并且在检索数据时登录了,那么差异可能是由于可用的附加信息造成的。注册用户。

如果您需要进行身份验证才能获取完整数据,那么值得您花时间考虑使用 mechanize 帮助您管理 cookie。

否则,我怀疑您使用的 URL 存在差异 - 扩展数据可能是使用附加参数指定的。

When I hit that URL through a browser, the data I see is the same as the first sample you include. Poking around the Wunderground site, it looks like there's a way to register for a developer/API account--if you've done that, and are signed in when retrieving the data, then the discrepancy is likely due to the additional information available to registered users.

If you need to authenticate to get the full data, it'll be worth your time to look into using mechanize to help you manage cookies.

Otherwise, I suspect there's a difference in the URLs you're using--the expanded data is probably specified with additional arguments.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文