Parsing HTML data from Wunderground

Posted on 2025-01-04 03:31:52

All,

I am trying to download the weather data history from Wunderground. The problem that I have is that I need the full METAR information.

Here is the example that I want to download: CSV with full METAR.

Since I want to download the hourly data for a whole year, I need to script the downloads. But no matter what I try (bash with wget, or Python), I still cannot get the page with the full METAR from a script.

Here is the example of my script:

import urllib2
from BeautifulSoup import BeautifulSoup
url = "http://www.wunderground.com/history/airport/KBUF/2011/1/1/DailyHistory.html?theprefset=SHOWMETAR&theprefvalue=1&format=1"
page = urllib2.urlopen(url)
dailyData = page.read()                            
print dailyData

What I get is something like:

12:54 AM,52.0,45.0,77,29.93,10.0,SSW,15.0,-,N/A,,Scattered Clouds,200,2011-01-01 05:54:00<br />
1:54 AM,53.1,45.0,74,29.95,10.0,SSW,12.7,-,N/A,,Mostly Cloudy,200,2011-01-01 06:54:00<br />
2:54 AM,50.0,44.1,80,29.95,10.0,SSW,8.1,-,N/A,,Mostly Cloudy,200,2011-01-01 07:54:00<br />
3:54 AM,51.1,44.1,77,29.93,10.0,SSE,5.8,-,N/A,,Scattered Clouds,150,2011-01-01 08:54:00<br />

Through a web browser, this is what I get. Note the new column that starts with METAR:

12:54 AM,52.0,45.0,77,29.93,10.0,SSW,15.0,-,N/A,,Scattered Clouds,METAR KBUF 010554Z COR 20013KT 10SM FEW045 SCT140 11/07 A2992 RMK AO2 SLP134 60004 T01110072 10111 20078 58016,200,2011-01-01 05:54:00
1:54 AM,53.1,45.0,74,29.95,10.0,SSW,12.7,-,N/A,,Mostly Cloudy,METAR KBUF 010654Z 20011KT 10SM BKN055 BKN130 12/07 A2994 RMK AO2 SLP141 T01170072,200,2011-01-01 06:54:00
2:54 AM,50.0,44.1,80,29.95,10.0,SSW,8.1,-,N/A,,Mostly Cloudy,METAR KBUF 010754Z 20007KT 10SM BKN050 BKN130 10/07 A2994 RMK AO2 SLP140 T01000067,200,2011-01-01 07:54:00
3:54 AM,51.1,44.1,77,29.93,10.0,SSE,5.8,-,N/A,,Scattered Clouds,METAR KBUF 010854Z 15005KT 10SM SCT050 SCT130 11/07 A2992 RMK AO2 SLP134 T01060067 58000,150,2011-01-01 08:54:00

Any solution to this would be appreciated. Thanks!

魔法少女 2025-01-11 03:31:52

Browsing Wunderground, I found the "Show full METARS" link. After clicking it, pointing the browser at the link you posted or at the "Comma Delimited File" link shows the METAR data. The site seems to set a cookie for this. For example, page.info() shows that "Prefs" includes "SHOWMETAR:1":

Set-Cookie: Prefs=FAVS:1|WXSN:1|PWSOBS:1|WPHO:1|PHOT:1|RADC:0|RADALL:0|HIST0:NULL|GIFT:1|SHOWMETAR:1|PHOTOTHUMBS:50|HISTICAO:KBUF*NULL|; path=/; expires=Fri, 01-Jan-2020 00:00:00 GMT; domain=.wunderground.com
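As a small aside, here is one way to check such a header programmatically. It is only a sketch in Python 3: `parse_prefs` is a hypothetical helper, and it assumes the `Prefs` value keeps the `KEY:VALUE|KEY:VALUE|` layout seen in the header above.

```python
def parse_prefs(set_cookie_value):
    """Split a Set-Cookie value such as 'Prefs=A:1|B:2|; path=/' into (name, dict)."""
    # Everything before the first ';' is the cookie's name=value pair.
    name_value = set_cookie_value.split(";", 1)[0]
    name, _, value = name_value.partition("=")
    prefs = {}
    for item in value.split("|"):
        if item:  # the trailing '|' leaves an empty item; skip it
            key, _, val = item.partition(":")
            prefs[key] = val
    return name.strip(), prefs


header = ("Prefs=FAVS:1|WXSN:1|PWSOBS:1|WPHO:1|PHOT:1|RADC:0|RADALL:0|"
          "HIST0:NULL|GIFT:1|SHOWMETAR:1|PHOTOTHUMBS:50|HISTICAO:KBUF*NULL|; "
          "path=/; expires=Fri, 01-Jan-2020 00:00:00 GMT; domain=.wunderground.com")
name, prefs = parse_prefs(header)
print(name, prefs.get("SHOWMETAR"))
```

If `SHOWMETAR` is missing or not `1`, the preference was probably not stored, which would explain a response without the METAR column.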

# Python 2 (urllib2 and cookielib)
import urllib2
import cookielib

# Keep cookies between requests so the SHOWMETAR preference persists.
cookieJar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookieJar))

# First request: ask the site to set SHOWMETAR=1 in the Prefs cookie.
setmetar = 'http://www.wunderground.com/cgi-bin/findweather/getForecast?setpref=SHOWMETAR&value=1'
request = urllib2.Request(setmetar)
response = opener.open(request)

# Second request: fetch the daily history with the same opener (and cookie).
url = "http://www.wunderground.com/history/airport/KBUF/2011/1/1/DailyHistory.html?theprefset=SHOWMETAR&theprefvalue=1&format=1"
request = urllib2.Request(url)
page = opener.open(request)
# print(page.info())
dailyData = page.read()
print dailyData

yields

TimeEST,TemperatureF,Dew PointF,Humidity,Sea Level PressureIn,VisibilityMPH,Wind Direction,Wind SpeedMPH,Gust SpeedMPH,PrecipitationIn,Events,Conditions,FullMetar,WindDirDegrees,DateUTC<br />
12:54 AM,52.0,45.0,77,29.93,10.0,SSW,15.0,-,N/A,,Scattered Clouds,METAR KBUF 010554Z COR 20013KT 10SM FEW045 SCT140 11/07 A2992 RMK AO2 SLP134 60004 T01110072 10111 20078 58016,200,2011-01-01 05:54:00<br />
1:54 AM,53.1,45.0,74,29.95,10.0,SSW,12.7,-,N/A,,Mostly Cloudy,METAR KBUF 010654Z 20011KT 10SM BKN055 BKN130 12/07 A2994 RMK AO2 SLP141 T01170072,200,2011-01-01 06:54:00<br />
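
For the full-year download mentioned in the question, the same idea could be sketched in Python 3 roughly as follows. This is an illustration, not a tested scraper: the two URLs are copied from the code above and may no longer be served, and `make_opener`, `history_url`, `days_in_year`, `parse_rows` and `fetch_year` are hypothetical helper names. It also assumes the response is UTF-8 text and that the METAR field contains no commas, as in the sample output.

```python
import csv
import datetime
import http.cookiejar
import urllib.request

# URLs taken from the answer above; not verified against the current site.
SET_METAR_URL = ("http://www.wunderground.com/cgi-bin/findweather/getForecast"
                 "?setpref=SHOWMETAR&value=1")
HISTORY_URL = ("http://www.wunderground.com/history/airport/{station}/"
               "{d.year}/{d.month}/{d.day}/DailyHistory.html"
               "?theprefset=SHOWMETAR&theprefvalue=1&format=1")


def make_opener():
    """Return an opener that stores cookies and sends them back on later requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def history_url(station, day):
    """Build the daily-history URL for one station and one date."""
    return HISTORY_URL.format(station=station, d=day)


def days_in_year(year):
    """Yield every date of the given year, in order."""
    day = datetime.date(year, 1, 1)
    while day.year == year:
        yield day
        day += datetime.timedelta(days=1)


def parse_rows(text):
    """Drop the '<br />' tags and blank lines, then parse the rest as CSV with a header."""
    lines = [line.replace("<br />", "").strip() for line in text.splitlines()]
    return list(csv.DictReader(line for line in lines if line))


def fetch_year(station, year):
    """Set the SHOWMETAR preference once, then fetch and parse each day (needs network)."""
    opener = make_opener()
    opener.open(SET_METAR_URL).close()  # stores the Prefs cookie in the jar
    for day in days_in_year(year):
        with opener.open(history_url(station, day)) as page:
            yield day, parse_rows(page.read().decode("utf-8", errors="replace"))
```

Calling `fetch_year("KBUF", 2011)` would then yield one `(date, rows)` pair per day, and each row's `"FullMetar"` key holds the raw METAR string. Adding a short pause between requests would be a reasonable courtesy to the server.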
