Feedparser (and urllib2) problem: connection timed out

Posted 2024-12-17 11:39:00

Using the urllib2 and feedparser libraries in Python, I get the following error most of the time whenever I try to connect and fetch content from a particular URL:

urllib2.URLError: <urlopen error [Errno 110] Connection timed out>

The minimal reproducible examples are pasted below: a basic one that calls feedparser.parse directly, and an advanced one that first fetches the XML content with urllib2.

# test-1: let feedparser fetch and parse the feed itself
import feedparser
f = feedparser.parse('http://www.zurnal24.si/index.php?ctl=show_rss')
title = f['channel']['title']
print title

# test-2: fetch the XML with urllib2 first, then hand it to feedparser
import urllib2
import feedparser
url = 'http://www.zurnal24.si/index.php?ctl=show_rss'
opener = urllib2.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]  # present a browser-like User-Agent
request = opener.open(url)
response = request.read()
feed = feedparser.parse(response)
title = feed['channel']['title']
print title

When I try different URL addresses (e.g., http://www.delo.si/rss/), everything works fine. Please note that all URLs lead to non-English (i.e., Slovenian) RSS feeds.

I run my experiments both from a local machine and a remote machine (via ssh). The reported error occurs more frequently on the remote machine, although it is unpredictable even on the local host.
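
(For reference, urllib2's open() accepts an explicit timeout argument since Python 2.6, so a variant of test-2 along the lines of the sketch below can at least rule out an overly short default timeout; the 30-second value is an arbitrary choice for illustration.)

# test-2 variant with an explicit per-request timeout
import urllib2
import feedparser

url = 'http://www.zurnal24.si/index.php?ctl=show_rss'
opener = urllib2.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]
# open() raises urllib2.URLError if the server does not respond
# within the given number of seconds
request = opener.open(url, timeout=30)
feed = feedparser.parse(request.read())
print feed['channel']['title']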

Any suggestions would be greatly appreciated.

Comments (2)

拿命拼未来 2024-12-24 11:39:00

How often does the timeout occur? If it's not frequent, you could wait after each timeout and then retry the request:

import urllib2
import feedparser
import time
import sys

url = 'http://www.zurnal24.si/index.php?ctl=show_rss'
opener = urllib2.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]

# Try to connect a few times, waiting longer after each consecutive failure
MAX_ATTEMPTS = 8
for attempt in range(MAX_ATTEMPTS):
    try:
        request = opener.open(url)
        break
    except urllib2.URLError, e:
        sleep_secs = attempt ** 2  # back off: 0, 1, 4, 9, ... seconds
        print >> sys.stderr, 'ERROR: %s.\nRetrying in %s seconds...' % (e, sleep_secs)
        time.sleep(sleep_secs)
else:
    # every attempt failed; bail out instead of using an unbound 'request'
    sys.exit('Failed to fetch %s after %d attempts' % (url, MAX_ATTEMPTS))

response = request.read()
feed = feedparser.parse(response)
title = feed['channel']['title']
print title
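
With sleep_secs = attempt ** 2 the waits grow quadratically (0, 1, 4, 9, ... seconds), so the very first retry happens immediately; the for/else guard makes the script exit cleanly when every attempt times out, instead of failing later with a NameError on request.
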
無處可尋 2024-12-24 11:39:00

As the error indicates, this is a connection problem. It may be an issue with your internet connection or with their servers/connection/bandwidth.

A simple workaround is to do your feed parsing in a while loop, of course keeping a counter of maximum retries, as in the sketch below.
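
A minimal sketch of that idea, assuming feedparser fetches the URL itself: feedparser generally traps fetch errors and reports them through the bozo and bozo_exception fields rather than raising, so the loop checks those. The retry cap and pause length are arbitrary choices for illustration.

import time
import feedparser

url = 'http://www.zurnal24.si/index.php?ctl=show_rss'
MAX_RETRIES = 5   # arbitrary cap
retries = 0

feed = feedparser.parse(url)
# bozo is set to 1 on trouble, and bozo_exception holds the underlying
# error (e.g. URLError); note it is also set for XML well-formedness
# problems, so a stricter version would inspect the exception's type
while feed.bozo and retries < MAX_RETRIES:
    retries += 1
    time.sleep(2)  # fixed pause between attempts
    feed = feedparser.parse(url)

if feed.bozo:
    raise feed.bozo_exception
print feed['channel']['title']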
