Feedparser (and urllib2) problem: connection timed out

Posted 2024-12-17 11:39:00

Using the urllib2 and feedparser libraries in Python, I get the following error most of the time whenever I try to connect and fetch content from a particular URL:

urllib2.URLError: <urlopen error [Errno 110] Connection timed out>

The minimal reproducible examples are pasted below: a basic one that calls feedparser.parse directly, and an advanced one that first fetches the XML content with urllib2.

# test-1: let feedparser fetch and parse the feed itself
import feedparser
f = feedparser.parse('http://www.zurnal24.si/index.php?ctl=show_rss')
title = f['channel']['title']
print title

# test-2: fetch the XML with urllib2 first, then hand it to feedparser
import urllib2
import feedparser
url = 'http://www.zurnal24.si/index.php?ctl=show_rss'
opener = urllib2.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]  # present a browser-like User-Agent
request = opener.open(url)
response = request.read()
feed = feedparser.parse(response)
title = feed['channel']['title']
print title

When I try different URL addresses (e.g., http://www.delo.si/rss/), everything works fine. Please note that all URLs lead to non-English (i.e., Slovenian) RSS feeds.

I run my experiments both from a local machine and a remote machine (via ssh). The reported error occurs more frequently on the remote machine, although it is unpredictable even on the local host.
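
(For reference, urllib2's open() accepts an explicit timeout argument since Python 2.6, so a variant of test-2 along the lines of the sketch below can at least rule out an overly short default timeout; the 30-second value is an arbitrary choice for illustration.)

# test-2 variant with an explicit per-request timeout
import urllib2
import feedparser

url = 'http://www.zurnal24.si/index.php?ctl=show_rss'
opener = urllib2.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]
# open() raises urllib2.URLError if the server does not respond
# within the given number of seconds
request = opener.open(url, timeout=30)
feed = feedparser.parse(request.read())
print feed['channel']['title']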

Any suggestions would be greatly appreciated.

Comments (2)

拿命拼未来 2024-12-24 11:39:00

How often does the timeout occur? If it's not frequent, you could wait after each timeout and then retry the request:

import urllib2
import feedparser
import time
import sys

url = 'http://www.zurnal24.si/index.php?ctl=show_rss'
opener = urllib2.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]

# Try to connect a few times, waiting longer after each consecutive failure
MAX_ATTEMPTS = 8
for attempt in range(MAX_ATTEMPTS):
    try:
        request = opener.open(url)
        break
    except urllib2.URLError, e:
        sleep_secs = attempt ** 2  # back off: 0, 1, 4, 9, ... seconds
        print >> sys.stderr, 'ERROR: %s.\nRetrying in %s seconds...' % (e, sleep_secs)
        time.sleep(sleep_secs)
else:
    # every attempt failed; bail out instead of using an unbound 'request'
    sys.exit('Failed to fetch %s after %d attempts' % (url, MAX_ATTEMPTS))

response = request.read()
feed = feedparser.parse(response)
title = feed['channel']['title']
print title
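
With sleep_secs = attempt ** 2 the waits grow quadratically (0, 1, 4, 9, ... seconds), so the very first retry happens immediately; the for/else guard makes the script exit cleanly when every attempt times out, instead of failing later with a NameError on request.
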
無處可尋 2024-12-24 11:39:00

As the error indicates, this is a connection problem. It may be an issue with your internet connection or with their servers/connection/bandwidth.

A simple workaround is to do your feed parsing in a while loop, of course keeping a counter of maximum retries, as in the sketch below.
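
A minimal sketch of that idea, assuming feedparser fetches the URL itself: feedparser generally traps fetch errors and reports them through the bozo and bozo_exception fields rather than raising, so the loop checks those. The retry cap and pause length are arbitrary choices for illustration.

import time
import feedparser

url = 'http://www.zurnal24.si/index.php?ctl=show_rss'
MAX_RETRIES = 5   # arbitrary cap
retries = 0

feed = feedparser.parse(url)
# bozo is set to 1 on trouble, and bozo_exception holds the underlying
# error (e.g. URLError); note it is also set for XML well-formedness
# problems, so a stricter version would inspect the exception's type
while feed.bozo and retries < MAX_RETRIES:
    retries += 1
    time.sleep(2)  # fixed pause between attempts
    feed = feedparser.parse(url)

if feed.bozo:
    raise feed.bozo_exception
print feed['channel']['title']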
