如何修复或例外处理此错误

发布于 2024-07-26 09:07:03 字数 1489 浏览 5 评论 0 原文

我正在创建一个从任何网页获取图像 url 的代码,该代码使用 python 编写,并使用 BeutifulSoup 和 httplib2。 当我运行代码时,出现下一个错误:

Look me http://movies.nytimes.com          (this line is printed by the code)
Traceback (most recent call last):
File "main.py", line 103, in <module>
visit(initialList,profundidad)
File "main.py", line 98, in visit
visit(dodo[indice], bottom -1)
File "main.py", line 94, in visit
getImages(w)
File "main.py", line 34, in getImages
iSoupList = BeautifulSoup(response, parseOnlyThese=SoupStrainer('img'))
File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 1499, in __init__
BeautifulStoneSoup.__init__(self, *args, **kwargs)
File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 1230, in __init__
self._feed(isHTML=isHTML)
File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 1263, in _feed
self.builder.feed(markup)
File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
self.goahead(0)
File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
k = self.parse_starttag(i)
File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
endpos = self.check_for_whole_start_tag(i)
File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
self.error("malformed start tag")
File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 942, column 118

有人可以向我解释如何修复或创建错误的例外

I'm creating a code that gets image's urls from any web pages, the code are in python and use BeutifulSoup and httplib2.
When I run the code, I get the next error:

Look me http://movies.nytimes.com          (this line is printed by the code)
Traceback (most recent call last):
File "main.py", line 103, in <module>
visit(initialList,profundidad)
File "main.py", line 98, in visit
visit(dodo[indice], bottom -1)
File "main.py", line 94, in visit
getImages(w)
File "main.py", line 34, in getImages
iSoupList = BeautifulSoup(response, parseOnlyThese=SoupStrainer('img'))
File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 1499, in __init__
BeautifulStoneSoup.__init__(self, *args, **kwargs)
File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 1230, in __init__
self._feed(isHTML=isHTML)
File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 1263, in _feed
self.builder.feed(markup)
File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
self.goahead(0)
File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
k = self.parse_starttag(i)
File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
endpos = self.check_for_whole_start_tag(i)
File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
self.error("malformed start tag")
File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 942, column 118

Someone can explain me how to fix or make an exeption for the error

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

困倦 2024-08-02 09:07:03

您使用的是最新版本的 BeautifulSoup 吗?
这似乎是版本 3.1.x 的一个已知问题,因为它开始使用新的解析器(HTMLParser,而不是 SGMLParser),该解析器在处理格式错误的 HTML 方面表现较差。 您可以在BeautifulSoup 网站上找到更多相关信息。
作为快速解决方案,您可以简单地使用旧版本 (3.0.7a)。

Are you using latest version of BeautifulSoup?
This seems a known issue of version 3.1.x, because it started using a new parser (HTMLParser, instead of SGMLParser) that is much worse at processing malformed HTML. You can find more information about this on BeautifulSoup website.
As a quick solution, you can simply use an older version (3.0.7a).

若言繁花未落 2024-08-02 09:07:03

要专门捕获该错误,请将代码更改为如下所示:

try:
    iSoupList = BeautifulSoup(response, parseOnlyThese=SoupStrainer('img'))

except HTMLParseError:
    #Do something intelligent here

以下是有关 Python try except 块的更多阅读内容:
http://docs.python.org/tutorial/errors.html

To catch that error specifically, change your code to look like this:

try:
    iSoupList = BeautifulSoup(response, parseOnlyThese=SoupStrainer('img'))

except HTMLParseError:
    #Do something intelligent here

Here's some more reading on Python's try except blocks:
http://docs.python.org/tutorial/errors.html

成熟的代价 2024-08-02 09:07:03

当我的 HTML 文档中有字符串 =& 时,我收到了该错误。 当我替换该字符串(在我的例子中为 =and)时,我不再收到该解析错误。

I got that error when I had the string =& in my HTML document. When I replaced that string (in my case with =and) then I no longer received that parsing error.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文