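How can I fix this error or catch it as an exception?
I am writing a program that collects image URLs from any web page. It is written in Python and uses BeautifulSoup and httplib2. When I run it, I get the following error: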
Look me http://movies.nytimes.com (this line is printed by the code)
Traceback (most recent call last):
  File "main.py", line 103, in <module>
    visit(initialList,profundidad)
  File "main.py", line 98, in visit
    visit(dodo[indice], bottom -1)
  File "main.py", line 94, in visit
    getImages(w)
  File "main.py", line 34, in getImages
    iSoupList = BeautifulSoup(response, parseOnlyThese=SoupStrainer('img'))
  File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 1230, in __init__
    self._feed(isHTML=isHTML)
  File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup.py", line 1263, in _feed
    self.builder.feed(markup)
  File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
    endpos = self.check_for_whole_start_tag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
    self.error("malformed start tag")
  File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 942, column 118
Can someone explain how to fix this error, or how to catch it with an exception handler?
Comments (3)
Are you using the latest version of BeautifulSoup?
This looks like a known issue in version 3.1.x: that release switched to a new parser (HTMLParser instead of SGMLParser), which is much worse at handling malformed HTML. The BeautifulSoup website has more information about this.
As a quick fix, you can simply use an older version (3.0.7a).
To catch that error specifically, wrap the BeautifulSoup call in getImages in a try/except block that catches HTMLParser.HTMLParseError.
Here's some more reading on Python's try/except blocks:
http://docs.python.org/tutorial/errors.html
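The code from this answer did not survive on this page, so what follows is only a rough sketch of the idea, not the answerer's original. It assumes a hypothetical helper, get_images_safely, whose parse argument stands in for the BeautifulSoup(response, parseOnlyThese=SoupStrainer('img')) call from the traceback. HTMLParseError exists in the Python 2 HTMLParser module; the fallback class is an assumption added only so the sketch also runs on newer Pythons, where that exception was removed.

```python
# Python 2 exposes HTMLParser.HTMLParseError. It no longer exists in
# current Python 3 releases, so a stand-in class is defined there
# (an assumption made purely for illustration).
try:
    from HTMLParser import HTMLParseError
except ImportError:
    class HTMLParseError(Exception):
        pass


def get_images_safely(url, markup, parse):
    """Return parse(markup), or an empty list if the markup is malformed.

    `parse` is a hypothetical callable standing in for the question's
    BeautifulSoup(response, parseOnlyThese=SoupStrainer('img')) call.
    """
    try:
        return parse(markup)
    except HTMLParseError as exc:
        # Report the bad page and move on instead of crashing the crawl.
        print("Skipping %s: %s" % (url, exc))
        return []
```

Catching only HTMLParseError, rather than a bare except, keeps unrelated bugs visible while letting the crawler continue past pages it cannot parse.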
I got that error when my HTML document contained the string =&. Once I replaced that string (in my case with =and), I no longer received the parsing error.
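The workaround above could be applied to the markup before it is handed to the parser. This is a minimal sketch; the function name replace_equals_ampersand is hypothetical, and the replacement string =and is simply the one this answer mentions. Note that the substitution also rewrites any query strings that contain =&, so it may change URLs inside the page.

```python
def replace_equals_ampersand(markup):
    """Replace every '=&' with '=and' so the parser no longer chokes on it."""
    return markup.replace("=&", "=and")


cleaned = replace_equals_ampersand('<a href="page?a=&b=1">link</a>')
# cleaned is now '<a href="page?a=andb=1">link</a>'
```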