Malformed start tag error - Python, BeautifulSoup, and Sipie - Ubuntu 10.04
I just installed python, mplayer, beautifulsoup and sipie to run Sirius on my Ubuntu 10.04 machine. I followed some docs that seem straightforward, but am encountering some issues. I'm not that familiar with Python, so this may be out of my league.
I was able to get everything installed, but then running sipie gives this:
/usr/bin/Sipie/Sipie/Config.py:12: DeprecationWarning: the md5 module is deprecated; use hashlib instead
  import md5
Traceback (most recent call last):
  File "/usr/bin/Sipie/sipie.py", line 22, in <module>
    Sipie.cliPlayer()
  File "/usr/bin/Sipie/Sipie/cliPlayer.py", line 74, in cliPlayer
    completer = Completer(sipie.getStreams())
  File "/usr/bin/Sipie/Sipie/Factory.py", line 374, in getStreams
    streams = self.tryGetStreams()
  File "/usr/bin/Sipie/Sipie/Factory.py", line 298, in tryGetStreams
    soup = BeautifulSoup(data)
  File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup-3.1.0.1-py2.6.egg/BeautifulSoup.py", line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup-3.1.0.1-py2.6.egg/BeautifulSoup.py", line 1230, in __init__
    self._feed(isHTML=isHTML)
  File "/usr/local/lib/python2.6/dist-packages/BeautifulSoup-3.1.0.1-py2.6.egg/BeautifulSoup.py", line 1263, in _feed
    self.builder.feed(markup)
  File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
    endpos = self.check_for_whole_start_tag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
    self.error("malformed start tag")
  File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: malformed start tag, at line 100, column 3
I looked through these files and the line numbers, but since I am unfamiliar with Python, it doesn't make much sense. Any advice on what to do next?
Comments (5)
Assuming you are using BeautifulSoup 4, the official documentation has a section on this: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
I tried this and it works well, just as @Joshua described.
The issues you are encountering are pretty common, and they come down to malformed HTML. In my case, there was an HTML element that had double-quoted an attribute's value. I actually ran into this issue today, and in doing so came across your post. I was FINALLY able to resolve it by parsing the HTML with html5lib before handing it off to BeautifulSoup 4.
First off, you'll need to:
Then, run this example code:
If you have any questions about this code or need a little more specific guidance, just let me know. :)
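A minimal sketch of the approach this answer describes, assuming Python 3 with the `beautifulsoup4` and `html5lib` packages installed (for example via `pip install beautifulsoup4 html5lib`). The helper names `pick_parser` and `parse_lenient` are made up for illustration and are not part of Sipie or BeautifulSoup; the fallback order is one reasonable choice, not the answerer's original code.

```python
import importlib.util

# Parser names that Beautiful Soup 4 accepts, most lenient first.
PREFERRED_PARSERS = ("html5lib", "lxml")


def pick_parser():
    """Return the name of the most lenient parser that is installed."""
    for name in PREFERRED_PARSERS:
        if importlib.util.find_spec(name) is not None:
            return name
    # The standard-library parser is always available as a last resort.
    return "html.parser"


def parse_lenient(markup):
    """Parse possibly malformed HTML with Beautiful Soup 4."""
    # Imported here so that pick_parser() can be used without bs4 installed.
    from bs4 import BeautifulSoup
    return BeautifulSoup(markup, pick_parser())
```

With both packages installed, something like `soup = parse_lenient('<a href=""x"">link</a>')` should build a tree from markup with a doubled quote instead of raising a parse error.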
Newer versions of BeautifulSoup use HTMLParser rather than SGMLParser (because SGMLParser was removed from the Python 3.0 standard library). As a result, BeautifulSoup can no longer process many malformed HTML documents correctly, which I believe is what you are encountering here.
A likely solution to your problem is to uninstall BeautifulSoup and install an older version (which will still work with Python 2.6 on Ubuntu 10.04 LTS):
Just be aware that this temporary workaround will not work with Python 3.0 (which may become the default in future versions of Ubuntu).
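Before downgrading, it may help to confirm which BeautifulSoup release is installed. The sketch below is an illustration only: `uses_htmlparser` is a hypothetical helper, and it assumes that the 3.1.x series (such as the 3.1.0.1 egg in the traceback) is the HTMLParser-based one while 3.0.x releases are SGMLParser-based.

```python
def uses_htmlparser(version):
    """Return True if a BeautifulSoup 3 version string is in the 3.1.x series.

    Assumption: 3.1.x is the series built on HTMLParser.
    """
    return version.split(".")[:2] == ["3", "1"]


def installed_version():
    """Return the installed BeautifulSoup 3 version string (Python 2 only)."""
    # The BeautifulSoup 3 module exposes its version as __version__.
    import BeautifulSoup
    return BeautifulSoup.__version__
```

On the machine from the question, `uses_htmlparser(installed_version())` would be expected to report the affected series, since the traceback shows BeautifulSoup-3.1.0.1.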
Command Line:
Python 3:
Look at line 100, column 3 of the "data" passed to BeautifulSoup at line 298 of "/usr/bin/Sipie/Sipie/Factory.py"; that is where the parser found the malformed start tag.
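One way to do that is to print the lines around the reported position. The helper below is a sketch; `show_position` is a made-up name, and it assumes `data` is the HTML string that Factory.py passes to BeautifulSoup. Line and column are both treated as 1-based, matching how the error message reports them.

```python
def show_position(data, line, column, context=2):
    """Return the lines around (line, column) with a caret under the column.

    Both line and column are 1-based, as printed in the HTMLParseError message.
    """
    lines = data.splitlines()
    first = max(line - context, 1)
    last = min(line + context, len(lines))
    out = []
    for number in range(first, last + 1):
        # The "%4d: " prefix is 6 characters wide for line numbers below 10000.
        out.append("%4d: %s" % (number, lines[number - 1]))
        if number == line:
            out.append(" " * (6 + column - 1) + "^")
    return "\n".join(out)
```

As a temporary debugging aid, one could call `print(show_position(data, 100, 3))` just before the `BeautifulSoup(data)` call to see which tag the parser rejects.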