Python: Mechanize and BeautifulSoup don't work on shared hosting

Posted 2024-09-04 06:34:14

I am writing a small site decorator to make my local airport's website work with standard HTML.

On my local computer, I use Python's mechanize and BeautifulSoup packages to scrape and parse the site contents, and everything seems to work just fine. I have installed these packages via apt-get.

On my shared hosting account (at DreamHost), I downloaded the .tar.gz files, extracted the packages, renamed the directories (e.g., from BeautifulSoup-3.1.0 to BeautifulSoup), and tried to run the same commands.
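Because the packages were unpacked by hand rather than installed, Python finds them only if the directory that directly contains the module file is on `sys.path`. A minimal sketch in modern Python, assuming a hypothetical layout in which the extracted directory holds a single module file; `load_from_directory` is an illustrative helper name, not part of any library. Printing `__file__` confirms which copy on disk was actually imported.

```python
import importlib
import os
import sys


def load_from_directory(directory, module_name):
    """Import module_name from directory (hypothetical helper).

    The directory must directly contain module_name + '.py'.
    Returns the imported module object.
    """
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        # Prepend so this directory is searched before site-packages.
        sys.path.insert(0, directory)
    # Make sure files created after interpreter start-up are noticed.
    importlib.invalidate_caches()
    module = importlib.import_module(module_name)
    # __file__ shows which copy on disk was actually loaded.
    print(module_name, "loaded from", module.__file__)
    return module
```

For example, `load_from_directory("BeautifulSoup", "BeautifulSoup")` (paths assumed) would report whether the hand-extracted copy or some other installation is the one being used.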

I've got a bizarre error with BeautifulSoup; I don't know whether it's due to the older version of Python on DreamHost, the directory names, or something else.
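One way to narrow down which of these possibilities applies is to print the interpreter version and the version and location of each library on both machines, then compare. A minimal sketch in modern Python; `describe_environment` is a hypothetical helper, and a module's version is reported only if that module happens to define `__version__`.

```python
import importlib
import sys


def describe_environment(module_names):
    """Return a dict describing the interpreter and each named module."""
    report = {"python": "%d.%d.%d" % sys.version_info[:3]}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError as exc:
            report[name] = "not importable: %s" % exc
            continue
        # Not every module defines __version__; built-in modules have no __file__.
        version = getattr(module, "__version__", "unknown version")
        location = getattr(module, "__file__", "built-in")
        report[name] = "%s (%s)" % (version, location)
    return report


if __name__ == "__main__":
    for key, value in describe_environment(["BeautifulSoup", "mechanize"]).items():
        print(key, "->", value)
```

Running the same script locally and on the host makes differences in Python version or library copy visible side by side.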

[sanjose]$ python
Python 2.4.4 (#2, Jan 24 2010, 11:50:13)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from BeautifulSoup import BeautifulSoup
>>> import mechanize
>>> url='http://www.iaa.gov.il/Rashat/he-IL/Airports/BenGurion/informationForTravelers/OnlineFlights.aspx?flightsType=arr'
>>> br=mechanize.Browser()
>>> br.addheaders = [('User-agent', 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)')]
>>> r=br.open(url)
>>> html=r.read()
>>> type(html)
<type 'str'>

I've done this to show that the input is indeed a string. Now let's run the command that works on my local computer:

>>> soup    =   BeautifulSoup.BeautifulSoup(html)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/adamatan/matan.name/natbug/BeautifulSoup/BeautifulSoup.py", line 1493, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/home/adamatan/matan.name/natbug/BeautifulSoup/BeautifulSoup.py", line 1224, in __init__
    self._feed(isHTML=isHTML)
  File "/home/adamatan/matan.name/natbug/BeautifulSoup/BeautifulSoup.py", line 1257, in _feed
    self.builder.feed(markup)
  File "/usr/lib/python2.4/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.4/HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.4/HTMLParser.py", line 268, in parse_starttag
    self.handle_starttag(tag, attrs)
  File "/home/adamatan/matan.name/natbug/BeautifulSoup/BeautifulSoup.py", line 1011, in handle_starttag
    self.soup.unknown_starttag(name, attrs)
  File "/home/adamatan/matan.name/natbug/BeautifulSoup/BeautifulSoup.py", line 1408, in unknown_starttag
    tag = Tag(self, name, attrs, self.currentTag, self.previous)
  File "/home/adamatan/matan.name/natbug/BeautifulSoup/BeautifulSoup.py", line 525, in __init__
    self.attrs = map(convert, self.attrs)
  File "/home/adamatan/matan.name/natbug/BeautifulSoup/BeautifulSoup.py", line 524, in <lambda>
    val))
  File "/usr/lib/python2.4/sre.py", line 142, in sub
    return _compile(pattern, 0).sub(repl, string, count)
TypeError: expected string or buffer
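A possible reading of the last frames (a hypothesis, not confirmed by the post): the `TypeError` is raised by `re.sub` inside BeautifulSoup's attribute-conversion lambda, so the offending value is a single tag attribute, not the page as a whole, and `type(html)` being `str` does not rule it out. One plausible cause is an attribute written without a value, such as `<option selected>`, for which `HTMLParser` reports the value as `None`. The sketch below uses Python 3's `html.parser` (the successor of the 2.4 `HTMLParser` module in the traceback) to list such attributes in a page, and shows that `re.sub` rejects `None` with a `TypeError`; `find_valueless_attributes` and `sub_raises_on_none` are hypothetical helper names, and the regex is illustrative, not the one BeautifulSoup uses.

```python
import re
from html.parser import HTMLParser


class ValuelessAttributeFinder(HTMLParser):
    """Collects (tag, attribute) pairs whose value HTMLParser reports as None."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:  # e.g. the "selected" in <option selected>
                self.found.append((tag, name))


def find_valueless_attributes(html):
    finder = ValuelessAttributeFinder()
    finder.feed(html)
    finder.close()
    return finder.found


def sub_raises_on_none():
    """Return True if re.sub rejects None as its input string."""
    try:
        re.sub(r"&", "&amp;", None)  # illustrative pattern only
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    page = '<select><option selected>Arrivals</option></select><a href="x">x</a>'
    print(find_valueless_attributes(page))  # [('option', 'selected')]
    print(sub_raises_on_none())             # True
```

Feeding the fetched `html` to `find_valueless_attributes` would show whether the page contains such attributes at all.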

Any ideas?

Adam
