Print some HTML with Python Mechanize
I'm making a small Python script to log on to a website automatically, but I'm stuck.
I want to print to the terminal a small part of the HTML, located within this tag in the site's HTML:
<td class=h3 align='right'> John Appleseed</td><td> <a href="members_myaccount.php"><img border=0 src="../tbs_v7_0/images/myaccount.gif" alt="My Account"></a></td>
But how do I extract and print just the name, John Appleseed?
I'm using Python's Mechanize on a Mac, by the way.
Comments (3)
Mechanize is only good for fetching the HTML. Once you want to extract information from the HTML, you could use, for example, BeautifulSoup. (See also my answer to a similar question: Web mining or scraping or crawling? What tool/library should I use?)
Depending on where the <td> is located in the HTML (it's unclear from your question), you could use the following code:
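A minimal sketch of the BeautifulSoup approach, assuming the `bs4` package is installed and that the name sits in the first `<td>` with class `h3`. The `html` string below is a stand-in built from the snippet in the question (wrapped in a table for illustration); it is not the real page.

```python
from bs4 import BeautifulSoup

# Stand-in for the page source; with Mechanize this would be the fetched HTML.
html = (
    "<table><tr>"
    "<td class=h3 align='right'> John Appleseed</td>"
    '<td> <a href="members_myaccount.php">'
    '<img border=0 src="../tbs_v7_0/images/myaccount.gif" alt="My Account">'
    "</a></td>"
    "</tr></table>"
)

# Parse with the parser that ships with Python, so nothing beyond bs4 is needed.
soup = BeautifulSoup(html, "html.parser")

# Assumption: the first <td class="h3"> on the page is the one holding the name.
cell = soup.find("td", class_="h3")
name = cell.get_text(strip=True) if cell is not None else None

print(name)
```

With Mechanize, the input would presumably be the body of the response, for example `br.open(url).read()` on a `mechanize.Browser` instance named `br`. If the real page has several `h3` cells, the lookup would need to be narrowed.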
As you have not provided the full HTML of the page, the only option right now is either using string.find() or regular expressions.
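Both options can be sketched with the standard library alone. The `html` string is taken from the snippet in the question, and the marker string and pattern are assumptions based on that snippet; they will break if the real markup differs even slightly.

```python
import re

# Stand-in for the page source, copied from the snippet in the question.
html = (
    "<td class=h3 align='right'> John Appleseed</td>"
    '<td> <a href="members_myaccount.php">'
    '<img border=0 src="../tbs_v7_0/images/myaccount.gif" alt="My Account">'
    "</a></td>"
)

# Option 1: plain string searching with str.find().
start_marker = "<td class=h3 align='right'>"
name_from_find = None
start = html.find(start_marker)
if start != -1:
    start += len(start_marker)
    end = html.find("</td>", start)
    if end != -1:
        name_from_find = html[start:end].strip()

# Option 2: a regular expression that captures the cell text.
pattern = re.compile(r"<td\s+class=h3[^>]*>\s*(.*?)\s*</td>", re.IGNORECASE | re.DOTALL)
match = pattern.search(html)
name_from_regex = match.group(1) if match else None

print(name_from_find)
print(name_from_regex)
```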
However, the standard way of finding this is with XPath. See this question: How to use Xpath in Python?
You can obtain the XPath for an element using the "Inspect Element" feature of Firefox.
For example, you can do this to find the XPath for the username on the Stack Overflow site.
You can use a parser to extract any information in a document. I suggest using the lxml module. Here is an example:
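A minimal sketch with lxml, assuming the package is installed. The `page` string is a stand-in that wraps the snippet from the question in a small document, and the XPath expression assumes the name is the text of a `<td>` whose class is exactly `h3`.

```python
from lxml import html as lxml_html

# Stand-in for the fetched page; the real document will contain much more.
page = (
    "<html><body><table><tr>"
    "<td class=h3 align='right'> John Appleseed</td>"
    '<td> <a href="members_myaccount.php">'
    '<img border=0 src="../tbs_v7_0/images/myaccount.gif" alt="My Account">'
    "</a></td>"
    "</tr></table></body></html>"
)

# lxml's HTML parser tolerates the unquoted attributes in this markup.
tree = lxml_html.fromstring(page)

# Select the text nodes of every <td> whose class attribute is exactly "h3".
matches = tree.xpath("//td[@class='h3']/text()")
name = matches[0].strip() if matches else None

print(name)
```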
More information about lxml here.