BeautifulSoup line matching
I'm trying to build an HTML table that contains only the table header and the row that is relevant to me. The site I'm using is http://wolk.vlan77.be/~gerben.
I want to extract the table header and my own table entry so that I don't have to search for my name each time.
What I want to do:
- get the HTML page
- parse it to get the header of the table
- parse it to get the table row relevant to me (the row containing lucas)
- build an HTML page that shows the header and the table entry relevant to me
What I am doing now:
- get the header with BeautifulSoup first
- get my entry
- add both to an array
- pass this array to a method that generates a string that can be printed as an HTML page
    def downloadURL(self):
        global input
        filehandle = self.urllib.urlopen('http://wolk.vlan77.be/~gerben')
        input = ''
        for line in filehandle.readlines():
            input += line
        filehandle.close()

    def soupParserToTable(self, input):
        global header
        soup = self.BeautifulSoup(input)
        header = soup.first('tr')
        tableInput = '0'
        table = soup.findAll('tr')
        for line in table:
            print line
            print '\n \n'
            if '''lucas''' in line:
                print 'true'
            else:
                print 'false'
            print '\n \n **************** \n \n'
I want to get the row from the HTML file that contains lucas. However, when I run the code like this, I get the following in my output:
****************
<tr><td>lucas.vlan77.be</td> <td><span style="color:green;font-weight:bold">V</span></td> <td><span style="color:green;font-weight:bold">V</span></td> <td><span style="color:green;font-weight:bold">V</span></td> </tr>
false
I don't understand why it doesn't match: the string lucas is clearly in there.
Comments (2)
It looks like you're over-complicating this.
Here's a simpler version...
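One way such a simpler version could look is sketched below. It is not necessarily the code that was originally posted. It assumes Python 3 and the `bs4` package (the question uses Python 2 and the older BeautifulSoup 3 API), that the first `tr` on the page is the table header, and that the page decodes as UTF-8. The names `fetch` and `build_page` are made up for illustration.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup

URL = "http://wolk.vlan77.be/~gerben"  # URL taken from the question


def fetch(url=URL):
    """Download the page and return it as text (UTF-8 is an assumption)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def build_page(html, needle="lucas"):
    """Return a small HTML page holding the header row plus matching rows."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("tr")
    if not rows:
        return "<html><body><table></table></body></html>"
    # Assumption: the first <tr> is the header row.
    header, body = rows[0], rows[1:]
    # Compare against the row's text, not against the Tag object itself.
    wanted = [row for row in body if needle in row.get_text()]
    table = "\n".join(str(row) for row in [header] + wanted)
    return "<html><body><table>\n%s\n</table></body></html>" % table
```

Calling `print(build_page(fetch()))` would download the page and print the filtered table. Keeping the download separate from the parsing also removes the need for the `global` variables used in the question.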
That's because line is not a string but a BeautifulSoup.Tag instance. Try getting the td value instead:
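A minimal sketch of that idea, assuming Python 3 and the `bs4` package rather than the BeautifulSoup 3 API used in the question. The row is abbreviated from the question's output and wrapped in a `table` so that it parses on its own. Using `in` on a Tag checks membership among the tag's child nodes, so a plain substring never matches. Extracting the cell text (or serializing the row with `str`) gives an ordinary string to search.

```python
from bs4 import BeautifulSoup

# Row abbreviated from the question's output, wrapped in a table for parsing.
ROW_HTML = (
    '<table><tr><td>lucas.vlan77.be</td> '
    '<td><span style="color:green;font-weight:bold">V</span></td> </tr></table>'
)

soup = BeautifulSoup(ROW_HTML, "html.parser")
row = soup.find("tr")

in_tag = "lucas" in row                 # membership test against the row's child nodes
first_cell = row.find("td").get_text()  # text content of the first <td>
in_cell = "lucas" in first_cell         # ordinary substring test on a str
in_markup = "lucas" in str(row)         # substring test on the serialized row

print(in_tag, in_cell, in_markup)
```

Testing the first cell is stricter than testing the whole row, since it matches only when the name appears in the host column.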