
Beautiful Soup line matching

Posted on 2024-12-06 04:39:53


I'm trying to build an HTML table that contains only the table header and the row that is relevant to me. The site I'm using is http://wolk.vlan77.be/~gerben.

I'm trying to get the table header and my own table entry so that I do not have to look for my name each time.

What I want to do:

Get the HTML page
Parse it to get the header of the table
Parse it to get the table row relevant to me (the row containing lucas)
Build an HTML page that shows the header and the table entry relevant to me

What I am doing now:

Get the header with BeautifulSoup first
Get my entry
Add both to an array
Pass this array to a method that generates a string that can be printed as an HTML page

def downloadURL(self):
    global input
    filehandle = self.urllib.urlopen('http://wolk.vlan77.be/~gerben')
    input = ''
    for line in filehandle.readlines():
        input += line
    filehandle.close()

def soupParserToTable(self,input):
    global header

    soup = self.BeautifulSoup(input)
    header = soup.first('tr')
    tableInput='0'

    table = soup.findAll('tr')
    for line in table:
        print line
        print '\n \n'
        if '''lucas''' in line:
            print 'true'
        else:
            print 'false'
        print '\n \n **************** \n \n'

I want to get the line from the HTML file that contains lucas. However, when I run it like this, I get the following in my output:

 **************** 


<tr><td>lucas.vlan77.be</td> <td><span style="color:green;font-weight:bold">V</span></td> <td><span style="color:green;font-weight:bold">V</span></td> <td><span style="color:green;font-weight:bold">V</span></td> </tr>



false

Now I don't get why it doesn't match. The string lucas is clearly in there :/

半世蒼涼 2024-12-13 04:39:53

It looks like you're over-complicating this.

Here's a simpler version...

>>> import BeautifulSoup
>>> import urllib2
>>> html = urllib2.urlopen('http://wolk.vlan77.be/~gerben')
>>> soup = BeautifulSoup.BeautifulSoup(html)
>>> print soup.find('td', text=lambda data: data.string and 'lucas' in data.string)
lucas.vlan77.be
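
A possible reason for the `false` in the question: `line` is a BeautifulSoup tag object rather than a string, and as far as I know `in` on a tag tests whether the value is one of its direct child nodes, not whether it occurs in the markup. The children of that row are `td` tags and whitespace, so `'lucas' in line` fails; testing `'lucas' in str(line)` should match.

For the final goal (a page with just the header row and your own row), here is a rough sketch that uses only the standard-library `html.parser` in Python 3 instead of BeautifulSoup, so it runs without extra packages. `RowCollector`, `build_page`, and the `SAMPLE` markup (including its header row and the `other.vlan77.be` row) are made up for illustration; only the `lucas.vlan77.be` row is taken from the output in the question. It assumes the first `tr` is the header and keeps only the cell text, so the `span` styling is dropped.

```python
from html import escape
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect every <tr> as a list of cell texts (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being parsed, or None
        self._cell = None  # text fragments of the cell being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between cells is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def build_page(html, name):
    """Return a table with the first row plus rows whose first cell contains name."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return "<table></table>"
    header, body = parser.rows[0], parser.rows[1:]
    wanted = [row for row in body if row and name in row[0]]
    lines = ["<table>"]
    lines.append("<tr>" + "".join("<th>%s</th>" % escape(c) for c in header) + "</tr>")
    for row in wanted:
        lines.append("<tr>" + "".join("<td>%s</td>" % escape(c) for c in row) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Made-up sample page; only the lucas row comes from the question's output.
SAMPLE = """
<table>
<tr><th>host</th><th>a</th><th>b</th><th>c</th></tr>
<tr><td>other.vlan77.be</td> <td><span>X</span></td> <td><span>V</span></td> <td><span>V</span></td> </tr>
<tr><td>lucas.vlan77.be</td> <td><span style="color:green;font-weight:bold">V</span></td> <td><span style="color:green;font-weight:bold">V</span></td> <td><span style="color:green;font-weight:bold">V</span></td> </tr>
</table>
"""

page = build_page(SAMPLE, "lucas")
print(page)
```

To run it against the real site, replace `SAMPLE` with the downloaded page, for example `urllib.request.urlopen(url).read().decode()`, assuming the page is UTF-8.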
