Python 3.2 lxml填写并提交表单,选择多个,怎么做?值不起作用

发布于 2024-11-18 09:32:34 字数 3653 浏览 3 评论 0原文

这个页面很棒,来自 Perl 世界,几年无所事事之后,我又重新开始编程(这个网页不存在,事情发生了怎样的变化)。现在,经过整整两天的搜索,我打出了最后一张牌,在这里寻求帮助。

在mac环境下工作,使用python 3.2和lxml 2.3(在www.jtmoon.com/?p=21之后安装),我想做的事情:

我的代码。我尝试了几次并输出了代码。

from lxml.html import parse, submit_form, tostring
page = parse('http://biodbnet.abcc.ncifcrf.gov/db/db2db.php').getroot()
page.forms[0].fields['input'] = 'GI Number'
page.forms[0].inputs['outputs[]'].value = 'Gene ID'
page.forms[0].fields['hasComma'] = 'no'
page.forms[0].fields['removeDupValues'] = 'yes'
page.forms[0].fields['request'] = 'db2db'
page.forms[0].action = 'http://biodbnet.abcc.ncifcrf.gov/db/db2dbRes.php'
page.forms[0].fields['idList'] = '86439006'
submit_form(page.forms[0])

输出:

File "/Users/gerard/Desktop/barbacue/MGFtoXML.py", line 30, in <module>
page.forms[0].inputs['outputs[]'].value = 'Gene ID'
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/__init__.py", line 1058, in _value__set
"You must pass in a sequence")
TypeError: You must pass in a sequence

因此,由于该元素是多选元素,我知道我必须给出一个列表

page.forms[0].inputs['outputs[]'].value = list('Gene ID')

输出:

File "/Users/gerard/Desktop/barbacue/MGFtoXML.py", line 30, in <module>
page.forms[0].inputs['outputs[]'].value = list('Gene ID')
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/__init__.py", line 1059, in _value__set
self.value.clear()
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/_setmixin.py", line 115, in clear
self.remove(item)
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/__init__.py", line 1159, in remove
"The option %r is not currently selected" % item)
ValueError: The option 'Affy ID' is not currently selected

“Affy ID”是列表的第一个选项值,并且未选择它。但它有什么问题呢?

令人惊讶的是,如果我改为放置

page.forms[0].inputs['outputs[]'].multiple = list('Gene ID')
#page.forms[0].inputs['outputs[]'].value = list('Gene ID')

“Then”,不知怎的,lxml 会喜欢它,然后继续。但是,根据 lxml 文档,多个属性应该是一个布尔值(实际上,如果我打印该值,则应该是),我不应该触摸它,并且该项目的“值”实际上应该指向所选项目。

新的输出

File "/Users/gerard/Desktop/barbacue/MGFtoXML.py", line 87, in <module>
submit_form(page.forms[0])
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/__init__.py", line 856, in submit_form
return open_http(form.method, url, values)
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/__init__.py", line 876, in open_http_urllib
return urlopen(url, data)
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 138, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 364, in open
req = meth(req)
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 1052, in do_request_
raise TypeError("POST data should be bytes"
TypeError: POST data should be bytes or an iterable of bytes. It cannot be str.

那么,可以做什么?我确信使用 python 2.6 我可以使用 mecanize,或者也许 lxml 可以工作?但我真的不想在某种已弃用的版本中进行编码。我很喜欢 python,但我开始考虑回到 perl。也许这可能是一个聪明的举动?

任何帮助将非常感谢

Gerard

  • Reading 在这个论坛中,我找到 pythonpaste.org,它可以替代 lxml 吗?

Great page this one, coming from the perl world and after several years of doing nothing, I've re-started to program again (this web page didn't exist, how things change). And now, after a 2 full-days of searching, I play the last card of asking here for help.

Working under mac environment, with python 3.2 and lxml 2.3 (installed following www.jtmoon.com/?p=21), what I am trying to do:

My code. I put several attempts and the output code.

from lxml.html import parse, submit_form, tostring
page = parse('http://biodbnet.abcc.ncifcrf.gov/db/db2db.php').getroot()
page.forms[0].fields['input'] = 'GI Number'
page.forms[0].inputs['outputs[]'].value = 'Gene ID'
page.forms[0].fields['hasComma'] = 'no'
page.forms[0].fields['removeDupValues'] = 'yes'
page.forms[0].fields['request'] = 'db2db'
page.forms[0].action = 'http://biodbnet.abcc.ncifcrf.gov/db/db2dbRes.php'
page.forms[0].fields['idList'] = '86439006'
submit_form(page.forms[0])

Output:

File "/Users/gerard/Desktop/barbacue/MGFtoXML.py", line 30, in <module>
page.forms[0].inputs['outputs[]'].value = 'Gene ID'
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/__init__.py", line 1058, in _value__set
"You must pass in a sequence")
TypeError: You must pass in a sequence

So, since that element is a multi-select element, I understand that I have to give a list

page.forms[0].inputs['outputs[]'].value = list('Gene ID')

Output:

File "/Users/gerard/Desktop/barbacue/MGFtoXML.py", line 30, in <module>
page.forms[0].inputs['outputs[]'].value = list('Gene ID')
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/__init__.py", line 1059, in _value__set
self.value.clear()
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/_setmixin.py", line 115, in clear
self.remove(item)
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/__init__.py", line 1159, in remove
"The option %r is not currently selected" % item)
ValueError: The option 'Affy ID' is not currently selected

'Affy ID' is the first option value of the list, and it is not selected. But what's the problem with it?

Surprisingly, if I instead put

page.forms[0].inputs['outputs[]'].multiple = list('Gene ID')
#page.forms[0].inputs['outputs[]'].value = list('Gene ID')

Then, somehow lxml likes it, and move on. However, the multiple attribute should be a boolean (actually it is if I print the value), I shouldn't touch it, and the "value" of the item should actually point to the selected items, according to the lxml docs.

The new output

File "/Users/gerard/Desktop/barbacue/MGFtoXML.py", line 87, in <module>
submit_form(page.forms[0])
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/__init__.py", line 856, in submit_form
return open_http(form.method, url, values)
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/__init__.py", line 876, in open_http_urllib
return urlopen(url, data)
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 138, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 364, in open
req = meth(req)
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 1052, in do_request_
raise TypeError("POST data should be bytes"
TypeError: POST data should be bytes or an iterable of bytes. It cannot be str.

So, what can be done?? I am sure that with python 2.6 I could use mecanize, or that perhaps lxml could work? But I really don't want to code in a sort-of deprecated version. I am enjoying a lot python, but I am starting to consider going back to perl. Perhaps this could be a smart movement??

Any help will be hugely appreciated

Gerard

  • Reading in this forum, I find pythonpaste.org, could it be a replacement for lxml?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

春庭雪 2024-11-25 09:32:34

将序列传递给 list() 将从该序列生成一个列表。 'Gene ID'是序列(即字符序列)。因此 list('Gene ID') 将生成一个字符列表,如下所示:

>>> list('Gene ID')
['G', 'e', 'n', 'e', ' ', 'I', 'D']

这不是您想要的。试试这个:

>>> ['Gene ID']
['Gene ID']

换句话说:

page.forms[0].inputs['outputs[]'].value = ['Gene ID']

这应该会让你前进一点。

Passing in a sequence to list() will generate a list from that sequence. 'Gene ID' is sequence (namely a sequence of characters). So list('Gene ID') will generate a list of characters, like so:

>>> list('Gene ID')
['G', 'e', 'n', 'e', ' ', 'I', 'D']

That's not what you want. Try this:

>>> ['Gene ID']
['Gene ID']

In other words:

page.forms[0].inputs['outputs[]'].value = ['Gene ID']

That should take you a bit forward.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文