Python 3.2 lxml填写并提交表单,选择多个,怎么做?值不起作用
这个页面很棒,来自 Perl 世界,几年无所事事之后,我又重新开始编程(这个网页不存在,事情发生了怎样的变化)。现在,经过整整两天的搜索,我打出了最后一张牌,在这里寻求帮助。
在mac环境下工作,使用python 3.2和lxml 2.3(在www.jtmoon.com/?p=21之后安装),我想做的事情:
- web: http://biodbnet.abcc.ncifcrf.gov/db/db2db.php
- 填写您找到的表格在那里
- 提交
我的代码。我尝试了几次并输出了代码。
from lxml.html import parse, submit_form, tostring
page = parse('http://biodbnet.abcc.ncifcrf.gov/db/db2db.php').getroot()
page.forms[0].fields['input'] = 'GI Number'
page.forms[0].inputs['outputs[]'].value = 'Gene ID'
page.forms[0].fields['hasComma'] = 'no'
page.forms[0].fields['removeDupValues'] = 'yes'
page.forms[0].fields['request'] = 'db2db'
page.forms[0].action = 'http://biodbnet.abcc.ncifcrf.gov/db/db2dbRes.php'
page.forms[0].fields['idList'] = '86439006'
submit_form(page.forms[0])
输出:
File "/Users/gerard/Desktop/barbacue/MGFtoXML.py", line 30, in <module>
page.forms[0].inputs['outputs[]'].value = 'Gene ID'
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/__init__.py", line 1058, in _value__set
"You must pass in a sequence")
TypeError: You must pass in a sequence
因此,由于该元素是多选元素,我知道我必须给出一个列表
page.forms[0].inputs['outputs[]'].value = list('Gene ID')
输出:
File "/Users/gerard/Desktop/barbacue/MGFtoXML.py", line 30, in <module>
page.forms[0].inputs['outputs[]'].value = list('Gene ID')
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/__init__.py", line 1059, in _value__set
self.value.clear()
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/_setmixin.py", line 115, in clear
self.remove(item)
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/__init__.py", line 1159, in remove
"The option %r is not currently selected" % item)
ValueError: The option 'Affy ID' is not currently selected
“Affy ID”是列表的第一个选项值,并且未选择它。但它有什么问题呢?
令人惊讶的是,如果我改为放置
page.forms[0].inputs['outputs[]'].multiple = list('Gene ID')
#page.forms[0].inputs['outputs[]'].value = list('Gene ID')
“Then”,不知怎的,lxml 会喜欢它,然后继续。但是,根据 lxml 文档,多个属性应该是一个布尔值(实际上,如果我打印该值,则应该是),我不应该触摸它,并且该项目的“值”实际上应该指向所选项目。
新的输出
File "/Users/gerard/Desktop/barbacue/MGFtoXML.py", line 87, in <module>
submit_form(page.forms[0])
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/__init__.py", line 856, in submit_form
return open_http(form.method, url, values)
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/__init__.py", line 876, in open_http_urllib
return urlopen(url, data)
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 138, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 364, in open
req = meth(req)
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 1052, in do_request_
raise TypeError("POST data should be bytes"
TypeError: POST data should be bytes or an iterable of bytes. It cannot be str.
那么,可以做什么?我确信使用 python 2.6 我可以使用 mecanize,或者也许 lxml 可以工作?但我真的不想在某种已弃用的版本中进行编码。我很喜欢 python,但我开始考虑回到 perl。也许这可能是一个聪明的举动?
任何帮助将非常感谢
Gerard
- Reading 在这个论坛中,我找到 pythonpaste.org,它可以替代 lxml 吗?
Great page this one, coming from the perl world and after several years of doing nothing, I've re-started to program again (this web page didn't exist, how things change). And now, after a 2 full-days of searching, I play the last card of asking here for help.
Working under mac environment, with python 3.2 and lxml 2.3 (installed following www.jtmoon.com/?p=21), what I am trying to do:
- web: http://biodbnet.abcc.ncifcrf.gov/db/db2db.php
- to fill the form that you find there
- to submit it
My code. I put several attempts and the output code.
from lxml.html import parse, submit_form, tostring
page = parse('http://biodbnet.abcc.ncifcrf.gov/db/db2db.php').getroot()
page.forms[0].fields['input'] = 'GI Number'
page.forms[0].inputs['outputs[]'].value = 'Gene ID'
page.forms[0].fields['hasComma'] = 'no'
page.forms[0].fields['removeDupValues'] = 'yes'
page.forms[0].fields['request'] = 'db2db'
page.forms[0].action = 'http://biodbnet.abcc.ncifcrf.gov/db/db2dbRes.php'
page.forms[0].fields['idList'] = '86439006'
submit_form(page.forms[0])
Output:
File "/Users/gerard/Desktop/barbacue/MGFtoXML.py", line 30, in <module>
page.forms[0].inputs['outputs[]'].value = 'Gene ID'
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/__init__.py", line 1058, in _value__set
"You must pass in a sequence")
TypeError: You must pass in a sequence
So, since that element is a multi-select element, I understand that I have to give a list
page.forms[0].inputs['outputs[]'].value = list('Gene ID')
Output:
File "/Users/gerard/Desktop/barbacue/MGFtoXML.py", line 30, in <module>
page.forms[0].inputs['outputs[]'].value = list('Gene ID')
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/__init__.py", line 1059, in _value__set
self.value.clear()
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/_setmixin.py", line 115, in clear
self.remove(item)
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/__init__.py", line 1159, in remove
"The option %r is not currently selected" % item)
ValueError: The option 'Affy ID' is not currently selected
'Affy ID' is the first option value of the list, and it is not selected. But what's the problem with it?
Surprisingly, if I instead put
page.forms[0].inputs['outputs[]'].multiple = list('Gene ID')
#page.forms[0].inputs['outputs[]'].value = list('Gene ID')
Then, somehow lxml likes it, and move on. However, the multiple attribute should be a boolean (actually it is if I print the value), I shouldn't touch it, and the "value" of the item should actually point to the selected items, according to the lxml docs.
The new output
File "/Users/gerard/Desktop/barbacue/MGFtoXML.py", line 87, in <module>
submit_form(page.forms[0])
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/__init__.py", line 856, in submit_form
return open_http(form.method, url, values)
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/site-packages/lxml/html/__init__.py", line 876, in open_http_urllib
return urlopen(url, data)
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 138, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 364, in open
req = meth(req)
File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 1052, in do_request_
raise TypeError("POST data should be bytes"
TypeError: POST data should be bytes or an iterable of bytes. It cannot be str.
So, what can be done?? I am sure that with python 2.6 I could use mecanize, or that perhaps lxml could work? But I really don't want to code in a sort-of deprecated version. I am enjoying a lot python, but I am starting to consider going back to perl. Perhaps this could be a smart movement??
Any help will be hugely appreciated
Gerard
- Reading in this forum, I find pythonpaste.org, could it be a replacement for lxml?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
将序列传递给
list()
将从该序列生成一个列表。'Gene ID'
是序列(即字符序列)。因此list('Gene ID')
将生成一个字符列表,如下所示:这不是您想要的。试试这个:
换句话说:
这应该会让你前进一点。
Passing in a sequence to
list()
will generate a list from that sequence.'Gene ID'
is sequence (namely a sequence of characters). Solist('Gene ID')
will generate a list of characters, like so:That's not what you want. Try this:
In other words:
That should take you a bit forward.