How do I perform a search on a custom website and read the results?
I am developing a function to download protein .pdb files online, as part of a body of code I am writing to dock proteins and ligands generated by our AIBind machine learning model. For around 60% of these proteins I am able to use gene libraries to convert their HGNC IDs to PDB IDs, which I then query through the UniProt and RCSB websites to download the PDB files. However, for the other 40% only computationally generated AlphaFold PDB models exist, and the gene libraries I have been using do not recognize these proteins as having valid PDB IDs. Thankfully, the AlphaFold website has a search function: searching with the HGNC ID returns a list of entries, and the top entry is the protein I am looking for about 99% of the time.
Once I have the UniProt ID (Q7K0E6 in this example), I can navigate to the AlphaFold entry page and access the file server to download the PDB file for that protein. I have already done this successfully for proteins that have a registered UniProt ID in the databanks I have been using.
I've been using the following code to scrape the search page, with the HGNC symbol supplied as the search term, and write all of the page's HTML to a text file.
import urllib.request

# Base URL of the AlphaFold search page; the search term is appended to it
url = 'https://alphafold.ebi.ac.uk/search/text/'
fname = 'alphaname.txt'
HGNC = 'vr1'
url = url + HGNC

# Download the raw HTML of the search page
get = urllib.request.urlopen(url)
html = get.read()

# Save the page bytes to a file for inspection
with open(fname, "wb") as f:
    f.write(html)
When I search the saved file (manually as well as through Python), I don't find any data from the entries that the site returns as search results.
How can I use Python to retrieve the data returned by a website's search function?
Comments (1)
The data is loaded from an external URL via JavaScript, which is why it does not appear in the HTML you saved. You can use the requests module to simulate that request.
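As a rough sketch of that idea: instead of downloading the HTML page, query the JSON endpoint that the page's JavaScript calls and read the UniProt accessions from the response. The endpoint path, the query parameters, and the "docs" / "uniprotAccession" field names below are assumptions, not confirmed API details; check them in your browser's network tab before relying on them. The sketch uses only the standard library, like the code in the question; requests.get(url, params=params).json() would do the same job.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: this endpoint and its parameter names are a guess at what the
# search page requests in the background; verify them in the network tab.
SEARCH_API = "https://alphafold.ebi.ac.uk/api/search"


def build_search_url(query, rows=20):
    """Build the URL of the (assumed) JSON search endpoint for a query."""
    params = {"q": query, "type": "main", "start": 0, "rows": rows}
    return SEARCH_API + "?" + urllib.parse.urlencode(params)


def extract_accessions(payload, key="uniprotAccession"):
    """Return the accession of each result in order, skipping entries without one.

    ASSUMPTION: results are listed under "docs" and each carries the
    field named by `key`.
    """
    return [doc[key] for doc in payload.get("docs", []) if key in doc]


def search_alphafold(query):
    """Fetch the search results as parsed JSON (requires network access)."""
    with urllib.request.urlopen(build_search_url(query)) as response:
        return json.load(response)
```

With network access, extract_accessions(search_alphafold("vr1")) should give the list of accessions for the search term, and the first element would be the top hit, provided the assumed field names match the real response.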