Get PDB IDs plus chain IDs from UniProt IDs?

Posted 2025-01-16 12:25:11

I have a list of UniProt IDs and need the PDB IDs together with the chain IDs.
With the code given on the UniProt website I can get the PDB IDs, but not the chain information.

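import urllib.parse
import urllib.request

url = 'https://www.uniprot.org/uploadlists/'

# UniProtIDs: space-separated accessions (example values taken from the output below)
UniProtIDs = 'A0A075B6N1 A0A075B6T6'

params = {
    'from': 'ACC+ID',   # input: UniProt accession / ID
    'to': 'PDB_ID',     # output: PDB ID
    'format': 'tab',
    'query': UniProtIDs
}

# POST the mapping request and append the tab-separated reply to a file
data = urllib.parse.urlencode(params)
data = data.encode('utf-8')
req = urllib.request.Request(url, data)
with open('UniProt_PDB_IDs.txt', 'a') as f:
    with urllib.request.urlopen(req) as q:
        response = q.read()
        f.write(response.decode('utf-8'))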

This code gets me the following:

From    To
A0A075B6N1  5HHM
A0A075B6N1  5HHO
A0A075B6N1  5NQK
A0A075B6T6  1AO7
A0A075B6T6  4ZDH

For the protein A0A075B6N1 with PDB ID 5HHM the chains are E and J, so I need a way to retrieve the chains as well and get something like this:

A0A075B6N1  5HHM_E
A0A075B6N1  5HHM_J
A0A075B6N1  5HHO_E
A0A075B6N1  5NQK_B

It doesn't have to be in this format; later I will convert it into a dictionary with the UniProt IDs as keys and the PDB IDs as values.

Thank you for your help in advance!
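
The dictionary conversion mentioned in the question could look roughly like the sketch below. It assumes the input is whitespace-separated text in the `UniProtID  PDBID_CHAIN` layout shown above; the function name `parse_mapping` and the choice to store `(pdb_id, chain_id)` tuples are illustrative, not something the question specifies.

```python
from collections import defaultdict


def parse_mapping(text):
    """Group 'UniProtID  PDBID_CHAIN' lines into {uniprot_id: [(pdb_id, chain_id), ...]}."""
    mapping = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2 or fields[0] == "From":
            continue  # skip blank lines and the 'From  To' header row
        uniprot_id, entry = fields
        # Split '5HHM_E' into PDB ID and chain; chain_id is '' if there is no '_'
        pdb_id, _, chain_id = entry.partition("_")
        mapping[uniprot_id].append((pdb_id.upper(), chain_id))
    return dict(mapping)


sample = """\
A0A075B6N1  5HHM_E
A0A075B6N1  5HHM_J
A0A075B6N1  5HHO_E
A0A075B6N1  5NQK_B
"""
result = parse_mapping(sample)
print(result)
```

Keeping the chain as a separate tuple element means the plain PDB ID stays available for later lookups, while `"_".join(pair)` restores the combined form if that is preferred.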

Comments (1)

羁客 2025-01-23 12:25:11

A tool called localpdb was released recently that might do exactly what you want: https://labstructbioinf.github.io/localpdb/.

Another way would be to split the structures by segment, which can easily be done with MDAnalysis Universe objects (https://www.mdanalysis.org). Assuming you have a list of PDB IDs:

import MDAnalysis as mda

# pdb_ids is assumed to be a list of PDB ID strings

# Fetch each structure as an MDAnalysis Universe
universe_objects = []
for pdb_id in pdb_ids:
    mmtf_object = mda.fetch_mmtf(pdb_id)
    universe_objects.append(mmtf_object)

# Drop water and ligands, then split each structure into per-segment (chain) AtomGroups
universe_chains = []
for universe_object in universe_objects:
    universe_chain = universe_object.select_atoms('protein').split('segment')
    universe_chains.append(universe_chain)

# Flatten the nested list into one list of chain AtomGroups
universe_chain_list = [item for sublist in universe_chains for item in sublist]

Of course, there are other tools you can do this with, e.g. ProDy's HierView.

Hope that helps.
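
If downloading whole structures is more than is needed, a lookup against a UniProt-to-PDB chain mapping service may be lighter. The sketch below assumes the PDBe "best structures" SIFTS endpoint and assumes that its JSON reply maps each accession to a list of records carrying `pdb_id` and `chain_id` keys; the URL, the reply shape, and the helper names `chains_from_payload` and `fetch_chains` are assumptions that are not part of the original answer and should be checked against the PDBe API documentation. The demonstration payload is hand-written from the chains named in the question, not fetched data.

```python
import json
import urllib.request

# Assumed endpoint: PDBe SIFTS "best structures" mapping service
PDBE_URL = "https://www.ebi.ac.uk/pdbe/api/mappings/best_structures/{}"


def chains_from_payload(payload):
    """Turn a decoded JSON payload into {uniprot_id: ['PDBID_CHAIN', ...]}.

    Assumes each accession maps to a list of records that carry
    'pdb_id' and 'chain_id' keys.
    """
    result = {}
    for accession, records in payload.items():
        labels = []
        for record in records:
            label = f"{record['pdb_id'].upper()}_{record['chain_id']}"
            if label not in labels:  # a chain may be listed more than once
                labels.append(label)
        result[accession] = labels
    return result


def fetch_chains(uniprot_id):
    """Query the (assumed) PDBe endpoint for one accession; needs network access."""
    with urllib.request.urlopen(PDBE_URL.format(uniprot_id)) as response:
        return chains_from_payload(json.load(response))


# Offline demonstration with a hand-written payload in the assumed shape
example_payload = {
    "A0A075B6N1": [
        {"pdb_id": "5hhm", "chain_id": "E"},
        {"pdb_id": "5hhm", "chain_id": "J"},
        {"pdb_id": "5hho", "chain_id": "E"},
    ]
}
example = chains_from_payload(example_payload)
print(example)
```

Calling `fetch_chains` once per accession and merging the returned dictionaries with `dict.update` would give the UniProt-keyed dictionary the question asks for.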
