Get PDB IDs plus chain IDs from UniProt IDs?

Posted 2025-01-16 12:25:11

I have a list of UniProt IDs and need to know the PDB IDs plus the chain IDs.
With the code given on the UniProt website I can get the PDB IDs, but not the chain information.

import urllib.parse
import urllib.request

url = 'https://www.uniprot.org/uploadlists/'

# UniProtIDs is assumed to be a whitespace-separated string of accessions,
# e.g. 'A0A075B6N1 A0A075B6T6'
params = {
    'from': 'ACC+ID',
    'to': 'PDB_ID',
    'format': 'tab',
    'query': UniProtIDs,
}

# URL-encode the form fields and POST them to the mapping service
data = urllib.parse.urlencode(params).encode('utf-8')
req = urllib.request.Request(url, data)

# append the tab-separated response to a local file
with open('UniProt_PDB_IDs.txt', 'a') as f:
    with urllib.request.urlopen(req) as q:
        f.write(q.read().decode('utf-8'))

This code gives me:

From    To
A0A075B6N1  5HHM
A0A075B6N1  5HHO
A0A075B6N1  5NQK
A0A075B6T6  1AO7
A0A075B6T6  4ZDH

For the protein A0A075B6N1 with PDB ID 5HHM, the chains are E and J, so I need a way to retrieve the chains as well and get something like this:

A0A075B6N1  5HHM_E
A0A075B6N1  5HHM_J
A0A075B6N1  5HHO_E
A0A075B6N1  5NQK_B

It doesn't have to be in this format; later I will convert it into a dictionary with the UniProt IDs as keys and the PDB IDs as values.
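A minimal sketch of that conversion, assuming the mapping is available as two-column text like the listing above. The helper name `parse_mapping` and the variable `uniprot_to_pdb` are hypothetical, and each UniProt ID is mapped to a list because one accession can have several PDB chains:

```python
from collections import defaultdict


def parse_mapping(text):
    """Build {UniProt ID: [PDB_chain, ...]} from two-column text (hypothetical helper)."""
    mapping = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # skip blank or malformed lines and the 'From  To' header row
        if len(fields) != 2 or fields == ['From', 'To']:
            continue
        uniprot_id, pdb_chain = fields
        mapping[uniprot_id].append(pdb_chain)
    return dict(mapping)


# sample input copied from the desired output in the question
sample = """From    To
A0A075B6N1  5HHM_E
A0A075B6N1  5HHM_J
A0A075B6N1  5HHO_E
A0A075B6N1  5NQK_B
"""

uniprot_to_pdb = parse_mapping(sample)
print(uniprot_to_pdb)
```

Splitting on any whitespace keeps the sketch tolerant of both tab-separated and space-aligned columns.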

Thank you for your help in advance!

羁客 2025-01-23 12:25:11

A tool called localpdb was recently released that might do exactly what you want: https://labstructbioinf.github.io/localpdb/.

Another way would be to split the structures by segment, which is easy to do with MDAnalysis Universe objects (https://www.mdanalysis.org). Assuming you have a list of PDB IDs:

import MDAnalysis as mda

# example PDB IDs taken from the question
pdb_ids = ['5HHM', '5HHO', '5NQK']

# fetch structures
universe_objects = []
for pdb_id in pdb_ids:
    mmtf_object = mda.fetch_mmtf(pdb_id)
    universe_objects.append(mmtf_object)

# get rid of water and ligands and split structures into chains
universe_chains = []
for universe_object in universe_objects:
    universe_chain = universe_object.select_atoms('protein').split('segment')
    universe_chains.append(universe_chain)

# flatten nested list into one list of per-chain AtomGroups
universe_chain_list = [item for sublist in universe_chains for item in sublist]
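To get from the per-chain AtomGroups to labels in the `PDBID_CHAIN` style the question asks for, something like the following sketch might work. The helper `chain_labels` is hypothetical, and the sketch assumes that the segment IDs MDAnalysis reports correspond to the PDB chain identifiers. It also lists every protein chain in an entry, so narrowing the result to the chains of one UniProt accession would still need a separate mapping. The input segids below are illustrative only.

```python
def chain_labels(pdb_id, segids):
    """Return 'PDBID_CHAIN' labels, de-duplicated in first-seen order (hypothetical helper)."""
    unique_segids = dict.fromkeys(str(segid) for segid in segids)
    return [f"{pdb_id.upper()}_{segid}" for segid in unique_segids]


# With MDAnalysis, the segids could be collected roughly like this (not run here):
#   chains = universe.select_atoms('protein').split('segment')
#   segids = [chain.segments.segids[0] for chain in chains]

# illustrative input only; a repeated segid is collapsed
labels = chain_labels('5hhm', ['E', 'J', 'E'])
print(labels)
```

Keeping the label formatting separate from the structure loading means the helper can be reused with chain IDs obtained from any other tool.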

Of course, there are other tools you can use for this, e.g. the ProDy HierView functionality.

Hope that helps.
