精炼存储在 SQLite 中的数据 - 如何连接多个联系人?

发布于 2024-09-02 10:44:51 字数 2285 浏览 8 评论 0原文

我正在存储不同元素之间的联系。我想消除某种类型的元素并存储由消除的元素互连的元素的新联系人。

问题背景

想象一下这个问题。你有一个水分子与其他分子接触(如果接触是氢键,那么我的水周围可能有 4 个其他分子)。如下图所示(A、B、C、D是其他一些原子,点表示接触)。

 A   B
 |   |
 H   H
  . .
   O
  / \
 H   H
 .   .
 C   D

我有所有点的信息,我需要消除中心的水并创建描述 AC、AD、AB、BC、BD 和 CD 接触的记录。

数据库结构

目前,我在数据库中有以下结构:

atoms

  • “id”整数PRIMARY KEY,
  • “amino”char(3)NOT NULL,(HOH 表示水或其他值)
  • 标识原子的其他列表

contacts

  • “acceptor_id”整数NOT NULL,(靠近我的氢的原子,这里C 或 D)
  • "donor_id" 整数 NOT NULL,(此处为 A 或 B)
  • "directness" char(1) NOT NULL, (这应该是 D 代表直接,W 代表 W对于水介导的)
  • 有关接触的其他列,例如距离

编辑: 前面描述的案例中的数据会是什么样子?

atoms

id|atom|amino
1 | O  | HOH
2 | N  | ARG  <- atom A from image
3 | S  | CYS  <- B 
4 | O  | SER  <- C
5 | N  | ARG  <- D

contacts

donor_id|acceptor_id|directness
1        4           D
1        5           D
2        1           D
3        1           D

我需要从中创建

contacts

donor_id|acceptor_id|directness
3        4           W            <- B-C
3        5           W            <- B-D
2        4           W            <- A-C
2        5           W            <- A-D
2        3           X            <- A-B    (These last two rows are escaping me,
4        5           X            <- C-D     there could be also row D-C, but not
                                             both C-D and D-C. A char 'X' could 
                                             be used to mark "no donor/acceptor")

当前解决方案(不足)

现在,我正在浏览所有具有以下功能的联系人: 捐赠者.amino =“HOH”。在此示例案例中,这将从 C 和 D 中选择联系人。对于每个选定的联系人,我都会查找与当前的 donor_id 具有相同 acceptor_id 的联系人。选定的联系人。根据此信息,我创建了新联系人。最后,我删除了 HOH 的所有联系人。

这样,我显然无法创建 CD 和 AB 联系人(其他 4 个都可以)。

如果我尝试类似的方法 - 试图找到两个具有相同捐赠者 ID 的联系人,我最终会得到重复的联系人(CD 和 DC)。

有没有一种简单的方法可以检索所有六个联系人而不重复?

我梦想着一个一页长的 SQL 查询,它只检索这六个想要的行。 :-)
但是,欢迎任何其他想法。

最好保存有关捐赠者的信息(如果可能),但并非绝对必要。

非常感谢到目前为止阅读此问题的所有人。

I'm storing contacts between different elements. I want to eliminate elements of certain type and store new contacts of elements which were interconnected by the eliminated element.

Problem background

Imagine this problem. You have a water molecule which is in contact with other molecules (if the contact is a hydrogen bond, there can be 4 other molecules around my water). Like in the following picture (A, B, C, D are some other atoms and dots mean the contact).

 A   B
 |   |
 H   H
  . .
   O
  / \
 H   H
 .   .
 C   D

I have the information about all the dots and I need to eliminate the water in the center and create records describing contacts of A-C, A-D, A-B, B-C, B-D, and C-D.

Database structure

Currently, I have the following structure in the database:

Table atoms:

  • "id" integer PRIMARY KEY,
  • "amino" char(3) NOT NULL, (HOH for water or other value)
  • other columns identifying the atom

Table contacts:

  • "acceptor_id" integer NOT NULL, (the atom near to my hydrogen, here C or D)
  • "donor_id" integer NOT NULL, (here A or B)
  • "directness" char(1) NOT NULL, (this should be D for direct and W for water-mediated)
  • other columns about the contact, such as the distance

EDIT:
How would look the data in the case depicted earlier.

atoms:

id|atom|amino
1 | O  | HOH
2 | N  | ARG  <- atom A from image
3 | S  | CYS  <- B 
4 | O  | SER  <- C
5 | N  | ARG  <- D

contacts:

donor_id|acceptor_id|directness
1        4           D
1        5           D
2        1           D
3        1           D

From which I need to make

contacts:

donor_id|acceptor_id|directness
3        4           W            <- B-C
3        5           W            <- B-D
2        4           W            <- A-C
2        5           W            <- A-D
2        3           X            <- A-B    (These last two rows are escaping me,
4        5           X            <- C-D     there could be also row D-C, but not
                                             both C-D and D-C. A char 'X' could 
                                             be used to mark "no donor/acceptor")

Current solution (insufficient)

Now, I'm going through all the contacts which have donor.amino = "HOH". In this sample case, this would select contacts from C and D. For each of these selected contacts, I look up contacts having the same acceptor_id as is the donor_id in the currently selected contact. From this information, I create the new contact. At the end, I delete all contacts to or from HOH.

This way, I am obviously unable to create C-D and A-B contacts (the other 4 are OK).

If I try a similar approach - trying to find two contacts having the same donor_id, I end up with duplicate contacts (C-D and D-C).

Is there a simple way to retrieve all six contacts without duplicates?

I'm dreaming about some one page long SQL query which retrieves just these six wanted rows. :-)
However, any other ideas are welcome.

It is preferable to conserve information about who is donor (where possible), but not strictly necessary.

Big thanks to all of you who read this question to this point.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

星星的軌跡 2024-09-09 10:44:51

你的解释有一个困难。

首先是一个有向图,其中每条边代表一个连接 X=>Y,其中 X 是供体,Y 是受体。表atoms 是该图的 SQL 表示形式。

您似乎想要的是无向的东西。因此,连接 XY 意味着 X 和 Y 通过水分子(当然也可以是其他一些物质)连接,但 X 和 Y 都可以是供体或受体。因此,您的最后一个表格有歧义(您注意到),因此某些链接可能以任何方式出现。在我看来,这意味着最后一列中的列标题 donor_idacceptor_id 没有您所解释的任何含义。这当然可能是我的困惑。

如果您想要的只是一张包含所有 6 个链接的表,每个链接作为一行,但不要太担心跟踪捐赠者/接受者的事情,那么这对我在 sqlite3 中有用:

 create temporary view hoh_view as 
 select donor_id as id, atoms.id as hoh_id from contacts, atoms 
       where acceptor_id=atoms.id and atoms.amino='HOH' 
 union select acceptor_id as id, atoms.id as hoh_id from contacts, atoms 
       where donor_id=atoms.id and atoms.amino='HOH';

 select a.id, b.id from hoh_view as a, hoh_view as b 
       where a.id > b.id and a.hoh_id=b.hoh_id;

我使用了临时视图让事情变得更清楚。如果您愿意,您可以将所有这些放入一个大查询中,方法是用第一个查询替换对 hoh_view 的每个引用。我觉得有点恶心,也许有办法整理一下。

如果您确实想跟踪供体/受体关系,您需要解释当两个氨基酸都是受体或供体时您如何决定做什么(即示例中的最后两行)。

如果这没有达到你想要的效果,那么也许我可以修复它,这样就可以了。

There's one difficulty with your explanation.

What you start with is a directed graph where each edge represents a connection X=>Y where X is a donor and Y an acceptor. The table atoms is the SQL representation of that graph.

What you seem to want is something that is undirected. So that a link X-Y means that X and Y are linked via a water molecule (or some other species of course) but that X and Y could both be donors or acceptors. For this reason your last table has an ambiguity (that you note) so that some links could occur either way around. It seems to me that means that the column headings donor_id and acceptor_id on your final column don't have any meaning that you have explained. This may be my confusion of course.

If all you want is a table with all 6 links in it, each as one row, but don't worry too much about keeping track of the donor/acceptor thing then this works for me in sqlite3:

 create temporary view hoh_view as 
 select donor_id as id, atoms.id as hoh_id from contacts, atoms 
       where acceptor_id=atoms.id and atoms.amino='HOH' 
 union select acceptor_id as id, atoms.id as hoh_id from contacts, atoms 
       where donor_id=atoms.id and atoms.amino='HOH';

 select a.id, b.id from hoh_view as a, hoh_view as b 
       where a.id > b.id and a.hoh_id=b.hoh_id;

Where I have used a temporary view to make things clearer. You can put this all into one big query if you like by replacing each reference to hoh_view by the first query. It feels a bit nasty to me and there may be a way of tidying it up.

If you do want to keep track of donor/acceptor relationships you need to explain how you decide what to do when both amino acids are acceptors or donors (i.e. the last two rows in your example).

If that doesn't do what you want, then maybe I can fix it up so it does.

悲欢浪云 2024-09-09 10:44:51

好吧,很难在评论中提供示例,我决定发布一个答案:

如果您必须遵循原始数据,则无法区分第一个结构的数据和第二个结构的数据。应该有一个附加的分组条件来消除第一和第二结构之间的方向。

sqlite> create table atoms (id INT, atom TEXT, amino TEXT);
sqlite> insert into atoms VALUES (1, 'O', 'HOH');
sqlite> insert into atoms VALUES (2, 'A', 'ARG');
sqlite> insert into atoms VALUES (3, 'B', 'CYS');
sqlite> insert into atoms VALUES (4, 'C', 'SER');
sqlite> insert into atoms VALUES (5, 'D', 'ARG');
sqlite> insert into atoms VALUES (6, 'O1', 'HOH');
sqlite> insert into atoms VALUES (7, 'A1', 'ARG');
sqlite> insert into atoms VALUES (8, 'B1', 'CYS');
sqlite> insert into atoms VALUES (9, 'C1', 'SER');
sqlite> insert into atoms VALUES (10, 'D1', 'ARG');
sqlite> select * from atoms;
1|O|HOH
2|A|ARG
3|B|CYS
4|C|SER
5|D|ARG
6|O1|HOH
7|A1|ARG
8|B1|CYS
9|C1|SER
10|D1|ARG

UPD

这是原始数据:

sqlite> .headers ON
sqlite> .mode columns
sqlite> select * from atoms;
id          atom        amino
----------  ----------  ----------
1           O           HOH
2           A           ARG
3           B           CYS
4           C           SER
5           D           ARG
6           O1          HOH
7           A1          ARG
8           B1          CYS
9           C1          SER
10          D1          ARG
sqlite> select * from contacts;
donor_id    acceptor_id  directness
----------  -----------  ----------
1           4            D
1           5            D
2           1            D
3           1            D
6           9            D
6           10           D
7           6            D
8           6            D

这是查询:

select
    c1.donor_id, c2.acceptor_id, 'W' as directness
from
    contacts c1, contacts c2, atoms a
where
    c1.acceptor_id = c2.donor_id
    and c1.acceptor_id=a.id
    and a.amino='HOH'
UNION ALL
select
    c1.donor_id, c2.donor_id, 'X' as directness
from
    contacts c1, contacts c2, atoms a
where
    c1.acceptor_id = c2.acceptor_id
    and c1.acceptor_id=a.id
    and a.amino='HOH'
    and c1.donor_id < c2.donor_id
UNION ALL
select
    c1.acceptor_id, c2.acceptor_id, 'X' as directness
from
    contacts c1, contacts c2, atoms a
where
    c1.donor_id = c2.donor_id
    and c1.donor_id=a.id
    and a.amino='HOH'
    and c1.acceptor_id < c2.acceptor_id;

这是结果:

donor_id    acceptor_id  directness
----------  -----------  ----------
2           4            W
2           5            W
3           4            W
3           5            W
7           9            W
7           10           W
8           9            W
8           10           W
2           3            X
7           8            X
4           5            X
9           10           X

Well, its hard to provide examples in comments, I decided to post an answer:

If you have to following original data, there is no way to distinguish data from the first structure from those of the second. There should be an additional grouping condition to eleminate directions between the first and the second structure.

sqlite> create table atoms (id INT, atom TEXT, amino TEXT);
sqlite> insert into atoms VALUES (1, 'O', 'HOH');
sqlite> insert into atoms VALUES (2, 'A', 'ARG');
sqlite> insert into atoms VALUES (3, 'B', 'CYS');
sqlite> insert into atoms VALUES (4, 'C', 'SER');
sqlite> insert into atoms VALUES (5, 'D', 'ARG');
sqlite> insert into atoms VALUES (6, 'O1', 'HOH');
sqlite> insert into atoms VALUES (7, 'A1', 'ARG');
sqlite> insert into atoms VALUES (8, 'B1', 'CYS');
sqlite> insert into atoms VALUES (9, 'C1', 'SER');
sqlite> insert into atoms VALUES (10, 'D1', 'ARG');
sqlite> select * from atoms;
1|O|HOH
2|A|ARG
3|B|CYS
4|C|SER
5|D|ARG
6|O1|HOH
7|A1|ARG
8|B1|CYS
9|C1|SER
10|D1|ARG

UPD

Here is the original data:

sqlite> .headers ON
sqlite> .mode columns
sqlite> select * from atoms;
id          atom        amino
----------  ----------  ----------
1           O           HOH
2           A           ARG
3           B           CYS
4           C           SER
5           D           ARG
6           O1          HOH
7           A1          ARG
8           B1          CYS
9           C1          SER
10          D1          ARG
sqlite> select * from contacts;
donor_id    acceptor_id  directness
----------  -----------  ----------
1           4            D
1           5            D
2           1            D
3           1            D
6           9            D
6           10           D
7           6            D
8           6            D

Here is the query:

select
    c1.donor_id, c2.acceptor_id, 'W' as directness
from
    contacts c1, contacts c2, atoms a
where
    c1.acceptor_id = c2.donor_id
    and c1.acceptor_id=a.id
    and a.amino='HOH'
UNION ALL
select
    c1.donor_id, c2.donor_id, 'X' as directness
from
    contacts c1, contacts c2, atoms a
where
    c1.acceptor_id = c2.acceptor_id
    and c1.acceptor_id=a.id
    and a.amino='HOH'
    and c1.donor_id < c2.donor_id
UNION ALL
select
    c1.acceptor_id, c2.acceptor_id, 'X' as directness
from
    contacts c1, contacts c2, atoms a
where
    c1.donor_id = c2.donor_id
    and c1.donor_id=a.id
    and a.amino='HOH'
    and c1.acceptor_id < c2.acceptor_id;

Here is the result:

donor_id    acceptor_id  directness
----------  -----------  ----------
2           4            W
2           5            W
3           4            W
3           5            W
7           9            W
7           10           W
8           9            W
8           10           W
2           3            X
7           8            X
4           5            X
9           10           X
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文