Iterating over a text file and storing the minimum value in a dictionary

Posted 2025-01-12 11:11:39

I have a very large text file (Summary_post_docking.txt) and I want to filter it to find the lowest scores.
This is what I came up with:

class Ranker:
    def __init__(self):
        self.results = {}
        with open('HTS_post_docking/Summary_post_docking.txt', 'r') as summary:
            for line in summary:
                score = float(line.split()[2])
                frag_name = str(line.split()[0].split('/')[9]).split('_')[0]
                if 0 >= score >= -200:
                    self.results[frag_name] = score
                    old = self.results[frag_name]
                if frag_name in self.results.keys():
                    new = float(line.split()[2])
                    if new < old:
                        self.results[frag_name] = new

        print(self.results)

Unfortunately, all this does is keep the last value it reads for each molecule; it never keeps the lower value.

str(line.split()[0].split('/')[9]).split('_')[0] is the name of the molecule, while float(line.split()[2]) is the score associated with it.

I want the script to store the name of the molecule as the key and the score as the value. For every line, whenever it finds a lower score for the same key, I want it to update the value, so that each key ends up with the smallest score found.

EDIT:

I'm including a few lines from the txt file:

/scratch/ludovico3/spike/stalk/vs_docking_smiles/HTS_postdock/1_600/HTS_post_docking/Z385446130_pose1       SCORE_sum: -70.13763978228677   avg_score: -0.7 SD_score: 0.44  avg_GBSA: -5.92 SD_GBSA: 2.96   avg_RMSD: 9.75  SD_RMSD: 3.49
/scratch/ludovico3/spike/stalk/vs_docking_smiles/HTS_postdock/1_600/HTS_post_docking/Z385446130_pose2       SCORE_sum: -18.39638945104759   avg_score: -0.18    SD_score: 0.26  avg_GBSA: -5.2  SD_GBSA: 4.57   avg_RMSD: 34.57 SD_RMSD: 9.29
/scratch/ludovico3/spike/stalk/vs_docking_smiles/HTS_postdock/1_600/HTS_post_docking/Z385446130_pose3       SCORE_sum: -206.23402454507794  avg_score: -2.06    SD_score: 1.15  avg_GBSA: -6.8  SD_GBSA: 1.66   avg_RMSD: 4.05  SD_RMSD: 1.73
/scratch/ludovico3/spike/stalk/vs_docking_smiles/HTS_postdock/1_600/HTS_post_docking/Z385446130_pose4       SCORE_sum: -27.56483931516906   avg_score: -0.28    SD_score: 0.64  avg_GBSA: -2.2  SD_GBSA: 3.13   avg_RMSD: 15.43 SD_RMSD: 6.74

I have updated the code as suggested!
The script needs to update the value associated with the key to the lowest score it finds.
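
As a minimal sketch of how one of the sample lines above splits into a molecule name and a score: the helper name parse_line is hypothetical, and taking the last path component instead of the hard-coded index 9 is an assumption that keeps the parse independent of directory depth. The sample string is the first sample line, truncated after SD_score.

```python
def parse_line(line):
    """Return (molecule_name, score) for one summary line.

    parse_line is a hypothetical helper, not part of the original script.
    """
    fields = line.split()
    # fields[0] is the pose path, fields[1] is the 'SCORE_sum:' label,
    # fields[2] is the numeric score.
    pose = fields[0].rsplit('/', 1)[-1]   # e.g. 'Z385446130_pose1'
    name = pose.split('_')[0]             # e.g. 'Z385446130'
    return name, float(fields[2])


sample = (
    "/scratch/ludovico3/spike/stalk/vs_docking_smiles/HTS_postdock/1_600/"
    "HTS_post_docking/Z385446130_pose1       SCORE_sum: -70.13763978228677   "
    "avg_score: -0.7 SD_score: 0.44"
)
name, score = parse_line(sample)
print(name, score)
```

Like the original expression, this assumes the molecule name itself contains no underscore.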


有木有妳兜一样 2025-01-19 11:11:39

Your old value may not have been set yet, and it should be tracked per molecule, which your code does not do.

class Ranker:
    def __init__(self):
        self.results = {}
        with open('HTS_post_docking/Summary_post_docking.txt', 'r') as summary:
            for line in summary:
                molecule_score = float(line.split()[2])
                molecule_name = str(line.split()[0].split('/')[9]).split('_')[0]
                # First score seen for this molecule: store it.
                if molecule_name not in self.results:
                    self.results[molecule_name] = molecule_score
                # A lower score for a known molecule replaces the stored one.
                elif self.results[molecule_name] > molecule_score:
                    self.results[molecule_name] = molecule_score
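
The same per-molecule minimum can also be written without the explicit if/elif. The following is only a sketch: the function name lowest_scores, the in-memory list of (name, score) pairs, and the placeholder names and values in it are assumptions for illustration, whereas the original reads them from the summary file.

```python
def lowest_scores(pairs):
    """Map each molecule name to the lowest score seen for it."""
    results = {}
    for name, score in pairs:
        # A missing key defaults to +infinity, so the first score always wins.
        results[name] = min(score, results.get(name, float('inf')))
    return results


# Illustrative placeholder data, not taken from the real file.
pairs = [('mol_a', -70.0), ('mol_a', -18.0), ('mol_b', -5.0), ('mol_a', -90.0)]
print(lowest_scores(pairs))
```

Using dict.get with a default avoids the separate membership test, at the cost of reassigning the key on every line.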


晨曦慕雪 2025-01-19 11:11:39

Solved!

class Ranker:
    def __init__(self):
        self.results = {}
        with open('HTS_post_docking/Summary_post_docking.txt', 'r') as summary:
            for line in summary:
                self.set_score(line)

        self.sorted = dict(sorted(self.results.items(), key=lambda item: item[1]))
        print(self.sorted)

    def set_score(self, line):
        new_score = float(line.split()[2])
        frag_name = str(line.split()[0].split('/')[9]).split('_')[0]

        if not (0 >= new_score >= -250):
            return

        if frag_name in self.results.keys():
            old_score = self.results[frag_name]
            if new_score > old_score:
                return

        self.results[frag_name] = new_score
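
For experimenting without the real file, the same logic can be sketched as a plain function over any iterable of lines. The name rank, its low/high parameters, the malformed-line check, and the shortened demo paths are assumptions added for illustration; the name is taken from the last path component so that the abbreviated paths still parse.

```python
def rank(lines, low=-250.0, high=0.0):
    """Return {molecule: lowest in-range score}, sorted by ascending score."""
    best = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines (an added safeguard)
        name = fields[0].rsplit('/', 1)[-1].split('_')[0]
        score = float(fields[2])
        if not (low <= score <= high):
            continue  # outside the accepted score window
        if name not in best or score < best[name]:
            best[name] = score
    return dict(sorted(best.items(), key=lambda item: item[1]))


# Stand-ins for two of the sample lines, with the paths abbreviated.
demo_lines = [
    "HTS_post_docking/Z385446130_pose1 SCORE_sum: -70.13763978228677",
    "HTS_post_docking/Z385446130_pose3 SCORE_sum: -206.23402454507794",
]
print(rank(demo_lines))              # window of -250 to 0
print(rank(demo_lines, low=-200.0))  # window of -200 to 0
```

The two calls show why the cutoff matters: the pose3 score of about -206 falls inside the -250 window but outside the -200 window used in the question.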
