Building a Python dictionary from a file with multiple similar values and keys

Posted 2024-12-10 17:39:43

I am new to Python (and to coding in general) and am trying to use it to analyze some data at work. I have a file like this:

    HWI-ST591_0064:5:1101:1228:2111#0/1 +   7included   11  A>G -   -
    HWI-ST591_0064:5:1101:1205:2125#0/1 +   genomic 17  A>G -   -
    HWI-ST591_0064:5:1101:1178:2129#0/1 +   7included   6   A>C 8   A>T
    HWI-ST591_0064:5:1101:1176:2164#0/1 +   7included   6   A>T 8   A>G
    HWI-ST591_0064:5:1101:1199:2234#0/1 +   7included   14  T>C 21  G>A
    HWI-ST591_0064:5:1101:1208:2249#0/1 +   7included   32  C>T -   -

The file is tab-delimited. I am trying to create a dictionary whose keys are the last four values of each line joined into one string, and whose values are lists of the first value (a unique read identifier) from every line with that key, like this:

    {'32C>T--': ['HWI-ST591_0064:5:1101:1208:2249#0/1'],
     '6A>C8A>C': ['HWI-ST591_0064:5:1101:1318:2090#0/1'],
     '36A>G--': ['HWI-ST591_0064:5:1101:1425:2093#0/1'],
     '----': ['HWI-ST591_0064:5:1101:1222:2225#0/1'],
     '6A>C8A>T': ['HWI-ST591_0064:5:1101:1178:2129#0/1', 'HWI-ST591_0064:5:1101:1176:2164#0/1']}
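Here is a minimal Python 3 sketch of how to build that dictionary. It assumes every data line has exactly seven tab-separated columns, as in the sample. The helper name build_mut_read_dict and its wanted_chromosome argument are illustrative and do not come from the question. The six sample rows are read from an in-memory string in place of the real file, and the "genomic" row is filtered out just as the question's code does:

```python
import csv
import io
from collections import defaultdict

# The six sample rows from the question, rebuilt with real tab characters.
SAMPLE_ROWS = [
    ["HWI-ST591_0064:5:1101:1228:2111#0/1", "+", "7included", "11", "A>G", "-", "-"],
    ["HWI-ST591_0064:5:1101:1205:2125#0/1", "+", "genomic", "17", "A>G", "-", "-"],
    ["HWI-ST591_0064:5:1101:1178:2129#0/1", "+", "7included", "6", "A>C", "8", "A>T"],
    ["HWI-ST591_0064:5:1101:1176:2164#0/1", "+", "7included", "6", "A>T", "8", "A>G"],
    ["HWI-ST591_0064:5:1101:1199:2234#0/1", "+", "7included", "14", "T>C", "21", "G>A"],
    ["HWI-ST591_0064:5:1101:1208:2249#0/1", "+", "7included", "32", "C>T", "-", "-"],
]
SAMPLE_TEXT = "\n".join("\t".join(row) for row in SAMPLE_ROWS) + "\n"


def build_mut_read_dict(lines, wanted_chromosome="7included"):
    """Map the joined last four columns to the list of read IDs sharing them."""
    mut_read_dict = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 7:
            continue  # skip blank or truncated lines
        fastq_id, chromosome = row[0], row[2]
        mut_key = "".join(row[3:7])  # e.g. "6" + "A>C" + "8" + "A>T" -> "6A>C8A>T"
        if chromosome == wanted_chromosome:
            mut_read_dict[mut_key].append(fastq_id)
    return mut_read_dict


mut_read_dict = build_mut_read_dict(io.StringIO(SAMPLE_TEXT))
for key in sorted(mut_read_dict):
    print(key, mut_read_dict[key])
```

csv.reader accepts any iterable of lines. Passing a file opened with open(path, newline="") instead of the StringIO object should therefore work the same way.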

With this dictionary I can then get a list of the unique identifiers and count, sort, or do whatever else I need. I can build the dictionary, but when I try to write it to a file I get an error. I think the problem is that the values are lists. I keep getting this error:

    File "trial.py", line 33, in
      outFile.write("%s\t%s\n" % ('\t' .join(key, mutReadDict[key])))
    TypeError: unhashable type: 'list'
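A minimal reproduction shows where the error comes from. The question's code is Python 2 (it mentions .iteritems() and opens the file with 'rU'), and in Python 2 dict.keys() returns a plain list. The sketch below calls list() to get the same behaviour under Python 3. The two keys come from the example dictionary above. The exact wording of the error message may differ slightly between Python versions:

```python
from collections import defaultdict

mut_read_dict = defaultdict(list)
mut_read_dict["32C>T--"].append("HWI-ST591_0064:5:1101:1208:2249#0/1")
mut_read_dict["----"].append("HWI-ST591_0064:5:1101:1222:2225#0/1")

# Square brackets around the keys build a list holding ONE element: the key list itself.
key_list = [list(mut_read_dict.keys())]
print(len(key_list))  # 1

# So the loop variable is a whole list, and a list cannot be a dictionary key.
key = key_list[0]
message = ""
try:
    mut_read_dict[key]
except TypeError as err:
    message = str(err)
print(message)
```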

Is there a way to make this work so I can write the dictionary to a file? I tried using .iteritems() in the for loop that builds the dictionary, but that didn't seem to work. Thanks; here is my code:

    inFile = open('path', 'rU')
    outFile = open('path', 'w')

    from collections import defaultdict

    mutReadDict = defaultdict(list)

    for line in inFile:
        entry               = line.strip('\n').split('\t')
        fastQ_ID            = entry[0]
        strand              = entry[1]
        chromosome          = entry[2]
        mut1pos             = entry[3]
        mut1base            = entry[4]
        mut2pos             = entry[5]
        mut2base            = entry[6]

        mutKey = mut1pos + mut1base + mut2pos + mut2base

        if chromosome == '7included':
            mutReadDict[mutKey].append(fastQ_ID)
        else:
            pass

    keyList = [mutReadDict.keys()]
    keyList.sort()

    for key in keyList:
        outFile.write("%s\t%s\n" % ('\t' .join(key, mutReadDict[key])))

    outFile.close()
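The counting and sorting mentioned earlier each take a single expression once the dictionary exists. The minimal sketch below uses the question's example output as its input. The variable names counts, by_count and unique_ids are illustrative:

```python
# The example dictionary from the question, used here as input.
mut_read_dict = {
    "32C>T--": ["HWI-ST591_0064:5:1101:1208:2249#0/1"],
    "6A>C8A>C": ["HWI-ST591_0064:5:1101:1318:2090#0/1"],
    "36A>G--": ["HWI-ST591_0064:5:1101:1425:2093#0/1"],
    "----": ["HWI-ST591_0064:5:1101:1222:2225#0/1"],
    "6A>C8A>T": ["HWI-ST591_0064:5:1101:1178:2129#0/1",
                 "HWI-ST591_0064:5:1101:1176:2164#0/1"],
}

# Number of reads per mutation key.
counts = {key: len(ids) for key, ids in mut_read_dict.items()}

# Keys ordered by read count, highest first; ties broken alphabetically.
by_count = sorted(counts, key=lambda k: (-counts[k], k))

# Every read ID exactly once, in sorted order.
unique_ids = sorted({read_id for ids in mut_read_dict.values() for read_id in ids})

for key in by_count:
    print(key, counts[key])
```

The sort key (-count, key) gives a stable, reproducible order even when many keys share the same count.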

Comments (1)

悲欢浪云 2024-12-17 17:39:43

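I think you want:

    keyList = mutReadDict.keys()

instead of

    keyList = [mutReadDict.keys()]

The square brackets wrap the whole key list inside another list. The loop therefore runs only once, with key bound to that inner list. A list cannot be used as a dictionary key, hence unhashable type: 'list'.

You probably mean this too, since str.join takes a single iterable rather than two arguments:

    for key in keyList:
        outFile.write("%s\t%s\n" % (key, '\t'.join(mutReadDict[key])))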
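The fix above targets Python 2. Under Python 3, dict.keys() returns a view object that has no .sort() method, so the question's keyList.sort() line would fail there. Calling sorted() on the dictionary works in both versions. The minimal sketch below shows the writing step under that assumption. The helper write_mut_read_dict is hypothetical, and an in-memory buffer stands in for the output file:

```python
import io
from collections import defaultdict


def write_mut_read_dict(mut_read_dict, out_file):
    """Write one line per key: the key, then its read IDs, all tab-separated."""
    for key in sorted(mut_read_dict):  # sorted() returns a new sorted list of the keys
        out_file.write("%s\t%s\n" % (key, "\t".join(mut_read_dict[key])))


mut_read_dict = defaultdict(list)
mut_read_dict["6A>C8A>T"].extend([
    "HWI-ST591_0064:5:1101:1178:2129#0/1",
    "HWI-ST591_0064:5:1101:1176:2164#0/1",
])
mut_read_dict["32C>T--"].append("HWI-ST591_0064:5:1101:1208:2249#0/1")

out = io.StringIO()
write_mut_read_dict(mut_read_dict, out)
print(out.getvalue(), end="")
```

To write a real file, pass a handle from with open(output_path, "w") as out_file:, where output_path is whatever path you use. For reading, plain "r" mode is enough in Python 3. Python 3.11 removed the 'U' mode flag entirely, so open(path, 'rU') raises an error there.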