Building a Python dictionary from a file with multiple similar values and keys

Posted 2024-12-10 17:39:43

I am new to Python (and to coding in general) and am trying to use it to analyze some data at work. I have a file like this:

    HWI-ST591_0064:5:1101:1228:2111#0/1 +   7included   11  A>G -   -
    HWI-ST591_0064:5:1101:1205:2125#0/1 +   genomic 17  A>G -   -
    HWI-ST591_0064:5:1101:1178:2129#0/1 +   7included   6   A>C 8   A>T
    HWI-ST591_0064:5:1101:1176:2164#0/1 +   7included   6   A>T 8   A>G
    HWI-ST591_0064:5:1101:1199:2234#0/1 +   7included   14  T>C 21  G>A
    HWI-ST591_0064:5:1101:1208:2249#0/1 +   7included   32  C>T -   -

The file is tab-delimited. I am trying to create a dictionary whose key is the last four values of each line joined together, and whose value is a list of the first values (the unique identifiers) of every line that shares that key, like this:

    {'32C>T--': ['HWI-ST591_0064:5:1101:1208:2249#0/1'],
     '6A>C8A>C': ['HWI-ST591_0064:5:1101:1318:2090#0/1'],
     '36A>G--': ['HWI-ST591_0064:5:1101:1425:2093#0/1'],
     '----': ['HWI-ST591_0064:5:1101:1222:2225#0/1'],
     '6A>C8A>T': ['HWI-ST591_0064:5:1101:1178:2129#0/1', 'HWI-ST591_0064:5:1101:1176:2164#0/1']}
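A minimal sketch of this grouping step, assuming Python 3 and the seven-field, tab-delimited layout shown in the sample. The function name group_reads and its chromosome parameter are illustrative, not from the original code; the filter mirrors the '7included' check in the question's loop. csv.reader accepts any iterable of strings, so an open file object works as well as the in-memory sample used here.

```python
import csv
from collections import defaultdict


def group_reads(lines, chromosome="7included"):
    """Group read IDs by the concatenation of the last four fields.

    `lines` is any iterable of tab-delimited strings (an open file works too).
    Only rows whose third field equals `chromosome` are kept.
    """
    groups = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 7:
            continue  # skip blank or malformed lines
        read_id, chrom = row[0], row[2]
        if chrom != chromosome:
            continue
        key = "".join(row[3:7])  # "6" + "A>C" + "8" + "A>T" -> "6A>C8A>T"
        groups[key].append(read_id)
    return groups


sample = [
    "HWI-ST591_0064:5:1101:1228:2111#0/1\t+\t7included\t11\tA>G\t-\t-",
    "HWI-ST591_0064:5:1101:1205:2125#0/1\t+\tgenomic\t17\tA>G\t-\t-",
    "HWI-ST591_0064:5:1101:1178:2129#0/1\t+\t7included\t6\tA>C\t8\tA>T",
]
groups = group_reads(sample)
print(dict(groups))
```

Once the dictionary exists, len(groups[key]) gives the number of reads that share a mutation pattern.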

This way I can get a list of the unique identifiers and count them, sort them, or do the other things I need. I can build the dictionary, but when I try to write it to a file I get an error. I think the problem is that each value is a list; I keep getting this error:

    File "trial.py", line 33, in <module>
        outFile.write("%s\t%s\n" % ('\t' .join(key, mutReadDict[key])))
    TypeError: unhashable type: 'list'
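The error can be reproduced in isolation. A minimal sketch, assuming Python 3 and a made-up two-entry dictionary: in Python 2, which the question's code appears to target ('rU' mode, .iteritems()), keys() returns a list, so list(d.keys()) stands in for it here. Wrapping that list in brackets gives a one-element list, the loop runs once with key bound to the whole key list, and a list cannot be a dictionary key because it is mutable and therefore unhashable.

```python
d = {"32C>T--": ["read1"], "6A>C8A>T": ["read2", "read3"]}

# Brackets around the key list create a list whose only item is the list of keys.
key_list = [list(d.keys())]
print(len(key_list))  # 1, so the loop below runs only once

error_message = None
for key in key_list:
    # `key` is now the whole list of keys, not a single string.
    try:
        d[key]
    except TypeError as exc:
        error_message = str(exc)

print(error_message)  # the message contains "unhashable type: 'list'"
```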

Is there a way to make this work so that I can write it to a file? I tried .iteritems() in the for loop that builds the dictionary, but that didn't seem to work. Thanks; here is my code:

    inFile = open('path', 'rU')
    outFile = open('path', 'w')

    from collections import defaultdict

    mutReadDict = defaultdict(list)

    for line in inFile:
        entry               = line.strip('\n').split('\t')
        fastQ_ID            = entry[0]
        strand              = entry[1]
        chromosome          = entry[2]
        mut1pos             = entry[3]
        mut1base            = entry[4]
        mut2pos             = entry[5]
        mut2base            = entry[6]

        mutKey = mut1pos + mut1base + mut2pos + mut2base

        if chromosome == '7included':
            mutReadDict[mutKey].append(fastQ_ID)
        else:
            pass

    keyList = [mutReadDict.keys()]
    keyList.sort()

    for key in keyList:
        outFile.write("%s\t%s\n" % ('\t' .join(key, mutReadDict[key])))

    outFile.close()


Answer by 悲欢浪云, 2024-12-17 17:39:43

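I think you want:

    keyList = mutReadDict.keys()

instead of

    keyList = [mutReadDict.keys()]

The square brackets wrap the list of keys inside another list, so the loop runs only once, with key bound to the entire list of keys. Looking up a list in a dictionary is what raises TypeError: unhashable type: 'list'.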
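In Python 3, dict.keys() returns a view object that has no .sort() method, so keyList = mutReadDict.keys() followed by keyList.sort(), as in the question's code, only works in Python 2. A minimal Python 3 equivalent, assuming a mutReadDict filled with a few keys taken from the examples in the question:

```python
from collections import defaultdict

mutReadDict = defaultdict(list)
mutReadDict["6A>C8A>T"].append("HWI-ST591_0064:5:1101:1178:2129#0/1")
mutReadDict["32C>T--"].append("HWI-ST591_0064:5:1101:1208:2249#0/1")
mutReadDict["11A>G--"].append("HWI-ST591_0064:5:1101:1228:2111#0/1")

# Python 2: keyList = mutReadDict.keys(); keyList.sort()
# Python 3: keys() is a view with no .sort(), so build a sorted list directly.
keyList = sorted(mutReadDict)
print(keyList)  # ['11A>G--', '32C>T--', '6A>C8A>T']
```

sorted() compares the keys as strings, so '6A>C8A>T' lands after '32C>T--'; mutation positions are not ordered numerically.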

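You probably mean this too. str.join() takes a single iterable, so join only the list of read IDs and pass the key separately to the first %s:

    for key in keyList:
        outFile.write("%s\t%s\n" % (key, '\t'.join(mutReadDict[key])))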
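A Python 3 sketch of the corrected output loop, assuming groups is the dictionary produced by the question's loop (filled here with two entries from the example dictionary in the question). The helper name write_groups is hypothetical; it accepts any file-like object, so it can be tried on an in-memory io.StringIO before pointing it at a real file opened with a with statement, which also closes the file automatically.

```python
import io


def write_groups(groups, out):
    """Write one line per key: the key, a tab, then the tab-separated read IDs."""
    for key in sorted(groups):
        out.write("%s\t%s\n" % (key, "\t".join(groups[key])))


groups = {
    "6A>C8A>T": [
        "HWI-ST591_0064:5:1101:1178:2129#0/1",
        "HWI-ST591_0064:5:1101:1176:2164#0/1",
    ],
    "32C>T--": ["HWI-ST591_0064:5:1101:1208:2249#0/1"],
}

buffer = io.StringIO()
write_groups(groups, buffer)
print(buffer.getvalue(), end="")

# Writing to a real file ("path" is a placeholder, as in the question):
# with open("path", "w") as outFile:
#     write_groups(groups, outFile)
```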

About the author: 彼岸花ソ最美的依靠