Associating a list of values with each set element
I'm trying to come up with the best solution for the following problem:
I have a list of filenames, and associated with each filename is an ID; these IDs are non-unique, meaning that several filenames might be associated with one ID.
So I could pack my data up as: (ID, [filename1, filename2,...])
The problem is that I would like to work with the IDs as a set since I will need to group and extract differences and intersections with another predefined grouping of these IDs, and I need the operations to be relatively fast since I have about a million IDs.
But I don't know of a way to keep each ID associated with its list of filenames while treating the ID as an element in a set. Is this possible with sets, or is there a set extension that enables it?
Comments (1)
It sounds like your data looks something like the sample data below. If so, then the code shows how to use a hash table to do what you're asking. The hash table could either be a Python `dict` (hashed on `id` as key with a `list` of file names as associated value) or simply a `set` of `id` elements if that's what you really want (though as others have suggested in the comments, a `dict` is potentially the best solution).

Sample output:
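A minimal sketch of what such code and its output could look like, assuming the data arrives as (ID, filename) pairs. The sample pairs, the `other_ids` grouping, and the names `pairs` and `files_by_id` are all hypothetical, chosen only for illustration. It relies on the fact that `dict` key views support set operations such as `&` and `-`, so the IDs can be treated as a set while each one stays tied to its list of filenames.

```python
from collections import defaultdict

# Hypothetical sample data: (ID, filename) pairs with non-unique IDs.
pairs = [
    (1, "a.txt"),
    (2, "b.txt"),
    (1, "c.txt"),
    (3, "d.txt"),
    (2, "e.txt"),
]

# Group filenames by ID; a missing key starts out as an empty list.
files_by_id = defaultdict(list)
for id_, filename in pairs:
    files_by_id[id_].append(filename)

# Hypothetical predefined grouping of IDs to compare against.
other_ids = {2, 3, 4}

# dict key views behave like sets, so no separate set of IDs is needed.
common = files_by_id.keys() & other_ids      # IDs present in both
only_here = files_by_id.keys() - other_ids   # IDs only in our data

# Look the filenames back up through the dict.
for id_ in sorted(common):
    print(id_, files_by_id[id_])
# 2 ['b.txt', 'e.txt']
# 3 ['d.txt']
print(sorted(only_here))
# [1]
```

Both operations return plain `set` objects, and each lookup afterwards is an average constant-time hash access, which should stay practical at around a million IDs.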
The above code uses a `defaultdict(list)` for convenience, but you could also use a regular `dict` as follows:
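One way the regular-`dict` variant could look, as a minimal sketch with hypothetical sample pairs (the names `pairs` and `files_by_id` are illustrative and not taken from the answer). It uses `dict.setdefault` to create the list the first time an ID is seen.

```python
# Hypothetical sample data: (ID, filename) pairs with non-unique IDs.
pairs = [(1, "a.txt"), (2, "b.txt"), (1, "c.txt")]

files_by_id = {}
for id_, filename in pairs:
    # setdefault returns the existing list, or inserts and returns a new one.
    files_by_id.setdefault(id_, []).append(filename)

print(files_by_id)
# {1: ['a.txt', 'c.txt'], 2: ['b.txt']}
```

Unlike a `defaultdict`, a plain `dict` raises `KeyError` when an unknown ID is looked up, which can be preferable if a missing ID should be treated as an error rather than silently creating an empty list.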