Associating a list of values with each set element

Posted on 2025-01-13 06:08:35

I'm trying to come up with the best solution for the following problem:

I have a list of filenames, and associated with each filename is an ID; these IDs are non-unique, meaning that several filenames might be associated with one ID.

So I could pack my data up as: (ID, [filename1, filename2,...])

The problem is that I would like to work with the IDs as a set since I will need to group and extract differences and intersections with another predefined grouping of these IDs, and I need the operations to be relatively fast since I have about a million IDs.

But I don't know of a way to keep each ID associated with its list of filenames while treating the ID as an element of a set. Is this possible with sets, or is there a set extension that enables this?

当梦初醒 2025-01-20 06:08:35

It sounds like your data looks something like the sample data below. If so, the code shows how to use a hash table to do what you're asking. The hash table can be either a Python dict (keyed on id, with a list of file names as the associated value) or simply a set of ids, if that's what you really want (though, as others have suggested in the comments, a dict is potentially the best solution).

from collections import defaultdict

files = [
    {'filename':'foo101', 'id':1},
    {'filename':'foo102', 'id':1},
    {'filename':'foo103', 'id':1},
    {'filename':'foo201', 'id':2},
    {'filename':'foo202', 'id':2},
    {'filename':'foo301', 'id':3},
    {'filename':'foo401', 'id':4},
]
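# Group file names by id; defaultdict(list) creates the empty list on first access.
fileDict = defaultdict(list)
for d in files:
    fileDict[d['id']].append(d['filename'])

for file_id, fileNames in fileDict.items():
    print(file_id, fileNames)

# Iterating a dict yields its keys, so this builds the set of ids.
idSet = set(fileDict)
print(idSet)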

Sample output:

1 ['foo101', 'foo102', 'foo103']
2 ['foo201', 'foo202']
3 ['foo301']
4 ['foo401']
{1, 2, 3, 4}

The above code uses a defaultdict(list) for convenience, but you could also use a regular dict as follows:

files = [
    {'filename':'foo101', 'id':1},
    {'filename':'foo102', 'id':1},
    {'filename':'foo103', 'id':1},
    {'filename':'foo201', 'id':2},
    {'filename':'foo202', 'id':2},
    {'filename':'foo301', 'id':3},
    {'filename':'foo401', 'id':4},
]
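# Group file names by id, creating the list the first time an id is seen.
fileDict = {}
for d in files:
    if d['id'] not in fileDict:
        fileDict[d['id']] = []
    fileDict[d['id']].append(d['filename'])

for file_id, fileNames in fileDict.items():
    print(file_id, fileNames)

# Iterating a dict yields its keys, so this builds the set of ids.
idSet = set(fileDict)
print(idSet)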
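
The question also asks about intersections and differences against a predefined grouping of ids. Here is a minimal sketch of how that could look with the dict approach, assuming a hypothetical group called `predefined_ids` (that name and its contents are not from the original post). In Python 3, `dict.keys()` returns a set-like view, so the set operators work on it directly and the file names stay attached to their ids.

```python
from collections import defaultdict

files = [
    {'filename': 'foo101', 'id': 1},
    {'filename': 'foo102', 'id': 1},
    {'filename': 'foo103', 'id': 1},
    {'filename': 'foo201', 'id': 2},
    {'filename': 'foo202', 'id': 2},
    {'filename': 'foo301', 'id': 3},
    {'filename': 'foo401', 'id': 4},
]

# Group file names by id, as in the answer above.
fileDict = defaultdict(list)
for d in files:
    fileDict[d['id']].append(d['filename'])

# Hypothetical predefined grouping of ids (an assumption for illustration).
predefined_ids = {2, 3, 5}

# Key views are set-like, so & and - return ordinary sets.
common_ids = fileDict.keys() & predefined_ids     # ids present in both
only_in_files = fileDict.keys() - predefined_ids  # ids with files, not in the group
# set.difference accepts any iterable; iterating a dict yields its keys.
missing_ids = predefined_ids.difference(fileDict)  # group ids that have no files

# Recover the file names for the ids that survived the set operation.
common_files = {file_id: fileDict[file_id] for file_id in common_ids}

print(sorted(common_ids))     # [2, 3]
print(sorted(only_in_files))  # [1, 4]
print(sorted(missing_ids))    # [5]
```

These operations are hash-based, so they should remain reasonably fast with around a million ids, though that is worth measuring on the real data. If the same set of ids is reused many times, building `idSet = set(fileDict)` once, as in the answer, works equally well.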