按字典顺序对字符串进行分组(python)

发布于 2025-01-05 02:21:54 字数 226 浏览 1 评论 0原文

我有 N 个字符串,我想将它们按字典顺序划分为 M 个偶数大小的桶(+/- 1 个字符串)。而且,N>>M。

直接的方法是对所有字符串进行排序,并将结果列表拆分到 M 个桶中。

我想通过在完整列表可用之前将创建的每个字符串路由到存储桶来近似这一点。

有没有一种快速且Pythonic的方法将字符串分配给存储桶?我本质上是在寻找整数模运算符的字符串等效项。也许是保留字典顺序的哈希?这可能吗?

I have N strings that I want to divide lexicographic into M even-sized buckets (+/- 1 string). Also, N>>M.

The direct way would be to sort all the strings and split the resulting list into the M buckets.

I would like to instead approximate this by routing each string as it is created to a bucket, before the full list is available.

Is there a fast and pythonic way to assign strings to buckets? I'm essentially looking for a string-equivalent of the integer modulo operator. Perhaps a hash that preserves lexicographic order? Is that even possible?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

撞了怀 2025-01-12 02:21:54

您可以按字符串的前两个字符或类似的方式进行排序。

假设 M=100,因此您应该将字符划分为 sqrt(M) 区域,并且每个区域都应指向另一个 sqrt(M) > 区域,然后对于获得的每个字符串,您可以比较第一个字符来决定将字符串定向到哪个区域,然后再次比较第二个字符,就像一棵树,以存储桶作为叶子,比较作为节点。

You can sort by first two chars of a string, or something of this sort.

Let's say that M=100, so you should divide the characters into sqrt(M) regions, and each should point to another sqrt(M) regions, then for each string you get, you can compare the first char to decide which region to direct the string to and again for the second char, something like a tree with buckets as leaves and comparisons as nodes.

皓月长歌 2025-01-12 02:21:54

根据定义,哈希不保留任何顺序。

我不认为有任何Python式的方法可以做到这一点。

您可以只创建字典(基本上是哈希函数)并继续向每个循环样式添加一个字符串,但它不会保留任何顺序。

A hash by definition doesn't preserve any order.

And I don't think there is any pythonic way to do this.

You could just create dictionaries (which are basically hashing functions) and keep adding a string to each round-robin style, but it wouldn't preserve any order.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文