在 Python 中计算随机子集 - 我的代码出了什么问题？

发布于 2025-01-18 03:41:24 字数 1636 浏览 0 评论 0原文

该代码是返回具有尺寸k的子集，从设置尺寸n（EPI 5.15）返回。也就是说，n＆gt; 0，k＆lt; = n，从n（假设上）形成一个集合{0，1，2，...，n-1}，我们从中选择k元素以形成子集。 NCK有选择子集的可能性，我们希望将其均匀挑选，我们也希望该子集中的置换也是随机的。代码有三个版本 - 来自官方解决方案，我的调整和我自己的解决方案。后两个是错误的，但我不知道为什么。我将在三个代码下方解释算法的要旨。

官方解决方案

def random_subset(n: int, k: int) -> List[int]:
    H = {}
    for i in range(k):
        r = random.randrange(i, n)
        rmap = H.get(r, r)
        imap = H.get(i, i)
        H[r] = imap
        H[i] = rmap
    return [H[i] for i in range(k)]

更改为官方解决方案（错误）

def random_subset(n: int, k: int) -> List[int]:
    H = {}
    for i in range(k):
        r = random.randrange(i, n)
        H[r] = H.get(i, i)
        H[i] = H.get(r, r)
    return [H[i] for i in range(k)]

（DEAD错误）

def random_subset(n: int, k: int) -> List[int]:
    H = {}
    for i in range(k):
        r = random.randrange(i, n)
        if r in H:
            H[i], H[r] = H[r], i
        else:
            H[i], H[r] = r, i
    return [H[i] for i in range(k)]

我的解决方案（错误）逻辑的解决方案

我们从数组a，＆lt; 0，1，2，...，n-1＆gt; Gt;中选择一个元素，而无需重复。首先，从数组A中挑选R，然后用[0]将其交换；然后选择另一个R并用[1]交换它...直到我们填充[K-1]，总共有K元素，例如以下代码：

'''
A = <0, 1, 2, ..., n-1>
i    0  1  2       n-1
'''

def random_sampling(k, A):
    for i in range(k):
        r = random.randrange(i, len(A))
        A[i], A[r] = A[r], A[i]

A = [i for i in range(n)]
k = some_constant
random_sampling(k, A)
answer = A[:k]

将空间复杂性从O（N）降低到O（ k）通过模仿数组＆lt; 0，1，2，...，n-1＆gt;，我们将右上的代码更改为官方解决方案，该解决方案使用哈希表，我们从中选择了一个要包含在子集中的元素。我的使用方式与原始答案不同，但我不知道为什么会出现问题。

原文

This code is to return a subset with size k from the set size n (EPI 5.15). That is, take n > 0, k <= n, and from n we (hypothetically) form a set {0, 1, 2, ..., n-1} from which we pick k elements to form a subset. There are nCk possibilities of picking a subset and we want it to be uniformly picked and we also want the permutation in that subset is also random. There are three versions of codes -- from official solution, my tweak, and my own solution. The latter two are WRONG but I don't know why. I will explain the gist of the algorithm right below the three codes.

Official Solution

def random_subset(n: int, k: int) -> List[int]:
    H = {}
    for i in range(k):
        r = random.randrange(i, n)
        rmap = H.get(r, r)
        imap = H.get(i, i)
        H[r] = imap
        H[i] = rmap
    return [H[i] for i in range(k)]

Change to the official solution (WRONG)

def random_subset(n: int, k: int) -> List[int]:
    H = {}
    for i in range(k):
        r = random.randrange(i, n)
        H[r] = H.get(i, i)
        H[i] = H.get(r, r)
    return [H[i] for i in range(k)]

My solution (DEAD WRONG)

def random_subset(n: int, k: int) -> List[int]:
    H = {}
    for i in range(k):
        r = random.randrange(i, n)
        if r in H:
            H[i], H[r] = H[r], i
        else:
            H[i], H[r] = r, i
    return [H[i] for i in range(k)]

Underlying Logic

We pick an element from the part of array A, <0, 1, 2, ..., n-1>, without duplicate. At first, pick r from array A and swap it with A[0]; then pick another r and swap it with A[1] ... until we have filled A[k-1], in total we have k elements, like the following code:

'''
A = <0, 1, 2, ..., n-1>
i    0  1  2       n-1
'''

def random_sampling(k, A):
    for i in range(k):
        r = random.randrange(i, len(A))
        A[i], A[r] = A[r], A[i]

A = [i for i in range(n)]
k = some_constant
random_sampling(k, A)
answer = A[:k]

To reduce space complexity from O(n) to O(k) by mimicking an array <0, 1, 2, ..., n-1>, we changed the right above code into an official solution that uses a hash table from which we select an element to be included in a subset. The problem arises in the way I use hash table differently from original answer but I don't know why.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

怼怹恏 2025-01-25 03:41:24

，最后一项似乎没有基本意义。像这样的赋值

            H[i] = H[r], i

将 H[i] 绑定到 2 元组，而不是整数。

中间（第二个）是踩自己的脚趾：

        H[r] = H.get(i, i)
        H[i] = H.get(r, r)

第二行中的 get() 毫无用处，因为 H[r] 已被绑定在上面的行中。当第二行执行时，r总是在H中，因此这两行是相同的，例如，

        temp = H.get(i, i)
        H[i] = H[r] = temp

这很明显不是你打算这样做。

顺便说一句，如果您出于某种原因热衷于减少行数，这应该可行：

    H = {}
    for i in range(k):
        r = random.randrange(i, n)
        H[i], H[r] = H.get(r, r), H.get(i, i)
    return [H[i] for i in range(k)]

但我发现第一个版本是最清晰的。

编辑：最后一个代码的新版本

最后一个版本已更改为：

    for i in range(k):
        r = random.randrange(i, n)
        if r in H:
            H[i], H[r] = H[r], i
        else:
            H[i], H[r] = r, i

现在将实现概念上的“交换”逻辑如果情况总是如此，在顶部循环。 H.get(i, i) == i。但事实并非如此，因此它可能会失败。

例如，从 n=9 和 k=5 开始（这并不微妙 - 几乎是任意的）。在第一次循环迭代 (i=0) 中，假设选择了 r=1。然后将 H[0] 设置为 1，将 H[1] 设置为 0。这样就可以了。但是，现在 H.get(1, 1) 不是 1，而是 0。这可能会给下一步带来麻烦。

在下一次迭代 (i=1) 中，假设选择了 r=5。所以代码

            H[i], H[r] = r, i # which is
            H[1], H[5] = 5, 1

确实糟糕！现在 0（在 H[1] 中）不再在 H 中，并且 1 在 H 中出现两次（它在 H[0] 中） code>，现在它也在 H[5] 中）。这根本不是“交换”。

顺便说一句，还有另一种我更喜欢的写法，因为它非常明确地表明，一旦选择了子集元素，该决定将永远不会改变。它还减少了 H 的大小：

def random_subset(n, k):
    H = {}
    result = []
    for i in range(k):
        r = random.randrange(i, n)
        result.append(H.get(r, r))
        H[r] = H.pop(i, i)
    return result

, the last one seems not to make basic sense. An assignment like

            H[i] = H[r], i

binds H[i] to a 2-tuple, not to an integer.

The middle (second) one is stepping on its own toes:

        H[r] = H.get(i, i)
        H[i] = H.get(r, r)

The get() in the second line is more-than-less useless, because H[r] was bound in the line right above. r is always in H when the second line executes, so that pair of lines is the same as, e.g.,

        temp = H.get(i, i)
        H[i] = H[r] = temp

Which is pretty clearly not what you intended to do.

By the way, if you're keen to reduce the line count for some reason, this should work:

    H = {}
    for i in range(k):
        r = random.randrange(i, n)
        H[i], H[r] = H.get(r, r), H.get(i, i)
    return [H[i] for i in range(k)]

But I find the first version to be clearest.

Edit: new version of last code

The last version was changed to do:

    for i in range(k):
        r = random.randrange(i, n)
        if r in H:
            H[i], H[r] = H[r], i
        else:
            H[i], H[r] = r, i

Now that would implement the conceptual "swap" logic if it were always the case, at the top of the loop. that H.get(i, i) == i. But that isn't the case, so it can fail.

For example, start with n=9 and k=5 (this isn't delicate - pretty much arbitrary). On the first loop iteration (i=0), suppose r=1 is picked. Then H[0] is set to 1 and H[1] to 0. That's fine. But, now it's not the case that H.get(1, 1) is 1 - it's 0. That can cause trouble on the next step.

On the next iteration (i=1), suppose r=5 is picked. So the code does

            H[i], H[r] = r, i # which is
            H[1], H[5] = 5, 1

Oops! Now 0 (which was in H[1]) is no longer in H, and 1 is in H twice (it was in H[0], and now it's also in H[5]). That's not "a swap" at all.

BTW, there's another way to write this I like better, because it makes it very explicit that once a subset element is picked, that decision will never be changed. It also cuts the size of H:

def random_subset(n, k):
    H = {}
    result = []
    for i in range(k):
        r = random.randrange(i, n)
        result.append(H.get(r, r))
        H[r] = H.pop(i, i)
    return result

回复收藏 0 原文

~没有更多了~

关于作者

复古式

暂无简介

文章

666 人气

关注发私信

友情链接

文江博客

在 Python 中计算随机子集 - 我的代码出了什么问题？

官方解决方案

更改为官方解决方案（错误）

（DEAD错误）

我的解决方案（错误）逻辑的解决方案

Official Solution

Change to the official solution (WRONG)

My solution (DEAD WRONG)

Underlying Logic

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

编辑：最后一个代码的新版本

Edit: new version of last code

关于作者

相关话题

热门标签

推荐作者

饮湿

明月

02

hs1283

风向决定发型

落花浅忆

友情链接

在 Python 中计算随机子集 - 我的代码出了什么问题？

官方解决方案

更改为官方解决方案（错误）

（DEAD错误）

我的解决方案（错误）逻辑的解决方案

Official Solution

Change to the official solution (WRONG)

My solution (DEAD WRONG)

Underlying Logic

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

编辑：最后一个代码的新版本

Edit: new version of last code

关于作者

相关话题

热门标签

推荐作者

饮湿

明月

02

hs1283

风向决定发型

落花浅忆

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。