在 Python 中计算随机子集 - 我的代码出了什么问题?
该代码是返回具有尺寸k的子集,从设置尺寸n(EPI 5.15)返回。也就是说,n> 0,k< = n,从n(假设上)形成一个集合{0,1,2,...,n-1},我们从中选择k元素以形成子集。 NCK有选择子集的可能性,我们希望将其均匀挑选,我们也希望该子集中的置换也是随机的。代码有三个版本 - 来自官方解决方案,我的调整和我自己的解决方案。后两个是错误的,但我不知道为什么。我将在三个代码下方解释算法的要旨。
官方解决方案
def random_subset(n: int, k: int) -> List[int]:
H = {}
for i in range(k):
r = random.randrange(i, n)
rmap = H.get(r, r)
imap = H.get(i, i)
H[r] = imap
H[i] = rmap
return [H[i] for i in range(k)]
更改为官方解决方案(错误)
def random_subset(n: int, k: int) -> List[int]:
H = {}
for i in range(k):
r = random.randrange(i, n)
H[r] = H.get(i, i)
H[i] = H.get(r, r)
return [H[i] for i in range(k)]
(DEAD错误)
def random_subset(n: int, k: int) -> List[int]:
H = {}
for i in range(k):
r = random.randrange(i, n)
if r in H:
H[i], H[r] = H[r], i
else:
H[i], H[r] = r, i
return [H[i] for i in range(k)]
我的解决方案(错误)逻辑的解决方案
我们从数组a,< 0,1,2,...,n-1> Gt;中选择一个元素,而无需重复。首先,从数组A中挑选R,然后用[0]将其交换;然后选择另一个R并用[1]交换它...直到我们填充[K-1],总共有K元素,例如以下代码:
'''
A = <0, 1, 2, ..., n-1>
i 0 1 2 n-1
'''
def random_sampling(k, A):
for i in range(k):
r = random.randrange(i, len(A))
A[i], A[r] = A[r], A[i]
A = [i for i in range(n)]
k = some_constant
random_sampling(k, A)
answer = A[:k]
将空间复杂性从O(N)降低到O( k)通过模仿数组&lt; 0,1,2,...,n-1&gt;,我们将右上的代码更改为官方解决方案,该解决方案使用哈希表,我们从中选择了一个要包含在子集中的元素。我的使用方式与原始答案不同,但我不知道为什么会出现问题。
This code is to return a subset with size k from the set size n (EPI 5.15). That is, take n > 0, k <= n, and from n we (hypothetically) form a set {0, 1, 2, ..., n-1} from which we pick k elements to form a subset. There are nCk possibilities of picking a subset and we want it to be uniformly picked and we also want the permutation in that subset is also random. There are three versions of codes -- from official solution, my tweak, and my own solution. The latter two are WRONG but I don't know why. I will explain the gist of the algorithm right below the three codes.
Official Solution
def random_subset(n: int, k: int) -> List[int]:
H = {}
for i in range(k):
r = random.randrange(i, n)
rmap = H.get(r, r)
imap = H.get(i, i)
H[r] = imap
H[i] = rmap
return [H[i] for i in range(k)]
Change to the official solution (WRONG)
def random_subset(n: int, k: int) -> List[int]:
H = {}
for i in range(k):
r = random.randrange(i, n)
H[r] = H.get(i, i)
H[i] = H.get(r, r)
return [H[i] for i in range(k)]
My solution (DEAD WRONG)
def random_subset(n: int, k: int) -> List[int]:
H = {}
for i in range(k):
r = random.randrange(i, n)
if r in H:
H[i], H[r] = H[r], i
else:
H[i], H[r] = r, i
return [H[i] for i in range(k)]
Underlying Logic
We pick an element from the part of array A, <0, 1, 2, ..., n-1>, without duplicate. At first, pick r from array A and swap it with A[0]; then pick another r and swap it with A[1] ... until we have filled A[k-1], in total we have k elements, like the following code:
'''
A = <0, 1, 2, ..., n-1>
i 0 1 2 n-1
'''
def random_sampling(k, A):
for i in range(k):
r = random.randrange(i, len(A))
A[i], A[r] = A[r], A[i]
A = [i for i in range(n)]
k = some_constant
random_sampling(k, A)
answer = A[:k]
To reduce space complexity from O(n) to O(k) by mimicking an array <0, 1, 2, ..., n-1>, we changed the right above code into an official solution that uses a hash table from which we select an element to be included in a subset. The problem arises in the way I use hash table differently from original answer but I don't know why.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
,最后一项似乎没有基本意义。像这样的赋值
将
H[i]
绑定到 2 元组,而不是整数。中间(第二个)是踩自己的脚趾:
第二行中的
get()
毫无用处,因为H[r]
已被绑定在上面的行中。当第二行执行时,r
总是在H
中,因此这两行是相同的,例如,这很明显不是你打算这样做。
顺便说一句,如果您出于某种原因热衷于减少行数,这应该可行:
但我发现第一个版本是最清晰的。
编辑:最后一个代码的新版本
最后一个版本已更改为:
现在将实现概念上的“交换”逻辑如果情况总是如此,在顶部循环。
H.get(i, i) == i
。但事实并非如此,因此它可能会失败。例如,从
n=9
和k=5
开始(这并不微妙 - 几乎是任意的)。在第一次循环迭代 (i=0
) 中,假设选择了r=1
。然后将H[0]
设置为 1,将H[1]
设置为 0。这样就可以了。 但是,现在H.get(1, 1)
不是 1,而是 0。这可能会给下一步带来麻烦。在下一次迭代 (
i=1
) 中,假设选择了r=5
。所以代码确实糟糕!现在 0(在
H[1]
中)不再在H
中,并且 1 在 H 中出现两次(它在H[0]
中) code>,现在它也在H[5]
中)。这根本不是“交换”。顺便说一句,还有另一种我更喜欢的写法,因为它非常明确地表明,一旦选择了子集元素,该决定将永远不会改变。它还减少了
H
的大小:, the last one seems not to make basic sense. An assignment like
binds
H[i]
to a 2-tuple, not to an integer.The middle (second) one is stepping on its own toes:
The
get()
in the second line is more-than-less useless, becauseH[r]
was bound in the line right above.r
is always inH
when the second line executes, so that pair of lines is the same as, e.g.,Which is pretty clearly not what you intended to do.
By the way, if you're keen to reduce the line count for some reason, this should work:
But I find the first version to be clearest.
Edit: new version of last code
The last version was changed to do:
Now that would implement the conceptual "swap" logic if it were always the case, at the top of the loop. that
H.get(i, i) == i
. But that isn't the case, so it can fail.For example, start with
n=9
andk=5
(this isn't delicate - pretty much arbitrary). On the first loop iteration (i=0
), supposer=1
is picked. ThenH[0]
is set to 1 andH[1]
to 0. That's fine. But, now it's not the case thatH.get(1, 1)
is 1 - it's 0. That can cause trouble on the next step.On the next iteration (
i=1
), supposer=5
is picked. So the code doesOops! Now 0 (which was in
H[1]
) is no longer inH
, and 1 is in H twice (it was inH[0]
, and now it's also inH[5]
). That's not "a swap" at all.BTW, there's another way to write this I like better, because it makes it very explicit that once a subset element is picked, that decision will never be changed. It also cuts the size of
H
: