在盒子里生成球

发布于 2024-09-06 09:34:07 字数 1951 浏览 6 评论 0原文

给定两个排序向量 ab,找到所有由 a 之和以及 b 的排列组成的向量,并且一旦排序后是唯一的。

您可以通过以下方式创建所寻找的向量之一:

  • 采用向量a和向量b的排列。
  • 将它们加在一起,这样c[i ]=a[i]+b[i].
  • c进行排序。

我感兴趣的是找到产生整组唯一的 c 向量的 b 排列集。

示例 0a='ccdd'b='xxyy'
给出矢量求和:'cycydxdx''cxcxdydy''cxcydxdy'
请注意,b 的排列:'xyxy''yxyx' 是相等的,因为在这两种情况下,“box c”和“ box d" 都恰好得到一个 'x' 和一个 'y'

我想这类似于将 M 球放入 M 盒子中(每个盒子一个),其中一些球组和盒子是相同的。
更新:给定一个字符串a='aabbbcdddd'b='xxyyzzttqq',你的问题将是4个盒子里有10个球。有 4 个不同的盒子,尺寸分别为 2、3、1 和 4。球是成对无法区分的。

示例 1:给定字符串为 a='xyy'b='kkd'
可能的解决方案:'kkd''dkk'
原因:我们看到 b 的所有唯一排列都是 'kkd''kdk''dkk'。然而,根据我们的限制,前两个排列被认为是相同的,因为不同的索引映射到字符串 a 中的相同字符 'y'

示例 2: 给定字符串为 a='xyy'b='khd'
可能的解决方案:'khd''dkh''hkd'

示例 3:给定字符串为 a='xxxx'b='khhd'
可能的解决方案:'khhd'

我可以使用 Narayana Pandita 的算法解决生成唯一候选 b 排列的问题,如 维基百科/排列
第二部分接缝更硬。我最好的办法是将两个字符串成对连接到一个列表中,对其进行排序并将其用作查找集中的键。 ('xx'+'hd' 连接→'xh','xd' 排序→'xd','xh' )。

由于我的 M 通常非常大,并且字符串中的相似性很常见,因此我目前生成的 b 排列比实际通过设置过滤器的排列要多。我希望有一个算法可以直接生成正确的结果。欢迎任何改进。

Given two sorted vectors a and b, find all vectors which are sums of a and some permutation of b, and which are unique once sorted.

You can create one of the sought vectors in the following way:

  • Take vector a and a permutation of vector b.
  • Sum them together so c[i]=a[i]+b[i].
  • Sort c.

I'm interested in finding the set of b-permutations that yield the entire set of unique c vectors.

Example 0: a='ccdd' and b='xxyy'
Gives the summed vectors: 'cycydxdx', 'cxcxdydy', 'cxcydxdy'.
Notice that the permutations of b: 'xyxy' and 'yxyx' are equal, because in both cases the "box c" and the "box d" both get exactly one 'x' and one 'y'.

I guess this is similar to putting M balls in M boxes (one in each) with some groups of balls and boxes being identical.
Update: Given a string a='aabbbcdddd' and b='xxyyzzttqq' your problem will be 10 balls in 4 boxes. There are 4 distinct boxes of size 2, 3, 1 and 4. The balls are pair wise indistinguishable.

Example 1: Given strings are a='xyy' and b='kkd'.
Possible solution: 'kkd', 'dkk'.
Reason: We see that all unique permutations of b are 'kkd', 'kdk' and 'dkk'. However with our restraints, the two first permutations are considered equal as the indices on which the differ maps to the same char 'y' in string a.

Example 2: Given strings are a='xyy' and b='khd'.
Possible solution: 'khd', 'dkh', 'hkd'.

Example 3: Given strings are a='xxxx' and b='khhd'.
Possible solution: 'khhd'.

I can solve the problem of generating unique candidate b permutations using Narayana Pandita's algorithm as decribed on Wikipedia/Permutation.
The second part seams harder. My best shot is to join the two strings pairwise to a list, sort it and use it as a key in a lookup set. ('xx'+'hd' join→'xh','xd' sort→'xd','xh').

As my M is often very big, and as similarities in the strings are common, I currently generate way more b permutations than actually goes through the set filter. I would love to have an algorithm generating the correct ones directly. Any improvement is welcome.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

生活了然无味 2024-09-13 09:34:07

要生成可能重复元素的 k 组合(多重集),以下内容可能很有用: 多重集组合的格雷码 (1995)

对于递归解决方案,您可以尝试以下操作:

计算每个字符出现的次数。假设它们是 x1 x2 ... xm,对应于 m 个不同的字符。

然后,您需要找到所有可能的有序对 (y1 y2 ... ym),使得

0 <= yi <= xi

且 Sum yi = k。

这里yi是字符i出现的次数。

这个想法是,固定字符 1 出现的次数 (y1)。然后从剩余的中递归生成 k-y1 的所有组合。

psuedocode:

List Generate (int [] x /* array index starting at 1*/, 
               int k /* size of set */) {

    list = List.Empty;

    if (Sum(x) < k) return list;

    for (int i = 0; i <= x[1], i++) {

        // Remove first element and generate subsets of size k-i.

        remaining = x.Remove(1);

        list_i = Generate(remaining, k-i);

        if (list_i.NotEmpty()) {

            list = list + list_i;    

        } else {

            return list;
        }

    }

    return list;
}

编辑之前:

如果我理解正确,您需要查看字符串 a,查看仅出现一次的符号。假设有 k 个这样的符号。然后你需要生成 b 的所有可能的排列,其中包含 k 个元素,并映射到相应位置的那些符号。其余的您可以根据需要忽略/填写。

我记得在这里发布了 C# 代码:如何找到给定长度中 k 的排列?

我假设 xxyy 只会给出 1 个唯一的字符串,而只出现一次的字符串是“区分”点。

例如,在 a=xyy, b=add 的情况下,

区别点是 x

因此,您选择长度为 1 的“add”排列。这些将为您提供 ad

因此 adddad(或 dda) 就是您需要的。

对于a=xyyz b=good

,区别点是x和z,

因此您生成长度为2的b的排列,从而

go
og
oo
od
do
gd
dg

给出7种独特的排列。

这有帮助吗?我的理解正确吗?

To generate k-combinations of possibly repeated elements (multiset), the following could be useful: A Gray Code for Combinations of a Multiset (1995).

For a recursive solution you try the following:

Count the number of times each character appears. Say they are x1 x2 ... xm, corresponding to m distinct characters.

Then you need to find all possible ordered pairs (y1 y2 ... ym) such that

0 <= yi <= xi

and Sum yi = k.

Here yi is the number of times character i appears.

The idea is, fix the number of times char 1 appears (y1). Then recursively generate all combinations of k-y1 from the remaining.

psuedocode:

List Generate (int [] x /* array index starting at 1*/, 
               int k /* size of set */) {

    list = List.Empty;

    if (Sum(x) < k) return list;

    for (int i = 0; i <= x[1], i++) {

        // Remove first element and generate subsets of size k-i.

        remaining = x.Remove(1);

        list_i = Generate(remaining, k-i);

        if (list_i.NotEmpty()) {

            list = list + list_i;    

        } else {

            return list;
        }

    }

    return list;
}

PRIOR TO EDITS:

If I understood it correctly, you need to look at string a, see the symbols that appear exactly once. Say there are k such symbols. Then you need to generate all possible permutations of b, which contain k elements and map to those symbols at the corresponding positions. The rest you can ignore/fill in as you see fit.

I remember posting C# code for that here: How to find permutation of k in a given length?

I am assuming xxyy will give only 1 unique string and the ones that appear exactly once are the 'distinguishing' points.

Eg in case of a=xyy, b=add

distinguishing point is x

So you select permuations of 'add' of length 1. Those gives you a and d.

Thus add and dad (or dda) are the ones you need.

For a=xyyz b=good

distinguishing points are x and z

So you generate permutations of b of length 2 giving

go
og
oo
od
do
gd
dg

giving you 7 unique permutations.

Does that help? Is my understanding correct?

戒ㄋ 2024-09-13 09:34:07

好吧,很抱歉我一直无法清楚地解释这个问题,但这是一个解决方案。

我们需要两个函数组合runvector(v)combinations(s,k) 生成长度为 k 的多重集的唯一组合。对于s='xxyy',这些将是['xx','xy','yy']。 runvector(v) 将表示为排序向量的多重集转换为更简单的结构,即 runvector。 runvector('cddeee')=[1,2,3]

为了解决这个问题,我们将使用递归生成器。我们遍历了适合 box1 的所有组合以及对其余框的追索权,禁止我们已经选择的值。为了完成禁止,combinations 将维护一个跨调用的位数组。

在 python 中,该方法如下所示:

def fillrest(banned,out,rv,b,i):
    if i == len(rv):
        yield None
        return
    for comb in combinations(b,rv[i],banned):
        out[i] = comb
        for rest in fillrest(banned,out,rv,b,i+1):
            yield None

def balls(a,b):
    rv = runvector(a)
    banned = [False for _ in b]
    out = [None for _ in rv]
    for _ in fill(out,rv,0,b,banned):
        yield out[:]

>>> print list(balls('abbccc','xyyzzz'))
[['x', 'yy', 'zzz'],
 ['x', 'yz', 'yzz'],
 ['x', 'zz', 'yyz'],
 ['y', 'xy', 'zzz'],
 ['y', 'xz', 'yzz'],
 ['y', 'yz', 'xzz'],
 ['y', 'zz', 'xyz'],
 ['z', 'xy', 'yzz'],
 ['z', 'xz', 'yyz'],
 ['z', 'yy', 'xzz'],
 ['z', 'yz', 'xyz'],
 ['z', 'zz', 'xyy']]

输出采用“box”格式,但可以轻松合并回简单字符串:'xyyzzzz''xyzyzz'.. 。

Ok, I'm sorry I never was able to clearly explain the problem, but here is a solution.

We need two functions combinations and runvector(v). combinations(s,k) generates the unique combinations of a multiset of a length k. For s='xxyy' these would be ['xx','xy','yy']. runvector(v) transforms a multiset represented as a sorted vector into a more simple structure, the runvector. runvector('cddeee')=[1,2,3].

To solve the problem, we will use recursive generators. We run through all the combinations that fits in box1 and the recourse on the rest of the boxes, banning the values we already chose. To accomplish the banning, combinations will maintain a bitarray across of calls.

In python the approach looks like this:

def fillrest(banned,out,rv,b,i):
    if i == len(rv):
        yield None
        return
    for comb in combinations(b,rv[i],banned):
        out[i] = comb
        for rest in fillrest(banned,out,rv,b,i+1):
            yield None

def balls(a,b):
    rv = runvector(a)
    banned = [False for _ in b]
    out = [None for _ in rv]
    for _ in fill(out,rv,0,b,banned):
        yield out[:]

>>> print list(balls('abbccc','xyyzzz'))
[['x', 'yy', 'zzz'],
 ['x', 'yz', 'yzz'],
 ['x', 'zz', 'yyz'],
 ['y', 'xy', 'zzz'],
 ['y', 'xz', 'yzz'],
 ['y', 'yz', 'xzz'],
 ['y', 'zz', 'xyz'],
 ['z', 'xy', 'yzz'],
 ['z', 'xz', 'yyz'],
 ['z', 'yy', 'xzz'],
 ['z', 'yz', 'xyz'],
 ['z', 'zz', 'xyy']]

The output are in 'box' format, but can easily be merged back to simple strings: 'xyyzzzz', 'xyzyzz'...

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文