创建不再有一个相交元素的组合
我希望创建一种特殊类型的组合,其中没有两个集合具有多个相交元素。让我用一个例子来解释:
假设我们有 9 个字母集,其中包含 A、B、C、D、E、F、G、H 和 I
如果您创建三个字母的标准非重复组合,您将拥有9C3套。 这些将包含 ABC、ABD、BCD 等集合。我希望创建最多只有 1 个常用字母的集合。因此,在本例中,我们将得到以下集合:
ABC、ADG、AEI、AFH、BEH、BFG、BDI、CFI、CDH、CEG、DEF 和 GHI - 请注意,如果您采用任意两个集合,则不会超过 1 个集合重复的信。
生成这样的集合的好方法是什么?它应该是可扩展的解决方案,以便我可以对一组 1000 个字母执行此操作,子集大小为 4。
非常感谢任何帮助。
谢谢
I am looking to create a special type of combination in which no two sets have more than one intersecting element. Let me explain with an example:
Let us say we have 9 letter set that contains A, B, C, D, E, F, G, H, and I
If you create the standard non-repeating combinations of three letters you will have 9C3 sets.
These will contain sets like ABC, ABD, BCD, etc. I am looking to create sets that have at the most only 1 common letter. So in this example, we will get following sets:
ABC, ADG, AEI, AFH, BEH, BFG, BDI, CFI, CDH, CEG, DEF, and GHI - note that if you take any two sets there are no more than 1 repeating letter.
What would be a good way to generate such sets? It should be scalable solution so that I can do it for a set of 1000 letters, with sub set size of 4.
Any help is highly appreciated.
Thanks
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
假设您有 n 个字母(或学生,或其他什么),并且每周都希望将它们划分为大小为 k 的子集(每周总共 n/k 子集)。此方法几乎每周都会生成
n/k
子集 - 下面我将展示如何扩展它以准确生成n/k
子集。生成子集(无分区)
首先选择 p,最大素数 <= n/k。
让我们考虑每个有序对 (a,b),以便
我们可以将每个配对映射到我们的一个字母;因此,我们可以用这种方式映射
p*k <= n
个字母(同样,我在下面展示如何精确映射 n 个字母)现在,给定
我们可以创建一个集合Sw(i) 我们的有序对。 Sw(i) 中的每一对将代表一个字母(根据上面的映射),并且集合 Sw(i) 本身代表一个“字母组”又名。大小为 k 的一个子集。
Sw(i) 的公式为:
如果我们改变所有可能值的 w 和 i,我们将得到总共 p2 组。当我们取其中任意两个集合时,它们最多将有一个相交元素。
工作原理
假设我们有两个集合 Sw1(i1) 和 Sw2(i2)。如果 Sw1(i1) 和 Sw2(i2) 有多个共同元素,那么至少存在两个
x
使得但是,任何学过模算术的人都知道,如果 p 是素数,则 x 有唯一解,或者 (w1 = w< sub>2 和 i1 = i2);因此,不能有多个 x,并且 Sw1(i1) 和 Sw2(i2 >) 最多可以有一个相交元素。
分析
自
p < n/k
,由切比雪夫定理(该定理指出 x 和2x for x > 3)因此,该方法至少生成 (n/2k)2 个字母子集,但实际上 p 会更接近 n/k,因此数字会更接近(n/k)2。由于最大可能此类子集的简单上限为 n(n-1)/(k(k-1))(请参阅下面 BlueRaja 的评论),这意味着该算法是渐近的 最佳,并且会生成接近最佳数量的集合(即使在最坏情况下,它也不会生成少于最佳数量的 1/4;再次参见下面的评论)
分区
您现在希望每周将字母分组到分区中:每周,所有字母都包含在一组中。
我们通过将
w
固定为某个值(代表周)并让i
从 0 到 p-1 变化来实现此目的。证明
考虑我们创建的组:
假设
w
是固定的,而i
从 0 到 p-1 变化。然后我们得到 p 组:现在我们说 Sw(i1) 和 Sw(i2) (与 i1 =/= i2)相交;然后
对于某个 x,因此 i1 = i2。因此,Sw(i1) 和 Sw(i2) 不相交。
由于我们的集合中没有两个相交,并且恰好有 p 个集合(每个集合有 k 个元素),因此我们的集合形成了 k*p 个字母的划分。
每周生成 n/k 个子集
此方法的最大缺点是它生成 p*k 个字母的集合,而不是 n 个字母。如果不能省略最后一个字母(如您的情况,字母是学生),则有两种方法可以每周准确生成 n/k 个子集:
查找一组素数 p 1, p2, p3, ... 其总和正好为 n/k。然后我们可以将每组 pik 字母视为一个独立的字母表,这样我们就可以找到 p< 的一组子集,而不是寻找 pik 字母的子集。 sub>1*k 个字母,p2*k 个字母的另一组子集,另一组...
这样做的缺点是,一组中的字母永远不会与另一组中的字母配对,从而减少了生成的子集总数。幸运的是,如果 n 是偶数,根据哥德巴赫猜想†,你只需要两组大多数(如果 n 是奇数,则您只需要最多三个)
此方法保证大小为 k 的子集,但不会生成尽可能多的子集。
† 虽然未经证实,但已知对于此问题中您可能遇到的每个大得离谱的数字都是正确的
另一种选择是使用最小素数 p > ;=n/k。这将为您提供 p*k >= n 个字母 - 生成子集后,只需扔掉多余的字母即可。因此,最终这会给你一些大小 < 的子集。 k.假设 k 整除 n(即 n/k 是整数),您可以采用较小的子集并手动将它们混合以形成大小为 k 的子集,但这样可能会与过去/未来的子集有一些重叠。< br>
此方法生成的子集至少与原始方法一样多,但有些子集的大小可能<< k
示例
取 n = 15,k = 3。即有 15 名学生,我们分成三人一组。
首先,我们选择最大素数 p <= n/k。 n/k 是素数(我们很幸运!),所以 p = 5。
我们将 15 名学生映射到上述有序对 (a,b) 中,得到 (每个字母都是一个学生):
该方法生成 25 组,每组三个。因此,由于我们每周需要安排 n/k = 5 组,因此我们可以安排 5 周的活动(每周 5 组 * 5 周 = 25 组)。
对于第 0 周,我们生成分区为
对于第 4 周,它将是
以下是所有 5 周的时间表:
更多实际示例
在您的案例中,每组中有 n = 1000 名学生,k = 4。因此,我们选择 p 作为最大素数 <= (n/k = 1000/4 = 250),因此 p = 241。 不考虑上面“每周生成 n/k 子集”,此方法将为 961 名学生生成持续 241 周的时间表。
(可能的最大子集数的上限为 1000*999/(4*3) = 83250,但实际数量可能小于该数。即便如此,此方法也会生成 58081 个子集,即大约 70%理论最大值!)
如果我们使用上面的第一种方法为 1000 名学生生成一个时间表,我们将 p1 = 113, p2 = 137(这样 p 1 + p2 = n/k)。因此,我们可以生成 (113)^2 + (137)^2 = 31,538 个学生子集,足以持续 113 周。
如果我们使用上面的第二种方法为正好 1000 名学生生成一个时间表,我们取 p = 251。这将为我们提供 1004 名学生 251 周的时间表;我们每周都会从日程表中删除 4 名幽灵学生。通常,这将导致每周分为四组,每组 3 人(尽管可能性不大,但也有可能,例如一组 2 人,两组 3 人)。 4 名学生的学生总数始终是 4 的倍数,因此您可以手动将这些学生分为 4 人一组,但可能会有其中两名学生稍后再次分到另一组的风险。
最后的想法
这个算法的一个缺陷是它并不真正灵活:如果一个学生辍学,我们就永远与一个幽灵学生在一起。此外,无法在年中将新学生添加到日程表中(除非我们通过最初创建虚拟学生来允许他们)。
该问题属于组合学中的限制集系统范畴。请参阅本文了解更多信息,特别是第 1 章和第 2 章。因为它是一个postscript文件,所以你需要gsview或其他东西来查看它。
Say you have n letters (or students, or whatever), and every week want to partition them into subsets of size k (for a total of
n/k
subsets every week). This method will generate almostn/k
subsets every week - I show below how to extend it to generate exactlyn/k
subsets.Generating the Subsets (no partitioning)
First pick p, the largest prime <= n/k.
Let's consider every ordered pair (a,b) such that
We can map each pairing to one of our letters; thus, we can map
p*k <= n
letters this way (again, I show below how to map exactly n letters)Now, given
We can create a set Sw(i) of our ordered pairs. Each pairing in Sw(i) will represent one letter (according to our mapping above), and the set Sw(i) itself represents one "grouping of letters" aka. one subset of size k.
The formula for Sw(i) is
If we vary w and i over all possible values, we get p2 total sets. When we take any two of these sets, they will have at most one intersecting element.
How it works
Say we have two sets Sw1(i1) and Sw2(i2). If Sw1(i1) and Sw2(i2) have more than one element in common, then there exists at least two
x
such thatHowever, anyone who's taken modular arithmetic knows that if p is prime, either x has a unique solution or (w1 = w2 and i1 = i2); thus, there cannot be more than one x, and Sw1(i1) and Sw2(i2) can have at most one intersecting element.
Analysis
Since
p < n/k
, by Chebyshev's Theorem (which states there is a prime between x and 2x for x > 3)Thus, this method generates at least (n/2k)2 subsets of letters, though in practice p will be nearer to n/k, so the number will be nearer to (n/k)2. Since a simple upper bound for the maximum possible such subsets is
n(n-1)/(k(k-1))
(see BlueRaja's comment below), this means the algorithm is asymptotically optimal, and will generate near the optimal amount of sets (even in the worst case, it won't generate less than about 1/4th the optimal amount; see again the comment below)Partitioning
You now want to group the letters into partitions each week: each week, all letters are included in exactly one group.
We do this by letting
w
be fixed to a certain value (representing the week) and lettingi
vary from 0 to p-1.Proof
Consider the groups we created:
Let's say
w
is fixed andi
varies from 0 to p-1. Then we get p sets:Now let's say Sw(i1) and Sw(i2) (with i1 =/= i2) intersect; then
for some x, and hence i1 = i2. Thus, Sw(i1) and Sw(i2) don't intersect.
Since no two of our sets intersect, and there are exactly p of them (each with k elements), our sets form a partition of the k*p letters.
Generating n/k Subsets Each Week
The biggest disadvantage of this method is that it generates sets for p*k letters, rather than n letters. If the last letters can't be left out (as in your case, where the letters are students), there are two ways to generate exactly n/k subsets each week:
Find a set of prime numbers p1, p2, p3, ... which sums up to exactly n/k. Then we can treat each group of pik letters as an independent alphabet, so that rather than finding subsets of pk letters, we find one group of subsets for p1*k letters, another group of subsets for p2*k letters, another group...
This has the disadvantage that letters from one group will never be paired with letters from another group, reducing the total number of subsets generated. Luckily, if n is even, by Goldbach's conjecture† you will only need two groups at the most (if n is odd, you will only need three at most)
This method guarantees subsets of size k, but doesn't generate as many subsets.
† Though unproven, it is known to be true for every ridiculously large number you will likely encounter for this problem
The other option is to use the smallest prime p >= n/k. This will give you p*k >= n letters - after the subsets have been generated, simply throw out the extra letters. Thus, in the end this gives you some subsets with size < k. Assuming k divides n evenly (ie. n/k is an integer), you could take the smaller subsets and mix them up by hand to make subsets of size k, but you risk having some overlap with past/future subsets this way.
This method generates at least as many subsets as the original method, but some may have size < k
Example
Take n = 15, k = 3. i.e. there are 15 students and we are making groups of three.
To begin with, we pick largest prime p <= n/k. n/k is prime (lucky us!), so p = 5.
We map the 15 students into the ordered pairs (a,b) described above, giving us (each letter is a student):
The method generates 25 groups of three. Thus, since we need to schedule n/k = 5 groups each week, we can schedule 5 weeks of activities (5 groups a week * 5 weeks = 25 groups).
For week 0, we generate the partition as
For week 4 it will be
Here's the schedule for all 5 weeks:
More Practical Example
In your case, n = 1000 students and k = 4 in each group. Thus, we pick p as the largest prime <= (n/k = 1000/4 = 250), so p = 241. Without considering the alterations above under "Generating n/k Subsets Each Week", this method will generate a schedule for 961 students lasting 241 weeks.
(An upper-bound for the maximum number of subsets possible would be 1000*999/(4*3) = 83250, though the actual number is likely less than that. Even so, this method generates 58081 subsets, or about 70% of the theoretical maximum!)
If we use the first method above to generate a schedule for exactly 1000 students, we take p1 = 113, p2 = 137 (so that p1 + p2 = n/k). Thus, we can generate (113)^2 + (137)^2 = 31,538 subsets of students, enough to last 113 weeks.
If we use the second method above to generate a schedule for exactly 1000 students, we take p = 251. This will give us a schedule for 1004 students for 251 weeks; we remove the 4 phantom students from the schedule each week. Usually, this will result in four groups of 3 every week (though unlikely, it is also possible to have for example one group of 2 and two groups of 3). The groups with < 4 students will always have a multiple-of-4 total number of students, so you could manually place those students into groups of 4, at the risk of potentially having two of those students together again later in another group.
Final thoughts
One flaw of this algorithm is that it's not really flexible: if a student drops out, we are forever stuck with a phantom student. Also, there is no way to add new students to the schedule midway through the year (unless we allow for them by initially creating phantom students).
This problem falls under the category of Restricted Set Systems in combinatorics. See this paper for more information, especially Chapters 1 and 2. Since it is a postscript file, you will need gsview or something to view it.
我必须添加另一个答案,因为另一个答案已经太长了。
如果您有以下限制:
1) 您每周需要 4 人一组。
2)某一周的每个小组都是不相交的,并且每个学生都恰好在一个小组中。
3) 如果两个学生在同一组,则他们以后不能在同一组。
如果构建一个图G如下:
1)每个学生都是一个节点。
2) 两个学生通过一条边连接在一起,前提是他们以前没有在一个小组中呆过。
随着学生任意退学/加入,这成为一个难题!即使您一开始有一个完整的图表,几周后,该图表可能会变得相当不可预测。
你的问题:你需要找到 G 的一个生成子图,这样它就是 K_4 副本的并集,或者换句话说,它是 K_4 的分区。
不幸的是,看起来这个问题是 NP-Hard:4 组的精确覆盖(NP 完全)可以简化为您的问题(就像 3 组的精确覆盖可以简化为划分为三角形一样)。
也许这将有助于提供一些近似算法: http://www.siam.org /proceedings/soda/2010/SODA10_122_chany.pdf
(您的问题可以简化为超图匹配,因此该算法可以用于您的问题)。
精确覆盖: http://en.wikipedia.org/wiki/Exact_cover
划分为三角形: https://noppa.tkk.fi/noppa/ kurssi/t-79.5103/viikkoharjoitukset/T-79_5103_solutions_5.pdf
4 个集合的精确覆盖 = 给定大小为 4m 的集合 S 和 S 的 4 元素子集的集合 C,是否存在以下子集 C' C,使得 S 的每个元素在 C' 中恰好出现一次。
不幸的是,似乎您可能必须更改一些限制。
I had to add another answer as the other one was too long already.
If you have the following constraints:
1) You need groups of 4 every week.
2) Each group in a certain week is disjoint and each student is in exactly one group.
3) If two students are in the same group, they need cannot be in the same group in future.
If you construct a graph G as follows:
1) Each student is a node.
2) Two students are joined by an edge iff they haven't been together in a group before.
With students dropping/joining arbitrarily, this becomes a hard problem! Even though you start out with a complete graph intially, after some weeks, the graph could become quite unpredictable.
Your problem: You need to find a spanning subgraph of G, such that it is a union of copies of K_4 or in other words a partition into K_4s.
Unfortunately, look like this problem is NP-Hard: Exact cover by 4-sets (which is NP-Complete) can be reduced to your problem (just like Exact cover by 3-sets can be reduced to partition into triangles).
Perhaps this will help give some aproximation algorithms: http://www.siam.org/proceedings/soda/2010/SODA10_122_chany.pdf
(Your problem can be reduced to Hypergraph matching and so algorithms for that can be used for your problem).
Exact Cover: http://en.wikipedia.org/wiki/Exact_cover
Partition into triangles: https://noppa.tkk.fi/noppa/kurssi/t-79.5103/viikkoharjoitukset/T-79_5103_solutions_5.pdf
Exact Cover by 4 sets = Given a set S of size 4m and a collection C of 4-element subsets of S, does there exist a subset C' of C, such that each element of S appears precisely once in C'.
Unfortunately, seems like you might have to change some constraints.
这是该算法的一些概述。
AB BC CD DE EF FG GH HI
AC BD CE DF EG FH GI
AD BE CF DG EH FI
AE BF CG DH EI
AF BG CH DI
股份公司BH CI
阿哈比
AI
这些可以存储在 6 个 n(n-1) 数组中
现在开始尝试使用以下规则组合连续对:
一个。只有当存在共同字符时,两对才能组合。
b.当通过保留共同字符形成的对也可用时,组合是可能的。例如,如果我们想组合 AB 和 AC,那么我们需要检查 BC 是否也可用。
c.当满足上述规则时,我们将两对组合成一个三元组(例如AB和AC合并形成ABC)并将所有三对(例如AB、AC和BC)标记为不可用。
继续执行此操作,在数组中查找可用的对,并将它们合并以形成三元组,并将这些对标记为不可用,直到没有可用的对或不再可以形成三元组。
例子:
1. 组合AB + AC --> ABC;标记 AB、AC 和 BC 不可用。
2. 组合AD+AE -->自动体外除颤器;将 AD、AE 和 DE 标记为不可用。
3. AF+AG组合 --> AFG;标记 AF、AG 和 FG 不可用。
..
Here is some outline of the algorithm.
AB BC CD DE EF FG GH HI
AC BD CE DF EG FH GI
AD BE CF DG EH FI
AE BF CG DH EI
AF BG CH DI
AG BH CI
AH BI
AI
These can be stored in an array of sixe n(n-1)
Now start attempting to combine consecutive pairs using the following rules:
a. Two pairs can be combined only when there is a common character.
b. The combination is possible when the pair formed by leaving the common character is also available. e.g. if we want to combine AB and AC then we need to check if BC is also available.
c. When the above rules are satisfied we combine the two pairs into a triple (e.g. AB and AC merged to form ABC) and mark all the three pairs, such as AB, AC and BC as unavailable.
Continue doing this looking for available pairs in the array and merging them to form triples and marking the pairs unavailable until there are no available pairs or no triples can be formed anymore.
Example:
1. combine AB + AC --> ABC; Mark AB, AC and BC unavailable.
2. combine AD + AE --> AED; Mark AD, AE and DE unavailable.
3. combine AF + AG --> AFG; Mark AF, AG and FG unavailable.
..
这是一种可以满足您所陈述的要求的方法。是否达到你想要的效果我不知道。
我将让您将其转换为工作代码,但我认为它不会太困难并且应该可以很好地扩展。请注意,它也应该适用于任何其他大小的子集,您可以为任何 n 使用不规则的 n 边形平铺平面,尽管这可能会变得很困难。
我想到的另一种方法是:
这需要更多的簿记工作,并且有许多极端情况需要处理,但同样,编码也不应该太困难。
编辑:很容易将我的第一种方法转换为更明显是图上的算法的形式。将每个子集建模为一个节点,将字母表中的每个字母建模为一条边。构造一个图,其中每个节点的度为 n(子集中的元素数量),并且每个字母(边)使用一次。
Here's an approach which will satisfy the requirements you have stated. Whether it does what you want I don't know.
I'll leave it up to you to turn this into working code, but I don't think that it should be too difficult and it should scale well. Note that it should also work for any other size of subset, you can tile the plane with irregular n-gons for any n, though it might get difficult.
The other approach I thought of is:
This requires a bit more book-keeping, and there are a number of corner cases to deal with, but again it shoudln't be too difficult to code up.
EDIT: It's easy to transform my first approach into a form which is more obviously an algorithm on a graph. Model each subset as a node, and each letter in the alphabet as an edge. Construct a graph where each node has degree n (number of elements in the subset) and each letter (edge) is used once.
@khuss
同样的方法可以推广。但它不是线性算法,可能是指数算法。
例如:当子集大小为 4 时,我们选择 2 个或更多对具有 4 个唯一字符。
例如“AB和CD”或“AB、AC&AD”仅当满足以下条件时:
我们像以前一样继续,直到无法形成更多 4 元组或不再有可用的对。
@khuss
Same method can be generalized. But it is not a linear algorithm and may be exponential.
For example: When subset size is 4, we pick 2 or more pairs with 4 unique characters.
e.g. "AB and CD" or "AB, AC & AD" only if the following conditions are satisfied:
We continue as before until no more 4-tuples can be formed or no more pairs are available.