当前位置：文江博客话题详情

组合独立集/汉明距离的算法/近似

发布于 2024-09-14 21:07:15 字数 974 浏览 13 评论 0原文

输入：图G 输出：几个独立的集合，使得一个节点对于所有独立集合的隶属度是唯一的。因此，一个节点与它自己集合中的任何节点都没有连接。这是一个示例路径。

由于这里需要澄清另一个改写：

将给定的图划分为集合，以便

i 可以通过其在集合中的成员身份区分所有其他节点，例如，如果节点 i 只出现在集合 A 中，则其他节点不应出现在集合 A 中。只设置A
如果节点 j 存在于集合 A 和 B 中，则集合 A 和 B 中不应存在其他节点。如果节点的成员资格由位模式编码，则这些位模式的汉明距离至少为一
如果两个节点在图中相邻，则它们不应出现在同一集合中，因此是一个独立的集合< /p>

示例： B没有相邻节点 D=>A, A=>D

解：

AB //
B D

A 具有位模式 10，且其集合中没有相邻节点。 B 的位模式为 11 并且没有相邻节点，D 的位模式为 01 因此所有节点的汉明距离至少为 1 并且没有相邻节点 =>正确

错误，因为 D 和 A 已连接：

ADB
/ DB

A 在其集合中具有位模式 10 和 D，它们是相邻的。 B 的位模式为 11 并且没有相邻节点，D 的位模式为 11，B 也有，因此该解决方案存在两个错误，因此不被接受。

当然，随着图中节点数量的增加，这应该扩展到更多的集合，因为您至少需要 log(n) 集合。

我已经编写了 MAX-SAT 的转换，以使用 sat 求解器来实现此目的。但条款的数量太大了。更直接的方法会很好。到目前为止，我有一个近似值，但我想要一个精确的解决方案，或者至少是一个更好的近似值。

我尝试过一种方法，使用粒子群从任意解决方案优化到更好的解决方案。然而，运行时间非常糟糕，结果也远非理想。我正在寻找动态算法或其他东西，但是我无法理解如何划分和解决这个问题。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

岁月静好 2024-09-21 21:07:15

这不是一个完整的答案，我不知道它对你有多大用处。但这里是这样的：

汉明距离让我觉得这是一种转移注意力的东西。你的问题陈述说它必须至少为 1，但也可以是 1000。只要说每个节点的集合成员资格的位编码是唯一的就足够了。

您的问题陈述没有说明这一点，但您上面的解决方案表明每个节点必须是至少 1 个集合的成员。 IE。任何节点的集合成员资格都不允许使用全 0 的位编码。

暂时忽略连接的节点，不相交的节点很容易：只需使用未使用的位编码对它们进行顺序编号即可。把那些留到最后。

上面的例子使用了有向边，但同样，这让我觉得这是一个转移注意力的事情。如果因为A＝＞D而A不能与D在同一集合中，则无论D＝＞A是否，D不能与A在同一集合中。

您提到至少需要 log(N) 组。您最多也有 N 组。完全连接的图（具有 (N^2-N)/2 个无向边）将需要 N 个集合，每个集合包含一个节点。

事实上，如果您的图包含 M 维（M in 1..N-1）的完全连接单纯形，具有 M+1 个顶点和 (M^2+M)/2 个无向边，则您将需要至少 M+1套。

在上面的示例中，您有一个这样的单纯形 (M=1)，具有 2 个顶点 {A,D} 和 1 个（无向）边 {(A,D)}。

看来您的问题可以归结为找到图中最大的完全连接的单纯形。换句话说，您遇到了布线问题：您需要多少个维度来布线边缘，以免交叉？这听起来不像是一个非常可扩展的问题。

找到第一个大型单纯形很容易。每个顶点节点都会获得一个带有自己的位的新集合。

不相交的节点很容易。处理完连接的节点后，只需按顺序对不相交的节点进行编号即可跳过任何先前使用的位模式。从上面的示例中，由于 A 和 D 采用 01 和 10，因此 B 的下一个可用位模式是 11。

棘手的部分就变成了如何在使用 new 创建任何新集合之前将任何剩余的单纯形尽可能多地折叠到现有范围内位。折叠时，每个节点必须使用 2 个或更多位（组），并且这些位（组）不得与任何相邻节点的位（组）相交。

考虑一下当向示例中添加另一个节点 C 时，上面的示例会发生什么情况：

如果 C 直接连接到 A 和 D，那么最初的问题就变成寻找具有 3 个顶点 {A,C,D} 的 2-单纯形，并且3 条边 {(A,c),(A,D),(C,D)}。一旦 A、C 和 D 采用位模式 001、010 和 100，不相交 B 的最低可用位模式就是 011。

另一方面，如果 C 直接连接 A 或 D，但不是两者都连接，则该图有两个 1-单纯形。假设我们首先找到具有顶点 {A,D} 的 1-单纯形，并为其提供位模式 01 和 10，那么问题就变成了如何将 C 折叠到该范围内。唯一至少包含 2 位的位模式是 11，但它与 C 连接的节点相交，因此我们必须创建一个新集合并将 C 放入其中。此时的解决方案与上面类似。

如果 C 不相交，则 B 或 C 将获得位模式 11，而剩下的一个将需要一组新的并获得位模式 100。

假设 C 连接到 B，但不连接到 A 或 D。同样，该图有两个 1 -单纯形，但这次是不相交的。假设首先找到 {A,D}，如上给出 A 和 D 的位模式 10 和 01。我们可以将 B 或 C 折叠到现有范围中。该范围内唯一可用的位模式是 11，B 或 C 都可以获得该模式，因为它们都不与 A 或 D 相邻。一旦使用 11，则不会保留具有 2 个或更多位集的位模式，我们将不得不创建剩余节点的新集合，赋予其位模式 100。

假设 C 连接到所有 3 个 A、B 和 D。在这种情况下，该图有一个 2-单纯形，具有 3 个顶点 {A,C,D} 和一个 1 -具有 2 个顶点 {B, C} 的单纯形。如上所述，处理最大单形后，A、C 和 D 将具有位模式 001、010、100。为了将 B 折叠到该范围内，具有 2 个或更多位设置的可用位模式为：011、101、110 和111. 除了 101 之外，所有这些都与 C 相交，因此 B 将得到位模式 101。

那么问题就变成：如何有效地找到最大的全连通单纯形？

如果找到最大的全连通单纯形，完全连接的单纯形太昂贵了，可以通过查找连接方面的最大最小值来为潜在的完全连接的单纯形设置一个近似上限：

扫描更新边缘
顶点数为
连接边。
对于每个连接的节点，创建一个 Cn 计数的数组，最初为零
其中 Cn 是边数
连接到节点 n。
再次扫一遍边，对于相连的节点n1和n2，
增加 n1 中的计数
对应于Cn2，反之亦然。
如果Cn2> cn1，更新最后一次计数
，
再次扫描连接的节点，计算上界
每个节点可以最大的单纯形
成为其中的一部分。您可以构建一个带有顶点列表的鸽子数组
扫描节点时的每个上限。
从最大到最小提取鸽洞数组并
将节点折叠成唯一的集合。

如果您的节点位于集合 N 中，边位于集合 E 中，则复杂度将为：
O(|N|+|E|+O(Step 5))

如果上述近似足够了，问题就变成：在给定要求的情况下，如何有效地将节点折叠到现有范围中？

Not a complete answer, and I don't know how useful it will be to you. But here goes:

The hamming distance strikes me as a red herring. Your problem statement says it must be at least 1 but it could be 1000. It suffices to say the bit encoding for each node's set memberships is unique.

Your problem statement doesn't spell it out, but your solution above suggests every node must be a member of at least 1 set. ie. a bit encoding of all 0's is not allowed for any node's set memberships.

Ignoring connected nodes for a moment, disjoint nodes are easy: Simply number them sequentially with an unused bit encoding. Save those for last.

Your example above uses directed edges, but again, that strikes me as a red herring. If A cannot be in the same set as D because A=>D, D cannot be in the same set as A regardless whether D=>A.

You mention needing at least log(N) sets. You will also have at most N sets. A fully connected graph (with (N^2-N)/2 undirected edges) will require N sets each containing a single node.

In fact, if your graph contains a fully connected simplex of M dimensions (M in 1..N-1) with M+1 vertices and (M^2+M)/2 undirected edges, you will require at least M+1 sets.

In your example above, you have one such simplex (M=1) with 2 vertices {A,D} and 1 (undirected) edge {(A,D)}.

It would seem that your problem boils down to finding the largest fully connected simplexes in your graph. Stated differently, you have a routing problem: How many dimensions do you need to route your edges so none cross? It doesn't sound like a very scalable problem.

The first large simplex found is easy. Every vertex node gets a new set with its own bit.

The disjoint nodes are easy. Once the connected nodes are dealt with, simply number the disjoint nodes sequentially skipping any previously used bit patterns. From your example above, since A and D take 01 and 10, the next available bit pattern for B is 11.

The tricky part then becomes how to fold any remaining simplexes as much as possible into the existing range before creating any new sets with new bits. When folding, one must use 2 or more bits (sets) for each node, and the bits (sets) must not intersect with the bits (sets) for any adjacent node.

Consider what happens to your example above when one adds another node, C, to the example:

If C connects directly to both A and D, then the initial problem becomes finding the 2-simplex with 3 vertices {A,C,D} and 3 edges {(A,c),(A,D),(C,D)}. Once A, C and D take the bit patterns 001, 010 and 100, the lowest available bit pattern for disjoint B is 011.

If, on the other hand, C connects directly A or D but not both, the graph has two 1-simplexes. Supposing we find the 1-simplex with vertices {A,D} first giving them the bit patterns 01 and 10, the problem then becomes how to fold C into that range. The only bit pattern with at least 2 bits is 11, but that intersects with whichever node C connects to so we have to create a new set and put C in it. At this point, the solution is similar to the one above.

If C is disjoint, either B or C will get the bit pattern 11 and the remaining one will need a new set and get the bit pattern 100.

Suppose C connects to B but not to A or D. Again, the graph has two 1-simplexes but this time disjoint. Suppose {A,D} is found first as above giving A and D the bit patterns 10 and 01. We can fold B or C into the existing range. The only available bit pattern in the range is 11 and either B or C could get that pattern as neither is adjacent to A or D. Once 11 is used, no bit patterns with 2 or more bits set remain, and we will have to create a new set for the remaining node giving it the bit pattern 100.

Suppose C connects to all 3 A, B and D. In this case, the graph has a 2-simplex with 3 vertexes {A,C,D} and a 1-simplex with 2 vertexes {B, C}. Proceeding as above, after processing the largest simplex, A, C and D will have bit patterns 001, 010, 100. For folding B into this range, the available bit patterns with 2 or more bits set are: 011, 101, 110 and 111. All of these except 101 intersect with C so B would get the bit pattern 101.

The question then becomes: How efficiently can you find the largest fully-connected simplexes?

If finding the largest fully connected simplex is too expensive, one could put an approximate upper bound on potential fully connected simplexes by finding maximal minimums in terms of connections:

Sweep through the edges updating the
vertices with a count of the
connecting edges.
For each connected node, create an array of Cn counts initially zero
where Cn is the count of edges
connected to the node n.
Sweep through the edges again, for the connected nodes n1 and n2,
increment the count in n1
corresponding to Cn2 and vice versa.
If Cn2 > Cn1, update the last count
in the n1 array and vice versa.
Sweep through the connected nodes again, calculating an upper bound on
the largest simplex each node could
be a part of. You could build a pigeon-hole array with a list of vertices
for each upper bound as you sweep through the nodes.
Work through the pigeon-hole array from largest to smallest extracting and
folding nodes into unique sets.