当前位置：文江博客话题详情

最大可能的字母矩形

发布于 2024-12-21 06:51:56 字数 552 浏览 1 评论 0原文

编写一个程序来找到最大可能的字母矩形，使得每一行形成一个单词（从左到右），每列形成一个单词（从上到下）。

我发现这个有趣的问题。这不是家庭作业，尽管听起来可能是家庭作业。（我不在学校）。我这样做是为了好玩。

示例
来自cat、汽车、ape、api、rep、< em>提示我们得到以下矩形（这是一个正方形）：
<前><代码>汽车猿提示

我最初的想法是构建某种前缀树，这样我就可以检索以特定字符串开头的所有单词。当我们已经有 2 个或更多单词（从上到下或从左到右阅读）并且需要找到下一个要添加的单词时，这将很有用。

还有其他想法吗？

编辑

可以用长方体（3D 矩形）来完成吗？

如果它需要在对角线上有有效的单词怎么办（想法来源：user645466）；如何优化它的算法？

原文

Write a program to find the largest possible rectangle of letters such that every row forms a word (left to right) and every column forms a word (top to bottom).

I found this interesting question. It's not homework, though it may sound as such. (I'm not in school). I'm doing this for fun.

Example
From cat, car, ape, api, rep, tip we get the following rectangle (which is a square):
c a r
a p e
t i p

My initial idea is to build a some sort of a prefix tree so I can retrieve all words that start with a specific string. This would be useful when we already have 2 or more words (either reading top to bottom or left to right) and we need to find the next word to add.

Any other ideas?

Edit

Could this be done with a cuboid (3D rectangle)?

What if it needs to have valid words on the diagonals (idea credit: user645466); how would the algo for it be optimized?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

開玄 2024-12-28 06:51:56

设 D 为字典。固定 m 和 n。我们可以将寻找 m × n 矩形的问题表述为约束满足问题 (CSP)。

x_i,1…x_i,n ∈ D ∀i ∈ {1, …, m}
x_1,j…x_m,j ∈ D ∀j ∈ {1, …, n}
x_i,j ∈ {A, …, Z} ∀i ∈ {1, …, m}, ∀j ∈ {1, …, n}

}解决 CSP 的常见方法是将回溯与约束传播相结合。宽松地说，回溯意味着我们选择一个变量，猜测它的值，并用更少的一个变量递归地解决子问题，而约束传播意味着尝试减少每个变量的可能性数量（可能为零，这意味着没有解决方案）。

例如，我们可以通过选择 x_1,1 = Q 来启动 3 × 3 网格。

Q??
???
???

对于英语词典，x_1,2 和 x_2,1 的唯一可能是 U（在 Scrabble 之前的“words”）。

解决 CSP 的技巧是在回溯和约束传播之间取得平衡。如果我们根本不传播约束，那么我们只是在使用蛮力。如果我们完美地传播约束，那么我们就不需要回溯，但是单独解决 NP 难问题的传播算法的运行成本可能相当昂贵。

在这个问题中，除非我们有良好的数据结构支持，否则在每个回溯节点上使用大型字典将变得昂贵。我将概述一种使用 trie 或 DAWG 快速计算字母集，通过这些字母集前缀可以扩展为完整的单词。

在每个回溯节点，我们分配的变量集是一个 Young 表。换句话说，只有在对变量上方和左侧的变量进行赋值之前，才会对变量进行赋值。在下图中，. 表示已赋值的变量，* 和 ? 表示未赋值的变量。

.....*
...*??
...*??
..*???
*?????

标记为 * 的变量是下一个要赋值的候选变量。拥有多种选择而不是每次选择固定变量的优点是某些变量排序可能比其他变量排序要好得多。

对于每个 *，在 trie/DAWG 中进行两次查找，一次查找水平方向，一次查找垂直方向，并计算下一个字母组的交集。一种经典的策略是选择可能性最少的变量，希望我们能更快地找到矛盾。我喜欢这个策略，因为当变量的可能性为零时，它会自然地修剪；当变量的可能性为 1 时，它会自然地传播。通过缓存结果，我们可以非常快速地评估每个节点。

Let D be the dictionary. Fix m and n. We can formulate the problem of finding an m × n rectangle as a constraint satisfaction problem (CSP).

x_i,1…x_i,n ∈ D    ∀i ∈ {1, …, m}
x_1,j…x_m,j ∈ D    ∀j ∈ {1, …, n}
x_i,j ∈ {A, …, Z}    ∀i ∈ {1, …, m}, ∀j ∈ {1, …, n}

A very common approach for solving CSPs is to combine backtracking with constraint propagation. Loosely speaking, backtracking means we pick a variable, guess its value, and recursively solve the subproblem with one fewer variable, and constraint propagation means trying to reduce the number of possibilities for each variable (possibly to zero, which means there's no solution).

As an example, we might start a 3 × 3 grid by choosing x_1,1 = Q.

Q??
???
???

With an English dictionary, the only possibility for x_1,2 and x_2,1 is U (in before Scrabble “words”).

The art of solving CSPs is balancing between backtracking and constraint propagation. If we don't propagate constraints at all, then we're just using brute force. If we propagate constraints perfectly, then we don't need to backtrack, but a propagation algorithm that solves an NP-hard problem by itself is probably rather expensive to run.

In this problem, working with a large dictionary at each backtracking node will get expensive unless we have good data structure support. I'll outline an approach that uses a trie or a DAWG quickly to compute the set of letters via which a prefix extends to a complete word.

At each backtracking node, the set of variables we have assigned is a Young tableau. In other words, no variable is assigned until the variables above it and to the left have been assigned. In the diagram below, . denotes an assigned variable and * and ? denote unassigned variables.

.....*
...*??
...*??
..*???
*?????

The variables marked * are candidates for the next to be assigned a value. The advantage of having multiple choices rather choosing a fixed variable each time is that some variable orderings can be much better than others.

For each *, make two lookups into the trie/DAWG, one for the horizontal and one for the vertical, and compute the intersection of the sets of letters that can come next. One classic strategy is to choose the variable with the fewest possibilities in the hope that we reach a contradiction faster. I like this strategy because it prunes naturally when there's a variable with zero possibilities and propagates naturally when there's a variable with one. By caching results, we can make evaluating each node very fast.

回复收藏 0 原文

帅的被狗咬 2024-12-28 06:51:56

给定给定长度的单词字典，创建两个新字典：第一个包含单词的所有单字母前缀（可以是给定长度的单词的第一个字母的所有字母），第二个包含所有双字母前缀单词的数量（可以是给定长度的单词的前两个字母的所有两个字母的序列）。您也可以使用三重前缀，但您可能不需要超出这个范围。

从字典中选择一个单词，命名为X。这将是矩阵的第一行。
使用您创建的方便列表检查 X[1]、X[2]、...、X[N] 是否都是有效的单字母前缀。如果是，则继续步骤 3；否则返回步骤 1。
从字典中选择一个单词，命名为 Y。这将是矩阵的第二行。
检查 X[1] Y[1]、X[2] Y[2]、...、X[N] Y[ N] 都是使用您创建的方便列表的有效双字母前缀。如果是，则继续步骤 5；否则返回步骤 3。如果这是字典中的最后一个单词，则一直返回步骤 1。
...
2(N-1)。从字典中选择一个单词，命名为Z。这将是矩阵的第 N 行。
2N。检查 X[1] Y[1] ... Z[1]、X[2] Y[2] ... Z[2]、.. ., X[N] Y[N] ... Z[N] 都是字典中的单词。如果是，那么恭喜你，你已经做到了！否则返回步骤2(N-1)。如果这是字典中的最后一个单词，则一直返回到步骤 2(n-3)。

其逻辑是一次一行地构建单词矩形，为行选择单词，然后检查该列是否可以补充为单词。这比一次添加一个字母要快得多。

回复收藏 0 原文

晌融 2024-12-28 06:51:56

为相同长度=索引的单词创建一个Bag[]，然后创建
一组 Tries，每个长度的 wordList 一个 Trie

   Rectangle makeRectangle(length, height, rectangle)
    {
        if ( length == rectangle.length()) check if complete and return;
        checkIfPartialComplete - check columns for valid prefixes
        for ( i from 1 to grouplist[i-1])
        {
            newRectangle = append word[i] to rectangle 
            return makeRectangle(l,h, rectangle with appended word) if not equal to null
        }
    }


boolean checkPartial(int l, Trie trie)
{
    for ( int i =0 ; i < l; i++)
    {
        String col = getColumn(i);
        if (!trie.contains(col))
        {
            return false;
        }
    }
    return true;
}
boolean checkComplete(int l, Bag<String> lengthGroup)
{
    for ( int i=0; i < l ; i++)
    {
        String col = getColumn(i);
        if (!lengthGroup.contains(col))
        {
            return false;
        }
    }
    return true;
}

Create a Bag[] for word of the same length = index then create
an array of Tries, one Trie for wordList of each length

   Rectangle makeRectangle(length, height, rectangle)
    {
        if ( length == rectangle.length()) check if complete and return;
        checkIfPartialComplete - check columns for valid prefixes
        for ( i from 1 to grouplist[i-1])
        {
            newRectangle = append word[i] to rectangle 
            return makeRectangle(l,h, rectangle with appended word) if not equal to null
        }
    }


boolean checkPartial(int l, Trie trie)
{
    for ( int i =0 ; i < l; i++)
    {
        String col = getColumn(i);
        if (!trie.contains(col))
        {
            return false;
        }
    }
    return true;
}
boolean checkComplete(int l, Bag<String> lengthGroup)
{
    for ( int i=0; i < l ; i++)
    {
        String col = getColumn(i);
        if (!lengthGroup.contains(col))
        {
            return false;
        }
    }
    return true;
}

回复收藏 0 原文

~没有更多了~