改进单词搜索游戏最坏的情况

发布于 2024-12-07 15:48:27 字数 2962 浏览 2 评论 0原文

考虑:

a c p r c 
x s o p c 
v o v n i 
w g f m n 
q a t i t

如果 i_index 位于 旁边,则拼贴中的字母表 i_index 与另一个字母表 j_index 相邻 j_index位于以下任意位置:

* * *
* x *
* * *

这里所有的*表示与x相邻的位置。

任务是在图块中找到给定的字符串。条件是给定字符串的所有字符都应该相邻,并且图块中的任何一个字符都不能多次使用来构造给定字符串。

我想出了一个简单的回溯解决方案,该解决方案的速度相当快,但最坏情况下的时间确实更糟。

举个例子:假设图块有 4x4 填充了所有 a,因此有 16 个 a,要查找的字符串是 aaaaaaaaaaaaaaaab ,即 15 个 a 和 1 个 b 。一种是消除包含未出现在图块中的字符的字符串。但最坏的情况仍然可能出现,比如图块有 abababababababab 并且要查找的字符串是 abababababababbb

我的尝试是这样的:

https://ideone.com/alUPf:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define MAX 5

int sp (char mat[MAX][MAX], char *pat, int c, int i, int j)
{
  int r = 0;
  char temp;


  if (c == strlen (pat))
    return 1;
  if (((i<0) || (j<0)) || (i>=MAX) || (j>=MAX))
        return 0;
  if (mat[i][j] != pat[c])
    return 0;
  if (isupper (mat[i][j]))
    return 0;


  /* Save character and mark location to indicate
   * DFS has visited this node, to stop other branches
   * to enter here and cross over path
   */
  temp = mat[i][j];
  mat[i][j] = 0;

  r |= sp (mat, pat, c+1, i-1, j-1);
  r |= sp (mat, pat, c+1, i-1, j);
  r |= sp (mat, pat, c+1, i-1, j+1);
  r |= sp (mat, pat, c+1, i, j+1);
  r |= sp (mat, pat, c+1, i+1, j+1);
  r |= sp (mat, pat, c+1, i+1, j);
  r |= sp (mat, pat, c+1, i+1, j-1);
  r |= sp (mat, pat, c+1, i, j-1);

  /* restore value */
  mat[i][j] = temp;

  /* mark if success */
  if ((mat[i][j] == pat[c]) && (r == 1))
    mat[i][j] = toupper (mat[i][j]);

  return r;
}

/* Testing the `sp` module */
int main (void)
{
  char mat[MAX][MAX] = {
                     {'a', 'c', 'p', 'r', 'c'},
                     {'x', 's', 'o', 'p', 'c'},
                     {'v', 'o', 'v', 'n', 'i'},
                     {'w', 'g', 'f', 'm', 'n'},
                     {'q', 'a', 't', 'i', 't'}
                   };
  char pat[] = "microsoft";
  int i, j;

  for (i=0; i<5; i++)
  {
    for (j=0; j<5; j++)
      printf ("%c ", mat[i][j]);
    printf ("\n");
  }

  for (i=0; i<5; i++)
    for (j=0; j<5; j++)
      sp (mat, pat, 0, i, j);

  printf ("\n\n\n");
  for (i=0; i<5; i++)
  {
    for (j=0; j<5; j++)
    {
      if (isupper (mat[i][j]))
        printf ("%c ", mat[i][j]);
      else
        printf (". ");
    }
    printf ("\n");
  }
  printf ("\n");
  return 0;
}

打印:

a c p r c 
x s o p c 
v o v n i 
w g f m n 
q a t i t 



. . . R . 
. S O . C 
. O . . I 
. . F M . 
. . T . . 

函数 sp 完成工作,执行回溯。

有更好的办法吗?或者是否有可能缩短最坏情况的时间?

Consider:

a c p r c 
x s o p c 
v o v n i 
w g f m n 
q a t i t

An alphabet i_index is adjacent to another alphabet j_index in the tile if i_index is next to j_index in any of the following positions:

* * *
* x *
* * *

Here all the * indicates the location which are adjacent to x.

The task is to find a given string in the tile. The condition is that all the characters of the given string should be adjacent, and no one character in the tile may be used more than once to construct the given string.

I have came up with a simply backtracking solution, for which the solutions are pretty fast, but the worst case time is really worse.

For an example: Say the tile has 4x4 filled with all a's , therefore 16 a's, and the string to find is aaaaaaaaaaaaaaab, that is, 15 a's and one b . One what is to eliminate strings with characters which does not appear in the tile. But still worst case can still appear with say the tile have abababababababab and the string to find is abababababababbb .

My attempt is like this:

https://ideone.com/alUPf:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define MAX 5

int sp (char mat[MAX][MAX], char *pat, int c, int i, int j)
{
  int r = 0;
  char temp;


  if (c == strlen (pat))
    return 1;
  if (((i<0) || (j<0)) || (i>=MAX) || (j>=MAX))
        return 0;
  if (mat[i][j] != pat[c])
    return 0;
  if (isupper (mat[i][j]))
    return 0;


  /* Save character and mark location to indicate
   * DFS has visited this node, to stop other branches
   * to enter here and cross over path
   */
  temp = mat[i][j];
  mat[i][j] = 0;

  r |= sp (mat, pat, c+1, i-1, j-1);
  r |= sp (mat, pat, c+1, i-1, j);
  r |= sp (mat, pat, c+1, i-1, j+1);
  r |= sp (mat, pat, c+1, i, j+1);
  r |= sp (mat, pat, c+1, i+1, j+1);
  r |= sp (mat, pat, c+1, i+1, j);
  r |= sp (mat, pat, c+1, i+1, j-1);
  r |= sp (mat, pat, c+1, i, j-1);

  /* restore value */
  mat[i][j] = temp;

  /* mark if success */
  if ((mat[i][j] == pat[c]) && (r == 1))
    mat[i][j] = toupper (mat[i][j]);

  return r;
}

/* Testing the `sp` module */
int main (void)
{
  char mat[MAX][MAX] = {
                     {'a', 'c', 'p', 'r', 'c'},
                     {'x', 's', 'o', 'p', 'c'},
                     {'v', 'o', 'v', 'n', 'i'},
                     {'w', 'g', 'f', 'm', 'n'},
                     {'q', 'a', 't', 'i', 't'}
                   };
  char pat[] = "microsoft";
  int i, j;

  for (i=0; i<5; i++)
  {
    for (j=0; j<5; j++)
      printf ("%c ", mat[i][j]);
    printf ("\n");
  }

  for (i=0; i<5; i++)
    for (j=0; j<5; j++)
      sp (mat, pat, 0, i, j);

  printf ("\n\n\n");
  for (i=0; i<5; i++)
  {
    for (j=0; j<5; j++)
    {
      if (isupper (mat[i][j]))
        printf ("%c ", mat[i][j]);
      else
        printf (". ");
    }
    printf ("\n");
  }
  printf ("\n");
  return 0;
}

which prints:

a c p r c 
x s o p c 
v o v n i 
w g f m n 
q a t i t 



. . . R . 
. S O . C 
. O . . I 
. . F M . 
. . T . . 

The function sp does the work, performs the back tracking.

Is there a better way ? or is it possible to lower the worst case time ?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

终陌 2024-12-14 15:48:27

没有多项式算法,所以我认为你不会变得更好......

可以使用字母矩阵对任何网格图(具有度数<= 4的节点的平面图)进行编码。下面的网格

0-0-0
| | |
0 0-0
| | 
0-0-0

可以通过将边变成“a”、将顶点变成“b”、将空格变成“z”来转换

B a B a B  
a z a z a  
B z B a B  
a z a z z  
B a B a B 

在原始图中寻找哈密顿路径相当于搜索字符串 BaBaBaBaBaBaBaBaB(全部 9 个 B)。但是网格的哈密顿路径问题是 NP 完全*的,因此单词搜索问题是 NP 困难的。

由于单词路径显然是多项式证明,因此矩阵上的单词搜索问题是NP完全的


*我记得不久前看到过一个证明,维基百科证实了这一点,但没有链接到参考文献>:/


我很确定这个问题还有更多内容。我刚刚从我的屁股里拿出了这个证据,我很确定我不是第一个看到这个问题的人。至少有一个很好的机会对你在真正的杂志谜题中得到的非退化案例进行很好的启发......

There is no polynomial algorithm, so I don't think you can get much better...

It is possible to encode any grid graph (a planar graph with nodes with degree <= 4) using a letter matrix. The following grid

0-0-0
| | |
0 0-0
| | 
0-0-0

Can be converted by turning edges into 'a's, vertices into 'b's and empty spaces into 'z's

B a B a B  
a z a z a  
B z B a B  
a z a z z  
B a B a B 

Looking for a hamiltonian path in the original graph is equivalent to searching for the string BaBaBaBaBaBaBaBaB (with all 9 Bs). But the Hamiltonian path problem for grids in NP-complete* so the word searching problem is NP-hard.

Since a word path is clearly a polynomial certificate, the word searching problem on matrices is NP-complete.


*I remember seeing a proof for this a while ago and Wikipedia confirms, but without linking to a reference >:/


I'm pretty sure theres more on this problem out there. I just pulled this proof out of my ass and I'm pretty sure were not the first ones to look at the problem. At the least there is a good chance for nice heuristics on the non-degenerate cases you get in a real magazine puzzle...

黯然 2024-12-14 15:48:27

一个简单的改进是在每次调用 sp 后检查 r 的值。如果r == 1则停止调用sp。你找到了你的解决方案。除非您需要找到所有可能的解决方案。

像这样的东西(如果第一个操作数为真,逻辑“或”运算符不会计算第二个操作数):

r = sp (mat, pat, c+1, i-1, j-1)) ||
  sp (mat, pat, c+1, i-1, j) ||
  sp (mat, pat, c+1, i-1, j+1) ||
  sp (mat, pat, c+1, i, j+1) ||
  sp (mat, pat, c+1, i+1, j+1) ||
  sp (mat, pat, c+1, i+1, j) ||
  sp (mat, pat, c+1, i+1, j-1) ||
  sp (mat, pat, c+1, i, j-1) ? 1 : 0;

One simple improvement is to check the value of r after each call to sp. If r == 1 then stop calling sp. You found your solution. This is unless you need to find all possible solutions.

Something like this (logical OR operator does not calculate second operand if first is true):

r = sp (mat, pat, c+1, i-1, j-1)) ||
  sp (mat, pat, c+1, i-1, j) ||
  sp (mat, pat, c+1, i-1, j+1) ||
  sp (mat, pat, c+1, i, j+1) ||
  sp (mat, pat, c+1, i+1, j+1) ||
  sp (mat, pat, c+1, i+1, j) ||
  sp (mat, pat, c+1, i+1, j-1) ||
  sp (mat, pat, c+1, i, j-1) ? 1 : 0;
风月客 2024-12-14 15:48:27

我想你可能会发现,关注最坏的情况会适得其反,因为没有真正的改进可做。然而,在“现实世界”的情况下还有许多有用的改进需要进行。例如,如果单词总是从字典中提取,它们的长度是否可能受到限制(或者从统计角度来看,具有自然的长度分布)。对于小网格,您可以想象提前搜索它们以从字典中查找所有单词,将列表存储在哈希图中,然后在需要“测试”单词时执行简单的查找。所有时间都花在构建索引上,但如果网格很少发生变化,这可能是可以接受的。

I think you might find that focusing on the worst case is counterproductive here, because there is no real improvement to be made. However, there are many useful improvements to be made in "real world" cases. For example, if the words are always drawn from a dictionary, if they may be limited in length (or have a natural distribution of lengths, statistically speaking). For small grids you could imagine searching them in advance to find all words from a dictionary, storing the list in a hashmap, and then performing a simple lookup as words need to be "tested". All the time goes into building your index, but that may be acceptable if the grid rarely changes.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文