如何从字符数组中查找单词?

发布于 2024-11-08 07:24:51 字数 371 浏览 0 评论 0原文

解决这个问题的最佳方法是什么:

我有一组数组,每个数组中有 3-4 个字符,如下所示:

{p,     {a,    {t,    {m,
 q,      b,     u,     n,
 r,      c      v      o
 s      }      }      }
}

我还有一组字典单词。

查找字符数组是否可以组合形成字典单词之一的最佳/最快方法是什么?例如,上面的数组可以组成单词:

“pat”、“rat”、“at”、“to”、“bum”(笑)
但不是“nub”或“mat”

我应该循环遍历字典来查看是否有单词可以制作或获取字母的所有组合,然后将它们与字典进行比较

What is the best way to solve this:

I have a group of arrays with 3-4 characters inside each like so:

{p,     {a,    {t,    {m,
 q,      b,     u,     n,
 r,      c      v      o
 s      }      }      }
}

I also have an array of dictionary words.

What is the best/fastest way to find if the array of characters can combine to form one of the dictionary words? For example, the above arrays could make the words:

"pat","rat","at","to","bum"(lol)
but not "nub" or "mat"

Should i loop through the dictionary to see if words can be made or get all the combinations from the letters then compare those to the dictionary

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

仲春光 2024-11-15 07:24:51

我手头上有一些拼字游戏代码,所以我能够将它们组合在一起。我使用的字典是sowpods(267751个单词)。下面的代码将字典读取为文本文件,每行一个大写单词。

代码是 C#:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Diagnostics;

namespace SO_6022848
{
  public struct Letter
  {
    public const string Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    public static implicit operator Letter(char c)
    {
      return new Letter() { Index = Chars.IndexOf(c) };
    }
    public int Index;
    public char ToChar()
    {
      return Chars[Index];
    }
    public override string ToString()
    {
      return Chars[Index].ToString();
    }
  }

  public class Trie
  {
    public class Node
    {
      public string Word;
      public bool IsTerminal { get { return Word != null; } }
      public Dictionary<Letter, Node> Edges = new Dictionary<Letter, Node>();
    }

    public Node Root = new Node();

    public Trie(string[] words)
    {
      for (int w = 0; w < words.Length; w++)
      {
        var word = words[w];
        var node = Root;
        for (int len = 1; len <= word.Length; len++)
        {
          var letter = word[len - 1];
          Node next;
          if (!node.Edges.TryGetValue(letter, out next))
          {
            next = new Node();
            if (len == word.Length)
            {
              next.Word = word;
            }
            node.Edges.Add(letter, next);
          }
          node = next;
        }
      }
    }

  }

  class Program
  {
    static void GenWords(Trie.Node n, HashSet<Letter>[] sets, int currentArrayIndex, List<string> wordsFound)
    {
      if (currentArrayIndex < sets.Length)
      {
        foreach (var edge in n.Edges)
        {
          if (sets[currentArrayIndex].Contains(edge.Key))
          {
            if (edge.Value.IsTerminal)
            {
              wordsFound.Add(edge.Value.Word);
            }
            GenWords(edge.Value, sets, currentArrayIndex + 1, wordsFound);
          }
        }
      }
    }

    static void Main(string[] args)
    {
      const int minArraySize = 3;
      const int maxArraySize = 4;
      const int setCount = 10;
      const bool generateRandomInput = true;

      var trie = new Trie(File.ReadAllLines("sowpods.txt"));
      var watch = new Stopwatch();
      var trials = 10000;
      var wordCountSum = 0;
      var rand = new Random(37);

      for (int t = 0; t < trials; t++)
      {
        HashSet<Letter>[] sets;
        if (generateRandomInput)
        {
          sets = new HashSet<Letter>[setCount];
          for (int i = 0; i < setCount; i++)
          {
            sets[i] = new HashSet<Letter>();
            var size = minArraySize + rand.Next(maxArraySize - minArraySize + 1);
            while (sets[i].Count < size)
            {
              sets[i].Add(Letter.Chars[rand.Next(Letter.Chars.Length)]);
            }
          }
        }
        else
        {
          sets = new HashSet<Letter>[] { 
          new HashSet<Letter>(new Letter[] { 'P', 'Q', 'R', 'S' }), 
          new HashSet<Letter>(new Letter[] { 'A', 'B', 'C' }), 
          new HashSet<Letter>(new Letter[] { 'T', 'U', 'V' }), 
          new HashSet<Letter>(new Letter[] { 'M', 'N', 'O' }) };
        }

        watch.Start();
        var wordsFound = new List<string>();
        for (int i = 0; i < sets.Length - 1; i++)
        {
          GenWords(trie.Root, sets, i, wordsFound);
        }
        watch.Stop();
        wordCountSum += wordsFound.Count;
        if (!generateRandomInput && t == 0)
        {
          foreach (var word in wordsFound)
          {
            Console.WriteLine(word);
          }
        }
      }
      Console.WriteLine("Elapsed per trial = {0}", new TimeSpan(watch.Elapsed.Ticks / trials));
      Console.WriteLine("Average word count per trial = {0:0.0}", (float)wordCountSum / trials);
    }

  }

}

这是使用测试数据时的输出:

PA
PAT
PAV
QAT
RAT
RATO
RAUN
SAT
SAU
SAV
SCUM
AT
AVO
BUM
BUN
CUM
TO
UM
UN
Elapsed per trial = 00:00:00.0000725
Average word count per trial = 19.0

以及使用随机数据时的输出(不打印每个单词):

Elapsed per trial = 00:00:00.0002910
Average word count per trial = 62.2

编辑: 我通过两个更改使其速度更快: 存储单词位于特里树的每个终端节点,这样就不必重建它。并将输入字母存储为哈希集数组而不是数组数组,以便 Contains() 调用速度更快。

I had some Scrabble code laying around, so I was able to throw this together. The dictionary I used is sowpods (267751 words). The code below reads the dictionary as a text file with one uppercase word on each line.

The code is C#:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Diagnostics;

namespace SO_6022848
{
  public struct Letter
  {
    public const string Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    public static implicit operator Letter(char c)
    {
      return new Letter() { Index = Chars.IndexOf(c) };
    }
    public int Index;
    public char ToChar()
    {
      return Chars[Index];
    }
    public override string ToString()
    {
      return Chars[Index].ToString();
    }
  }

  public class Trie
  {
    public class Node
    {
      public string Word;
      public bool IsTerminal { get { return Word != null; } }
      public Dictionary<Letter, Node> Edges = new Dictionary<Letter, Node>();
    }

    public Node Root = new Node();

    public Trie(string[] words)
    {
      for (int w = 0; w < words.Length; w++)
      {
        var word = words[w];
        var node = Root;
        for (int len = 1; len <= word.Length; len++)
        {
          var letter = word[len - 1];
          Node next;
          if (!node.Edges.TryGetValue(letter, out next))
          {
            next = new Node();
            if (len == word.Length)
            {
              next.Word = word;
            }
            node.Edges.Add(letter, next);
          }
          node = next;
        }
      }
    }

  }

  class Program
  {
    static void GenWords(Trie.Node n, HashSet<Letter>[] sets, int currentArrayIndex, List<string> wordsFound)
    {
      if (currentArrayIndex < sets.Length)
      {
        foreach (var edge in n.Edges)
        {
          if (sets[currentArrayIndex].Contains(edge.Key))
          {
            if (edge.Value.IsTerminal)
            {
              wordsFound.Add(edge.Value.Word);
            }
            GenWords(edge.Value, sets, currentArrayIndex + 1, wordsFound);
          }
        }
      }
    }

    static void Main(string[] args)
    {
      const int minArraySize = 3;
      const int maxArraySize = 4;
      const int setCount = 10;
      const bool generateRandomInput = true;

      var trie = new Trie(File.ReadAllLines("sowpods.txt"));
      var watch = new Stopwatch();
      var trials = 10000;
      var wordCountSum = 0;
      var rand = new Random(37);

      for (int t = 0; t < trials; t++)
      {
        HashSet<Letter>[] sets;
        if (generateRandomInput)
        {
          sets = new HashSet<Letter>[setCount];
          for (int i = 0; i < setCount; i++)
          {
            sets[i] = new HashSet<Letter>();
            var size = minArraySize + rand.Next(maxArraySize - minArraySize + 1);
            while (sets[i].Count < size)
            {
              sets[i].Add(Letter.Chars[rand.Next(Letter.Chars.Length)]);
            }
          }
        }
        else
        {
          sets = new HashSet<Letter>[] { 
          new HashSet<Letter>(new Letter[] { 'P', 'Q', 'R', 'S' }), 
          new HashSet<Letter>(new Letter[] { 'A', 'B', 'C' }), 
          new HashSet<Letter>(new Letter[] { 'T', 'U', 'V' }), 
          new HashSet<Letter>(new Letter[] { 'M', 'N', 'O' }) };
        }

        watch.Start();
        var wordsFound = new List<string>();
        for (int i = 0; i < sets.Length - 1; i++)
        {
          GenWords(trie.Root, sets, i, wordsFound);
        }
        watch.Stop();
        wordCountSum += wordsFound.Count;
        if (!generateRandomInput && t == 0)
        {
          foreach (var word in wordsFound)
          {
            Console.WriteLine(word);
          }
        }
      }
      Console.WriteLine("Elapsed per trial = {0}", new TimeSpan(watch.Elapsed.Ticks / trials));
      Console.WriteLine("Average word count per trial = {0:0.0}", (float)wordCountSum / trials);
    }

  }

}

Here is the output when using your test data:

PA
PAT
PAV
QAT
RAT
RATO
RAUN
SAT
SAU
SAV
SCUM
AT
AVO
BUM
BUN
CUM
TO
UM
UN
Elapsed per trial = 00:00:00.0000725
Average word count per trial = 19.0

And the output when using random data (does not print each word):

Elapsed per trial = 00:00:00.0002910
Average word count per trial = 62.2

EDIT: I made it much faster with two changes: Storing the word at each terminal node of the trie, so that it doesn't have to be rebuilt. And storing the input letters as an array of hash sets instead of an array of arrays, so that the Contains() call is fast.

桃酥萝莉 2024-11-15 07:24:51

解决这个问题的方法可能有很多。

您感兴趣的是可用于组成单词的每个字符的数量,以及每个字典单词需要多少个字符。诀窍在于如何有效地在字典中查找这些信息。

也许您可以使用前缀树(trie)、某种智能哈希表或类似的。

不管怎样,你可能必须尝试所有的可能性,并对照字典进行检查。即,如果您有三个数组,每个数组包含三个值,则将有 3^3+3^2+3^1=39 个组合需要检查。如果这个过程太慢,那么也许你可以在字典前面添加一个 Bloom 过滤器,快速检查某个单词是否肯定不在字典中。

编辑: 不管怎样,这不是和 Scrabble 本质上一样吗?也许尝试谷歌搜索“拼字游戏算法”会给你一些很好的线索。

There are probably many way of solving this.

What you are interested in is the number of each character you have available to form a word, and how many of each character is required for each dictionary word. The trick is how to efficiently look up this information in the dictionary.

Perhaps you can use a prefix tree (a trie), some kind of smart hash table, or similar.

Anyway, you will probably have to try out all your possibilities and check them against the dictionary. I.e., if you have three arrays of three values each, there will be 3^3+3^2+3^1=39 combinations to check out. If this process is too slow, then perhaps you could stick a Bloom filter in front of the dictionary, to quickly check if a word is definitely not in the dictionary.

EDIT: Anyway, isn't this essentially the same as Scrabble? Perhaps try Googling for "scrabble algorithm" will give you some good clues.

度的依靠╰つ 2024-11-15 07:24:51

只需通过生成和测试即可回答重新表述的问题。由于您有 4 个字母和 10 个数组,因此您只有大约 100 万种可能的组合(如果允许空白字符,则为 1000 万种)。您需要一种有效的方法来查找它们,使用 BDB 或某种基于磁盘的哈希。

之前发布的 trie 解决方案应该也可以工作,只是您在搜索的每一步中可以选择的字符更多地受到限制。它也应该更快。

The reformulated question can be answered just by generating and testing. Since you have 4 letters and 10 arrays, you've only got about 1 million possible combinations (10 million if you allow a blank character). You'll need an efficient way to look them up, use a BDB or some sort of disk based hash.

The trie solution previously posted should work as well, you are just restricted more by what characters you can choose at each step of the search. It should be faster as well.

多彩岁月 2024-11-15 07:24:51

我刚刚做了一个非常大的嵌套 for 循环,如下所示:

for(NSString*s1 in [letterList objectAtIndex:0]{
    for(NSString*s2 in [letterList objectAtIndex:1]{
       8 more times...
    }
}

然后我对组合进行二分搜索,看看它是否在字典中,如果在,则将其添加到数组中

I just made a very large nested for loop like this:

for(NSString*s1 in [letterList objectAtIndex:0]{
    for(NSString*s2 in [letterList objectAtIndex:1]{
       8 more times...
    }
}

Then I do a binary search on the combination to see if it is in the dictionary and add it to an array if it is

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文