SQL anagram efficiency and logic?

Posted on 2024-11-18 01:38:10

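I have an SQL database of about 200,000 words. I need a query that lets me solve something like an anagram problem. The difference is that I need every word the input characters can make. For example, if you enter ofdg, it should output the words do, go and dog. Can you estimate how long such a query would take? How can I make it faster and more efficient? Also, in general, how long does it take SQL to go through a database of 200,000 rows?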


何处潇湘 2024-11-25 01:38:10

To solve this problem, the first thing you need to do is reduce every word to what Scrabble players call an alphagram. That is, all the letters in the word but in alphabetical order. So do, go and dog make do, go and dgo. Of course, any given alphagram may correspond to more than one word, so, for example, alphagram dgo corresponds to both the words dog and god.
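
A minimal Python sketch of the alphagram step, assuming plain lowercase ASCII words. `alphagram` and `group_by_alphagram` are hypothetical helper names, not from the answer. It shows the many-words-to-one-alphagram mapping: dog and god share the key dgo.

```python
from collections import defaultdict

def alphagram(word: str) -> str:
    """Return the word's letters in alphabetical order, e.g. 'dog' -> 'dgo'."""
    return "".join(sorted(word.lower()))

def group_by_alphagram(words):
    """Map each alphagram to the list of words that share it."""
    groups = defaultdict(list)
    for w in words:
        groups[alphagram(w)].append(w)
    return dict(groups)

groups = group_by_alphagram(["do", "go", "dog", "god"])
print(groups)  # {'do': ['do'], 'go': ['go'], 'dgo': ['dog', 'god']}
```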

The next thing you need to do is construct a table keyed on the alphagram plus a sequence number, with a single attribute field, word.
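
One reading of that layout, sketched with SQLite through Python's `sqlite3` module: a composite primary key `(alphagram, seq)` plus a `word` column. The table and column names (`anagrams`, `alphagram`, `seq`, `word`) are hypothetical. SQLite backs the primary key with an index, so a lookup by alphagram does not scan every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq numbers the words that share an alphagram (dog = 1, god = 2 under 'dgo').
# The composite PRIMARY KEY gives SQLite an index to search by alphagram.
conn.execute("""
    CREATE TABLE anagrams (
        alphagram TEXT    NOT NULL,
        seq       INTEGER NOT NULL,
        word      TEXT    NOT NULL,
        PRIMARY KEY (alphagram, seq)
    )
""")
conn.executemany(
    "INSERT INTO anagrams (alphagram, seq, word) VALUES (?, ?, ?)",
    [("dgo", 1, "dog"), ("dgo", 2, "god"), ("do", 1, "do"), ("go", 1, "go")],
)
rows = conn.execute(
    "SELECT word FROM anagrams WHERE alphagram = ? ORDER BY seq", ("dgo",)
).fetchall()
print(rows)  # [('dog',), ('god',)]
```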

Word lists tend to be static. For example, the two Scrabble word lists in the English-speaking world change about every 5 years or so. So you construct this lookup table beforehand. Performance is O(n) and it is a sunk cost. That is, you do it once and store it, so it is not counted against the cost of the query. It makes absolutely no sense to build such an index on the fly every time a query comes in.
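
The one-time build can be sketched as a single pass over the word list that assigns sequence numbers per alphagram and inserts everything in one transaction. This assumes SQLite, an initially empty table with the hypothetical name `anagrams`, and a hypothetical function `build_lookup_table`. In practice you would open a file-backed database rather than `:memory:`.

```python
import sqlite3
from collections import defaultdict

def build_lookup_table(conn: sqlite3.Connection, words) -> int:
    """Fill an empty anagrams table in one pass over the word list; return rows written."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS anagrams (
               alphagram TEXT    NOT NULL,
               seq       INTEGER NOT NULL,
               word      TEXT    NOT NULL,
               PRIMARY KEY (alphagram, seq))"""
    )
    last_seq = defaultdict(int)  # alphagram -> highest seq assigned so far
    rows = []
    for raw in words:
        w = raw.strip().lower()
        if not w:
            continue  # skip blank lines from a word-list file
        key = "".join(sorted(w))
        last_seq[key] += 1
        rows.append((key, last_seq[key], w))
    with conn:  # one transaction, committed once at the end
        conn.executemany("INSERT INTO anagrams VALUES (?, ?, ?)", rows)
    return len(rows)

# Run once, ahead of time; queries afterwards only read the table.
conn = sqlite3.connect(":memory:")  # in practice a file such as "words.db" (hypothetical)
count = build_lookup_table(conn, ["dog", "god", "do", "go", "fog"])
print(count)  # 5
```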

You may be wondering "What is all this about Scrabble?" The answer is that your figure of 200,000 words falls neatly between the two approved tournament word lists in the English-speaking world. The US National Scrabble Association's Official Tournament and Club Word List (2006) contains 178,691 words, and the international list, maintained by the World English Scrabble Players' Association, contains 246,691.

When you get a query you reduce the supplied word to a bunch of alphagrams. Input odfg makes alphagrams do fo go df dg fg dfo dgo fgo dfg dfgo (which is an awkward programming problem in pure SQL, so I have to assume there is a PHP or Python or JavaScript front-end that will do that for you). Then you do the lookup in the database. With an index on the alphagram column, each lookup should cost approximately O(log2 n), in other words pretty damn immediate, and a four-letter input yields only these 11 alphagrams to look up. That sort of query is what relational databases are good at.
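
The front-end step might look like this in Python, with SQLite standing in for the database. Choosing letters with `itertools.combinations` from the sorted rack produces alphagrams directly, and a set removes repeats when the rack has duplicate letters. The helper names and demo rows are hypothetical. Sending all keys in one `IN (...)` query is a design choice of this sketch rather than something the answer prescribes. It saves round trips, while the database still probes the index once per key, so the cost tracks the number of sub-alphagrams (at most 2^k for k letters) rather than the 200,000 rows.

```python
import sqlite3
from itertools import combinations

def sub_alphagrams(rack, min_len=2):
    """Every distinct alphagram of length >= min_len that the rack's letters can make."""
    letters = sorted(rack.lower())
    found = set()  # drops repeats when the rack contains duplicate letters
    for k in range(min_len, len(letters) + 1):
        # combinations() preserves input order, so each combo is already sorted
        for combo in combinations(letters, k):
            found.add("".join(combo))
    return sorted(found, key=lambda a: (len(a), a))

def lookup_words(conn, rack):
    """Fetch every word whose alphagram is one of the rack's sub-alphagrams."""
    keys = sub_alphagrams(rack)
    if not keys:
        return []
    placeholders = ",".join("?" * len(keys))
    sql = (f"SELECT word FROM anagrams WHERE alphagram IN ({placeholders}) "
           "ORDER BY length(alphagram), alphagram, seq")
    return [word for (word,) in conn.execute(sql, keys)]

# Tiny demo table with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE anagrams (alphagram TEXT, seq INTEGER, word TEXT, "
             "PRIMARY KEY (alphagram, seq))")
conn.executemany("INSERT INTO anagrams VALUES (?, ?, ?)", [
    ("do", 1, "do"), ("do", 2, "od"), ("fo", 1, "of"), ("go", 1, "go"),
    ("dgo", 1, "dog"), ("dgo", 2, "god"), ("fgo", 1, "fog"), ("dgu", 1, "dug"),
])
print(sub_alphagrams("odfg"))      # ['df', 'dg', 'do', 'fg', 'fo', 'go', 'dfg', 'dfo', 'dgo', 'fgo', 'dfgo']
print(lookup_words(conn, "odfg"))  # ['do', 'od', 'of', 'go', 'dog', 'god', 'fog']
```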

BTW, your example output is poor. Alphagram dfgo with what Scrabble players call 'build' (all possible subsets) makes do od of go dog god fog.
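
To sanity-check a build like this without the database, a brute-force test can compare letter counts: a word can be built if it never needs a letter more times than the rack supplies. This is a different technique from the alphagram index. It scans every word on each query, so it only serves as a cross-check. The function names and the sample word list are hypothetical.

```python
from collections import Counter

def can_build(word: str, rack: str) -> bool:
    """True if every letter of word is available in rack, counting repeats."""
    need = Counter(word)
    have = Counter(rack)
    return all(have[ch] >= n for ch, n in need.items())

def build(rack: str, words, min_len: int = 2):
    """Brute-force scan: keep words of at least min_len letters that the rack can make."""
    return [w for w in words if len(w) >= min_len and can_build(w, rack)]

sample = ["do", "od", "of", "go", "dog", "god", "fog", "dug", "food", "goof"]
print(build("odfg", sample))  # ['do', 'od', 'of', 'go', 'dog', 'god', 'fog']
```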

(I hate to have to do this rigmarole, but Hasbro's lawyers are touchy, so: Scrabble is a registered trademark owned in the USA by Hasbro, Inc.; in Canada by Hasbro Canada Corporation; and throughout the rest of the world by J. W. Spear & Sons, a Mattel Company.)
