生成填字游戏的算法
给定一个单词列表,您将如何将它们排列成填字游戏网格?
它不必像对称的“正确”填字游戏或类似的东西:基本上只是输出每个单词的起始位置和方向。
Given a list of words, how would you go about arranging them into a crossword grid?
It wouldn't have to be like a "proper" crossword puzzle which is symmetrical or anything like that: basically just output a starting position and direction for each word.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(13)
虽然这是一个较旧的问题,但我会根据我所做的类似工作尝试回答。
解决约束问题的方法有很多(一般属于 NPC 复杂性类)。
这与组合优化和约束规划有关。 在这种情况下,约束是网格的几何形状和单词唯一的要求等。
随机化/退火方法也可以工作(尽管在适当的设置内)。
高效简约或许才是终极智慧!
要求是或多或少完整的填字游戏编译器和(可视化所见即所得)构建器。
抛开所见即所得构建器部分,编译器概要如下:
加载可用的单词列表(按单词长度排序,即 2,3,..,20)
在用户构建的网格上查找词槽(即网格词)(例如,位于 x,y 处的词,长度为 L,水平或垂直)(复杂度 O(N) )< /p>
计算网格词的交点(需要填充的)(复杂度 O(N^2) )
计算交集单词列表中单词的数量以及所使用的字母表中的各个字母(这允许使用模板搜索匹配的单词,例如Sik Cambon 论文所使用的通过 cwc )(复杂度 O(WL*AL) )
步骤 .3 和 .4 允许执行此任务:
A。 网格单词与其自身的交叉点能够创建一个“模板”,用于尝试在该网格单词的可用单词的相关单词列表中找到匹配项(通过使用与该单词相交的其他单词的字母,这些单词已经在某个位置填充)算法的步骤)
b. 单词列表中的单词与字母表的交集能够找到与给定“模板”匹配的匹配(候选)单词(例如,第一个位置的“A”和第三个位置的“B”等。)
因此,使用这些数据结构实现所使用的算法是这样的:
注意:如果网格和单词数据库是恒定的,则前面的步骤只需完成一次。
算法的第一步是随机选择一个空词槽(网格词),并用关联词列表中的候选词填充它(随机化能够在算法的连续执行中产生不同的解决方案)(复杂度 O(1)或 O(N) )
对于每个仍然为空的字槽(与已填充的字槽有交集),计算约束比(这可能会有所不同,简单的是可用解决方案的数量该步骤)并按此比率对空字槽进行排序(复杂度 O(NlogN) 或 O(N) )
循环遍历上一步计算的空字槽并为每个空字槽尝试一些候选解决方案(确保“弧一致性被保留”,即如果使用这个词,网格在这一步之后有一个解决方案)并根据下一步的最大可用性对它们进行排序(即下一步有最大可能)解决方案,如果该单词当时在该地点使用,等等..)(复杂度 O(N*MaxCandidatesUsed) )
填充该单词(将其标记为已填充并转到步骤 2)
如果没有找到满足步骤 .3 的标准的单词,请尝试回溯到先前步骤的另一个候选解决方案(此处的标准可能有所不同)(复杂度 O(N) )
如果找到回溯,请使用替代方案,并可选择重置任何已填充的单词可能需要重置(再次将它们标记为未填充)(复杂度 O(N) )
如果没有找到回溯,则无法找到解决方案(至少使用此配置、初始种子等。)
否则,当所有字块都被填满时,您有一个解决方案
该算法对问题的解决方案树进行随机一致游走。 如果在某个时刻出现死胡同,它会回溯到前一个节点并遵循另一条路线。 直到找到解决方案或各个节点的候选数量耗尽。
一致性部分确保找到的解决方案确实是一个解决方案,而随机部分能够在不同的执行中产生不同的解决方案,并且平均而言具有更好的性能。
附言。 所有这些(以及其他)都是在纯 JavaScript(具有并行处理和所见即所得)功能
PS2 中实现的。 该算法可以轻松并行化,以便同时产生多个(不同的)解决方案
希望这有帮助
Although this is an older question, will attempt an answer based on similar work i have done.
There are many approaches to solving constraint problems (which generallay are in NPC complexity class).
This is related to combinatorial optimization and constraint programming. In this case the constraints are the geometry of the grid and the requirement that words are unique etc..
Randomization/Annealing approaches can also work (although within the proper setting).
Efficient simplicity might just be the ultimate wisdom !
The requirements were for a more or less complete crossword compiler and (visual WYSIWYG) builder.
Leaving aside the WYSIWYG builder part, the compiler outline was this:
Load the available wordlists (sorted by word length, ie 2,3,..,20)
Find the wordslots (ie grid words) on the user-constructed grid (eg word at x,y with length L, horizontal or vertical) ( complexity O(N) )
Compute the intersecting points of the grid words (that need to be filled) ( complexity O(N^2) )
Compute the intersections of the words in the wordlists with the various letters of the alphabet used (this allows to search for matching words by using a template eg. Sik Cambon thesis as used by cwc ) ( complexity O(WL*AL) )
Steps .3 and .4 allow to do this task:
a. The intersections of the grid words with themselves enable to create a "template" for trying to find matches in the associated wordlist of available words for this grid word (by using the letters of other intersecting words with this word which are already filled at a certain step of the algorithm)
b. The intersections of the words in a wordlist with the alphabet enable to find matching (candidate) words that match a given "template" (eg 'A' in 1st place and 'B' in 3rd place etc..)
So with these data structures implemented the algorithm used was sth like this:
NOTE: if the grid and the database of words are constant the previous steps can just be done once.
First step of the algorithm is select an empty wordslot (grid word) at random and fill it with a candidate word from its associated wordlist (randomization enables to produce different solutons in consecutive executions of the algorithm) ( complexity O(1) or O(N) )
For each still empty word slots (that have intersections with already filled wordslots), compute a constraint ratio (this can vary, sth simple is the number of available solutions at that step) and sort the empty wordslots by this ratio ( complexity O(NlogN) or O(N) )
Loop through the empty wordslots computed at previous step and for each one try a number of cancdidate solutions (making sure that "arc-consistency is retained", ie grid has a solution after this step if this word is used) and sort them according to maximum availability for next step (ie next step has a maximum possible solutions if this word is used at that time in that place, etc..) ( complexity O(N*MaxCandidatesUsed) )
Fill that word (mark it as filled and go to step 2)
If no word found that satisfies the criteria of step .3 try to backtrack to another candidate solution of some previous step (criteria can vary here) ( complexity O(N) )
If backtrack found, use the alternative and optionally reset any already filled words that might need reset (mark them as unfilled again) ( complexity O(N) )
If no backtrack found, the no solution can be found (at least with this configuration, initial seed etc..)
Else when all wordlots are filled you have one solution
This algorithm does a random consistent walk of the solution tree of the problem. If at some point there is a dead end, it does a backtrack to a previous node and follow another route. Untill either a solution found or number of candidates for the various nodes are exhausted.
The consistency part makes sure that a solution found is indeed a solution and the random part enables to produce different solutions in different executions and also on the average have better performance.
PS. all this (and others) were implemented in pure JavaScript (with parallel processing and WYSIWYG) capability
PS2. The algorithm can be easily parallelized in order to produce more than one (different) solution at the same time
Hope this helps
为什么不直接使用随机概率方法呢? 从一个单词开始,然后重复选择一个随机单词,并尝试将其适应拼图的当前状态,而不打破大小等限制。如果失败,就重新开始。
您会惊讶地发现,像这样的蒙特卡罗方法经常有效。
Why not just use a random probabilistic approach to start with. Start with a word, and then repeatedly pick a random word and try to fit it into the current state of the puzzle without breaking the constraints on the size etc.. If you fail, just start all over again.
You will be surprised how often a Monte Carlo approach like this works.
这是一些基于 nickf 的答案和 Bryan 的 Python 代码的 JavaScript 代码。 只是发布它以防其他人在 js 中需要它。
Here is some JavaScript code based on nickf's answer and Bryan's Python code. Just posting it in case someone else needs it in js.
我会生成两个数字:长度和拼字游戏得分。 假设拼字游戏得分低意味着加入更容易(低分=大量常见字母)。 按长度降序和拼字游戏得分升序对列表进行排序。
接下来,只需沿着列表向下查看即可。 如果该单词不与现有单词交叉(分别根据每个单词的长度和拼字游戏分数检查),则将其放入队列中,并检查下一个单词。
冲洗并重复,这应该会生成一个填字游戏。
当然,我很确定这是 O(n!) 并且不能保证为您完成填字游戏,但也许有人可以改进它。
I'd generate two numbers: Length and Scrabble score. Assume that a low Scrabble score means it's easier to join on (low scores = lots of common letters). Sort the list by length descending and Scrabble score ascending.
Next, just go down the list. If the word doesn't cross with an existing word (check against each word by their length and Scrabble score, respectively), then put it into the queue, and check the next word.
Rinse and repeat, and this should generate a crossword.
Of course, I'm pretty sure that this is O(n!) and it's not guaranteed to complete the crossword for you, but perhaps somebody can improve it.
我一直在思考这个问题。 我的感觉是,要创建一个真正密集的填字游戏,你不能指望有限的单词列表就足够了。 因此,您可能需要将字典放入“trie”数据结构中。 这将使您可以轻松找到填补剩余空间的单词。 在 trie 中,实现遍历(例如给出“c?t”形式的所有单词)是相当有效的。
因此,我的总体想法是:创建某种相对强力的方法,如此处描述的一些方法来创建低密度交叉,并用字典单词填充空白。
如果其他人采取了这种方法,请告诉我。
I've been thinking about this problem. My sense is that to create a truly dense crossword, you cannot hope that your limited word list is going to be enough. Therefore, you might want to take a dictionary and place it into a "trie" data structure. This will allow you to easily find words that fill the left over spaces. In a trie, it is fairly efficient to implement a traversal that, say, gives you all words of the form "c?t".
So, my general thinking is: create some sort of relatively brute force approach as some described here to create a low-density cross, and fill in the blanks with dictionary words.
If anyone else has taken this approach, please let me know.
我正在玩填字游戏生成器引擎,我发现这是最重要的:
0.
!/usr/bin/python
a.
allwords.sort(key=len,reverse=True)
b. 制作一些像光标这样的项目/对象,它将围绕矩阵走动以便于定位
除非您想稍后通过随机选择进行迭代。
首先,拿起第一对并将它们放在 0,0 的横向和下方;
将第一个存储为我们当前的填字游戏“领导者”。
按顺序对角线或以更大对角线概率随机移动光标到下一个空单元格
按
迭代单词,如并使用可用空间长度定义最大字长:
<代码>
温度=[]
对于范围内的 w_size( len( w_space ), 2, -1 ) :
# t
for w in [ if len(word) == w_size ] 中的字对字:
#
如果 w 不在 temp 中并且 putTheWord( w, w_space ) :
#
临时.追加(w)
将单词与我使用的可用空间进行比较 ie :
在每个成功使用的单词后,改变方向。
当所有单元格都被填充时循环,或者你用完单词或者通过迭代限制然后:
# CHANGE ALL WORDS LIST
inexOf1stWord = allwords.index(leading_w )
allwords = allwords[:inexOf1stWord+1][:] + allwords[inexOf1stWord+1:][:]
...并再次迭代新的填字游戏。
通过填写的难易程度和一些估计计算来制定评分系统。
为当前填字游戏给出分数,并通过将其附加到来缩小以后的选择范围
如果您的评分系统满足分数,则制作填字游戏列表。
在第一次迭代会话之后,再次从制作的填字游戏列表中迭代以完成工作。
在第一次迭代
通过使用更多参数,速度可以大大提高。
I was playing around crosswords generator engine, and I found this the most important :
0.
!/usr/bin/python
a.
allwords.sort(key=len, reverse=True)
b. make some item/object like cursor which will walk around matrix for easy orientation
unless you want to iterate by random choice later on.
the first, pick up first pair and place them across and down from 0,0 ;
store the first one as our current crossword 'leader'.
move cursor by order diagonal or random with greater diagonal probability to next empty cell
iterate over the words like and use free space length to define max word length :
temp=[]
for w_size in range( len( w_space ), 2, -1 ) :
# t
for w in [ word for word in allwords if len(word) == w_size ] :
#
if w not in temp and putTheWord( w, w_space ) :
#
temp.append( w )
to compare word against free space I used i.e. :
after each successfully used word, change direction.
Loop while all cells are filled OR you run out of words OR by limit of iterations then :
# CHANGE ALL WORDS LIST
inexOf1stWord = allwords.index( leading_w )
allwords = allwords[:inexOf1stWord+1][:] + allwords[inexOf1stWord+1:][:]
...and iterate again new crossword.
Make the scoring system by easiness of filling, and some estimation calcs.
Give score for the current crossword and narrow later choice by append it into
list of made crosswords if the score is satisfied by your scoring system.
After first iteration session iterate again from the list of made crosswords to finish the job.
By using more parameters speed can be improved by a huge factor.
该项目作为哈佛大学AI CS50 课程中的一个项目出现。 这个想法是将填字游戏生成问题表述为约束满足问题,并通过不同启发式的回溯来解决它,以减少搜索空间。
首先,我们需要几个输入文件:
`
`
输入词汇(单词列表/词典),从中选择候选单词(如下所示)。
<代码>一个
放弃
能力
有能力的
流产
关于
多于
国外
缺席
绝对
绝对地
...
现在,CSP 的定义和求解如下:
下面显示了使用 CSP 求解算法的实现获得的输出:
`
以下动画显示了回溯步骤:
这是另一张包含孟加拉语(孟加拉语)语言单词列表的图片:
< a href="https://i.sstatic.net/RveHK.gif" rel="nofollow noreferrer">
This one appears as a project in the AI CS50 course from Harvard. The idea is to formulate the crossword generation problem as a constraint satisfaction problem and solve it with backtracking with different heuristics to reduce the search space.
To start with we need couple of input files:
`
`
An input vocabulary (word list / dictionary) from which the candidate words will be chosen (like the one shown follows).
a
abandon
ability
able
abortion
about
above
abroad
absence
absolute
absolutely
...
Now the CSP is defined and to be solved as follows:
The following shows the output that was obtained using an implementation of the CSP solving algorithm:
`
The following animation shows the backtracking steps:
Here is another one with a Bangla (Bengali) language word-list:
我会得到每个单词所使用的每个字母的索引,以了解可能的交叉。 然后我会选择最大的单词并将其用作基础。 选择下一个大的并划掉它。 冲洗并重复。 这可能是一个NP问题。
另一个想法是创建一种遗传算法,其中强度度量是您可以在网格中放入多少单词。
我发现最困难的部分是何时知道某个列表不可能被跨越。
I would get an index of each letter used by each word to know possible crosses. Then I would choose the largest word and use it as base. Select the next large one and cross it. Rinse and repeat. It's probably an NP problem.
Another idea is creating a genetic algorithm where the strength metric is how many words you can put in the grid.
The hard part I find is when to know a certain list cannot possibly be crossed.
我已经为这个问题编写了一个 JavaScript/jQuery 解决方案:
示例演示:http://www.earthfluence.com/crossword-puzzle-demo.html
源代码:https://github.com/HoldOffHunger/jquery-crossword-puzzle-generator
我使用的算法的目的:
我将描述我使用的算法:
根据具有共同字母的单词将单词分组在一起。
从这些组中,构建新数据结构(“单词块”)的集合,该结构是一个主要单词(贯穿所有其他单词),然后是其他单词(贯穿主要单词)。
开始填字游戏时,第一个单词块位于填字游戏的左上角。
开始填字
对于其余的单词块,从填字游戏最右下角的位置开始,向上和向左移动,直到没有更多可用的空位来填充。 如果向上的空列多于向左的空列,则向上移动,反之亦然。
I have coded up a JavaScript/jQuery solution to this problem :
Sample Demo: http://www.earthfluent.com/crossword-puzzle-demo.html
Source Code: https://github.com/HoldOffHunger/jquery-crossword-puzzle-generator
The intent of the algorithm I used:
I will describe the algorithm I used:
Group the words together according to those that share a common letter.
From these groups, build sets of a new data structure ("word blocks"), which is a primary word (that runs through all other words) and then the other words (that run through the primary word).
Start out the crossword puzzle with the very first of these word blocks in the very top-left position of the crossword puzzle.
For the rest of the word blocks, starting from the right-bottom most position of the crossword puzzle, move upward and leftward, until there are no more available slots to fill. If there are more empty columns upwards than leftwards, move upwards, and vice versa.
我想出了一个可能不是最有效的解决方案,但它效果很好。 基本上:
这是一个可行但通常很糟糕的填字游戏。 我对上面的基本配方做了一些修改,以获得更好的结果。
I came up with a solution which probably isn't the most efficient, but it works well enough. Basically:
This makes a working, yet often quite poor crossword. There were a number of alterations I made to the basic recipe above to come up with a better result.
我最近刚刚用 Python 编写了自己的代码。 您可以在这里找到它: http://bryanhelmig.com/python-crossword-puzzle-generator/ 。 它不会创建密集的《纽约时报》风格的填字游戏,而是创建您可能在儿童拼图书中找到的填字游戏风格。
与我在那里发现的一些算法不同,这些算法实现了像一些人建议的那样放置单词的随机蛮力方法,我尝试在单词放置时实现一种稍微聪明的蛮力方法。 这是我的过程:
到最后,你就会得到一个不错的纵横字谜或单词搜索谜题,因为它们大致相同。 它往往运行得相当好,但如果您有任何改进建议,请告诉我。 更大的网格运行速度呈指数级下降; 更大的单词列表呈线性。 更大的单词列表也更有可能获得更好的单词位置数。
I just recently wrote my own in Python. You can find it here: http://bryanhelmig.com/python-crossword-puzzle-generator/. It doesn't create the dense NYT style crosswords, but the style of crosswords you might find in a child's puzzle book.
Unlike a few algorithms I found out there that implemented a random brute-force method of placing words like a few have suggested, I tried to implement a slightly smarter brute-force approach at word placement. Here's my process:
By the end, you have a decent crossword puzzle or word search puzzle, since they are about the same. It tends to run rather well, but let me know if you have any suggestions on improvement. Bigger grids run exponentially slower; bigger word lists linearly. Bigger word lists also have a much higher chance at better word placement numbers.
实际上,我大约十年前编写了一个填字游戏生成程序(它很神秘,但相同的规则适用于普通的填字游戏)。
它有一个单词列表(以及相关的线索)存储在一个文件中,按迄今为止的使用情况降序排序(以便较少使用的单词位于文件的顶部)。 一个模板,基本上是一个代表黑色和自由方块的位掩码,是从客户提供的池中随机选择的。
然后,对于拼图中的每个不完整的单词(基本上找到第一个空白方块,看看右边的(跨词)或下面的(下词)是否也是空白),进行搜索文件寻找第一个合适的单词,同时考虑该单词中已有的字母。 如果没有合适的单词,您只需将整个单词标记为不完整并继续。
最后是一些未完成的单词,编译器必须填写这些单词(如果需要,可以将单词和线索添加到文件中)。 如果他们无法提出任何想法,他们可以手动编辑填字游戏以更改约束或只是要求完全重新生成。
一旦单词/线索文件达到一定大小(并且每天为该客户添加 50-100 条线索),很少有需要对每个填字游戏进行超过两到三个手动修复的情况。
I actually wrote a crossword generation program about ten years ago (it was cryptic but the same rules would apply for normal crosswords).
It had a list of words (and associated clues) stored in a file sorted by descending usage to date (so that lesser-used words were at the top of the file). A template, basically a bit-mask representing the black and free squares, was chosen randomly from a pool that was provided by the client.
Then, for each non-complete word in the puzzle (basically find the first blank square and see if the one to the right (across-word) or the one underneath (down-word) is also blank), a search was done of the file looking for the first word that fitted, taking into account the letters already in that word. If there was no word that could fit, you just marked the whole word as incomplete and moved on.
At the end would be some uncompleted words which the compiler would have to fill in (and add the word and a clue to the file if desired). If they couldn't come up with any ideas, they could edit the crossword manually to change constraints or just ask for a total re-generation.
Once the word/clue file got up to a certain size (and it was adding 50-100 clues a day for this client), there was rarely a case of more than two or three manual fix ups that had to be done for each crossword.
此算法创建 50 个密集的 6x9 箭头填字游戏 在 60 秒内完成。 它使用单词数据库(带有单词+提示)和板数据库(带有预配置的板)。
更大的单词数据库大大减少了生成时间,并且某些类型的板更难填充! 更大的棋盘需要更多时间才能正确填写!
示例:
预配置的 6x9 板:
(# 表示一个单元格中有一个提示,% 表示一个单元格中有两个提示,箭头未显示)
生成的 6x9 板:
提示 [行,列]:
This algorithm creates 50 dense 6x9 arrow crosswords in 60 seconds. It uses a word database (with word+tips) and a board database (with pre-configured boards).
A bigger word database decreases generation time considerably and some kind of boards are harder to fill! Bigger boards require more time to be filled correctly!
Example:
Pre-Configured 6x9 Board:
(# means one tip in one cell, % means two tips in one cell, arrows not shown)
Generated 6x9 Board:
Tips [line,column]: