Unable to form English words from the letters of a string in Ruby
I need to find all English words that can be formed from the letters in a string:
sentence="Ziegler's Giant Bar"
I can make an array of its letters with
sentence.split(//)
How can I generate the more than 4500 English words that can be formed from this sentence in Ruby?
[edit]
It may be best to split the problem into parts:
- first build an array containing only the words with 10 letters or fewer
- look up the longer words separately
[Assuming you can reuse the source letters within one word]: For each word in your dictionary list, construct two arrays of letters, one for the candidate word and one for the input string. Subtract the input array-of-letters from the word array-of-letters; if no letters are left over, you've got a match. Code to do that looks like this:
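A minimal sketch of this subtraction approach follows. It assumes the dictionary is an array of strings and that only lowercase a-z letters matter. The method name `words_from_letters` is hypothetical and is not the answer's code.

```ruby
# Sketch: letters may be reused any number of times within one word.
# words_from_letters is a hypothetical name used for illustration.
def words_from_letters(input, dictionary)
  # Keep only a-z so spaces and apostrophes in the input are ignored
  input_letters = input.downcase.scan(/[a-z]/)
  dictionary.select do |word|
    word_letters = word.downcase.scan(/[a-z]/)
    # Array#- drops every element that appears in input_letters, so an empty
    # result means each letter of the word occurs somewhere in the input
    !word_letters.empty? && (word_letters - input_letters).empty?
  end
end
```

From irb this might be called as `words_from_letters("Ziegler's Giant Bar", File.readlines("/usr/share/dict/words", chomp: true))`. The `chomp:` option needs Ruby 2.5 or later. `Array#-` removes all occurrences, so a word like "eerie" is accepted even though the phrase contains only two e's.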
You can call that function from irb, Ruby's interactive shell, like so:
...or here's a wrapper you could use to call the function interactively from a script:
When running this on a Mac, the output looks like this:
That is well over 4500 words, but that's because the Mac word dictionary is pretty large. If you want to reproduce Knuth's results exactly, download and unzip Knuth's dictionary from here: http://www.packetstormsecurity.org/Crackers/wordlists/dictionaries/knuth_words.gz and replace "/usr/share/dict/words" with the path to the unpacked dictionary file. If you did it right, you'll get 4514 words, ending in this collection:
I believe that answers the original question.
Alternatively, the questioner/reader might have wanted to list all the words one can construct from a string without reusing any of the input letters. My suggested code to accomplish that works as follows: Copy the candidate word, then for each letter in the input string, destructively remove the first instance of that letter from the copy (using "slice!"). If this process absorbs all the letters, accept that word.
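Here is a sketch of that `slice!` procedure. It assumes the same array-of-strings dictionary and lowercase a-z letters. `words_without_reuse` is a hypothetical name.

```ruby
# Sketch: each input letter may be used at most once per word.
# words_without_reuse is a hypothetical name used for illustration.
def words_without_reuse(input, dictionary)
  input_letters = input.downcase.scan(/[a-z]/)
  dictionary.select do |word|
    # Work on a lowercase, letters-only copy so the dictionary entry is untouched
    remaining = word.downcase.delete("^a-z")
    next false if remaining.empty?
    # String#slice!(str) removes the first occurrence of str (no-op if absent)
    input_letters.each { |ch| remaining.slice!(ch) }
    # Accept the word only if every one of its letters was absorbed
    remaining.empty?
  end
end
```

With this version "eerie" is rejected, because the phrase supplies only two e's.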
If you want to find words whose letters, and how often each letter occurs, are restricted by the given phrase,
you can construct a regex to do this for you:
Positive lookaheads let you make a regex that matches a position in the string where some specified pattern matches without consuming the part of the string that matches.
We use them here to match the same string against multiple patterns in a single regex.
The position only matches if all our patterns match.
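One way to build such a regex is sketched below. It assumes only the letters a-z matter and that words are lowercased before matching. `frequency_limited_regex` is a hypothetical helper, and `Array#tally` requires Ruby 2.7 or later.

```ruby
# Sketch: one regex whose lookaheads cap how often each letter may appear.
# frequency_limited_regex is a hypothetical name used for illustration.
def frequency_limited_regex(phrase)
  counts = phrase.downcase.scan(/[a-z]/).tally   # letter => number of occurrences
  lookaheads = counts.map do |letter, n|
    # From the start of the word, allow at most n occurrences of this letter
    "(?=[^#{letter}]*(?:#{letter}[^#{letter}]*){0,#{n}}\\z)"
  end
  # Once every lookahead succeeds, the word may contain only the phrase's letters
  Regexp.new("\\A#{lookaheads.join}[#{counts.keys.join}]+\\z")
end
```

For a letter x that occurs n times, the lookahead `(?=[^x]*(?:x[^x]*){0,n}\z)` succeeds only if the word contains at most n copies of x. The final character class rejects any letter that is not in the phrase. Usage might look like `words.select { |w| re.match?(w.downcase) }`.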
If we allow infinite reuse of letters from the original phrase (like Knuth did according to glenra's comment), then it's even easier to construct a regex:
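Using the same assumptions (lowercase a-z only), the regex reduces to a single character class of the phrase's distinct letters. `reuse_regex` is a hypothetical name.

```ruby
# Sketch: with unlimited reuse, one character class is enough.
# reuse_regex is a hypothetical name used for illustration.
def reuse_regex(phrase)
  # Distinct letters in first-seen order, e.g. "zieglrsantb" for the question's phrase
  letters = phrase.downcase.scan(/[a-z]/).uniq.join
  Regexp.new("\\A[#{letters}]+\\z")
end
```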
I don't think Ruby ships with an English dictionary. But you could try storing all permutations of the original string in an array and checking those strings against Google. You could then treat a string as a real word if it gets more than 100,000 hits or so.
You can get an array of letters like so:
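The sketch below contrasts the question's `split(//)` with a letters-only version. Normalizing to lowercase a-z is an assumption that suits dictionary matching.

```ruby
sentence = "Ziegler's Giant Bar"

# split(//) keeps every character, including the apostrophe and spaces
all_chars = sentence.split(//)

# For word matching you usually want lowercase letters only
letters = sentence.downcase.scan(/[a-z]/)
```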