Python-代码优化帮助-查找单词的所有字典有效的字谜
我用这个(极其低效的方法)解决了这个问题:
def createList(word, wordList):
#Made a set, because for some reason permutations was returning duplicates.
#Returns all permutations if they're in the wordList
return set([''.join(item) for item in itertools.permutations(word) if ''.join(item) in wordList])
def main():
a = open('C:\\Python32\\megalist.txt', 'r+')
wordList = [line.strip() for line in a]
maximum = 0
length = 0
maxwords = ""
for words in wordList:
permList = createList(words, wordList)
length = len(permList)
if length > maximum:
maximum = length
maxwords = permList
print (maximum, maxwords)
花了大约 10 分钟才找到具有最多字典有效字谜的五个字母单词。我想在没有字母限制的单词上运行这个,但这会花费大量的时间。有什么办法可以优化这个吗?
I solved the problem using this (horribly inefficient method):
def createList(word, wordList):
#Made a set, because for some reason permutations was returning duplicates.
#Returns all permutations if they're in the wordList
return set([''.join(item) for item in itertools.permutations(word) if ''.join(item) in wordList])
def main():
a = open('C:\\Python32\\megalist.txt', 'r+')
wordList = [line.strip() for line in a]
maximum = 0
length = 0
maxwords = ""
for words in wordList:
permList = createList(words, wordList)
length = len(permList)
if length > maximum:
maximum = length
maxwords = permList
print (maximum, maxwords)
It took something like 10 minutes to find the five-letter word that has the most dictionary-valid anagrams. I want to run this on words without a letter constraint, but it would take a ludicrous amount of time. Is there anyway to optimize this?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
以下内容似乎在小型词典上可以正常工作。通过对单词中的字母进行排序,可以很容易地测试两个单词是否是字谜词。从这个起点开始,只需以某种方式积累单词即可。修改它以报告所有匹配项,而不仅仅是第一个匹配项并不难。
如果您确实需要添加对字母数量的限制,那么使用迭代器是过滤掉某些单词的便捷方法。
编辑:
响应评论,
hash(sortedWord)
是一种节省空间的优化,在这种情况下可能为时过早,将sortedWord减少回整数,因为我们并不真正关心密钥,只要我们总能唯一地恢复所有相关的字谜词即可。仅使用sortedWord
作为键同样有效。max
的key
关键字参数可让您根据谓词查找集合中的最大元素。因此,语句maxKey = max( d.keys(), key = lambda k : len(d[k]) )
是一个用于回答查询的简洁 Python 表达式,给定中的键字典中,哪个键具有最大长度的关联值?。在该上下文中对 max 的调用可以写成(更详细地)为 valueWithMaximumLength(d) ,其中该函数定义为:This following seems to work OK on a smallish dictionary. By sorting the letters in the word, it becomes easy to test if two words are an anagram. From that starting point, it's just a matter of accumulating the words in some way. It wouldn't be hard to modify this to report all matches, rather than just the first one
If you do need to add constraints on the number of letters, the use of iterators is a convenient way to filter out some words.
Edit:
In response to the comment, the
hash(sortedWord)
, is a space saving optimization, possibly premature in this case, to reduce sortedWord back to an integer, because we don't really care about the key, so long as we can always uniquely recover all the relevant anagrams. It would have been equally valid to just usesortedWord
as the key.The
key
keyword argument tomax
lets you find the maximum element in a collection based on a predicate. So the statementmaxKey = max( d.keys(), key = lambda k : len(d[k]) )
is a succinct python expression for answering the query, given the keys in the dictionary, which key has the associated value with maximum length?. That call tomax
in that context could have been written (much more verbosely) asvalueWithMaximumLength(d)
where that function was defined as:wordList
应该是一个集合
。测试列表中的成员资格要求您扫描列表中的所有元素,检查它们是否是您生成的单词。测试集合中的成员资格(平均而言)并不取决于集合的大小。
下一个明显的优化是,一旦测试了一个单词,您就可以从
wordList
中删除它的所有排列,因为它们将在createList
中给出完全相同的集合。如果一切都用set
完成,那么这是一个非常简单的操作——实际上,您只需使用二进制减号即可。wordList
should be aset
.Testing membership in a list requires you to scan through all the elements of the list checking whether they are the word you have generated. Testing membership in a set does not (on average) depend on the size of the set.
The next obvious optimisation is that once you have tested a word you can remove all its permutations from
wordList
, since they will give exactly the same set increateList
. This is a very easy operation if everything is done withset
s -- indeed, you simply use a binary minus.没有必要进行所有排列,这会浪费内存和 CPU。
因此,首先您的字典应该保存在二叉树中,如下所示:
就是这样:) 这个算法应该运行得更快。
编辑:
检查 bintrees 包以避免在二叉树实现上花费时间。
There is no need to do ALL permutations, - it's a waste of memory and CPU.
So, first of all your dictionary should be kept in a binary tree, like this:
That's it :) This algorithm should work much faster.
EDIT:
Check bintrees package to avoid spending time on binary tree implementation.