Need help speeding up permutations
Here is my working code. I am trying to find ways to make it faster at finding the valid words, and I was thinking about possibly making a separate dictionary list for each word. What do y'all think?
import random
import itertools

file_name = 'words.txt'

def load_words():
    try:
        f = open(file_name, 'r')
        str1 = f.read()
        f.close()
    except:
        print('Problem opening the file', file_name)
    list1 = []
    list1 = str1.split()
    return list1

def is_valid(str1, list1):
    valid = False
    if str1 in list1:
        valid = True
    return valid

def generate(words, letters):
    answers = []
    for length in range(2, len(letters) + 1):
        for x in itertools.permutations(letters, length):
            word = ''
            for let in x:
                word += let
            if is_valid(word.upper(), words):
                answers.append(word)
                print(word)
    print(answers)

def main():
    words = load_words()
    letters = input('Enter your letters')
    answers = generate(words, letters)

main()
Firstly, profile the code. That will tell you where the slow parts are.
Secondly, you might consider converting the words list to a set, which should have a faster 'in' operator for checking if the word is there.
Thirdly, consider simplifying the code to remove unnecessary statements, e.g. in is_valid.
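A sketch of the second and third suggestions combined (my own illustration, not the answer's original snippet): load the words into a set, and collapse is_valid to a single expression.

```python
def load_words(file_name='words.txt'):
    # A set makes "word in words" an average O(1) hash lookup
    # instead of an O(n) scan of a list.
    with open(file_name, 'r') as f:
        return set(f.read().split())

def is_valid(word, words):
    # "in" already yields a bool; no temporary flag is needed.
    return word in words
```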
Change your list1 to a set. The test "str1 in set1" will be much faster than "str1 in list1" if you do frequent tests and the list is long.

If you are keen on increasing the speed at the cost of making the code less readable, you may try the following. A list comprehension is faster than explicit loops, so combine both loops into a single comprehension. Apart from that, the statement can be combined into "if is_valid(''.join(x).upper(), words)" or even "''.join(x).upper() in words"; remember that function calls are costly. I have done a speed comparison, and it looks like the list comprehension runs about 50% faster. It's now up to you to decide.
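The answer's code blocks did not survive extraction; a sketch of the comprehension version it describes (assuming words is a set of uppercase valid words) might look like:

```python
import itertools

def generate(words, letters):
    # Both nested loops collapsed into a single comprehension;
    # ''.join(x) replaces the character-by-character concatenation.
    answers = [''.join(x)
               for length in range(2, len(letters) + 1)
               for x in itertools.permutations(letters, length)
               if ''.join(x).upper() in words]
    print(answers)
    return answers
```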
Try replacing the inner loop with:
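The suggested replacement did not survive extraction; presumably it was the standard join idiom, shown here as my own sketch:

```python
x = ('d', 'o', 'g')   # one tuple as yielded by itertools.permutations
word = ''.join(x)     # builds the string in a single call instead of repeated +=
```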
The issue is that your algorithm is basically O(n * m!), where n is the word-list size and m is the number of letters. Changing the word list to a set makes each lookup roughly constant time and improves performance to about O(m!), as recommended by other people here.
The real performance gain, however, will come from completely eliminating the permuting of the letters for the search. First sort the letters within each individual word in the word list alphabetically; this takes O(n * p log(p)) time, where p is the average word length. Then sort the entire list alphabetically in O(n * log(n)) time. Keep track of the original words as well, so that you can go from a string in the sorted word list back to the original word in O(1). Next, sort the input letters alphabetically and search for them in the sorted word list.
The slowest operation in the above algorithm is sorting the list of alphabetically sorted strings, which is O(n log(n)). Searching such a list is O(log(n)), so the entire algorithm executes in O(n log(n)) time. It should scale linearly with m, the number of letters entered.
Implementation is left to the reader.
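One possible sketch of the idea, using a hash map keyed by the sorted letters instead of a sorted list (the same signature trick with less code; the helper names here are my own):

```python
from collections import defaultdict
from itertools import combinations

def build_index(words):
    # Key each word by its alphabetically sorted letters; anagrams
    # collide on the same key, so a lookup recovers the original words.
    index = defaultdict(list)
    for w in words:
        index[''.join(sorted(w))].append(w)
    return index

def find_words(index, letters, min_len=2):
    letters = sorted(letters)
    found = set()
    for length in range(min_len, len(letters) + 1):
        # combinations() of a sorted sequence yields sorted tuples,
        # so each candidate is already in signature form.
        for combo in combinations(letters, length):
            found.update(index.get(''.join(combo), ()))
    return found
```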
What exactly are you trying to accomplish with this? It looks like you already have some dictionary of valid words you are reading from. Why are you permuting every combination of letters that could be built from the input given by the user?
Something you need to consider about your algorithm: for every permutation you create, you iterate over every known word in your dictionary (list1). When you create all permutations of the letters, you are creating m! words, where m is the number of letters given by the user.
You basically have O(n * m!). That's ridiculously huge even for a small number of letters like 7. By using a set instead of a list, you can reduce that n term to a constant, which changes your algorithm to O(m!), but that is still too big. If I had to guess what this algorithm is doing, I'd say you want to know how many known words you can create out of the letters you are given. You didn't say that, but let's assume that is what you meant.
A faster algorithm would be to iterate over every word in your dictionary and see if you can make that word by picking letters from the input. That way you only walk over the dictionary once, O(n * m), which eliminates the need to permute your input. Here's the algorithm:
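The answer's code block did not survive extraction; a sketch of the described approach (walk the dictionary once, checking letter counts) might look like this, with helper names of my own choosing:

```python
from collections import Counter

def can_make(word, letters):
    # True if the input supplies at least as many copies of every
    # letter as the word requires.
    need = Counter(word)
    have = Counter(letters)
    return all(have[ch] >= n for ch, n in need.items())

def find_words(dictionary, letters):
    # One pass over the dictionary: O(n * m) instead of O(n * m!).
    letters = letters.upper()
    return [w for w in dictionary if can_make(w, letters)]
```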
Sorry, my Python is a little rusty, but hopefully you'll get the idea.
If you plan to look up words often, you should build a tree from your data.
A simple example follows. The code should be pretty self-explanatory, but please ask if something is unclear. (There are a lot of different types of trees; this is just a very simple example.)
When the tree has been built, it only requires len(word) function calls to determine whether an entered word is valid. This is a huge improvement over "if word in wordlist", which takes O(len(wordlist)). The downside of this approach is that generating the tree may take some time. By serializing the Tree() object using pickle, you don't have to rebuild the tree each time you start your script. I tried building a tree using a word list from SIL International (109582 words in total). Timing it with timeit, the execution time was reduced by about 50% when unpickling the object file rather than building the dict from scratch.
If you'd like to check only for permutations, you should alter the add_word() method of Tree() to sort the letters first. The input argument to Tree.is_word() should of course also be sorted.
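The tree code itself was lost in extraction; a minimal sketch of such a Tree() class (my own reconstruction, built from nested dicts) could be:

```python
class Tree:
    """A minimal character trie; one dict level per letter."""

    def __init__(self):
        self.root = {}

    def add_word(self, word):
        # Walk down the trie, creating nodes as needed, then mark
        # the end of a complete word with a sentinel key.
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node['$'] = True

    def is_word(self, word):
        # len(word) dict lookups decide membership.
        node = self.root
        for ch in word:
            if ch not in node:
                return False
            node = node[ch]
        return '$' in node
```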