python 中的整数比较使一切变得缓慢
下面的代码让我非常头疼:
def extract_by_letters(letters, dictionary):
for word in dictionary:
for letter in letters:
if word.count(letter) != letters.count(letter):
if word in dictionary: #I cant leave this line out
dictionary.remove(word)
return dictionary
首先:“if word indictionary”行。为什么我不能忽略这个?我收到一条错误消息 ValueError: list.remove(x): x not in list
但显然是这样。
第二:字典是一个由换行符分隔的大约 50,000 个单词的文件。上面的代码大约需要 2 分钟才能运行……太长了。我玩了一下代码,发现这一行:
if word.count(letter) != letters.count(letter):
似乎导致了我所有的问题。如果我去掉该行(这会完全搞乱输出),该函数需要大约 2 秒的时间来浏览字典。
我以为是计数函数,但事实并非如此。
如果我将 if 语句更改为:
print word.count(letter)
print letters.count(letter)
该函数运行大约需要 3 秒。
我确信这就是对比。还有其他建议吗?有更好的方法吗?
提前致谢!
The following code is causing me enormous headaches:
def extract_by_letters(letters, dictionary):
for word in dictionary:
for letter in letters:
if word.count(letter) != letters.count(letter):
if word in dictionary: #I cant leave this line out
dictionary.remove(word)
return dictionary
First of all: the 'if word in dictionary' line. Why can't I leave this out? I get an error saying ValueError: list.remove(x): x not in list
But it is, obviously.
Second: Dictionary is a file of about 50,000 words separated by linebreaks. The above code takes about 2 minutes to run... Wayyy too long. I played with the code for a bit, and I found that the line:
if word.count(letter) != letters.count(letter):
seems to be causing all my problems. If I take that line out (which totally screws up the output), the function takes about 2 seconds to go through the dictionary.
I thought it was the count functions, but it's not.
if I change the if statement to something like:
print word.count(letter)
print letters.count(letter)
the function takes about 3 seconds to run.
I'm convinced it's the comparison. Any other suggestions? Is there a better way to do this?
Thanks in advance!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
出现异常的原因是,如果字母计数匹配多个字母,则您将多次尝试删除该单词。速度
如此之慢的原因是循环内循环。
如果您写一两句话来说明该函数应该做什么,那么重构它就会容易得多。同时,一旦您已经删除了某个单词,这将阻止您检查是否需要删除该单词。
如果字典是一个
集合
,你应该会得到一些速度的提升。如果字典是一个列表,这应该会带来巨大的加速The reason you get the exception is that if the letter count matches for more than one letter, you are trying to remove the word more than once
The reason it is so slow is that you have loops inside loops inside loops.
If you would write a sentence or two about what the function is supposed to do, it would be much easier to refactor it. In the mean time, this would stop you checking whether a word needs to be removed once you have already removed it.
If dictionary is a
set
you should get some speed up. If dictionary is alist
this should give a huge speedup尝试构建输出而不是从中删除:
或者,您可以使用正则表达式:
或者,也许是最简单的方法:
最后一个方法不需要花费明显的时间来扫描我的 Mac 上的 /usr/share/dict/words,这是一个列表共 234936 个字。
Try building the output instead of deleting from it:
Or, you could use regular expressions:
Or, perhaps the simplest way:
This last one takes no noticeable time to scan /usr/share/dict/words on my Mac, which is a list of 234936 words.
这是一个应该提供主要加速的函数:
这是我所做的优化:
制作了字母及其相应频率的字典。这样,当您只需要执行一次此过程并存储结果时,您就不会一遍又一遍地使用 letter.count。
我不是从字典中删除单词,而是将它们添加到从函数返回的数组中。如果你有一本很大的字典,很可能只有几个单词可以匹配。另外,如果字典变量是一个数组(我怀疑),那么每次调用remove时,它都必须首先在字典中搜索单词(从头开始线性),然后删除它。通过使用要删除的单词的索引弹出来删除速度更快。
每当发现不匹配时,就跳出循环检查字母计数。当我们已经有了答案时,这可以防止我们进行不必要的检查。
我不确定 letters 变量中是否有重复的字母,所以我确保它可以通过使用 letdict 来处理这个问题。如果您之前的字母变量中有重复的字母,那么您会重复检查单词中这些字母的计数。
Here's a function that should offer a major speed-up:
Here are the optimizations I did:
Made a dictionary of the letters with their corresponding frequencies. This way, you aren't using letters.count over and over again when you only need to do this process once and store the results.
Rather than removing the words from dictionary, I add them to an array that is returned from the function. If you have a huge dictionary, chances are that only a few words will match. Also, if the dictionary variable is an array (which I suspect), then every time you called remove it had to first search for the word in dictionary (linearly starting at the beginning) and then remove it. It is faster to remove by popping using the index of the word to be removed.
Breaking out of the loop checking the letter counts whenever a mismatch is found. This prevents us from doing needless checks when we already have our answer.
I wasn't sure if you had repeated letters in the letters variable or not so I made sure that it could handle that by using letdict. If you had letters repeated in your letters variable before, then you were checking the counts of those letters in the word repeatedly.
输出:
Output:
您想找到“字母”的所有字谜吗?
You're trying to find all anagrams of 'letters'?