需要帮助加快排列速度

发布于 2024-12-18 20:36:51 字数 935 浏览 0 评论 0原文

这是我的工作代码,我正在尝试找到方法以使其更快地找到有效单词,我正在考虑可能为每个单词制作单独的字典列表,你们觉得怎么样?

import random
import itertools

file_name='words.txt'

def load_words():
    try:
        f=open(file_name,'r')
        str1=f.read()
        f.close()
    except:
        print('Problem opening the file',file_name)
    list1=[]
    list1=str1.split()
    return(list1)

def is_valid(str1,list1):
    valid=False
    if str1 in list1:
        valid=True
    return valid

def generate(words,letters):
    answers=[]
    for length in range(2,len(letters)+1):
        for x in itertools.permutations(letters,length):
            word=''
            for let in x:
                word+=let
            if is_valid(word.upper(),words):
                answers.append(word)
                print(word)
    print(answers)

def main():
    words=load_words()
    letters = input('Enter your letters')
    answers = generate(words,letters)

main()

Here is my working code, I am trying to find ways to make it faster in finding the valid words, I was thinking about possibly making separate dictionary lists for each words, what do yall think?

import random
import itertools

file_name='words.txt'

def load_words():
    try:
        f=open(file_name,'r')
        str1=f.read()
        f.close()
    except:
        print('Problem opening the file',file_name)
    list1=[]
    list1=str1.split()
    return(list1)

def is_valid(str1,list1):
    valid=False
    if str1 in list1:
        valid=True
    return valid

def generate(words,letters):
    answers=[]
    for length in range(2,len(letters)+1):
        for x in itertools.permutations(letters,length):
            word=''
            for let in x:
                word+=let
            if is_valid(word.upper(),words):
                answers.append(word)
                print(word)
    print(answers)

def main():
    words=load_words()
    letters = input('Enter your letters')
    answers = generate(words,letters)

main()

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

依 靠 2024-12-25 20:36:51

首先,分析代码。这会告诉你慢的部分在哪里。

其次,您可能会考虑将单词列表转换为集合,该集合应该有一个更快的“in”运算符来检查单词是否存在。

第三,考虑简化代码,删除不必要的语句,例如。

def is_valid(str1,list1):
   return str1 in list1

Firstly, profile the code. That will tell you where the slow parts are.

Secondly, you might consider converting the words list to a set, which should have a faster 'in' operator for checking if the word is there.

Thirdly, consider simplifying the code to remove unnecessary statements, eg.

def is_valid(str1,list1):
   return str1 in list1
我们的影子 2024-12-25 20:36:51

list1 更改为集合:

set1 = set(list1)

如果您在 set1 中测试 str1 将比 list1 中的 str1快得多经常进行测试,列表很长。

Change your list1 to a set:

set1 = set(list1)

The testing str1 in set1 will be much faster than str1 in list1 if you do frequent tests and the list is long.

源来凯始玺欢你 2024-12-25 20:36:51

如果您太热衷于以降低可读性为代价来提高速度,您可以尝试以下

def is_valid(str1,list1):
    return str1 in list1
words=["BAD","CAB","BEC"]
def generate2(words,letters):
    answers=[]
    [[answers.append(''.join(x).upper()) for x in itertools.permutations(letters,length) if ''.join(x).upper() in words] for length in range(2,len(letters)+1)]
    #print(answers)
    return answers

列表理解比循环更快。因此,将两个循环合并为一个理解。除此之外,该语句

       word=''
        for let in x:
            word+=let
        if is_valid(word.upper(),words):

可以组合为 if is_valid(''.join(x).upper,words) 甚至 ''.join(x).upper in Words,记住函数调用的成本很高。

我做了速度比较,看起来列表理解运行速度快了 50%。

现在由你决定


>>> stmt1="""
def is_valid(str1,list1):
    valid=False
    if str1 in list1:
        valid=True
    return valid
words=["BAD","CAB","BEC"]
def generate1(words,letters):
    answers=[]
    for length in range(2,len(letters)+1):
        for x in itertools.permutations(letters,length):
            word=''
            for let in x:
                word+=let
            if is_valid(word.upper(),words):
                answers.append(word)
                #print(word)
    #print(answers)
    return answers
generate1(words,['A','B','C','D','E'])
"""
>>> 
>>> stmt2="""
def is_valid(str1,list1):
    return str1 in list1
words=["BAD","CAB","BEC"]
def generate2(words,letters):
    answers=[]
    [[answers.append(''.join(x).upper()) for x in itertools.permutations(letters,length) if ''.join(x).upper() in words] for length in range(2,len(letters)+1)]
    #print(answers)
    return answers
generate2(words,['A','B','C','D','E'])
"""
>>> 
>>> t1=timeit.Timer(stmt=stmt1)
>>> t2=timeit.Timer(stmt=stmt2)
>>> t1.repeat(number=1000)
>>> t2=timeit.Timer(stmt=stmt2)
>>> t1.repeat(number=1000)
[0.47923321640178074, 0.4353549521401874, 0.4362746333173959]
>>> t2.repeat(number=1000)
[0.2536238984591819, 0.2470974750062851, 0.24726312027155473]

If you are too keen in increasing the speed at the cost of making it less readable you may try the following

def is_valid(str1,list1):
    return str1 in list1
words=["BAD","CAB","BEC"]
def generate2(words,letters):
    answers=[]
    [[answers.append(''.join(x).upper()) for x in itertools.permutations(letters,length) if ''.join(x).upper() in words] for length in range(2,len(letters)+1)]
    #print(answers)
    return answers

List comprehension is faster than loops. So combined both the loops to a single comprehension. Apart from that the statement

       word=''
        for let in x:
            word+=let
        if is_valid(word.upper(),words):

can be combined to if is_valid(''.join(x).upper,words) or even ''.join(x).upper in words, remember function call is costly.

I have done a comparison in speed and looks list comprehension is running 50% faster.

Its now upto you to decide


>>> stmt1="""
def is_valid(str1,list1):
    valid=False
    if str1 in list1:
        valid=True
    return valid
words=["BAD","CAB","BEC"]
def generate1(words,letters):
    answers=[]
    for length in range(2,len(letters)+1):
        for x in itertools.permutations(letters,length):
            word=''
            for let in x:
                word+=let
            if is_valid(word.upper(),words):
                answers.append(word)
                #print(word)
    #print(answers)
    return answers
generate1(words,['A','B','C','D','E'])
"""
>>> 
>>> stmt2="""
def is_valid(str1,list1):
    return str1 in list1
words=["BAD","CAB","BEC"]
def generate2(words,letters):
    answers=[]
    [[answers.append(''.join(x).upper()) for x in itertools.permutations(letters,length) if ''.join(x).upper() in words] for length in range(2,len(letters)+1)]
    #print(answers)
    return answers
generate2(words,['A','B','C','D','E'])
"""
>>> 
>>> t1=timeit.Timer(stmt=stmt1)
>>> t2=timeit.Timer(stmt=stmt2)
>>> t1.repeat(number=1000)
>>> t2=timeit.Timer(stmt=stmt2)
>>> t1.repeat(number=1000)
[0.47923321640178074, 0.4353549521401874, 0.4362746333173959]
>>> t2.repeat(number=1000)
[0.2536238984591819, 0.2470974750062851, 0.24726312027155473]
怎言笑 2024-12-25 20:36:51

尝试将内循环替换为:

for x in itertools.permutations(letters,length):
    word = ''.join(x)
    if word.upper() in words:
        answers.append(word)
        print(word)

Try replacing the inner loop with:

for x in itertools.permutations(letters,length):
    word = ''.join(x)
    if word.upper() in words:
        answers.append(word)
        print(word)
乞讨 2024-12-25 20:36:51

问题在于您的算法基本上是 O(n * m!),其中 n 是单词列表大小,m 是字母数。将单词列表更改为集合应该可以使搜索时间记录下来,并将性能提高到 O( log(n) * m!),正如其他人在此处所建议的那样。

然而,真正的性能提升将来自于完全消除搜索字母的排列。首先按字母顺序对单词列表中每个单词中的字母进行排序;它应该花费 O( n * p log(p) ) 时间,其中 p 是平均字长。然后在 O( n * log(n) ) 时间内按字母顺序对整个列表进行排序。还要跟踪原始单词,这样您就可以在 O(1) 内从排序后的单词列表中的字符串转到原始单词。接下来按字母顺序对输入的字母进行排序,并在排序的单词列表中搜索它们。

上述算法中最慢的操作是对按字母顺序排序的字符串列表进行排序,其时间复杂度为 O(n Log(n) )。搜索这样的列表是 Log( n ) 并导致整个算法在 O( n Log(n) ) 时间内执行。它应该线性缩放到 m(输入的字母数)。

实现留给读者。

The issue is with your algorithm is basically O(n * m!) where n is the word list size and m is number of letters. Changing the word list to a set should make the searches log time and improve performance to O( log(n) * m!) as recommended by other people here.

The real performance gain will however come from completely eliminating permuting the letters for the search. First sort the letters in each individual word in the word list alphabetically; it should take O( n * p log(p) ) time where p is average word length. Then sort the entire list alphabetically in O( n * log(n) ) time. Keep track of the original words as well, so that you can go from a string in the sorted word list to the original word in O(1). Next sort the imputed letters alphabetically and search for them in the sorted word list.

The slowest operation in the above algorithm is sorting the list of alphabetically sorted strings, which is O(n Log(n) ). Searching such a list is Log( n ) and results in the entire algorithm executing in O( n Log(n) ) time. It should scale linearly to m, the number of letters inputed.

Implementation is left to the reader.

抠脚大汉 2024-12-25 20:36:51

你到底想用这个来实现什么目的?看起来您已经有一些正在阅读的有效单词词典。为什么要排列可以根据用户给出的输入构建的所有单词组合?

关于你的算法,你需要考虑一些事情。您创建的每个排列都会迭代字典中的每个已知单词(列表 1)。当您创建单词的所有排列时,您正在创建m!单词,其中 m 个由用户给出的字母。

你基本上有 O(n * m!)。即使对于像 7 这样的少量事物来说,这也是非常巨大的。通过使用集合而不是列表,您可以将 n 项减少到一个常数,这会将您的算法更改为 O(m!),但仍然太大。 如果我必须猜测这个算法在做什么,我会说你想知道你可以根据给定的字母创建多少个已知单词。你又没有这么说,但我们假设这就是你的意思。

更快的算法是迭代字典中的每个单词,看看是否可以通过从输入中选取字母来组成该单词。所以你只需遍历字典一次 O(n * m) 。这消除了对输入进行排列的需要。算法如下:

 user_input = input("Give me some words")
 for word in list1:
     current = user_input
     found = True
     for letter in word:
         if letter in current:
            current.remove( letter )
         else
            found = False
            break;
     if found:
        answers.add( word )
 print( answers )

抱歉,我的 python 有点生疏,但希望你能明白。

What exactly are you trying to accomplish with this? It looks like you have some dictionary of valid words you are reading from already. Why are you permutating all combination of words that could be built from the input given by the user?

Something you need to consider about your algorithm. Every permutation you create you are iterating over every known word in your dictionary (list1). When you create all permutations of the words you are creating m! words where m number of the letters given by the user.

You basically have O(n * m!). That's ridiculously huge for even small number of things like 7. By using a set, instead of a list, you can take that n term and reduce it to a constant which changes your algorithm to O(m!) which is still too big. If I had to guess what this algorithm is doing I'd say you want to know how many known words you can create out of the letters you are given. Again you didn't say that, but let's assume this is what you meant.

A faster algorithm would be to iterate over every word in your dictionary and see if you can make that word by picking letters from the input. So you only walk over the dictionary once O(n * m). That eliminates the need to permutate your input. Here's the algorithm:

 user_input = input("Give me some words")
 for word in list1:
     current = user_input
     found = True
     for letter in word:
         if letter in current:
            current.remove( letter )
         else
            found = False
            break;
     if found:
        answers.add( word )
 print( answers )

Sorry my python is a little rusty but hopefully you'll get the idea.

被你宠の有点坏 2024-12-25 20:36:51

如果您打算经常查找单词,则应该根据您的数据构建

简单的例子如下。代码应该是非常不言自明的,但如果有不清楚的地方请询问。

import pickle


class Tree:
    def __init__(self):
        self.letters = dict()

    def add_words(self, words):
        for word in words:
            self.add_word(word)

    def add_word(self, word):
        chars = list(word.lower())
        l = chars.pop(0)
        try:
            self.letters[l].add_word(chars)
        except KeyError:
            self.letters[l] = Letter(l)
            self.letters[l].add_word(chars)

    def is_word(self, word):
        chars = list(word.lower())
        l = chars.pop(0)
        try:
            return self.letters[l].is_word(chars)
        except KeyError:
            return False


class Letter:
    def __init__(self, letter):
        self.letter = letter
        self.sub_letters = dict()
        self.is_a_word = False

    def add_word(self, word):
        if len(word) == 0:
            self.is_a_word = True
            return
        l = word.pop(0)
        try:
            self.sub_letters[l].add_word(word)
        except KeyError:
            self.sub_letters[l] = Letter(l)
            self.sub_letters[l].add_word(word)

    def is_word(self, word):
        if len(word) == 0:
            return self.is_a_word
        l = word.pop(0)
        try:
            return self.sub_letters[l].is_word(word)
        except KeyError:
            return False


def get_dict(obj_file, dict_file):
    try:
        with open(obj_file, 'rb') as my_dict:
            return pickle.load(my_dict)
    except IOError:
        my_tree = Tree()
        with open(dict_file, 'rb') as in_file:
            for word in in_file:
                my_tree.add_word(word.strip())
        with open(obj_file, 'wb') as outfile:
            pickle.dump(my_tree, outfile, pickle.HIGHEST_PROTOCOL)
        return my_tree


obj_file = 'mydict.pk'
dict_file = 'wordlist.txt'
my_tree = get_dict(obj_file, dict_file)

(树有很多不同类型,这只是一个非常简单的例子)

当树构建完成后,只需要len(word)函数调用确定输入的单词是否有效。这是对 if word in wordlist 的巨大改进,后者需要 O(len(wordlist))

这种方法的缺点是生成树可能需要一些时间。通过使用 pickle 序列化 Tree() 对象,您不必在每次启动脚本时都构建树。

我尝试使用 SIL International 的单词列表(总共 109582 个单词)构建一棵树。

使用 timeit 进行计时,当 unpickle 时,执行时间减少了约 50%目标文件而不是从头开始构建字典。

如果您只想检查排列,则应该更改 Tree()add_word() 方法以首先对字母进行排序。 Tree.is_word() 的输入参数当然也应该排序。

If you plan to look up words often, you should build a tree from your data.

Simple example follows. The code should be pretty self-explanatory, but please ask if something is unclear.

import pickle


class Tree:
    def __init__(self):
        self.letters = dict()

    def add_words(self, words):
        for word in words:
            self.add_word(word)

    def add_word(self, word):
        chars = list(word.lower())
        l = chars.pop(0)
        try:
            self.letters[l].add_word(chars)
        except KeyError:
            self.letters[l] = Letter(l)
            self.letters[l].add_word(chars)

    def is_word(self, word):
        chars = list(word.lower())
        l = chars.pop(0)
        try:
            return self.letters[l].is_word(chars)
        except KeyError:
            return False


class Letter:
    def __init__(self, letter):
        self.letter = letter
        self.sub_letters = dict()
        self.is_a_word = False

    def add_word(self, word):
        if len(word) == 0:
            self.is_a_word = True
            return
        l = word.pop(0)
        try:
            self.sub_letters[l].add_word(word)
        except KeyError:
            self.sub_letters[l] = Letter(l)
            self.sub_letters[l].add_word(word)

    def is_word(self, word):
        if len(word) == 0:
            return self.is_a_word
        l = word.pop(0)
        try:
            return self.sub_letters[l].is_word(word)
        except KeyError:
            return False


def get_dict(obj_file, dict_file):
    try:
        with open(obj_file, 'rb') as my_dict:
            return pickle.load(my_dict)
    except IOError:
        my_tree = Tree()
        with open(dict_file, 'rb') as in_file:
            for word in in_file:
                my_tree.add_word(word.strip())
        with open(obj_file, 'wb') as outfile:
            pickle.dump(my_tree, outfile, pickle.HIGHEST_PROTOCOL)
        return my_tree


obj_file = 'mydict.pk'
dict_file = 'wordlist.txt'
my_tree = get_dict(obj_file, dict_file)

(There are a lot of different types of trees, this is just a very simple example)

When the tree has been built, it will only require len(word) function calls to determine if the entered word is valid. This is a huge improvement from if word in wordlist, which takes O(len(wordlist)).

The downside with an approach like this, is that generating the tree may take some time. By serializing the Tree() object using pickle, you don't have to build the tree each time you start your script.

I tried to build a tree using a wordlist from SIL International (total of 109582 words).

Timing it with timeit, the execution time was reduced by ~50 % when unpickle the object file rather than build the dict from scratch.

If you'd like to only check for permutations, you should alter the add_word() method of Tree() to sort the letters in the first place. Input argument to Tree.is_word() should also of course be sorted.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文