Anagram code in Python - comparing dynamically generated strings against a txt file
I have written an Anagram solving program in Python. I wanted your opinion on whether I had gone about it right. Let me explain the logic:
- First, the user enters two words (2 string values) for which a single-word anagram should be generated.
- The two words are concatenated to derive a third value.
- The third value is processed by the itertools.permutations function, which derives all possible permutations of the word as a list.
- Each permutation in the list is formatted back into a string value.
- At this point, I open a word list that will be used as a dictionary to check whether the string value is an actual word.
- The file is read line by line, and the string value is compared with each line.
- If a match is found, the program prints it on screen as a Dictionary Match.
Please tell me if I am going about it correctly or if any improvements can be suggested. Any feedback appreciated. I am new to Python.
Here is the code:
```python
#This program has been created to solve anagram puzzles
# All the imports go here
#import re
import itertools
import fileinput

def anaCore():
    print 'This is a Handy-Dandy Anagram Solving Machine'
    print 'First, we enter the first word....'
    anaWordOnly = False
    firstWord = raw_input('Please enter the first word > ')
    print 'Thank you for entering %r as your first word' % firstWord
    print 'Now we enter the second word....'
    secondWord = raw_input('Please enter the second word > ')
    print 'Thank you for entering %r as your second word' % secondWord
    thirdWord = firstWord+secondWord
    print thirdWord
    mylist = itertools.permutations(thirdWord)
    for a in mylist:
        #print a
        mystr = ''.join(a)
        for line in fileinput.input("brit-a-z.txt"):
            if mystr in line:
                print 'Dictionary match found', mystr
                #print mystr

anaCore()
```
Of course you can generate all permutations of the word. However, I think it would be more convenient to sort the letters in the word. You would then have to preprocess your whole dictionary, i.e. sort the letters in each word. After that, you just need to check for the sorted sequence of characters.
To simplify: I would generate the sorted sequence of your anagram word. For each line in the file, I would sort its characters and check whether both sequences are the same. If so, check whether they are identical words. If they are not identical words, they're anagrams.
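A minimal sketch of the sorted-letters approach, assuming the dictionary is a plain word list with one word per line (the asker's `brit-a-z.txt` is assumed to have that shape). The helper names `signature`, `build_index` and `find_anagrams` are hypothetical; the word list is passed in as any iterable of lines so the example runs without the file.

```python
from collections import defaultdict


def signature(word):
    """Return the word's letters in sorted order, lowercased ('Listen' -> 'eilnst')."""
    return ''.join(sorted(word.lower()))


def build_index(lines):
    """Preprocess the whole dictionary once: map each signature to the words sharing it."""
    index = defaultdict(list)
    for line in lines:
        word = line.strip()
        if word:
            index[signature(word)].append(word)
    return index


def find_anagrams(index, letters):
    """Return dictionary words with the same signature, skipping the identical word itself."""
    return [w for w in index.get(signature(letters), []) if w.lower() != letters.lower()]


if __name__ == '__main__':
    # In real use (assumption): with open('brit-a-z.txt') as f: index = build_index(f)
    index = build_index(['listen', 'silent', 'enlist', 'google', 'inlets'])
    print(find_anagrams(index, 'tinsel'))
```

Because the index is built once, each lookup costs only one sort of the input word, no matter how long it is.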
Your approach is fine; calling `itertools.permutations` on a string is a good way to find matches. Here are just a few thoughts/improvements:
- `mylist = itertools.permutations(thirdWord)`: remember that `permutations` does not literally return a list; it returns an iterator (a generator-like object), which consumes a constant amount of memory relative to the number of permutations and produces new permutations on demand. In particular, when you loop over it, you produce one permutation at a time. Also, a generator can only produce values in the forward direction; you cannot generally iterate backwards over a generator. Generators are a key concept in Python. See http://wiki.python.org/moin/Generators for more information.
- `s.lower()` returns a lowercased copy of the string `s`.
- … `set`, then the time to look up each permutation is O(1) on average. So your total runtime is O(n!).
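A small demonstration of the first two points, using the word `'Dog'` purely as an example: the permutations arrive one at a time from an iterator, the iterator only moves forward, and lowercasing first makes `'Dog'` and `'dog'` behave alike.

```python
import itertools

letters = 'Dog'.lower()  # normalise case before generating permutations

perms = itertools.permutations(letters)  # an iterator object, not a list
first = next(perms)  # produces just the first permutation, as a tuple of characters
print(first)  # ('d', 'o', 'g')

# The iterator only moves forward: this collects the permutations after the first one.
rest = [''.join(p) for p in perms]
print(rest)  # ['dgo', 'odg', 'ogd', 'gdo', 'god']

# Once consumed, it is exhausted; call permutations() again to start over.
print(list(perms))  # []
```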
Why are you doing `mystr = ''.join(a)`? Why not just do `mystr = a`?
I don't think that `if mystr in line:` is right either, because you could have `mystr` as, for instance, 'dog', and `line` as 'dogger bank', or something like that. You should probably check for equality instead. Other than that, I can't see anything wrong.
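A quick check of both points. `itertools.permutations` yields tuples of characters, so a tuple compared directly with a string never matches; that is why the `''.join(a)` step is still needed. The substring test, by contrast, really does produce false positives. The line `'dogger bank'` is just an illustrative example.

```python
import itertools

first = next(itertools.permutations('dog'))
print(first)           # ('d', 'o', 'g'): a tuple, not a string
print(first == 'dog')  # False, so the tuple must be joined before comparing
print(''.join(first))  # dog

line = 'dogger bank\n'         # lines read from a file keep their trailing newline
print('dog' in line)           # True: the substring test gives a false positive
print('dog' == line.strip())   # False: equality with the stripped line is the right test
```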
If you wanted to be clever, you could create a 2nd, 3rd, 4th, ... nth dictionary consisting of all combinations of words in the initial dictionary and dictionary n - 1. That way you could find multiword anagrams as well. Don't let n get too big, though, or the dictionary will take up lots of space.
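A minimal sketch of the "2nd dictionary" for n = 2. It borrows the sorted-letters signature from the first answer instead of generating permutations, and the helper `build_pair_index` and the sample word list are hypothetical. Storing every pair of words makes the index grow roughly with the square of the word count, which is the space concern mentioned above.

```python
from collections import defaultdict
from itertools import combinations_with_replacement


def signature(text):
    """Sorted letters of `text`, ignoring case and spaces."""
    return ''.join(sorted(text.lower().replace(' ', '')))


def build_pair_index(words):
    """Dictionary 2: every unordered pair of words (a word may pair with itself),
    keyed by the signature of the combined letters."""
    index = defaultdict(list)
    for a, b in combinations_with_replacement(words, 2):
        index[signature(a + b)].append(a + ' ' + b)
    return index


words = ['dormitory', 'dirty', 'room', 'astronomer', 'moon', 'starer']
pairs = build_pair_index(words)
print(pairs.get(signature('dormitory'), []))   # ['dirty room']
print(pairs.get(signature('astronomer'), []))  # ['moon starer']
```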
Some of my ideas:
The current approach is to first generate all possible permutations of 'thirdWord' and then, for each permutation, check whether it exists in the dictionary by reading the text file every time.
You might as well read the dictionary file only once, at the program's start, and put the words into a `set`. Then you can use `in` to easily check whether a permutation exists in the set:
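A minimal sketch of that idea, assuming one word per line in the dictionary file. The helpers `load_words` and `permutation_matches` are hypothetical, and the word list is passed in as an iterable of lines so the example runs without `brit-a-z.txt`.

```python
import itertools


def load_words(lines):
    """Read the word list once into a set, stripping newlines and lowercasing."""
    return {line.strip().lower() for line in lines if line.strip()}


def permutation_matches(words, letters):
    """Return the distinct permutations of `letters` that appear in `words`."""
    found = set()
    for p in itertools.permutations(letters.lower()):
        candidate = ''.join(p)
        if candidate in words:  # set membership: O(1) on average, no file re-reading
            found.add(candidate)
    return sorted(found)


if __name__ == '__main__':
    # In real use (assumption): with open('brit-a-z.txt') as f: words = load_words(f)
    words = load_words(['Listen\n', 'silent\n', 'enlist\n', 'tinsel\n', 'google\n'])
    print(permutation_matches(words, 'inlets'))  # ['enlist', 'listen', 'silent', 'tinsel']
```

The `found` set also removes duplicates that `permutations` produces when the input has repeated letters.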
Also, a long 'thirdWord' would generate too many permutations. For example, a word of length 16 with all different letters would generate 16! = 20,922,789,888,000 permutations, which is far too many to check.
You might reverse the process by iterating over the words in the dictionary instead, and checking for each word whether it is an anagram of 'thirdWord'. For longer words, this should be faster than checking all permutations.
Checking for an anagram is as easy as:
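A minimal sketch of the reversed approach, assuming a sorted-letters comparison is used as the anagram test. The helper names `is_anagram` and `anagrams_in` are hypothetical, and `letters` plays the role of the asker's `thirdWord`.

```python
def is_anagram(a, b):
    """Two strings are anagrams if they contain the same letters the same number of times."""
    return sorted(a.lower()) == sorted(b.lower())


def anagrams_in(lines, letters):
    """Scan the word list once instead of generating len(letters)! permutations.
    Lines may carry trailing newlines, as they do when read from a file."""
    result = []
    for line in lines:
        word = line.strip()
        # Skip blank lines and the input word itself.
        if word and word.lower() != letters.lower() and is_anagram(word, letters):
            result.append(word)
    return result


print(is_anagram('listen', 'silent'))                                    # True
print(anagrams_in(['listen', 'enlist', 'google', 'tinsel'], 'silent'))  # ['listen', 'enlist', 'tinsel']
```

The cost is one sort per dictionary word, so it depends on the size of the word list rather than growing factorially with the length of the input.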