Python-给定输入字母可能的英语单字字谜

发布于 2024-11-15 06:29:50 字数 900 浏览 4 评论 0原文

我知道以前有人问过这个问题的变体,但我无法理解以前的任何实现,因为它们大多数都涉及使用集合和 issubset 方法。

这就是我想做的:我在字典中有一组单词和可能的字母列表。我想知道是否可以通过重新排列列表中的字母来形成该集合的成员。这是我当前的实现:

def solve(dictionary, letters):

    for word in dictionary: #for each word in the dictionary
        if len(word) > len(letters):   # first optimization, doesn't check words that are larger than letter set
            continue
        else: 
            scrambledword = "".join([b for b in sorted(list(word))]) #sorts the letters in each word
            if set(scrambledword).issubset(letters):
                print word


def main():
    dictionary = set([x.strip() for x in open("C:\\Python27\\dictionary.txt")])        
    letters = sorted(['v','r','o','o','m','a','b','c','d'])
    solve(dictionary, letters)


main()

此实现的明显问题是,会发现某些单词在“字母”中使用多个字母。例如,尽管字母列表中只有一份“a”和“r”,但单词“cardboard”显示为有效单词。如何在列表上使用“issubset”方法?

I know variations of this have been asked before, but I was unable to understand any of the previous implementations because most of them involved using sets and the issubset method.

Here is what I am trying to do: I have a set of words in a dictionary and a list of possible letters. I want to find if the members of the set can be formed through re-arranging the letters in the list. Here is my current implementation:

def solve(dictionary, letters):

    for word in dictionary: #for each word in the dictionary
        if len(word) > len(letters):   # first optimization, doesn't check words that are larger than letter set
            continue
        else: 
            scrambledword = "".join([b for b in sorted(list(word))]) #sorts the letters in each word
            if set(scrambledword).issubset(letters):
                print word


def main():
    dictionary = set([x.strip() for x in open("C:\\Python27\\dictionary.txt")])        
    letters = sorted(['v','r','o','o','m','a','b','c','d'])
    solve(dictionary, letters)


main()

The obvious problem with this implementation is that some words will be found that use more than one letter in "letters." For example, the word 'cardboard' appears as a valid word, despite there being only one copy of 'a' and 'r' in the letters list. How do I use the "issubset" method on lists?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

幻梦 2024-11-22 06:29:50

要知道你是否可以用一组字母组成一个单词[哎呀,我自己做了——我的意思是“集合”!],你希望每个字母至少出现正确的次数,所以我认为我们'我们必须以某种方式在那里进行计数。根据定义,Python 集不关心源列表中的元素数量。也许像

from collections import Counter

letters = ['v','r','o','o','m','a','b','c','d']
words = 'cardboard boom booom'.split()
letterscount = Counter(letters)

for word in words:
    wordcount = Counter(word)
    print word, all(letterscount[c] >= wordcount[c] for c in wordcount)

给予

cardboard False
boom True
booom False

Counter 这样的东西是一个方便的实用程序类:

>>> c = Counter(letters)
>>> c
Counter({'o': 2, 'a': 1, 'c': 1, 'b': 1, 'd': 1, 'm': 1, 'r': 1, 'v': 1})
>>> c['o']
2
>>> c['z']
0

[DSM:返回!我删除了一个不起作用的社区编辑,因为计数器实例不可散列。]

如果搜索速度是一个问题,那么您可以权衡内存和预计算时间:

from collections import defaultdict, Counter
from itertools import combinations

# precomputations
allwords = open('/usr/share/dict/words').read().split() 
allwords = list(w for w in allwords if len(w) >= 3) # hack, /words contains lots of silliness
allwords_by_count = defaultdict(list)
for i, word in enumerate(allwords):
    allwords_by_count[frozenset(word)].append((word, Counter(word)))
    if i % 1000 == 0:
        print i, word


def wordsfrom(letters, words_by_count):
    lettercount = Counter(letters)
    for subsetsize in range(1, len(lettercount)+1):
        for subset in combinations(lettercount, subsetsize):
            for possword, posswordcount in words_by_count[frozenset(subset)]:
                if all(posswordcount[c] <= lettercount[c] for c in posswordcount):
                    yield possword

>>> wordsfrom('thistles', allwords_by_count)
<generator object wordsfrom at 0x1032956e0>
>>> list(wordsfrom('thistles', allwords_by_count))
['ess', 'sis', 'tit', 'tst', 'hei', 'hie', 'lei', 'lie', 'sie', 'sise', 'tie', 'tite', 'she', 'het', 'teth', 'the', 'els', 'less', 'elt', 'let', 'telt', 'set', 'sett', 'stet', 'test', 'his', 'hiss', 'shi', 'sish', 'hit', 'lis', 'liss', 'sil', 'lit', 'til', 'tilt', 'ist', 'its', 'sist', 'sit', 'shies', 'tithe', 'isle', 'sile', 'sisel', 'lite', 'teil', 'teli', 'tile', 'title', 'seit', 'sesti', 'site', 'stite', 'testis', 'hest', 'seth', 'lest', 'selt', 'lish', 'slish', 'hilt', 'lith', 'tilth', 'hist', 'sith', 'stith', 'this', 'list', 'silt', 'slit', 'stilt', 'liesh', 'shiel', 'lithe', 'shiest', 'sithe', 'theist', 'thesis', 'islet', 'istle', 'sistle', 'slite', 'stile', 'stilet', 'hitless', 'tehsil', 'thistle']

[呵呵。我刚刚注意到“蓟”本身不在列表中,但那是因为它不在单词文件中..]

是的,单词文件中确实存在明显的“非单词”:

>>> assert all(w in allwords for w in (wordsfrom('thistles', allwords_by_count)))
>>> 

To know if you can make a word out of a set of letters [oops, I did it myself -- I meant 'collection'!], you want every letter to occur at least the right number of times, so I think we're going to have to work the counts in there somehow. By definition, Python sets don't care about the number of elements in a source list. Maybe something like

from collections import Counter

letters = ['v','r','o','o','m','a','b','c','d']
words = 'cardboard boom booom'.split()
letterscount = Counter(letters)

for word in words:
    wordcount = Counter(word)
    print word, all(letterscount[c] >= wordcount[c] for c in wordcount)

giving

cardboard False
boom True
booom False

Counter is a handy utility class:

>>> c = Counter(letters)
>>> c
Counter({'o': 2, 'a': 1, 'c': 1, 'b': 1, 'd': 1, 'm': 1, 'r': 1, 'v': 1})
>>> c['o']
2
>>> c['z']
0

[DSM: the return! I removed a community edit which doesn't work because Counter instances aren't hashable.]

If searching speed is a concern, then you can trade off memory and precomputation time:

from collections import defaultdict, Counter
from itertools import combinations

# precomputations
allwords = open('/usr/share/dict/words').read().split() 
allwords = list(w for w in allwords if len(w) >= 3) # hack, /words contains lots of silliness
allwords_by_count = defaultdict(list)
for i, word in enumerate(allwords):
    allwords_by_count[frozenset(word)].append((word, Counter(word)))
    if i % 1000 == 0:
        print i, word


def wordsfrom(letters, words_by_count):
    lettercount = Counter(letters)
    for subsetsize in range(1, len(lettercount)+1):
        for subset in combinations(lettercount, subsetsize):
            for possword, posswordcount in words_by_count[frozenset(subset)]:
                if all(posswordcount[c] <= lettercount[c] for c in posswordcount):
                    yield possword

>>> wordsfrom('thistles', allwords_by_count)
<generator object wordsfrom at 0x1032956e0>
>>> list(wordsfrom('thistles', allwords_by_count))
['ess', 'sis', 'tit', 'tst', 'hei', 'hie', 'lei', 'lie', 'sie', 'sise', 'tie', 'tite', 'she', 'het', 'teth', 'the', 'els', 'less', 'elt', 'let', 'telt', 'set', 'sett', 'stet', 'test', 'his', 'hiss', 'shi', 'sish', 'hit', 'lis', 'liss', 'sil', 'lit', 'til', 'tilt', 'ist', 'its', 'sist', 'sit', 'shies', 'tithe', 'isle', 'sile', 'sisel', 'lite', 'teil', 'teli', 'tile', 'title', 'seit', 'sesti', 'site', 'stite', 'testis', 'hest', 'seth', 'lest', 'selt', 'lish', 'slish', 'hilt', 'lith', 'tilth', 'hist', 'sith', 'stith', 'this', 'list', 'silt', 'slit', 'stilt', 'liesh', 'shiel', 'lithe', 'shiest', 'sithe', 'theist', 'thesis', 'islet', 'istle', 'sistle', 'slite', 'stile', 'stilet', 'hitless', 'tehsil', 'thistle']

[Heh. I just noticed that 'thistles' itself isn't in the list, but that's because it's not in the words file..]

And yes, the apparent "nonwords" there are really in the words file:

>>> assert all(w in allwords for w in (wordsfrom('thistles', allwords_by_count)))
>>> 
一笔一画续写前缘 2024-11-22 06:29:50

如果您寻找字谜词,换句话说,您想要重新排列,但使用所有它们(而不是仅使用子集),那么还有另一种解决方案。

您首先预处理字典中的所有单词。给定一个单词,您可以生成用相同字母但按字母顺序书写的单词:

def alphabetize(word):
    "".join(sorted(word))

并将这些新单词放入一个集合 newDictionary
然后你的函数可以对字母调用alphabetize并检查结果是否在字典中。

def solve(newDictionary, letters):
    query = alphabetize(letters)
    return query in newDictionary

Alphabetize 函数是 anagrams 的一个特征:当且仅当两个单词应用 Alphabetize 时产生相同的结果时,它们才是彼此的 anagrams。

If you look for anagrams, in other words you want to rearrange, but use all of them (as opposed to use only a subset) then there is another solution.

You first preprocess all the words in the dictionary. Given a word, you produce the word written with the same letters but in alphabetical order:

def alphabetize(word):
    "".join(sorted(word))

and put those new words in a set newDictionary
Then your function can call alphabetize on letters and check whether the result in in the dictionary.

def solve(newDictionary, letters):
    query = alphabetize(letters)
    return query in newDictionary

The alphabetize function is a characteristic of anagrams: two words are anagrams of each other if and only if they produce the same result when alphabetize is applied to them.

如梦亦如幻 2024-11-22 06:29:50

导入集合,我们定义一个可散列的多重集:

def Letters(x):
    return frozenset(Counter(x).items())

我们现在将词汇表预处理为字母字典 ->{anagram1,anagram2,...}:

vocabulary = ['apple', 'banana', 'rats', 'star', 'tars']
countsToWords = defaultdict(lambda: set())
for word in vocabulary:
    countsToWords[Letters(word)].add(word)

您的“解决”函数现在需要 O(1)时间:

def solve(query):
    return countsToWords[Letters(query)]

示例:

print( solve('star') )
# {'tars', 'star', 'rats'} 

Importing collections, we define a hashable multiset:

def Letters(x):
    return frozenset(Counter(x).items())

We now preprocess the vocabulary into a dict of letters->{anagram1,anagram2,...}:

vocabulary = ['apple', 'banana', 'rats', 'star', 'tars']
countsToWords = defaultdict(lambda: set())
for word in vocabulary:
    countsToWords[Letters(word)].add(word)

Your 'solve' function now takes O(1) time:

def solve(query):
    return countsToWords[Letters(query)]

Example:

print( solve('star') )
# {'tars', 'star', 'rats'} 
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文