Python-给定输入字母可能的英语单字字谜

发布于 2024-11-15 06:29:50 字数 900 浏览 4 评论 0原文

我知道以前有人问过这个问题的变体，但我无法理解以前的任何实现，因为它们大多数都涉及使用集合和 issubset 方法。

这就是我想做的：我在字典中有一组单词和可能的字母列表。我想知道是否可以通过重新排列列表中的字母来形成该集合的成员。这是我当前的实现：

def solve(dictionary, letters):

    for word in dictionary: #for each word in the dictionary
        if len(word) > len(letters):   # first optimization, doesn't check words that are larger than letter set
            continue
        else: 
            scrambledword = "".join([b for b in sorted(list(word))]) #sorts the letters in each word
            if set(scrambledword).issubset(letters):
                print word


def main():
    dictionary = set([x.strip() for x in open("C:\\Python27\\dictionary.txt")])        
    letters = sorted(['v','r','o','o','m','a','b','c','d'])
    solve(dictionary, letters)


main()

此实现的明显问题是，会发现某些单词在“字母”中使用多个字母。例如，尽管字母列表中只有一份“a”和“r”，但单词“cardboard”显示为有效单词。如何在列表上使用“issubset”方法？

原文

I know variations of this have been asked before, but I was unable to understand any of the previous implementations because most of them involved using sets and the issubset method.

Here is what I am trying to do: I have a set of words in a dictionary and a list of possible letters. I want to find if the members of the set can be formed through re-arranging the letters in the list. Here is my current implementation:

def solve(dictionary, letters):

    for word in dictionary: #for each word in the dictionary
        if len(word) > len(letters):   # first optimization, doesn't check words that are larger than letter set
            continue
        else: 
            scrambledword = "".join([b for b in sorted(list(word))]) #sorts the letters in each word
            if set(scrambledword).issubset(letters):
                print word


def main():
    dictionary = set([x.strip() for x in open("C:\\Python27\\dictionary.txt")])        
    letters = sorted(['v','r','o','o','m','a','b','c','d'])
    solve(dictionary, letters)


main()

The obvious problem with this implementation is that some words will be found that use more than one letter in "letters." For example, the word 'cardboard' appears as a valid word, despite there being only one copy of 'a' and 'r' in the letters list. How do I use the "issubset" method on lists?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

幻梦 2024-11-22 06:29:50

要知道你是否可以用一组字母组成一个单词[哎呀，我自己做了——我的意思是“集合”！]，你希望每个字母至少出现正确的次数，所以我认为我们'我们必须以某种方式在那里进行计数。根据定义，Python 集不关心源列表中的元素数量。也许像

from collections import Counter

letters = ['v','r','o','o','m','a','b','c','d']
words = 'cardboard boom booom'.split()
letterscount = Counter(letters)

for word in words:
    wordcount = Counter(word)
    print word, all(letterscount[c] >= wordcount[c] for c in wordcount)

给予

cardboard False
boom True
booom False

Counter 这样的东西是一个方便的实用程序类：

>>> c = Counter(letters)
>>> c
Counter({'o': 2, 'a': 1, 'c': 1, 'b': 1, 'd': 1, 'm': 1, 'r': 1, 'v': 1})
>>> c['o']
2
>>> c['z']
0

[DSM：返回！我删除了一个不起作用的社区编辑，因为计数器实例不可散列。]

如果搜索速度是一个问题，那么您可以权衡内存和预计算时间：

from collections import defaultdict, Counter
from itertools import combinations

# precomputations
allwords = open('/usr/share/dict/words').read().split() 
allwords = list(w for w in allwords if len(w) >= 3) # hack, /words contains lots of silliness
allwords_by_count = defaultdict(list)
for i, word in enumerate(allwords):
    allwords_by_count[frozenset(word)].append((word, Counter(word)))
    if i % 1000 == 0:
        print i, word


def wordsfrom(letters, words_by_count):
    lettercount = Counter(letters)
    for subsetsize in range(1, len(lettercount)+1):
        for subset in combinations(lettercount, subsetsize):
            for possword, posswordcount in words_by_count[frozenset(subset)]:
                if all(posswordcount[c] <= lettercount[c] for c in posswordcount):
                    yield possword

>>> wordsfrom('thistles', allwords_by_count)
<generator object wordsfrom at 0x1032956e0>
>>> list(wordsfrom('thistles', allwords_by_count))
['ess', 'sis', 'tit', 'tst', 'hei', 'hie', 'lei', 'lie', 'sie', 'sise', 'tie', 'tite', 'she', 'het', 'teth', 'the', 'els', 'less', 'elt', 'let', 'telt', 'set', 'sett', 'stet', 'test', 'his', 'hiss', 'shi', 'sish', 'hit', 'lis', 'liss', 'sil', 'lit', 'til', 'tilt', 'ist', 'its', 'sist', 'sit', 'shies', 'tithe', 'isle', 'sile', 'sisel', 'lite', 'teil', 'teli', 'tile', 'title', 'seit', 'sesti', 'site', 'stite', 'testis', 'hest', 'seth', 'lest', 'selt', 'lish', 'slish', 'hilt', 'lith', 'tilth', 'hist', 'sith', 'stith', 'this', 'list', 'silt', 'slit', 'stilt', 'liesh', 'shiel', 'lithe', 'shiest', 'sithe', 'theist', 'thesis', 'islet', 'istle', 'sistle', 'slite', 'stile', 'stilet', 'hitless', 'tehsil', 'thistle']

[呵呵。我刚刚注意到“蓟”本身不在列表中，但那是因为它不在单词文件中..]

是的，单词文件中确实存在明显的“非单词”：

>>> assert all(w in allwords for w in (wordsfrom('thistles', allwords_by_count)))
>>>

To know if you can make a word out of a set of letters [oops, I did it myself -- I meant 'collection'!], you want every letter to occur at least the right number of times, so I think we're going to have to work the counts in there somehow. By definition, Python sets don't care about the number of elements in a source list. Maybe something like

from collections import Counter

letters = ['v','r','o','o','m','a','b','c','d']
words = 'cardboard boom booom'.split()
letterscount = Counter(letters)

for word in words:
    wordcount = Counter(word)
    print word, all(letterscount[c] >= wordcount[c] for c in wordcount)

giving

cardboard False
boom True
booom False

Counter is a handy utility class:

>>> c = Counter(letters)
>>> c
Counter({'o': 2, 'a': 1, 'c': 1, 'b': 1, 'd': 1, 'm': 1, 'r': 1, 'v': 1})
>>> c['o']
2
>>> c['z']
0

[DSM: the return! I removed a community edit which doesn't work because Counter instances aren't hashable.]

If searching speed is a concern, then you can trade off memory and precomputation time:

from collections import defaultdict, Counter
from itertools import combinations

# precomputations
allwords = open('/usr/share/dict/words').read().split() 
allwords = list(w for w in allwords if len(w) >= 3) # hack, /words contains lots of silliness
allwords_by_count = defaultdict(list)
for i, word in enumerate(allwords):
    allwords_by_count[frozenset(word)].append((word, Counter(word)))
    if i % 1000 == 0:
        print i, word


def wordsfrom(letters, words_by_count):
    lettercount = Counter(letters)
    for subsetsize in range(1, len(lettercount)+1):
        for subset in combinations(lettercount, subsetsize):
            for possword, posswordcount in words_by_count[frozenset(subset)]:
                if all(posswordcount[c] <= lettercount[c] for c in posswordcount):
                    yield possword

>>> wordsfrom('thistles', allwords_by_count)
<generator object wordsfrom at 0x1032956e0>
>>> list(wordsfrom('thistles', allwords_by_count))
['ess', 'sis', 'tit', 'tst', 'hei', 'hie', 'lei', 'lie', 'sie', 'sise', 'tie', 'tite', 'she', 'het', 'teth', 'the', 'els', 'less', 'elt', 'let', 'telt', 'set', 'sett', 'stet', 'test', 'his', 'hiss', 'shi', 'sish', 'hit', 'lis', 'liss', 'sil', 'lit', 'til', 'tilt', 'ist', 'its', 'sist', 'sit', 'shies', 'tithe', 'isle', 'sile', 'sisel', 'lite', 'teil', 'teli', 'tile', 'title', 'seit', 'sesti', 'site', 'stite', 'testis', 'hest', 'seth', 'lest', 'selt', 'lish', 'slish', 'hilt', 'lith', 'tilth', 'hist', 'sith', 'stith', 'this', 'list', 'silt', 'slit', 'stilt', 'liesh', 'shiel', 'lithe', 'shiest', 'sithe', 'theist', 'thesis', 'islet', 'istle', 'sistle', 'slite', 'stile', 'stilet', 'hitless', 'tehsil', 'thistle']

[Heh. I just noticed that 'thistles' itself isn't in the list, but that's because it's not in the words file..]

And yes, the apparent "nonwords" there are really in the words file:

>>> assert all(w in allwords for w in (wordsfrom('thistles', allwords_by_count)))
>>>

回复收藏 0 原文

一笔一画续写前缘 2024-11-22 06:29:50

如果您寻找字谜词，换句话说，您想要重新排列，但使用所有它们（而不是仅使用子集），那么还有另一种解决方案。

您首先预处理字典中的所有单词。给定一个单词，您可以生成用相同字母但按字母顺序书写的单词：

def alphabetize(word):
    "".join(sorted(word))

并将这些新单词放入一个集合 newDictionary
然后你的函数可以对字母调用alphabetize并检查结果是否在字典中。

def solve(newDictionary, letters):
    query = alphabetize(letters)
    return query in newDictionary

Alphabetize 函数是 anagrams 的一个特征：当且仅当两个单词应用 Alphabetize 时产生相同的结果时，它们才是彼此的 anagrams。

If you look for anagrams, in other words you want to rearrange, but use all of them (as opposed to use only a subset) then there is another solution.

You first preprocess all the words in the dictionary. Given a word, you produce the word written with the same letters but in alphabetical order:

def alphabetize(word):
    "".join(sorted(word))

and put those new words in a set newDictionary
Then your function can call alphabetize on letters and check whether the result in in the dictionary.

def solve(newDictionary, letters):
    query = alphabetize(letters)
    return query in newDictionary

The alphabetize function is a characteristic of anagrams: two words are anagrams of each other if and only if they produce the same result when alphabetize is applied to them.

回复收藏 0 原文

如梦亦如幻 2024-11-22 06:29:50

导入集合，我们定义一个可散列的多重集：

def Letters(x):
    return frozenset(Counter(x).items())

我们现在将词汇表预处理为字母字典 ->{anagram1,anagram2,...}：

vocabulary = ['apple', 'banana', 'rats', 'star', 'tars']
countsToWords = defaultdict(lambda: set())
for word in vocabulary:
    countsToWords[Letters(word)].add(word)

您的“解决”函数现在需要 O(1)时间：

def solve(query):
    return countsToWords[Letters(query)]

示例：

print( solve('star') )
# {'tars', 'star', 'rats'}

Importing collections, we define a hashable multiset:

def Letters(x):
    return frozenset(Counter(x).items())

We now preprocess the vocabulary into a dict of letters->{anagram1,anagram2,...}:

vocabulary = ['apple', 'banana', 'rats', 'star', 'tars']
countsToWords = defaultdict(lambda: set())
for word in vocabulary:
    countsToWords[Letters(word)].add(word)

Your 'solve' function now takes O(1) time:

def solve(query):
    return countsToWords[Letters(query)]

Example:

print( solve('star') )
# {'tars', 'star', 'rats'}

回复收藏 0 原文

~没有更多了~

关于作者

明媚殇

暂无简介

0 文章

0 评论

23 人气

关注发私信

友情链接

文江博客

Python-给定输入字母可能的英语单字字谜

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（3）

关于作者

相关话题

热门标签

推荐作者

初遇

听闻余生

Z_dy

左岸枫

1848719402

婷

友情链接

Python-给定输入字母可能的英语单字字谜

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（3）

关于作者

相关话题

热门标签

推荐作者

初遇

听闻余生

Z_dy

左岸枫

1848719402

婷

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。