Code to count the number of syllables in the words of a file

Posted on 2024-10-29 13:39:46

So far I have the following code, which counts the number of syllables in every word of cmudict (the CMU Pronouncing Dictionary). Now I need to replace cmudict with my own input file and print the number of syllables for each word in that file. Simply opening the input file in read mode does not work, because a file object has no dict() method.
The code is given below:

  
from curses.ascii import isdigit 
from nltk.corpus import cmudict 

d = cmudict.dict() # get the CMU Pronouncing Dict

def nsyl(word): 
    """return the max syllable count in the case of multiple pronunciations"""
    return max([len([y for y in x if isdigit(y[-1])]) for x in d[word.lower()]])


w_words = dict([(w, nsyl(w)) for w in d.keys() if w[0] == 'a'or'z'])
worth_abbreviating = [(k,v) for (k,v) in w_words.iteritems() if v > 3]
print worth_abbreviating 

Can anyone please help me out?


Comments (1)

情绪少女 2024-11-05 13:39:46

Not sure this will solve the whole problem, but:

w_words = dict([(w, nsyl(w)) for w in d.keys() if w[0] == 'a'or'z'])

should probably be

w_words = dict([(w, nsyl(w)) for w in d.keys() if w[0] == 'a' or w[0] == 'z'])

since

if w[0] == 'a'or'z' is parsed as if (w[0] == 'a') or ('z'). The non-empty string 'z' is truthy, so the condition always holds.

For example,

In [36]: 'x' == 'a'or'z'
Out[36]: 'z'

In [37]: 'x' == 'a' or 'x'=='z'
Out[37]: False
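
As for the part of the question about using an input file instead of cmudict: the pronouncing dictionary is still needed for the lookup, and the file only supplies the words. Below is a minimal Python 3 sketch of that idea, not a tested drop-in solution. The names syllables_in_text, syllables_in_file and SAMPLE_PRONUNCIATIONS are hypothetical, and the sample dictionary is a tiny hand-written stand-in with the same shape as cmudict.dict() (word mapped to a list of phoneme lists). With NLTK and its cmudict data installed, cmudict.dict() could presumably be passed in its place.

```python
import re

# Hypothetical stand-in for nltk.corpus.cmudict.dict():
# each word maps to a list of pronunciations, each a list of phonemes.
SAMPLE_PRONUNCIATIONS = {
    "the": [["DH", "AH0"], ["DH", "AH1"], ["DH", "IY0"]],
    "dictionary": [["D", "IH1", "K", "SH", "AH0", "N", "EH2", "R", "IY0"]],
    "syllable": [["S", "IH1", "L", "AH0", "B", "AH0", "L"]],
}


def nsyl(word, pronunciations):
    """Return the max syllable count over all pronunciations, or None if unknown."""
    entries = pronunciations.get(word.lower())
    if not entries:
        return None  # word is not in the pronouncing dictionary
    # Vowel phonemes carry a trailing stress digit (0, 1 or 2); count those.
    return max(sum(1 for ph in pron if ph[-1].isdigit()) for pron in entries)


def syllables_in_text(text, pronunciations):
    """Return (word, syllable count) pairs for every word found in text."""
    words = re.findall(r"[A-Za-z']+", text)
    return [(w, nsyl(w, pronunciations)) for w in words]


def syllables_in_file(path, pronunciations):
    """Read the whole file at path and count syllables for each word in it."""
    with open(path, encoding="utf-8") as f:
        return syllables_in_text(f.read(), pronunciations)


if __name__ == "__main__":
    demo = "The syllable dictionary, xyzzy!"
    for word, count in syllables_in_text(demo, SAMPLE_PRONUNCIATIONS):
        print(word, count)
```

Words missing from the dictionary come back as None rather than raising KeyError, which the original nsyl would do for any word in the file that cmudict does not contain.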