Code to count the number of syllables of the words in a file

Posted on 2024-10-29 13:39:46


So far I have the following code, which counts the number of syllables of every word in cmudict (the CMU Pronouncing Dictionary). Now I need to replace cmudict with my own input file and print the number of syllables of each word in that file. Simply opening the input file in read mode does not work, because a file object has no dict() method the way the cmudict corpus reader does.
The code is given below:

  
from curses.ascii import isdigit 
from nltk.corpus import cmudict 

d = cmudict.dict() # get the CMU Pronouncing Dict

def nsyl(word): 
    """return the max syllable count in the case of multiple pronunciations"""
    return max([len([y for y in x if isdigit(y[-1])]) for x in d[word.lower()]])


w_words = dict([(w, nsyl(w)) for w in d.keys() if w[0] == 'a'or'z'])
worth_abbreviating = [(k,v) for (k,v) in w_words.iteritems() if v > 3]
print worth_abbreviating

Can anyone please help me out?
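
For the file-based part of the question, one possible approach is to keep cmudict as the pronunciation lookup and take only the words from the input file. The sketch below is written for Python 3 (the code above is Python 2) and rests on assumptions: the helper names `count_syllables`, `syllables_in_text` and `syllables_in_file` are hypothetical, words are split with a simple regular expression, and `SAMPLE_PRONUNCIATIONS` is a tiny stand-in with the same shape as the mapping returned by `cmudict.dict()` (word to list of phoneme lists), so the example runs without NLTK installed.

```python
import re


def count_syllables(word, pron_dict):
    """Return the max syllable count over all pronunciations, or None if unknown."""
    pronunciations = pron_dict.get(word.lower())
    if not pronunciations:
        return None
    # In CMU-style transcriptions, vowel phonemes end in a stress digit (0, 1, 2),
    # so counting phonemes that end in a digit counts syllables.
    return max(sum(1 for ph in pron if ph[-1].isdigit()) for pron in pronunciations)


def syllables_in_text(text, pron_dict):
    """Map each distinct word in `text` to its syllable count (None if not found)."""
    words = re.findall(r"[A-Za-z']+", text)  # assumption: a simple word splitter
    return {w.lower(): count_syllables(w, pron_dict) for w in words}


def syllables_in_file(path, pron_dict):
    """Read a text file and count the syllables of every word in it."""
    with open(path, encoding="utf-8") as f:
        return syllables_in_text(f.read(), pron_dict)


# Stand-in data for illustration only; in real use pass cmudict.dict() instead.
SAMPLE_PRONUNCIATIONS = {
    "hello": [["HH", "AH0", "L", "OW1"], ["HH", "EH0", "L", "OW1"]],
    "dictionary": [["D", "IH1", "K", "SH", "AH0", "N", "EH2", "R", "IY0"]],
    "file": [["F", "AY1", "L"]],
}

if __name__ == "__main__":
    print(syllables_in_text("Hello, dictionary file xyzzy", SAMPLE_PRONUNCIATIONS))
```

With NLTK and its cmudict data available, the same functions could be called as `syllables_in_file("input.txt", cmudict.dict())`, where `input.txt` is a placeholder file name. Words missing from the dictionary map to `None` instead of raising `KeyError`, which is what `d[word.lower()]` in the original `nsyl` would do.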


情绪少女 2024-11-05 13:39:46


Not sure this will solve the whole problem, but:

w_words = dict([(w, nsyl(w)) for w in d.keys() if w[0] == 'a'or'z'])

should probably be

w_words = dict([(w, nsyl(w)) for w in d.keys() if w[0] == 'a' or w[0] == 'z'])

since

if w[0] == 'a'or'z' is parsed as if (w[0] == 'a') or ('z'). The string 'z' is truthy (any non-empty string is), so the condition is always true.

For example,

In [36]: 'x' == 'a'or'z'
Out[36]: 'z'

In [37]: 'x' == 'a' or 'x'=='z'
Out[37]: False
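
As a runnable illustration of the same pitfall, the sketch below (Python 3) compares the buggy condition with one possible alternative spelling of the fix. The function names and the sample word list are made up for this example; `str.startswith` with a tuple of prefixes is used here in place of the two explicit comparisons.

```python
def starts_with_a_or_z_buggy(word):
    # Parsed as (word[0] == 'a') or 'z'; the non-empty string 'z' is truthy,
    # so the result is truthy for every non-empty word.
    return bool(word[0] == 'a' or 'z')


def starts_with_a_or_z(word):
    # str.startswith accepts a tuple of prefixes and also handles empty strings.
    return word.startswith(('a', 'z'))


words = ["apple", "banana", "zebra"]  # sample data for illustration
buggy = [w for w in words if starts_with_a_or_z_buggy(w)]
fixed = [w for w in words if starts_with_a_or_z(w)]
print(buggy)
print(fixed)
```

The buggy filter keeps every word, while the fixed one keeps only words that begin with 'a' or 'z'.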


About the author: 情徒
