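Vim, word frequency function and French accents

I recently discovered Vim Tip n° 1531 (Word frequency statistics for a file).
As suggested, I put the following code in my .vimrc:

    function! WordFrequency() range
      " Join the selected lines and split them into words on non-alphabetic runs.
      let all = split(join(getline(a:firstline, a:lastline)), '\A\+')
      " Count the occurrences of each word.
      let frequencies = {}
      for word in all
        let frequencies[word] = get(frequencies, word, 0) + 1
      endfor
      " Show the result in a scratch buffer, one "word<Tab>count" per line.
      new
      setlocal buftype=nofile bufhidden=hide noswapfile tabstop=20
      for [key, value] in items(frequencies)
        call append('$', key . "\t" . value)
      endfor
      sort i
    endfunction
    command! -range=% WordFrequency <line1>,<line2>call WordFrequency()

It works fine except for accents and other French specifics (Latin small ligature a or o, etc.).
What should I add to this function to make it suit my needs?
Thanks in advance.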

Comments (3)
For 8-bit characters, you can try changing the split pattern from \A\+ to [^[:alpha:]]\+.

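To make the suggested pattern swap concrete, here is a minimal sketch of the function with only the split pattern changed. The name WordFrequencyAlpha is hypothetical and used only so it can sit next to the original. Vim's documentation describes the [:alpha:] family of classes as working for 8-bit characters, so with a UTF-8 'encoding' accented letters may still be treated as separators; treat this as something to test rather than a guaranteed fix.

```vim
" Hypothetical variant of WordFrequency(); only the split pattern differs.
function! WordFrequencyAlpha() range
  " Split on runs of characters outside the [:alpha:] class instead of \A.
  let all = split(join(getline(a:firstline, a:lastline)), '[^[:alpha:]]\+')
  let frequencies = {}
  for word in all
    let frequencies[word] = get(frequencies, word, 0) + 1
  endfor
  " Show the result in a scratch buffer, as the original function does.
  new
  setlocal buftype=nofile bufhidden=hide noswapfile tabstop=20
  for [key, value] in items(frequencies)
    call append('$', key . "\t" . value)
  endfor
  sort i
endfunction
command! -range=% WordFrequencyAlpha <line1>,<line2>call WordFrequencyAlpha()
```

Running :WordFrequencyAlpha on a buffer with accented words and comparing it with :WordFrequency shows quickly whether the class helps in a given setup.
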
The pattern \A\+ matches any number of consecutive non-alphabetic characters (\A is equivalent to [^A-Za-z]), which unfortunately includes multibyte characters like our beloved çàéô and friends. That means your text is split at spaces AND at multibyte characters.
With \A\+, the phrase gives:
If you are sure your text doesn't include fancy spaces, you could replace this pattern with \s\+, which matches whitespace only, but it's probably too liberal. With this pattern, \s\+, the same phrase gives:
which, I think, is closer to what you want.
Some customizing may be necessary to exclude punctuation.

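The difference between the patterns can be checked directly with split(). The sketch below uses a made-up sample phrase (not the one from the answer) and assumes a UTF-8 'encoding'; the third pattern, which also treats ASCII punctuation as a separator, is only one possible customization.

```vim
scriptencoding utf-8

" Hypothetical sample phrase, chosen only to illustrate the difference.
let g:wf_demo_phrase = 'Déjà vu, à Noël!'

" \A\+ treats everything outside A-Za-z as a separator, accented letters included.
echo split(g:wf_demo_phrase, '\A\+')
" → ['D', 'j', 'vu', 'No', 'l']

" \s\+ splits on whitespace only: words stay whole, but punctuation sticks to them.
echo split(g:wf_demo_phrase, '\s\+')
" → ['Déjà', 'vu,', 'à', 'Noël!']

" Assumption: splitting on whitespace and ASCII punctuation is acceptable.
echo split(g:wf_demo_phrase, '[[:space:][:punct:]]\+')
" → ['Déjà', 'vu', 'à', 'Noël']
```

Saving this to a file and running :source on it prints the three lists. Note that [:punct:] covers ASCII punctuation only, so typographic characters such as ’ or « » would need to be added to the collection by hand.
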
If all punctuation characters should be word separators, the expression shortens to