Vim, a word-frequency function and French accents

Posted on 2024-12-05 20:29:04

I have recently discovered Vim Tip #1531 (Word frequency statistics for a file).

As suggested, I put the following code in my .vimrc:

function! WordFrequency() range
  let all = split(join(getline(a:firstline, a:lastline)), '\A\+')
  let frequencies = {}
  for word in all
    let frequencies[word] = get(frequencies, word, 0) + 1
  endfor
  new
  setlocal buftype=nofile bufhidden=hide noswapfile tabstop=20
  for [key,value] in items(frequencies)
    call append('$', key."\t".value)
  endfor
  sort i
endfunction
command! -range=% WordFrequency <line1>,<line2>call WordFrequency()

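It works fine except for accents and other French specifics (ligatures such as æ and œ, etc.).

What should I add to this function to make it suit my needs?

Thanks in advance.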



单挑你×的.吻 2024-12-12 20:29:04

For 8-bit characters, you could try changing the split pattern from \A\+ to [^[:alpha:]]\+.
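Whether this helps depends on your setup: Vim's help on collections (:help [:alpha:]) notes that POSIX-style classes such as [:alpha:] only work for 8-bit characters, so with 'encoding' set to utf-8 accented letters may still be treated as separators. A quick check you can run from the command line (the sample phrase is only an illustration):

```vim
" Show the current 'encoding' (utf-8 in most modern setups).
:set encoding?
" If "après" and "apéritif" come back in pieces, [:alpha:] does not
" match accented letters in this configuration.
:echo split("Rendez-vous après l'apéritif", '[^[:alpha:]]\+')
```

If the accented words come back whole, the one-pattern change is enough; otherwise the approaches in the other answers are needed.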


知你几分 2024-12-12 20:29:04


The pattern \A\+ matches any number of consecutive non-alphabetic characters (\A is simply [^A-Za-z]), which, unfortunately, includes multibyte characters like our beloved çàéô and friends.

That means your text is split at spaces and punctuation AND at every multibyte character.

With \A\+, the phrase

Rendez-vous après l'apéritif.

gives:

ap      1
apr     1
l       1
Rendez  1
ritif   1
s       1
vous    1

If you are sure your text doesn't include fancy spaces, you could replace this pattern with \s\+, which matches only whitespace, but that is probably too liberal.

With this pattern, \s\+, the same phrase gives:

après       1
l'apéritif. 1
Rendez-vous 1

which, I think, is closer to what you want.

Some customizing may be necessary to exclude punctuation.
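A minimal sketch of that customizing, not the answerer's code. It assumes a UTF-8 'encoding' (the \u escapes inside [] need a Unicode encoding) and that the only "fancy" spaces are the no-break spaces U+00A0 and U+202F that French typography puts before ; : ! ? and inside « ». It splits on whitespace as suggested above, then trims punctuation from both ends of each word, so inner hyphens and apostrophes survive. The set of typographic marks being trimmed (« » ’ “ ”) is an assumption; extend it to match your texts.

```vim
" Sketch: split on whitespace only, then trim punctuation from both ends
" of each word so that inner hyphens and apostrophes survive
" ("Rendez-vous", "l'apéritif").
function! WordFrequency() range
  let text = join(getline(a:firstline, a:lastline))
  " Separators: ASCII whitespace plus the no-break spaces U+00A0 and U+202F
  " (assumption: no other "fancy" spaces occur in the text).
  let separators = '[[:space:]\u00a0\u202f]\+'
  " Marks trimmed at word edges: ASCII punctuation plus « » ’ “ ”
  " (assumed set of typographic marks; extend as needed).
  let edge = '[[:punct:]\u00ab\u00bb\u2019\u201c\u201d]'
  let frequencies = {}
  for word in split(text, separators)
    let trimmed = substitute(word, '^' . edge . '\+\|' . edge . '\+$', '', 'g')
    " Skip tokens that consisted of punctuation only.
    if empty(trimmed)
      continue
    endif
    let frequencies[trimmed] = get(frequencies, trimmed, 0) + 1
  endfor
  new
  setlocal buftype=nofile bufhidden=hide noswapfile tabstop=20
  for [key, value] in items(frequencies)
    call append('$', key . "\t" . value)
  endfor
  sort i
endfunction
command! -range=% WordFrequency <line1>,<line2>call WordFrequency()
```

Only punctuation at the edges of a word is removed, so a typographic apostrophe inside a word (aujourd’hui) is kept, while a trailing period or closing guillemet is dropped.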


只是我以为 2024-12-12 20:29:04

function! WordFrequency() range
  " Whitespace and all punctuation characters except dash and single quote
  let wordSeparators = '[[:blank:],.;:!?%#*+^@&/~_|=<>\[\](){}]\+'
  let all = split(join(getline(a:firstline, a:lastline)), wordSeparators)
  "...
endfunction

If all punctuation characters should be word separators, the expression shortens to

let wordSeparators = '[[:blank:][:punct:]]\+'
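An alternative sketch, not the answerer's method: [:blank:] only matches space and tab, and Vim's help notes that classes such as [:punct:] only work for 8-bit characters, so a typographic apostrophe (U+2019) or a narrow no-break space (U+202F) is never a separator with the expressions above. Letting the 'iskeyword' option decide what a word character is avoids listing separators by hand. This assumes the default 'iskeyword' and a UTF-8 'encoding', under which accented Latin letters and ligatures such as œ count as word characters. Note the trade-offs: digits and _ count as word characters too, apostrophes and hyphens split words (l'apéritif gives l and apéritif), and some filetype plugins change 'iskeyword'.

```vim
" Sketch: split on every run of characters that are not keyword characters.
" \k follows the current buffer's 'iskeyword'; \k\@! asserts (zero width)
" that no keyword character starts here, and . then consumes one
" (possibly multibyte) character.
function! WordFrequency() range
  let all = split(join(getline(a:firstline, a:lastline)), '\%(\k\@!.\)\+')
  let frequencies = {}
  for word in all
    let frequencies[word] = get(frequencies, word, 0) + 1
  endfor
  new
  setlocal buftype=nofile bufhidden=hide noswapfile tabstop=20
  for [key, value] in items(frequencies)
    call append('$', key . "\t" . value)
  endfor
  sort i
endfunction
command! -range=% WordFrequency <line1>,<line2>call WordFrequency()
```

The split happens before :new, so \k is evaluated with the 'iskeyword' of the buffer being analysed, not the scratch buffer.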


About the author: じ违心
