Correct word count of a LaTeX document

Posted on 2024-09-04 02:17:45

I'm currently searching for an application or a script that does a correct word count for a LaTeX document.

Up till now, I have only encountered scripts that work on a single file, but what I want is a script that can safely ignore LaTeX keywords and also traverse linked files, i.e. follow \include and \input links, to produce a correct word count for the whole document.

With vim, I currently use ggVGg CTRL+G but obviously that shows the count for the current file and does not ignore LaTeX keywords.

Does anyone know of any script (or application) that can do this job?

Comments (9)

や三分注定 2024-09-11 02:17:45

I use texcount. The webpage has a Perl script to download (and a manual).

It follows tex files that the document pulls in with \input or \include (see -inc), supports macros, and has many other nice features.

When following included files you will get detail about each separate file as well as a total. For example here is the total output for a 12 page document of mine:

TOTAL COUNT
Files: 20
Words in text: 4188
Words in headers: 26
Words in float captions: 404
Number of headers: 12
Number of floats: 7
Number of math inlines: 85
Number of math displayed: 19

If you're only interested in the total, use the -total argument.
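If you want a single figure from a summary like the one above, the "label: number" lines can be parsed and the word categories added up. The following is a minimal Python sketch: the sample text is copied from this answer, while the helper names `parse_summary` and `total_words` are made up for illustration, and the assumed "label: number" layout is inferred from the sample rather than taken from the texcount manual.

```python
import re

# Summary block copied from the answer above. The "label: number" layout is
# an assumption based on this sample, not a documented texcount format.
SAMPLE = """\
TOTAL COUNT
Files: 20
Words in text: 4188
Words in headers: 26
Words in float captions: 404
Number of headers: 12
Number of floats: 7
Number of math inlines: 85
Number of math displayed: 19
"""

def parse_summary(report):
    """Return a dict mapping each 'label: number' line to its integer value."""
    counts = {}
    for line in report.splitlines():
        match = re.match(r"\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts

def total_words(counts):
    """Add up every category whose label starts with 'Words in'."""
    return sum(value for label, value in counts.items()
               if label.startswith("Words in"))

counts = parse_summary(SAMPLE)
print(total_words(counts))  # text + headers + float captions
```

For the sample above this adds 4188, 26 and 404. Whether headers and captions should count toward a limit depends on the rules you are writing under.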

烂人 2024-09-11 02:17:45

I went with icio's comment and did a word-count on the pdf itself by piping the output of pdftotext to wc:

pdftotext file.pdf - | wc -w

虚拟世界 2024-09-11 02:17:45

latex file.tex
dvips -o - file.dvi | ps2ascii | wc -w

should give you a fairly accurate word count.

软甜啾 2024-09-11 02:17:45

To add to @aioobe,

If you use pdflatex, just do

pdftops file.pdf
ps2ascii file.ps|wc -w

I compared this count to the count in Microsoft Word for a 1599-word document (according to Word). pdftotext produced a text with 1700+ words. texcount did not include the references and produced 1088 words. ps2ascii returned 1603 words, 4 more than Word.

I'd say that's a pretty good count. I am not sure where the 4-word difference comes from, though. :)
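To put those numbers side by side, here is a small Python sketch that expresses each tool's count as a percentage deviation from the Word figure. Only the two exact counts quoted above are used; pdftotext is left out because the answer only gives "1700+". The function name `relative_deviation` is just an illustrative choice.

```python
# Word's own count for the document, as reported in the answer above.
WORD_BASELINE = 1599

def relative_deviation(count, baseline=WORD_BASELINE):
    """Percentage by which `count` differs from `baseline` (signed)."""
    return (count - baseline) / baseline * 100.0

# Exact counts quoted in the answer; pdftotext ("1700+") is not exact.
for tool, count in (("ps2ascii", 1603), ("texcount", 1088)):
    print(f"{tool}: {relative_deviation(count):+.2f}%")
```

The ps2ascii result is off by a fraction of a percent, while the texcount gap is mostly the reference list that it skipped.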

落墨 2024-09-11 02:17:45

In Texmaker interface you can get the word count by right clicking in the PDF preview:

[Screenshot: https://i.sstatic.net/7QY2w.png]

So尛奶瓶 2024-09-11 02:17:45

Overleaf has a word count feature:

Overleaf v2:

[screenshot not preserved]

Overleaf v1:

[screenshot not preserved]

怪我入戏太深 2024-09-11 02:17:45

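I use the following Vim script:

function! WC()
    " Shell-escape the current file name so paths with spaces still work.
    let filename = shellescape(expand("%"))
    " Strip TeX markup with detex, count the words, then trim wc's padding and newline.
    let cmd = "detex " . filename . " | wc -w | perl -pe 'chomp; s/ +//;'"
    let result = system(cmd)
    echo result . " words"
endfunction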

… but it doesn’t follow links. This would basically entail parsing the TeX file to get all linked files, wouldn’t it?

The advantage over the other answers is that it doesn’t have to produce an output file (PDF or PS) to compute the word count so it’s potentially (depending on usage) much more efficient.

Although icio’s comment is theoretically correct, I found that the above method gives quite accurate estimates for the number of words. For most texts, it’s well within the 5% margin that is used in many assignments.
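As for the link-following question raised above, finding the linked files does not need a full TeX parser in simple cases. Here is a rough Python sketch under stated assumptions: file names appear literally inside \input{...} or \include{...}, paths are relative to the main file's directory, a missing extension means .tex, and anything after an unescaped % is a comment. Macros that build file names, \includeonly and similar mechanisms are not handled. The function name `linked_files` is hypothetical.

```python
import re
from pathlib import Path

# Literal \input{...} / \include{...} only; computed file names are not handled.
INCLUDE_RE = re.compile(r"\\(?:input|include)\s*\{([^}]+)\}")
# Everything from an unescaped % to the end of the line is treated as a comment.
COMMENT_RE = re.compile(r"(?<!\\)%.*")

def linked_files(main_file):
    """Return the main file and every file it pulls in, in first-seen order."""
    root = Path(main_file).resolve().parent  # assumption: paths are relative to here
    ordered = []

    def visit(path):
        if path.suffix != ".tex":
            path = path.with_name(path.name + ".tex")
        if path in ordered or not path.is_file():
            return  # already visited (also guards against cycles) or missing
        ordered.append(path)
        source = COMMENT_RE.sub("", path.read_text(encoding="utf-8"))
        for name in INCLUDE_RE.findall(source):
            visit(root / name.strip())

    visit(Path(main_file).resolve())
    return ordered
```

This covers only the file-discovery step. How each file is then counted is left to whichever tool you prefer.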

安稳善良 2024-09-11 02:17:45

If the use of a vim plugin suits you, the vimtex plugin has integrated the texcount tool quite nicely.

Here is an excerpt from their documentation:

:VimtexCountLetters       Shows the number of letters/characters or words in
:VimtexCountWords         the current project or in the selected region. The
                          count is created with `texcount` through a call on
                          the main project file similar to: >

                            texcount -nosub -sum [-letter] -merge -q -1 FILE
<
                          Note: Default arguments may be controlled with
                                |g:vimtex_texcount_custom_arg|.

                          Note: One may access the information through the
                                function `vimtex#misc#wordcount(opts)`, where
                                `opts` is a dictionary with the following
                                keys (defaults indicated): >

                                'range' : [1, line('$')]
                                'count_letters' : 0/1
                                'detailed' : 0
<
                                If `detailed` is 0, then it only returns the
                                total count. This makes it possible to use for
                                e.g. statusline functions. If the `opts` dict
                                is not passed, then the defaults are assumed.

                                                    *VimtexCountLetters!*
                                                    *VimtexCountWords!*
:VimtexCountLetters!      Similar to |VimtexCountLetters|/|VimtexCountWords|, but
:VimtexCountWords!        show separate reports for included files. I.e.
                          presents the result of: >

                            texcount -nosub -sum [-letter] -inc FILE
<

The nice part about this is how extensible it is. On top of counting the number of words in your current file, you can make a visual selection (say two or three paragraphs) and then only apply the command to your selection.

素衣风尘叹 2024-09-11 02:17:45

For a very basic article class document I just look at the number of matches for a regex to find words. I use Sublime Text, so this method may not work for you in a different editor, but I just hit Ctrl+F (Command+F on Mac) and then, with regex enabled, search for

(^|\s+|"|((h|f|te){)|\()\w+

which should ignore text declaring a floating environment or captions on figures as well as most kinds of basic equations and \usepackage declarations, while including quotations and parentheticals. It also counts footnotes and \emphasized text and will count \hyperref links as one word. It's not perfect, but it's typically accurate to within a few dozen words or so. You could refine it to work for you, but a script is probably a better solution, since LaTeX source code isn't a regular language. Just thought I'd throw this up here.
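For anyone who wants to try the same pattern outside an editor, here is a small Python sketch. It makes two assumptions: the literal brace is written as `\{` so the pattern is unambiguous to Python's `re` module, and `re.MULTILINE` is enabled so that `^` matches at the start of every line, as an editor search typically does. The name `regex_word_count` is only illustrative.

```python
import re

# The pattern from the answer, with the literal brace escaped for clarity.
# MULTILINE is an assumption meant to mimic a line-oriented editor search.
WORD_RE = re.compile(r'(^|\s+|"|((h|f|te)\{)|\()\w+', re.MULTILINE)

def regex_word_count(source):
    """Count non-overlapping matches of the word pattern in `source`."""
    return sum(1 for _ in WORD_RE.finditer(source))

sample = 'A (small) test\n\\emph{word} here\n\\usepackage{foo}\n'
print(regex_word_count(sample))  # the \usepackage line contributes nothing
```

In the sample, the words are picked up after a line start, whitespace, an opening parenthesis, or the `h{` that ends `\emph{`. The `\usepackage{foo}` line is skipped because none of those prefixes precede a word there.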
