Can aspell output line numbers instead of offsets in pipe mode?

Posted on 2024-10-31 08:18:28

Can aspell output line numbers instead of offsets in pipe mode for HTML and XML files? I can't feed the file to aspell line by line, because then aspell can't identify a closing tag if that tag is on the next line.

Comments (5)

雨后咖啡店 2024-11-07 08:18:28

This will output all occurrences of misspelt words with line numbers:

# Get aspell output...
<my_document.txt aspell pipe list -d en_GB --personal=./aspell.ignore.txt |

# Process the aspell output...
grep '[a-zA-Z]\+ [0-9]\+ [0-9]\+' -oh | \
grep '[a-zA-Z]\+' -o | \
while read word; do grep -on "\<$word\>" my_document.txt; done

Where:

  • my_document.txt is your original document
  • en_GB is your primary dictionary choice (e.g. try en_US)
  • aspell.ignore.txt is an aspell personal dictionary (example below)
  • aspell_output.txt is the output of aspell in pipe mode (ispell style)
  • result.txt is a final results file

aspell.ignore.txt example:

personal_ws-1.1 en 500
foo
bar

Example result.txt output (for an en_GB dictionary):

238:color
302:writeable
355:backends
433:dataonly

You can also print the whole line by changing the last grep -on into grep -n.
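
One side effect of the pipeline above is that a word misspelled several times is reported several times, because every occurrence aspell lists triggers a grep over the whole file. A minimal sketch of a deduplicating variant follows. The function name locate_words is hypothetical; it assumes the misspelled words arrive on stdin one per line (for instance from aspell list) and that grep supports -o together with -w and -F, as GNU grep does.

```bash
# Hypothetical helper: reads misspelled words on stdin (one per line)
# and prints "line:word" for each whole-word occurrence in the file $1.
locate_words() {
    local file=$1
    # sort -u drops repeated words so each word is searched only once
    sort -u | while IFS= read -r word; do
        # -F: treat the word literally, -w: whole words only,
        # -o: print only the match, -n: prefix the line number
        grep -onwF -- "$word" "$file"
    done | sort -t: -k1,1n -k2
}
```

It could be used as: aspell list -d en_GB < my_document.txt | locate_words my_document.txt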

橪书 2024-11-07 08:18:28

This is just an idea; I haven't actually tried it yet (I'm on a Windows machine :(). But maybe you could pipe the HTML file through head (with a byte limit) and count the newlines with grep to find your line number. It's neither efficient nor pretty, but it might just work.

cat icantspell.html | head -c <offset from aspell> | egrep -Uc "$"
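
The same idea can be sketched as a small function. The name offset_to_line is hypothetical, and the sketch assumes the offset is a zero-based byte offset counted from the start of the file. Note that the pipe-mode output quoted elsewhere on this page (reelly at offset 11 in "I say this reelly.") shows offsets counted from the start of each input line, so this only helps when the whole document reaches aspell as a single line or when a file-wide offset is obtained some other way.

```bash
# Hypothetical helper: prints the 1-based line number that contains
# the byte at zero-based offset $2 of file $1.
offset_to_line() {
    local file=$1 offset=$2
    # Count the newlines that precede the offset; the line number is one more.
    echo $(( $(head -c "$offset" "$file" | wc -l) + 1 ))
}
```

For example, offset_to_line icantspell.html 1234 prints the line on which byte 1234 falls.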

七秒鱼° 2024-11-07 08:18:28

I use the following script to perform spell-checking and to work around the awkward output of aspell -a / ispell. At the same time, the script works around the problem that aspell does not recognize ordinals like 2nd, by simply ignoring everything aspell reports that is not a word on its own.

#!/bin/bash

set +o pipefail

if [ -t 1 ] ; then
    color="--color=always"
fi

! for file in "$@" ; do
    <"$file" aspell pipe list -p ./dict --mode=html |
    grep '[[:alpha:]]\+ [0-9]\+ [0-9]\+' -oh |
    grep '[[:alpha:]]\+' -o |
    while read word ; do
        grep $color -n "\<$word\>" "$file"
    done
done | grep .

You even get colored output if the stdout of the script is a terminal, and you get an exit status of 1 in case the script found spelling mistakes, otherwise the exit status of the script is 0.

Also, the script protects itself from pipefail, a fairly popular option to set (e.g. in a Makefile) that does not work for this script. Last but not least, the script explicitly uses [[:alpha:]] instead of [a-zA-Z], which is less confusing given that it also matches non-ASCII characters such as the German äöüÄÖÜß. Depending on the locale, [a-zA-Z] matches them too, but that comes as something of a surprise.
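
The exit-status trick can be isolated into a few lines. This is only a sketch of the idiom; the function name fail_if_output is hypothetical. The pipeline ends in grep ., which succeeds only when at least one non-empty line passes through it, and the leading ! inverts that result. Disabling pipefail inside a subshell keeps the caller's shell options untouched.

```bash
# Hypothetical helper: runs the given command, passes its non-empty output
# lines through, and returns 1 if there were any, 0 otherwise.
fail_if_output() {
    (
        # With pipefail off, the pipeline status is that of grep alone.
        set +o pipefail
        # grep . exits 0 when it saw a non-empty line; ! inverts that.
        ! "$@" | grep .
    )
}
```

A call such as fail_if_output ./spellcheck.sh *.html would then fail a build step whenever anything is reported; spellcheck.sh stands in for whatever command produces the report.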

一个人练习一个人 2024-11-07 08:18:28

aspell pipe / aspell -a / ispell outputs one empty line for each input line (after reporting the errors found in that line).

Demonstration printing the line number with awk:

$ aspell pipe < testFile.txt |
awk '/^$/ { countedLine=countedLine+1; print "#L=" countedLine; next; } //'

produces this output:

@(#) International Ispell Version 3.1.20 (but really Aspell 0.60.7-20110707)
& iinternational 7 0: international, Internationale, internationally, internationals, intentional, international's, Internationale's
#L=1
*
*
*
& reelly 22 11: Reilly, really, reel, rely, rally, relay, resell, retell, Riley, rel, regally, Riel, freely, real, rill, roll, reels, reply, Greeley, cruelly, reel's, Reilly's
#L=2
*
#L=3
*
*
& sometypo 18 8: some typo, some-typo, setup, sometime, someday, smote, meetup, smarty, stupor, Smetana, somatic, symmetry, mistype, smutty, smite, Sumter, smut, steppe
#L=4

where testFile.txt contains:

iinternational
I say this reelly.
hello
here is sometypo.

(Still not as nice as hunspell -u (https://stackoverflow.com/a/10778071/4124767). But hunspell lacks some command line options I like.)
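
Building on the blank-line counting above, the output can be reduced to one record per misspelling. The following is a rough sketch, not a complete parser for the ispell protocol. The function name aspell_pipe_to_lines is hypothetical. It assumes the two miss formats "& word count offset: suggestions" and "# word offset", and it converts the zero-based offset to a one-based column.

```bash
# Hypothetical helper: reads `aspell pipe` output on stdin and prints
# "line:column:word" for every reported misspelling.
aspell_pipe_to_lines() {
    awk '
        /^@/ { next }              # skip the version banner
        /^$/ { line++; next }      # a blank line ends one input line
        # Miss with suggestions: $4 is "offset:", which awk reads as a number
        /^&/ { printf "%d:%d:%s\n", line + 1, $4 + 1, $2 }
        # Miss without suggestions: $3 is the offset
        /^#/ { printf "%d:%d:%s\n", line + 1, $3 + 1, $2 }
    '
}
```

It would be used as aspell pipe < testFile.txt | aspell_pipe_to_lines, which for the sample file above should report iinternational on line 1, reelly on line 2 and sometypo on line 4.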

一个人的夜不怕黑 2024-11-07 08:18:28

For others using aspell with one of the filter modes (tex, html, etc.), here's a way to print line numbers only for misspelled words in the filtered text. So, for example, it won't print misspellings that occur in comments.

ASPELL_ARGS="--mode=html --personal=./.aspell.en.pws"

for file in "$@"; do
  for word in $(aspell $ASPELL_ARGS list < "$file" | sort -u); do
      grep -no "\<$word\>" <(aspell $ASPELL_ARGS filter < "$file")
  done | sort -n
done

This works because aspell filter does not delete empty lines. I realize this isn't using aspell pipe as requested by OP, but it's in the same spirit of making aspell print line numbers.
