Spell-checking camel-cased words with ispell/aspell
I need to spell-check a large document containing many camel-cased words. I want ispell or aspell to check whether the individual words are spelled correctly.
So, for this word:
ScientificProgrezGoesBoink
I would love to have it suggest this instead:
ScientificProgressGoesBoink
Is there any way to do this? (I mean while running it on an Emacs buffer.) Note that I don't necessarily want it to suggest the complete alternative. However, if it recognizes that Progrez is misspelled, I would like to be able to at least replace that part, or add that word to my private dictionary, rather than adding every camel-cased word to the dictionary.
Comments (3)
I took @phils's suggestions and dug a little deeper. It turns out that if you get camelCase-mode and reconfigure some of ispell like this:
then individual camel-cased words suchAsThisOne will actually be spell-checked correctly. (Unless you're at the beginning of a document, as I just found out.)
So this clearly isn't the full-blown solution, but at least it's something.
aspell has a "--run-together" option. Hunspell can't check camel-cased words.
If you read the aspell code, you will find that its algorithm does not actually split a camel-cased word into a list of sub-words. Maybe this algorithm is faster, but it wrongly reports a word containing a two-character sub-word as a typo. Don't waste time tweaking other aspell options. I tried, and they didn't work.
So we have two problems:
aspell reports SOME camel-cased words as typos
hunspell reports ALL camel-cased words as typos
The solution to BOTH problems is to write our own predicate in Emacs Lisp.
Here is a sample predicate written for javascript:
Or just use my new package https://github.com/redguardtoo/wucuo
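To make the predicate idea concrete, here is a minimal sketch of the logic in Python rather than Emacs Lisp. It is not the predicate from this answer and not how wucuo is implemented. The names KNOWN_WORDS, split_camel_case and is_typo are hypothetical, the word set is a toy stand-in for a real aspell or hunspell lookup, and skipping sub-words shorter than three characters is an assumption made here to sidestep the two-character problem described above.

```python
import re

# Toy stand-in for a real dictionary lookup (an assumption for illustration;
# in Emacs the lookup would go through aspell or hunspell).
KNOWN_WORDS = {"scientific", "progress", "goes", "boink", "such", "this", "one"}


def split_camel_case(word):
    """Split a camelCase or PascalCase identifier into its sub-words.

    Digits and punctuation are simply dropped in this sketch.
    """
    # First alternative: a run of capitals that is not followed by a
    # lowercase letter (an acronym such as "XML").
    # Second alternative: an optional capital followed by lowercase letters.
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word)


def is_typo(word, min_length=3):
    """Return True if any sufficiently long sub-word is unknown."""
    for part in split_camel_case(word):
        if len(part) < min_length:
            continue  # assumption: very short sub-words are not checked
        if part.lower() not in KNOWN_WORDS:
            return True
    return False


print(split_camel_case("ScientificProgrezGoesBoink"))
print(is_typo("ScientificProgrezGoesBoink"))
print(is_typo("suchAsThisOne"))
```

With this approach only Progrez would need to be corrected or added to a private dictionary, instead of the whole camel-cased word.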
You should parse the camel-cased words and split them, then check the spelling of each part individually and assemble a suggestion from the suggestions for each misspelled token. Considering that each misspelled token can have multiple suggestions, this sounds a bit inefficient to me.
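A rough sketch of that split, check and reassemble idea, again in Python for illustration only. DICTIONARY and TOY_SUGGESTIONS are made-up stand-ins for what a real spell checker would return, and candidates and suggest are hypothetical helper names. The product step is where the inefficiency shows up: the number of assembled suggestions is the product of the suggestion counts of the misspelled tokens.

```python
import re
from itertools import product

# Made-up data standing in for a real spell checker (assumptions).
DICTIONARY = {"scientific", "progress", "goes", "boink"}
TOY_SUGGESTIONS = {
    "progrez": ["progress", "progeny"],
    "bonk": ["boink", "bank"],
}


def candidates(part):
    """Return the spellings to consider for one sub-word."""
    lower = part.lower()
    if lower in DICTIONARY:
        return [part]  # correctly spelled, keep as is
    fixes = TOY_SUGGESTIONS.get(lower, [])
    # Restore the leading capital so the result is still camel-cased.
    restored = [s.capitalize() if part[0].isupper() else s for s in fixes]
    return restored or [part]  # nothing suggested: leave the token unchanged


def suggest(word):
    """Assemble whole-word suggestions from per-token suggestions."""
    # Same splitting rule as before: acronym runs, or an optional capital
    # followed by lowercase letters. Digits and punctuation are dropped.
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word)
    options = [candidates(p) for p in parts]
    # Every combination of per-token candidates becomes one suggestion.
    return ["".join(combo) for combo in product(*options)]


print(suggest("ScientificProgrezGoesBoink"))
print(len(suggest("ScientificProgrezGoesBonk")))
```

One misspelled token with two suggestions yields two assembled words, and two such tokens yield four. Offering a correction for just the misspelled part, as the question asks for, avoids building the combinations at all.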