How to create a custom dictionary for Hunspell

Posted on 2024-12-06 15:16:17

Comments (5)

橪书 2024-12-13 15:16:18

Create your own word list and affix file for your language, if they don't already exist. For Papiamentu, the native language of Curaçao, no such dictionary exists. I had a hard time finding out how to create these files, so I am documenting the process here: http://www.suares.com/index.php?page_id=25&news_id=233
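
To give a feel for what the two files look like, here is a minimal sketch that writes a tiny affix file and dictionary by hand. The file name mylang, the sample stems and the N flag are illustrative assumptions; the -nan plural suffix is only an example of a suffix rule, not a complete description of the language.

```shell
#!/bin/sh
# Minimal hand-made Hunspell dictionary pair (sketch).
# "mylang", the sample stems and the N flag are illustrative assumptions.
set -eu

# Affix file: declare the encoding, then one suffix class.
# SFX header line: flag, cross-product allowed (Y), number of rules (1).
# SFX rule line:   flag, characters to strip (0 = none), suffix to add,
#                  condition the stem must match (. = any stem).
cat > mylang.aff <<'EOF'
SET UTF-8

SFX N Y 1
SFX N 0 nan .
EOF

# Dictionary file: line 1 is the (approximate) word count,
# then one stem per line; "/N" lets that stem take the N suffix.
cat > mylang.dic <<'EOF'
3
buki/N
kas/N
mesa/N
EOF
```

If hunspell is installed, `echo kasnan | hunspell -l -d ./mylang` should print nothing, because kasnan is generated from the stem kas plus the N suffix rule.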

滥情空心 2024-12-13 15:16:18

I'm trying to do the same, but haven't found enough information to get started yet.

However, you may want to look at the hunspell(5) manual page, "hunspell - format of Hunspell dictionaries and affix files".

UPDATE

If you are working with .NET, you can download the Hunspell .NET port. It is also fairly easy to use:

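// Assumes the NHunspell package; Hunspell holds native resources, so dispose it.
using NHunspell;
using (var bee = new Hunspell())
{
    // Load takes the .aff and .dic paths in a single call.
    bee.Load("path_to_en_US.aff", "path_to_en_US.dic");
    bee.Add("my_custom_word1");
    bee.Add("my_custom_word2");
    var suggestions = bee.Suggest("misspel_word");
}

秋叶绚丽 2024-12-13 15:16:18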

The secret to getting hunspell to work (at least for me) was to figure out which of the locations it searches belong to my user account, and to put the custom dictionaries there. Also bear in mind that dictionaries have a specific format that you need to follow: for example, the first line of a .dic file is the word count, followed by one word per line.
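
Since a malformed .dic file is an easy way to get confusing errors, a small check like the following can help before you point hunspell at it. It is a sketch: check_dic is a hypothetical helper name, and it is deliberately stricter than hunspell, which only needs a positive number on line 1 and treats it as a size hint.

```shell
#!/bin/sh
# Sketch: sanity-check a .dic file before hunspell loads it.
# check_dic is a hypothetical helper, not part of hunspell.
# Returns 0 if OK, 1 if line 1 is not a bare positive integer,
# 2 if the count on line 1 differs from the number of entries.
check_dic() {
    dic=$1
    first=$(head -n 1 "$dic")
    case $first in
        ''|*[!0-9]*)
            echo "$dic: line 1 should be a bare word count, got '$first'" >&2
            return 1 ;;
    esac
    if [ "$first" -eq 0 ]; then
        echo "$dic: word count on line 1 is zero" >&2
        return 1
    fi
    # Count the lines that follow the header line.
    entries=$(tail -n +2 "$dic" | wc -l | tr -d ' ')
    if [ "$first" -ne "$entries" ]; then
        echo "$dic: line 1 says $first, but $entries entries follow" >&2
        return 2
    fi
    return 0
}
```

Usage: `check_dic "$HOME/Library/Spelling/mydict.dic" && echo ok`.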

Running hunspell -D shows the search path. On macOS, mine includes /Users/scott/Library/Spelling, so I created that directory and put my dictionaries there. Let's say you want to call your dictionary mydict, and your input data file of words is called dict.txt. We'll use the path I just showed.

First, copy the default .aff file. You will see it listed when you run hunspell -D as described above. For me, it is /Library/Spelling/en_US (that is, en_US.aff and en_US.dic in /Library/Spelling/). So:

cp /Library/Spelling/en_US.aff /Users/scott/Library/Spelling/mydict.aff

Then, every time you update your input list (dict.txt), do this:

DICT=/Users/scott/Library/Spelling/mydict.dic
cd ~/doc/dict
sort -u dict.txt > dict.in
# Keep only the number on line 1 ("wc -l dict.in" would also print the file name)
wc -l < dict.in | tr -d ' ' > "$DICT"
cat dict.in >> "$DICT"
rm dict.in
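
The same rebuild step can be wrapped in a reusable function. This is a sketch under a few assumptions: build_dic is a hypothetical name, the word list is plain text with one word per line, and any Windows line endings or blank lines in it should be dropped. It writes to a temporary file first, so an interrupted run does not leave a truncated dictionary behind.

```shell
#!/bin/sh
# Sketch: rebuild a .dic file from a plain word list.
# build_dic is a hypothetical helper, not part of hunspell.
build_dic() {
    src=$1   # plain word list, one word per line
    out=$2   # target .dic path, e.g. .../mydict.dic
    [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
    # Temp files live next to the target so the final mv is a rename.
    words=$(mktemp "${out}.words.XXXXXX") || return 1
    tmp=$(mktemp "${out}.tmp.XXXXXX") || { rm -f "$words"; return 1; }
    # Drop CRs and blank lines; LC_ALL=C sorts and de-duplicates byte-wise.
    tr -d '\r' < "$src" | grep -v '^[[:space:]]*$' | LC_ALL=C sort -u > "$words"
    # Line 1: the bare word count, then the words themselves.
    wc -l < "$words" | tr -d ' ' > "$tmp"
    cat "$words" >> "$tmp"
    rm -f "$words"
    chmod 644 "$tmp"   # mktemp creates files readable only by the owner
    mv "$tmp" "$out"
}
```

Usage: `build_dic ~/doc/dict/dict.txt "$HOME/Library/Spelling/mydict.dic"`.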

To run hunspell, specify both dictionaries as a comma-separated list. Because I want a list of misspelled words, I use the -l option:

hunspell -l -d mydict,en_US <filename>

丑丑阿 2024-12-13 15:16:18

I am implementing this type of feature as well. Once you've created the Hunspell object with an associated dictionary, you can add individual words to it.

Keep in mind, though, that these words are only available for as long as that Hunspell object is alive. Every time you create a new object, you will have to add all the user-defined words again.
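
If those words need to survive across objects, one option is to keep them in a plain text file and call Add for each line whenever you create a new Hunspell object. The sketch below only manages that file; USER_WORDS and remember_word are hypothetical names, and the re-adding step itself belongs in your application code.

```shell
#!/bin/sh
# Sketch: persist user-defined words in a plain file (one per line)
# so the application can re-add them to each new Hunspell object.
# USER_WORDS and remember_word are hypothetical names.
USER_WORDS=${USER_WORDS:-"$HOME/.my_user_words"}

remember_word() {
    word=$1
    touch "$USER_WORDS"
    # -x: whole-line match, -F: literal string, -q: no output
    if ! grep -qxF -- "$word" "$USER_WORDS"; then
        printf '%s\n' "$word" >> "$USER_WORDS"
    fi
}
```

For the hunspell command-line tool, the -p option plays a similar role by pointing hunspell at a personal dictionary file.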

掌心的温暖 2024-12-13 15:16:18

Have a look at the OpenOffice lingucomponent documentation:

http://www.openoffice.org/lingucomponent/

especially this document:
http://www.openoffice.org/lingucomponent/dictionary.html

It's a good starting point.
