
Adding custom dictionaries

Published 2019-05-06 06:50:17

Configuring the custom dictionary feature

This feature requires additional configuration in your application.conf file. (Remember to restart the Java application server after updating the configuration.)

Adding the ephox.spelling.custom-dictionaries-path element activates the custom dictionary feature. It points to a directory on the server’s file system that will contain the custom dictionary files and should not contain anything else. It is a good idea to store these files alongside the application.conf file. For example, if application.conf is in a directory called /opt/ephox, the dictionary files could live in a sub-directory /opt/ephox/dictionaries.

Example:

ephox {
  spelling {
    custom-dictionaries-path = "/opt/ephox/dictionaries"
  }
}

Creating custom dictionary files

One custom dictionary can be created for each language supported by the spell checker (see supported languages), as well as an additional “global” dictionary that contains words that are valid across all languages, such as trademarks.

A dictionary file for a particular language must be named with that language’s code (see supported languages for language codes) plus the suffix .txt, e.g. en.txt, en_gb.txt, fr.txt, de.txt.

The “global” dictionary file for language-independent words must be called “global.txt”.

The server scans the dictionary directory configured above and picks up the .txt file for each language, and the global file, where present.
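The naming rules above can be checked before deployment with a small script. The following is a minimal sketch, not part of the spell check service: the function name classify_dictionary_files and the regular expression for language codes are assumptions made for illustration, and the authoritative list of codes is the supported languages page.

```python
import re
from pathlib import Path

# Assumed shape of a language code file name, e.g. "en.txt" or "en_gb.txt".
# The real set of valid codes comes from the supported languages list.
LANG_FILE = re.compile(r"^[a-z]{2}(_[a-z]{2})?\.txt$")


def classify_dictionary_files(directory):
    """Sort the entries of a dictionary directory into three groups by name."""
    result = {"global": [], "language": [], "unexpected": []}
    for entry in sorted(Path(directory).iterdir()):
        name = entry.name
        if name == "global.txt":
            result["global"].append(name)
        elif LANG_FILE.match(name):
            result["language"].append(name)
        else:
            # The directory should contain dictionary files only.
            result["unexpected"].append(name)
    return result
```

Calling classify_dictionary_files("/opt/ephox/dictionaries") and inspecting the "unexpected" list is a quick way to spot stray or misnamed files before restarting the server.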

Custom dictionary file format

A dictionary file must be a simple text file with:

  • one word on each line,
  • either Windows-style or Linux-style line endings (CR+LF or LF),
  • no comments or blank lines, and
  • saved in UTF-8 encoding, with or without BOM (byte-order mark).

The last point matters most for files created or edited on Windows or Mac systems, where editors may default to a different text encoding. Most editors, including Windows Notepad, can save files as UTF-8 when asked to; check your editor of choice for this option. Choosing the wrong encoding will cause problems with non-English letters such as umlauts and accented characters.
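The format rules can also be verified mechanically. The sketch below is an illustration under stated assumptions: check_dictionary_file is a hypothetical helper, and treating whitespace inside a line as "more than one word" is this sketch's interpretation of the one-word-per-line rule. It does not detect comments, since no comment syntax is defined for these files.

```python
from pathlib import Path


def check_dictionary_file(path):
    """Return a list of problems found in one dictionary file (empty if none)."""
    raw = Path(path).read_bytes()
    try:
        # "utf-8-sig" accepts UTF-8 with or without a leading BOM.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError as err:
        return [f"not valid UTF-8: {err}"]

    problems = []
    # splitlines() handles both LF and CR+LF line endings.
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            problems.append(f"line {number}: blank line")
        elif len(line.split()) > 1:
            # Assumption: whitespace inside a line means several words.
            problems.append(f"line {number}: more than one word")
    return problems
```

An empty result means the file decoded as UTF-8 and every line holds a single token; a file saved in a legacy encoding such as Latin-1 will typically fail at the decoding step as soon as it contains an umlaut or accent.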

NOTE for German and Finnish languages: Spell checking in German and Finnish will employ compound word spell checking. Compound words such as “Fußballtennis” will be assumed correct as long as the root words “Fußball” and “Tennis” are individually present in the dictionary. It is not necessary to add “Fußballtennis” separately.
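To make the idea of compound checking concrete, here is a rough sketch of how a word can be tested against a list of root words. It is not the spell checker’s actual algorithm, which this document does not describe: the function is_compound_of is hypothetical, comparison is simply case-insensitive, and linking letters between roots are not handled.

```python
def is_compound_of(word, roots):
    """Return True if word can be split entirely into entries from roots.

    Illustration only: case is ignored and linking letters are not
    handled, which a real compound checker may treat differently.
    """
    known = {root.lower() for root in roots}
    target = word.lower()
    if not target:
        return False

    # reachable[i] is True when target[:i] is a sequence of known roots.
    reachable = [True] + [False] * len(target)
    for end in range(1, len(target) + 1):
        for start in range(end):
            if reachable[start] and target[start:end] in known:
                reachable[end] = True
                break
    return reachable[len(target)]
```

With the roots "Fußball" and "Tennis", is_compound_of("Fußballtennis", ["Fußball", "Tennis"]) returns True, while a word containing an unknown part, such as "Fußballgolf", returns False.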

Verifying custom dictionary functionality

If successfully configured, the custom dictionary feature will report dictionaries found in the application server’s log at service startup.

Example:

2017-06-12 17:46:00 [main] INFO  com.ephox.ironbark.IronbarkBoot - Starting task (booting Ironbark)
2017-06-12 17:46:00 [main] INFO  com.ephox.ironbark.IronbarkBoot - using custom dictionary: [global] = 1 words
2017-06-12 17:46:00 [main] INFO  com.ephox.ironbark.IronbarkBoot - using custom dictionary: "en" = 3 words
2017-06-12 17:46:00 [main] INFO  com.ephox.ironbark.IronbarkBoot - using custom dictionary: "fr" = 2 words
2017-06-12 17:46:01 [main] INFO  com.ephox.ironbark.IronbarkBoot - Finished task (booting Ironbark)

The above log shows that 3 custom dictionaries were found, one “global”, language-independent one and one each for English and French. They were found to contain 1, 3 and 2 words, respectively. Please check that this report matches your expectations.
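If the log is long, the dictionary report can be pulled out with a short script. This is a minimal sketch based only on the line format shown in the example above; the function name dictionaries_in_log is an assumption, and the pattern may need adjusting if your log layout differs.

```python
import re

# Matches report lines such as:
#   ... - using custom dictionary: "en" = 3 words
#   ... - using custom dictionary: [global] = 1 words
DICT_LINE = re.compile(
    r'using custom dictionary: (?:"([^"]+)"|\[([^\]]+)\]) = (\d+) words'
)


def dictionaries_in_log(log_text):
    """Map each dictionary name reported in the log to its word count."""
    found = {}
    for line in log_text.splitlines():
        match = DICT_LINE.search(line)
        if match:
            # The name is either quoted ("en") or bracketed ([global]).
            name = match.group(1) or match.group(2)
            found[name] = int(match.group(3))
    return found
```

Comparing the returned mapping with the word counts you expect for each file is a quick sanity check after every restart.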

Ongoing dictionary maintenance

Any addition or change to a dictionary after the initial deployment takes effect only after the spell check service is restarted, so a restart is required each time.
