Automatically checking the spelling of words in a text
[EDIT]In short: How would you write an automatic spell checker? The idea is that the checker builds a list of words from a known good source (a dictionary) and automatically adds new words when they are used often enough. Words which haven't been used in a while should be phased out. So if I delete part of a scene which contains "Mungrohyperiofier", the checker should remember it for a while, and when I type "Mung<Ctrl+Space>" in another scene, it should offer it again. If I don't use the word for, say, a few days, it should forget about it.
At the same time, I'd like to avoid adding typos to the dictionary.[/EDIT]
I want to write a text editor for SciFi stories. The editor should offer word completion for any word used anywhere in the current story. It will only offer a single scene of the story for editing (so you can easily move scenes around).
This means I have three sets:
- The set of all words in all other scenes
- The set of word in the current scene before I started editing it
- The set of words in the current editor
I need to store the sets somewhere as it would be too expensive to build the list from scratch every time. I think a simple plain text file with one-word-per-line is enough for that.
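The one-word-per-line storage described above is simple enough to sketch directly. This is a minimal illustration (the function names `load_word_set` and `save_word_set` are my own, not from the question):

```python
from pathlib import Path

def load_word_set(path):
    """Load a one-word-per-line file into a set; a missing file yields an empty set."""
    p = Path(path)
    if not p.exists():
        return set()
    return set(p.read_text(encoding="utf-8").split())

def save_word_set(path, words):
    """Write the set back, one word per line, sorted for stable diffs."""
    Path(path).write_text("\n".join(sorted(words)) + "\n", encoding="utf-8")
```

With one such file per scene, set #1 is just the union of the loaded sets of all other scenes.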
As the user edits the scene, we have these situations:
- She deletes a word. This word is not used anywhere else in the current scene.
- She types a word which is new.
- She types a word which already exists.
- She types a word which already exists, but makes a typo.
- She corrects a typo in a word which is in set #2.
- She corrects a typo in a word which is in set #1 (i.e. the typo is elsewhere, too).
- She deletes a word which she plans to use again. After the deletion, though, the word is no longer in sets #1 and #3.
The obvious strategy would be to rebuild the word sets when a scene is saved, and to build set #1 from a word-list file per scene.
So my question is: Is there a clever strategy to keep words which aren't used anywhere anymore, but still be able to phase out typos? If possible, this strategy should work in the background without the user even noticing what is going on (i.e. I want to avoid having to grab the mouse to select "add word to dictionary" from a menu).
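One way to make both requirements (remember deleted words for a while, let typos fade) fall out of a single mechanism is to attach a use count and a last-seen timestamp to every word. This is a minimal sketch of that idea, not something from the question itself; the class name, the seven-day window, and the `min_count=2` threshold are all my own assumptions:

```python
import time

PHASE_OUT_DAYS = 7  # assumed window; "a few days" in the question

class AgingDictionary:
    """Each word maps to (use_count, last_seen). Words not used within
    PHASE_OUT_DAYS are dropped at the next prune, so corrected typos
    silently age out without any 'add word to dictionary' interaction."""

    def __init__(self):
        self.words = {}  # word -> (use_count, last_seen_epoch_seconds)

    def touch(self, word, now=None):
        """Record one use of a word (called whenever the scene is saved)."""
        now = time.time() if now is None else now
        count, _ = self.words.get(word, (0, now))
        self.words[word] = (count + 1, now)

    def prune(self, now=None):
        """Drop words that have not been seen within the phase-out window."""
        now = time.time() if now is None else now
        cutoff = now - PHASE_OUT_DAYS * 86400
        self.words = {w: (c, t) for w, (c, t) in self.words.items() if t >= cutoff}

    def completions(self, prefix, min_count=2):
        """Only offer words seen at least min_count times, so a one-off
        typo never shows up as a completion candidate."""
        return sorted(w for w, (c, _) in self.words.items()
                      if w.startswith(prefix) and c >= min_count)
```

A deleted word like "Mungrohyperiofier" survives in the dictionary until the prune window expires, so "Mung<Ctrl+Space>" keeps working for a few days; a typo that is never re-typed never reaches `min_count` and ages out on its own.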
[EDIT] Based on a comment from grieve
So you want to write a spelling checker. Here's Peter Norvig's essay about writing a spelling corrector. It describes a simple and robust spelling corrector. You can use the code he has already written, plus a reference word list (say, from a free dictionary) for the language model.
I would also go to existing open-source spelling checkers, such as aspell and hunspell, to get some ideas.
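The core of Norvig's corrector is generating every string within edit distance 1 of a word and keeping the candidates that are known words. A condensed sketch of that construction (the `correct` helper and its frequency-based tie-break follow his essay in spirit, but the exact signatures here are my own):

```python
def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All strings at edit distance 1 from word: deletions,
    transpositions, replacements, and insertions."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in alphabet]
    inserts = [L + c + R for L, R in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def correct(word, known_words, counts):
    """Pick the most frequent known candidate; fall back to the word itself."""
    candidates = ({word} & known_words) or (edits1(word) & known_words) or {word}
    return max(candidates, key=lambda w: counts.get(w, 0))
```

Here `known_words` would be the union of the dictionary and the story's word sets, and `counts` the per-word use counts.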
The structure you should use is a trie. Tail/suffix compression will help with memory. You can use a pseudo reference counting GC for keeping track of usage.
For the actual nodes, you would probably need no more than a 32-bit integer: 21 bits for the Unicode code point, and the rest for various other tags and information.
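The 21-bits-plus-flags layout this answer suggests can be illustrated with plain bit operations. One possible packing (the exact field assignment below, an end-of-word flag and a 10-bit saturating reference count, is my own reading of "various other tags and information"):

```python
# One possible 32-bit trie-node payload:
#   bits 0-20  : Unicode code point of this node's character (21 bits)
#   bit  21    : end-of-word flag
#   bits 22-31 : pseudo reference count (10 bits, saturating)

CP_MASK = (1 << 21) - 1
EOW_BIT = 1 << 21
RC_SHIFT = 22
RC_MAX = (1 << 10) - 1

def pack(codepoint, end_of_word, refcount):
    """Pack the three fields into one 32-bit integer; the reference
    count saturates at RC_MAX instead of overflowing into nothing."""
    refcount = min(refcount, RC_MAX)
    return (refcount << RC_SHIFT) | (EOW_BIT if end_of_word else 0) | (codepoint & CP_MASK)

def unpack(node):
    """Return (codepoint, end_of_word, refcount)."""
    return node & CP_MASK, bool(node & EOW_BIT), node >> RC_SHIFT
```

The reference count is what lets a pseudo-GC pass walk the trie and reap words whose count has decayed to zero.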
Reminds me of what I have been told about garbage collection in modern LISP implementations:
- Data, when created, is put in "pool 1".
- When there is a need to garbage collect, the collector looks in pool 1 for unused entries and removes them.
- Any remaining entries are then moved to pool 2.
- Pool 2 is examined only when more memory is needed than pool 1 can release.
- Data from pool 2 that survives a garbage collection is put in pool 3, and so on.
The idea is to dynamically place data in a pool corresponding to its lifetime...
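Applied to the word sets, that generational scheme might look like the sketch below. The class and its details (three generations, collection triggered on scene save) are my own transposition of the GC analogy, not something this answer spells out:

```python
class GenerationalWordPool:
    """Generational pools for words, analogous to a generational GC:
    new words enter pool 0; at each collection, words not used since
    the previous collection are dropped and survivors are promoted one
    pool up. Higher pools would be collected less often, so established
    story words persist while one-off typos die young in pool 0."""

    def __init__(self, generations=3):
        self.pools = [set() for _ in range(generations)]
        self.used = set()  # words used since the last collection

    def add(self, word):
        """A newly typed word enters the youngest pool."""
        self.pools[0].add(word)

    def use(self, word):
        """Mark a word as used (typed or completed) this cycle."""
        self.used.add(word)

    def collect(self, pool_index=0):
        """Drop unused words from the given pool; promote survivors."""
        pool = self.pools[pool_index]
        survivors = pool & self.used
        if pool_index + 1 < len(self.pools):
            self.pools[pool_index + 1] |= survivors
            self.pools[pool_index] = set()
        else:
            self.pools[pool_index] = survivors  # oldest pool keeps its survivors
        self.used -= pool
        return survivors
```

Running `collect(0)` on every scene save, and the older pools only every few days, would give roughly the "remember for a while, then forget" behaviour the question asks for.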