Spelling suggestions for conjoined words

Posted 2024-08-11 05:30:14

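I'm implementing spell checking for a web-based WYSIWYG editor. I currently use the Damerau-Levenshtein distance algorithm to generate a list of spelling suggestions. This works well, but I'm curious how I could improve it.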

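Specifically, my implementation does not currently handle conjoined words. For example, I'd like to detect "areyou" and suggest "are you". I think I could do this by splitting the possibly conjoined word at likely break points and testing both halves. Since all English words must have at least one vowel, I figure I can look for vowels to help decide where to split the word.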

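The Damerau-Levenshtein distance algorithm has been very useful; clearly others have put far more thought into this than I have. Is there a similarly clever algorithm I should consider for detecting conjoined words, or am I already on the right track?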


对你而言 2024-08-18 05:30:14

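I imagine the candidate conjoined word will not be longer than forty (40) characters or so; most of the time it will be shorter than ten (10).

Considering the small size, what about this pseudocode?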

if is_spelled_wrong(word):
    n = len(word)
    list_suggestions = []
    for i in range(1, n):  # split points 1 .. n-1
        word_a = word[:i]  # Python slice notation
        word_b = word[i:]  # starts at i so that no character is skipped
        if not is_spelled_wrong(word_a) and not is_spelled_wrong(word_b):
            list_suggestions.append((word_a, word_b))

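In other words, just scan the string for all possible split points. There are only a few of them. In the case of "areyou", you would loop five (5) times.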
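
For anyone who wants to try the scan directly, here is a minimal runnable sketch in Python. It assumes the dictionary is available as a plain set of lowercase words; the names split_suggestions and WORDS are made up for this example, and the membership test stands in for whatever is_spelled_wrong does in your checker.

```python
def split_suggestions(word, dictionary):
    """Return every (left, right) split where both halves are known words."""
    suggestions = []
    # Try each split point between the first and last character.
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in dictionary and right in dictionary:
            suggestions.append((left, right))
    return suggestions


# Tiny stand-in word list (an assumption for illustration only);
# a real checker would use its full dictionary here.
WORDS = {"a", "are", "you", "no", "now", "here", "where"}

print(split_suggestions("areyou", WORDS))
print(split_suggestions("nowhere", WORDS))
```

A word can have more than one valid split, so the function returns a list and leaves ranking to the caller.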

