当前位置：文江博客话题详情

PHP algorithm levenshtein-distance

在拼写错误的单词处放置点

发布于 2024-08-16 07:41:19 字数 746 浏览 10 评论 0 原文

我正在用 PHP 创建一个网络应用程序，人们可以尝试翻译他们在学校需要学习的单词。

例如，有人需要将荷兰语单词“weer”翻译为英语的“weather”，但不幸的是他输入了“whether”。因为他几乎打错了单词，所以我想再给他一次尝试，在他犯错的地方用点“.”：

Language A:   weer
Language B:   weather
Input:        whether
Output:       w..ther

或者，例如

Language A:   fout
Language B:   mistake
Input:        mitake
Output:       mi.take

或者：

Language A:   echt
Language B:   genuine
Input:        genuinely
Output:       genuinely (almost good, shorten the word a little bit)

但是，如果输入也不同与所需的翻译相差很大，我不想得到像 ........ 这样的输出，

我听说过 Levenshtein 距离，我认为我需要一种与该算法非常相似的算法，但是我不知道如何将点放在正确的位置，而不是回显要完成多少操作。

那么，如何在有人写错的地方返回带有点的拼错单词呢？

原文

I'm creating a web app in PHP where people can try to translate words they need to learn for school.

For example, someone needs to translate the Dutch word 'weer' to 'weather' in English, but unfortunately he types 'whether'. Because he almost typed the right word, I want to give him another try, with dots '.' on the places where he made a mistake:

Language A:   weer
Language B:   weather
Input:        whether
Output:       w..ther

Or, for example

Language A:   fout
Language B:   mistake
Input:        mitake
Output:       mi.take

Or:

Language A:   echt
Language B:   genuine
Input:        genuinely
Output:       genuinely (almost good, shorten the word a little bit)

But, if the input differs too much from the desired translation, I don't want to get output like ........

I heard about Levenshtein distance and I think I need an algorithm that is much like that one, but I don't know how to place dots on the right place instead of echoing how many operations are to be done.

So, how can I return the misspelled word with dots on the places where someone made a mistake?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

噩梦成真你也成魔 2024-08-23 07:41:19

首先，看看维基百科上的 Levenshtein 算法。
然后继续查看文章页面上的示例和结果矩阵d：

                *k*     *i*     *t*     *t*     *e*     *n*
        >0       1       2       3       4       5       6 
*s*      1      >1       2       3       4       5       6 
*i*      2       2      >1       2       3       4       5 
*t*      3       3       2      >1       2       3       4 
*t*      4       4       3       2      >1       2       3 
*i*      5       5       4       3       2      >2       3 
*n*      6       6       5       4       3       3      >2 
*g*      7       7       6       5       4       4      >3

距离位于矩阵的右下角，d[m, n]。但从那里开始
现在可以回溯到矩阵 d[1, 1] 左上角的最小步骤。您只需向左、向左或向上走每一步，以最小化路径为准。
在上面的示例中，您会找到标有“>”的路径标志：

  s i t t i n g      k i t t e n
0 1 1 1 1 2 2 3    0 1 1 1 1 2 2 3
  ^       ^   ^      ^       ^   ^
  changes in the distance, replace by dots

现在您可以在最小路径上找到距离发生变化的位置 d[i,j]（在上面的示例中用 ^ 标记），并且对于那些您在第一个（或第二个）单词中放置点的字母i（或 j 分别）。

结果：

  s i t t i n g      k i t t e n
  ^       ^   ^      ^       ^   ^
  . i t t . n .      . i t t . n .

First, take a look at the Levenshtein algorithm on wikipedia.
Then go on and look at the examples and the resulting matrix d on the article page:

                *k*     *i*     *t*     *t*     *e*     *n*
        >0       1       2       3       4       5       6 
*s*      1      >1       2       3       4       5       6 
*i*      2       2      >1       2       3       4       5 
*t*      3       3       2      >1       2       3       4 
*t*      4       4       3       2      >1       2       3 
*i*      5       5       4       3       2      >2       3 
*n*      6       6       5       4       3       3      >2 
*g*      7       7       6       5       4       4      >3

The distance is found on lower right corner of the matrix, d[m, n]. But from there it
is now possible to follow backtrack the minimum steps to the upper left of of the matrix, d[1, 1]. You just go left, up-left, or up at each step whichever minimizes the path.
In the example above, you'd find the path marked by the '>' signs:

  s i t t i n g      k i t t e n
0 1 1 1 1 2 2 3    0 1 1 1 1 2 2 3
  ^       ^   ^      ^       ^   ^
  changes in the distance, replace by dots

Now you can find on the minimum path at which locations d[i,j] the distance changes (marked by ^ in the example above), and for those letters you put in the first (or second) word a dot at position i (or j respectively).

Result:

  s i t t i n g      k i t t e n
  ^       ^   ^      ^       ^   ^
  . i t t . n .      . i t t . n .

回复收藏 0 原文

瀟灑尐姊 2024-08-23 07:41:19

您正在寻找的术语称为“编辑距离”。使用诸如 Levenshtein distance 之类的东西会告诉您将一个字符串转换为另一个字符串所需的操作数（插入、删除、替换等）。

以下是其他“编辑距离”算法的列表。

一旦您确定某个单词“足够接近”（即，它没有超过所需编辑的某个阈值），您就可以显示需要进行编辑的位置（通过显示点）。

那么你怎么知道把点放在哪里呢？

关于“Levenshtein 距离”的有趣之处在于，它使用 M x N 矩阵，每个轴上有一个单词（请参阅 Levenshtein 文章）。创建矩阵后，您可以确定哪些字母需要“额外编辑”才能正确。这就是你放置点的地方。如果信件需要“无需额外编辑”，您只需打印信件即可。很酷。

回复收藏 0 原文

指尖微凉心微凉 2024-08-23 07:41:19

我认为你需要做一个多步骤的过程，而不仅仅是 Levenshtein。首先，我会检查输入单词是否是目标单词的一种形式。这将抓住你的第三个例子，而且不用担心添加点。您还可以使用此步骤来捕获同义词。下一步是检查两个字符串的长度差异。

如果差异为 0，您可以进行字母比较以放置点。如果您不想显示所有点，那么您应该保留放置的点的数量，一旦超过限制，就会显示一些错误消息。（抱歉，这是不正确的）

如果差异显示输入较长，您需要检查要删除的字母才能解决问题。在这里，您可以使用 Levenshtein 来查看它们的距离，如果距离太远，则显示错误消息，如果在范围内，您将需要反向执行 Levenshtein 的步骤，并以某种方式标记更改。不确定您想如何表明一封信需要删除。

如果差异显示输入较短，您可以使用编辑距离来查看两个单词是否足够接近或显示错误。然后再次执行相反的步骤，添加用于插入的点和用于替换的点。

实际上，最后两个步骤可以合并为一个函数，该函数运行该算法并记住插入、删除或替换并相应地更改输出。

回复收藏 0 原文

~没有更多了~

关于作者

瘫痪情歌

暂无简介

文章

26 人气

关注发私信

友情链接

文江博客

在拼写错误的单词处放置点

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（3）

那么你怎么知道把点放在哪里呢？

So how do you know where to put the dots?

关于作者

相关话题

热门标签

推荐作者

忆悲凉

hgfg1645

qq_qLPLYi

戏舞

殊姿

﹂绝世的画

友情链接

在拼写错误的单词处放置点

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（3）

那么你怎么知道把点放在哪里呢？

So how do you know where to put the dots?

关于作者

相关话题

热门标签

推荐作者

忆悲凉

hgfg1645

qq_qLPLYi

戏舞

殊姿

﹂绝世的画

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。