Plagiarism detection using the Damerau-Levenshtein algorithm

Posted 2024-08-07 18:42:58, 50 words, 10 views, 0 comments

How would I implement the Damerau-Levenshtein distance algorithm to detect plagiarism in documents? Thanks!

Comments (1)

氛圍 2024-08-14 18:42:58

Levenshtein distance is primarily used to compare two strings, for example when matching names or suggesting alternatives in a spell checker. Running it over a whole document to detect plagiarism is not typical.
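For reference, here is a minimal Python sketch of the distance being asked about. It implements the restricted "optimal string alignment" variant of Damerau-Levenshtein (each adjacent pair may be transposed at most once), and it works on any sequence, so it can compare lists of words as well as characters. The names `osa_distance` and `word_similarity`, the whitespace tokenization, and the normalization by the longer length are all assumptions made for illustration, not part of any standard API.

```python
def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts insertions, deletions, substitutions and transpositions of
    adjacent items needed to turn sequence `a` into sequence `b`.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Transposition of two adjacent items.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def word_similarity(text_a, text_b):
    """Hypothetical helper: word-level similarity in [0, 1] (1 = identical)."""
    words_a, words_b = text_a.lower().split(), text_b.lower().split()
    longest = max(len(words_a), len(words_b))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(words_a, words_b) / longest


if __name__ == "__main__":
    print(osa_distance("ca", "ac"))
    print(word_similarity("the quick brown fox", "the brown quick fox"))
```

The table has `len(a) * len(b)` cells, so this is practical for sentences or paragraphs but gets expensive on whole documents, which is one reason it is not the usual choice for plagiarism detection.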

There is some work in this area, though. Everything I found points to this article, which requires a subscription:

Plagiarism Detection Using the Levenshtein Distance and Smith-Waterman Algorithm

http://www.computer.org/portal/web/csdl/doi/10.1109/ICICIC.2008.422

Plagiarism in texts is an issue of increasing concern to the academic community. The most common form of text plagiarism now involves a variety of minor alterations, including the insertion, deletion, or substitution of words. Such simple changes, however, require excessive string comparisons. In this paper, we present a hybrid plagiarism detection method. We investigate the use of a diagonal line, which is derived from the Levenshtein distance, and a simplified Smith-Waterman algorithm, a classical tool for identifying and quantifying local similarities in biological sequences, with a view to applying them to plagiarism detection. Our approach avoids globally involved string comparisons and considers psychological factors, which yields a significant speed-up in the experimental results. Based on the results, we indicate the practicality of such an improvement using the Levenshtein distance and the Smith-Waterman algorithm and illustrate the efficiency gains. In the future, it would be interesting to explore appropriate heuristics in the area of text comparison.
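Since the article itself is behind a paywall, the following is only a textbook Smith-Waterman local alignment over word tokens, not the simplified variant or the diagonal-line technique the abstract describes. The function name `smith_waterman` and the scoring values (match +2, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Textbook Smith-Waterman local alignment score for two sequences.

    Returns (best_score, (i, j)), where i and j are the 1-based positions
    in `a` and `b` at which the best-scoring local alignment ends.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below zero, so an alignment can restart
            # anywhere; this is what makes the alignment local.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    return best, best_pos


if __name__ == "__main__":
    suspect = "x y the quick brown fox z".split()
    source = "q the quick brown fox".split()
    print(smith_waterman(suspect, source))
```

Unlike an edit distance over the full texts, a local alignment scores only the best-matching region, so a copied passage embedded in otherwise unrelated text still produces a high score.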
