How can I check whether two strings are similar in JavaScript?

Posted 2024-09-28 00:24:25

Let's say I have two strings. Is there any way to check whether they are at least 90% similar?

var string1 = "theBoardmeetstoday,tomorrow51";
var string2 = "Board meets today, tomorrow";

Thanks,

Tegan

Comments (6)

下雨或天晴 2024-10-05 00:24:25

The Wikipedia entry for Levenshtein distance includes a sample implementation.
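For reference, here is a minimal JavaScript sketch of the dynamic-programming recurrence described there (written from the recurrence, not copied from Wikipedia). The `similarity` helper is a hypothetical addition: it divides the distance by the length of the longer string to get a score between 0 and 1 that can be compared against the 90% threshold from the question. Normalizing by the longer length is one common choice, not the only one.

```javascript
// Levenshtein distance via the classic dynamic-programming table.
// d[i][j] holds the distance between the first i chars of a and the first j chars of b.
function levenshtein(a, b) {
  const d = [];
  for (let i = 0; i <= a.length; i++) {
    d.push(new Array(b.length + 1).fill(0));
    d[i][0] = i; // delete all i characters of a
  }
  for (let j = 0; j <= b.length; j++) {
    d[0][j] = j; // insert all j characters of b
  }
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,       // deletion
        d[i][j - 1] + 1,       // insertion
        d[i - 1][j - 1] + cost // substitution (or match when cost is 0)
      );
    }
  }
  return d[a.length][b.length];
}

// Hypothetical helper: turn the distance into a score between 0 and 1.
function similarity(a, b) {
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 1; // two empty strings are identical
  return 1 - levenshtein(a, b) / longest;
}

const string1 = "theBoardmeetstoday,tomorrow51";
const string2 = "Board meets today, tomorrow";
console.log(similarity(string1, string2) >= 0.9);
```

With the question's strings this prints `false`: string2 contains three spaces that string1 lacks, so at least three edits are needed and the score is at most 26/29, just under 0.9.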

从此见与不见 2024-10-05 00:24:25

jsdifflib is a JavaScript port of Python's excellent difflib library.

It has a function ratio() which "return[s] a measure of the sequences’ similarity as a float in the range [0, 1]."
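jsdifflib's own source is not reproduced here. As a rough idea of what such a ratio measures, this is a from-scratch sketch of the matching that Python's difflib documents for `ratio()`: repeatedly take the longest common substring, recurse on the pieces to its left and right, and return 2.0*M/T, where M is the number of matched characters and T the total length of both strings. The function names are hypothetical, and the sketch omits difflib's "junk" heuristics, so results can differ from the library, especially on longer inputs.

```javascript
// Longest common substring of a and b, found with a rolling DP row.
function longestCommonSubstring(a, b) {
  let bestLen = 0, bestA = 0, bestB = 0;
  // prev[j] = length of the common run ending at a[i - 2] and b[j - 1]
  let prev = new Array(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    const curr = new Array(b.length + 1).fill(0);
    for (let j = 1; j <= b.length; j++) {
      if (a[i - 1] === b[j - 1]) {
        curr[j] = prev[j - 1] + 1;
        if (curr[j] > bestLen) {
          bestLen = curr[j];
          bestA = i - bestLen;
          bestB = j - bestLen;
        }
      }
    }
    prev = curr;
  }
  return { startA: bestA, startB: bestB, length: bestLen };
}

// Count matched characters: the longest block plus matches on either side of it.
function matchingCharacters(a, b) {
  const m = longestCommonSubstring(a, b);
  if (m.length === 0) return 0;
  return m.length +
    matchingCharacters(a.slice(0, m.startA), b.slice(0, m.startB)) +
    matchingCharacters(a.slice(m.startA + m.length), b.slice(m.startB + m.length));
}

// ratio = 2 * M / T
function ratio(a, b) {
  const total = a.length + b.length;
  if (total === 0) return 1; // difflib also returns 1.0 for two empty sequences
  return (2 * matchingCharacters(a, b)) / total;
}
```

For example, `ratio("abcd", "bcde")` matches the block `"bcd"` (M = 3, T = 8) and returns 0.75.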

你穿错了嫁妆 2024-10-05 00:24:25

Also consider Dice's coefficient, which the creator of the string-similarity GitHub repo and its corresponding npm module considers "mostly better" than the Levenshtein distance.

Usage from its docs:

var stringSimilarity = require('string-similarity');

var similarity = stringSimilarity.compareTwoStrings('healed', 'sealed'); 

var matches = stringSimilarity.findBestMatch('healed', ['edward', 'sealed', 'theatre']);
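To show the idea behind the score, here is a minimal sketch of Dice's coefficient over character bigrams: 2 × (shared bigrams) / (bigrams in a + bigrams in b). It is not the string-similarity source; the function names are hypothetical, and stripping whitespace before comparing is an assumption made for this sketch. Bigrams are counted as a multiset, so a repeated pair is only matched as many times as it occurs in both strings.

```javascript
// Count each two-character substring of s.
function bigrams(s) {
  const counts = new Map();
  for (let i = 0; i < s.length - 1; i++) {
    const pair = s.slice(i, i + 2);
    counts.set(pair, (counts.get(pair) || 0) + 1);
  }
  return counts;
}

function diceCoefficient(a, b) {
  // Assumption: whitespace is ignored as a simple normalization step.
  a = a.replace(/\s+/g, "");
  b = b.replace(/\s+/g, "");
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0; // no bigrams to compare
  const countsA = bigrams(a);
  const countsB = bigrams(b);
  let shared = 0;
  for (const [pair, countA] of countsA) {
    shared += Math.min(countA, countsB.get(pair) || 0);
  }
  return (2 * shared) / (a.length - 1 + b.length - 1);
}
```

For `"healed"` and `"sealed"`, four of the five bigrams on each side are shared (`ea`, `al`, `le`, `ed`), so `diceCoefficient("healed", "sealed")` returns 2 × 4 / 10 = 0.8.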
草莓味的萝莉 2024-10-05 00:24:25

String.levenshtein (a MooTools plugin)

Check it out: http://mootools.net/forge/p/string_levenshtein

GitHub: https://github.com/thinkphp/String.levenshtein

This method calculates the Levenshtein distance between two strings. In information theory and computer science, the Levenshtein distance is a metric for measuring the amount of difference between two sequences (an edit distance). The Levenshtein distance between two strings is the minimum number of operations needed to transform one string into the other, where the possible operations are insertion, deletion, or substitution of a single character.

The Levenshtein distance algorithm has been used in:

  • Spell checking
  • Speech recognition
  • DNA analysis
  • Plagiarism detection
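To make the definition above concrete, this sketch (independent of the MooTools plugin; `editOperations` is a hypothetical name) fills the usual Levenshtein table and then walks back from the bottom-right cell to list one minimal sequence of insertions, deletions and substitutions. When several minimal sequences exist, it simply reports the first one its tie-breaking order finds.

```javascript
function editOperations(a, b) {
  const n = a.length, m = b.length;
  // d[i][j] = distance between the first i chars of a and the first j chars of b
  const d = Array.from({ length: n + 1 }, (_, i) =>
    Array.from({ length: m + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
    }
  }
  // Walk back through the table, recording the operation behind each step.
  const ops = [];
  let i = n, j = m;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && a[i - 1] === b[j - 1] && d[i][j] === d[i - 1][j - 1]) {
      i--; j--; // characters match, no operation needed
    } else if (i > 0 && j > 0 && d[i][j] === d[i - 1][j - 1] + 1) {
      ops.push(`substitute '${a[i - 1]}' with '${b[j - 1]}'`);
      i--; j--;
    } else if (i > 0 && d[i][j] === d[i - 1][j] + 1) {
      ops.push(`delete '${a[i - 1]}'`);
      i--;
    } else {
      ops.push(`insert '${b[j - 1]}'`);
      j--;
    }
  }
  return ops.reverse(); // operations were collected from the end backwards
}
```

For example, `editOperations("kitten", "sitting")` returns `["substitute 'k' with 's'", "substitute 'e' with 'i'", "insert 'g'"]`, three operations, which is their Levenshtein distance.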
辞慾 2024-10-05 00:24:25

Piggybacking on other people's answers, I wrote a simple JS function, stringsAreSimilar, to do this:

// https://github.com/thinkphp/String.levenshtein/blob/master/Source/String.levenshtein.js

// Returns the Levenshtein distance: the minimum number of single-character
// insertions, deletions and substitutions that turn stringA into stringB
function getStringDifference(stringA, stringB) {
  var cost = [],
    str1 = stringA,
    str2 = stringB,
    n = str1.length,
    m = str2.length,
    i, j;

  var minimum = function (a, b, c) {
    var min = a;
    if (b < min) {
      min = b;
    }
    if (c < min) {
      min = c;
    }
    return min;
  };

  // If one string is empty, the distance is the length of the other one
  if (n == 0) {
    return m;
  }
  if (m == 0) {
    return n;
  }

  for (i = 0; i <= n; i++) {
    cost[i] = [];
  }

  // First column: turning the first i chars of str1 into "" takes i deletions
  for (i = 0; i <= n; i++) {
    cost[i][0] = i;
  }

  // First row: turning "" into the first j chars of str2 takes j insertions
  for (j = 0; j <= m; j++) {
    cost[0][j] = j;
  }

  for (i = 1; i <= n; i++) {

    var x = str1.charAt(i - 1);

    for (j = 1; j <= m; j++) {

      var y = str2.charAt(j - 1);

      if (x == y) {

        // Same character: no extra cost
        cost[i][j] = cost[i - 1][j - 1];

      } else {

        // Cheapest of substitution, insertion and deletion, plus one
        cost[i][j] = 1 + minimum(cost[i - 1][j - 1], cost[i][j - 1], cost[i - 1][j]);
      }

    } //endfor

  } //endfor

  return cost[n][m];
}

function stringsAreSimilar(stringA, stringB) {
  var difference = getStringDifference(stringA, stringB);
  console.log("stringA: " + stringA);
  console.log("stringB: " + stringB);
  console.log("difference: " + difference);

  // Absolute threshold: fewer than 10 edits counts as "similar"
  return difference < 10;
}

var string1 = "theBoardmeetstoday,tomorrow51";
var string2 = "Board meets today, tomorrow";
var similar = stringsAreSimilar(string1, string2);

if (similar) {
    console.log("they are similar");
} else {
    console.log("they are not similar");
}


决绝 2024-10-05 00:24:25

I was trying to do this last year, and I read that the Levenshtein distance is the solution to what the OP is asking here.

I found this code in another thread, where it did not get many upvotes, maybe because of its time complexity (a for loop nested inside another for loop, so the work grows with the product of the two string lengths). I tried it and it seems to work for me. The function returns a score where 0 means the strings are identical (ignoring case) and a higher score means they are further from a close match. I'll share it, and maybe someone can extend it or explain it better:

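function editDistance(s1, s2) {
    // Compare case-insensitively
    s1 = s1.toLowerCase();
    s2 = s2.toLowerCase();

    // costs holds a single row of the Levenshtein table: costs[j] is the
    // distance between the first i chars of s1 and the first j chars of s2
    var costs = new Array();
    for (var i = 0; i <= s1.length; i++) {
        // lastValue: the current row's value in column j - 1 (starts at i deletions)
        var lastValue = i;
        for (var j = 0; j <= s2.length; j++) {
            if (i == 0) costs[j] = j; // first row: j insertions
            else {
                if (j > 0) {
                    // costs[j - 1] still holds the previous row's value (diagonal)
                    var newValue = costs[j - 1];
                    if (s1.charAt(i - 1) != s2.charAt(j - 1))
                        // min of diagonal, left (lastValue) and above (costs[j]), plus one
                        newValue =
                            Math.min(Math.min(newValue, lastValue), costs[j]) +
                            1;
                    // Now overwrite column j - 1 with the current row's value
                    costs[j - 1] = lastValue;
                    lastValue = newValue;
                }
            }
        }
        if (i > 0) costs[s2.length] = lastValue;
    }
    return costs[s2.length];
}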

Good luck and share your thoughts on this - would be awesome! Cheers!
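Building on the lowercasing in this answer, one possible extension is a hypothetical `normalize` step that also drops spaces and punctuation before comparing, so they do not count as edits. This sketch assumes ASCII input (the regex keeps only `a-z` and `0-9`, so accented letters would be removed too) and uses its own single-row Levenshtein plus a hypothetical score of 1 minus the distance divided by the longer normalized length.

```javascript
// Hypothetical pre-processing: lowercase and keep only ASCII letters and digits.
function normalize(s) {
  return s.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Levenshtein distance keeping a single row of the table.
function levenshteinDistance(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j); // row for i = 0
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // previous row, column j - 1
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // previous row, column j
      row[j] = Math.min(
        above + 1,                                  // deletion
        row[j - 1] + 1,                             // insertion
        diagonal + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution or match
      );
      diagonal = above;
    }
  }
  return row[b.length];
}

function normalizedSimilarity(a, b) {
  const x = normalize(a), y = normalize(b);
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshteinDistance(x, y) / longest;
}
```

With the question's strings, normalization leaves `"theboardmeetstodaytomorrow51"` and `"boardmeetstodaytomorrow"`, which differ only by deleting "the" and "51", so the score is 1 - 5/28 (about 0.82): closer, but still short of 90%.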
