Is there a formal definition for the difference between characters in a string? If so, how is it calculated?

Posted 2024-09-25 16:38:05

Overview

I'm looking to analyse the difference between two characters as part of a password strength checking process.

I'll explain what I'm trying to achieve and why, and I would like to know whether what I'm looking to do is formally defined and whether there are any recommended algorithms for achieving it.

What I'm looking to do

Across a whole string, I'm looking to compare the current character with the previous character and determine how different they are.

As this relates to password strength checking, the difference between one character and its predecessor in a string might be defined as how predictable character N is from knowing character N - 1. There might be a formal definition for this of which I'm not aware.

Example

A password of abc123 is arguably less secure than azu590. Both contain three letters followed by three digits, but in the former the sequence is more predictable.

I'm assuming that a password guesser might try some obvious sequences first, so abc123 would be tried well before azu590.

Considering the decimal ASCII values of the characters in these strings, and given that b differs from a by 1 and c differs from b by 1 again, we could derive a simplistic difference calculation.

Ignoring cases where two consecutive characters are not in the same character class, we could say that abc123 has an overall character-to-character difference of 1 + 1 + 1 + 1 = 4, whereas azu590 has a difference of 25 + 5 + 4 + 9 = 43.
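The simplistic calculation above can be sketched in Python. This is a minimal illustration, assuming four coarse character classes (lowercase, uppercase, digit, other) and using `ord()`, which returns the Unicode code point (the same as the ASCII value for these characters). The helper names `char_class` and `sequential_difference` are hypothetical.

```python
def char_class(c: str) -> str:
    """Return a coarse character class (an assumption): lower, upper, digit or other."""
    if c.islower():
        return "lower"
    if c.isupper():
        return "upper"
    if c.isdigit():
        return "digit"
    return "other"


def sequential_difference(password: str) -> int:
    """Sum |ord(b) - ord(a)| over consecutive pairs in the same character class.

    Pairs that cross a class boundary (e.g. 'c' -> '1') are skipped,
    matching the simplistic scheme described in the question.
    """
    total = 0
    for a, b in zip(password, password[1:]):
        if char_class(a) == char_class(b):
            total += abs(ord(b) - ord(a))
    return total


print(sequential_difference("abc123"))  # 4
print(sequential_difference("azu590"))  # 43
```

The sketch also makes one weakness visible: a repetitive string such as `azazaz` scores 125, far higher than `azu590`, even though it is arguably more predictable, because the metric only measures the size of each jump and not whether the pattern repeats.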

Does this exist?

This notion of character-to-character difference across a string might already be defined, much as the Levenshtein distance is defined between two strings. I don't know whether this concept is defined or what it might be called. Is it defined, and if so, what is it called?

My example approach to calculating the character-to-character difference across a string is a simple and obvious one. It may be flawed or ineffective. Are there any known algorithms for calculating this character-to-character difference effectively?

Comments (2)

花开柳相依 2024-10-02 16:38:06

It sounds like you want a Markov chain model for passwords. A Markov chain has a number of states and a probability of transitioning between the states. In your case the states are the characters in the allowed character set, and the probability of a transition is proportional to the frequency with which those two characters appear consecutively. You can construct the Markov chain by looking at the frequency of the transitions in an existing text, for example a freely available word list or password database.
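A minimal sketch of estimating those transition probabilities in Python, assuming a first-order chain and plain maximum-likelihood estimates (the count of each pair divided by the total number of transitions out of the current character). The corpus is a made-up stand-in for a real word list or password database, and `build_transition_model` is a hypothetical name.

```python
from collections import Counter, defaultdict


def build_transition_model(corpus: list[str]) -> dict[str, dict[str, float]]:
    """Estimate P(next_char | current_char) from consecutive pairs in the corpus."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for word in corpus:
        for a, b in zip(word, word[1:]):
            counts[a][b] += 1  # one observed transition a -> b
    model = {}
    for a, followers in counts.items():
        total = sum(followers.values())
        model[a] = {b: n / total for b, n in followers.items()}
    return model


# A tiny, made-up corpus standing in for a real word list or password dump.
corpus = ["abc123", "abcdef", "password", "123456", "qwerty"]
model = build_transition_model(corpus)
print(round(model["a"]["b"], 3))  # 0.667: 'b' follows 'a' in 2 of the 3 transitions out of 'a'
```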

It is also possible to use variations on this technique (Markov chain of order m) where you for example consider the previous two characters instead of just one.

Once you have created the model you can use the probability of generating the password from the model as a measure of its strength. This is the product of the probabilities of each state transition.
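Putting the order-m variant and the scoring step together, one possible sketch follows. Several choices are assumptions rather than part of the answer: add-one (Laplace) smoothing so an unseen transition gets a small non-zero probability instead of zero, an alphabet of 95 printable ASCII characters, a hypothetical sentinel character `\x02` that pads the start of each password so its first characters also have a context, and summing log2 probabilities rather than multiplying raw probabilities, to avoid floating-point underflow. The negated sum, -log2 P(password), reads as a strength in bits. The class and method names are hypothetical.

```python
import math
from collections import Counter, defaultdict

START = "\x02"  # hypothetical sentinel meaning "before the first character"


class MarkovPasswordModel:
    """Order-m character Markov chain: P(c_i | c_{i-m} .. c_{i-1})."""

    def __init__(self, order: int = 2, alphabet_size: int = 95):
        self.order = order
        self.alphabet_size = alphabet_size  # assumption: printable ASCII
        self.counts: dict[str, Counter] = defaultdict(Counter)

    def train(self, corpus: list[str]) -> None:
        for word in corpus:
            padded = START * self.order + word
            for i in range(self.order, len(padded)):
                context = padded[i - self.order:i]  # previous m characters
                self.counts[context][padded[i]] += 1

    def prob(self, context: str, char: str) -> float:
        # Add-one smoothing: unseen transitions are unlikely, not impossible.
        followers = self.counts.get(context, Counter())
        return (followers[char] + 1) / (sum(followers.values()) + self.alphabet_size)

    def log2_prob(self, password: str) -> float:
        """log2 of the product of transition probabilities, computed as a sum of logs."""
        padded = START * self.order + password
        return sum(
            math.log2(self.prob(padded[i - self.order:i], padded[i]))
            for i in range(self.order, len(padded))
        )

    def strength_bits(self, password: str) -> float:
        """Higher is stronger: -log2 P(password) under the model."""
        return -self.log2_prob(password)


# Made-up training data; a real model would use a large password corpus.
corpus = ["abc123", "abcdef", "abc123abc", "123456", "qwerty", "password1"]
model = MarkovPasswordModel(order=2)
model.train(corpus)
for pw in ("abc123", "azu590"):
    print(pw, round(model.strength_bits(pw), 1))
```

With this toy corpus, `abc123` scores fewer bits than `azu590`, because every transition in `abc123` was seen during training while, apart from its leading `a`, none of the transitions in `azu590` were.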

深海夜未眠 2024-10-02 16:38:06

For general signals/time-series data, this is known as autocorrelation.
You could try adapting the Durbin–Watson statistic and testing for positive autocorrelation between the characters. A naïve way may be to use the Unicode code points of each character, but I'm sure that will not be good enough.
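A rough sketch of the Durbin–Watson idea applied to a password. It assumes the Unicode code points from `ord()` as the series and uses deviations from the series mean as the "residuals", since the statistic is normally computed on regression residuals. The statistic is d = sum((e[t] - e[t-1])**2) / sum(e[t]**2). Values near 2 suggest no autocorrelation, and values toward 0 suggest positive autocorrelation. The function names are hypothetical.

```python
def durbin_watson(values: list[float]) -> float:
    """Durbin-Watson statistic, using deviations from the mean as residuals (an assumption)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / len(values)
    resid = [v - mean for v in values]
    num = sum((b - a) ** 2 for a, b in zip(resid, resid[1:]))
    den = sum(r * r for r in resid)
    if den == 0:
        raise ValueError("constant series: Durbin-Watson is undefined")
    return num / den


def password_dw(password: str) -> float:
    """Apply the statistic to the password's code points (the naive approach)."""
    return durbin_watson([ord(c) for c in password])


print(round(password_dw("abc123"), 2))  # 0.72
print(round(password_dw("azu590"), 2))  # 0.85
```

Both passwords land well below 2, mostly because the letters form one block of high code points and the digits a block of low ones. This bears out the caution that raw code points are not good enough on their own.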
