How to measure the complexity of a string?

Posted on 2024-11-08 17:53:41

I have a few long strings (about 1,000,000 characters each). Each string contains only symbols from a defined alphabet, for example

A = {1,2,3}

Sample strings

string S1 = "1111111111 ..."; //[meta complexity] = 0
string S2 = "1111222333 ..."; //[meta complexity] = 10
string S3 = "1213323133 ..."; //[meta complexity] = 100

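Q: What kind of measure can I use to quantify the complexity of these strings? I can see that S1 is less complex than S3, but how can I do that programmatically from .NET? Any algorithm, or a pointer to a tool or the literature, would be greatly appreciated.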

Edit

I tried Shannon entropy, but it turned out not to be really useful for me: I get the same H value for all of these sequences: AAABBBCCC, ABCABCABC, ACCCBABAB and BBACCABAC.
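To make the problem concrete, here is a minimal sketch of why plain (zeroth-order) Shannon entropy cannot tell these sequences apart: it depends only on how often each symbol occurs, not on the order of the symbols. The sketch is written in Python for brevity, which is my assumption rather than part of the question; the function name `shannon_entropy` is hypothetical, and the same counting logic should port directly to .NET.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Zeroth-order Shannon entropy of s, in bits per symbol.

    Only the symbol frequencies are used, so any reordering of the
    same symbols yields exactly the same value.
    """
    if not s:
        return 0.0
    n = len(s)
    counts = Counter(s)  # symbol -> number of occurrences
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# The four sequences from the question: each contains three A's,
# three B's and three C's, only in a different order.
samples = ["AAABBBCCC", "ABCABCABC", "ACCCBABAB", "BBACCABAC"]

if __name__ == "__main__":
    for s in samples:
        print(s, shannon_entropy(s))
```

All four sequences contain each symbol exactly three times, so each one evaluates to log2(3), about 1.585 bits per symbol, even though the first two are visibly more regular than the last two.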

This is what I ended up doing
