Decrypting text using frequency analysis in C#
I've been tasked with decrypting a text file using frequency analysis. This isn't a do-it-for-me question, but I have absolutely no idea what to do next. What I have so far reads in the text from a file and counts the frequency of each letter. If someone could point me in the right direction as to swapping letters depending on their frequency, it would be much appreciated.
using System;
using System.IO;

namespace freqanaly
{
    class Program
    {
        static void Main()
        {
            string text = File.ReadAllText("c:\\task_2.txt");
            Console.Write(text);

            // counts[0] holds the count for 'A', counts[25] the count for 'Z'.
            int[] counts = new int[26];

            foreach (char c in text)
            {
                // Fold case so that 'a' and 'A' are counted together.
                char upper = char.ToUpperInvariant(c);
                if (upper >= 'A' && upper <= 'Z')
                {
                    counts[upper - 'A']++;
                }
            }

            Console.ReadKey();

            // Print one "letter = count" line per letter.
            for (int x = 0; x < 26; x++)
            {
                Console.WriteLine((char)('A' + x) + " = " + counts[x]);
            }
            Console.ReadKey();
        }
    }
}
Comments (2)
This IS encrypted data, just using a simple substitution cipher (I assume). See the definition of encoding/encrypting.
http://www.perlmonks.org/index.pl?node_id=66249
Regardless, as Sergey suggested, get a letter frequency table and match frequencies. You will have to allow for some deviation, since there is no guarantee that exactly 8.167% of the letters in the document are 'A's (perhaps in this document the figure is 8.78% or 7.65%). Also, be sure to count every occurrence of A, and do not differentiate 'a' from 'A'. This can be handled with a simple ToUpper or ToLower transform on the character; just be consistent.
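As a rough illustration of matching with some allowed deviation, here is a minimal sketch. The class name FrequencyTable, its method names, and the tolerance parameter are all made up for this example, and the English percentages are the commonly quoted table that the 8.167% figure for 'A' comes from. It computes case-insensitive percentages and lists which plaintext letters lie within a chosen tolerance of an observed percentage.

```csharp
using System;
using System.Collections.Generic;

static class FrequencyTable
{
    // Approximate English letter frequencies in percent, A..Z
    // (commonly quoted values; treat them as a guide, not as exact).
    public static readonly double[] English =
    {
        8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966, 0.153,
        0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987, 6.327, 9.056,
        2.758, 0.978, 2.360, 0.150, 1.974, 0.074
    };

    // Percentage of each letter A..Z among the letters of the text, ignoring case.
    public static double[] Percentages(string text)
    {
        int[] counts = new int[26];
        int total = 0;
        foreach (char c in text)
        {
            char upper = char.ToUpperInvariant(c);
            if (upper >= 'A' && upper <= 'Z')
            {
                counts[upper - 'A']++;
                total++;
            }
        }

        double[] percent = new double[26];
        if (total == 0) return percent; // no letters at all: all zeros
        for (int i = 0; i < 26; i++)
        {
            percent[i] = 100.0 * counts[i] / total;
        }
        return percent;
    }

    // Plaintext letters whose expected frequency lies within `tolerance`
    // percentage points of the observed frequency (hypothetical helper).
    public static List<char> Candidates(double observedPercent, double tolerance)
    {
        var result = new List<char>();
        for (int i = 0; i < 26; i++)
        {
            if (Math.Abs(English[i] - observedPercent) <= tolerance)
            {
                result.Add((char)('A' + i));
            }
        }
        return result;
    }

    static void Main()
    {
        double[] observed = Percentages("Hello, World");
        Console.WriteLine("L: " + observed['L' - 'A']);
        Console.WriteLine(string.Join(", ", Candidates(8.78, 1.0)));
    }
}
```

With a tolerance of one percentage point, an observed 8.78% is compatible with both A and T, which is why a frequency match is only a starting guess. A wider tolerance gives more candidates per letter; a narrower one risks excluding the right letter in a short text.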
Also, when you get to the less common but still popular letters, expect ambiguity. C, F, G, W, and M are all around the 2% mark, give or take, so frequencies alone will not separate them. You will need to play with the decrypted text until the letters fit, both in the word you are looking at and in every other word in the document where the same substitution applies. The concept is similar to fitting numbers into a Sudoku grid. Luckily, once you find where a letter should go, the change cascades throughout the document and you can start to see the decrypted plain text emerge. As an example, '(F)it' and '(W)it' are both valid words, but if substituting 'F' also produces '(F)hen' elsewhere in the document, you can make a good guess that the character should be a 'W' instead. (T)here and (W)here is another example, and a word like ()hen won't provide any guidance by itself, since both (W)hen and (T)hen are valid words. Here you have to use contextual clues to decide which word makes sense: "Then is a good time to start our attack?" doesn't make as much sense as "When is a good time to start our attack?".
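The trial-and-error loop described above can be sketched as follows. This is only one possible shape for it: the class Substitution and the methods Apply and SwapGuesses are hypothetical names, and the key is assumed to be a dictionary from upper-case ciphertext letters to guessed plaintext letters. Printing decoded letters in lower case and unsolved ciphertext letters in upper case is a convention chosen here to make the cascade visible, not something the approach requires.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class Substitution
{
    // Replaces every letter that has an entry in `key` (cipher -> plain).
    // Decoded letters are written in lower case; letters without a guess
    // stay in upper case; everything else is copied unchanged.
    public static string Apply(string cipherText, IDictionary<char, char> key)
    {
        var sb = new StringBuilder(cipherText.Length);
        foreach (char c in cipherText)
        {
            char upper = char.ToUpperInvariant(c);
            if (key.TryGetValue(upper, out char plain))
            {
                sb.Append(char.ToLowerInvariant(plain));
            }
            else
            {
                sb.Append(upper);
            }
        }
        return sb.ToString();
    }

    // Exchanges the plaintext guesses of two ciphertext letters, for example
    // when "fhen" shows that the guess 'F' should have been 'W'.
    // Both letters must already be present in the key.
    public static void SwapGuesses(IDictionary<char, char> key, char cipherA, char cipherB)
    {
        char tmp = key[cipherA];
        key[cipherA] = key[cipherB];
        key[cipherB] = tmp;
    }

    static void Main()
    {
        // Made-up partial key for a made-up ciphertext.
        var key = new Dictionary<char, char>
        {
            { 'Q', 'F' }, { 'J', 'W' }, { 'X', 'H' }, { 'B', 'E' }, { 'M', 'N' }
        };
        Console.WriteLine(Apply("QXBM", key)); // first guess
        SwapGuesses(key, 'Q', 'J');
        Console.WriteLine(Apply("QXBM", key)); // after swapping the F and W guesses
    }
}
```

Because Apply always rebuilds the output from the original ciphertext, each swap is reflected everywhere the letter occurs, so a wrong guess tends to show up quickly as nonsense words.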
All of this assumes you are dealing with a monoalphabetic substitution. A polyalphabetic substitution is more difficult, and you may need to look into examples of cracking the Vigenère cipher to try to figure out a way around this problem.
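One common way to check the monoalphabetic assumption, which is not part of the advice above and is offered only as a sketch, is the index of coincidence: the probability that two letters picked at random from the text are the same. A monoalphabetic substitution only renames letters, so the value stays close to that of ordinary English (roughly 0.067), whereas a polyalphabetic cipher flattens the distribution and pushes it toward 1/26 (about 0.038). The class and method names below are invented for the example.

```csharp
using System;

static class Coincidence
{
    // Index of coincidence: probability that two letters drawn at random
    // (without replacement) from the text are the same letter. Case is
    // ignored and non-letters are skipped.
    public static double Index(string text)
    {
        long[] counts = new long[26];
        long total = 0;
        foreach (char c in text)
        {
            char upper = char.ToUpperInvariant(c);
            if (upper >= 'A' && upper <= 'Z')
            {
                counts[upper - 'A']++;
                total++;
            }
        }

        if (total < 2) return 0.0; // not enough letters to form a pair

        long matchingPairs = 0;
        foreach (long n in counts)
        {
            matchingPairs += n * (n - 1);
        }
        return (double)matchingPairs / (total * (total - 1));
    }

    static void Main()
    {
        Console.WriteLine(Index("Then is a good time to start our attack?"));
    }
}
```

The figures are only meaningful for reasonably long texts; on a sentence or two the value fluctuates too much to tell the two cases apart.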
I suggest reading "The Code Book" by S. Singh. It is a very interesting read, and it makes the historical ciphers and how they were cracked easy to digest.
http://www.google.com/products/catalog?q=the+code+book&rls=com.microsoft:en-us:IE-SearchBox&oe=&um=1&ie=UTF-8&tbm=shop&cid=5361323398438876518&sa=X&ei=hpR0T-HyObSK2QWvgvH-Dg&ved=0CFoQ8wIwBQ#
Next, you should grab one of the publicly available English letter frequency lists (from Wikipedia, for example) and compare it with the frequency table you actually got, in order to find the replacements for the letters.
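A minimal sketch of that comparison, under the assumption that the cipher is a simple one-to-one letter substitution: rank the ciphertext letters by count and pair them, in order, with English letters ranked by typical frequency. The names InitialGuess and BuildKey are hypothetical, and the ordering string is one commonly quoted ranking, so other sources may order the rarer letters slightly differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class InitialGuess
{
    // English letters from most to least frequent (one commonly quoted ordering).
    const string EnglishOrder = "ETAOINSHRDLCUMWFGYPBVKJXQZ";

    // Pairs the n-th most frequent ciphertext letter with the n-th most
    // frequent English letter. Returns cipher letter -> guessed plain letter.
    public static Dictionary<char, char> BuildKey(string cipherText)
    {
        int[] counts = new int[26];
        foreach (char c in cipherText)
        {
            char upper = char.ToUpperInvariant(c);
            if (upper >= 'A' && upper <= 'Z')
            {
                counts[upper - 'A']++;
            }
        }

        // Letter indices sorted by descending count. OrderByDescending is a
        // stable sort, so letters with equal counts stay in alphabetical order.
        int[] ranked = Enumerable.Range(0, 26)
                                 .OrderByDescending(i => counts[i])
                                 .ToArray();

        var key = new Dictionary<char, char>();
        for (int rank = 0; rank < 26; rank++)
        {
            key[(char)('A' + ranked[rank])] = EnglishOrder[rank];
        }
        return key;
    }

    static void Main()
    {
        Dictionary<char, char> key = BuildKey("XXXYYZ");
        foreach (char cipher in "XYZ")
        {
            Console.WriteLine(cipher + " -> " + key[cipher]);
        }
    }
}
```

The resulting key is only a first guess. The most frequent few letters are often right on a long text, while the letters with similar frequencies usually need manual correction.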