Decrypting a text file in a non-ASCII format
I've been given an encrypted file of about 80,000 bytes whose plaintext is in a "common (but not extremely common these days)" format.
It has been encrypted with what I would describe as a Vigenère cipher with a modified encryption table: one byte of the key and one byte of the plaintext map to one byte of the ciphertext.
The key string has a fixed length, so the key character used for encryption cycles through the key string.
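To make the cycling concrete, here is a minimal sketch of that kind of cipher. The real modified table is not known, so `tableEncrypt` and `tableDecrypt` are hypothetical stand-ins that simply add and subtract modulo 256; `applyKey` is likewise an illustrative name, not something from the original program.

```d
import std.stdio : writeln;
import std.string : representation;

// ASSUMPTION: the real modified table is unknown; addition mod 256 stands in for it.
ubyte tableEncrypt(ubyte plain, ubyte key) { return cast(ubyte)(plain + key); }
ubyte tableDecrypt(ubyte cipher, ubyte key) { return cast(ubyte)(cipher - key); }

// Applies `op` to every data byte, pairing it with the key byte at i % key.length.
ubyte[] applyKey(alias op)(const(ubyte)[] data, const(ubyte)[] key)
{
    auto result = new ubyte[data.length];
    foreach (i, b; data)
        result[i] = op(b, key[i % key.length]); // the key cycles through the key string
    return result;
}

void main()
{
    auto key = "k3y".representation;
    auto plain = "attack at dawn".representation;
    auto cipher = applyKey!tableEncrypt(plain, key);
    auto back = applyKey!tableDecrypt(cipher, key);
    writeln(cast(const(char)[]) back);
}
```

Because byte `i` always meets key byte `i % keyLength`, every keyLength-th byte forms a column that was encrypted with a single key character, which is what the per-column filtering relies on.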
The key contains only alphanumeric characters.
So far I've determined that the key length is 30 or 60 by taking the greatest common divisor of the distances between the start positions of repeated three-byte sequences in the ciphertext. Pretty standard for Vigenère.
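As a rough sketch of that key-length step: in a file this size a few trigrams will repeat by pure chance, and a single accidental repeat can drag a plain GCD down to 1. The version below therefore tallies, for each candidate length, how many repeat distances it divides. `kasiskiVotes` and `maxLen` are names made up for this illustration.

```d
import std.stdio : writefln;
import std.string : representation;

/// votes[n] = number of repeated-trigram distances that are divisible by n.
size_t[] kasiskiVotes(const(ubyte)[] cipher, size_t maxLen)
{
    auto votes = new size_t[maxLen + 1];
    size_t[uint] lastSeen; // trigram packed into a uint -> offset of its latest occurrence
    if (cipher.length < 3)
        return votes;
    foreach (i; 0 .. cipher.length - 2)
    {
        uint tri = (cipher[i] << 16) | (cipher[i + 1] << 8) | cipher[i + 2];
        if (auto prev = tri in lastSeen)
        {
            immutable dist = i - *prev; // distance to the previous occurrence
            foreach (n; 2 .. maxLen + 1)
                if (dist % n == 0)
                    ++votes[n];
        }
        lastSeen[tri] = i;
    }
    return votes;
}

void main()
{
    // Toy input: "abc" occurs at offsets 0, 6 and 12, so both distances are 6.
    auto votes = kasiskiVotes("abcxyzabcuvwabc".representation, 8);
    foreach (n; 2 .. votes.length)
        writefln("length %s: %s votes", n, votes[n]);
}
```

On real ciphertext the true key length and its divisors and multiples should stand out in the tally, which would fit the 30 versus 60 ambiguity.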
Now, I've been guessing at possible characters for the key by observing what the decrypted bytes would be and eliminating a candidate if any of them falls outside the acceptable ranges (for ASCII: 32-126 are printable, nothing should land in 16-31, etc.).
This worked for the first part, which had a short key and plain ASCII plaintext.
When I try it on the larger file with the "new file format", it rejects every possible key character.
This eliminates ASCII, Ascii85, Base64, Windows-1252, UTF-7, quoted-printable (QP) and uuencode, as they all rely on the ASCII character set. I've also made filters for EBCDIC and ISO 8859-1, which rejected all keys. UTF-8 also failed because no key would make all bytes begin with one of the bit patterns 0, 10, 110, 1110, 11110, 111110 or 1111110.
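For reference, a per-byte UTF-8 check can be sketched as below. The 111110 and 1111110 prefixes come from the older six-byte definition of UTF-8; RFC 3629 limits sequences to four bytes, so the bytes 0xC0, 0xC1 and 0xF5 through 0xFF never occur in valid UTF-8. Since neighbouring bytes are encrypted with different key characters, a single-column filter can only test bytes one at a time like this. `canAppearInUtf8` is a name invented for the sketch.

```d
import std.stdio : writefln;

/// True if `b` can occur somewhere in well-formed UTF-8 as defined by RFC 3629.
bool canAppearInUtf8(ubyte b)
{
    // 0xC0 and 0xC1 could only start overlong two-byte encodings;
    // 0xF5..0xFF would encode code points above U+10FFFF or are not lead bytes at all.
    return b != 0xC0 && b != 0xC1 && b < 0xF5;
}

void main()
{
    // 0x41 is 'A'; 0xC3 0xA9 is U+00E9 in UTF-8; 0xC0 and 0xFF are never valid.
    ubyte[] sample = [0x41, 0xC3, 0xA9, 0xC0, 0xFF];
    foreach (b; sample)
        writefln("0x%02X -> %s", b, canAppearInUtf8(b));
}
```

This rejects 13 byte values instead of 2, so it is a somewhat stronger test than the prefix check, though still a weak one for eliminating key characters.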
The remaining character encodings that I suspect but haven't tried are UTF-16, UTF-32 and UTF-1, which I'm unsure how to filter.
My questions are:
- Are there other character encodings that I am forgetting?
- Is it possible that I'm filtering out too much and that some out-of-bounds characters should be allowed to slide through?
- Could "file format" mean something other than character encoding? If so, how do I reconcile that with the filter still breaking on ASCII characters?
- What if the file format means compressed or archived?
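On the second question, one way to let a few out-of-range bytes slide through is to score each candidate by the fraction of its column that passes the filter, instead of rejecting on the first miss. The sketch below assumes a hypothetical `decryptByte` (subtraction mod 256) in place of the real `sdecrypt`, and reuses the ASCII range from the filter described above; the 0.9 threshold is arbitrary.

```d
import std.ascii : digits, letters;
import std.stdio : writefln;

// ASSUMPTION: stand-in for the real sdecrypt, whose table is unknown.
ubyte decryptByte(ubyte cipher, ubyte key) { return cast(ubyte)(cipher - key); }

// Same acceptance range as the strict ASCII filter: 0-15 and 32-126.
bool inAsciiFilterRange(ubyte b) { return b <= 15 || (b >= 32 && b <= 126); }

/// Fraction of the bytes in one key column that decrypt into the accepted range.
double plausibleFraction(const(ubyte)[] cipher, size_t keyLength, size_t index, ubyte key)
{
    size_t total, good;
    for (size_t x = index; x < cipher.length; x += keyLength)
    {
        ++total;
        if (inAsciiFilterRange(decryptByte(cipher[x], key)))
            ++good;
    }
    return total == 0 ? 0.0 : cast(double) good / total;
}

void main()
{
    // Toy plaintext: "hello", one stray byte 0x90, then "world"; key is 'k', key length 1.
    ubyte[] plain = [104, 101, 108, 108, 111, 0x90, 119, 111, 114, 108, 100];
    auto cipher = new ubyte[plain.length];
    foreach (i, b; plain)
        cipher[i] = cast(ubyte)(b + 'k');

    foreach (char c; letters ~ digits)
    {
        immutable frac = plausibleFraction(cipher, 1, 0, cast(ubyte) c);
        if (frac >= 0.9) // tolerate up to 10% misses instead of rejecting on the first
            writefln("candidate %s: %.3f", c, frac);
    }
}
```

If no candidate scores well even with a generous threshold, that would point towards the plaintext not being byte-oriented text at all, such as a compressed or binary format.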
Here is the filtering code I use; the filter condition varies depending on what I'm trying to sift out.
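    import std.file : read;
    import std.stdio : writefln;

    // ValidKeyChars (the candidate key characters) and sdecrypt (the table
    // lookup) are not shown here.
    void guessCrypt(string fileName, int keyLength, int index)
    {
        byte[] file = cast(byte[])read(fileName);
        foreach (key; ValidKeyChars)
        {
            bool work = true;
            // Visit every byte encrypted with the key character at position `index`.
            for (int x = index; x < file.length - 10; x += keyLength)
            {
                byte single = file[x];
                int res = sdecrypt(single, key);
                if ((res < 32 && res > 15) || res > 126) // FILTER - this one ASCII
                {
                    work = false;
                    break;
                }
            }
            if (work)
            {
                writefln("\nwork: %s", key);
            }
        }
    }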
Comments (1)
I would suggest not even bothering with existing encodings; treat it as another layer of substitution cipher, and work out from letter frequencies what maps to what. If the characters are in fact contiguous, that will only help your analysis. Once you know what they really map to, you can work from there to find out which character set it actually was.
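A minimal sketch of the first step of that approach, assuming the key length is already known: count how often each ciphertext byte occurs within one key column, since each column is a simple substitution of the plaintext. `columnHistogram` and `mostFrequentByte` are illustrative names.

```d
import std.stdio : writefln;

/// Counts how often each byte value occurs in the column that starts at `index`.
size_t[256] columnHistogram(const(ubyte)[] cipher, size_t keyLength, size_t index)
{
    size_t[256] counts; // elements start at 0
    for (size_t x = index; x < cipher.length; x += keyLength)
        ++counts[cipher[x]];
    return counts;
}

/// Byte value with the highest count; the lowest value wins a tie.
ubyte mostFrequentByte(const ref size_t[256] counts)
{
    size_t best = 0;
    foreach (value; 1 .. 256)
        if (counts[value] > counts[best])
            best = value;
    return cast(ubyte) best;
}

void main()
{
    // Toy ciphertext with key length 2: column 1 consists only of the byte 9.
    ubyte[] cipher = [1, 9, 2, 9, 3, 9];
    auto counts = columnHistogram(cipher, 2, 1);
    immutable top = mostFrequentByte(counts);
    writefln("most frequent byte in column 1: %s (%s times)", top, counts[top]);
}
```

In each column the most frequent ciphertext byte is a natural first guess for the most frequent plaintext byte (often a space in text), and comparing the shapes of the histograms across columns can help line them up without assuming any particular encoding.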