Ciphertext letter frequency substitution: comparing two dictionaries' keys by value and altering a text
I've looked at similar topics, but none of the solutions I found exactly matches what I'm trying to achieve.
I have a ciphertext that needs to undergo a simple letter substitution based on the frequency of each letter's occurrence in the text. I already have a function that normalises the text (lowercase, no non-letter characters), counts letter occurrences and then gets the relative frequency of each letter. The letter is the key in a dictionary, and the frequency is the value.
I also have the expected letter frequencies for A-Z in a separate dictionary (k=letter, v=frequency), but I'm a bit befuddled by what to do next.
What I think I need to do is take the normalised ciphertext, the expected letter freq dict [d1] and the cipher letter freq dict [d2] and iterate over them as follows (part pseudocode):
    for word in text:
        for item in word:
            for k,v in d2.items():
                if d2[v] == d1[v]:
                    replace any instance of d2[k] with d1[k] in text
    decoded_text=open('decoded_text.txt', 'w')
    decoded_text.write(str('the decoded text'))
Here, I want to take text and say "if the value in d2 matches a value in d1, replace any instance of d2[k] with d1[k] in text".
I realise I must have made a fair few basic Python logic errors there (I'm relatively new to Python), but am I on the right track?
Thanks in advance
Update:
Thank you for all the helpful suggestions. I decided to try Karl Knechtel's method, with a few alterations to fit my code. However, I'm still having problems (entirely in my implementation).
I have made a decode function that takes the ciphertext file in question. It calls the count function I made previously, which returns a dictionary (letter: frequency as a float). This meant that the "make uppercase version" code wouldn't work, as k and v were floats and don't have an .upper attribute. So, calling this decode function returns the ciphertext letter frequencies, and then the ciphertext itself, still encoded.
    from operator import itemgetter

    def sorted_histogram(a_dict):
        return [x[1] for x in sorted(a_dict.items(), key=itemgetter(1))]

    def decode(filename):
        text=open(filename).read()
        cipher=text.lower()
        cipher_dict=count(filename)
        english_histogram = sorted_histogram(english_dict)
        cipher_histogram = sorted_histogram(cipher_dict)
        mapping = dict(zip(english_histogram, cipher_histogram))
        translated = ''.join(
            mapping.get(c, c)
            for c in cipher
        )
        return translated
Comments (3)
You don't really want to do what you're thinking of doing, because the frequencies of characters in the sample won't, in general, match the exact frequency distribution in the reference data. What you're really trying to do is find the most common character and replace it with 'e', the next most and replace it with 't', and so on.
So what we're going to do is the following:
1. (I assume you can already do this part) Construct a dictionary of actual letter frequency in the ciphertext.
2. We define a function that takes a {letter: frequency} dictionary and produces a list of the letters in order of frequency.
3. We get the letters, in order of frequency, in our reference (i.e., now we have an ordered list of the most common letters in English), and in the sample (similarly).
4. On the assumption that the most common letter in the sample corresponds to the most common letter in English, and so on: we create a new dictionary that maps letters from the first list into letters from the second list. (We could also create a translation table for use with str.translate.)
5. We'll make uppercase and lowercase versions of the same dictionary (I'll assume your original dictionaries have only lowercase) and merge them together.
6. We use this mapping to translate the ciphertext, leaving other characters (spaces, punctuation, etc.) alone.
Thus:
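A minimal sketch of the steps above, not necessarily the original code. The names letters_by_frequency, build_mapping and decode, and the three-letter toy frequency dictionaries, are assumptions made for illustration. The mapping here goes from ciphertext letters to English letters, since that is the direction needed for decoding.

```python
from operator import itemgetter

def letters_by_frequency(freq):
    """Return the letters of a {letter: frequency} dict, most frequent first."""
    ordered = sorted(freq.items(), key=itemgetter(1), reverse=True)
    return [letter for letter, _ in ordered]

def build_mapping(cipher_freq, english_freq):
    """Pair the n-th most common cipher letter with the n-th most common English letter."""
    cipher_order = letters_by_frequency(cipher_freq)
    english_order = letters_by_frequency(english_freq)
    mapping = dict(zip(cipher_order, english_order))
    # Add uppercase versions (assumes the input dicts hold lowercase letters only).
    mapping.update({c.upper(): e.upper() for c, e in list(mapping.items())})
    return mapping

def decode(ciphertext, cipher_freq, english_freq):
    """Translate mapped letters; spaces, punctuation, etc. pass through unchanged."""
    mapping = build_mapping(cipher_freq, english_freq)
    return ''.join(mapping.get(ch, ch) for ch in ciphertext)

# Toy data (hypothetical): real dictionaries would cover all 26 letters.
english_freq = {'e': 12.7, 't': 9.1, 'a': 8.2}
cipher_freq = {'x': 0.5, 'q': 0.3, 'z': 0.2}

decoded = decode("Xqz, zqx!", cipher_freq, english_freq)
print(decoded)  # Eta, ate!
```

Because the match is by rank rather than by exact frequency value, the two dictionaries never need to contain equal numbers.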
Usage:
First off, note that it's very unlikely that the frequencies will give you an exact match, unless your message is very long. So you might need to do some manual tweaking to get the exact message. But if the frequencies are close enough...
You could get the keys of both dictionaries (letters), sorted by their values (frequencies):
Then turn them into strings:
Then use them to translate the string:
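The three steps above might look roughly like this in Python 3. This is a sketch under assumptions: the toy dictionaries english_freq and cipher_freq are made up for illustration, and str.maketrans is used here as one way to build the translation table (it requires both strings to have the same length).

```python
# Hypothetical toy frequency dictionaries; real ones would cover a-z.
english_freq = {'e': 12.7, 't': 9.1, 'a': 8.2}
cipher_freq = {'x': 0.5, 'q': 0.3, 'z': 0.2}

# Step 1: keys (letters) sorted by their values (frequencies), most frequent first.
english_keys = sorted(english_freq, key=english_freq.get, reverse=True)
cipher_keys = sorted(cipher_freq, key=cipher_freq.get, reverse=True)

# Step 2: turn the sorted lists into strings.
english_str = ''.join(english_keys)
cipher_str = ''.join(cipher_keys)

# Step 3: build a translation table (cipher letter -> English letter) and apply it.
table = str.maketrans(cipher_str, english_str)
plain = "xqz zqx".translate(table)
print(plain)  # eta ate
```

Characters that are not in the table, such as the space, are left untouched by translate.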