Solving a substitution cipher with Python
I know similar questions have been asked, but this is kind of a trivial case.
Given a text file encoded with a substitution cipher, I need to decode it using Python. I am not given any examples of correctly deciphered words. The relationship is 1-to-1 and case doesn't make a difference. Also, punctuation isn't changed and spaces are left where they are. I don't need help with the code as much as I need help with a general idea of how this could be done in code. My main approaches involve:
- Narrowing down the choices by first solving 1, 2 or 3 character words.
- I could use a list of English words of different lengths to compare against.
- I could use frequency distributions of the letters.
Does anyone have an idea of a general approach I could take to do this?
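One way to make the "short words plus a word list" idea concrete is to compare letter-repetition patterns: because the substitution is one-to-one, a cipher word and its plaintext must repeat letters in exactly the same positions. The following is a minimal sketch under that assumption; `word_pattern`, `build_index` and `candidates` are hypothetical helper names, and the word list in the example is a tiny made-up sample.

```python
from collections import defaultdict

def word_pattern(word):
    """Letter-repetition pattern of a word: 'hello' -> (0, 1, 2, 2, 3)."""
    first_seen = {}
    # setdefault hands out the next unused index the first time a letter appears
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in word.lower())

def build_index(word_list):
    """Group dictionary words by pattern so each lookup is a single dict access."""
    index = defaultdict(list)
    for w in word_list:
        index[word_pattern(w)].append(w.lower())
    return index

def candidates(cipher_word, index):
    """Plaintext words that could encrypt to cipher_word under a 1-to-1 substitution."""
    return index.get(word_pattern(cipher_word), [])

index = build_index(["did", "the", "eye", "and"])
print(candidates("xyx", index))  # ['did', 'eye']
```

With a real word list, a cipher word whose first and last letters match already rules out most same-length words before any frequency information is used.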
Comments (3)
I would first get a list of English words for reference. Next construct a list of possible 2 and 3 letter words. Then just start testing those small words in your cipher. Once you guess at a small word, check the larger words against your word list. If some of the words no longer have possible completions in the list, you're on the wrong track. If a word only has one possible completion, accept it as correct and continue. Eventually, you'll either reach a solution where all words are in your English word list, or you'll reach a point where there is no solution for a word.
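A minimal backtracking sketch of this strategy, assuming the ciphertext has already been split into words and that `word_list` is any iterable of English words (`word_pattern`, `extend` and `solve` are hypothetical names). Instead of a fixed 2-letter-then-3-letter order, it always branches on the unsolved word with the fewest dictionary completions. That covers both rules in the answer: a word with exactly one completion is accepted immediately, and a word with no completion triggers a backtrack.

```python
from collections import defaultdict

def word_pattern(word):
    """Letter-repetition pattern, e.g. 'that' -> (0, 1, 2, 0)."""
    seen = {}
    return tuple(seen.setdefault(ch, len(seen)) for ch in word)

def extend(mapping, cipher_word, plain_word):
    """Return mapping plus cipher_word -> plain_word, or None if that breaks 1-to-1."""
    new = dict(mapping)
    used = set(new.values())
    for c, p in zip(cipher_word, plain_word):
        if c in new:
            if new[c] != p:
                return None  # cipher letter already means something else
        elif p in used:
            return None  # plaintext letter already taken by another cipher letter
        else:
            new[c] = p
            used.add(p)
    return new

def solve(cipher_words, word_list):
    """Search for a key under which every cipher word is a dictionary word."""
    by_pattern = defaultdict(list)
    for w in {w.lower() for w in word_list if w.isalpha()}:
        by_pattern[word_pattern(w)].append(w)
    cipher_words = sorted({w.lower() for w in cipher_words})

    def search(mapping):
        best = None
        for cw in cipher_words:
            options = [w for w in by_pattern.get(word_pattern(cw), [])
                       if extend(mapping, cw, w) is not None]
            if not options:
                return None  # no possible completion: wrong track, backtrack
            unsolved = any(c not in mapping for c in cw)
            if unsolved and (best is None or len(options) < len(best[1])):
                best = (cw, options)
        if best is None:
            return mapping  # every word decodes to a dictionary word
        cw, options = best
        for plain in sorted(options):  # a single option means "accept it and continue"
            result = search(extend(mapping, cw, plain))
            if result is not None:
                return result
        return None

    return search({})
```

For example, `solve("wkh fdw vdw rq wkh pdw".split(), words)` returns a cipher-to-plain dict, or `None` if no assignment makes every word a dictionary word. Note that short texts can have several valid keys (here `fdw`, `vdw` and `pdw` could be any arrangement of `cat`, `sat` and `mat`), and with a full-size dictionary this naive search can be slow on long texts.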
I wrote something like this for when Haley's speech was all garbled. It wasn't automagic though; it made guesses based on etaoinshrdlu (the most frequently used letters in English, sorted most to least) and let the user interactively change the meaning of a given ciphertext letter.
So it would show you something like:
and you'd manually guess what letter each number represented until you had something legible.
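A rough, non-interactive sketch of the frequency-guess part of such a tool. `ENGLISH_ORDER` extends the etaoinshrdlu ordering to all 26 letters; everything after "etaoinshrdlu" is an assumption, since published orderings differ in the tail. `initial_key`, `apply_key` and `override` are hypothetical names; `override` stands in for the user changing what a ciphertext letter means, and it swaps letters so the key stays one-to-one.

```python
from collections import Counter
import string

# Full 26-letter ordering; only the "etaoinshrdlu" prefix comes from the answer.
ENGLISH_ORDER = "etaoinshrdlucmfwypvbgkqjxz"

def initial_key(ciphertext):
    """Map ciphertext letters to plaintext letters by frequency rank."""
    counts = Counter(ch for ch in ciphertext.lower() if ch in string.ascii_lowercase)
    ranked = [ch for ch, _ in counts.most_common()]  # most frequent first
    return dict(zip(ranked, ENGLISH_ORDER))

def apply_key(ciphertext, key):
    """Decode letters with key; punctuation and spaces pass through unchanged."""
    out = []
    for ch in ciphertext:
        plain = key.get(ch.lower())
        if plain is None:
            out.append(ch)
        else:
            out.append(plain.upper() if ch.isupper() else plain)
    return "".join(out)

def override(key, cipher_letter, plain_letter):
    """Return a copy of key with cipher_letter -> plain_letter, keeping it 1-to-1."""
    key = dict(key)
    old_plain = key.get(cipher_letter)
    # Whichever cipher letter currently decodes to plain_letter gives it up.
    holder = next((c for c, p in key.items() if p == plain_letter), None)
    if holder is not None and holder != cipher_letter:
        if old_plain is None:
            del key[holder]
        else:
            key[holder] = old_plain
    key[cipher_letter] = plain_letter
    return key
```

An interactive loop would alternate `print(apply_key(text, key))` with reading a letter pair from `input()` and calling `override`, until the output is legible.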
The advantage of this approach is that it can tolerate typos. If your encryptor makes any errors (or uses any words in the plaintext that are not in your dictionary), a purely dictionary-based solver may leave you with an unsolvable puzzle.
That said, spell checkers have great lists of English words. I used the one in Debian's dictionaries-common package for my hangman solver.
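A small sketch for reading such a list, assuming it is a plain newline-delimited file. The default path `/usr/share/dict/words` is an assumption: it is a common location on Unix-like systems, but it may be absent or point at a different list on your machine. `parse_word_list`, `load_word_list` and `by_length` are hypothetical names.

```python
DEFAULT_WORDS = "/usr/share/dict/words"  # assumed location; adjust for your system

def parse_word_list(lines):
    """Turn raw lines into a set of lowercase ASCII-only words."""
    words = set()
    for line in lines:
        word = line.strip().lower()
        # Skip blank lines, possessives ("cat's"), hyphenated and accented entries.
        if word.isascii() and word.isalpha():
            words.add(word)
    return words

def load_word_list(path=DEFAULT_WORDS):
    """Read a newline-delimited word list file."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return parse_word_list(f)

def by_length(words):
    """Group words by length, e.g. to try the 1-, 2- and 3-letter words first."""
    groups = {}
    for w in words:
        groups.setdefault(len(w), set()).add(w)
    return groups
```

Filtering out entries with apostrophes keeps the list consistent with ciphertext words once punctuation has been stripped.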
You could try this approach:
Store a list of valid words (in a dictionary) and a "normal" letter distribution for your language (in a list).
Calculate the distribution of the letters in the garbled text.
Compare your garbled distribution with the normal one and remap your text accordingly (the most frequent ciphertext letter becomes the most frequent letter of the language, and so on).
Repeat: initialize an array rank that maps each of the 26 letters to a float, all starting at 0.0 (rank('A') = rank('B') = ... = rank('Z') = 0.0).
Check the words in the produced text against the words in the dictionary. If a word is in the dictionary, raise the rank of that word's letters (for example, add a standard value such as 1.0). Then calculate a score as a function of the total rank and the number of words found in the dictionary.
Save the text into a high-score table (if the score is high enough).
If all words are in the dictionary, or the total rank is high enough, or the loop has run more than 10000 times, end.
If not, randomly choose two letters and swap them. Bias the choice, though, so that letters with a high rank are less likely to be swapped.
Repeat.
End: print the high-score texts.
The procedure resembles simulated annealing.
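A minimal sketch of the loop above, under several assumptions. The score is simply the number of decoded words found in the dictionary, the 26-letter `ENGLISH_ORDER` tail is assumed, and only the single best key is kept rather than a full high-score table. The answer does not say when a swap is kept, so the acceptance rule here (always keep improvements, sometimes keep a worse key while the temperature is high) is borrowed from standard simulated annealing and is an addition. `decode`, `score_and_rank` and `anneal` are hypothetical names.

```python
import math
import random
import string
from collections import Counter

# Assumed full ordering of English letter frequencies (the tail varies by source).
ENGLISH_ORDER = "etaoinshrdlucmfwypvbgkqjxz"

def decode(words, key):
    return ["".join(key[c] for c in w) for w in words]

def score_and_rank(words, key, dictionary):
    """Score = decoded words found in the dictionary; rank = credit per cipher letter."""
    rank = Counter()
    score = 0
    for cipher_word, plain_word in zip(words, decode(words, key)):
        if plain_word in dictionary:
            score += 1
            for c in set(cipher_word):
                rank[c] += 1.0  # the answer's "standard value"
    return score, rank

def anneal(ciphertext, dictionary, max_iters=10000, temp=1.0, cooling=0.999, seed=0):
    rng = random.Random(seed)
    letters = string.ascii_lowercase
    words = ["".join(ch for ch in w if ch in letters) for w in ciphertext.lower().split()]
    words = [w for w in words if w]
    # Initial key: match ciphertext letter frequencies to the assumed English order.
    counts = Counter("".join(words))
    ranked = [c for c, _ in counts.most_common()]
    ranked += [c for c in letters if c not in counts]
    key = dict(zip(ranked, ENGLISH_ORDER))
    score, rank = score_and_rank(words, key, dictionary)
    best_key, best_score = dict(key), score
    for _ in range(max_iters):
        if best_score == len(words):
            break  # every word decodes to a dictionary word
        # High-rank letters get a small weight, so they are swapped less often.
        weights = [1.0 / (1.0 + rank[c]) for c in letters]
        a, b = rng.choices(letters, weights=weights, k=2)
        if a == b:
            continue
        key[a], key[b] = key[b], key[a]
        new_score, new_rank = score_and_rank(words, key, dictionary)
        delta = new_score - score
        # Annealing-style acceptance (not part of the answer's description).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            score, rank = new_score, new_rank
            if score > best_score:
                best_key, best_score = dict(key), score
        else:
            key[a], key[b] = key[b], key[a]  # undo the swap
        temp = max(temp * cooling, 1e-3)
    return best_key, best_score
```

The returned key maps every ciphertext letter to a plaintext letter; the second value is how many words of the text it turns into dictionary words. Because the search is random, a short text or small dictionary may not reach a full solution within `max_iters`.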