Bash script for frequency analysis of unique letters and repeated letter pairs: how should I structure this script?
OK, first post.
So I have this assignment to decrypt cryptograms by hand, but I also wanted to automate the process a little, if not all of it then at least a few parts. I browsed around and found some sed and awk one-liners that do some of the things I wanted, but not everything I want/need.
There are some websites that sort of do what I want, but I really want to do it in bash, just because I want to understand it better and such :)
The script would take a filename as a parameter and write its output to another file, such as solution$1, when done.
    if [ -e "$1" ]; then
        echo "$1 exists"
    else
        echo "$1 does not exist"
    fi
This would start the script by checking whether the file given as a parameter exists.
Then I found this one-liner:
    sed -e "s/./\0\n/g" "$1" | while read c; do echo -n "$c"; done
It works fine, but I need the number of occurrences per letter, and I really don't see how to do that.
Here is what I'm trying to achieve, more or less, for counting unique letter occurrences and such: http://25yearsofprogramming.com/fun/ciphers.htm
I then need to put all letters in lowercase.
After this I see the script doing these things:
- A subscript that scans a dictionary file for words of a certain pattern and size; the bigger the words, the better.
For example: let's say the solution is the word "apparel" and the crypted word is "zxxzgvk". Is there a regex way to express the pattern that compares those two words and lists the word "apparel" from a dictionary file, because "appa" and "zxxz" are similar patterns and "zxxzgvk" is the same length as "apparel"?
Can this be done, at least in part, and is it realistic to view the problem like this, or is this just far-fetched?
- Another subscript that takes the letters found from the previous output word and swaps those letters in the cryptogram.
The swapped letters will be in uppercase to differentiate them over time.
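One way the dictionary lookup for "apparel" versus "zxxzgvk" could work is to reduce every word to its repetition pattern (each distinct letter numbered by first appearance) and compare patterns instead of letters. The following is only a sketch under that assumption; `match_pattern` is a made-up helper name, and the dictionary is assumed to hold one word per line.

```shell
# match_pattern CIPHERWORD DICTFILE
# Prints every dictionary word whose letter-repetition pattern equals that
# of CIPHERWORD. Both "zxxzgvk" and "apparel" reduce to "0.1.1.0.2.3.4.".
# (Hypothetical helper; assumes DICTFILE has one word per line.)
match_pattern() {
    awk -v cipher="$1" '
        # Number each distinct letter in order of first appearance.
        # i, c, n, seen and out are locals (extra parameters).
        function pattern(w,    i, c, n, seen, out) {
            n = 0; out = ""
            for (i = 1; i <= length(w); i++) {
                c = substr(w, i, 1)
                if (!(c in seen)) seen[c] = n++
                out = out seen[c] "."
            }
            return out
        }
        BEGIN { target = pattern(tolower(cipher)) }
        # Cheap length check first, then the full pattern comparison.
        length($0) == length(cipher) && pattern(tolower($0)) == target
    ' "$2"
}
```

Called as `match_pattern zxxzgvk /usr/share/dict/words` (the dictionary path is just an example), it lists candidate plaintext words. Comparing normalized patterns also rejects words where two different cipher letters would map to the same plain letter, which a plain backreference regex does not enforce.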
I'll then have to figure out how to proceed, maybe rescanning the newly found words to see if they're found in a dictionary file, partly or fully, and then swapping more letters or not.
Did anyone see this problem in the past and try to solve it with the patterns in words like I described, or is this just too complex? Should I log any of the swaps?
Maybe just scan through all the crypted words and swap as I go along, then do another sweep, with the constraint in the first sweep not to change uppercase letters (actually, to use them as more precise patterns!)
Has anyone done a similar script/program in another language? If so, which one? Maybe I can relate somehow :)
Maybe we can use your insight as to how you thought out your code.
I will happily include the cryptograms I have decoded and the one I have yet to decode :)
Again, the focus of my assignment is not to do this script but just to resolve the cryptograms. But doing scripts, or at least trying to see how I would do this script, does help me understand a little more how to think in terms of code. Feel free to point me in the right direction!
The cryptogram itself is based on simple alphabetic substitution.
I have made a pastebin here with the code-to-be :) http://pastebin.com/UEQDsbPk
In pseudocode, the way I see it is:
- call the program with an input filename as a parameter and, optionally, a second filename (dictionary)
- verify the input file exists and isn't empty
- read the file's content and echo it on screen
- transform to lowercase
- scan through the text and count the occurrences of each letter to do a frequency analysis
- ask the user what language the text is supposed to be in (English by default)
- use the response to choose which letter frequencies to use as a baseline
- swap in, in uppercase, the letters corresponding to the frequency analysis
- print the changed document on screen
- ask the user to swap letters in the crypted text
- if the user gave a dictionary file as the second argument:
  - scan the cipher for words and find the bigger words
  - find words with a similar pattern (some letters repeating) in the dictionary file
  - list the results on screen, if any
  - offer to swap the corresponding letters in the cipher
  - print the modified cipher on screen
- ask again to swap letters or find more similar words
That is more or less the way I see the script structured.
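The first few steps of that outline (check the input file, read it, lowercase it) might be sketched roughly as below. `load_cipher` is a hypothetical name used only for illustration, and the sketch assumes plain ASCII text.

```shell
# load_cipher FILE
# Prints FILE converted to lowercase. Returns 1 with a message on stderr
# if FILE is missing or empty ([ -s ] is true only for a non-empty file).
# (Hypothetical helper name, not part of any existing script.)
load_cipher() {
    if [ ! -s "$1" ]; then
        echo "error: '$1' does not exist or is empty" >&2
        return 1
    fi
    tr '[:upper:]' '[:lower:]' < "$1"
}
```

A script could then start with something like `text=$(load_cipher "$1") || exit 1` and hand `$text` to the later steps.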
- Do you see anything that I should add? Did I miss something?
I hope this revised version is clearer for everyone!
Comments (2)
TL;DR, to be frank. To the only question I've found, the answer is yes :) Please split it into smaller tasks and we'll be happy to assist you, if you don't find the answers to these smaller questions first.
If you can put it out in pseudocode, it would be easier. There are all kinds of text-manipulating tools in Unix. The means to employ depend on how big your texts are. I believe they are not so big, or you would have used some compiled language.
For example, the easy but costly gawk way to count frequencies:
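A minimal sketch of such a count, written as an illustration under a few assumptions: it uses only portable awk features (so gawk is not strictly required), it counts ASCII letters only, and `letter_freq` is a made-up function name.

```shell
# letter_freq FILE
# Prints one "count letter" line per letter found, most frequent first.
# The text is lowercased first so "A" and "a" are counted together.
# (Hypothetical helper; walks every character, hence "costly".)
letter_freq() {
    tr '[:upper:]' '[:lower:]' < "$1" |
    awk '{
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c ~ /[a-z]/) count[c]++    # skip spaces, digits, punctuation
        }
    }
    END { for (c in count) print count[c], c }' |
    sort -k1,1nr -k2,2                     # by count descending, then letter
}
```

Walking the text one character at a time is slow on large inputs, but for cryptograms of a few lines that cost hardly matters.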
As for transliterating, there is the tr utility. You can forge and then pass to it the actual strings in each case (that holds true for Caesar-like ciphers).
Example:
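As a sketch of what passing forged strings to tr could look like (the function names here are invented for illustration, and the letter mappings are just sample guesses):

```shell
# apply_swaps FROM TO FILE
# Replaces each cipher letter in FROM with the letter at the same position
# in TO. Writing TO in uppercase marks letters that are already solved.
# (Hypothetical helper; FROM and TO are assumed to contain letters only.)
apply_swaps() {
    tr "$1" "$2" < "$3"
}

# rot13: a Caesar-like cipher where the two strings are simply rotated
# alphabets; reads standard input.
rot13() {
    tr 'a-zA-Z' 'n-za-mN-ZA-M'
}
```

For instance, `apply_swaps zxgvk APREL cipher.txt` turns every "zxxzgvk" in the hypothetical file `cipher.txt` into "APPAREL" and leaves unsolved lowercase letters untouched.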