How do I find recurring patterns in a hexdump?
I need to find recurring patterns in hexdump output.
Every line in my output file is something like:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Where 00 is a byte in hexadecimal.
The patterns are not of fixed length, but each one always lies within a single line.
I have an idea of how to do this, but I'd like to know what you think the most efficient method would be, for example whether there is a known algorithm I am unaware of.
Also, I'd like to code this in Python.
Any suggestion is greatly appreciated :)
Thanks
EDIT:
I need to find partition boot sectors in a disk dump. The problem is that the filesystem is uncommon, so I need to scan the hexdump for frequently used patterns in order to narrow down the area to search.
For example, I am looking for byte patterns like:
00 56 f0 43 d0
It is not apparent whether you already know the substrings you want to search for, or whether you first need to discover a set of query substrings. I think that discovery can be achieved by finding frequently occurring n-grams. Once you have your set of query substrings, you can go on to find where they occur and how far apart the occurrences are (e.g. if some substring occurs every 1024 bytes, that may be a block size).
Step 1: read your hexdump file and convert it back to a single string. I'll leave the details up to you.
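One way Step 1 could look, as a minimal sketch. It assumes every line holds only space-separated two-digit hex bytes (no offset column, no ASCII column), and the name hexdump_to_bytes is hypothetical. Note that joining the lines also lets patterns be found across line boundaries, which may or may not be what you want.

```python
def hexdump_to_bytes(lines):
    """Join lines of space-separated hex byte pairs into one bytes object.

    Assumption: each line contains only hex pairs such as "00 56 f0 43 d0".
    """
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # bytes.fromhex skips the spaces between byte pairs
        chunks.append(bytes.fromhex(line))
    return b"".join(chunks)


# Example with two in-memory lines; for a real dump, pass an open file instead:
#     with open("dump.txt") as f:   # "dump.txt" is a placeholder name
#         data = hexdump_to_bytes(f)
sample_lines = ["00 56 f0 43 d0 00 00 00", "00 56 f0 43 d0 ff ff ff"]
data = hexdump_to_bytes(sample_lines)
```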
Step 2: for each interesting value of n (say 3, 4, 5 as in your example, 6, etc.), use this function:
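A minimal sketch of such a function, assuming the dump has already been converted to a bytes object. The name most_frequent_ngrams and its parameters are illustrative, not the original code from this answer. It keeps every distinct n-gram in memory, so on a large disk image it may need to be run over chunks.

```python
from collections import Counter


def most_frequent_ngrams(data, n, how_many=10):
    """Return the how_many most common n-byte substrings as (ngram, count) pairs."""
    # Slide a window of width n over the data; overlapping windows are counted.
    counts = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    return counts.most_common(how_many)


# Example: the 5-byte pattern from the question appears twice in this sample.
sample = bytes.fromhex("0056f043d0" "ffff" "0056f043d0")
top = most_frequent_ngrams(sample, 5, how_many=1)
# top == [(b'\x00V\xf0C\xd0', 2)]
```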
That will give you the most frequently occurring substrings.
Step 3: where those strings occur:
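A possible sketch for this step, built on bytes.find. The helper name find_all is hypothetical, and it assumes the data and the pattern are both bytes objects.

```python
def find_all(data, pattern):
    """Return the start offsets of every occurrence of pattern in data."""
    positions = []
    start = data.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume one byte later so overlapping occurrences are also reported.
        start = data.find(pattern, start + 1)
    return positions


# Example: the pattern from the question at offsets 0 and 7.
sample = bytes.fromhex("0056f043d0" "ffff" "0056f043d0")
offsets = find_all(sample, bytes.fromhex("0056f043d0"))
# offsets == [0, 7]
```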
Step 4: how far apart those occurrences are:
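A sketch for this last step, assuming you already have a sorted list of offsets for one pattern. The name gap_histogram is hypothetical. A gap that dominates the result (1024, say) would be a candidate block size.

```python
from collections import Counter


def gap_histogram(positions):
    """Count the distances between consecutive offsets, most common first."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return Counter(gaps).most_common()


# Example: two gaps of 1024 bytes and one stray gap of 52 bytes.
hist = gap_histogram([0, 1024, 2048, 2100])
# hist == [(1024, 2), (52, 1)]
```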