How do I find recurring patterns in a hex dump?

Posted on 2024-10-07 03:15:24

I need to find recurring patterns in hexdump output.
Every line in my output file looks something like:

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

where each 00 is one byte in hexadecimal.

The patterns aren't of fixed length, but each one always lies within a single line.

I have an idea of how to do this, but I'd like to know what you think the most efficient method would be, for example whether there is some known algorithm I am unaware of.

Also I'd like to code this in Python.

Any suggestion is greatly appreciated :)

Thanks

EDIT:
I need to find partition boot sectors in a disk dump. The problem is that the filesystem is uncommon, so I need to scan the hexdump for frequently used patterns in order to narrow down the area to search.

For example, I am looking for byte patterns like:

00 56 f0 43 d0

躲猫猫 2024-10-14 03:15:24

It is not apparent whether you already know the substrings that you want to search for, or whether you need to discover a set of query substrings first. I think that discovery can be achieved by finding frequently occurring n-grams. Once you have your set of query substrings, you can proceed to finding where they occur and how far apart they are (e.g. if some substring occurs every 1024 bytes, that may be a block size).

Step 1: read your hexdump file and convert it back to a single string. I'll leave the details up to you.
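One possible way to do Step 1 in Python 3 is sketched below. It assumes every line holds only space-separated hex bytes, as in the question (no offset column, no ASCII gutter); `hexdump_to_bytes` and the sample lines are made up for illustration.

```python
def hexdump_to_bytes(lines):
    """Join hexdump lines such as '00 56 f0 43 d0' into one bytes object."""
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # bytes.fromhex accepts pairs of hex digits separated by spaces
        chunks.append(bytes.fromhex(line))
    return b"".join(chunks)


# Illustrative input only; a real dump would be read from a file.
sample = ["00 56 f0 43 d0 00 00 00", "", "ff 00 56 f0 43 d0 01 02"]
data = hexdump_to_bytes(sample)
print(len(data))
```

Since an open text file iterates line by line, passing the file object itself as `lines` should work too. A `bytes` object supports slicing and `find` just like a string, so it can be fed to the functions in the following steps.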

Step 2: for each interesting value of n (say 3, 4, 5 (as in your example), 6, etc.), use this function:

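from collections import Counter  # Counter exists since Python 2.7; the code below is written for Python 3
from operator import itemgetter

def get_ngrams(strg, n, top=10, min_count=2):
    # Count every substring of length n, sliding one position at a time.
    counter = Counter()
    for i in range(len(strg) - n + 1):
        gram = strg[i:i+n]
        counter[gram] += 1
    # Keep only grams seen at least min_count times, most frequent first.
    sort_these = [(gram, count) for gram, count in counter.items() if count >= min_count]
    best = sorted(sort_these, key=itemgetter(1), reverse=True)[:top]
    return best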

That will give you the most frequently occurring substrings.
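The question says each pattern always lies within a single line, whereas counting over one joined string also counts n-grams that straddle two lines. If that matters, a per-line variant might look like the sketch below; `ngrams_per_line` and the three sample rows are hypothetical, and each row is assumed to be one already-decoded hexdump line.

```python
from collections import Counter


def ngrams_per_line(rows, n):
    """Count n-byte substrings without letting any of them span two rows."""
    counter = Counter()
    for row in rows:
        # Rows shorter than n yield an empty range and contribute nothing.
        for i in range(len(row) - n + 1):
            counter[row[i:i + n]] += 1
    return counter


# Illustrative rows only: the 5-byte pattern from the question appears twice.
rows = [
    bytes.fromhex("00 56 f0 43 d0 11 22 33"),
    bytes.fromhex("aa bb 00 56 f0 43 d0 cc"),
    bytes.fromhex("01 02 03 04 05 06 07 08"),
]
counts = ngrams_per_line(rows, 5)
for gram, count in counts.most_common(3):
    print(gram.hex(), count)
```

The trade-off is that offsets are then relative to each row, so the position search in the next step would still need the joined data.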

Step 3: where those strings occur:

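def multifind(strg, gram):
    # Collect the start offset of every occurrence of gram in strg.
    positions = []
    end = len(strg)
    pos = 0
    while pos < end:
        pos = strg.find(gram, pos)
        if pos == -1:
            break  # no further occurrence
        positions.append(pos)
        pos += 1  # advance by one so overlapping matches are found too
    return positions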

Step 4: how far apart those occurrences are:

deltas = [b - a for a, b in zip(positions, positions[1:])]
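To see how Steps 3 and 4 can hint at a block size, here is a small self-contained sketch on synthetic data. The helper names `find_all` and `common_spacings` and the 1024-byte layout are assumptions chosen for illustration, not something taken from a real dump.

```python
from collections import Counter


def find_all(data, pattern):
    """Return every offset at which pattern occurs in data (overlaps included)."""
    positions = []
    pos = data.find(pattern)
    while pos != -1:
        positions.append(pos)
        pos = data.find(pattern, pos + 1)
    return positions


def common_spacings(positions, top=3):
    """Return the most frequent gaps between consecutive positions."""
    deltas = [b - a for a, b in zip(positions, positions[1:])]
    return Counter(deltas).most_common(top)


# Synthetic data: a 5-byte marker at the start of each 1024-byte block,
# padded with 0xff so the marker cannot appear anywhere else.
marker = bytes.fromhex("00 56 f0 43 d0")
block = marker + b"\xff" * (1024 - len(marker))
data = block * 4

positions = find_all(data, marker)
print(positions)
print(common_spacings(positions))
```

On real data the gaps will be noisier, so a spacing that clearly dominates the tally is only a candidate block size to check by hand.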

About the author: 思慕
