How do you reverse engineer an algorithm?

Posted 2024-10-02 09:01:27 · 392 words · 3 views · 0 comments

I'm wondering how one goes about reversing an algorithm, such as one used for storing logins or PIN codes.

Let's say I have a set of data where:

7262627 -> ? -> 8172

5353773 -> ? -> 1132

etc. This is just an example. Or say a hex string that is transformed into another:

&h8712 -> &h1283 or something like that.

How do I go about starting to figure out what that algorithm is? Where does one start?

Would you start trying different shifts, xors and hope something stands out? I'm sure there's a better way as this seems like stabbing in the dark.

Is it even practically possible to reverse engineer this kind of algorithm?

Sorry if this is a stupid question. Thanks for your help / pointers.

请恋爱 2024-10-09 09:01:28

There are a few things people try:

Get the source code, or disassemble an executable.
Guess, based on the hash functions other people use. For example, a hash consisting of 32 hex digits might well be one or more repetitions of MD5, and if you can get a single input/output pair then it is quite easy to confirm or refute this (although see "salt", below).
Statistically analyze a large number of pairs of inputs and outputs, looking for any kind of pattern or correlations, and relate those correlations to properties of known hash functions and/or possible operations that the designer of the system might have used. This is beyond the scope of a single technique, and into the realms of general cryptanalysis.
Ask the author. Secure systems don't usually rely on the secrecy of the hash algorithms they use (and don't usually stay secure for long if they do). The examples you give are quite small, though, and secure hashing of passwords would always involve a salt, which yours apparently doesn't. So we might not be talking about the kind of system whose author is confident enough to tell you.
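
The "guess a well-known hash" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the candidate list and the helper name identify_hash are made up for the example, and repeated hashing is assumed to feed the raw digest back in (a real system might re-hash the hex string, or add a salt, in which case nothing will match).

```python
import hashlib

# Candidate algorithms to try. This list is an illustrative assumption,
# not an exhaustive catalogue.
CANDIDATES = ["md5", "sha1", "sha256"]

def identify_hash(plaintext: bytes, observed_hex: str, max_rounds: int = 3):
    """Return (algorithm, rounds) pairs that reproduce observed_hex.

    Assumes unsalted hashing where each extra round hashes the raw digest
    of the previous round.
    """
    observed = observed_hex.lower()
    matches = []
    for name in CANDIDATES:
        data = plaintext
        for rounds in range(1, max_rounds + 1):
            data = hashlib.new(name, data).digest()
            if data.hex() == observed:
                matches.append((name, rounds))
    return matches
```

An empty result does not prove the function is exotic; it may simply mean a salt or a different encoding is involved.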

In the case of a hash where the output is only 4 decimal digits, you can attack it simply by building a table of every possible 7-digit input together with its hashed value. You can then reverse the table and you have your (one-to-many) de-hashing operation. You never need to know how the hash is actually calculated. How do you get the input/output pairs? Well, if an outsider can somehow specify a value to be hashed and see the result, then you have what's called a "chosen plaintext", and an attack relying on that is a "chosen plaintext attack". So a 7-digit -> 4-digit hash would be very weak indeed if it were used in a way that allowed chosen plaintext attacks to generate a lot of input/output pairs. I realise that's just one example, but it's also just one example of a technique to reverse it.
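
A rough sketch of the table attack, in Python. The function toy_hash is a hypothetical stand-in for the unknown algorithm; in a real chosen plaintext attack you would call the system under test instead. The n_digits parameter is also an assumption added so the idea can be tried on a small input space first.

```python
from collections import defaultdict

def toy_hash(n: int) -> int:
    """Hypothetical stand-in for the unknown 7-digit -> 4-digit function."""
    return (n * 913 + 7) % 10_000

def build_reverse_table(hash_fn, n_digits: int = 7):
    """Map every observed output to the list of inputs that produce it."""
    table = defaultdict(list)
    for value in range(10 ** n_digits):
        table[hash_fn(value)].append(value)
    return table
```

With 10 million inputs and at most 10,000 outputs, each output maps to about 1,000 candidate inputs on average, which is why the reversal is one-to-many.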

Note that reverse engineering the hash and actually reversing it are two different things. You could figure out that I'm using SHA-256, but that wouldn't help you reverse it (i.e., given an output, work out the input value). Nobody knows how to fully reverse SHA-256, although of course there are always rainbow tables (see "salt", above). <conspiracy>At least nobody admits they do, so it's no use to you or me.</conspiracy>

淡写薰衣草的香 2024-10-09 09:01:28

Probably, you can't. Suppose the transformation function is known, something like

function hash(text):
    return sha1("secret salt"+text)

But the "secret salt" is not known, and is cryptographically strong (a very large, random integer). You could never brute force the secret salt, even from a very large number of plaintext/ciphertext pairs.

In fact, if the precise hash function used was known to be one of two equally strong functions, you could not even make a good guess as to which one was being used.
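
For concreteness, the secret-salt construction could look like this in Python. It is a minimal sketch, not a recommendation: the 32-byte salt length, the UTF-8 encoding, and the name salted_hash are assumptions made for the example, and SHA-1 is used only because the pseudocode above uses it.

```python
import hashlib
import secrets

# 256 random bits, assumed to be generated once and kept private.
SECRET_SALT = secrets.token_bytes(32)

def salted_hash(text: str, salt: bytes = SECRET_SALT) -> str:
    """Return the hex SHA-1 of the salt followed by the UTF-8 encoded text."""
    return hashlib.sha1(salt + text.encode("utf-8")).hexdigest()
```

An observer who sees only (text, digest) pairs would have to search a space of 2^256 salts to reproduce a single digest, which is the point the answer is making.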

得不到的就毁灭 2024-10-09 09:01:28

Stabbing in the dark will drive you to insanity. There are some algorithms that, given current understanding, you couldn't hope to deduce the inner workings of between now and the [predicted] end of the universe without knowing the exact details (potentially including private keys or internal state). Of course, some of these algorithms are the foundations of modern cryptography.

If you know in advance that there's a pattern to be discovered though, there are sometimes ways of approaching this. For instance, if the dataset contains several input values that differ by 1, compare the corresponding output values:

7262627 -> 8172
7262628 -> 819
7262629 -> 1732
...
7262631 -> 3558

Here it's fairly clear (given a few minutes and a calculator) that when the input increases by 1, the output increases by 913 modulo 8266 (i.e. a simple linear congruential generator). For example, 8172 + 913 = 9085, and 9085 - 8266 = 819.
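
The calculator work can be automated. The sketch below assumes the hidden function really is "add a constant step, wrap at a modulus"; the helper names are made up for the example. Differences between outputs of adjacent inputs are all congruent modulo the unknown modulus, so the gcd of their pairwise gaps is a multiple of it (and, with enough data, usually the modulus itself).

```python
from math import gcd

OBSERVED = [(7262627, 8172), (7262628, 819), (7262629, 1732), (7262631, 3558)]

def fit_step_and_modulus(pairs):
    """Estimate (step, modulus) from outputs of inputs that differ by 1."""
    pairs = sorted(pairs)
    diffs = [o2 - o1 for (i1, o1), (i2, o2) in zip(pairs, pairs[1:])
             if i2 - i1 == 1]
    if len(diffs) < 2:
        raise ValueError("need at least two pairs of adjacent inputs")
    modulus = 0
    for d in diffs[1:]:
        # Congruent differences differ by a multiple of the modulus.
        modulus = gcd(modulus, abs(d - diffs[0]))
    if modulus == 0:
        raise ValueError("no wrap-around observed; modulus cannot be estimated")
    return diffs[0] % modulus, modulus

def predict(pairs, step, modulus, x):
    """Predict the output for input x from the smallest known pair."""
    x0, y0 = min(pairs)
    return (y0 + step * (x - x0)) % modulus
```

On the four pairs above this recovers a step of 913 and a modulus of 8266, and the fitted model reproduces the held-back pair 7262631 -> 3558.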

Differential cryptanalysis, a relatively modern technique for analysing the strength of block ciphers, relies on a similar but more complex idea. It applies where the cipher algorithm is known but the secret key is assumed not to be. Pairs of input blocks that differ in a chosen way (in the simplest case, by a single bit) are considered, and the effect of that difference is traced through the cipher to deduce how likely each output bit is to "flip" as a result.
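
A much simpler relative of this idea can be tried on any black box: flip one input bit at a time and count how many output bits change. The Python sketch below is an avalanche measurement, not differential cryptanalysis proper, and the function names and the 32-bit input width are assumptions for the example.

```python
import hashlib

def sha256_int(x: int) -> int:
    """SHA-256 of a 4-byte big-endian integer, returned as an integer."""
    return int.from_bytes(hashlib.sha256(x.to_bytes(4, "big")).digest(), "big")

def average_flips(fn, in_bits: int = 32, samples: int = 200) -> float:
    """Average number of output bits that change when one input bit flips."""
    total = 0
    count = 0
    for x in range(samples):
        base = fn(x)
        for bit in range(in_bits):
            diff = base ^ fn(x ^ (1 << bit))
            total += bin(diff).count("1")
            count += 1
    return total / count
```

A strong 256-bit hash should average close to 128 flipped bits, while a function that averages only a handful is leaking structure worth investigating.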

Other ways of approaching this kind of problem would be to look at the extremes (maximum, minimum values), distribution (leading to frequency analysis), direction (do the numbers always increase? decrease?) and (if this is allowed) consider the context in which the data sets were found. For instance, some types of PIN codes always contain a repeated digit to make them easier to remember (I'm not saying a PIN code can necessarily be deduced from anything else - just that a repeated digit is one less digit to worry about!).
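
A first pass over the extremes, distribution and direction takes only a few lines. This is a sketch with made-up names; it assumes the outputs are non-negative integers listed in input order.

```python
from collections import Counter

def summarise(outputs):
    """Collect basic statistics that may hint at structure in the outputs."""
    steps = list(zip(outputs, outputs[1:]))
    return {
        "min": min(outputs),
        "max": max(outputs),
        # How often each decimal digit appears across all outputs.
        "digit_counts": Counter("".join(str(o) for o in outputs)),
        "always_increasing": all(a < b for a, b in steps),
        "always_decreasing": all(a > b for a, b in steps),
    }
```

A maximum that never exceeds some bound hints at a modulus, and a heavily skewed digit count hints that the outputs are not uniformly distributed.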
