Brute-forcing passwords using multiple threads
I'm working on my 10th grade science fair project right now and I've kind of hit a wall. My project tests the effect of parallelism on the efficiency of brute-forcing MD5 password hashes. I'll measure the number of password combinations tested per second using 1, 4, 16, 32, 64, 128, 512, and 1024 threads. I'm not sure whether I'll do a dictionary attack or pure brute force. I figure that a dictionary would be easier to parallelize: just split the list into equal parts, one per thread. I haven't written much code yet; I'm just trying to plan it out before I start coding.
My questions are:
Is the number of password combinations tested per second the best way to measure performance as a function of the number of threads?
Dictionary or pure brute force? If pure brute force, how would you split the task across a variable number of threads?
Any other suggestions?
Comments (4)
I'm not trying to dampen your enthusiasm, but this is already quite a well-understood problem. I'll try to explain what to expect below. But maybe it would be better to do your project in another area. How about "Maximising MD5 hashing throughput"? Then you wouldn't be restricted to just looking at threading.
I think that when you write up your project, you'll need to offer some kind of analysis as to when parallel processing is appropriate and when it isn't.
Each time your CPU switches to another thread, it has to save the current thread's context and load the new thread's context. This overhead does not occur in a single-threaded process (except for managed services like garbage collection). So, all else being equal, adding threads on a single CPU won't improve performance, because the CPU must do the original workload plus all of the context switching.
But if you have multiple CPUs (cores) at your disposal, creating one thread per CPU will mean that you can parallelize your calculations without incurring context switching costs. If you have more threads than CPUs then context switching will become an issue.
There are two classes of computation: IO-bound and compute-bound. An IO-bound computation can spend a large amount of time waiting for a response from some hardware, such as a network card or a hard disk. Because the CPU sits idle during those waits, you can increase the number of threads until the CPU is fully used again, which can offset the cost of context switching. However, there is a limit to the number of threads, beyond which context switching takes up more time than the threads spend blocked on IO.
Compute-bound computations simply require CPU time for number crunching. This is the kind of computation used by a password cracker. Compute-bound operations do not get blocked, so adding more threads than CPUs will slow down your overall throughput.
The C# ThreadPool already handles much of this for you: you just add tasks, and it queues them until a thread is available. Broadly speaking, it adds new threads only when existing ones are blocked, which keeps context switches to a minimum.
I have a quad-core machine - breaking the problem into 4 threads, each executing on its own core, will be more or less as fast as my machine can brute force passwords.
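As a rough illustration of how passwords per second could be measured for different thread counts, here is a minimal Python sketch. The function names `hash_batch` and `hashes_per_second` and the generated word list are assumptions made up for this example. Note that in standard CPython the global interpreter lock prevents threads from running Python code in parallel, so this shows the measurement method only; to observe a real multi-core speedup you would need separate processes or a language whose threads run in parallel.

```python
import hashlib
import os
import time
from concurrent.futures import ThreadPoolExecutor


def hash_batch(candidates):
    """Hash every candidate with MD5 and return how many were hashed."""
    for word in candidates:
        hashlib.md5(word.encode("utf-8")).hexdigest()
    return len(candidates)


def hashes_per_second(candidates, n_threads):
    """Split the candidates into n_threads slices and time hashing all of them."""
    # Slice i takes every n_threads-th word, so the slices are near-equal in size.
    slices = [candidates[i::n_threads] for i in range(n_threads)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        tested = sum(pool.map(hash_batch, slices))
    elapsed = time.perf_counter() - start
    return tested / elapsed


if __name__ == "__main__":
    # Hypothetical workload: 200,000 made-up candidate passwords.
    words = [f"pw{i}" for i in range(200_000)]
    for n in (1, 2, os.cpu_count() or 4, 64):
        print(n, "threads:", round(hashes_per_second(words, n)), "hashes/s")
```

Running each thread count several times and averaging would make the comparison less sensitive to whatever else the machine is doing.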
To seriously parallelize this problem, you're going to need a lot of CPUs. I've read about using the GPU of a graphics card to attack this problem.
There's an analysis of attack vectors that I wrote up here, if it's any use to you. Rainbow tables and the processor/memory trade-offs would be another interesting area to do a project in.
To answer your questions:
1) There is no single best way to test thread performance. Different problems scale differently with threads, depending on how independent the operations in the target problem are. So you can try the dictionary approach. But when you analyse the results, bear in mind that they might not apply to all problems. One very popular example, however, is a shared counter that each thread increments a fixed number of times.
2) Brute force covers a large number of cases. In fact, without a limit on length, the number of possible passwords is unbounded. So you will have to constrain the search, for example with a maximum password length. One way to distribute brute force is to assign each thread a different starting character for the password. The thread then tests all possible passwords with that starting character. Once the thread finishes its work, it gets another starting character, until all possible starting symbols have been used.
3) One suggestion I would give you is to test with a smaller number of threads. You are going up to 1024 threads. That is not a good idea. The number of cores on a machine is generally 4 to 10, so try not to exceed the number of cores by a huge margin, because a core cannot run multiple threads at the same time: it's one thread per core at any given moment. Instead, try to measure performance for different schemes of assigning the problem to threads.
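The starting-character scheme from point 2 could look roughly like the following Python sketch. The lowercase-only alphabet, the maximum length of 3, and the names `candidates_starting_with`, `worker`, and `crack` are all assumptions chosen for illustration, not a recommended configuration.

```python
import hashlib
import itertools
import queue
import string
import threading

ALPHABET = string.ascii_lowercase  # assumed character set
MAX_LEN = 3                        # assumed maximum password length


def candidates_starting_with(first):
    """Yield every password of length 1..MAX_LEN that starts with `first`."""
    for rest_len in range(MAX_LEN):
        for rest in itertools.product(ALPHABET, repeat=rest_len):
            yield first + "".join(rest)


def worker(starts, target_hash, found):
    """Take starting characters off the queue until none remain."""
    while not found:  # checked between characters: stop once a match exists
        try:
            first = starts.get_nowait()
        except queue.Empty:
            return
        for candidate in candidates_starting_with(first):
            if hashlib.md5(candidate.encode()).hexdigest() == target_hash:
                found.append(candidate)
                return


def crack(target_hash, n_threads=4):
    """Return the matching password, or None if it is not in the search space."""
    starts = queue.Queue()
    for ch in ALPHABET:
        starts.put(ch)
    found = []  # shared result list; holds the match once a thread finds it
    threads = [threading.Thread(target=worker, args=(starts, target_hash, found))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return found[0] if found else None


if __name__ == "__main__":
    print(crack(hashlib.md5(b"dog").hexdigest()))
```

Because each thread pulls a new starting character only when it finishes the previous one, a thread that happens to run faster simply processes more characters.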
Let me know if this helps!
One solution that works for both a dictionary attack and a brute-force search of all possible passwords is an approach based on dividing the job into work units. Have a shared object responsible for dividing the problem space into units of work (ideally something like 100 ms to 5 seconds' worth of work each) and give a reference to this object to each thread you start. Each thread then runs a loop: request the next work unit from the shared object, test every password in it, and repeat until no units remain.
The advantage of this over just parcelling up the whole workspace into one chunk per thread up-front is that if one thread works faster than others, it won't run out of work and just sit idle - it'll pick up more chunks.
Ideally your work item generator would have an interface that, when called, returns an iterator, which itself returns individual passwords to test. The dictionary-based one, then, selects a range from the dictionary, while the brute force one selects a prefix to test for each batch. You'll need to use synchronization primitives to stop races between different threads trying to grab work units, of course.
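A minimal Python sketch of this work-unit design, for the dictionary case, might look like the following. The class name `DictionaryWorkSource`, the method `next_unit`, and the unit size are hypothetical choices for this example; a brute-force variant would hand out prefixes instead of word-list slices.

```python
import hashlib
import threading


class DictionaryWorkSource:
    """Hands out consecutive slices of a word list, one unit per call."""

    def __init__(self, words, unit_size):
        self._words = words
        self._unit_size = unit_size
        self._next = 0
        self._lock = threading.Lock()

    def next_unit(self):
        """Return an iterator over the next unit, or None when work runs out."""
        with self._lock:  # only one thread may advance the cursor at a time
            if self._next >= len(self._words):
                return None
            start = self._next
            self._next += self._unit_size
        return iter(self._words[start:start + self._unit_size])


def worker(source, target_hash, found):
    """Keep fetching units until the work runs out or a match has been found."""
    while not found:
        unit = source.next_unit()
        if unit is None:
            return
        for candidate in unit:
            if hashlib.md5(candidate.encode()).hexdigest() == target_hash:
                found.append(candidate)
                return


def run(words, target_hash, n_threads=4, unit_size=1000):
    """Return the matching word, or None if the dictionary does not contain it."""
    source = DictionaryWorkSource(words, unit_size)
    found = []  # shared result list
    threads = [threading.Thread(target=worker, args=(source, target_hash, found))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return found[0] if found else None
```

The lock protects only the cursor update, so threads spend almost all of their time hashing rather than waiting on each other.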
In both the dictionary and brute force methods, the problem is Embarrassingly Parallel.
To divide the brute-force problem among n threads, just split, say, the first two (or three) letters (the "prefix") into n pieces. Each thread then has a set of assigned prefixes, like "aa - fz", and is responsible only for testing everything that follows its prefixes.
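The prefix split can be sketched in a few lines of Python. The function name `split_prefixes` and the lowercase-only alphabet are assumptions for illustration.

```python
import itertools
import string


def split_prefixes(n_threads, prefix_len=2, alphabet=string.ascii_lowercase):
    """Divide all prefixes of length prefix_len into n_threads contiguous ranges."""
    prefixes = ["".join(p) for p in itertools.product(alphabet, repeat=prefix_len)]
    base, extra = divmod(len(prefixes), n_threads)
    ranges, start = [], 0
    for i in range(n_threads):
        size = base + (1 if i < extra else 0)  # first `extra` threads get one more
        ranges.append(prefixes[start:start + size])
        start += size
    return ranges


if __name__ == "__main__":
    for i, chunk in enumerate(split_prefixes(4)):
        print(f"thread {i}: {chunk[0]} - {chunk[-1]} ({len(chunk)} prefixes)")
```

With two lowercase letters there are 676 prefixes, so four threads get 169 each, and the first thread's range runs from "aa" to "gm". When the prefix count does not divide evenly, the range sizes differ by at most one.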
In practice, a dictionary attack is usually slightly better at cracking more passwords, but brute force, since it covers everything, cannot miss a password within the target length.