Brute-forcing passwords with multiple threads

Posted 2024-12-09 00:32:56

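I'm working on a 10th-grade science fair project, but I've hit a bit of a wall. My project tests how parallelism affects the efficiency of brute-forcing MD5 password hashes. To gauge efficiency, I'll count the password combinations tested per second using 1, 4, 16, 32, 64, 128, 512 and 1024 threads. I'm not sure yet whether I'll do a dictionary attack or pure brute force. I think a dictionary would be easier to parallelize: just split the list into equal parts, one per thread. I haven't written much code yet; I want to plan before I start coding.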
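The plan of splitting the dictionary into equal parts, one per thread, can be sketched in Python (the language the pseudocode later in the thread resembles). This is a minimal illustration; `split_evenly` and `crack_static` are hypothetical names. One caveat: in CPython, `hashlib` only releases the GIL for large inputs (about 2 KB and up), so threads hashing short passwords will not actually run the MD5 work in parallel. The sketch shows the partitioning, not a speed-up.

```python
import hashlib
import threading

def split_evenly(words, n):
    """Split `words` into n contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(words), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + base + (1 if i < extra else 0)
        chunks.append(words[start:end])
        start = end
    return chunks

def crack_static(words, target_hex, n_threads):
    """Give each thread one fixed chunk of the dictionary; return the match or None."""
    found = []
    lock = threading.Lock()

    def worker(chunk):
        for word in chunk:
            if hashlib.md5(word.encode()).hexdigest() == target_hex:
                with lock:
                    found.append(word)
                return

    threads = [threading.Thread(target=worker, args=(chunk,))
               for chunk in split_evenly(words, n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return found[0] if found else None

target = hashlib.md5(b"banana").hexdigest()
print(crack_static(["apple", "banana", "cherry"], target, 2))  # banana
```

With a fixed split like this, a thread whose chunk happens to be slow to process holds up the whole run; the work-unit approach in the last answer avoids that.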

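My questions are:

Is counting the password combinations tested per second the best way to measure performance as a function of thread count?
Dictionary or pure brute force? If pure brute force, how do you split the task across a variable number of threads?
Any other suggestions?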


糖果控 2024-12-16 00:32:56


I'm not trying to dampen your enthusiasm, but this is already a well-understood problem. I'll try to explain what to expect below. Still, it might be better to do your project in another area. How about "Maximising MD5 hashing throughput"? Then you wouldn't be restricted to looking at threading alone.

I think that when you write up your project, you'll need to offer some kind of analysis as to when parallel processing is appropriate and when it isn't.

Each time your CPU switches to another thread, it has to save the current thread's context and load the new thread's context. A single-threaded process doesn't pay this cost for switching among its own threads (apart from managed services such as garbage collection). So, all else being equal, on a single CPU adding threads won't improve performance, because the CPU must do the original workload plus all of the context switching.

But if you have multiple CPUs (cores) at your disposal, creating one thread per core lets you parallelize your calculations without significant context-switching costs. If you have more threads than cores, context switching becomes an issue again.

There are two classes of computation: IO-bound and compute-bound. An IO-bound computation spends much of its time blocked, waiting for a response from hardware such as a network card or a hard disk, while the CPU sits idle. Because of this idle time, you can increase the number of threads until the CPU is busy again, and the gain outweighs the cost of context switching. However, there is a limit: beyond a certain number of threads, context switching takes up more time than the threads spend blocked on IO.
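To see why IO-bound work benefits from extra threads, an IO wait can be simulated with `time.sleep` (an assumption standing in for a real disk or network call). A minimal Python sketch, with hypothetical function names:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def simulated_io(_):
    """Stand-in for an IO wait (assumed): the thread sleeps and uses no CPU."""
    time.sleep(0.1)
    return 1

def run_io_tasks(n_tasks, n_threads):
    """Run n_tasks simulated IO waits on n_threads threads; return (done, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        done = sum(pool.map(simulated_io, range(n_tasks)))
    return done, time.perf_counter() - start

for threads in (1, 8):
    done, secs = run_io_tasks(8, threads)
    print(f"{threads} threads: {done} tasks in {secs:.2f} s")
```

With one thread the eight waits run back to back (roughly 0.8 s); with eight threads they overlap (roughly 0.1 s), because a sleeping thread doesn't occupy the CPU. Password hashing has no such waits.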

Compute-bound computations simply need CPU time for number crunching. A password cracker is exactly this kind of computation. Compute-bound threads don't block, so adding more threads than CPUs won't increase throughput and, because of the extra context switching, will tend to reduce it.
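A minimal way to compute the question's "combinations tested per second" metric for several worker counts, sketched in Python. `hash_range` and `candidates_per_second` are hypothetical names, and the candidates are simply the numbers 0, 1, 2, ... as strings (an assumption; a real run would generate passwords). The timing includes starting the pool, which is part of the real cost of adding workers.

```python
import hashlib
import os
import time
from concurrent.futures import ThreadPoolExecutor

def hash_range(start, count):
    """Hash `count` numeric candidate strings beginning at `start`; return count."""
    for i in range(start, start + count):
        hashlib.md5(str(i).encode()).digest()
    return count

def candidates_per_second(workers, total=200_000, executor_cls=ThreadPoolExecutor):
    """Split `total` candidates evenly over `workers`; return (tested, rate)."""
    base, extra = divmod(total, workers)
    jobs, start = [], 0
    for w in range(workers):
        count = base + (1 if w < extra else 0)
        jobs.append((start, count))
        start += count
    t0 = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        futures = [pool.submit(hash_range, s, c) for s, c in jobs]
        tested = sum(f.result() for f in futures)
    elapsed = time.perf_counter() - t0
    return tested, tested / elapsed

if __name__ == "__main__":
    print("cores:", os.cpu_count())
    for n in (1, 4, 16, 64):
        tested, rate = candidates_per_second(n)
        print(f"{n:>4} workers: {rate:,.0f} candidates/s")
```

In CPython the default `ThreadPoolExecutor` will show little or no scaling, because `hashlib` holds the GIL for short inputs. Passing `executor_cls=ProcessPoolExecutor` (from the same `concurrent.futures` module) runs the work in separate processes and gives real multi-core parallelism; that is why the sweep sits under the `__main__` guard. Either way, you would expect the rate to level off once the worker count passes `os.cpu_count()`.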

The C# (.NET) ThreadPool already takes care of much of this for you: you just add tasks, and it queues them until a thread is available. It creates additional threads mainly when existing ones are blocked, so context switches are kept to a minimum.
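Python's `concurrent.futures.ThreadPoolExecutor` is a rough analogue of the .NET ThreadPool described above, not the same API: it caps the number of threads and queues any extra work until a thread is free. A minimal sketch, with the hypothetical helper `run_on_pool`, that queues 100 small tasks on a 4-thread pool and records how many threads were actually used:

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor

def run_on_pool(tasks, workers=None):
    """Submit every task to a pool with a fixed thread limit (one per core by
    default). Extra tasks wait in the pool's queue. Returns the results and
    how many distinct threads actually ran them."""
    workers = workers or os.cpu_count() or 1
    thread_ids = set()
    lock = threading.Lock()

    def run(task):
        with lock:
            thread_ids.add(threading.get_ident())
        return task()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run, tasks))
    return results, len(thread_ids)

tasks = [lambda i=i: i * i for i in range(100)]
results, used = run_on_pool(tasks, workers=4)
print(results[:5], used)  # [0, 1, 4, 9, 16] followed by a thread count of at most 4
```

Unlike the .NET pool, `ThreadPoolExecutor` never grows past `max_workers`, so sizing it to the core count is up to you.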

I have a quad-core machine: breaking the problem into 4 threads, each executing on its own core, will be about as fast as my machine can brute-force passwords on the CPU.

To seriously parallelize this problem, you're going to need a lot of CPUs. I've read about using the GPU of a graphics card to attack this problem.

There's an analysis of attack vectors that I wrote up here, if it's any use to you. Rainbow tables and the processor/memory trade-offs would be another interesting area for a project.


青柠芒果 2024-12-16 00:32:56

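To answer your questions:
1) There's no single best way to test threading performance. Different problems scale differently with more threads, depending on how independent the individual operations in the problem are. So you can try the dictionary approach, but when you analyse your results, bear in mind that they may not carry over to all problems. That said, a very popular example is a shared counter, where each thread increments the counter a fixed number of times.

2) Brute force covers a huge number of cases; in fact, unless you bound the length, the number of possible passwords is unlimited. So you'll probably need to constrain the search, for example with a maximum password length. One way to distribute brute force is to give each thread a different first character. That thread then tests every possible password starting with that character. When a thread finishes its work, it takes another starting character, until all possible starting symbols have been used.

3) One suggestion: test with a smaller number of threads. You're planning to go up to 1024 threads, which isn't a good idea. A typical machine has somewhere between 4 and 10 cores, so try not to exceed the core count by much: at any given moment, a core runs only one thread (or a small fixed number with simultaneous multithreading). Instead, try measuring the performance of different schemes for dividing the problem among threads.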
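Point 2's scheme, bounded by a maximum length, can be sketched in Python as below. The alphabet (lowercase letters), `max_len=4` and the function name are assumptions for illustration. A shared iterator of first characters, guarded by a lock, means a thread that finishes early simply claims the next character.

```python
import hashlib
import itertools
import string
import threading

def crack_by_first_char(target_hex, n_threads,
                        alphabet=string.ascii_lowercase, max_len=4):
    """Each thread repeatedly claims an unused first character and tests every
    password of length 1..max_len that starts with it."""
    first_chars = iter(alphabet)   # shared: only advanced under the lock
    lock = threading.Lock()
    stop = threading.Event()       # set once any thread finds the password
    found = []

    def claim_first_char():
        with lock:
            return next(first_chars, None)

    def worker():
        while not stop.is_set():
            first = claim_first_char()
            if first is None:
                return             # every starting character has been handed out
            for suffix_len in range(max_len):
                for tail in itertools.product(alphabet, repeat=suffix_len):
                    if stop.is_set():
                        return
                    candidate = first + "".join(tail)
                    if hashlib.md5(candidate.encode()).hexdigest() == target_hex:
                        with lock:
                            found.append(candidate)
                        stop.set()
                        return

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return found[0] if found else None

target = hashlib.md5(b"cab").hexdigest()
print(crack_by_first_char(target, n_threads=4))  # cab
```

Note that with 26 first characters the work units are uneven in count relative to the threads, which is why claiming them one at a time works better than assigning them all up front.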

Let me know if this helps!
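The shared-counter experiment mentioned in point 1 can be set up in a few lines. A minimal Python sketch (the function name and parameters are hypothetical); because every increment needs the lock, adding threads mostly adds contention, which makes it a useful contrast with largely independent password checks.

```python
import threading

def shared_counter(n_threads, increments, use_lock=True):
    """Each thread adds 1 to one shared counter `increments` times."""
    count = 0
    lock = threading.Lock()

    def worker():
        nonlocal count
        for _ in range(increments):
            if use_lock:
                with lock:
                    count += 1
            else:
                count += 1   # unsynchronised read-modify-write: updates may be lost

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return count

print(shared_counter(4, 100_000))  # 400000
```

With `use_lock=False` the total may come out below `n_threads * increments`; how often that happens depends on the Python version and build.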


秋千易 2024-12-16 00:32:56


One solution that will work for both a dictionary and a brute force of all possible passwords is an approach based on dividing the job into work units. Have a shared object responsible for dividing the problem space into units of work (ideally, something like 100 ms to 5 seconds' worth of work each), and give a reference to this object to each thread you start. Each thread then operates in a loop like this:

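```python
for work_block in work_block_generator.get():  # shared, thread-safe source of work units
    for item in work_block:                     # each work unit yields candidate passwords
        test_candidate(item)                    # hypothetical helper: hash item, compare to target
```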

The advantage of this over just parcelling up the whole workspace into one chunk per thread up-front is that if one thread works faster than others, it won't run out of work and just sit idle - it'll pick up more chunks.
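The load-balancing advantage can be observed directly. In this minimal Python sketch, `queue.Queue` (which is thread-safe) plays the role of the shared work source, and a per-block `time.sleep` is an assumption used to make worker 0 artificially slow; the function name is hypothetical.

```python
import queue
import threading
import time

def blocks_taken(n_blocks, delays):
    """Workers pull blocks from one shared queue until it is empty.
    delays[i] is an artificial per-block cost for worker i.
    Returns how many blocks each worker processed."""
    work = queue.Queue()
    for block_id in range(n_blocks):
        work.put(block_id)
    taken = [0] * len(delays)

    def worker(i):
        while True:
            try:
                work.get_nowait()
            except queue.Empty:
                return                  # queue drained: nothing left to do
            time.sleep(delays[i])       # stand-in for hashing one block of candidates
            taken[i] += 1               # each worker updates only its own slot

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(delays))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return taken

print(blocks_taken(40, [0.05, 0.0, 0.0]))  # slow worker 0 ends up with only a few blocks
```

Had the 40 blocks been split up front as 14/13/13, the whole run would have waited for worker 0 to finish all of its share.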

Ideally your work-unit generator would have an interface that, when called, hands out the next work unit as an iterator that yields the individual passwords to test. The dictionary-based generator then selects a range from the dictionary for each batch, while the brute-force one selects a prefix. Of course, you'll need synchronization primitives (such as a lock) to stop races between threads trying to grab work units.
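Putting the pieces of this answer together gives a minimal Python sketch under stated assumptions: `WorkBlockGenerator`, `dictionary_blocks`, `brute_force_blocks` and `crack` are hypothetical names, the brute-force generator covers a single fixed password length (loop over lengths for a full search), and the lock is the synchronization primitive the answer mentions. The workers keep going after a match so every hit is reported; a real cracker would stop early.

```python
import hashlib
import itertools
import threading

_DONE = object()

class WorkBlockGenerator:
    """Shared source of work blocks. The lock ensures two threads never grab the
    same block (and never advance the underlying generator concurrently)."""

    def __init__(self, blocks):
        self._blocks = iter(blocks)
        self._lock = threading.Lock()

    def get(self):
        """Yield blocks until none are left; each block iterates over passwords."""
        while True:
            with self._lock:
                block = next(self._blocks, _DONE)
            if block is _DONE:
                return
            yield block

def dictionary_blocks(words, block_size):
    """One block per contiguous range of the word list."""
    for start in range(0, len(words), block_size):
        yield iter(words[start:start + block_size])

def _with_prefix(prefix, alphabet, suffix_len):
    for tail in itertools.product(alphabet, repeat=suffix_len):
        yield prefix + "".join(tail)

def brute_force_blocks(alphabet, prefix_len, suffix_len):
    """One block per prefix; covers every password of length prefix_len + suffix_len."""
    for prefix in itertools.product(alphabet, repeat=prefix_len):
        yield _with_prefix("".join(prefix), alphabet, suffix_len)

def crack(work_block_generator, target_hex, n_threads):
    found, lock = [], threading.Lock()

    def worker():
        for work_block in work_block_generator.get():   # the loop from the answer
            for item in work_block:
                if hashlib.md5(item.encode()).hexdigest() == target_hex:
                    with lock:
                        found.append(item)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return found

words = ["red", "green", "blue", "pass", "word"]
print(crack(WorkBlockGenerator(dictionary_blocks(words, 2)),
            hashlib.md5(b"pass").hexdigest(), 3))        # ['pass']
print(crack(WorkBlockGenerator(brute_force_blocks("ab", 1, 2)),
            hashlib.md5(b"bab").hexdigest(), 2))         # ['bab']
```

Both kinds of work unit share the same consumer loop, which is the point of hiding them behind one generator interface.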
