Using multicore in R on a Pentium 4 HT machine

Posted 2024-09-15 12:55:05


I am using a Pentium 4 HT machine at the office to run R. Some of the code requires the plyr package, and I usually have to wait 6-7 minutes for the script to finish, while I can see my processor is only half utilized.

I have heard that the multicore package in R makes better use of a multicore processor. Is it suitable for my case?

Thanks!

2 Answers

℉服软 2024-09-22 12:55:05


There are a bunch of packages out there for multicore work. See doMPI, doSNOW, doMC and doSMP. They are all front ends for other frameworks that do the actual parallelization (MPI/OpenMPI, the multicore package, ...). On Windows I've had good experience with doSMP, while on Linux doMC looks promising (some Windows support is emerging, but some people have doubts about its emulation of "fork").

That being said, I concur with Vince's comment that the plyr functions would need to be rewritten to exploit parallel computing. You could write your own function that emulates plyr (or edit plyr) using %dopar% (see the foreach package as well).
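For instance, here is a minimal sketch of that approach (the doSNOW backend and the slow_task() function are illustrative choices on my part, not taken from the thread):

    # Emulating plyr's llply() with foreach + %dopar%.
    # doSNOW is one backend option; doMC (Linux) and doSMP (Windows)
    # plug in the same way via their own register functions.
    library(foreach)
    library(doSNOW)

    cl <- makeCluster(2)          # one worker per (logical) core
    registerDoSNOW(cl)

    slow_task <- function(x) {    # placeholder for the real per-element work
        Sys.sleep(0.1)
        x^2
    }

    inputs <- as.list(1:20)
    # serial plyr equivalent: llply(inputs, slow_task)
    results <- foreach(x = inputs) %dopar% slow_task(x)

    stopCluster(cl)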

Two "CPU usage history" windows could mean two cores or multi-threading. For instance, I have an i7-920 processor with 4 cores, but I see 8 history windows, because each core is multi-threaded.

Excuse my vocabulary and/or logic, but I would be that fish in Vince's post when it comes to these sorts of things.


心作怪 2024-09-22 12:55:05


This may sound like a silly question, but does your processor have more than one core? It was my understanding that P4s didn't, but I have about as much knowledge of hardware as a fish does of astrophysics.

When you say your "processor is only half utilized", do you mean that you are monitoring two cores and only one is being used, or that a single core is half used? If it's the latter, your application is probably memory-bound (and probably hitting swap space), not CPU-bound, so parallelization won't help.

Also, it doesn't look like the plyr package uses the multicore package, so you would have to explicitly rewrite parts of plyr to get parallelization. But, if parts of plyr were embarrassingly parallel, I bet they'd already be parallelized.

So I don't think your problem is CPU-bound; I think it's memory-bound (and hitting swap). Monitor your memory, and maybe move the job to a machine with more memory.
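A quick way to watch this from inside R (my_data is a hypothetical placeholder for whatever object plyr is chewing on):

    gc()                                          # run a GC and report memory in use
    format(object.size(my_data), units = "MB")    # approximate size of one object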

Hope this helps!

Edit:

"@Vince As I wrote on romunov's answer: an HT core will execute 2 processes faster than 1 (yet slower than 2 cores), so it is worth making it parallel. Also, even a memory-bound process will take 100% of a core." (my emphasis)

Worth making parallel? There's much more that goes into that equation. Countless times when exploring Python's multiprocessing and threading modules I've rewritten entire programs - even "easily parallelizable" ones - and they've run slower. Why? There are fixed costs to opening new threads, processes, shuffling data around to different processes, etc. It's just not that simple; parallelization, in my experience, has never been the magic bullet it's being talked about here. I think these answers are misleading.
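To make those fixed costs concrete, here's a toy R sketch (my own illustration, not from the thread) where shipping trivial per-element work to workers makes the parallel version slower than the serial one:

    library(parallel)
    cl <- makeCluster(2)
    x <- as.list(1:1e4)
    system.time(lapply(x, sqrt))          # serial: nearly instant
    system.time(parLapply(cl, x, sqrt))   # parallel: pays startup/communication cost
    stopCluster(cl)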

First off, we're talking about parallelizing a task that takes "6-7 minutes". Unless the OP knows his/her data is going to grow a lot, parallelization isn't even worth the wall clock time it takes to program. In the time it takes to implement the parallel version, he/she could perhaps have done 100 non-parallel runs. In my work environment, that wall clock time matters. These costs need to be factored into the runtime equation (unless you're doing it for learning/fun).

Second, if it is hitting swap space, the largest slowdown isn't the CPU, it's disk I/O. Even if there were an easy way to shuffle the plyr code around to get some parts parallelized (which I doubt), doing so on an I/O-bound process would speed things up only trivially compared to adding more memory.

As an example, I once ran a command from the reshape package that demonstrated this exact behavior. It was on a multicore OS X machine with 4GB of memory, and within seconds it was crawling (well, my whole computer was crawling!) at 60-70% CPU across two cores, with all 4GB of memory used. I let it run as an experiment for an hour, then killed R and saw my memory jump back to 3GB free. I moved the job to a 512GB RAM server (yes, we are lucky enough to have one), and it finished in 7 minutes. Nothing about core usage changed.
