Objective-C: calling and copying the same block from multiple threads

Posted 2024-10-27 13:05:43

I'm dealing with neural networks here, but it's safe to ignore that, as the real question has to do with blocks in Objective-C. Here is my issue: I found a way to convert a neural network into one big block that can be executed all at once. However, it runs really, really slowly relative to activating the network. This seems a bit counterintuitive.

If I gave you a group of nested functions like

CGFloat answer = sin(cos(gaussian(1.5*x + 2.5*y)) + (.3*d + bias));
// or in block notation
^(CGFloat x, CGFloat y, CGFloat d, CGFloat bias) {
    return sin(cos(gaussian(1.5*x + 2.5*y)) + (.3*d + bias));
};

In theory, running that function multiple times should be easier and quicker than looping through a bunch of connections, setting nodes active/inactive, and so on, all of which essentially calculates this same function in the end.

However, when I create a block (see thread: how to create function at runtime) and run this code, it is slow as all hell for any moderately sized network.

Now, what I don't quite understand is:

When you copy a block, what exactly are you copying?
Let's say, I copy a block twice, copy1 and copy2. If I call copy1 and copy2 on the same thread, is the same function called? I don't understand exactly what the docs mean for block copies: Apple Block Docs
Now if I make that copy again, copy1 and copy2, but instead, I call the copies on separate threads, now how do the functions behave? Will this cause some sort of slowdown, as each thread attempts to access the same block?


×纯※雪 2024-11-03 13:05:44

> When you copy a block, what exactly are you copying?

You are copying any state the block has captured. If that block captures no state -- which that block appears not to -- then the copy should be "free" in that the block will be a constant (similar to how @"" works).
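As a minimal sketch of that "free" copy, the snippet below assumes Apple's blocks runtime, where a block literal that captures nothing is emitted as a global block and copying it hands back the same pointer. The helper name `CopyOfNonCapturingBlockIsSameObject` is made up for illustration.

```objc
#import <Foundation/Foundation.h>

typedef double (^UnaryBlock)(double);

// Hypothetical helper: returns YES when copying a non-capturing block
// yields the very same object instead of a newly allocated one.
static BOOL CopyOfNonCapturingBlockIsSameObject(void) {
    // Captures nothing, so the compiler can emit it as a global block.
    UnaryBlock square = ^double(double x) { return x * x; };

    UnaryBlock copy1 = [square copy];
    UnaryBlock copy2 = [square copy];

    // Pointer comparison: both "copies" should be the original block.
    return copy1 == square && copy2 == square;
}
```

A block that does capture state behaves differently: the first copy of a stack block moves the captured values to the heap, and later copies of that heap block are expected to only bump a reference count.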

> Let's say, I copy a block twice, copy1 and copy2. If I call copy1 and copy2 on the same thread, is the same function called? I don't understand exactly what the docs mean for block copies: Apple Block Docs

When a block is copied, the code of the block is never copied. Only the captured state. So, yes, you'll be executing the exact same set of instructions.
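To illustrate "same code, different state", here is a small sketch assuming ARC (under manual reference counting the caller would own the returned copy and need to release it). `MakeWeightedSine` and `CopyComputesSameResult` are hypothetical names, not anything from the question.

```objc
#import <Foundation/Foundation.h>
#include <math.h>

typedef double (^WeightedSine)(double x, double y);

// Hypothetical factory. Every block it returns runs the same compiled body;
// only the captured weights (wx, wy) differ from one block to the next.
static WeightedSine MakeWeightedSine(double wx, double wy) {
    // wx and wy are captured by value when the literal is evaluated.
    return [^double(double x, double y) {
        return sin(wx * x + wy * y);
    } copy];
}

// Calls a block and a copy of it with the same arguments and reports
// whether the results agree. The copy carries the same captured state.
static BOOL CopyComputesSameResult(WeightedSine f, double x, double y) {
    WeightedSine g = [f copy];
    return f(x, y) == g(x, y);
}
```

Calling `MakeWeightedSine(1.5, 2.5)` and `MakeWeightedSine(0.0, 0.0)` gives two blocks that share one function body but produce different values, because each carries its own captured weights.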

> Now if I make that copy again, copy1 and copy2, but instead, I call the copies on separate threads, now how do the functions behave? Will this cause some sort of slowdown, as each thread attempts to access the same block?

The data captured within a block is not protected from multi-threaded access in any way so, no, there would be no slowdown (but there will be all the concurrency synchronization fun you might imagine).
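A rough sketch of calling one block from several threads, assuming Grand Central Dispatch is available. The `gaussian` here is a hypothetical stand-in (the question never defines it), and `MakeActivation` and `EvaluateConcurrently` are illustrative names. The block only reads its arguments and each iteration writes to its own output slot, so no locking is needed in this particular case.

```objc
#import <Foundation/Foundation.h>
#import <dispatch/dispatch.h>
#include <math.h>

typedef double (^Activation)(double x, double y, double d, double bias);

// Hypothetical stand-in for the question's gaussian(); not from the source.
static double gaussian(double v) { return exp(-v * v); }

// The nested function from the question, as a block with no captured state.
static Activation MakeActivation(void) {
    return ^double(double x, double y, double d, double bias) {
        return sin(cos(gaussian(1.5 * x + 2.5 * y)) + (0.3 * d + bias));
    };
}

// Calls the same block concurrently, once per input. dispatch_apply blocks
// until every iteration has finished, and each iteration touches only out[i].
static void EvaluateConcurrently(Activation f, const double *xs,
                                 double *out, size_t count) {
    dispatch_queue_t queue =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_apply(count, queue, ^(size_t i) {
        out[i] = f(xs[i], 1.0, 1.0, 0.0);
    });
}
```

If the block captured something mutable instead (a `__block` variable or a shared object), the threads would be racing on that state, and that is where the synchronization work comes in.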

Have you tried sampling the app to see what is consuming the CPU cycles? Also, given where you are going with this, you might want to become acquainted with your friendly local disassembler (otool -TtVv binary/or/.o/file) as it can be quite helpful in determining how costly a block copy really is.

If you are sampling and seeing lots of time in the block itself, then that is just your computation consuming lots of CPU time. If the block were to consume CPU during the copy, you would see the consumption in a copy helper.

Try creating a source file that contains a bunch of different kinds of blocks: with parameters, without, with captured state, without, with captured blocks (themselves with and without captured state), etc., plus a function that calls Block_copy() on each.

Disassemble that and you'll gain a deep understanding of what happens when blocks are copied. Personally, I find x86_64 assembly to be easier to read than ARM. (This all sounds like good blog fodder -- I should write it up.)
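One possible shape for such a test file is sketched below. It assumes manual reference counting (for example, compiling with `-fno-objc-arc`), since `Block_copy()` and `Block_release()` from `<Block.h>` are used directly; under ARC you would use `[block copy]` instead. The function name `CopyAndCallEachKind` is invented for this example.

```objc
#import <Foundation/Foundation.h>
#include <Block.h>

typedef int (^IntBlock)(void);
typedef int (^IntUnaryBlock)(int);

// Copies several kinds of blocks with Block_copy(), calls each copy, and
// returns the sum of the results so the calls are actually used.
int CopyAndCallEachKind(int captured) {
    // 1. No parameters, no captured state.
    IntBlock plain = ^int(void) { return 1; };
    // 2. Parameters, no captured state.
    IntUnaryBlock withParam = ^int(int x) { return x + 1; };
    // 3. Captured state (a by-value copy of `captured`).
    IntBlock withState = ^int(void) { return captured; };
    // 4. A block that captures another block, which itself captures state.
    IntBlock withBlock = ^int(void) { return withState() * 2; };

    IntBlock plainCopy = Block_copy(plain);
    IntUnaryBlock withParamCopy = Block_copy(withParam);
    IntBlock withStateCopy = Block_copy(withState);
    IntBlock withBlockCopy = Block_copy(withBlock);

    int sum = plainCopy() + withParamCopy(1) + withStateCopy() + withBlockCopy();

    // Balance every Block_copy() with a Block_release().
    Block_release(plainCopy);
    Block_release(withParamCopy);
    Block_release(withStateCopy);
    Block_release(withBlockCopy);
    return sum;
}
```

Running the otool command from the answer on the resulting object file should show which of these copies go through a copy helper and which do not.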
