ffmpeg(-mt) and TBB
I just started using the latest build of ffmpeg into which ffmpeg-mt has been merged.
However, since my application uses TBB (Intel Threading Building Blocks), the ffmpeg-mt implementation, with its own thread creation and synchronization, does not quite fit: it could potentially block my TBB tasks that execute the decode functions. It would also thrash the cache unnecessarily.
I was looking around in pthread.c, which seems to implement the interface that ffmpeg uses to enable multithreading.
My question is whether it would be possible to create a tbb.c that implements the same functions but uses TBB tasks instead of explicit threads.
I am not experienced with C, but my guess is that it would not be easy to compile TBB (which is C++) into ffmpeg. So maybe overwriting the ffmpeg function pointers at run time would be the way to go?
I would appreciate any suggestions or comments regarding implementing the ffmpeg threading API with TBB.
So I figured out how to do it by reading through the ffmpeg code.
Basically, all you have to do is include the code below and use tbb_avcodec_open/tbb_avcodec_close instead of ffmpeg's avcodec_open/avcodec_close. This will use TBB tasks to execute decoding in parallel.
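The following is a rough, hypothetical illustration of the idea of swapping the context's execute callback once the codec is open. It is a C++ sketch, not the code referred to above. `MockCodecContext` is a heavily simplified stand-in for FFmpeg's `AVCodecContext`, and only its `execute` member mirrors the classic callback signature. `tbb_execute`, `run_parallel` and `tbb_open_sketch` are hypothetical names. To keep the sketch compilable without TBB or FFmpeg, `std::thread` stands in for `tbb::parallel_for`. Whether overriding these callbacks is enough depends on the FFmpeg version and threading mode, so treat that part as an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical, simplified stand-in for the relevant part of AVCodecContext.
struct MockCodecContext {
    int thread_count = 1;
    // Same shape as the classic AVCodecContext::execute callback: run
    // func(c, arg2 + i * size) for i in [0, count), storing results in ret if non-null.
    int (*execute)(MockCodecContext* c,
                   int (*func)(MockCodecContext* c2, void* arg),
                   void* arg2, int* ret, int count, int size) = nullptr;
};

// Stand-in for tbb::parallel_for: split [0, count) into contiguous chunks, one per worker.
static void run_parallel(int count, int workers, const std::function<void(int)>& body) {
    assert(count >= 0);
    workers = std::max(1, std::min(workers, count));
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        const int begin = count * w / workers;
        const int end = count * (w + 1) / workers;
        pool.emplace_back([begin, end, &body] {
            for (int i = begin; i < end; ++i) body(i);
        });
    }
    for (std::thread& t : pool) t.join();
}

// Replacement execute callback (hypothetical name).
static int tbb_execute(MockCodecContext* c,
                       int (*func)(MockCodecContext*, void*),
                       void* arg, int* ret, int count, int size) {
    run_parallel(count, c->thread_count, [&](int i) {
        void* job_arg = static_cast<char*>(arg) + static_cast<std::ptrdiff_t>(i) * size;
        const int r = func(c, job_arg);
        if (ret) ret[i] = r;
    });
    return 0;
}

// Hypothetical open wrapper. In real code the library's open function would be
// called first, and its error code returned on failure.
static int tbb_open_sketch(MockCodecContext* c) {
    const unsigned hw = std::thread::hardware_concurrency();  // may be 0 if unknown
    c->thread_count = hw > 0 ? static_cast<int>(hw) : 1;
    c->execute = tbb_execute;  // redirect the callback to our own dispatcher
    return 0;
}
```

After `tbb_open_sketch(&ctx)`, any code that calls `ctx.execute(...)` runs its jobs through the replacement dispatcher rather than the default one. A matching close wrapper would undo whatever the open wrapper set up.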
Re-posting here my response to you at the TBB forum, for the sake of anyone on SO who may be interested.
Your code in the answer above looks good to me; it is a clever way to use TBB in a context that was designed with native threads in mind. I wonder if it can be made even more TBB-ish, so to speak. I have some ideas that you can try if you have the time and desire.
The following two items may be of interest if there is a desire/need to control the number of threads.
… tbb::task_scheduler_init (TSI) object, and initialize it with as many threads as desired (not necessarily MAX_THREADS). Keep the address of this object in s->thread_opaque if possible/allowed; if not, a possible solution is a global map that maps AVCodecContext* to the address of the corresponding task_scheduler_init.
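The global-map fallback mentioned above might look roughly like the following C++ sketch, under stated assumptions. `SchedulerHandle`, `attach_scheduler`, `find_scheduler` and `detach_scheduler` are hypothetical names. In real code the mapped object would be the `tbb::task_scheduler_init` itself; because its address must stay valid until the codec is closed, it would have to live on the heap. The map is guarded by a mutex on the assumption that several codec contexts may be opened and closed from different threads.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

// Hypothetical stand-in for a heap-allocated tbb::task_scheduler_init; it only
// remembers how many threads were requested for one codec context.
struct SchedulerHandle {
    explicit SchedulerHandle(int threads) : num_threads(threads) {}
    int num_threads;
};

// Global registry: codec context address -> its scheduler object.
// The key would be the AVCodecContext*; const void* keeps the sketch free of FFmpeg headers.
static std::mutex g_registry_mutex;
static std::map<const void*, std::unique_ptr<SchedulerHandle>> g_registry;

// Open-time hook (hypothetical): create the per-context scheduler and remember it.
SchedulerHandle* attach_scheduler(const void* codec_ctx, int threads) {
    assert(codec_ctx != nullptr && threads > 0);
    std::lock_guard<std::mutex> lock(g_registry_mutex);
    std::unique_ptr<SchedulerHandle>& slot = g_registry[codec_ctx];
    slot.reset(new SchedulerHandle(threads));  // replaces any previous entry
    return slot.get();
}

// Lookup, e.g. from inside the execute callbacks; nullptr if never attached.
SchedulerHandle* find_scheduler(const void* codec_ctx) {
    std::lock_guard<std::mutex> lock(g_registry_mutex);
    auto it = g_registry.find(codec_ctx);
    return it == g_registry.end() ? nullptr : it->second.get();
}

// Close-time hook (hypothetical): destroy the scheduler together with the context.
void detach_scheduler(const void* codec_ctx) {
    std::lock_guard<std::mutex> lock(g_registry_mutex);
    g_registry.erase(codec_ctx);
}
```

The intended pairing is to attach in the open wrapper, look up wherever the per-context thread setting is needed, and detach in the close wrapper, so each scheduler object lives exactly as long as its codec context.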
Independently of the above, another potential change is in how tbb::parallel_for is called. Instead of using it merely to create enough threads, couldn't it be used for its direct purpose, like below?
This can perform better if count is significantly greater than thread_count, because a) more parallel slack means TBB works more efficiently (which you apparently know), and b) the overhead of the centralized atomic counter is spread over more iterations. Note that I selected a grain size of 2 for blocked_range; this is because the counter is both incremented and decremented inside the loop body, so at least two iterations per task (and, correspondingly, count >= 2*thread_count) are necessary to "match" your variant.
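Point b) can be illustrated without TBB. The following C++ sketch is not the code from the forum post, and it swaps `tbb::parallel_for`/`blocked_range` for plain `std::thread` workers and a `std::atomic<int>`. Workers claim `grain` consecutive jobs per `fetch_add` on one shared counter, so the counter is touched about count/grain times instead of once per job. Because each worker here is a dedicated thread, its index can serve directly as the execute2-style `threadnr`. TBB tasks are not bound to particular threads, so with them that mapping has to be obtained some other way. `execute2_chunked` and `JobFn` are hypothetical names.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Job callback: job index plus the index of the worker running it
// (codecs typically use threadnr to pick per-thread scratch buffers).
using JobFn = int (*)(void* arg, int jobnr, int threadnr);

// Runs jobs [0, count) on `workers` threads. Each worker claims `grain`
// consecutive jobs per fetch_add on the shared counter. Returns the number of
// non-empty chunks handed out, which is ceil(count / grain).
int execute2_chunked(JobFn func, void* arg, int* ret, int count, int workers, int grain) {
    assert(count >= 0 && workers > 0 && grain > 0);
    std::atomic<int> next{0};    // next unclaimed job index (the centralized counter)
    std::atomic<int> claims{0};  // chunks actually handed out
    std::vector<std::thread> pool;
    for (int threadnr = 0; threadnr < workers; ++threadnr) {
        pool.emplace_back([&, threadnr] {
            for (;;) {
                const int begin = next.fetch_add(grain);  // one atomic op per chunk
                if (begin >= count) break;
                claims.fetch_add(1);
                const int end = std::min(begin + grain, count);
                for (int jobnr = begin; jobnr < end; ++jobnr) {
                    const int r = func(arg, jobnr, threadnr);
                    if (ret) ret[jobnr] = r;
                }
            }
        });
    }
    for (std::thread& t : pool) t.join();
    return claims.load();
}
```

With grain = 1 this degenerates into one atomic operation per job. Larger grains reduce traffic on the shared counter at the cost of coarser load balancing, the same kind of trade-off the grain size of blocked_range controls.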