ffmpeg in Python - extracting metadata
I use ffmpeg-python to extract metadata from video files. I believe the official documentation is available here: https://kkroening.github.io/ffmpeg-python/
To extract metadata (duration, resolution, frames per second, etc.) I use the provided function "ffmpeg.probe". Sadly, when I run it on a large number of video files it is rather inefficient, as it seems to (obviously?) load the whole file into memory each time just to read a small amount of data.
If that is not what it does, maybe someone could explain what else might cause the rather long runtime.
Otherwise, is there any way to retrieve metadata more efficiently using ffmpeg or some other library?
Any feedback or help is very much appreciated.
Edit: For clarity I added my code here:
import multiprocessing as mp
import os
import ffmpeg

pool = mp.Pool()
videos = []
for file in os.listdir(directory):
    # Probe each file in a worker process; the callback collects the results.
    pool.apply_async(ffmpeg.probe, args=[os.path.join(directory, file)], callback=videos.append)
pool.close()
pool.join()
The definition of the path (directory) is missing, but this should suffice to understand what is going on.
Comments (2)
Multithreading/multiprocessing can help only IF the slowdown comes from subprocess spawning (and not from actual I/O). It may not help much, as file I/O in general is slow compared to virtually everything else.
The assertion that the whole file is loaded into memory is incorrect, IMO. ffprobe should only read the relevant headers/packets to retrieve the metadata. You are likely paying the subprocess tax more than anything else.
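To illustrate the subprocess point: each probe call already launches a separate ffprobe process, so the Python side mostly waits, and a thread pool may be enough to overlap those waits without also paying for worker processes. The following is only a sketch under that assumption. `probe_all` and its `max_workers` parameter are hypothetical names, not part of ffmpeg-python, and the probe function is passed in by the caller.

```python
from concurrent.futures import ThreadPoolExecutor


def probe_all(paths, probe, max_workers=8):
    """Call probe(path) for every path on a thread pool.

    Returns a dict mapping each path to its result, or to the
    exception that probe raised for that path.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit everything first so the calls overlap, then collect in order.
        futures = {path: executor.submit(probe, path) for path in paths}
        for path, future in futures.items():
            try:
                results[path] = future.result()
            except Exception as exc:  # e.g. a non-video file in the directory
                results[path] = exc
    return results
```

With ffmpeg-python installed this could be called as `probe_all(paths, ffmpeg.probe)`. Whether it is actually faster than the process pool depends on where the time goes, so it is worth timing both on a sample of files.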
(1) Adding to what @Peter Hassaballeh said, ffprobe has options to limit what it looks up. If you only need the container (format)-level info, or only the info of a particular stream, you can specify exactly what you need (to an extent). This could save some time.
(2) You can try MediaInfo (another free tool like ffprobe), which you should be able to call from Python as well.
(3) If you are dealing with one particular file format, the fastest way is to decode it yourself in Python and read only the bytes that matter to you. Depending on what the current bottleneck is, it may not be that drastic an improvement, though.
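As a rough sketch of point (1), ffprobe can be called directly with `-select_streams` and `-show_entries` so that it reports only the duration, the resolution and the frame rate of the first video stream. The function names below are hypothetical, and the chosen fields are an assumption about what the question needs. The code expects an `ffprobe` binary on the PATH.

```python
import json
import subprocess


def build_ffprobe_cmd(path):
    """Build an ffprobe command that asks only for a few fields."""
    return [
        "ffprobe",
        "-v", "error",                 # suppress the banner and info logs
        "-select_streams", "v:0",      # first video stream only
        "-show_entries", "format=duration:stream=width,height,r_frame_rate",
        "-of", "json",
        path,
    ]


def parse_ffprobe_json(text):
    """Turn ffprobe's JSON output into a small flat dict."""
    data = json.loads(text)
    stream = data["streams"][0] if data.get("streams") else {}
    # r_frame_rate is a fraction string such as "30000/1001".
    num, _, den = stream.get("r_frame_rate", "0/1").partition("/")
    den_value = float(den or 1)
    duration = data.get("format", {}).get("duration")
    return {
        "width": stream.get("width"),
        "height": stream.get("height"),
        "fps": float(num) / den_value if den_value else None,
        "duration": float(duration) if duration is not None else None,
    }


def probe_minimal(path):
    """Run ffprobe on one file and return the parsed fields."""
    completed = subprocess.run(
        build_ffprobe_cmd(path), capture_output=True, text=True, check=True
    )
    return parse_ffprobe_json(completed.stdout)
```

Calling `probe_minimal` on a video file returns a dict with `width`, `height`, `fps` and `duration`. This still spawns one ffprobe process per file, so it reduces the work ffprobe does per file but not the per-process overhead.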
I suggest using ffprobe directly. Unfortunately, ffmpeg can be CPU-intensive at times, but that depends on your hardware specs.