Finding interesting frames in a video

Posted 2024-07-09 07:35:16

Does anyone know of an algorithm that I could use to find an "interesting" representative thumbnail for a video?

I have, say, 30 bitmaps and I would like to choose the most representative one as the video thumbnail.

The obvious first step would be to eliminate all black frames. Then perhaps compute the "distance" between the various frames and choose one that is close to the average.

Any ideas here or published papers that could help out?

御守 2024-07-16 07:35:16

If the video contains structure, i.e. several shots, then the standard techniques for video summarisation involve (a) shot detection, then (b) using the first, middle, or nth frame to represent each shot. See [1].

However, let us assume you wish to find an interesting frame in a single continuous stream of frames taken from a single camera source, i.e. a single shot. This is the "key frame detection" problem that is widely discussed in IR/CV (Information Retrieval, Computer Vision) texts. Some illustrative approaches:

  • In [2] a mean colour histogram is computed over all frames, and the key frame is the one whose histogram is closest to it. I.e. we select the best frame in terms of its colour distribution.
  • In [3] camera stillness is assumed to be an indicator of frame importance, as suggested by Beds, above. The still frames are picked using optical flow, and one of them is used.
  • In [4] each frame is projected into some high-dimensional content space; the frames at the corners of that space are found and used to represent the video.
  • In [5] frames are evaluated for importance using their length and novelty in content space.

In general, this is a large field and there are lots of approaches. You can look at the academic conferences such as The International Conference on Image and Video Retrieval (CIVR) for the latest ideas. I find that [6] presents a useful detailed summary of video abstraction (key-frame detection and summarisation).

For your "find the best of 30 bitmaps" problem I would use an approach like [2]. Compute a representation for each frame (e.g. a colour histogram), compute a mean histogram to represent all frames, and use the frame whose histogram has the minimum distance to that mean. Pick a distance metric that suits your representation space; I would try Earth Mover's Distance.
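A minimal sketch of that histogram approach, under some loud assumptions: each frame is represented as a flat list of 8-bit grey values (a hypothetical simplification; real code would more likely build per-channel colour histograms from image arrays), and the function names `histogram`, `emd_1d` and `most_representative` are made up for illustration. For 1-D histograms over the same bins, Earth Mover's Distance reduces to the sum of absolute differences between the cumulative distributions, which is what `emd_1d` computes.

```python
from typing import List, Sequence

# Assumption: a frame is a non-empty flat sequence of grey values in 0..255.
Frame = Sequence[int]


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalised intensity histogram with equal-width bins over 0..255."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = float(len(frame))
    return [c / total for c in counts]


def emd_1d(p: Sequence[float], q: Sequence[float]) -> float:
    """Earth Mover's Distance between two 1-D histograms of equal mass.

    Computed as the sum of absolute differences of the running
    (cumulative) sums, measured in units of bin width.
    """
    distance = 0.0
    carried = 0.0  # mass that still has to be moved to later bins
    for a, b in zip(p, q):
        carried += a - b
        distance += abs(carried)
    return distance


def most_representative(frames: Sequence[Frame], bins: int = 16) -> int:
    """Index of the frame whose histogram is closest to the mean histogram."""
    hists = [histogram(f, bins) for f in frames]
    mean = [sum(column) / len(hists) for column in zip(*hists)]
    distances = [emd_1d(h, mean) for h in hists]
    return distances.index(min(distances))
```

With 30 bitmaps the cost is negligible, so the bin count and the distance function can be swapped freely while experimenting; black frames would be filtered out before calling `most_representative`.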

  1. M.S. Lew. Principles of Visual Information Retrieval. Springer Verlag, 2001.
  2. B. Gunsel, Y. Fu, and A.M. Tekalp. Hierarchical temporal video segmentation and content characterization. Multimedia Storage and Archiving Systems II, SPIE, 3229:46-55, 1997.
  3. W. Wolf. Key frame selection by motion analysis. In IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 1228-1231, 1996.
  4. L. Zhao, W. Qi, S.Z. Li, S.Q. Yang, and H.J. Zhang. Key-frame extraction and shot retrieval using Nearest Feature Line. In IW-MIR, ACM MM, pages 217-220, 2000.
  5. S. Uchihashi. Video Manga: Generating semantically meaningful video summaries. In Proc. ACM Multimedia 99, Orlando, FL, Nov., pages 383-392, 1999.
  6. Y. Li, T. Zhang, and D. Tretter. An overview of video abstraction techniques. Technical report, HP Laboratory, July 2001.
过度放纵 2024-07-16 07:35:16

You asked for papers so I found a few. If you are not on campus or on VPN connection to campus these papers might be hard to reach.

PanoramaExcerpts: extracting and packing panoramas for video browsing

http://portal.acm.org/citation.cfm?id=266396

This one explains a method for generating a comic-book-style keyframe representation.

Abstract:

This paper presents methods for automatically creating pictorial video summaries that resemble comic books. The relative importance of video segments is computed from their length and novelty. Image and audio analysis is used to automatically detect and emphasize meaningful events. Based on this importance measure, we choose relevant keyframes. Selected keyframes are sized by importance, and then efficiently packed into a pictorial summary. We present a quantitative measure of how well a summary captures the salient events in a video, and show how it can be used to improve our summaries. The result is a compact and visually pleasing summary that captures semantically important events, and is suitable for printing or Web access. Such a summary can be further enhanced by including text captions derived from OCR or other methods. We describe how the automatically generated summaries are used to simplify access to a large collection of videos.

Automatic extraction of representative keyframes based on scene content

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=751008

Abstract:

Generating indices for movies is a tedious and expensive process which we seek to automate. While algorithms for finding scene boundaries are readily available, there has been little work performed on selecting individual frames to concisely represent the scene. In this paper we present novel algorithms for automated selection of representative keyframes, based on scene content. Detailed description of several algorithms is followed by an analysis of how well humans feel the selected frames represent the scene. Finally we address how these algorithms can be integrated with existing algorithms for finding scene boundaries.

凯凯我们等你回来 2024-07-16 07:35:16

I think you should only look at key frames.

If the video is not encoded using a compression scheme that is based on key frames, you could create an algorithm based on the following article: Key frame selection by motion analysis.

Depending on the compression of the video, you can have key frames every 2 seconds or every 30 seconds. Then I think you should use the algorithm in the article to find the "most" key frame out of all the key frames.

辞别 2024-07-16 07:35:16

It may also be beneficial to favor frames that are aesthetically pleasing. That is, look for common attributes of photography: aspect ratio, contrast, balance, etc.

It would be hard to find a representative shot if you don't know what you're looking for. But with some heuristics and my suggestion, at least you could come up with something good looking.
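As one possible heuristic of this kind, here is a small sketch that ranks frames by contrast alone. It assumes frames are flat lists of 8-bit grey values (a hypothetical representation) and uses RMS contrast, i.e. the standard deviation of the pixel intensities; `contrast_score` and `best_by_contrast` are illustrative names, and balance or composition are not modelled at all.

```python
from statistics import pstdev
from typing import Sequence

# Assumption: a frame is a non-empty flat sequence of grey values in 0..255.
Frame = Sequence[int]


def contrast_score(frame: Frame) -> float:
    """RMS contrast: population standard deviation of the grey values."""
    return pstdev(frame)


def best_by_contrast(frames: Sequence[Frame]) -> int:
    """Index of the frame with the highest RMS contrast."""
    scores = [contrast_score(f) for f in frames]
    return scores.index(max(scores))
```

A score like this is probably better used as a tie-breaker among frames that are already judged representative than as the only criterion, since a noisy frame can also have high contrast.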

爱给你人给你 2024-07-16 07:35:16

I worked on a project recently where we did some video processing, and we used OpenCV to do the heavy lifting as far as video processing was concerned. We had to extract frames, calculate differences, extract faces, etc. OpenCV has some built-in algorithms that will calculate differences between frames. It works with a variety of video and image formats.

绿光 2024-07-16 07:35:16

Wow, what a great question. I guess a second step would be to iteratively remove frames where there's little or no change between a frame and its successors. But all you're really doing there is reducing the set of potentially interesting frames. How exactly you determine "interestingness" is the special sauce, I suppose, as you don't have user interaction statistics to rely on like Flickr does.
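A rough sketch of that pruning step, assuming frames are equally sized flat lists of 8-bit grey values (a hypothetical representation) and using mean absolute pixel difference as the change measure. Note that this variant keeps the first frame of each near-identical run and compares every later frame against the last kept one, which is one of several reasonable readings of the idea; the threshold value is an arbitrary assumption.

```python
from typing import List, Sequence

# Assumption: all frames are non-empty, equally long sequences of grey values.
Frame = Sequence[int]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def drop_near_duplicates(frames: Sequence[Frame], threshold: float = 8.0) -> List[int]:
    """Indices of frames that differ noticeably from the last kept frame."""
    kept: List[int] = []
    for index, frame in enumerate(frames):
        # Keep the very first frame, then only frames that changed enough.
        if not kept or mean_abs_diff(frames[kept[-1]], frame) > threshold:
            kept.append(index)
    return kept
```

The surviving indices are only candidates; some separate measure of "interestingness" is still needed to choose among them.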

神经大条 2024-07-16 07:35:16

Directors will sometimes linger on a particularly 'interesting' or beautiful shot, so how about finding a 5-second section that doesn't change and then eliminating those sections that are almost black?
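One way this could be sketched, assuming frames are equally sized flat lists of 8-bit grey values sampled at a fixed rate (a hypothetical representation): scan for the longest run of consecutive frames that barely change, resetting the run whenever a frame is nearly black. The thresholds and the name `longest_still_run` are assumptions for illustration.

```python
from typing import Sequence, Tuple

# Assumption: all frames are non-empty, equally long sequences of grey values.
Frame = Sequence[int]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def longest_still_run(
    frames: Sequence[Frame],
    motion_threshold: float = 4.0,
    black_level: float = 16.0,
) -> Tuple[int, int]:
    """(start index, length) of the longest still, non-black run of frames."""
    best = (0, 0)
    start = None  # start index of the current run, or None if no run is open
    for i, frame in enumerate(frames):
        is_dark = sum(frame) / len(frame) < black_level
        is_still = (
            start is not None
            and mean_abs_diff(frames[i - 1], frame) <= motion_threshold
        )
        if is_dark:
            start = None  # almost-black frames never belong to a run
        elif not is_still:
            start = i  # the picture changed: open a new run here
        if start is not None and i - start + 1 > best[1]:
            best = (start, i - start + 1)
    return best
```

To apply the 5-second idea, the returned length would be compared against five times the sampling rate, and a frame from the middle of the run (for example `start + length // 2`) could serve as the thumbnail.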
