Finding interesting frames in a video

Posted 2024-07-09 07:35:16

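Does anyone know of an algorithm I can use to find an "interesting", representative thumbnail for a video?

I have 30 bitmaps and I want to choose the most representative one as the video's thumbnail.

The obvious first step is to eliminate all black frames. Then perhaps look at the "distance" between the individual frames and pick something close to the average.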
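
A minimal sketch of the idea above, assuming the 30 bitmaps have already been decoded and downsampled to equal-sized lists of (r, g, b) pixels. "Black" is taken to mean an average luma at or below an assumed threshold of 16, and "close to the average" is interpreted as the medoid: the frame with the smallest summed distance to all the others. All function names and the threshold are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each 0-255
Frame = Sequence[Pixel]       # a frame flattened to a list of pixels


def mean_luma(frame: Frame) -> float:
    """Average brightness of a frame using Rec. 601 luma weights."""
    return sum(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in frame) / len(frame)


def drop_black_frames(frames: List[Frame], threshold: float = 16.0) -> List[int]:
    """Indices of frames whose average brightness is above the (assumed) threshold."""
    return [i for i, f in enumerate(frames) if mean_luma(f) > threshold]


def frame_distance(a: Frame, b: Frame) -> float:
    """Mean absolute per-channel difference between two equally sized frames."""
    total = sum(abs(x - y) for pa, pb in zip(a, b) for x, y in zip(pa, pb))
    return total / (3 * len(a))


def pick_medoid(frames: List[Frame], threshold: float = 16.0) -> int:
    """Index of the non-black frame with the smallest summed distance to the others."""
    candidates = drop_black_frames(frames, threshold)
    if not candidates:
        raise ValueError("all frames are (near-)black")
    return min(
        candidates,
        key=lambda i: sum(frame_distance(frames[i], frames[j]) for j in candidates),
    )
```

The medoid is used instead of a literal pixel-wise average because it always returns one of the real frames, and a single outlier frame cannot drag it far.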

Are there any ideas or published papers that could help here?


御守 2024-07-16 07:35:16

If the video contains structure, i.e. several shots, then the standard techniques for video summarisation involve (a) detecting the shots, then (b) using the first, middle, or nth frame of each shot to represent it. See [1].
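
A rough sketch of this two-step recipe, assuming frames are equal-sized lists of (r, g, b) pixels. For step (a) it swaps in a simple baseline: a cut is declared wherever the luma histograms of consecutive frames differ by more than an assumed threshold. This only catches hard cuts (not fades or dissolves) and is not necessarily the method described in [1]. Names and parameters are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each 0-255
Frame = Sequence[Pixel]


def gray_histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalized histogram of luma values, quantized into `bins` buckets."""
    hist = [0.0] * bins
    for r, g, b in frame:
        luma = 0.299 * r + 0.587 * g + 0.114 * b
        hist[min(int(luma * bins / 256), bins - 1)] += 1
    return [h / len(frame) for h in hist]


def detect_shots(frames: List[Frame], cut_threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Split frames into shots [start, end) wherever consecutive histograms differ sharply."""
    if not frames:
        return []
    hists = [gray_histogram(f) for f in frames]
    shots, start = [], 0
    for i in range(1, len(frames)):
        # L1 distance between two normalized histograms lies in [0, 2].
        diff = sum(abs(x - y) for x, y in zip(hists[i - 1], hists[i]))
        if diff > cut_threshold:
            shots.append((start, i))
            start = i
    shots.append((start, len(frames)))
    return shots


def middle_keyframes(shots: List[Tuple[int, int]]) -> List[int]:
    """Represent each shot [start, end) by its middle frame."""
    return [(start + end - 1) // 2 for start, end in shots]
```

Swapping `middle_keyframes` for a function returning `start` (or `start + n`) gives the "first" or "nth frame" variants.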

However, let us assume you wish to find an interesting frame in a single continuous stream of frames taken from a single camera source, i.e. a single shot. This is the "key-frame detection" problem, which is widely discussed in IR/CV (Information Retrieval, Computer Vision) texts. Some illustrative approaches:

In [2], a mean colour histogram is computed over all frames, and the key frame is the one whose histogram is closest to that mean. That is, we select the best frame in terms of its colour distribution.
In [3], we assume that camera stillness indicates frame importance, as Beds suggested above. We use optical flow to pick out the still frames and use those.
In [4], each frame is projected into a high-dimensional content space; the frames lying at the corners of that space are chosen to represent the video.
In [5], video segments are rated for importance by their length and novelty in content space, and keyframes are chosen from the most important segments.
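
To illustrate the stillness idea from [3]: that work measures stillness with optical flow, but the sketch below swaps in a much cruder proxy, the mean absolute difference between consecutive frames. Unlike optical flow it also reacts to lighting changes and cannot separate camera motion from object motion. It assumes equal-sized frames given as lists of (r, g, b) pixels; the names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each 0-255
Frame = Sequence[Pixel]


def frame_difference(a: Frame, b: Frame) -> float:
    """Mean absolute per-channel difference; a crude stand-in for motion magnitude."""
    total = sum(abs(x - y) for pa, pb in zip(a, b) for x, y in zip(pa, pb))
    return total / (3 * len(a))


def stillest_frame(frames: List[Frame]) -> int:
    """Index of the frame that changes least relative to its temporal neighbours."""
    if not frames:
        raise ValueError("no frames given")
    if len(frames) == 1:
        return 0
    # diffs[k] compares frame k with frame k + 1.
    diffs = [frame_difference(frames[k], frames[k + 1]) for k in range(len(frames) - 1)]

    def motion(i: int) -> float:
        # Average the differences to the previous and next frames (one side at the ends).
        around = diffs[max(i - 1, 0):i + 1]
        return sum(around) / len(around)

    return min(range(len(frames)), key=motion)
```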

In general, this is a large field with many approaches. For the latest ideas, look at academic conferences such as the International Conference on Image and Video Retrieval (CIVR). I find that [6] gives a useful, detailed summary of video abstraction (key-frame detection and summarisation).

For your "find the best of 30 bitmaps" problem, I would use an approach like [2]. Choose a frame representation (e.g. a colour histogram per frame), compute a mean histogram that represents all the frames, and pick the frame whose histogram is closest to it. Use a distance metric suited to your representation; I would try the Earth Mover's Distance.
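
A minimal sketch of this recipe, assuming equal-sized frames given as lists of (r, g, b) pixels and an assumed 16 bins per channel. To keep it short it simplifies the Earth Mover's Distance: it computes the exact 1-D EMD per colour channel (for 1-D histograms of equal mass this equals the L1 distance between the cumulative histograms) and sums the three, rather than solving the full 3-D colour EMD. Function names are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each 0-255
Frame = Sequence[Pixel]
BINS = 16  # assumed number of bins per colour channel


def channel_histograms(frame: Frame) -> List[List[float]]:
    """One normalized BINS-bucket histogram per R, G, B channel."""
    hists = [[0.0] * BINS for _ in range(3)]
    for pixel in frame:
        for c, value in enumerate(pixel):
            hists[c][value * BINS // 256] += 1
    return [[h / len(frame) for h in hist] for hist in hists]


def emd_1d(p: List[float], q: List[float]) -> float:
    """Earth Mover's Distance between two equal-mass 1-D histograms, in bin units.

    In 1-D the EMD equals the L1 distance between the cumulative sums.
    """
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def most_representative(frames: List[Frame]) -> int:
    """Index of the frame whose colour histograms are closest (by EMD) to the mean."""
    if not frames:
        raise ValueError("no frames given")
    hists = [channel_histograms(f) for f in frames]
    mean = [
        [sum(h[c][k] for h in hists) / len(hists) for k in range(BINS)]
        for c in range(3)
    ]

    def distance(i: int) -> float:
        # Simplification: sum per-channel 1-D EMDs instead of a full 3-D colour EMD.
        return sum(emd_1d(hists[i][c], mean[c]) for c in range(3))

    return min(range(len(frames)), key=distance)
```

With real footage it is worth running this only on frames that survive a black-frame filter, since a run of black frames pulls the mean histogram toward dark colours.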

[1] M.S. Lew. Principles of Visual Information Retrieval. Springer-Verlag, 2001.
[2] B. Gunsel, Y. Fu, and A.M. Tekalp. Hierarchical temporal video segmentation and content characterization. Multimedia Storage and Archiving Systems II, SPIE, 3229:46-55, 1997.
[3] W. Wolf. Key frame selection by motion analysis. In IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 1228-1231, 1996.
[4] L. Zhao, W. Qi, S.Z. Li, S.Q. Yang, and H.J. Zhang. Key-frame extraction and shot retrieval using Nearest Feature Line. In IW-MIR, ACM MM, pages 217-220, 2000.
[5] S. Uchihashi. Video Manga: Generating semantically meaningful video summaries. In Proc. ACM Multimedia 99, Orlando, FL, Nov. 1999, pages 383-392.
[6] Y. Li, T. Zhang, and D. Tretter. An overview of video abstraction techniques. Technical report, HP Laboratory, July 2001.


过度放纵 2024-07-16 07:35:16

You asked for papers, so I found a few. If you are not on campus or connected to the campus network via VPN, these papers may be hard to access.

PanoramaExcerpts: extracting and packing panoramas for video browsing

http://portal.acm.org/citation.cfm?id=266396

This one explains a method for generating a comic-book-style keyframe representation.

Abstract:

This paper presents methods for automatically creating pictorial video summaries that resemble comic books. The relative importance of video segments is computed from their length and novelty. Image and audio analysis is used to automatically detect and emphasize meaningful events. Based on this importance measure, we choose relevant keyframes. Selected keyframes are sized by importance, and then efficiently packed into a pictorial summary. We present a quantitative measure of how well a summary captures the salient events in a video, and show how it can be used to improve our summaries. The result is a compact and visually pleasing summary that captures semantically important events, and is suitable for printing or Web access. Such a summary can be further enhanced by including text captions derived from OCR or other methods. We describe how the automatically generated summaries are used to simplify access to a large collection of videos.

Automatic extraction of representative keyframes based on scene content

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=751008

Abstract:

Generating indices for movies is a tedious and expensive process which we seek to automate. While algorithms for finding scene boundaries are readily available, there has been little work performed on selecting individual frames to concisely represent the scene. In this paper we present novel algorithms for automated selection of representative keyframes, based on scene content. Detailed description of several algorithms is followed by an analysis of how well humans feel the selected frames represent the scene. Finally we address how these algorithms can be integrated with existing algorithms for finding scene boundaries.
