How to detect whether an HTML page contains a video?
I would like to know whether it is possible to detect whether an HTML page contains a video.
I know that one possible way is to look for ".swf" in the HTML source code, but most pages do not contain the file name.
For example, given the following URL and possibly its source code, is it possible to find out whether the page contains a video?
http://www.cnn.com/video/
Comments (4)
There are many ways to embed video into an HTML page: as Flash video or instances of platform-specific players through <object> and <embed> tags (but not every one of those tags is a video! The same holds true for .swf - it's just the file extension of Flash files, video or not), the new HTML5 <video> tag... Finding them is not impossible, but it is a lot of work to catch all possible player types, formats and embed codes, and it will result in a lot of false positives and negatives.
Then, there are JavaScript libraries that initialize players after the containing page has loaded - those are almost impossible to detect.
It's still a very complex issue to get video into a web page reliably, and subsequently it's even more complex to detect it. Depending on what you are trying to achieve, I would consider dropping it.
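As a rough illustration of the tag-based approach described above, here is a minimal Python sketch using only the standard library. The names `VideoTagFinder`, `find_video_candidates` and `CANDIDATE_TAGS` are made up for this example. It only reports candidates: as noted, an <object> or <embed> tag is not necessarily a video, and players injected by JavaScript after page load will not show up in the static source at all.

```python
from html.parser import HTMLParser

# Tags that commonly host a player. <object> and <embed> are only
# candidates: they may just as well carry a non-video Flash file.
CANDIDATE_TAGS = {"video", "object", "embed"}


class VideoTagFinder(HTMLParser):
    """Collects (tag, attributes) pairs for tags that may embed a video."""

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        if tag in CANDIDATE_TAGS:
            self.candidates.append((tag, dict(attrs)))


def find_video_candidates(html):
    """Return every candidate tag found in the static HTML source."""
    finder = VideoTagFinder()
    finder.feed(html)
    finder.close()
    return finder.candidates


# Hypothetical sample markup, for illustration only.
sample = '<p>Hi</p><video src="clip.mp4"></video><embed src="banner.swf">'
for tag, attrs in find_video_candidates(sample):
    print(tag, attrs.get("src"))
```

An empty result does not prove the page has no video; it only means none of these tags appear in the markup as delivered.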
For your case (the CNN site) you can parse the Open Graph markup for video information. Meta tags such as og:video:type and og:image will help you.
Video hosting services usually support such markup, e.g. Open Graph or schema.org, so you can parse it.
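One way this could look in Python, as a sketch only: collect the Open Graph `<meta>` tags and keep the ones whose property starts with `og:video`. The names `OpenGraphParser` and `open_graph_video_info` are invented for this example, and it assumes the page has already been downloaded into a string. Whether a given site actually emits these tags has to be checked per site.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop.startswith("og:") and attr.get("content") is not None:
            # Keep the first value if a property is repeated.
            self.properties.setdefault(prop, attr["content"])


def open_graph_video_info(html):
    """Return only the og:video* properties declared in the page."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return {
        name: value
        for name, value in parser.properties.items()
        if name.startswith("og:video")
    }


# Hypothetical page head, for illustration only.
page = (
    '<head><meta property="og:video" content="http://example.com/v.mp4">'
    '<meta property="og:video:type" content="video/mp4"></head>'
)
print(open_graph_video_info(page))
```

An empty dictionary means the page declares no Open Graph video, which is weaker than saying it contains none.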
Check whether an <object> tag exists in the DOM and check its content type and parameters. You will find the pattern by yourself.
You can also search for .flv or .mp4 in the source code.
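The extension search could be sketched in Python roughly as follows. The pattern and the name `mentions_video_file` are assumptions made for this example; the lookahead is there so that longer extensions such as ".mp4x" are not counted. A match only shows that the source mentions such a file name, not that a player is actually rendered.

```python
import re

# Matches ".flv" or ".mp4" when it ends a file name, i.e. when it is
# followed by a quote, query string, fragment, whitespace, a bracket,
# or the end of the text.
VIDEO_EXT = re.compile(r"""\.(flv|mp4)(?=["'?#\s<>)]|$)""", re.IGNORECASE)


def mentions_video_file(source):
    """Return True if the source text references a .flv or .mp4 file."""
    return VIDEO_EXT.search(source) is not None


# Hypothetical snippet, for illustration only.
print(mentions_video_file('<param name="movie" value="clip.flv">'))
```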