
JavaScript youtube web-scraping

Why is there a difference between 'View page source' and document.querySelector("html").innerHTML?

Posted on 2025-02-01 10:32:32


I want to extract subtitles from this YouTube page (link).
When I look via 'View page source', I find timedtext.

But when I search from the JavaScript console, it isn't found:

document.querySelector("html").innerHTML.match("timedtext")

For this other YouTube page, however, both approaches work.

Why is there a difference, and how can I fix it?


燃情 2025-02-08 10:32:32


As I commented, if you want to extract the subtitles this way, consider searching instead for the script tag that contains the ytInitialData variable; that's the one that holds the timedtext URL.
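The script-tag search could be sketched roughly as below. The helper name extractAssignedJson is hypothetical, and the sketch assumes the variable is assigned a plain JSON object literal inside the raw HTML; it scans for the matching closing brace while skipping over braces that appear inside strings.

```javascript
// Hypothetical helper (not part of any YouTube API): find `variableName = {...}`
// in raw HTML text and parse the object literal as JSON.
// Assumption: the assigned value is valid JSON.
function extractAssignedJson(html, variableName) {
  const match = new RegExp("\\b" + variableName + "\\s*=\\s*\\{").exec(html);
  if (!match) return null;

  // Index of the opening brace of the object literal.
  const start = match.index + match[0].length - 1;
  let depth = 0;
  let inString = false;
  let escaped = false;

  for (let i = start; i < html.length; i++) {
    const ch = html[i];
    if (inString) {
      // Inside a JSON string: only track escapes and the closing quote.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(html.slice(start, i + 1));
        } catch {
          return null; // Not valid JSON after all.
        }
      }
    }
  }
  return null; // No matching closing brace found.
}
```

Given the page source as a string, a call such as extractAssignedJson(html, "ytInitialPlayerResponse") would return the parsed object, or null if the variable is missing or its value is not valid JSON.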

I can't explain the difference for certain, but I assume the page's JavaScript injects HTML once the page has loaded.
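One way to test that assumption is to compare the HTML the server sends with the live DOM. The sketch below is only illustrative: locateNeedle and checkCurrentPage are made-up names, and re-requesting the page with fetch only approximates what 'View page source' displays, since the server may answer a second request differently.

```javascript
// Report where `needle` appears: in the raw HTML, in the serialized live DOM, or both.
function locateNeedle(needle, rawSource, liveDom) {
  return {
    inRawSource: rawSource.includes(needle),
    inLiveDom: liveDom.includes(needle),
  };
}

// Browser-only usage sketch: run in the console on the page in question.
async function checkCurrentPage(needle) {
  // Re-request the page to approximate the original server response.
  const response = await fetch(location.href);
  const rawSource = await response.text();
  // Serialize the DOM as it exists right now, after scripts have run.
  const liveDom = document.documentElement.outerHTML;
  return locateNeedle(needle, rawSource, liveDom);
}
```

Calling checkCurrentPage("timedtext") in the console resolves to an object with the two flags; if inRawSource is true and inLiveDom is false, the text was present in the server response but is no longer in the document.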

After pasting the line you shared in your comment:

ytInitialPlayerResponse.captions.playerCaptionsTracklistRenderer.captionTracks

I got the timedtext entries for the available languages. Keep in mind, though, that probably not all videos have auto-generated captions (example).

In that example I didn't get any captions, so I don't think inspecting the page's source code works for all videos.
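Since some videos have no captions, a guarded lookup avoids a TypeError when part of the path is missing. This is a minimal sketch; getCaptionTracks is a hypothetical name, and the property path is the one quoted above.

```javascript
// Return the caption tracks if every level of the path exists, otherwise an empty array.
function getCaptionTracks(playerResponse) {
  return (
    playerResponse?.captions?.playerCaptionsTracklistRenderer?.captionTracks ?? []
  );
}
```

In the browser console, getCaptionTracks(window.ytInitialPlayerResponse) gives an empty array rather than an error when the video has no captions or the variable is not defined on the page.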


About the author

秋心╮凉
