Selecting elements that are added to the DOM by a script
I've been trying to get either an <object> or an <embed> tag using:
HtmlNode videoObjectNode = doc.DocumentNode.SelectSingleNode("//object");
HtmlNode videoEmbedNode = doc.DocumentNode.SelectSingleNode("//embed");
This doesn't seem to work.
Can anyone please tell me how to get these tags and their InnerHtml?
A YouTube embedded video looks like this:
<embed height="385" width="640" type="application/x-shockwave-flash"
src="http://s.ytimg.com/yt/swf/watch-vfl184368.swf" id="movie_player" flashvars="..."
allowscriptaccess="always" allowfullscreen="true" bgcolor="#000000">
I have a feeling the JavaScript might stop the SWF player from working; I hope not...
Cheers
Comments (1)
Update 2010-08-26 (in response to OP's comment):
I think you're thinking about it the wrong way, Alex. Suppose I wrote some C# code that looked like this:
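For illustration, a minimal example along those lines (only the variable name codeBlock comes from the answer; the contents of the literal are an assumption):

```csharp
using System;

public class Program
{
    public static void Main()
    {
        // To the compiler this is one string literal, not nested C# code.
        // The text inside the quotes is made up for illustration.
        string codeBlock = "int x = 5; Console.WriteLine(x);";

        // Prints the text unchanged; nothing inside the literal is executed.
        Console.WriteLine(codeBlock);
    }
}
```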
Now, if I wrote a C# parser, should it recognize the contents of the string literal above as C# code and highlight it (or whatever) as such? No, because in the context of a well-formed C# file, that text represents a string to which the codeBlock variable is being assigned.
Similarly, in the HTML on YouTube's pages, the <object> and <embed> elements are not really elements at all in the context of the current HTML document. They are the contents of string values residing within JavaScript code.
In fact, if HtmlAgilityPack did ignore this fact and attempted to recognize all portions of text that could be HTML, it still wouldn't succeed with these elements because, being inside JavaScript, they're heavily escaped with \ characters (notice the precarious Unescape method in the code I posted to get around this issue).
I'm not saying my hacky solution below is the right way to approach this problem; I'm just explaining why obtaining these elements isn't as straightforward as grabbing them with HtmlAgilityPack.
YouTubeScraper
OK, Alex: you asked for it, so here it is. Some truly hacky code to extract your precious <object> and <embed> elements out from that sea of JavaScript.
And in case you're interested, here's a little demo I threw together (super fancy, I know):
Original Answer
Why not try using the element's Id instead?
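A minimal sketch of that suggestion, assuming the HtmlAgilityPack NuGet package is referenced. The id movie_player is taken from the question's markup; the class name, helper name, and sample page string are hypothetical. SelectSingleNode returns null when nothing matches, so the result is checked before use.

```csharp
using System;
using HtmlAgilityPack;

public static class EmbedById
{
    // Returns the outer HTML of the first element with the given id,
    // or null if the parsed document contains no such element.
    public static string Find(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Match any element by its id attribute (id is assumed to contain no quotes).
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//*[@id='" + id + "']");
        return node == null ? null : node.OuterHtml;
    }

    public static void Main()
    {
        // Hypothetical page where the <embed> is real markup, not script text.
        string html = "<html><body><embed id=\"movie_player\" src=\"player.swf\"></body></html>";
        Console.WriteLine(Find(html, "movie_player") ?? "(not found)");
    }
}
```

This only helps when the element is part of the parsed markup; if it exists solely inside a script's text, the lookup returns null.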
Update: Oh man, you're searching for HTML tags that are themselves within JavaScript? That's definitely why this isn't working. (They aren't really tags to be parsed from the perspective of HtmlAgilityPack; all of that JavaScript is really one big string inside a <script> tag.) Maybe there's some way you can parse the <script> tag's inner text itself as HTML and go from there.
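A rough sketch of that last idea, assuming the HtmlAgilityPack NuGet package is referenced. The class name, the sample page, and the Unescape helper here are hypothetical and deliberately crude (only \" and \/ are handled); this is not the YouTubeScraper code mentioned above, and real pages may escape their markup differently.

```csharp
using System;
using HtmlAgilityPack;

public static class ScriptMarkup
{
    // Crude unescape covering only \" and \/ ; other JavaScript escapes are left alone.
    static string Unescape(string js)
    {
        return js.Replace("\\\"", "\"").Replace("\\/", "/");
    }

    // Re-parses the text of each <script> as HTML and returns the outer HTML
    // of the first <embed> found there, or null if there is none.
    public static string FindEmbed(string pageHtml)
    {
        var page = new HtmlDocument();
        page.LoadHtml(pageHtml);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection scripts = page.DocumentNode.SelectNodes("//script");
        if (scripts == null) return null;

        foreach (HtmlNode script in scripts)
        {
            // Treat the unescaped script text as an HTML fragment of its own.
            var inner = new HtmlDocument();
            inner.LoadHtml(Unescape(script.InnerText));

            HtmlNode embed = inner.DocumentNode.SelectSingleNode("//embed");
            if (embed != null) return embed.OuterHtml;
        }
        return null;
    }

    public static void Main()
    {
        // Hypothetical page: the <embed> exists only inside a JavaScript string.
        string pageHtml =
            "<html><body><script>var s = \"<embed id=\\\"movie_player\\\" " +
            "src=\\\"http:\\/\\/example.com\\/p.swf\\\">\";</script></body></html>";
        Console.WriteLine(FindEmbed(pageHtml) ?? "(no embed found)");
    }
}
```

Parsing arbitrary JavaScript as HTML is fragile, so treat this as a starting point for experimentation rather than a dependable scraper.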