Building an array of flashvars with hpricot

Posted on 2024-11-02 15:00:57

I have used hpricot before to grab content from inside particular HTML tags on websites. Now, however, I am trying to build an array of all the flashvars found on this page: view-source:http://megavideo.com/?v=014U2YO9

require 'hpricot'
require 'open-uri'

flashvars = Array.new
doc = Hpricot(open("http://megavideo.com/?v=014U2YO9"))

# Use a separate loop variable so the flashvars array is not overwritten
for flashvar in (doc/"/param[@name='flashvars']") do
  flashvars << flashvar
end

I have been trying with the above code snippet. Hopefully I am on the right track. Would anyone be able to help me further?

Thank you

Comments (1)

黒涩兲箜 2024-11-09 15:00:58

Your syntax suggests that you are trying to fetch attributes from <param> elements, but no such markup exists on that page. Instead, the page contains many JavaScript assignments to properties of a flashvars object. Assuming those are what you want, you don't need Hpricot, just a regex over the JavaScript. This seems to work:

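require 'open-uri'
# URI.open is required on Ruby 3.0+, where Kernel#open no longer fetches URLs
html = URI.open("http://megavideo.com/?v=014U2YO9").read

# Capture each `flashvars.name = value;` assignment as a [name, value] pair
flashvars = Hash[ html.scan( /flashvars\.(\w+)\s*=\s*["']?(.+?)["']?;/ ) ]

require 'pp' # Just for pretty output here
pp flashvars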

#=> {"logintxt"=>"Login",
#=>  "registertxt"=>"Register",
#=>  "searchtxt"=>"Search videos",
#=>  "searchrestxt"=>"\"",
#=>  "useSystemFont"=>"0",
#=>  "size"=>"17",
#=>  "loginAct"=>"?c=login%26next%3Dv%253D014U2YO9",
#=>  "registerAct"=>"?c=signup",
#=>  "userAct"=>"?c=account",
#=>  "signoutAct"=>"javascript:signout()",
#=>  "myvideostxt"=>"My Videos",
#=>  "videosAct"=>"?c=myvideos",
#=>  "added"=>"2011-04-14",
#=>  "username"=>"beenerkeekee19952",
#=>  etc.

Note that this leaves all values as strings in Ruby, even values that were numbers in JavaScript. As it strips off leading/trailing quote marks for the JavaScript strings, the result is that you cannot discern flashvars.foo = 42; from flashvars.bar = "42";.
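
If that distinction matters to you, one possible variation is to capture quoted and unquoted values in separate groups and convert only the unquoted numeric literals. The following is a rough sketch rather than a tested solution for that page: `SAMPLE_JS`, `FLASHVAR_RE`, and `parse_flashvars` are names made up for illustration, the sample string stands in for the page's inline JavaScript, and the pattern does not handle escaped quotes inside JavaScript strings.

```ruby
# Hypothetical sample standing in for the page's inline JavaScript
SAMPLE_JS = <<~JS
  flashvars.foo = 42;
  flashvars.bar = "42";
  flashvars.title = 'My Videos';
JS

# Groups: 1 = property name, 2 = double-quoted value,
#         3 = single-quoted value, 4 = unquoted (bare) value
FLASHVAR_RE = /flashvars\.(\w+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^;]+?))\s*;/

def parse_flashvars(js)
  js.scan(FLASHVAR_RE).each_with_object({}) do |(name, dq, sq, bare), vars|
    vars[name] =
      if bare.nil?
        dq || sq          # quoted in JavaScript: keep as a String
      elsif bare =~ /\A-?\d+\z/
        bare.to_i         # bare integer literal
      elsif bare =~ /\A-?\d*\.\d+\z/
        bare.to_f         # bare decimal literal
      else
        bare              # anything else (identifiers, expressions) stays a String
      end
  end
end

p parse_flashvars(SAMPLE_JS)
```

With this sketch, `foo` comes back as the Integer 42 while `bar` stays the String "42", so the two assignments can be told apart.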
