Using Ruby, Nokogiri and RestClient to scrape a JavaScript variable

Posted on 2024-11-02 04:39:16 · 592 characters · 0 views · 0 comments

I'm using RestClient and Nokogiri to parse some HTML, which works great, but one piece of information I need is stored in a JS (jQuery) variable, and I'm not sure how to extract it. I can use Nokogiri to get the JavaScript block, but I only need a small part of it. That is probably simple, but I'm not sure how to do it. I could probably use a regex, but I'm assuming there's an easier way to just ask for the value using JS.

require 'rest-client'
require 'nokogiri'

@resource = RestClient.get 'http://example.com'
doc = Nokogiri::HTML(@resource.body)

# Print the contents of every script tag on the page
doc.css('script').each do |script|
  puts script.content
end

What I'm trying to get:

        <script type="text/javascript">
            $(function(){
                //this is it
                $.Somenamespace.theCurrency = 'EUR';
                //a lot more stuff

Comments (2)

∞琼窗梦回ˉ 2024-11-09 04:39:16

Not sure if this fits your case, but you could retrieve it as follows:

irb(main):017:0> string
=> "<script type=\"text/javascript\">    $(function(){$.Somenamespace.theCurrency = \"EUR\"}); "
irb(main):018:0> string.scan(/\$\.Somenamespace\.(.*)}\);/)
=> [["theCurrency = \"EUR\""]]
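
The scan above captures the whole assignment. If only the value is needed, a narrower pattern may be easier to work with. The following is a minimal sketch using only core Ruby; the sample string `js` is an assumption that mirrors the script from the question, and the pattern assumes the value is a plain quoted string on one line.

```ruby
# Sample input mirroring the script text from the question (assumed for illustration).
js = <<~'JS'
  $(function(){
      //this is it
      $.Somenamespace.theCurrency = 'EUR';
      //a lot more stuff
  });
JS

# Match the assignment and capture only the quoted value.
# (['"]) remembers the opening quote; \1 requires the same closing quote.
pattern = /\$\.Somenamespace\.theCurrency\s*=\s*(['"])(.*?)\1/

# String#[] with a regex and a group index returns that capture, or nil if nothing matched.
currency = js[pattern, 2]
puts currency
```

Because `String#[]` returns `nil` when the pattern does not match, the caller can check for a missing variable without rescuing an exception.
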

旧城空念 2024-11-09 04:39:16

Nokogiri is an XML and HTML parser. It doesn't parse the CDATA or text content of nodes, but it can give you the content, letting you use string parsing or regex to get at the data you want.

In the case of Javascript, if it's embedded in the page then you can get the text of the parent node. Often that is simple:

js = doc.at('script').text

That works when the page has the usual <script> tag in its <head> block. If there are multiple script tags, you have to extend the accessor to retrieve the right node, then process away.

It gets more exciting when the scripts are loaded dynamically, but you can still get the data by reading the URL from the script's src attribute, retrieving that file, and processing it in the same way.

Sometimes JavaScript is embedded in the links of other tags, but handling that is just another spin on the previous two methods: get the script, then process it.
