使用 Nokogiri 提取一些 JSON

发布于 2024-12-18 07:38:29 字数 209 浏览 2 评论 0原文

require 'open-uri'
require 'json'
require 'nokogiri'

doc = Nokogiri::HTML(open("http://www.highcharts.com/demo/"))

puts doc

但是我希望能够从这个网页中提取json,使用正则表达式似乎不起作用,如何通过XPath提取JSON?

require 'open-uri'
require 'json'
require 'nokogiri'

doc = Nokogiri::HTML(open("http://www.highcharts.com/demo/"))

puts doc

But I want to be able to extract the json from this webpage, using regular expressions doesn't seem to work, and how to do extract JSON through XPath?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

溺ぐ爱和你が 2024-12-25 07:38:29

以下是从 URL 访问脚本标记(不引用外部文件)的方法:

require 'open-uri'
require 'nokogiri'
doc = Nokogiri.HTML(open('http://www.highcharts.com/demo/'))
inline_script = doc.xpath('//script[not(@src)]')
inline_script.each do |script|
  puts "-"*50, script.text
end

现在您只需找到所需的脚本块并提取所需的数据(使用正则表达式)。如果没有更多细节,很难猜测您想要什么并且依赖什么。

这是一个相当脆弱的正则表达式,可以找到我猜您正在寻找的内容:

inline = doc.xpath('//script[not(@src)]').map(&:text)
data   = inline.map{ |js| js[/new Highcharts\.Chart\((.+?\})\);/m,1] }.compact[0]
puts data

这是您得到的结果:

{
  chart: {
    renderTo: 'container',
    defaultSeriesType: 'line',
    marginRight: 130,
    marginBottom: 25
  },
  title: {
    text: 'Monthly Average Temperature',
    x: -20 //center
  },
  subtitle: {
    text: 'Source: WorldClimate.com',
    x: -20
  },
  xAxis: {
    categories: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
      'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
  },
  yAxis: {
    title: {
      text: 'Temperature (°C)'
    },
    plotLines: [{
      value: 0,
      width: 1,
      color: '#808080'
    }]
  },
  tooltip: {
    formatter: function() {
                return '<b>'+ this.series.name +'</b><br/>'+
        this.x +': '+ this.y +'°C';
    }
  },
  legend: {
    layout: 'vertical',
    align: 'right',
    verticalAlign: 'top',
    x: -10,
    y: 100,
    borderWidth: 0
  },
  series: [{
    name: 'Tokyo',
    data: [7.0, 6.9, 9.5, 14.5, 18.2, 21.5, 25.2, 26.5, 23.3, 18.3, 13.9, 9.6]
  }, {
    name: 'New York',
    data: [-0.2, 0.8, 5.7, 11.3, 17.0, 22.0, 24.8, 24.1, 20.1, 14.1, 8.6, 2.5]
  }, {
    name: 'Berlin',
    data: [-0.9, 0.6, 3.5, 8.4, 13.5, 17.0, 18.6, 17.9, 14.3, 9.0, 3.9, 1.0]
  }, {
    name: 'London',
    data: [3.9, 4.2, 5.7, 8.5, 11.9, 15.2, 17.0, 16.6, 14.2, 10.3, 6.6, 4.8]
  }]
}

请注意,这不是 JSON;这是一个表示 JavaScript 代码的字符串,包含对象、字符串、数组、数字和函数文字。

Here's how you can access the script tags (that don't reference an external file) from a URL:

require 'open-uri'
require 'nokogiri'
doc = Nokogiri.HTML(open('http://www.highcharts.com/demo/'))
inline_script = doc.xpath('//script[not(@src)]')
inline_script.each do |script|
  puts "-"*50, script.text
end

Now you just need to find the script block you want and extract just the data you want (using regex). Without more details, it's hard to guess what you want and are relying upon.

Here's a fairly fragile regex that finds what I'm guessing you were looking for:

inline = doc.xpath('//script[not(@src)]').map(&:text)
data   = inline.map{ |js| js[/new Highcharts\.Chart\((.+?\})\);/m,1] }.compact[0]
puts data

Here's what you get out:

{
  chart: {
    renderTo: 'container',
    defaultSeriesType: 'line',
    marginRight: 130,
    marginBottom: 25
  },
  title: {
    text: 'Monthly Average Temperature',
    x: -20 //center
  },
  subtitle: {
    text: 'Source: WorldClimate.com',
    x: -20
  },
  xAxis: {
    categories: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
      'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
  },
  yAxis: {
    title: {
      text: 'Temperature (°C)'
    },
    plotLines: [{
      value: 0,
      width: 1,
      color: '#808080'
    }]
  },
  tooltip: {
    formatter: function() {
                return '<b>'+ this.series.name +'</b><br/>'+
        this.x +': '+ this.y +'°C';
    }
  },
  legend: {
    layout: 'vertical',
    align: 'right',
    verticalAlign: 'top',
    x: -10,
    y: 100,
    borderWidth: 0
  },
  series: [{
    name: 'Tokyo',
    data: [7.0, 6.9, 9.5, 14.5, 18.2, 21.5, 25.2, 26.5, 23.3, 18.3, 13.9, 9.6]
  }, {
    name: 'New York',
    data: [-0.2, 0.8, 5.7, 11.3, 17.0, 22.0, 24.8, 24.1, 20.1, 14.1, 8.6, 2.5]
  }, {
    name: 'Berlin',
    data: [-0.9, 0.6, 3.5, 8.4, 13.5, 17.0, 18.6, 17.9, 14.3, 9.0, 3.9, 1.0]
  }, {
    name: 'London',
    data: [3.9, 4.2, 5.7, 8.5, 11.9, 15.2, 17.0, 16.6, 14.2, 10.3, 6.6, 4.8]
  }]
}

Note that this is not JSON; this is a string representing JavaScript code with object, string, array, numeric, and function literals.

待天淡蓝洁白时 2024-12-25 07:38:29
require 'open-uri'
require 'json'
doc = JSON.parse(open("http://www.highcharts.com/demo/"))
require 'open-uri'
require 'json'
doc = JSON.parse(open("http://www.highcharts.com/demo/"))
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文