Extracting some JSON using Nokogiri
require 'open-uri'
require 'json'
require 'nokogiri'
# open-uri's URI.open fetches the page; plain Kernel#open no longer accepts URLs in Ruby 3.0+.
doc = Nokogiri::HTML(URI.open("http://www.highcharts.com/demo/"))
puts doc
I want to extract the JSON from this web page, but regular expressions don't seem to work. How can I extract the JSON through XPath?
Here's how you can access the script tags (that don't reference an external file) from a URL:
Now you just need to find the script block you want and extract the data you need (using a regex). Without more details, it's hard to guess what you want or what you can rely on.
Here's a fairly fragile regex that finds what I'm guessing you were looking for:
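As an illustration of the kind of regex meant here (a sketch under assumptions, not necessarily the pattern from the original answer): the `js` string below is hypothetical script text, and the pattern assumes the options are passed to `new Highcharts.Chart(...)` and that no `);` appears inside them.

```ruby
# Hypothetical inline script text, standing in for script.text from the page.
js = <<~JS
  var chart;
  $(document).ready(function() {
    chart = new Highcharts.Chart({
      chart: { renderTo: 'container' },
      series: [{ name: 'Demo', data: [1, 2, 3] }]
    });
  });
JS

# Fragile by design: lazily captures everything between "new Highcharts.Chart("
# and the first ");" that follows. The /m flag lets . match newlines.
options = js[/new Highcharts\.Chart\((.+?)\);/m, 1]

puts options
```

If the options ever contain a `);` of their own (inside a callback, for example), the lazy match stops early, which is why this approach is fragile.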
Here's what you get out:
Note that this is not JSON; this is a string representing JavaScript code with object, string, array, numeric, and function literals.
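To make the distinction concrete, here is a small sketch using Ruby's standard `json` library. The `js_literal` string is a hypothetical example of such JavaScript code, with unquoted keys, single-quoted strings, and a function literal, none of which strict JSON allows:

```ruby
require 'json'

# Hypothetical JavaScript object literal of the kind found in a chart script.
js_literal = "{ chart: { renderTo: 'container' }, " \
             "tooltip: { formatter: function() { return this.x; } } }"

begin
  JSON.parse(js_literal)
  parsed = true
rescue JSON::ParserError => e
  # Unquoted keys are rejected by the JSON parser.
  parsed = false
  puts "Not valid JSON: #{e.class}"
end

# The same data written as strict JSON (quoted keys, double-quoted strings,
# no functions) parses without error.
strict = JSON.parse('{ "chart": { "renderTo": "container" } }')
puts strict['chart']['renderTo']
```

So after extracting the string you would still need either a JavaScript interpreter or some rewriting of the text before `JSON.parse` can handle it.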