Ruby Hpricot RegEx: replace doubled <br> tags with <p> tags
Can someone please tell me how to convert this line of JavaScript to Ruby using Hpricot and a regex?
// Replace all doubled-up <BR> tags with <P> tags, and remove fonts.
var pattern = new RegExp ("<br/?>[ \r\n\s]*<br/?>", "g");
document.body.innerHTML = document.body.innerHTML.replace(pattern, "</p><p>").replace(/<\/?font[^>]*>/g, '');
The code I have set up so far is:
require 'rubygems'
require 'hpricot'
require 'open-uri'
@file = Hpricot(open("http://www.bubl3r.com/article.html"))
Thanks
Comments (3)
The content at the OP's URL seems to have changed, as commonly happens on the internet, so I cobbled together some sample HTML to show how I'd go about this.
Also, I recommend Nokogiri as a Ruby HTML/XML parser because it's very actively supported, robust, and flexible.
I think a better way to clean an HTML file is Beautiful Soup. I use it with Python and it does a very good job because it emulates part of the way a browser interprets HTML.
http://www.crummy.com/software/RubyfulSoup/
Even though it won't produce valid HTML, something like this works:
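The snippet that originally followed is not preserved here, so what follows is only a rough sketch of a plain string-substitution approach, not the answerer's actual code. It translates the two JavaScript replacements from the question into Ruby's String#gsub. The names SAMPLE_HTML, DOUBLE_BR, FONT_TAG and clean_html are made up for illustration, and the sample markup is invented rather than taken from the article. The /i flag is an addition of mine, on the assumption that raw source may contain uppercase <BR> tags.

```ruby
# Hypothetical sample markup (an assumption; not the content of the OP's article).
SAMPLE_HTML = <<~HTML
  <p><font face="Arial">First paragraph.</font><br/>
  <br>Second paragraph.<BR><br>Third paragraph.</p>
HTML

# Two <br> or <br/> tags separated only by whitespace.
# The /i flag is an assumption added here so that <BR> also matches.
DOUBLE_BR = %r{<br/?>\s*<br/?>}i

# Any opening or closing <font ...> tag.
FONT_TAG = %r{</?font[^>]*>}i

# Mirrors the chained .replace calls from the JavaScript in the question:
# doubled line breaks become a paragraph boundary, font tags are dropped.
def clean_html(html)
  html.gsub(DOUBLE_BR, '</p><p>').gsub(FONT_TAG, '')
end

puts clean_html(SAMPLE_HTML)
```

Because this is plain text substitution, nothing checks that the resulting <p> tags are balanced, which is why the output is not guaranteed to be valid HTML. With the setup from the question, the page source could presumably be read as a string first (for example open(url).read), passed through clean_html, and then handed to Hpricot(...) for further parsing.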