Ruby Hpricot RegEx: replace <br> tags with <p>

Posted 2024-09-13 02:20:08

Can someone please tell me how to convert this line of JavaScript to Ruby using Hpricot and a regular expression?

// Replace all doubled-up <BR> tags with <P> tags, and remove fonts.
    var pattern =  new RegExp ("<br/?>[ \r\n\s]*<br/?>", "g");
    document.body.innerHTML = document.body.innerHTML.replace(pattern, "</p><p>").replace(/<\/?font[^>]*>/g, '');

The code I have set up so far is:

require 'rubygems'
require 'hpricot'
require 'open-uri'

@file = Hpricot(open("http://www.bubl3r.com/article.html"))

Thanks

对风讲故事 2024-09-20 02:20:08

The content at the OP's URL seems to have changed, as commonly happens on the internet, so I cobbled together some sample HTML to show how I'd go about this.

Also, Nokogiri is what I recommend as a Ruby HTML/XML parser because it's very actively supported, robust and flexible.

require 'nokogiri'

html = <<EOT
<html>
<body>
  some<br><br>text
  <font>
    text wrapped with font
  </font>
  some<br>more<br>text
</body>
</html>
EOT

doc = Nokogiri::HTML(html)

# Replace all doubled-up <BR> tags with <P> tags, and remove fonts.
doc.search('br').each do |n|
  prev = n.previous
  # Guard against a <br> that is the first child of its parent (no previous node).
  if prev && prev.name == 'br'
    prev.remove
    n.replace('<p>')
  end
end

doc.search('font').each do |n|
  n.replace(n.content)
end

print doc.to_html
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html><body>
# >>   some<p></p>text
# >>   
# >>     text wrapped with font
# >>   
# >>   some<br>more<br>text
# >> </body></html>

魔 2024-09-20 02:20:08

I think a better way to clean up HTML files is Beautiful Soup. I use it with Python and it does a very good job, because it emulates some of the semantics of HTML browsers.

http://www.crummy.com/software/RubyfulSoup/

恰似旧人归 2024-09-20 02:20:08

Even though it won't produce valid HTML, something like this works:

require 'rubygems'
require 'hpricot'
require 'open-uri'

@file = Hpricot(open("http://www.bubl3r.com/article.html"))
puts @file.html.gsub('<br />', '<p>')
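
For a version closer to the JavaScript in the question, here is a minimal sketch that uses only String#gsub on an HTML string, with no parser involved. The method name paragraphize and the sample string are made up for illustration. Note that in the question's RegExp string the \s escape is lost (inside a JavaScript string literal it becomes a plain "s"), so this sketch uses \s the way it was presumably intended. Like the original, it can leave unbalanced <p> tags behind.

```ruby
# Hypothetical helper; the name is an assumption, not part of Hpricot or Nokogiri.
def paragraphize(html)
  # Two <br> tags (with or without a trailing slash) separated only by whitespace.
  doubled_br = %r{<br\s*/?>\s*<br\s*/?>}i
  # Opening or closing <font> tags, including any attributes.
  font_tag = %r{</?font[^>]*>}i

  html.gsub(doubled_br, '</p><p>').gsub(font_tag, '')
end

sample = "one<br><br />two<font size=\"2\">three</font><br>four"
puts paragraphize(sample)
# >> one</p><p>twothree<br>four
```

The same method could be fed the string you get back from the parsed document if you still want Hpricot to do the fetching and parsing.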
