Whitelisting nodes in a transformer with Sanitize
I'm having some trouble with this example of creating a transformer lambda with the Sanitize library for Ruby.
I've thrown together a simple script that tries to sanitize whatever's in my options[:content] variable. Execution does reach the point where my lambda returns a hash whose :node_whitelist key holds an array of nodes, but somehow those nodes still aren't making it onto the whitelist.
Here's my code:
```ruby
#!/usr/bin/ruby
require 'rubygems'
require 'sanitize'

options = { :content => "<p>Here is my content. It has a video: <object width='480' height='390'><param name='movie' value='http://www.youtube.com/v/wjthx1GKhUI?fs=1&hl=en_US'></param><param name='allowFullScreen' value='true'></param><param name='allowscriptaccess' value='always'></param><embed src='http://www.youtube.com/v/wjthx1GKhUI?fs=1&hl=en_US' type='application/x-shockwave-flash' allowscriptaccess='always' allowfullscreen='true' width='480' height='390'></embed></object></p>" }

# adapted from example at https://github.com/rgrove/sanitize/
video_embed_sanitizer = lambda do |env|
  node = env[:node]
  node_name = env[:node_name]

  puts "[video_embed_sanitizer] Starting up"
  puts "[video_embed_sanitizer] node is #{node}"
  puts "[video_embed_sanitizer] node.name.to_s.downcase is #{node.name.to_s.downcase}"

  # Don't continue if this node is already whitelisted or is not an element.
  if env[:is_whitelisted] then
    puts "[video_embed_sanitizer] Already whitelisted"
  end
  return nil if env[:is_whitelisted] || !node.element?

  parent = node.parent

  # Since the transformer receives the deepest nodes first, we look for a
  # <param> element or an <embed> element whose parent is an <object>.
  return nil unless (node.name.to_s.downcase == 'param' || node.name.to_s.downcase == 'embed') &&
      parent.name.to_s.downcase == 'object'

  if node.name.to_s.downcase == 'param'
    # Quick XPath search to find the <param> node that contains the video URL.
    return nil unless movie_node = parent.search('param[@name="movie"]')[0]
    url = movie_node['value']
  else
    # Since this is an <embed>, the video URL is in the "src" attribute. No
    # extra work needed.
    url = node['src']
  end

  # Verify that the video URL is actually a valid YouTube video URL.
  puts "[video_embed_sanitizer] URL is #{url}"
  return nil unless url =~ /^http:\/\/(?:www\.)?youtube\.com\/v\//

  # We're now certain that this is a YouTube embed, but we still need to run
  # it through a special Sanitize step to ensure that no unwanted elements or
  # attributes that don't belong in a YouTube embed can sneak in.
  puts "[video_embed_sanitizer] Node before cleaning is #{node}"
  Sanitize.clean_node!(parent, {
    :elements => %w[embed object param],
    :attributes => {
      'embed'  => %w[allowfullscreen allowscriptaccess height src type width],
      'object' => %w[height width],
      'param'  => %w[name value]
    }
  })
  puts "[video_embed_sanitizer] Node after cleaning is #{node}"

  # Now that we're sure that this is a valid YouTube embed and that there are
  # no unwanted elements or attributes hidden inside it, we can tell Sanitize
  # to whitelist the current node (<param> or <embed>) and its parent
  # (<object>).
  puts "[video_embed_sanitizer] Marking node as whitelisted and returning"
  {:node_whitelist => [node, parent]}
end

options[:content] = Sanitize.clean(options[:content],
  :elements => ['a', 'b', 'blockquote', 'br', 'em', 'i', 'img', 'li', 'ol', 'p', 'span', 'strong', 'ul'],
  :attributes => {'a' => ['href', 'title'], 'span' => ['class', 'style'], 'img' => ['src', 'alt']},
  :protocols => {'a' => {'href' => ['http', 'https', :relative]}},
  :add_attributes => { 'a' => {'rel' => 'nofollow'}},
  :transformers => [video_embed_sanitizer])

puts options[:content]
```
And here's the output that's being generated:
```
[video_embed_sanitizer] Starting up
[video_embed_sanitizer] node is <param name="movie" value="http://www.youtube.com/v/wjthx1GKhUI?fs=1&hl=en_US">
[video_embed_sanitizer] node.name.to_s.downcase is param
[video_embed_sanitizer] URL is http://www.youtube.com/v/wjthx1GKhUI?fs=1&hl=en_US
[video_embed_sanitizer] Node before cleaning is <param name="movie" value="http://www.youtube.com/v/wjthx1GKhUI?fs=1&hl=en_US">
[video_embed_sanitizer] Node after cleaning is <param name="movie" value="http://www.youtube.com/v/wjthx1GKhUI?fs=1&hl=en_US">
[video_embed_sanitizer] Marking node as whitelisted and returning
[video_embed_sanitizer] Starting up
[video_embed_sanitizer] node is <param name="allowFullScreen" value="true">
[video_embed_sanitizer] node.name.to_s.downcase is param
[video_embed_sanitizer] Starting up
[video_embed_sanitizer] node is <param name="allowscriptaccess" value="always">
[video_embed_sanitizer] node.name.to_s.downcase is param
[video_embed_sanitizer] Starting up
[video_embed_sanitizer] node is <embed src="http://www.youtube.com/v/wjthx1GKhUI?fs=1&hl=en_US" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="390"></embed>
[video_embed_sanitizer] node.name.to_s.downcase is embed
[video_embed_sanitizer] URL is http://www.youtube.com/v/wjthx1GKhUI?fs=1&hl=en_US
[video_embed_sanitizer] Node before cleaning is <embed src="http://www.youtube.com/v/wjthx1GKhUI?fs=1&hl=en_US" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="390"></embed>
[video_embed_sanitizer] Node after cleaning is <embed src="http://www.youtube.com/v/wjthx1GKhUI?fs=1&hl=en_US" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="390"></embed>
[video_embed_sanitizer] Marking node as whitelisted and returning
[video_embed_sanitizer] Starting up
[video_embed_sanitizer] node is <object width="480" height="390"></object>
[video_embed_sanitizer] node.name.to_s.downcase is object
[video_embed_sanitizer] Starting up
[video_embed_sanitizer] node is <p>Here is my content. It has a video: </p>
[video_embed_sanitizer] node.name.to_s.downcase is p
<p>Here is my content. It has a video: </p>
```
What am I doing wrong?
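In the log above, the transformer is called for the three <param> elements and the <embed> before their parent <object>, and for the <object> before the enclosing <p>. A minimal pure-Ruby sketch of that deepest-first (post-order) visiting order follows. SketchNode and each_deepest_first are hypothetical names, the tree holds only the elements of options[:content] (text nodes omitted), and it models the order visible in the log, not Sanitize's own traversal code:

```ruby
# Hypothetical stand-in for an element node: just a tag name and children.
# Real Sanitize transformers receive Nokogiri nodes instead.
SketchNode = Struct.new(:name, :children)

# Post-order walk: every child subtree is visited before its parent,
# so the deepest nodes come first.
def each_deepest_first(node, &block)
  node.children.each { |child| each_deepest_first(child, &block) }
  block.call(node)
end

# Element structure of options[:content] from the question (text omitted).
leaf   = ->(name) { SketchNode.new(name, []) }
object = SketchNode.new('object', [leaf.('param'), leaf.('param'), leaf.('param'), leaf.('embed')])
doc    = SketchNode.new('p', [object])

visit_order = []
each_deepest_first(doc) { |n| visit_order << n.name }
puts visit_order.join(' -> ')
# prints: param -> param -> param -> embed -> object -> p
```

Because children come first, the <object> is only reached after all of its children have been handled. In the log it is already empty at that point, which is consistent with its children being removed after they were visited, including the <param name="movie"> and <embed> that the transformer had returned in :node_whitelist.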
I too had problems with the YouTube example. Here is how I went about allowing script tags, but only for the Ooyala video player:
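A minimal sketch of this idea in the same lambda style as the question (not the answerer's code). It assumes the Ooyala player script is served from player.ooyala.com, which is a guess, and that a node returned in :node_whitelist is not cleaned again (the question's own code runs an extra clean_node! pass for that reason), so it strips every attribute except src and type itself. ALLOWED_SCRIPT_HOSTS, SCRIPT_ATTRS, allowed_script_src? and ooyala_script_transformer are hypothetical names; the node only needs [], keys and remove_attribute, which Nokogiri nodes provide:

```ruby
require 'uri'

# Hypothetical allowlist; "player.ooyala.com" is an assumed Ooyala host.
ALLOWED_SCRIPT_HOSTS = %w[player.ooyala.com].freeze

# Attributes an external <script> is allowed to keep (assumed list).
SCRIPT_ATTRS = %w[src type].freeze

# True only for an absolute http(s) URL whose host is on the allowlist.
def allowed_script_src?(src)
  return false if src.nil? || src.strip.empty?
  uri = URI.parse(src.strip)
  %w[http https].include?(uri.scheme.to_s.downcase) &&
    ALLOWED_SCRIPT_HOSTS.include?(uri.host.to_s.downcase)
rescue URI::InvalidURIError
  false
end

ooyala_script_transformer = lambda do |env|
  node = env[:node]
  # Skip nodes that are already whitelisted and anything that is not a <script>.
  return nil if env[:is_whitelisted] || env[:node_name] != 'script'
  # Only external scripts from an allowed host get through.
  return nil unless allowed_script_src?(node['src'])
  # Assumption: a whitelisted node is not cleaned again, so remove event
  # handlers and any other extra attributes here.
  node.keys.each { |name| node.remove_attribute(name) unless SCRIPT_ATTRS.include?(name) }
  {:node_whitelist => [node]}
end
```

For this to be the only way a script gets through, 'script' would stay out of the :elements list in the main config.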
I also cleaned things up tremendously by creating my own initializer config:
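A sketch of what such a shared config might look like, assuming a Rails app (config/initializers/ is the Rails convention for code run at boot) and reusing the options from the question's Sanitize.clean call. The file name, the SanitizeSettings module and with_transformers are hypothetical:

```ruby
# config/initializers/sanitize_settings.rb (hypothetical file name)
# Keeps the Sanitize options from the question in one shared place.
module SanitizeSettings
  BASIC = {
    :elements => %w[a b blockquote br em i img li ol p span strong ul],
    :attributes => {
      'a'    => %w[href title],
      'span' => %w[class style],
      'img'  => %w[src alt]
    },
    :protocols => { 'a' => { 'href' => ['http', 'https', :relative] } },
    :add_attributes => { 'a' => { 'rel' => 'nofollow' } }
  }.freeze

  # Returns a new hash: BASIC plus the given transformers.
  # BASIC itself is never modified.
  def self.with_transformers(*transformers)
    BASIC.merge(:transformers => transformers)
  end
end
```

A call site would then read Sanitize.clean(options[:content], SanitizeSettings.with_transformers(video_embed_sanitizer)). Freezing BASIC (only its top level) and merging into a new hash keeps one call site's transformers from leaking into another.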
There have been some bugs with this from time to time, so make sure you're using the latest version.
Here's my working (I think) version for YouTube iframes. Disallow iframes elsewhere, then:
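A minimal sketch of an iframe transformer along these lines (not the answerer's code): iframe stays out of the main :elements list, and this lambda whitelists only iframes whose src looks like a YouTube embed. The URL pattern and the allowed attribute list are assumptions. As in the question's code, a separate cleaning step is needed before whitelisting; here it is done by hand, by removing every attribute not on the list, so the sketch runs without the sanitize gem. YOUTUBE_EMBED_SRC, YOUTUBE_IFRAME_ATTRS and youtube_iframe_transformer are hypothetical names:

```ruby
# Assumed URL shape for YouTube iframe embeds: http, https or
# protocol-relative; youtube.com or youtube-nocookie.com; /embed/ path.
YOUTUBE_EMBED_SRC = %r{\A(?:https?:)?//(?:www\.)?youtube(?:-nocookie)?\.com/embed/}

# Attributes a YouTube <iframe> is allowed to keep (assumed list).
YOUTUBE_IFRAME_ATTRS = %w[allowfullscreen frameborder height src width].freeze

youtube_iframe_transformer = lambda do |env|
  node = env[:node]
  # Skip nodes that are already whitelisted and anything that is not an <iframe>.
  return nil if env[:is_whitelisted] || env[:node_name] != 'iframe'
  # Reject anything whose src is not a YouTube embed URL.
  return nil unless node['src'].to_s =~ YOUTUBE_EMBED_SRC
  # Assumption: a whitelisted node is not cleaned again, so drop every
  # attribute not on the list here (the question uses clean_node! for this).
  node.keys.each { |name| node.remove_attribute(name) unless YOUTUBE_IFRAME_ATTRS.include?(name) }
  {:node_whitelist => [node]}
end
```

The pattern is anchored with \A rather than ^ because in Ruby ^ matches at the start of any line, so a multi-line src could otherwise pass the check.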