Determining whether two Nokogiri nodes are equivalent

Posted on 2024-12-10 23:52:37

Imagine that you have Nokogiri nodes representing the <a> elements in the following two documents:

<r xmlns:x="foo"><a foo="bar" jim="jam" x:oh="no"><x:b>Hello</x:b></a></r>
<r xmlns:i="foo"><a jim="jam" i:oh="no" foo="bar"><i:b>Hello</i:b></a></r>

The two are equivalent from a DOM standpoint. I'd like to detect this efficiently, but Nokogiri::XML::Node#== just checks object equality. Since Nokogiri 1.5.0 does not yet have support for canonicalization, I can't just serialize the nodes and compare the strings.
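
To make the equivalence rule concrete, here is a minimal sketch of how the attributes of the two `<a>` elements could be compared so that neither document order nor the namespace prefix matters, only the namespace URI, local name, and value. `ToyAttr` and `attribute_key` are hypothetical names invented for this illustration; they stand in for parsed attributes and are not part of Nokogiri.

```ruby
# Hypothetical stand-in for a parsed attribute; this is not Nokogiri's API.
ToyAttr = Struct.new(:prefix, :local_name, :ns_uri, :value)

# Reduce a list of attributes to a form that ignores both document order
# and namespace prefixes: sorted [namespace URI, local name, value] triples.
# A missing namespace becomes "" so that all triples sort as strings.
def attribute_key(attrs)
  attrs.map { |a| [a.ns_uri.to_s, a.local_name, a.value] }.sort
end

# The attributes of the two <a> elements from the question.
first = [
  ToyAttr.new(nil, "foo", nil, "bar"),
  ToyAttr.new(nil, "jim", nil, "jam"),
  ToyAttr.new("x", "oh", "foo", "no")
]
second = [
  ToyAttr.new(nil, "jim", nil, "jam"),
  ToyAttr.new("i", "oh", "foo", "no"),
  ToyAttr.new(nil, "foo", nil, "bar")
]
# Same prefix as `first`, but bound to a different namespace URI.
rebound = [
  ToyAttr.new(nil, "foo", nil, "bar"),
  ToyAttr.new(nil, "jim", nil, "jam"),
  ToyAttr.new("x", "oh", "bar", "no")
]

puts attribute_key(first) == attribute_key(second)   # true
puts attribute_key(first) == attribute_key(rebound)  # false
```

The prefix field is carried along only to show that it plays no part in the comparison.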

What's the fastest way to compare two nodes to ensure that their names, attributes, and contents are canonically equivalent?

Answers may rely on features only available in Ruby 1.9.2+, if desired.

Test Cases

ORIG1 = "<a>
  <a1><a1a/><a1b/><a1c/></a1>
  <a2><a2a/><a2b foo='bar' jim='jam'/><a2c/></a2>
  <a3><a3a/><a3b/><a3c>foo</a3c></a3>
</a>"
ORIG2 = "<a>
  <a1><a1a/><a1b/><a1c/></a1>
  <a2><a2a/><a2b jim='jam' foo='bar'/><a2c/></a2>
  <a3><a3a/><a3b/><a3c>foo</a3c></a3>
</a>"
NOTEXT = "<a>
  <a1><a1a/><a1b/><a1c/></a1>
  <a2><a2a/><a2b foo='bar' jim='jam'/><a2c/></a2>
  <a3><a3a/><a3b/><a3c/></a3>
</a>"
EXTRATEXT1 = "<a>
  <a1><a1a/><a1b/><a1c/></a1>
  <a2><a2a/><a2b foo='bar' jim='jam'/><a2c/></a2>
  <a3><a3a/><a3b/><a3c>foobar</a3c></a3>
</a>"
EXTRATEXT2 = "<a>
  <a1><a1a/><a1b/><a1c/></a1>
  <a2><a2a/><a2b foo='bar' jim='jam'/><a2c/></a2>
  <a3><a3a/><a3b>hi</a3b><a3c>foo</a3c></a3>
</a>"
MISSINGNODE = "<a>
  <a1><a1a/><a1b/><a1c/></a1>
  <a2><a2a/><a2b foo='bar' jim='jam'/><a2c/></a2>
  <a3><a3a/><a3b/></a3>
</a>"
EXTRANODE = "<a>
  <a1><a1a/><a1b/><a1c/></a1>
  <a2><a2a/><a2b foo='bar' jim='jam'/><a2c/></a2>
  <a3><a3a/><a3b/><a3c>foo</a3c><a3d/></a3>
</a>"
SWAPNODE = "<a>
  <a1><a1a/><a1b/><a1c/></a1>
  <a2><a2a/><a2b foo='bar' jim='jam'/><a2c/></a2>
  <a3><a3x/><a3b/><a3c>foo</a3c></a3>
</a>"
MISSINGATTRIB = "<a>
  <a1><a1a/><a1b/><a1c/></a1>
  <a2><a2a/><a2b jim='jam'/><a2c/></a2>
  <a3><a3a/><a3b/><a3c>foo</a3c></a3>
</a>"
EXTRAATTRIB1 = "<a>
  <a1><a1a/><a1b/><a1c/></a1>
  <a2><a2a/><a2b foo='bar' jim='jam' kits='meow'/><a2c/></a2>
  <a3><a3a/><a3b/><a3c>foo</a3c></a3>
</a>"
EXTRAATTRIB2 = "<a>
  <a1><a1a/><a1b/><a1c kits='meow'/></a1>
  <a2><a2a/><a2b foo='bar' jim='jam'/><a2c/></a2>
  <a3><a3a/><a3b/><a3c>foo</a3c></a3>
</a>"
SWAPATTRIB1 = "<a>
  <a1><a1a/><a1b/><a1c/></a1>
  <a2><a2a/><a2b foo='bar' jim='zzz'/><a2c/></a2>
  <a3><a3a/><a3b/><a3c>foo</a3c></a3>
</a>"
SWAPATTRIB2 = "<a>
  <a1><a1a/><a1b/><a1c/></a1>
  <a2><a2a/><a2b foo='bar' zzz='jam'/><a2c/></a2>
  <a3><a3a/><a3b/><a3c>foo</a3c></a3>
</a>"
NAMESPACE1 = "<a xmlns:x='foo'>
  <a1><a1a/><a1b/><a1c/></a1>
  <a2><a2a/><a2b foo='bar' jim='jam'/><a2c/></a2>
  <a3><x:a3a/><a3b/><a3c>foo</a3c></a3>
</a>"
NAMESPACE1B = "<a xmlns:z='foo'>
  <a1><a1a/><a1b/><a1c/></a1>
  <a2><a2a/><a2b foo='bar' jim='jam'/><a2c/></a2>
  <a3><z:a3a/><a3b/><a3c>foo</a3c></a3>
</a>"
NAMESPACE1C = "<a xmlns:x='bar'>
  <a1><a1a/><a1b/><a1c/></a1>
  <a2><a2a/><a2b foo='bar' jim='jam'/><a2c/></a2>
  <a3><x:a3a/><a3b/><a3c>foo</a3c></a3>
</a>"
NAMESPACE2 = "<a xmlns:x='foo'>
  <a1><a1a/><a1b/><a1c/></a1>
  <a2><a2a/><a2b foo='bar' x:jim='jam'/><a2c/></a2>
  <a3><a3a/><a3b/><a3c>foo</a3c></a3>
</a>"
NAMESPACE2B= "<a xmlns:z='foo'>
  <a1><a1a/><a1b/><a1c/></a1>
  <a2><a2a/><a2b foo='bar' z:jim='jam'/><a2c/></a2>
  <a3><a3a/><a3b/><a3c>foo</a3c></a3>
</a>"
NAMESPACE2C= "<a xmlns:x='bar'>
  <a1><a1a/><a1b/><a1c/></a1>
  <a2><a2a/><a2b foo='bar' x:jim='jam'/><a2c/></a2>
  <a3><a3a/><a3b/><a3c>foo</a3c></a3>
</a>"


require 'nokogiri'
require 'minitest/autorun'
class NodeEquivalence < MiniTest::Unit::TestCase
  def setup
    @o1 = Nokogiri::XML(ORIG1,&:noblanks).root
  end
  def test_equivalence
    o2 = Nokogiri::XML(ORIG2,&:noblanks).root
    assert @o1 =~ o2, "Equivalent nodes should be equivalent"
    assert o2 =~ @o1, "Equivalent nodes should be equivalent"
  end
  def test_textnodes
    no_text = Nokogiri::XML(NOTEXT,&:noblanks).root
    extra1  = Nokogiri::XML(EXTRATEXT1,&:noblanks).root
    extra2  = Nokogiri::XML(EXTRATEXT2,&:noblanks).root
    refute @o1 =~ no_text, "Notice missing text node child"
    refute no_text =~ @o1, "Notice missing text node child"
    refute @o1 =~ extra1,  "Notice different text in text node"
    refute extra1 =~ @o1,  "Notice different text in text node"
    refute @o1 =~ extra2,  "Notice extra text node"
    refute extra2 =~ @o1,  "Notice extra text node"
  end
  def test_nodes
    missing = Nokogiri::XML(MISSINGNODE,&:noblanks).root
    extra   = Nokogiri::XML(EXTRANODE,&:noblanks).root
    changed = Nokogiri::XML(SWAPNODE,&:noblanks).root
    refute @o1 =~ missing, "Notice missing node"
    refute missing =~ @o1, "Notice missing node"
    refute @o1 =~ extra,   "Notice extra node"
    refute extra =~ @o1,   "Notice extra node"
    refute @o1 =~ changed, "Notice renamed node"
    refute changed =~ @o1, "Notice renamed node"
  end
  def test_attributes
    missing = Nokogiri::XML(MISSINGATTRIB,&:noblanks).root
    extra1  = Nokogiri::XML(EXTRAATTRIB1,&:noblanks).root
    extra2  = Nokogiri::XML(EXTRAATTRIB2,&:noblanks).root
    swap1   = Nokogiri::XML(SWAPATTRIB1,&:noblanks).root
    swap2   = Nokogiri::XML(SWAPATTRIB2,&:noblanks).root
    refute @o1 =~ missing, "Notice missing attribute"
    refute missing =~ @o1, "Notice missing attribute"
    refute @o1 =~ extra1,  "Notice extra attribute"
    refute extra1 =~ @o1,  "Notice extra attribute"
    refute @o1 =~ extra2,  "Notice new attribute"
    refute extra2 =~ @o1,  "Notice new attribute"
    refute @o1 =~ swap1,   "Notice changed attribute value"
    refute swap1 =~ @o1,   "Notice changed attribute value"
    refute @o1 =~ swap2,   "Notice changed attribute name"
    refute swap2 =~ @o1,   "Notice changed attribute name"
  end
  def test_namespaces
    ns1  = Nokogiri::XML(NAMESPACE1,&:noblanks).root
    ns2  = Nokogiri::XML(NAMESPACE2,&:noblanks).root
    ns1b = Nokogiri::XML(NAMESPACE1B,&:noblanks).root
    ns2b = Nokogiri::XML(NAMESPACE2B,&:noblanks).root
    ns1c = Nokogiri::XML(NAMESPACE1C,&:noblanks).root
    ns2c = Nokogiri::XML(NAMESPACE2C,&:noblanks).root
    refute @o1 =~ ns1,  "Notice added node namespace"
    refute ns1 =~ @o1,  "Notice removed node namespace"
    refute @o1 =~ ns2,  "Notice added attribute namespace"
    refute ns2 =~ @o1,  "Notice removed attribute namespace"
    assert ns1 =~ ns1b, "Different namespace names on nodes don't matter"
    assert ns2 =~ ns2b, "Different namespace names on attributes don't matter"
    refute ns1 =~ ns1c, "Notice different namespace hrefs on nodes"
    refute ns2 =~ ns2c, "Notice different namespace hrefs on attributes"
  end
end

Comments (1)

飘过的浮云 2024-12-17 23:52:37

Here's my current implementation. It is now namespace aware:

class Nokogiri::XML::Node
  # Return true if this node is content-equivalent to other, false otherwise
  def =~(other)
    # Identical objects are trivially equivalent
    return true if self == other
    return false unless name == other.name
    stype = node_type; otype = other.node_type
    return false unless stype == otype
    # Compare attributes independent of order: same count, then the same
    # sorted [name, value, namespace href] triples
    sa = attributes; oa = other.attributes
    return false unless sa.length == oa.length
    sa = sa.sort.map{ |n,a| [n,a.value,a.namespace && a.namespace.href] }
    oa = oa.sort.map{ |n,a| [n,a.value,a.namespace && a.namespace.href] }
    return false unless sa == oa
    skids = children; okids = other.children
    return false unless skids.length == okids.length
    return false if stype == TEXT_NODE && (content != other.content)
    # Namespaces match when both are absent or both share the same href;
    # the prefix is ignored
    sns = namespace; ons = other.namespace
    return false if !sns ^ !ons
    return false if sns && (sns.href != ons.href)
    # Recurse into the children pairwise, in document order
    skids.to_enum.with_index.all?{ |ski,i| ski =~ okids[i] }
  end
end
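
The implementation above walks both trees on every comparison. When one node is compared against many others, one alternative is to reduce each subtree to a nested array once and then compare (or hash) the arrays. The sketch below illustrates that idea with a hypothetical `ToyNode` struct and `signature` helper rather than real Nokogiri nodes. It is not the method used above, and whether it would be faster in practice would have to be measured.

```ruby
# Hypothetical stand-in for a parsed element or text node; not Nokogiri's API.
# attrs is a Hash of { [namespace URI or nil, local name] => value }.
ToyNode = Struct.new(:name, :ns_uri, :attrs, :children, :text)

# Build a nested array that is identical for equivalent subtrees:
# attribute order is normalized by sorting, and prefixes never appear.
def signature(node)
  [
    node.name,
    node.ns_uri,
    node.text,
    node.attrs.map { |(uri, local), value| [uri.to_s, local, value] }.sort,
    node.children.map { |child| signature(child) }
  ]
end

# Small constructors to keep the example readable.
def toy_element(name, attrs = {}, children = [], ns_uri = nil)
  ToyNode.new(name, ns_uri, attrs, children, nil)
end

def toy_text(string)
  ToyNode.new("text", nil, {}, [], string)
end

# The two <a> elements from the question; "foo" is the namespace URI.
a1 = toy_element("a",
  { [nil, "foo"] => "bar", [nil, "jim"] => "jam", ["foo", "oh"] => "no" },
  [toy_element("b", {}, [toy_text("Hello")], "foo")])
a2 = toy_element("a",
  { [nil, "jim"] => "jam", ["foo", "oh"] => "no", [nil, "foo"] => "bar" },
  [toy_element("b", {}, [toy_text("Hello")], "foo")])
# Same shape, but the child element lives in a different namespace.
a3 = toy_element("a",
  { [nil, "foo"] => "bar", [nil, "jim"] => "jam", ["foo", "oh"] => "no" },
  [toy_element("b", {}, [toy_text("Hello")], "bar")])

puts signature(a1) == signature(a2)  # true
puts signature(a1) == signature(a3)  # false
```

Because equal arrays have equal hash values in Ruby, a signature can also serve as a Hash key for grouping equivalent nodes. The trade-off is that it always visits the whole subtree, whereas the recursive comparison can stop at the first difference.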

Here's my benchmark code (using the constants from the test cases above):

require 'benchmark'
Benchmark.bm(10) do |x|
  N = 1000
  NODES = [
    ORIG1, ORIG2, NOTEXT, EXTRATEXT1, EXTRATEXT2,
    MISSINGNODE, EXTRANODE, SWAPNODE,
    MISSINGATTRIB, EXTRAATTRIB1, EXTRAATTRIB2, SWAPATTRIB1, SWAPATTRIB2,
    NAMESPACE1, NAMESPACE2
  ].map{ |xml| Nokogiri::XML(xml,&:noblanks).root }
  MAIN = NODES.shift
  x.report("Phrogz"){ N.times{
    NODES.each{ |other| MAIN =~ other }
  }}
end