Finding duplicates in an XML page

Posted on 2025-01-02 03:35:46

I'm trying to find duplicates within the XML returned by a web service call, using Ruby and Nokogiri.

The output I'm getting from the code below looks something like this:

found duplicate["1", "1"]
found duplicate["1", "1"]
found duplicate["1", "1"]
found duplicate["1", "1"]
found duplicate["2", "2"]

What I want to know is that SKUs 1 and 2 have been duplicated, reported once, with something like "found duplicate skus [duplicated skus]".

The XML looks like this:

<Root>
  <Context>
    <ID>1234</ID>
    <Item>
      <ID>4567</ID>
    </Item>
    <Item>
      <ID>4567</ID>
    </Item>
    <Item>
      <ID>5678</ID>
    </Item>
  </Context>
</Root>

require 'nokogiri'
require 'open-uri'

# Context items that will produce duplicates.
$context = ['a', 'b', 'c']

# Reopens Array to add a method that returns every element occurring more
# than once (a duplicated element is returned once per occurrence).
class Array
  def only_duplicates
    select { |element| count(element) > 1 }
  end
end

# Loops through each item in the $context array.
$context.each do |item|
  puts "C_ItemID = " + item
  # Creates a URL string using the context item.
  url = "url to the call"
  # Creates an XML doc.
  doc = Nokogiri::XML(URI.open(url))
  # Array that the text from each node will be stored in.
  values = []
  # Loops through each item ID node to find duplicates.
  # XPath is case-sensitive, so the path must match the element names in the XML.
  doc.xpath('//Item/ID').each do |node|
    values << node.text
    # This check runs once per node, so the message is printed again
    # on every later iteration.
    if values.only_duplicates.count > 1
      puts "found duplicate" + values.only_duplicates.inspect
    end
  end
end

Comments (2)

唐婉 2025-01-09 03:35:46
require 'nokogiri'

downloaded_from_url = "<Root><Context><ID>1234</ID><Item><ID>4567</ID></Item><Item><ID>4567</ID></Item><Item><ID>5678</ID></Item><Item><ID>5678</ID></Item></Context></Root>"
parsed_xml_document = Nokogiri::XML(downloaded_from_url)

list_of_item_ids    = parsed_xml_document.xpath("//Item/ID").map { |x| x.text }

# Returns each value that appears more than once (O(N^2) because of #count).
def find_duplicate_items(in_collection = [])
  in_collection.select do |item|
    in_collection.count(item) > 1
  end.uniq
end

duplicate_item_ids  = find_duplicate_items(list_of_item_ids)
#=> ["4567", "5678"]

A faster way to find duplicates (credit: Ryan LeCompte), slightly modified and shortened:

# Groups equal elements together, then keeps the keys of groups with more than one member.
def fast_find_duplicate_items(in_collection = [])
  in_collection.group_by do |element|
    element
  end.select do |key, value|
    value.size > 1
  end.keys
end
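
To get the single summary line the question asks for, one option is to collect all the IDs first and report once, after the loop over the nodes has finished. The following is a minimal sketch under that assumption: `duplicate_skus` and `report_duplicates` are hypothetical helper names, and the literal array stands in for the result of the `xpath("//Item/ID").map` call above.

```ruby
# Hypothetical helper: returns each value that occurs more than once,
# in the order it was first seen.
def duplicate_skus(ids)
  ids.group_by { |id| id }
     .select { |_id, group| group.size > 1 }
     .keys
end

# Hypothetical helper: builds one summary line, or nil when nothing is duplicated.
def report_duplicates(ids)
  dups = duplicate_skus(ids)
  return nil if dups.empty?

  "found duplicate skus #{dups.inspect}"
end

# Stand-in for the IDs extracted from the parsed document.
ids = ["4567", "4567", "5678", "5678", "9999"]
puts report_duplicates(ids)
# prints: found duplicate skus ["4567", "5678"]
```

Because the report is built after all IDs are collected, the message appears once per document instead of once per node.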
2025-01-09 03:35:46

Here's a more efficient way to find duplicates in an array. Using #count makes the algorithm slower because it has to traverse the whole array for each item, which is O(N^2); grouping the items first needs only a single pass:

list_of_item_ids.group_by { |e| e }.select { |k,v| v.size > 1 }.map(&:first)
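
On Ruby 2.7 or later, `Enumerable#tally` does the counting in a single pass as well, and it keeps the number of occurrences around in case that is worth reporting. A minimal sketch, assuming a plain array of ID strings; `duplicate_counts` is a hypothetical name:

```ruby
# Hypothetical helper: maps each duplicated value to its number of occurrences.
# Enumerable#tally requires Ruby 2.7 or later.
def duplicate_counts(ids)
  ids.tally.select { |_id, count| count > 1 }
end

# Stand-in for the IDs extracted from the parsed document.
ids = ["4567", "4567", "5678", "1234", "4567"]
counts = duplicate_counts(ids)
puts counts.keys.inspect
# prints: ["4567"]
```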