Finding duplicates in an XML page

Posted on 2025-01-02 03:35:46

I'm trying to find duplicates within the XML returned by a web service call, using Ruby and Nokogiri.

The output I'm getting from the code below looks something like this:

found duplicate["1", "1"]
found duplicate["1", "1"]
found duplicate["1", "1"]
found duplicate["1", "1"]
found duplicate["2", "2"]

What I want is to be told that SKUs 1 and 2 have been duplicated, i.e. something like "found duplicate skus [duplicated skus]".

The XML looks like this:

<Root>
  <Context>
    <ID>1234</ID>
    <Item>
      <ID>4567</ID>
    </Item>
    <Item>
      <ID>4567</ID>
    </Item>
    <Item>
      <ID>5678</ID>
    </Item>
  </Context>
</Root>

require 'nokogiri'
require 'open-uri'

# Context items that will produce duplicates.
$context = ['a', 'b', 'c']

# Reopen Array to add a method that returns every element occurring more than once
class Array
  def only_duplicates
    duplicates = []
    self.each { |each| duplicates << each if self.count(each) > 1 }
    duplicates
  end
end

# Loop through each item in the $context array
$context.each do |item|
  puts "C_ItemID = " + item
  # Create a URL string using the context item
  url = "url to the call"
  # Create an XML doc (URI.open is needed on Ruby 3.0+, where Kernel#open no longer opens URLs)
  doc = Nokogiri::XML(URI.open(url))
  # Declare a blank array that the text from the node will be stored in
  values = []
  # Loop through each item_id node to find duplicates
  doc.xpath('//item/id').each do |node|
    values << node.text
    @values = values.to_a
    if @values.only_duplicates.count > 1
      puts "found duplicate" + @values.only_duplicates.inspect
    end
  end
end
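The repeated lines come from two things in the code above. The duplicate check and the `puts` run inside the `xpath(...).each` loop, so once a duplicate pair is in `values`, the message is printed again for every later node. `only_duplicates` also returns every occurrence rather than each distinct value once, which is why "1" appears twice in each line. Here is a minimal sketch of the fix. It assumes the node texts have already been collected into an array; the hypothetical `node_texts` stands in for `doc.xpath('//item/id').map(&:text)`. The check runs once after collection, and `uniq` reduces the result to one entry per duplicated value:

```ruby
# Hypothetical node texts for one web-service response; in the question's
# code these would come from doc.xpath('//item/id').map(&:text).
node_texts = ["1", "3", "1", "4", "2", "2"]

class Array
  # Same idea as the question's only_duplicates, but each duplicated
  # value is returned once instead of once per occurrence.
  def only_duplicates
    select { |element| count(element) > 1 }.uniq
  end
end

# Check once, after every node has been collected, instead of once per node.
duplicates = node_texts.only_duplicates
puts "found duplicate skus #{duplicates.inspect}" unless duplicates.empty?
# prints: found duplicate skus ["1", "2"]
```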

唐婉 2025-01-09 03:35:46

require 'nokogiri'

# Inline stand-in for the XML downloaded from the web service
downloaded_from_url = "<Root><Context><ID>1234</ID><Item><ID>4567</ID></Item><Item><ID>4567</ID></Item><Item><ID>5678</ID></Item><Item><ID>5678</ID></Item></Context></Root>"
parsed_xml_document = Nokogiri::XML(downloaded_from_url)

# Text of every <ID> that is a direct child of an <Item> (skips the Context ID)
list_of_item_ids = parsed_xml_document.xpath("//Item/ID").map { |x| x.text }

# Return each element that occurs more than once, listed once
def find_duplicate_items(in_collection = [])
  in_collection.select do |item|
    in_collection.count(item) > 1
  end.uniq
end

duplicate_item_ids = find_duplicate_items(list_of_item_ids)
#=> ["4567", "5678"]
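The `.uniq` at the end of `find_duplicate_items` does the work the question was missing. `select` alone keeps every occurrence of a repeated ID. The sketch below checks only the array logic, so no Nokogiri is needed, using a hypothetical list of IDs:

```ruby
def find_duplicate_items(in_collection = [])
  in_collection.select do |item|
    in_collection.count(item) > 1
  end.uniq
end

# Hypothetical IDs, as they might come out of xpath("//Item/ID").map(&:text)
ids = ["4567", "4567", "5678", "5678", "9999"]

# Without .uniq, select keeps every occurrence of a repeated ID.
with_repeats = ids.select { |id| ids.count(id) > 1 }
# => ["4567", "4567", "5678", "5678"]

duplicates = find_duplicate_items(ids)
# => ["4567", "5678"]
puts "found duplicate skus #{duplicates.inspect}"
```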

A faster way to find duplicates (credit: Ryan LeCompte), in a slightly modified and shorter form:

# Group equal elements, keep groups with more than one member, return their keys
def fast_find_duplicate_items(in_collection = [])
  in_collection.group_by do |element|
    element
  end.select do |key, value|
    value.size > 1
  end.keys
end
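`group_by` makes a single pass and maps each distinct value to the array of its occurrences. The duplicates are then the keys whose arrays have more than one element. A small sketch of that intermediate step, using hypothetical IDs:

```ruby
def fast_find_duplicate_items(in_collection = [])
  in_collection.group_by { |element| element }
               .select { |_key, group| group.size > 1 }
               .keys
end

# Hypothetical IDs, as they might come out of xpath("//Item/ID").map(&:text)
ids = ["4567", "4567", "5678", "5678", "9999"]

# Intermediate step: "4567" maps to ["4567", "4567"], "5678" to
# ["5678", "5678"], and "9999" to ["9999"].
grouped = ids.group_by { |element| element }

duplicates = fast_find_duplicate_items(ids)
# => ["4567", "5678"]
```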

颜 2025-01-09 03:35:46

Here's a more efficient way to find duplicates in an array. Using #count makes the algorithm slower because it has to traverse the whole array for each item, which is O(N^2); group_by builds the groups in a single pass over the array:

list_of_item_ids.group_by { |e| e }.select { |k,v| v.size > 1 }.map(&:first)
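An alternative that neither answer uses is `Enumerable#tally`, which requires Ruby 2.7 or later (an assumption about the runtime). It counts occurrences in one pass without building the per-value arrays. The sketch below also shows an equivalent counting hash for older Rubies. The IDs are hypothetical:

```ruby
# Hypothetical IDs, as they might come out of xpath("//Item/ID").map(&:text)
ids = ["4567", "4567", "5678", "5678", "9999"]

# tally (Ruby 2.7+) returns a Hash of value => number of occurrences.
counts = ids.tally

# Equivalent single pass without tally, for Ruby versions before 2.7.
counts_compat = ids.each_with_object(Hash.new(0)) { |id, h| h[id] += 1 }

# Keep only values that occur more than once.
duplicates = counts.select { |_id, n| n > 1 }.keys
puts "found duplicate skus #{duplicates.inspect}"
# prints: found duplicate skus ["4567", "5678"]
```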

About the author: 可可